furas.pl
# private notes - Python, Linux, Machine Learning, etc.

BeautifulSoup: get text from tag [GB]

There are several ways to get the text from a tag.

.text - all text from the tag and its subtags, joined into one string

.string - works only if the tag has a single child (one string, or one subtag that itself has a .string); otherwise it gives None

.get_text(separator, strip) - you can strip whitespace around every piece of text and join the pieces with a separator, which can later be used to split the data into a list.
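A small sketch of the .string rule described above. The HTML snippet and the variable names are made up for illustration; only `.string` and `.text` come from BeautifulSoup itself.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only to show when .string works.
soup = BeautifulSoup('<p><b>bold</b></p><div>one <i>two</i></div>', 'html.parser')

# <p> has exactly one child (<b>), and <b> has exactly one string,
# so .string is passed up from <b> to <p>.
only_child = soup.p.string

# <div> has two children (a string and <i>), so .string gives None.
mixed = soup.div.string

# .text still collects everything from the tag and its subtags.
all_text = soup.div.text

print('  p.string:', only_child)
print('div.string:', mixed)
print('  div.text:', all_text)
```

If you are not sure how many children a tag has, .text or .get_text() is the safer choice, because .string can silently give None.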

from bs4 import BeautifulSoup as BS

soup = BS('''<tag>text
<a>link</a>
other</tag>''', "html.parser")

data = soup.find('tag')
print(data)
print('-----------')
print('    text:', data.text)
print('  string:', data.string)
print('get_text:', data.get_text(strip=False))
print('get_text:', data.get_text(strip=True))
print('get_text:', data.get_text(strip=True, separator='|'))
print('get_text:', data.get_text(strip=True, separator='|').split('|'))
print('-----------')
print('    a.text:', data.a.text)
print('  a.string:', data.a.string)
print('a.get_text:', data.a.get_text(strip=False))
print('a.get_text:', data.a.get_text(strip=True))
print('a.get_text:', data.a.get_text(strip=True, separator='|'))
print('a.get_text:', data.a.get_text(strip=True, separator='|').split('|'))
print('-----------')

Result:

<tag>text
<a>link</a>
other</tag>
-----------
    text: text
link
other
  string: None
get_text: text
link
other
get_text: textlinkother
get_text: text|link|other
get_text: ['text', 'link', 'other']
-----------
    a.text: link
  a.string: link
a.get_text: link
a.get_text: link
a.get_text: link
a.get_text: ['link']
-----------
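Splitting on the separator, as in the example above, breaks when the text itself contains the separator character. A possible alternative is the `.stripped_strings` generator, which gives the stripped pieces directly. The sketch below reuses the structure of the example but with made-up text that contains '|'; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Same structure as the example above, but the first text contains '|'.
soup = BeautifulSoup('<tag>a|b\n<a>link</a>\nother</tag>', 'html.parser')
tag = soup.find('tag')

# Joining with '|' and splitting again also cuts 'a|b' into two items.
via_split = tag.get_text(strip=True, separator='|').split('|')

# .stripped_strings gives every piece of text already stripped,
# so no separator is needed.
via_generator = list(tag.stripped_strings)

print('           split:', via_split)
print('stripped_strings:', via_generator)
```

When the text cannot contain the separator, both versions should give the same list.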