BeautifulSoup: How to get text from a tag
There are three different ways to get text from a tag.
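.text
- all text from the tag and its subtags, joined together (it is a property that gives the same result as .get_text() without arguments)
.string
- the tag's single string; it gives None when the tag has more than one child, for example text mixed with subtags
.get_text(separator, strip)
- like .text, but you can strip whitespace from every piece of text and put a separator between the pieces, which can then be used to split the data into a list.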
from bs4 import BeautifulSoup as BS
soup = BS('''<tag>text
<a>link</a>
other</tag>''', "html.parser")
data = soup.find('tag')
print(data)
print('-----------')
print(' text:', data.text)
print(' string:', data.string)
print('get_text:', data.get_text(strip=False))
print('get_text:', data.get_text(strip=True))
print('get_text:', data.get_text(strip=True, separator='|'))
print('get_text:', data.get_text(strip=True, separator='|').split('|'))
print('-----------')
print(' a.text:', data.a.text)
print(' a.string:', data.a.string)
print('a.get_text:', data.a.get_text(strip=False))
print('a.get_text:', data.a.get_text(strip=True))
print('a.get_text:', data.a.get_text(strip=True, separator='|'))
print('a.get_text:', data.a.get_text(strip=True, separator='|').split('|'))
print('-----------')
Result:
<tag>text
<a>link</a>
other</tag>
-----------
text: text
link
other
string: None
get_text: text
link
other
get_text: textlinkother
get_text: text|link|other
get_text: ['text', 'link', 'other']
-----------
a.text: link
a.string: link
a.get_text: link
a.get_text: link
a.get_text: link
a.get_text: ['link']
-----------
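Splitting on the separator works only when the text never contains the separator itself. As a minimal sketch, assuming a slightly changed sample in which the last piece of text contains `|` (this input is made up for illustration), `.stripped_strings` may be a safer way to get the same list, because it yields every piece of text already stripped and needs no separator:

```python
from bs4 import BeautifulSoup as BS

# Hypothetical variation of the sample above: the last text contains '|'
soup = BS('''<tag>text
<a>link</a>
other | more</tag>''', "html.parser")

data = soup.find('tag')

# Splitting on the separator also cuts the text that contains '|'
parts_split = data.get_text(strip=True, separator='|').split('|')
print('get_text + split:', parts_split)

# .stripped_strings is a generator of stripped, non-empty strings
parts = list(data.stripped_strings)
print('stripped_strings:', parts)
```

Here `get_text()` with `split('|')` gives four items, because `other | more` is cut in two, while `stripped_strings` keeps it as one item.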