Articles for tag: beautifulsoup


Scraping: How to download a tgz file from eogauth.mines.edu.

It is a problem from Stackoverflow.

The main problem was the wrong url used in POST.

Often a form sends data to the same url as the page with the form, but this doesn't have to be true on all pages.

A form may send data to a different url, which can be defined as action in HTML <form action=...>

I use BeautifulSoup to get this information from HTML.

I don't have a username and password to test all elements, but at least now POST gets the page with the login form and the message "Invalid username or password." instead of a page with the message "Invalid Request."

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup as BS

s = requests.Session()
#s.headers.update({'User-Agent': 'Mozilla/5.0'})

# --- request tgz file, server should send login page instead -------

url_tgz = "https://eogdata.mines.edu/wwwdata/viirs_products/dnb_composites/v10//201707/vcmslcfg/SVDNB_npp_20170701-20170731_75N060W_vcmslcfg_v10_c201708061200.tgz"

r = s.get(url_tgz)
#print(r.status_code)
#print(r.history)
print('\n--- url page ---\n')
print(r.url)

# --- find url in form ---

soup = BS(r.text, 'html.parser')
item = soup.find('form')
# action may be a relative url, so resolve it against url of page
url = urljoin(r.url, item['action'])

print('\n--- url form ---\n')
print(url)

print('\n--- url page == url form ---\n')
print(r.url == url)

# --- login ---

# put your own username and password here
payload = {
    'username': 'your_username',
    'password': 'your_password',
    'credentialId': '',
}

# send data to url from form, not to url of page
r = s.post(url, data=payload)
#print(r.status_code)
#print(r.history)
#print(r.url)
#print(r.text)

# --- result ---

print('\n--- login ---\n')
soup = BS(r.text, 'html.parser')
item = soup.find('span', {'class': 'kc-feedback-text'})
if item:
    print('Message:', item.text)
else:
    print("Can't see error message")

print('\n--- end ---\n')

Notes:

Stackoverflow: Title

Scraping: How to download a tgz file from eogauth.mines.edu.

This is a problem from Stackoverflow.

The main problem was the wrong url used in POST.

Often a form sends data to the same url as the page with this form, but it doesn't have to be so on every page.

A form may send data to a different url, which is defined as action in HTML <form action …


Python: How to find the element after (or before) another element with BeautifulSoup.

BeautifulSoup has many functions to search for elements - not only find() and find_all() but also

It can also search in the other direction using

It also has attributes (for a single element)

and iterators (for many elements)

which can work different …
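The function names are cut from the excerpt above, so here is a minimal sketch with a few real BeautifulSoup functions for this task: find_next(), find_previous(), find_next_sibling(), the attribute .next_sibling and the iterator .next_siblings. The HTML is invented only for this example and this selection may not be the same as in the full article.

```python
from bs4 import BeautifulSoup, Tag

# sample HTML invented for this example
html = '''
<div>
  <h2>Title A</h2>
  <p>First paragraph</p>
  <p>Second paragraph</p>
  <h2>Title B</h2>
  <p>Third paragraph</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
first_h2, second_h2 = soup.find_all('h2')

# find_next() searches forward in the document
next_p = first_h2.find_next('p')
print(next_p.text)

# find_previous() searches backward in the document
prev_p = second_h2.find_previous('p')
print(prev_p.text)

# find_next_sibling() searches only elements on the same level
next_h2 = first_h2.find_next_sibling('h2')
print(next_h2.text)

# attribute .next_sibling gives the very next node
# and here it is a string with whitespace, not a tag
print(repr(first_h2.next_sibling))

# iterator .next_siblings gives all following nodes (strings and tags)
# so keep only tags
following = [node.name for node in first_h2.next_siblings if isinstance(node, Tag)]
print(following)
```

The attribute returns also strings between tags (like new line and spaces), so the functions with a tag name are usually easier when only tags are needed.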


Python: How to find the element after (or before) another element in BeautifulSoup.

BeautifulSoup has many functions to search for elements - not only find() and find_all() but also

It can also search in the opposite direction using

It also has attributes (for getting a single element)

and iterators (for getting many elements)

which can work …


Scraping: How to use a regular expression in BeautifulSoup to scrape Nobel Laureates from a table in Wikipedia

I wanted to try to use regex to get links to laureates in the table on the page List of Nobel Memorial Prize laureates in Economics

First I tried to use r'^/wiki/[A-Z][a-z]*_[A-Z][a-z]*$' because links look like

/wiki/Paul_Krugman

but this also gets links like

/wiki/United_States …
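A minimal sketch of this first attempt, assuming a small invented table instead of the real Wikipedia page. BeautifulSoup accepts a compiled regex as the value of an attribute in find_all(), and the code shows that the pattern gets both kinds of links. It doesn't show the final solution from the full article.

```python
import re
from bs4 import BeautifulSoup

# sample HTML invented for this example (not the real Wikipedia table)
html = '''
<table>
  <tr>
    <td><a href="/wiki/Paul_Krugman">Paul Krugman</a></td>
    <td><a href="/wiki/United_States">United States</a></td>
  </tr>
  <tr>
    <td><a href="/wiki/Economics">Economics</a></td>
    <td><a href="https://example.com/">External</a></td>
  </tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# the same pattern as in the text: two capitalized words joined with "_"
pattern = re.compile(r'^/wiki/[A-Z][a-z]*_[A-Z][a-z]*$')

# find_all() keeps only <a> whose href matches the regex
links = [a['href'] for a in soup.find_all('a', href=pattern)]
print(links)
```

Regex alone can't see the difference between a person and a country because both links have the same form, so it needs some other information (for example the position of the link in the table).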


Scraping: How to use a regular expression in BeautifulSoup to scrape Nobel Laureates from a table in Wikipedia

I wanted to use a regular expression to get links to laureates in the table on the page List of Nobel Memorial Prize laureates in Economics

First I tried to use r'^/wiki/[A-Z][a-z]*_[A-Z][a-z]*$' because it looked like the links have the form

/wiki/Paul_Krugman

but it turned out that this also finds links of the form

/wiki/United_States …


BeautifulSoup: How to get text from a tag

There are different functions to get text from a tag.

.text - all text from the tag and its subtags

.string - only if there are no subtags

.get_text(strip, separator) - you can remove whitespace and add a separator, which can be used to split data into a list.

from bs4 import BeautifulSoup as BS

soup = BS …
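The code above is cut in this excerpt, so here is a minimal sketch of the three ways to get text. The HTML is invented only for this example, and strip and separator are used as keyword arguments of get_text().

```python
from bs4 import BeautifulSoup

# sample HTML invented for this example
soup = BeautifulSoup(
    '<div><p>Hello <b>World</b></p><span> Bye </span></div>',
    'html.parser'
)
div = soup.find('div')
b = soup.find('b')

# .text - all text from tag and subtags, with original whitespace
all_text = div.text
print(repr(all_text))

# .string - None here because <div> has more than one child
print(div.string)

# .string - works for <b> because it has only text inside
print(b.string)

# .get_text() - strip every piece of text and join pieces with separator
joined = div.get_text(strip=True, separator='|')
print(joined)

# separator can be used to split text into list
parts = joined.split('|')
print(parts)
```

Use a separator which can't be in the text itself, otherwise split() creates more items than expected.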


BeautifulSoup: How to get text from a tag

There are several different functions to get text from a tag.

.text - all text from the tag and its subtags

.string - only if there are no subtags

.get_text(strip, separator) - you can remove whitespace and add a separator, which can be used to split the text into a list.

from bs4 import BeautifulSoup as BS

soup = BS('''<tag>text …

