
Scraping: How to download a tgz file from eogauth.mines.edu.

Here is a problem from Stackoverflow.

The main problem was the wrong URL used in the POST request.

A form often sends its data to the same address as the page that contains it, but this is not true for every page.

A form can send its data to a different address, which is defined as action in the HTML <form action=...>
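The action value is not always a full URL. It can also be relative or empty, and a browser then resolves it against the address of the page. A minimal sketch of that resolution with urljoin from the standard library; the helper name form_target and the example.com addresses are made up for illustration and are not taken from the real site:

```python
from urllib.parse import urljoin


def form_target(page_url, action):
    """Return the address a form would POST to (hypothetical helper).

    page_url - address of the page that contains the form
    action   - value of the form's action attribute (may be empty or relative)
    """
    return urljoin(page_url, action or "")


# Made-up page address, for illustration only.
page_url = "https://example.com/auth/login?client=abc"

# Empty action: the form posts back to the page itself.
print(form_target(page_url, ""))

# Root-relative action: scheme and host come from the page.
print(form_target(page_url, "/auth/authenticate"))

# Absolute action: used as it is, even on a different host.
print(form_target(page_url, "https://login.example.com/authenticate?session=123"))
```

With requests, the page address would be r.url (the address after redirects), not the address that was originally requested.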

In the script below I used BeautifulSoup to get this value from the HTML.

I didn't have a username and password to test every step, but at least the POST now gets the page with the login form and the message Invalid username or password. instead of a page with the message Invalid Request.

import requests
from bs4 import BeautifulSoup as BS

s = requests.Session()
#s.headers.update({'User-Agent': 'Mozilla/5.0'})

# --- request the tgz file; without a login the response is the login page ---

url_tgz = "https://eogdata.mines.edu/wwwdata/viirs_products/dnb_composites/v10//201707/vcmslcfg/SVDNB_npp_20170701-20170731_75N060W_vcmslcfg_v10_c201708061200.tgz"

r = s.get(url_tgz)
#print(r.status_code)
#print(r.history)
print('\n--- url page ---\n')
print(r.url)

# --- find url in form ---

soup = BS(r.text, 'html.parser')
item = soup.find('form')
url = item['action']

print('\n--- url form ---\n')
print(url)

print('\n--- url page == url form ---\n')
print( r.url == url )

# --- login ---

payload = {
    'username': 'salvandi69@gmail.com',
    'password': '123asdzxc',
    'credentialId': '',
}

r = s.post(url, data=payload)
#print(r.status_code)
#print(r.history)
#print(r.url)
#print(r.text)

# --- result ---

print('\n--- login ---\n')
soup = BS(r.text, 'html.parser')
item = soup.find('span', {'class': 'kc-feedback-text'})
if item:
    print('Message:', item.text)
else:
    print("Can't see error message")

print('\n--- end ---\n')
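The script above stops at the login step. A possible next step, assuming the login succeeds and the session cookies are enough to access the file (I could not check this against the real server), is to request the tgz again with stream=True and write it to disk in chunks. The helper names save_response and looks_like_html and the file name output.tgz are hypothetical:

```python
def save_response(response, path, chunk_size=8192):
    """Write the body of a streamed response to `path`.

    `response` is expected to behave like requests.Response
    (it only needs the iter_content() method).
    Returns the number of bytes written.
    """
    total = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                total += len(chunk)
    return total


def looks_like_html(response):
    """True when the server sent a web page (e.g. the login form) instead of a file."""
    content_type = response.headers.get("Content-Type", "")
    return "text/html" in content_type.lower()


# Usage with the session `s` and `url_tgz` from the script above
# (an untested assumption about how the server behaves after login):
#
#   r = s.get(url_tgz, stream=True)
#   if looks_like_html(r):
#       print("Got a web page, probably still not logged in")
#   else:
#       print(save_response(r, "output.tgz"), "bytes saved")
```

Checking the Content-Type first avoids saving the HTML of the login page under a .tgz name when the login did not work.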

Notes:

Stackoverflow: Title
