Scraping: How to download tgz file from eogauth.mines.edu.
This is a problem from Stack Overflow.
The main problem was that the wrong url was used in the POST request.
A form often sends data to the same url as the page that contains it, but this is not true on every page.
A form may send data to a different url, which is defined by the action attribute in HTML: <form action=...>
I use BeautifulSoup to get this url from the HTML.
I don't have a username and password to test everything, but at least the POST now gets the page with the login form and the message "Invalid username or password." instead of a page with the message "Invalid Request.".
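The form lookup described above can be tried in isolation, without touching the real server. The sketch below is only an illustration: the HTML snippet, the example.com url and the helper name get_form_target are made up for this example and are not taken from the real login page. It also assumes the action may be relative or missing, so it resolves it against the page url with urljoin, and it collects hidden inputs because login forms often expect them to be sent back.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical login page, invented for this example.
SAMPLE_HTML = """
<html><body>
<form id="kc-form-login" action="/auth/login-actions/authenticate?session_code=abc" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="credentialId" value="">
</form>
</body></html>
"""


def get_form_target(html, page_url):
    """Return (post_url, hidden_fields) for the first <form> in html."""
    soup = BeautifulSoup(html, 'html.parser')
    form = soup.find('form')
    if form is None:
        raise ValueError('no <form> found in page')

    # A missing or empty action means "send to the page's own url";
    # urljoin also resolves a relative action against the page url.
    post_url = urljoin(page_url, form.get('action', ''))

    # Hidden inputs usually have to be posted together with the credentials.
    hidden = {
        field['name']: field.get('value', '')
        for field in form.find_all('input', {'type': 'hidden'})
        if field.has_attr('name')
    }
    return post_url, hidden


post_url, hidden = get_form_target(SAMPLE_HTML, 'https://example.com/auth/login')
print(post_url)
print(hidden)
```

With a real session, the same helper could be called as get_form_target(r.text, r.url) and the returned dictionary merged into the payload before the POST.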
    import requests
    from bs4 import BeautifulSoup as BS

    s = requests.Session()
    #s.headers.update({'User-Agent': 'Mozilla/5.0'})

    # --- use tgz to get login page ---

    url_tgz = "https://eogdata.mines.edu/wwwdata/viirs_products/dnb_composites/v10//201707/vcmslcfg/SVDNB_npp_20170701-20170731_75N060W_vcmslcfg_v10_c201708061200.tgz"

    # server redirects to the login page, so r.url is the url of the page with the form
    r = s.get(url_tgz)
    #print(r.status_code)
    #print(r.history)

    print('\n--- url page ---\n')
    print(r.url)

    # --- find url in form ---

    soup = BS(r.text, 'html.parser')
    item = soup.find('form')
    url = item['action']  # url to which the form sends data

    print('\n--- url form ---\n')
    print(url)

    print('\n--- url page == url form ---\n')
    print(r.url == url)

    # --- login ---

    payload = {
        'username': 'salvandi69@gmail.com',
        'password': '123asdzxc',
        'credentialId': '',
    }

    # send data to url from the form, not to url of the page
    r = s.post(url, data=payload)
    #print(r.status_code)
    #print(r.history)
    #print(r.url)
    #print(r.text)

    # --- result ---

    print('\n--- login ---\n')

    soup = BS(r.text, 'html.parser')
    item = soup.find('span', {'class': 'kc-feedback-text'})  # element with error message

    if item:
        print('Message:', item.text)
    else:
        print("Can't see error message")

    print('\n--- end ---\n')
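The script above stops at the login step and I could not test what happens after a correct login. As a rough sketch only, saving the file could look like the code below. The helper name save_response and the FakeResponse class are made up for this example (the fake object only imitates the raise_for_status() and iter_content() methods of a requests response so that the code runs without network). The check of the first two bytes is an assumption that helps to see whether the server sent a real gzip archive or again an HTML login page.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def save_response(response, path, chunk_size=1024 * 1024):
    """Write a requests-style response to path in chunks.

    Returns True if the saved data starts like a gzip file.
    """
    response.raise_for_status()
    first = b''
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if len(first) < 2:
                # remember the first two bytes to check the file type later
                first = (first + chunk[:2])[:2]
            f.write(chunk)
    return first == GZIP_MAGIC


class FakeResponse:
    """Hypothetical stand-in for requests.Response, used only in this demo."""

    def __init__(self, body):
        self.body = body

    def raise_for_status(self):
        pass

    def iter_content(self, chunk_size=1):
        for i in range(0, len(self.body), chunk_size):
            yield self.body[i:i + chunk_size]


folder = tempfile.mkdtemp()
archive_path = os.path.join(folder, 'file.tgz')
page_path = os.path.join(folder, 'page.tgz')

ok_archive = save_response(FakeResponse(gzip.compress(b'data')), archive_path, chunk_size=1)
ok_page = save_response(FakeResponse(b'<html>login form</html>'), page_path)

print(ok_archive, ok_page)
```

With the real session it could be used as `with s.get(url_tgz, stream=True) as r: ok = save_response(r, 'file.tgz')`. If it returns False, the saved file is probably the login page and not the archive.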
Notes:
Stackoverflow: Title