Python: How to download a PDF from the US Department of Commerce site using requests.
There is a "View attachment file" button on the page
https://232app.azurewebsites.net/Forms/ExclusionRequestItem/800
which downloads a PDF.
The problem is that this button doesn't have an href with a direct link to the PDF; it uses JavaScript to get it.
The first idea is to use [Selenium](https://selenium-python.readthedocs.io/) to download it, but with DevTools (tab: Network, option: Persist Logs) in Firefox you can see that this button first sends a request to the URL
https://232app.azurewebsites.net/Forms/ExclusionRequestItem/800?handler=DownloadDM&ID=800
and gets back JSON data with a "downloadURL" field, which holds the URL of the PDF.
So by using requests with the first URL we can get the JSON data containing the URL of the PDF, and then use requests again to download the PDF.

    import requests
    import webbrowser

    number = 800

    url = f'https://232app.azurewebsites.net/Forms/ExclusionRequestItem/{number}?handler=DownloadDM&ID={number}'

    r = requests.get(url)
    data = r.json()

    print('url:', data["downloadURL"])

    filename = f'output-{number}.pdf'

    r = requests.get(data["downloadURL"])
    with open(filename, 'wb') as fh:
        fh.write(r.content)

    # open PDF in default program
    webbrowser.open(filename)

Using a number other than 800 you can download PDFs from other pages.
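To try several numbers, the same two requests can be wrapped in a function. This is only a sketch: the names `handler_url` and `download_pdf`, and the `folder`, `http` and `timeout` parameters, are my own and not part of the site. I assume other pages return the same "downloadURL" field as page 800, and the function returns `None` if the field is missing.

```python
import os

BASE = 'https://232app.azurewebsites.net/Forms/ExclusionRequestItem'


def handler_url(number):
    """Build the JSON handler URL seen in DevTools for a given page number."""
    return f'{BASE}/{number}?handler=DownloadDM&ID={number}'


def download_pdf(number, folder='.', http=None, timeout=30):
    """Download the PDF for one page; return the saved path or None.

    `http` is any object with a requests-like .get() (the requests module
    itself or a requests.Session). It is an assumption of this sketch,
    added so the function can also be called with a stub.
    """
    if http is None:
        import requests  # imported here so a stub can be used without it
        http = requests

    # first request: JSON data with the url to the PDF
    r = http.get(handler_url(number), timeout=timeout)
    r.raise_for_status()
    pdf_url = r.json().get('downloadURL')
    if not pdf_url:
        # assumption: pages without an attachment have no "downloadURL"
        return None

    # second request: the PDF itself
    r = http.get(pdf_url, timeout=timeout)
    r.raise_for_status()

    filename = os.path.join(folder, f'output-{number}.pdf')
    with open(filename, 'wb') as fh:
        fh.write(r.content)
    return filename


# example use (numbers other than 800 are only illustrative):
# for number in (800, 801, 802):
#     print(number, download_pdf(number))
```

Passing a `requests.Session()` as `http` reuses one connection for all downloads, which may be kinder to the server when looping over many numbers.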
Notes:
Stackoverflow: Python Webscrape: hidden strange url link that is not available in page source