Python: How to download a PDF from the US Department of Commerce site using requests.

There is a "View attachment file" button on the page

https://232app.azurewebsites.net/Forms/ExclusionRequestItem/800

which downloads a PDF.

The problem is that this button has no href with a direct link to the PDF; it uses JavaScript to get the link.

The first idea is to use [Selenium](https://selenium-python.readthedocs.io/) to download it, but DevTools in Firefox (tab: Network, option: Persist Logs) shows that the button first sends a request to the URL

https://232app.azurewebsites.net/Forms/ExclusionRequestItem/800?handler=DownloadDM&ID=800

and gets back JSON data with a "downloadURL" field, which holds the URL of the PDF.
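Reading that field can be sketched as a small helper. This is only an illustration: the sample payload below is hypothetical (only the "downloadURL" key was seen in DevTools, the URL value is a placeholder), and `extract_download_url` is a name made up for this example.

```python
import json


def extract_download_url(body):
    """Return the "downloadURL" value from a JSON body, or None if it is missing."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # the server answered with something that is not JSON (e.g. an HTML error page)
        return None
    if not isinstance(data, dict):
        return None
    return data.get("downloadURL")


# Hypothetical payload: the real response may contain other keys as well.
sample = '{"downloadURL": "https://example.invalid/file.pdf"}'

print(extract_download_url(sample))
print(extract_download_url("<html>error page</html>"))
```

Returning None instead of raising makes it easy to skip pages that have no attachment.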

So one requests call to the first URL gets the JSON data with the URL of the PDF, and a second requests call downloads the PDF.

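import pathlib
import webbrowser

import requests

number = 800

# URL requested by the "View attachment file" button
url = f'https://232app.azurewebsites.net/Forms/ExclusionRequestItem/{number}?handler=DownloadDM&ID={number}'

# first request: JSON with the link to the PDF
r = requests.get(url)
r.raise_for_status()
data = r.json()
print('url:', data["downloadURL"])

filename = f'output-{number}.pdf'

# second request: the PDF itself
r = requests.get(data["downloadURL"])
r.raise_for_status()
with open(filename, 'wb') as fh:
    fh.write(r.content)

# open PDF in default program (a file:// URI is more portable than a bare relative path)
webbrowser.open(pathlib.Path(filename).resolve().as_uri())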

Using a number other than 800, you can download the PDF from another page.
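Downloading from several pages could look like the sketch below. It assumes that every number uses the same URL pattern and the same "downloadURL" key as page 800, which was not checked; `handler_url`, `download_pdf` and `download_many` are names invented for this example. The session is passed in as an argument, so any object with a requests-like `.get()` works.

```python
from pathlib import Path

BASE = 'https://232app.azurewebsites.net/Forms/ExclusionRequestItem'


def handler_url(number):
    """URL requested by the "View attachment file" button for the given page number."""
    return f'{BASE}/{number}?handler=DownloadDM&ID={number}'


def download_pdf(number, session, out_dir='.'):
    """Download the PDF for one page; return its path, or None if no link was found."""
    r = session.get(handler_url(number), timeout=30)
    r.raise_for_status()
    try:
        pdf_url = r.json().get('downloadURL')
    except ValueError:
        # response was not JSON
        return None
    if not pdf_url:
        return None

    r = session.get(pdf_url, timeout=30)
    r.raise_for_status()
    path = Path(out_dir) / f'output-{number}.pdf'
    path.write_bytes(r.content)
    return path


def download_many(numbers, session, out_dir='.'):
    """Download PDFs for many page numbers; return {number: path or None}."""
    return {number: download_pdf(number, session, out_dir) for number in numbers}
```

To run it against the real site, pass a `requests.Session()`, for example `download_many(range(800, 803), requests.Session())`. Reusing one session keeps the connection open between requests.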

Notes:

Stack Overflow: Python Webscrape: hidden strange url link that is not available in page source
