Python: How to scrape cnmv.es with requests, BS

Example code to scrape it:

# author: https://blog.furas.pl
# date: 2020.08.04
# link: https://stackoverflow.com/questions/63246707/python-scraping-create-payload-cnmv-es-and-render-javascript/

import requests
from bs4 import BeautifulSoup

url = 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx' # '?lang=en'
search_text = 'aaa' # 'abc'

r = requests.get(url)
#print(r.text …
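
The snippet above is truncated, so here is a small offline sketch of the idea behind it: cnmv.es is an ASP.NET page, and its search form keeps hidden inputs (`__VIEWSTATE`, `__EVENTVALIDATION`) that have to be copied into the POST payload. The HTML and the field name `texto` below are made up only to show the pattern.

```python
from bs4 import BeautifulSoup

# made-up stand-in for the real ASP.NET form on cnmv.es
html = '''
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123"/>
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789"/>
</form>
'''

soup = BeautifulSoup(html, 'html.parser')

# copy every hidden input into the payload for the later POST
payload = {inp['name']: inp['value'] for inp in soup.find_all('input', type='hidden')}
payload['texto'] = 'aaa'  # hypothetical name for the search field

print(payload['__VIEWSTATE'])  # abc123
```

With the real page you would GET the form first, build this payload, and then POST it back with `requests.post()`.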

Python: How to scrape cnnvd.org.cn with requests, BS

Example code to scrape it:

#!/usr/bin/env python3

#
# https://stackoverflow.com/a/47940659/1832058
#

from bs4 import BeautifulSoup
import requests

link = "http://www.cnnvd.org.cn/web/vulnerability/querylist.tag"

req = requests.get(link)
web = req.text
soup = BeautifulSoup(web, "lxml")

cve_name = []
cve_link = []

for par_ in soup …
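
The loop above is cut off, so here is a runnable sketch of the same pattern — collecting names and links into the two lists. The HTML and the class name `list_list` are made up for illustration; the real page uses different markup.

```python
from bs4 import BeautifulSoup

# stand-in HTML for the real vulnerability list
html = '''
<div class="list_list">
  <ul><li><a href="/web/xxx/1">CNNVD-2020-001</a></li></ul>
  <ul><li><a href="/web/xxx/2">CNNVD-2020-002</a></li></ul>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

cve_name = []
cve_link = []

# append every link's text and href to the parallel lists
for a in soup.select('div.list_list a'):
    cve_name.append(a.get_text())
    cve_link.append(a['href'])

print(cve_name)  # ['CNNVD-2020-001', 'CNNVD-2020-002']
```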

Python: How to scrape coinbase.com with requests, BS

Example code to scrape it:

#!/usr/bin/env python3 

# date: 2019.12.02
# https://stackoverflow.com/questions/59132449/what-is-the-proper-syntax-for-find-in-bs4

import requests
from bs4 import BeautifulSoup

url = 'https://www.coinbase.com/charts'
headers = {'User-Agent': 'Mozilla/5.0'}  # without this line `headers` is undefined
r = requests.get(url, headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')

all_tr = soup.find_all('tr')

data …
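
The snippet stops at `find_all('tr')`, so here is an offline sketch of how such rows are usually turned into data — the table below is made up, only to show the shape the code expects.

```python
from bs4 import BeautifulSoup

# made-up table with the same shape the snippet expects from /charts
html = '''
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Bitcoin</td><td>$9000</td></tr>
  <tr><td>Ethereum</td><td>$200</td></tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

data = []
for tr in soup.find_all('tr')[1:]:          # skip the header row
    cells = [td.get_text() for td in tr.find_all('td')]
    data.append(cells)

print(data)  # [['Bitcoin', '$9000'], ['Ethereum', '$200']]
```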

Python: How to scrape coinmarketcap.com (1) with requests

Example code to scrape it:

import requests
import datetime
import csv

start_date = '2016.01.01'
finish_date = '2017.01.01'

start_date = datetime.datetime.strptime(start_date, '%Y.%m.%d')
finish_date = datetime.datetime.strptime(finish_date, '%Y.%m.%d')

start_timestamp = int(start_date.timestamp() * 1000)
one_day = datetime.timedelta(days=1)
finish_timestamp = int(finish_date …
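
The last line is truncated; one plausible completion of the timestamp arithmetic (the coinmarketcap API in this snippet works with millisecond timestamps) looks like this — whether the original subtracted `one_day` from the finish date is an assumption:

```python
import datetime

start_date = datetime.datetime.strptime('2016.01.01', '%Y.%m.%d')
finish_date = datetime.datetime.strptime('2017.01.01', '%Y.%m.%d')

one_day = datetime.timedelta(days=1)

# the API expects timestamps in milliseconds, not seconds
start_timestamp = int(start_date.timestamp() * 1000)
finish_timestamp = int((finish_date - one_day).timestamp() * 1000)

print(finish_timestamp - start_timestamp)  # 365 days expressed in milliseconds
```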

Python: How to scrape coinmarketcap.com (2) with requests, lxml

Example code to scrape it:

# date: 2019.05.09
# author: Bartłomiej 'furas' Burek
# https://stackoverflow.com/questions/56059703/how-can-i-make-lxml-save-two-pages-to-the-pages-so-it-can-be-read-by-the-tree

from lxml import html
import requests

data = {
    'BTC': 'id-bitcoin',
    'TRX': 'id-tron',
    # ...
    'HC': 'id-hypercash',
    'XZC': 'id-zcoin',
}

all_results = {}

for url in ('https://coinmarketcap.com/', 'https://coinmarketcap.com/2'):
    page = requests.get …
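
The loop body is cut off, so here is an offline sketch of the lxml part — looking up each coin's row by its `id` with XPath. The HTML and the `price` class are stand-ins for the real page markup.

```python
from lxml import html

# stand-in for one coinmarketcap page: every coin row has an id like 'id-bitcoin'
page_content = b'''
<table>
  <tr id="id-bitcoin"><td class="price">9000</td></tr>
  <tr id="id-tron"><td class="price">0.02</td></tr>
</table>
'''
tree = html.fromstring(page_content)

data = {'BTC': 'id-bitcoin', 'TRX': 'id-tron'}
all_results = {}

# look up every coin's row by id and read its price cell
for symbol, row_id in data.items():
    price = tree.xpath('//tr[@id="{}"]//td[@class="price"]/text()'.format(row_id))
    if price:
        all_results[symbol] = price[0]

print(all_results)  # {'BTC': '9000', 'TRX': '0.02'}
```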

Python: How to scrape coinmarketcap.com (3) with pandas

Example code to scrape it:

# author: https://blog.furas.pl
# date: 2020.07.25
# link: https://stackoverflow.com/questions/63075215/read-html-where-required-table-needs-users-input/

import pandas as pd

all_dfs = pd.read_html('https://coinmarketcap.com/exchanges/bitfinex/')

df = all_dfs[2]

df[ df['Pair'].str.endswith('USD') ]
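
The same `read_html` + `str.endswith` pattern can be shown offline with an inline table — the data below is made up, and index `[0]` replaces the `[2]` used for the live page, which has several tables:

```python
import io
import pandas as pd

# inline stand-in for the pairs table on the bitfinex page
html = '''
<table>
  <tr><th>Pair</th><th>Price</th></tr>
  <tr><td>BTC/USD</td><td>9000</td></tr>
  <tr><td>ETH/BTC</td><td>0.02</td></tr>
  <tr><td>ETH/USD</td><td>200</td></tr>
</table>
'''
df = pd.read_html(io.StringIO(html))[0]

# keep only rows whose pair ends with USD
usd_pairs = df[df['Pair'].str.endswith('USD')]
print(usd_pairs['Pair'].tolist())  # ['BTC/USD', 'ETH/USD']
```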

Python: How to scrape collegiate-ac.com with scrapy

Example code to scrape it:

#!/usr/bin/env python3

#
# https://stackoverflow.com/a/47729218/1832058
#

import scrapy

class CollegiateSpider(scrapy.Spider):

    name = 'Collegiate'

    allowed_domains = ['collegiate-ac.com']

    start_urls = ['https://collegiate-ac.com/uk-student-accommodation/']

    # Step 1 - Get the area links

    def parse(self, response):
        for url in response.xpath('//*[@id="top …

Python: How to scrape comics.panini.it with scrapy

Example code to scrape it:

#!/usr/bin/env python3

# date: 2019.08.06
# https://stackoverflow.com/questions/57366488/how-to-pass-the-single-link-in-a-nested-url-scrape

import scrapy

def clean(text):
    text = text.replace('\xa0', ' ')
    text = text.strip().split('\n')
    text = ' '.join(x.strip() for x in text)
    return text

class PaniniSpider(scrapy.Spider):

    name …
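
The `clean()` helper above is complete, so a quick demo shows what it does with messy scraped text (the sample string is made up):

```python
# same helper as in the spider above: normalizes whitespace in scraped text
def clean(text):
    text = text.replace('\xa0', ' ')        # non-breaking spaces -> normal spaces
    text = text.strip().split('\n')         # drop outer whitespace, split lines
    text = ' '.join(x.strip() for x in text)  # strip each line, join with spaces
    return text

raw = '  Spider-Man\xa0#1\n   Panini Comics  \n  2019  '
print(clean(raw))  # Spider-Man #1 Panini Comics 2019
```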

Python: How to scrape corporate.dow.com with selenium

Example code to scrape it:

#!/usr/bin/env python3 

# date: 2019.11.24
# https://stackoverflow.com/questions/59019810/python-web-scraping-ahref-link-and-articles-not-showing-up-in-source-code

import selenium.webdriver

url = 'https://corporate.dow.com/en-us/news.html'
driver = selenium.webdriver.Firefox()
driver.get(url)

all_items = driver.find_elements_by_xpath('//ul[@class="results__list"]/li')
for item in all_items …

Python: How to scrape coursetalk.com with scrapy

Example code to scrape it:

#!/usr/bin/env python3

#
# https://stackoverflow.com/a/48017689/1832058
#

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = ['https://www.coursetalk.com/subjects/data-science/courses']

    def parse(self, response):
        print('url:', response.url)

        for item in response.xpath('.//*[@class="as-table-cell"]/a/@href …

Python: How to scrape craigslist.org with requests

Example code to scrape it:

#!/usr/bin/env python3

#
# https://stackoverflow.com/a/47720827/1832058
# 

import requests
from bs4 import BeautifulSoup
import csv

filename = "output.csv"

f = open(filename, 'w', newline="", encoding='utf-8')

csvwriter = csv.writer(f)

csvwriter.writerow( ["Date", "Location", "Title", "Price"] )

offset = 0

while True:
    print …
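
The CSV part of the snippet can be shown without the network, writing to an in-memory buffer instead of `output.csv` — the rows below are made up, only to show the same `csv.writer` pattern:

```python
import csv
import io

# made-up listing rows, in place of data scraped from craigslist
rows = [
    ['Dec 1', 'Brooklyn', 'Bike for sale', '$75'],
    ['Dec 2', 'Queens', 'Old sofa', '$20'],
]

buf = io.StringIO()
csvwriter = csv.writer(buf)
csvwriter.writerow(['Date', 'Location', 'Title', 'Price'])  # header first
for row in rows:
    csvwriter.writerow(row)

lines = buf.getvalue().splitlines()
print(lines[0])  # Date,Location,Title,Price
```

With a real file you open it with `newline=""` and `encoding='utf-8'`, exactly as the snippet above does.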

Python: How to scrape curecity.in with selenium

Example code to scrape it:

#!/usr/bin/env python3 

# date: 2019.12.18
# https://stackoverflow.com/questions/59386434/selenium-webdriver-i-want-to-click-on-the-next-page-till-last-page/59387563#59387563

from selenium import webdriver
#from bs4 import BeautifulSoup as bs
import time

url = 'https://curecity.in/vendor-list.php?category=Doctor&filters_location=Jaipur&filters%5Bsubareas_global%5D=&filters_speciality='

#driver …

Python: How to scrape data.gov with requests

Example code to scrape it:

#
# https://api.data.gov/
# https://regulationsgov.github.io/developers/basics/
#
# https://stackoverflow.com/a/48030949/1832058
#

import requests
import json
import time

all_titles = ['EPA-HQ-OAR-2013-0602']

api_key = 'PB36zotwgisM02kED1vWwvf7BklqCObDGVoyssVE'
api_base='https://api.data.gov/regulations/v3/'

api_url = '{}docket.json?api_key={}&docketId='.format(api_base, api_key)

try:
    for …

Python: How to scrape deezer.com with requests

Example code to scrape it:

import requests
from bs4 import BeautifulSoup
import json

base_url = 'https://www.deezer.com/en/profile/1589856782/loved'

r = requests.get(base_url)

soup = BeautifulSoup(r.text, 'html.parser')

all_scripts = soup.find_all('script')

data = json.loads(all_scripts[6].get_text()[27:])

print('key:', data.keys())
print …
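
The trick in this snippet is that the page embeds its data as JavaScript inside a `<script>` tag, and the slice `[27:]` cuts off the assignment prefix before `json.loads`. A stand-in page shows the same idea; the variable name in the script below is made up:

```python
import json
from bs4 import BeautifulSoup

# stand-in page: a JSON object assigned to a JS variable inside <script>
html = '''
<html><body>
<script>window.__DZR_APP_STATE__ = {"TAB": {"loved": {"data": []}}}</script>
</body></html>
'''
soup = BeautifulSoup(html, 'html.parser')

all_scripts = soup.find_all('script')
text = all_scripts[0].get_text()

# cut off the JavaScript prefix and parse the rest as JSON
prefix = 'window.__DZR_APP_STATE__ = '
data = json.loads(text[len(prefix):])

print(list(data.keys()))  # ['TAB']
```

On the real page you have to find which `<script>` holds the data and how long the prefix is — that is where `all_scripts[6]` and `[27:]` come from.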

Python: How to scrape doctor.webmd.com with scrapy

Example code to scrape it:

#!/usr/bin/env python3

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

    #allowed_domains = ['link']
    start_urls = ['https://doctor.webmd.com/find-a-doctor/specialty/psychiatry/arizona/phoenix?pagenumber=1']

    def parse(self, response):

        doctors_urls = response.xpath('//*[@class="doctorName"]//@href').extract()

        for doctor in doctors_urls:
            doctor = response …
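
The hrefs extracted above are relative, so the spider has to turn them into absolute URLs — in Scrapy that is `response.urljoin()`, which behaves like the standard library's `urljoin`. The href below is hypothetical:

```python
from urllib.parse import urljoin

page_url = 'https://doctor.webmd.com/find-a-doctor/specialty/psychiatry/arizona/phoenix?pagenumber=1'
href = '/doctor/john-smith-md'  # hypothetical relative link from the page

# resolve the relative href against the page URL
absolute = urljoin(page_url, href)
print(absolute)  # https://doctor.webmd.com/doctor/john-smith-md
```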

Python: How to scrape dps.psx.com.pk with selenium

Example code to scrape it:

#!/usr/bin/env python3 

# date: 2019.11.23
# https://stackoverflow.com/questions/59008770/want-to-read-a-tag-data-using-selenium

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://dps.psx.com.pk/')

last_table = driver.find_elements_by_xpath("//table")[-1]

for row in last_table.find_elements_by_xpath(".//tr")[1:]:
    print(row.find_element_by_xpath …

Python: How to scrape drugbank.ca with requests

Example code to scrape it:

#
# https://stackoverflow.com/a/47716786/1832058
#
# https://stackoverflow.com/a/48116666/1832058
#

import requests
from bs4 import BeautifulSoup

def get_details(url):
    print('details:', url)

    # get subpage
    r = requests.get(url)
    soup = BeautifulSoup(r.text ,"lxml")

    # get data on subpage
    dts = soup.findAll('dt …
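
The snippet stops at `findAll('dt')`; drug pages lay details out as `<dt>`/`<dd>` pairs, so a common next step is zipping labels with values. The HTML below is a made-up stand-in:

```python
from bs4 import BeautifulSoup

# stand-in for a drug page: details laid out as <dt>/<dd> pairs
html = '''
<dl>
  <dt>Name</dt><dd>Aspirin</dd>
  <dt>Type</dt><dd>Small Molecule</dd>
</dl>
'''
soup = BeautifulSoup(html, 'html.parser')

# pair every <dt> label with the <dd> value that follows it
dts = soup.find_all('dt')
dds = soup.find_all('dd')
details = {dt.get_text(): dd.get_text() for dt, dd in zip(dts, dds)}

print(details['Name'])  # Aspirin
```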

Python: How to scrape drugeye.pharorg.com with requests

Example code to scrape it:

# date: 2019.09.09
# link: https://stackoverflow.com/questions/57856461/python-run-search-function-on-net-web-page

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

r = requests.get('http://www.drugeye.pharorg.com/', headers=headers)
soup = BeautifulSoup(r.text,'lxml')

payload = {
    'ttt': 'asd',
    'b1': 'wait...',
    'Passgenericname …

Python: How to scrape e-turysta.pl with requests, BS

Example code to scrape it:

#!/usr/bin/env python3

# date: 2020.01.10
# https://stackoverflow.com/questions/59674049/multiple-pages-web-scraping-with-python-and-beautiful-soup/

import requests
from bs4 import BeautifulSoup # HTML data structure
import pandas as pd

def get_page_data(number):
    print('number:', number)

    url = 'https://e-turysta.pl/noclegi-krakow/?page={}'.format(number)
    response = requests …
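
The full version of `get_page_data()` loops over page numbers and collects all rows into one DataFrame; the same flow can be sketched offline with made-up "pages" (the `name` class and the hotel names are invented):

```python
import pandas as pd
from bs4 import BeautifulSoup

# offline stand-in for the paged results: one fake HTML "page" per number
pages = {
    1: '<div class="name">Hotel A</div><div class="name">Hotel B</div>',
    2: '<div class="name">Hotel C</div>',
}

def get_page_data(number):
    # parse one page and return its rows as dicts
    soup = BeautifulSoup(pages[number], 'html.parser')
    return [{'page': number, 'name': div.get_text()}
            for div in soup.find_all('div', class_='name')]

rows = []
for number in (1, 2):
    rows.extend(get_page_data(number))

df = pd.DataFrame(rows)
print(len(df))  # 3
```

With the live site the loop formats the page number into the URL and requests each page, as the truncated snippet above starts to do.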

Python: How to scrape ec.europa.eu with requests, BS

Example code to scrape it:

#!/usr/bin/env python3

# date: 2020.01.10
# https://stackoverflow.com/questions/59674921/how-can-i-scrape-image-url-from-this-website/

import requests
from bs4 import BeautifulSoup as BS

s = requests.Session()

url = 'https://ec.europa.eu/taxation_customs/dds2/ebti/ebti_consultation.jsp?Lang=en&Lang=en&refcountry=&reference=&valstartdate=&valstartdateto …
