Articles for tag: scrapy


Python: How to scrape aastocks.com with requests

Here is example code to scrape it:

# date: 2019.09.16
# https://stackoverflow.com/questions/57861715/scrapy-infinite-scrolling-no-pagination-indication
# http://www.aastocks.com
import requests

newstime = '934735827'
newsid = 'HKEX-EPS-20190815-003587368'

url = 'http://www.aastocks.com/tc/resources/datafeed/getmorenews.ashx?cat=all&newstime={}&newsid={}&period=0&key=&symbol=00001'
url_article = "http://www.aastocks.com/tc/stocks/analysis/stock-aafn-con/00001/{}/all"

for x in range(3):

    print('---', x, '----')
    print('data:', url.format(newstime, newsid))

    r = requests.get(url.format(newstime, newsid))
    data = r.json()

    #for item in data[:3]: # test only a few links
    for item in data[:-1]: # skip the last item, which only points to the next page
        r = requests.get(url_article.format(item['id']))
        print('news:', r.status_code, url_article.format(item['id']))

    # get data for next page
    newstime = data[-1]['dtd']
    newsid = data[-1]['id']
    print('next page:', newstime, newsid)
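
The loop above treats the last record of every response as a pointer to the next page. A minimal sketch of that step as a separate function might look like the code below. The name `split_page` and the sample records are assumptions made for illustration; only the `id` and `dtd` keys come from the code above.

```python
# Sketch only: split_page() is a hypothetical helper, and the sample
# records are made up to mirror the 'id' and 'dtd' keys used above.

def split_page(data):
    """Return (items, cursor) for one page of the news feed.

    The last record is assumed to be the pointer to the next page,
    so it is separated from the real news items.
    """
    if not data:
        return [], None
    *items, last = data
    cursor = {'newstime': last['dtd'], 'newsid': last['id']}
    return items, cursor


sample = [
    {'id': 'NEWS-1', 'dtd': '100'},
    {'id': 'NEWS-2', 'dtd': '90'},
    {'id': 'NEXT-PAGE', 'dtd': '80'},
]

items, cursor = split_page(sample)
print([item['id'] for item in items])
print(cursor)
```

Returning `None` as the cursor for an empty response gives the calling loop a simple stop condition instead of an `IndexError` on `data[-1]`.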

Python: How to scrape allegro.pl with scrapy

Here is example code to scrape it:

# date: 2017.12.10
# https://stackoverflow.com/a/47744135/1832058

import scrapy

#from allegro.items import AllegroItem

#class AllegroItem(scrapy.Item):
#    product_name = scrapy.Field()
#    product_sale_price = scrapy.Field()
#    product_seller = scrapy.Field()

class AllegroPrices(scrapy.Spider):

    name = "AllegroPrices"
    allowed_domains = ["allegro.pl"]

    start_urls = [
        "http://allegro.pl …

read more

Python: How to scrape alloschool.com with scrapy

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2019.07.29
# https://stackoverflow.com/questions/57245315/using-scrapy-how-to-download-pdf-files-from-some-extracted-links

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = [
          'https://www.alloschool.com/course/alriadhiat-alaol-ibtdaii',
    ]

    def parse(self, response):

        for link in response.css('.default .er').xpath('@href').extract …

read more

Python: How to scrape amazon.com (1) with requests, lxml

Here is example code to scrape it:

import requests
from lxml import html
import json

# date: 2017.12.22
# https://stackoverflow.com/a/47935432/1832058

url = "http://www.amazon.com/dp/B008HDREZ6"

headers = {
  'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311 …

read more

Python: How to scrape amazon.com (2) with selenium

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.03.30

import selenium.webdriver
from selenium.webdriver.common.by import By

url = 'https://www.amazon.com/international-sales-offers/b/?ie=UTF8&node=15529609011&ref_=nav_cs_gb_intl'

driver = selenium.webdriver.Firefox()
driver.get(url)

for x in range(10):
    # find_element_by_id() was removed in Selenium 4; use find_element(By.ID, ...)
    deal = driver.find_element(By.ID, '100_dealView_' + str(x))

    image …

read more

Python: How to scrape aopa.org with selenium

Here is example code to scrape it:

# https://stackoverflow.com/questions/60601053/python-selenium-for-loop-iterates-through-entire-website/60601428

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()

#wait = WebDriverWait(driver, 10)

driver.get("https://www.aopa.org/destinations/airports/state/AL")
time.sleep(3)

airport_list = []
paved_runway = []

# find_elements_by_xpath() was removed in Selenium 4; use find_elements(By.XPATH, ...)
airport_row = driver.find_elements(By.XPATH, '//div[@class = "state-airports__airport"]')
print(len …

read more

Python: How to scrape api.weatherflow.com with requests

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.02.10
# 

import requests

url = 'https://api.weatherflow.com/wxengine/rest/model/getModelDataBySpot?model_id=-1&spot_id=110&units_wind=mph&units_temp=F&format=json&wf_apikey=84e778ae-fe8e-4b8f-8d33-6bc88967a2b1&wf_token=f147702351af100d7c220b633d085318&v=1.1'
r = requests.get(url)
data = r.json …

read more

Python: How to scrape apps.upenn.edu with selenium

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.02.26
# 

import selenium.webdriver
from selenium.webdriver.common.by import By

def scrape(last_name, first_name):
    url = 'https://directory.apps.upenn.edu/directory/jsp/fast.do'

    driver = selenium.webdriver.Firefox()
    driver.get(url)

    # find_elements_by_tag_name() was removed in Selenium 4
    inputs = driver.find_elements(By.TAG_NAME, 'input')

    #for item in inputs:
    #    print(item.get_attribute …

read more

Python: How to scrape associatedrealtorsaruba.com with requests

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.01.07
# https://stackoverflow.com/questions/59632031/how-to-extract-href-when-href-element-is-a-hyperlink?noredirect=1#comment105434826_59632031

import requests
from bs4 import BeautifulSoup as BS

url = 'https://associatedrealtorsaruba.com/index.php?option=com_ezrealty&Itemid=11&task=results&cnid=0&custom7=&custom8=&parking=&type …

read more

Python: How to scrape associatedrealtorsaruba.com with selenium

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.01.07
# https://stackoverflow.com/questions/59632031/how-to-extract-href-when-href-element-is-a-hyperlink?noredirect=1#comment105434826_59632031

import selenium.webdriver

url = 'https://associatedrealtorsaruba.com/index.php?option=com_ezrealty&Itemid=11&task=results&cnid=0&custom7=&custom8=&parking=&type=0&cid=0&stid=0 …

read more

Python: How to scrape ausrealtimefueltype.global-roam.com with requests

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.01.17
# https://stackoverflow.com/questions/59779978/python-requests-output-is-different-to-expected-output/

import requests

headers = {'User-Agent': 'Mozilla/5.0'}

url = 'https://ausrealtimefueltype.global-roam.com/api/SeriesSnapshot?time='

r = requests.get(url,  headers=headers)
data = r.json()

for item in data['seriesCollection …

read more

Python: How to scrape automationpractice.com with selenium

Here is example code to scrape it:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, TimeoutException
import time

try …

read more

Python: How to scrape avanza.se with bank with requests

Here is example code to scrape it:

import requests
import webbrowser  # needed by display() below
from bs4 import BeautifulSoup

def display(content, filename='output.html'):
    with open(filename, 'w') as f:
        f.write(content)
    webbrowser.open(filename)

session = requests.Session()
session.headers.update({'USER-AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 …

read more

Python: How to scrape bankier.pl with requests

Here is example code to scrape it:

import requests
import datetime
import time

# https://www.bankier.pl/inwestowanie/profile/quote.html?symbol=CDPROJEKT

def one_day(symbol):

    print('Symbol:', symbol)

    # one day
    url = f'https://www.bankier.pl/new-charts/get-data\
?symbol={symbol}\
&intraday=true\
&today=true\
&type=area\
&init=true'

    r …

read more
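
The bankier.pl snippet above builds its query string by hand with backslash line continuations inside an f-string. As an alternative sketch, the same URL can be assembled with `urllib.parse.urlencode`. The function name `one_day_url` and the constant `BASE_URL` are assumptions made for illustration; the parameter names and values are the ones visible in the snippet.

```python
from urllib.parse import urlencode

# Endpoint taken from the snippet above.
BASE_URL = 'https://www.bankier.pl/new-charts/get-data'


def one_day_url(symbol):
    """Build the intraday chart URL for one symbol (hypothetical helper)."""
    params = {
        'symbol': symbol,
        'intraday': 'true',
        'today': 'true',
        'type': 'area',
        'init': 'true',
    }
    # urlencode() joins the pairs with '&' and escapes special characters.
    return BASE_URL + '?' + urlencode(params)


print(one_day_url('CDPROJEKT'))
# https://www.bankier.pl/new-charts/get-data?symbol=CDPROJEKT&intraday=true&today=true&type=area&init=true
```

With `requests`, the same dictionary could instead be passed as `params=` to `requests.get()`, which encodes it in the same way.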

Python: How to scrape basketball-reference.com with requests, BS

Here is example code to scrape it:

# date: 2019.04.28
# author: Bartłomiej 'furas' Burek
# https://stackoverflow.com/a/55885909/1832058

import requests
from bs4 import BeautifulSoup
from bs4 import Comment

url = 'https://www.basketball-reference.com/players/b/bogutan01.html#advanced::none'

r = requests.get(url)

soup = BeautifulSoup(r.content …

read more

Python: How to scrape bcdental.org with requests with ASP.net

Here is example code to scrape it:

#
# https://stackoverflow.com/a/48075115/1832058
# 

import requests
from bs4 import BeautifulSoup

url = 'https://www.bcdental.org/yourdentalhealth/findadentist.aspx'

# --- session ---

s = requests.Session() # to automatically copy cookies
#s.headers.update({'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko …

read more

Python: How to scrape bing.com with requests, BS

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.01.07
# ???

from bs4 import BeautifulSoup
import requests
#import webbrowser

#s = requests.Session()

#headers = {
#    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0'
#}

#response = s.get("https://www.bing.com", headers …

read more

Python: How to scrape bit.do with requests

Here is example code to scrape it:

# date: 2019.04.21
# https://stackoverflow.com/a/55778640/1832058

import requests

# a Session is not strictly needed here
s = requests.Session()
s.headers.update({
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'pl,en-US;q=0 …

read more

Python: How to scrape blockchain.info with requests

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.05.18
# https://stackoverflow.com/questions/61858764/is-there-an-easy-way-to-access-all-transactions-recorded-in-a-bitcoin-block-with/
# 
# https://www.blockchain.com/api/blockchain_api

import requests

r = requests.get('https://blockchain.info/block-height/100?format=json')
data = r.json()

#print(r.text)
#print(data)
print(data['blocks …

read more

Python: How to scrape blog.prepscholar.com with urllib, BS, pandas

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.02.26
# https://stackoverflow.com/questions/60407196/creating-csv-spreadsheets-from-web-tables-acquired-through-beautifulsoup

# with pandas 

import pandas as pd

all_tables = pd.read_html('https://blog.prepscholar.com/act-to-sat-conversion')
all_tables[0].to_csv("output1.csv")
all_tables[1].to_csv("output2.csv") 

# with BeautifulSoup it would need …

read more
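
The pandas version above turns each table on the page into a CSV file in one call. The code below is not the truncated BeautifulSoup variant; it is a rough standard-library sketch of the same idea (HTML table to CSV text) using `html.parser` and `csv`. The class `TableParser`, the function `table_to_csv`, and the tiny sample table are all assumptions made for illustration, and nested tables are not handled.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> as a list of rows (lists of cell text)."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None    # row currently being read, or None
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.tables.append([])
        elif tag == 'tr' and self.tables:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_to_csv(rows):
    """Return the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    return buffer.getvalue()


# Made-up sample markup; a real page would be downloaded first.
html = '''
<table>
  <tr><th>A</th><th>B</th></tr>
  <tr><td>1</td><td>2</td></tr>
</table>
'''

parser = TableParser()
parser.feed(html)
csv_text = table_to_csv(parser.tables[0])
print(csv_text)
```

For real pages, `pd.read_html()` as used above is usually the shorter route; the sketch only shows what the table-to-rows step involves.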
