Python: How to scrape howmanysyllables.com with selenium

Here is example code to scrape it:

# author: https://blog.furas.pl
# date: 2020.07.08
# 

from selenium import webdriver

url = 'https://www.howmanysyllables.com/syllable_counter/'

# open browser
driver = webdriver.Firefox()

# load page
driver.get(url)

# find field 
item = driver.find_element_by_id('syl_input')

# put text
item.send_keys('Hello World')

# find button …
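
The snippet is cut off at the button step. Below is a minimal sketch of how it might continue; only the syl_input id comes from the excerpt, the button and result selectors are my guesses, and it uses the Selenium 4 find_element(By...) API because the old find_element_by_* methods were removed in Selenium 4.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Firefox()
driver.get('https://www.howmanysyllables.com/syllable_counter/')

# find field and put text (same steps as above, Selenium 4 style)
field = driver.find_element(By.ID, 'syl_input')
field.send_keys('Hello World')

# find button - hypothetical selector, check the real page with DevTools
button = driver.find_element(By.CSS_SELECTOR, 'input[type="submit"]')
button.click()

time.sleep(2)  # crude wait for the page to update the count

# hypothetical id for the element with the result
result = driver.find_element(By.ID, 'syl_count')
print(result.text)

driver.quit()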

Python: How to scrape ikea.com

Here is example code to scrape it:

#
# https://stackoverflow.com/a/47741611/1832058
#

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

    #allowed_domains = ['http://www.ikea.com']

    start_urls = ['http://www.ikea.com/ae/en/catalog/categories/departments/childrens_ikea/31772/']

    def parse(self, response):
        print('url:', response.url)

        all_products = response.css('div …
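
The selector is cut off. Below is a minimal runnable sketch of the full spider; the CSS class names for products are assumptions, and CrawlerProcess is added so it runs without a Scrapy project.

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = ['http://www.ikea.com/ae/en/catalog/categories/departments/childrens_ikea/31772/']

    def parse(self, response):
        print('url:', response.url)

        # hypothetical selector for product boxes - check the page with DevTools
        for product in response.css('div.product'):
            yield {
                'name': product.css('span.name::text').get(),    # assumed class
                'price': product.css('span.price::text').get(),  # assumed class
            }

# run without creating a Scrapy project
process = CrawlerProcess({'USER_AGENT': 'Mozilla/5.0'})
process.crawl(MySpider)
process.start()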

Python: How to scrape ikea.com with requests

Here is example code to scrape it:

# date: 2019.04.07
# https://stackoverflow.com/questions/55541971/image-src-text-scrap-and-tablescrap-from-a-webpage-using-beautifulsoup/55542309?noredirect=1#comment97819263_55542309

#------------------------------------------------------------------------------

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.ikea.com/sa/en/catalog/products/00361049/")

soup = BeautifulSoup(r.text, "html.parser")

html = soup.select('div#productDimensionsContainer …
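
Only the productDimensionsContainer id survives the cut, so here is a minimal sketch of how the selection might continue; printing the container's text is my assumption.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.ikea.com/sa/en/catalog/products/00361049/")
soup = BeautifulSoup(r.text, "html.parser")

# select_one returns the first match or None
container = soup.select_one('div#productDimensionsContainer')
if container:
    # print every line of text inside the dimensions container
    print(container.get_text(separator='\n', strip=True))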

Python: How to scrape indeed.com

Here is example code to scrape it:

#!/usr/bin/env python3

# 
# https://stackoverflow.com/a/48031565/1832058
# 

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = ['https://www.indeed.cl/trabajo?q=Data%20scientist&l=']

    def parse(self, response):
        print('url:', response.url)

        results = response.xpath('//h2[@class="jobtitle …
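
The XPath is cut off after the jobtitle class. Below is a minimal runnable sketch; matching the class with contains() is an assumption (Indeed's class names change often), and CrawlerProcess makes it standalone.

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = ['https://www.indeed.cl/trabajo?q=Data%20scientist&l=']

    def parse(self, response):
        print('url:', response.url)

        # match the class loosely because Indeed adds extra classes
        for title in response.xpath('//h2[contains(@class, "jobtitle")]//a/@title').getall():
            yield {'title': title}

process = CrawlerProcess({'USER_AGENT': 'Mozilla/5.0'})
process.crawl(MySpider)
process.start()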

Python: How to scrape inshorts.com

Here is example code to scrape it:

#!/usr/bin/env python3 

# date: 2019.11.29
# https://stackoverflow.com/questions/59109679/how-to-scrap-1000-news-from-https-inshorts-com-en-read-data-using-beautiful-so

import requests
from bs4 import BeautifulSoup as BS

# --- first page ---

r = requests.get('https://inshorts.com/en/read')
soup = BS(r.text, 'html.parser')

for item in soup.find_all …
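
The find_all() call is cut off. Below is a minimal sketch assuming the news-card markup inshorts.com used at the time; the class and itemprop names are not guaranteed to still match.

import requests
from bs4 import BeautifulSoup as BS

r = requests.get('https://inshorts.com/en/read')
soup = BS(r.text, 'html.parser')

# assumed markup: every story sits in <div class="news-card">
for card in soup.find_all('div', class_='news-card'):
    headline = card.find('span', itemprop='headline')   # assumed attribute
    body = card.find('div', itemprop='articleBody')     # assumed attribute
    if headline and body:
        print(headline.get_text(strip=True))
        print(body.get_text(strip=True))
        print('---')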

Python: How to scrape investing.com with requests, BS

Here is example code to scrape it:

# date: 2020.09.11
# author: Bartłomiej "furas" Burek (https://blog.furas.pl)
# https://stackoverflow.com/questions/63840415/how-to-scrape-website-tables-where-the-value-can-be-different-as-we-chose-but-th

import requests
from bs4 import BeautifulSoup
import csv

url = 'https://id.investing.com/instruments/HistoricalDataAjax'

payload = {
    "curr_id": "8830",
    "smlID": "300004",
    "header": "Data+Historis+Emas+Berjangka …

Python: How to scrape ishares.com

Here is example code to scrape it:

# date: 2019.05.10
# https://stackoverflow.com/questions/56070434/my-code-wrongfully-downloads-a-csv-file-from-an-url-with-python/56071844#56071844

import requests
from bs4 import BeautifulSoup

s = requests.Session()

url='https://www.ishares.com/uk/individual/en/products/251567/ishares-asia-pacific-dividend-ucits-etf?switchLocale=y&siteEntryPassthrough=true'

response = s.get(url, allow_redirects=True)

if …
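
The snippet stops at the if. The point of the Session here is that the siteEntryPassthrough URL sets cookies which later requests need; the status check and the search for CSV links below are my assumptions about the continuation.

import requests
from bs4 import BeautifulSoup

s = requests.Session()

url = 'https://www.ishares.com/uk/individual/en/products/251567/ishares-asia-pacific-dividend-ucits-etf?switchLocale=y&siteEntryPassthrough=true'

response = s.get(url, allow_redirects=True)

if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    # hypothetical: list links that look like downloadable CSV files;
    # the session `s` already carries the cookies needed to fetch them
    for a in soup.find_all('a', href=True):
        if '.csv' in a['href'] or 'fileType=csv' in a['href']:
            print(a['href'])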

Python: How to scrape kbb.com

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.06.05
# https://stackoverflow.com/questions/62211750/how-to-extract-text-from-svg-using-python-selenium/

# the page needs a US IP address to display
# I used the free VPN https://windscribe.com/?affid=kez9ypcg with its client installed on Linux Mint

from selenium import webdriver
import …
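
The imports are cut off. The linked question is about reading text out of an SVG, and the standard trick is sketched below: plain tag names do not match SVG elements in XPath, so you step into the SVG namespace with name(). The kbb.com URL and the use of textContent are placeholders/assumptions.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Firefox()
driver.get('https://www.kbb.com/')  # placeholder URL; the page needs a US IP

time.sleep(5)  # crude wait for the chart to render

# <text> inside <svg> lives in a different XML namespace, so match by name()
texts = driver.find_elements(By.XPATH, '//*[name()="svg"]//*[name()="text"]')
for t in texts:
    # textContent is more reliable than .text for SVG nodes
    print(t.get_attribute('textContent'))

driver.quit()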

Python: How to scrape koinex.in with selenium and requests

Here is example code to scrape it:

from selenium import webdriver
import time

# --- Selenium ---

url = 'https://koinex.in/'

driver = webdriver.Firefox()
driver.get(url)

time.sleep(8)

#tables = driver.find_elements_by_tag_name('table')
#for item in tables:
#    print(item.text)

# --- convert cookies/headers from Selenium to Requests ---

cookies = driver.get_cookies()

for item …
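
The loop is cut off. The pattern here is well known: copy every cookie from the Selenium session into a requests.Session, then continue with plain requests. A minimal sketch (the follow-up URL is a placeholder):

import requests
from selenium import webdriver
import time

driver = webdriver.Firefox()
driver.get('https://koinex.in/')
time.sleep(8)

s = requests.Session()

# copy every cookie from the browser into the requests session
for item in driver.get_cookies():
    s.cookies.set(item['name'], item['value'])

driver.quit()

# later requests now carry the browser's cookies
r = s.get('https://koinex.in/')  # placeholder follow-up request
print(r.status_code)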

Python: How to scrape lastsecond.ir

Here is example code to scrape it:

#!/usr/bin/env python3

#
# https://stackoverflow.com/a/47956427/1832058
# 

import scrapy
#from scrapy.commands.view import open_in_browser
import json

class MySpider(scrapy.Spider):

    name = 'myspider'

    #allowed_domains = []

    start_urls = ['https://lastsecond.ir/hotels']

    #def start_requests(self):
    #    self.url_template = http://quotes.toscrape.com/tag/{}/page …
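
The excerpt ends inside commented-out code. Judging by the import json at the top, the spider probably digs the hotel data out of JSON embedded in the page; the sketch below shows that pattern, but the script selector and the JSON cleanup are pure assumptions.

import scrapy
import json

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = ['https://lastsecond.ir/hotels']

    def parse(self, response):
        print('url:', response.url)

        # hypothetical: find the <script> that carries the data as JSON
        raw = response.xpath('//script[contains(., "hotels")]/text()').get()
        if raw:
            # hypothetical cleanup: cut out the JSON object
            data = json.loads(raw[raw.find('{'):raw.rfind('}') + 1])
            yield data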

Python: How to scrape legit.ng with requests

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.01.09
# 

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.legit.ng/1087216-igbo-proverbs-meaning.html')
soup = BeautifulSoup(res.content, 'html.parser')

data = []
for div in soup.find_all('div'):
    for block in div.find_all('blockquote'):
        text …
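
The snippet is cut off inside the loop. A minimal sketch of the continuation; note that searching blockquotes directly avoids the duplicates you would get from iterating nested divs first.

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.legit.ng/1087216-igbo-proverbs-meaning.html')
soup = BeautifulSoup(res.content, 'html.parser')

data = []
# search blockquotes directly - nested divs would yield each quote many times
for block in soup.find_all('blockquote'):
    text = block.get_text(strip=True)
    if text:
        data.append(text)

for text in data:
    print(text)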

Python: How to scrape lequipe.fr

Here is example code to scrape it:

#!/usr/bin/env python3

#
# https://stackoverflow.com/a/47761077/1832058
#

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

    allowed_domains = ['www.lequipe.fr']

    start_urls = ['http://www.lequipe.fr/Basket/RES_NBA.html']

    def parse(self, response):
        print('url:', response.url)

        for item in response.xpath …
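
The XPath is cut off. Below is a minimal runnable sketch; the selector for result rows is an assumption, and CrawlerProcess makes it standalone.

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):

    name = 'myspider'

    allowed_domains = ['www.lequipe.fr']

    start_urls = ['http://www.lequipe.fr/Basket/RES_NBA.html']

    def parse(self, response):
        print('url:', response.url)

        # hypothetical selector for one result row - check the page with DevTools
        for item in response.xpath('//div[contains(@class, "match")]'):
            yield {'teams': item.xpath('.//a/text()').getall()}

process = CrawlerProcess({'USER_AGENT': 'Mozilla/5.0'})
process.crawl(MySpider)
process.start()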

Python: How to scrape letterboxd.com with requests

Here is example code to scrape it:

#
# https://stackoverflow.com/a/47733374/1832058
#

import requests
from bs4 import BeautifulSoup

url = 'https://letterboxd.com/shesnicky/list/top-50-favourite-films/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

all_items = soup.find_all('div', {'data-target-link': True})

for item in all_items:
    print(item['data-target-link'])
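
A short follow-up: assuming the data-target-link values are relative paths, urljoin turns them into absolute URLs.

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'https://letterboxd.com/shesnicky/list/top-50-favourite-films/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for item in soup.find_all('div', {'data-target-link': True}):
    # urljoin leaves absolute links untouched and resolves relative ones
    print(urljoin(url, item['data-target-link']))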

Python: How to scrape lifestorage.com

Here is example code to scrape it:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import csv

urls = [
    'https://www.lifestorage.com/storage-units/florida/orlando/32810/610-near-lockhart/?size=5x5'
]

filename = 'life_storage.csv'

f = open(filename, 'a+', newline='')  # newline='' prevents blank rows in the CSV on Windows
csv_writer = csv.writer(f)

headers = ['unit_size', 'unit_type', 'description', 'online_price …
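
The headers list is cut off after online_price. Below is a minimal runnable sketch of the whole flow; the selectors for the unit fields are assumptions, and the file is handled with a with block so it gets closed properly.

from urllib.request import urlopen
from bs4 import BeautifulSoup
import csv

urls = [
    'https://www.lifestorage.com/storage-units/florida/orlando/32810/610-near-lockhart/?size=5x5'
]

headers = ['unit_size', 'unit_type', 'description', 'online_price']

with open('life_storage.csv', 'a+', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(headers)

    for url in urls:
        soup = BeautifulSoup(urlopen(url).read(), 'html.parser')

        # hypothetical selector for one unit listing
        for unit in soup.select('li.unit'):
            cells = [
                unit.select_one('.size'),         # assumed class
                unit.select_one('.type'),         # assumed class
                unit.select_one('.description'),  # assumed class
                unit.select_one('.price'),        # assumed class
            ]
            csv_writer.writerow([c.get_text(strip=True) if c else '' for c in cells])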

Python: How to scrape listado.mercadolibre.com.pe

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.04.23
# https://stackoverflow.com/questions/61376200/i-dont-get-all-the-product-description-data-with-scrapy/61377436#61377436

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
#from mercadolibre.items import MercadolibreItem

class MercadolibreperuSpider(CrawlSpider):
    name = 'mercadolibreperu'
    allowed_domains = ['mercadolibre.com.pe …
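
The excerpt stops at allowed_domains. Below is a minimal sketch of a complete CrawlSpider; the link-extractor pattern, the listing URL and the item fields are assumptions. Note the callback is parse_item, because CrawlSpider reserves parse for its own logic.

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class MercadolibreperuSpider(CrawlSpider):
    name = 'mercadolibreperu'
    allowed_domains = ['mercadolibre.com.pe']
    start_urls = ['https://listado.mercadolibre.com.pe/celulares']  # placeholder listing

    rules = (
        # hypothetical pattern: follow product pages and parse each one
        Rule(LinkExtractor(allow=r'/MPE-'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {
            'url': response.url,
            'title': response.css('h1::text').get(),  # assumed selector
        }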

Python: How to scrape livescore.in

Here is example code to scrape it:

# date: 2020.03.01
# https://stackoverflow.com/questions/60477459/how-to-scrape-table-from-livescore-in-using-python

import requests
from bs4 import BeautifulSoup as BS

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:75.0) Gecko/20100101 Firefox/75.0',
#    'Accept': '*/*',
#    'Accept-Language': '*',
#    'Accept-Encoding': 'gzip, deflate, br',
#    'X-Referer': 'https://www …
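
The headers dict is cut off. Keep in mind that livescore.in builds its tables with JavaScript, so the HTML from a plain request may not contain them; the data usually comes from a separate endpoint you have to find in DevTools. The sketch below only shows the request-and-parse pattern, and the URL is a placeholder.

import requests
from bs4 import BeautifulSoup as BS

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:75.0) Gecko/20100101 Firefox/75.0',
}

# placeholder URL - the real data endpoint has to be copied from the Network tab
r = requests.get('https://www.livescore.in/', headers=headers)
soup = BS(r.text, 'html.parser')

for row in soup.select('table tr'):
    cells = [cell.get_text(strip=True) for cell in row.find_all(['td', 'th'])]
    if cells:
        print(cells)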

Python: How to scrape londonstockexchange.com with requests, BS

Here is example code to scrape it:

# author: Bartłomiej "furas" Burek (https://blog.furas.pl)
# date: 2020.09.08
# https://stackoverflow.com/questions/63785398/web-scraping-using-python-scrapy-or-beautiful-soup-fails-to-extract-data-from-ls

import requests
from bs4 import BeautifulSoup
import json   # only to display with indents (pretty print)

url = 'https://www.londonstockexchange.com/stock/GSK/glaxosmithkline-plc/fundamentals?lang …
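
The URL is cut off at the lang parameter. The import json comment suggests the page embeds its data as JSON inside a <script> tag; the sketch below shows that pattern, but the tag id and the entity cleanup are assumptions to verify against the live page.

import requests
from bs4 import BeautifulSoup
import json

url = 'https://www.londonstockexchange.com/stock/GSK/glaxosmithkline-plc/fundamentals'

r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(r.text, 'html.parser')

script = soup.find('script', {'id': 'ng-lseg-state'})  # assumed tag id
if script:
    raw = script.get_text().replace('&q;', '"')  # assumed entity cleanup
    data = json.loads(raw)
    print(json.dumps(data, indent=2)[:1000])  # pretty-print a sample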

Python: How to scrape longandfoster.com with scrapy

Here is example code to scrape it:

# author: https://blog.furas.pl
# date: 2020.07.16
# link: 

import scrapy
import json

class MainSpider(scrapy.Spider):

    name = 'main'
    # allowed_domains = ['longandfoster.com']

    start_urls = ['https://www.longandfoster.com/include/ajax/api.aspx?op=SearchAgents&firstname=&lastname=&page=1&pagesize=200']

    def parse(self …
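
The parse() is cut off. Since the start URL is an Ajax API that returns JSON, parse() can decode the body directly; the Agents key below is an assumption, so check the real response first.

import scrapy
import json

class MainSpider(scrapy.Spider):

    name = 'main'

    start_urls = ['https://www.longandfoster.com/include/ajax/api.aspx?op=SearchAgents&firstname=&lastname=&page=1&pagesize=200']

    def parse(self, response):
        data = json.loads(response.text)

        # hypothetical key holding the list of agents
        for agent in data.get('Agents', []):
            yield agent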
