Python: How to scrape finance.yahoo.com-quote-spy with requests

Here is example code to scrape it:

# date: 2019.04.23

import requests
from bs4 import BeautifulSoup
import json

url = 'https://finance.yahoo.com/quote/SPY'
result = requests.get(url)

html = BeautifulSoup(result.content, 'html.parser')
script = html.find_all('script')[-3].text
data = script[112:-12]
print(data[:10], data …
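
The hard-coded slice `script[112:-12]` breaks as soon as Yahoo changes the page even slightly. A more resilient sketch (on a made-up sample of the kind of `root.App.main` payload Yahoo embeds) pulls the JSON out with a regex instead:

```python
import json
import re

# Made-up sample of the embedded payload; the real page assigns
# a very large JSON object to `root.App.main` inside a <script> tag.
script_text = 'root.App.main = {"quote": {"symbol": "SPY", "price": 415.2}};\n(this));'

# Greedy match from the first `{` after the assignment to the last `}`
# that is followed by a semicolon, then parse it as JSON.
match = re.search(r'root\.App\.main\s*=\s*(\{.*\})\s*;', script_text, re.DOTALL)
data = json.loads(match.group(1))

print(data['quote']['symbol'])  # SPY
print(data['quote']['price'])   # 415.2
```

The regex still assumes the `root.App.main = {...};` shape, but it no longer depends on exact character offsets.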

Python: How to scrape finance.yahoo.com with news with selenium

Here is example code to scrape it:

#!/usr/bin/env python3

# author: https://blog.furas.pl
# date: 2020.07.11
# 
from selenium import webdriver
import time

#driver = webdriver.Chrome()
driver = webdriver.Firefox()

driver.get("https://finance.yahoo.com/quote/INFY/news?p=INFY")

for i in range(20):
    driver.execute_script …

Python: How to scrape flashscore.com

Here is example code to scrape it:

# date: 2020.06.10
# https://stackoverflow.com/questions/62293949/web-scraping-with-bs4-pyhton3-cant-find-elements/62294633#62294633

import requests
import bs4 as bs

#url = 'https://www.flashscore.com/field-hockey/netherlands/hoofdklasse/standings/'

url = 'https://d.flashscore.com/x/feed/ss_1_INmPqO86_GOMWObX1_table_overall'

headers = {
#    'User-Agent': 'Mozilla/5.0'
#    'User-Agent': 'Mozilla/5 …
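
The `d.flashscore.com/x/feed/...` endpoint returns a compact text feed rather than HTML. As far as I can tell it uses `¬` between fields and `÷` between key and value, but that is an assumption — inspect the real response in the browser's network tab before relying on it. A minimal parsing sketch on a made-up fragment:

```python
# Made-up fragment in the ¬-separated key÷value layout that
# flashscore's feed endpoints appear to use (an assumption).
feed = 'TN÷Hoofdklasse¬TR÷1¬TT÷Den Bosch¬TP÷42'

row = {}
for field in feed.split('¬'):
    key, _, value = field.partition('÷')
    row[key] = value

print(row)  # {'TN': 'Hoofdklasse', 'TR': '1', 'TT': 'Den Bosch', 'TP': '42'}
```

The two-letter keys (`TN`, `TR`, …) are internal codes; you have to map them by comparing the feed against what the rendered page shows.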

Python: How to scrape ford.co.uk with download-manual with Selenium + BS

Here is example code to scrape it:

# https://stackoverflow.com/questions/60377798/error-while-selecting-dependent-drop-down-and-click-the-option-in-python/60378558#60378558
# Error while selecting dependent drop down and click the option In Python


# BTW: sometimes the page shows a popup window at start, but I didn't try to solve this problem

# BTW: I had to check `if …

Python: How to scrape forexfactory.com

Here is example code to scrape it:

#!/usr/bin/env python3 

# date: 2019.12.30
# https://stackoverflow.com/questions/59535798/python-webscraping-with-beautifulsoup-not-displaying-full-content/59536553#59536553

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.forexfactory.com/#detail=108867")
# the page uses JavaScript to redirect, so a browser may show different results …
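
One reason the browser and `requests` disagree here: `#detail=108867` is a URL fragment, and HTTP clients never send the fragment to the server, so `requests` always fetches the plain front page and the detail is filled in client-side. A quick stdlib check:

```python
from urllib.parse import urlsplit

url = 'https://www.forexfactory.com/#detail=108867'
parts = urlsplit(url)

# The fragment lives only in the browser; requests/urllib never
# transmit it, so the server just sees a request for `/`.
print(parts.path)      # /
print(parts.fragment)  # detail=108867
```

To get the detail content you have to find the real endpoint the page's JavaScript calls, or render the page with a browser tool such as Selenium.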

Python: How to scrape fr.aliexpress.com with requests

Here is example code to scrape it:

#!/usr/bin/env python3

#
# https://stackoverflow.com/a/47851923/1832058
#

import urllib.request
from bs4 import BeautifulSoup

headers = {
    #'User-Agent': 'Mozilla/5.0',

    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0',

    #'User-Agent': 'Mozilla/5.0 (Windows …
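
With plain `urllib`, the headers dict above gets attached by building a `Request` object. A minimal sketch (the URL is a hypothetical stand-in; the point is attaching the header, not the address):

```python
import urllib.request

# Hypothetical target URL -- swap in the real page.
url = 'https://example.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0',
}

# urllib.request.Request carries the headers; pass it to urlopen()
# instead of the bare URL string.
req = urllib.request.Request(url, headers=headers)

# urllib normalizes stored header names to Capitalized-with-dashes.
print(req.get_header('User-agent'))
```

Many sites return 403 for the default `Python-urllib/3.x` agent, which is why nearly every snippet on this page sets a browser-like `User-Agent` first.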

Python: How to scrape fundamentus.com.br with requests, pandas, json

Here is example code to scrape it:

# author: https://blog.furas.pl
# date: 2020.07.16
# link: https://stackoverflow.com/questions/62921395/pandas-include-key-to-json-file/

import requests
import pandas as pd
import json

url = 'http://www.fundamentus.com.br/resultado.php'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64 …
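
The linked question is about writing the scraped table to JSON keyed by one of its columns rather than as a plain list. A stdlib sketch of that reshaping, on made-up rows standing in for the fundamentus table:

```python
import json

# Made-up rows standing in for the scraped table.
rows = [
    {'Papel': 'PETR4', 'Cotacao': '28.10'},
    {'Papel': 'VALE3', 'Cotacao': '61.50'},
]

# Re-key the list by the 'Papel' (ticker) column so the JSON file
# maps ticker -> row instead of being a flat list of records.
data = {row['Papel']: {k: v for k, v in row.items() if k != 'Papel'}
        for row in rows}

print(json.dumps(data, indent=2))
```

With pandas the same reshaping is usually `df.set_index('Papel').to_json(orient='index')`.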

Python: How to scrape fundrazr.com

Here is example code to scrape it:

#
# https://stackoverflow.com/a/47495628/1832058
#

import scrapy
import pyquery

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = ['https://fundrazr.com/find?category=Health']

    def parse(self, response):
        print('--- css 1 ---')
        for title in response.css('h2'):
            print('>>>', title)

        print('--- css 2 ---')
        for title …

Python: How to scrape g2a.com

Here is example code to scrape it:

# date: 2019.05.19
# author: Bartłomiej 'furas' Burek
# https://stackoverflow.com/questions/56208824/403-forbidden-error-when-scraping-a-site-user-agents-already-used-and-updated?noredirect=1#comment99040341_56208824

import requests

url = 'https://www.g2a.com/lucene/search/filter?&search=The+Elder+Scrolls+V:+Skyrim&currency=nzd&cc=NZD'

headers = {
 #   'User-Agent': 'Mozilla/5.0 (X11 …

Python: How to scrape gall.dcinside.com

Here is example code to scrape it:

#!/usr/bin/env python3 

# date: 2020.01.01
# https://stackoverflow.com/questions/59551193/i-want-to-download-images-from-python-what-should-i-do/

from selenium import webdriver
import requests

#path = r"C:\Users\qpslt\Desktop\py\chromedriver_win32\chromedriver.exe"
#driver = webdriver.Chrome(path)
driver = webdriver.Firefox()

url = "https://gall.dcinside.com/board …

Python: How to scrape games.crossfit.com with requests

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2019.12.20
# https://stackoverflow.com/questions/59419682/how-do-i-extract-this-entire-table-and-store-it-in-csv-file/

import requests

r = requests.get('https://games.crossfit.com/competitions/api/v1/competitions/open/2020/leaderboards?view=0&division=1&scaled=0&sort=0')

data = r.json()

for row …
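
When an endpoint returns JSON like this, the whole job is walking the parsed structure. A sketch on a made-up payload shaped roughly like a leaderboard response (the real field names may differ — print `data.keys()` against the live API to check):

```python
import json

# Made-up payload; field names are assumptions about the shape,
# not a documented CrossFit API contract.
payload = '''{
  "leaderboardRows": [
    {"entrant": {"competitorName": "A. Athlete"}, "overallRank": "1"},
    {"entrant": {"competitorName": "B. Athlete"}, "overallRank": "2"}
  ]
}'''

data = json.loads(payload)

for row in data['leaderboardRows']:
    print(row['overallRank'], row['entrant']['competitorName'])
```

Endpoints like this are found by watching the browser's DevTools network tab while the page loads; scraping the JSON directly is far more stable than parsing the rendered HTML table.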

Python: How to scrape goodjobsfirst.org

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.06.10
# https://stackoverflow.com/questions/62306522/scraping-list-of-values-from-drop-down/

from selenium import webdriver
from selenium.webdriver.support.ui import Select

#browser = webdriver.Chrome(executable_path=r"C:\webdrivers\chromedriver.exe")
browser = webdriver.Firefox()

url = ('https://www.goodjobsfirst.org/violation-tracker …

Python: How to scrape google.com-finance with selenium

Here is example code to scrape it:

#!/usr/bin/env python3 

# date: 2019.12.09
# ?

from selenium import webdriver

url = 'https://www.google.com/finance'
#driver = webdriver.Chrome()
driver = webdriver.Firefox()

driver.get(url)

all_tables = driver.find_elements_by_css_selector('.mod')

for table in all_tables[1:]:
    print('====== TABLE ======')
    try:
        for item in table …

Python: How to scrape gpw.pl with spółki (companies) with requests + BS

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.04.28
# https://stackoverflow.com/questions/61481586/how-to-scrap-the-non-loaded-content-of-the-page/

import requests
from bs4 import BeautifulSoup

def get_page(session):
    url = "https://www.gpw.pl/spolki"

    response = session.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    all_links = soup.find_all("a")

    data = []

    for …
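
The `soup.find_all("a")` step can also be done with nothing but the standard library, which is handy when BeautifulSoup isn't available. A sketch using `html.parser.HTMLParser` on a made-up HTML fragment standing in for the downloaded page:

```python
from html.parser import HTMLParser

# Stdlib equivalent of soup.find_all("a"): collect every href.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            attrs = dict(attrs)
            if 'href' in attrs:
                self.links.append(attrs['href'])

# Made-up HTML fragment; feed() the real response.text instead.
parser = LinkCollector()
parser.feed('<a href="/spolka?isin=PL1">A</a> <a href="/spolka?isin=PL2">B</a>')
print(parser.links)  # ['/spolka?isin=PL1', '/spolka?isin=PL2']
```

BeautifulSoup stays the better choice once you need filtering by class, nested lookups, or tolerant parsing of broken markup.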

Python: How to scrape grainger.com with requests, JSON

Here is example code to scrape it:

# author: https://blog.furas.pl
# date: 2020.07.09
# link: https://stackoverflow.com/questions/62812282/why-arent-the-table-data-tags-available-in-the-soup/

import requests

url = 'https://www.grainger.com/product/tableview/GRAINGER-APPROVED-Type-F-Stainless-Steel-Cam-WP11501162?breadcrumbCatId=1001429'
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
data = r.json()

for item in …

Python: How to scrape uhaul.com

Here is example code to scrape it:

# https://stackoverflow.com/questions/47872975/python-web-scraping-format-cleaning/47879161#47879161

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import csv

urls = [
    'https://www.uhaul.com/Locations/Self-Storage-near-Charlotte-NC-28206/780052/',
    'https://www.uhaul.com/Locations/Self-Storage-near-Charlotte-NC-28212/780063/'
]

filename = 'u_haul.csv'

f = open …
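
The CSV-writing half of this recipe is independent of the scraping. A sketch with `csv.writer` on made-up rows (using `io.StringIO` as a stand-in for `open('u_haul.csv', 'w', newline='')` so it runs without touching disk):

```python
import csv
import io

# Made-up rows standing in for scraped storage-unit data.
rows = [
    ('5x5', '$49.95'),
    ('5x10', '$69.95'),
]

# io.StringIO stands in for open('u_haul.csv', 'w', newline='').
f = io.StringIO()
writer = csv.writer(f)
writer.writerow(['size', 'price'])  # header row first
writer.writerows(rows)              # then all data rows at once

print(f.getvalue())
```

When writing to a real file, pass `newline=''` to `open()` so the `csv` module controls line endings itself.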

Python: How to scrape hedgefollow.com with selenium

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.05.25
# https://stackoverflow.com/questions/62003463/web-scraping-hedge-fund-data-with-beautifulsoup

import selenium.webdriver
import time

url = 'https://hedgefollow.com/funds/Duquesne+Family+Office'

driver = selenium.webdriver.Firefox()
driver.get(url)

time.sleep(3)

table = driver.find_element_by_id('dgtopHolders')

print('--- headers …

Python: How to scrape hltv.org

Here is example code to scrape it:

#!/usr/bin/env python3

import scrapy
#from scrapy.commands.view import open_in_browser
#import json

class MySpider(scrapy.Spider):

    name = 'myspider'

    #allowed_domains = []

    start_urls = ['https://www.hltv.org/matches']

    #def start_requests(self):
    #    self.url_template = 'http://quotes.toscrape.com/tag/{}/page/{}/'
    #    self.tags = ['love', 'inspirational', 'life …

Python: How to scrape horariodebuses.com with requests

Here is example code to scrape it:

#!/usr/bin/env python3

# date: 2020.01.13
# https://stackoverflow.com/questions/59710076/encode-unicode-characters-in-dict-to-send-as-data-in-a-post-request/

# page uses ISO-8859-1 

import requests
import urllib.parse
import webbrowser

d = {
    'fromClass': 'Golfito',
    'toClass': 'Cañon del Guarco',
    'viaClass': '',
    'jDate': '01/12/2020',
    'jTime': '21:34',
    'addtime': '0',
    'lang': 'en …
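
Because the page expects ISO-8859-1, the dict above must be percent-encoded in that charset; by default `urllib.parse.urlencode` encodes as UTF-8, which mangles `ñ`. A minimal sketch of the difference:

```python
from urllib.parse import urlencode

d = {'toClass': 'Cañon del Guarco', 'lang': 'en'}

# Default: UTF-8 percent-encoding. The page wants ISO-8859-1,
# so pass the encoding explicitly.
utf8 = urlencode(d)
latin1 = urlencode(d, encoding='ISO-8859-1')

print(utf8)    # toClass=Ca%C3%B1on+del+Guarco&lang=en
print(latin1)  # toClass=Ca%F1on+del+Guarco&lang=en
```

With `requests`, passing the pre-encoded string as `data=` (or a bytes payload) avoids requests re-encoding the dict as UTF-8 behind your back.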

« Page: 4 / 11 »