web scraping - Python web scrape with Beautiful Soup


I am able to scrape tables from sites with no issue; however, to access the tables I have customized, I need to log in first and then scrape, because otherwise I get the default output. I feel like I am close, but I am relatively new to Python. I am looking forward to learning more about mechanize and BeautifulSoup.

It seems to be logging in correctly, since I get an "incorrect password" error if I purposely enter the wrong password below. But how do I connect the login to the URL that I want to scrape?

    from bs4 import BeautifulSoup
    import urllib
    import csv
    import mechanize
    import cookielib

    cj = cookielib.CookieJar()
    br = mechanize.Browser()
    br.set_cookiejar(cj)
    br.open("http://www.barchart.com/login.php")

    br.select_form(nr=0)
    br.form['email'] = 'username'
    br.form['password'] = 'password'
    br.submit()

    #print br.response().read()

    r = urllib.urlopen("http://www.barchart.com/stocks/sp500.php?view=49530&_dtp1=0").read()

    soup = BeautifulSoup(r, "html.parser")

    tables = soup.find("table", attrs={"class" : "datatable ajax"})

    headers = [header.text for header in tables.find_all('th')]

    rows = []

    for row in tables.find_all('tr'):
        rows.append([val.text.encode('utf8') for val in row.find_all('td')])

    with open('snp.csv', 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(row for row in rows if row)

    #from pymongo import MongoClient
    #import datetime
    #client = MongoClient('localhost', 27017)

    print soup.table.get_text()

I am not sure that you need to log in to retrieve the URL in your question; I get the same results whether I am logged in or not.

However, if you do need to be logged in to access other data, your problem is that you are logging in with mechanize but then using urllib.urlopen() to access the page. There is no connection between the two, so the session data (cookies) gathered by mechanize is not available to urlopen() when it makes its request.

In this case you don't need to use urlopen() at all, because you can open the URL and access the HTML with mechanize:

    r = br.open("http://www.barchart.com/stocks/sp500.php?view=49530&_dtp1=0")
    soup = BeautifulSoup(r.read(), "html.parser")
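To illustrate why the second request has to go through the same object that logged in, here is a minimal sketch in Python 3 using only the standard library. It does not use mechanize or barchart.com: the local test server, its /login and /table paths, the cookie value, and the names DemoHandler and run_demo are all made up for this demo. The opener with a cookie jar stands in for the mechanize browser, and the cookie-less opener stands in for a bare urlopen() call.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site with a login page and a data page."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            # Pretend the login succeeded and hand out a session cookie.
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            body = b"logged in"
        else:
            # Serve the customized output only if the cookie was sent back.
            cookie = self.headers.get("Cookie", "")
            body = b"custom view" if "session=abc123" in cookie else b"default view"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    # ProxyHandler({}) only ensures the local demo bypasses any system proxy.
    no_proxy = urllib.request.ProxyHandler({})
    try:
        # Opener with a cookie jar: plays the role of the mechanize browser.
        browser = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(CookieJar()))
        browser.open(base + "/login").read()

        # Separate opener without the jar: the mistake from the question.
        plain = urllib.request.build_opener(no_proxy)
        without_session = plain.open(base + "/table").read().decode()

        # Reusing the opener that logged in: the fix from the answer.
        with_session = browser.open(base + "/table").read().decode()
    finally:
        server.shutdown()
        server.server_close()
    return without_session, with_session
```

Calling run_demo() should return ("default view", "custom view"): the request made outside the logged-in opener never sends the session cookie, so the server treats it as an anonymous visitor.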
