python - Using BeautifulSoup to parse facebook -

- April 15, 2015

So I'm trying to parse public Facebook pages using BeautifulSoup. I've managed to scrape LinkedIn, but I've spent hours trying to get it to work on Facebook with no luck. The code I'm trying to use looks like this:

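    import urllib2
    from bs4 import BeautifulSoup

    for url in my_urls:
        try:
            page = urllib2.urlopen(url)
            soup = BeautifulSoup(page, "html.parser")
            info = soup.find_all("div", class_="fsl fwb fcb")
            # find_all() returns a list of tags, so search each div for links
            info2 = [a for div in info for a in div.find_all("a")]
        except urllib2.URLError as e:
            print(e)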

The frustrating part is that I can get the title element out, and I can get pretty far down the document, but I can't get to the part I need.

This line successfully grabs the page title:

    info = soup.find_all("title", attrs={"id": "pagetitle"})

This line gets pretty far down the list of elements, but I can't go any farther:

    info = soup.find_all(id="pagelet_timeline_main_column")
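One way to go "farther" from the container that this line finds is to search inside that element instead of the whole soup. This is a sketch assuming bs4 is installed; the markup in `SAMPLE` and the helper `links_in_container` are hypothetical stand-ins, not Facebook's real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the timeline column; the real
# Facebook structure is not known here.
SAMPLE = """
<div id="pagelet_timeline_main_column">
  <div class="fsl fwb fcb"><a href="/pages/Springfield">Springfield</a></div>
  <div class="other"><a href="/about">About</a></div>
</div>
"""

def links_in_container(html, container_id, css="a"):
    """Find the element with `container_id`, then run the CSS query only inside it."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:  # the id is not in this HTML at all
        return []
    return [a.get_text(strip=True) for a in container.select(css)]

print(links_in_container(SAMPLE, "pagelet_timeline_main_column", "div.fsl a"))
# → ['Springfield']
```

Scoping the search this way means a class or tag that also appears elsewhere on the page cannot produce false matches outside the container.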

Here's a sample page I'm trying to parse; I want to get the current city from it:

https://www.facebook.com/100004210542493

And here's a quick screenshot of what the part I want looks like:

http://prntscr.com/1t8xx6

I feel like I'm close, but I can't figure it out. Thanks in advance for any help!

Edit 2: I should mention that I can print the whole soup and visually find the part I need, but for whatever reason the parsing won't work the way it should.
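When text is visible in the printed soup but a query misses it, one debugging approach is to search for the text itself and print the chain of tags above it. That chain is the path a `find_all` call would need to follow. This is a sketch assuming bs4; the helper `path_to_text` and the sample markup are hypothetical, and "Springfield" stands in for whatever city text you actually see.

```python
import re
from bs4 import BeautifulSoup

def path_to_text(html, pattern):
    """Return the tag path (outermost first) down to the first string that
    matches the regex `pattern`, or None if no string matches."""
    soup = BeautifulSoup(html, "html.parser")
    hit = soup.find(string=re.compile(pattern))
    if hit is None:
        return None
    parts = []
    # .parents climbs from the string up to the BeautifulSoup object, whose
    # name is "[document]"; reverse it so the outermost tag comes first.
    for tag in reversed(list(hit.parents)):
        if tag.name == "[document]":
            continue
        label = tag.name
        if tag.get("id"):
            label += "#" + tag["id"]
        if tag.get("class"):
            label += "." + ".".join(tag["class"])
        parts.append(label)
    return " > ".join(parts)

# Hypothetical markup; "Springfield" stands in for the visible city name.
html = ('<div id="pagelet_timeline_main_column">'
        '<div class="fsl fwb fcb"><a href="#">Springfield</a></div></div>')
print(path_to_text(html, "Springfield"))
# → div#pagelet_timeline_main_column > div.fsl.fwb.fcb > a
```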

Try looking at the content returned by curl or wget. What you see in the browser has been rendered after the page's JavaScript has been executed.

    wget https://www.facebook.com/100004210542493
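Here is a rough Python equivalent of that `wget` check, using only the standard library. It fetches the page without running any JavaScript, so whatever it returns is roughly what BeautifulSoup sees when given a plain `urlopen` response. The helper names, the browser-like `User-Agent` header and the "Current City" search text are assumptions for illustration.

```python
import urllib.request

def fetch_raw_html(url, timeout=10):
    """Return the page as the server sends it, as curl or wget would.
    No JavaScript runs, so anything built client-side is absent."""
    # Browser-like User-Agent (an assumption); some sites reject urllib's default.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

def appears_in_raw_html(html, text):
    """True if `text` occurs anywhere in the raw markup (case-insensitive)."""
    return text.lower() in html.lower()

# Example (needs network access; "Current City" is a guess at the label text):
# html = fetch_raw_html("https://www.facebook.com/100004210542493")
# print(appears_in_raw_html(html, "Current City"))
```

If the text you see in the browser is missing from this string, no BeautifulSoup query on the raw response will find it.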

You might want to use mechanize or Selenium, since you want to simulate a client browser (instead of handling the raw content).
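A minimal Selenium sketch follows. It assumes the `selenium` package and a working Firefox/geckodriver setup are installed. `rendered_html` is a hypothetical helper, and the fixed `time.sleep` pause is a crude stand-in for Selenium's explicit waits.

```python
import time

def rendered_html(url, wait_seconds=5):
    """Open `url` in Firefox through Selenium and return the DOM after the
    page's JavaScript has run (assumes selenium and geckodriver are set up)."""
    # Imported lazily so this file still loads where Selenium is missing.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Crude fixed pause for content loaded after the initial page load;
        # WebDriverWait with an expected condition is the more robust option.
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        driver.quit()

# Example usage (opens a real browser window):
# from bs4 import BeautifulSoup
# soup = BeautifulSoup(rendered_html("https://www.facebook.com/100004210542493"), "html.parser")
```

Because `page_source` reflects the browser's current DOM, the same BeautifulSoup queries from the question can then be run against the rendered page.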

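Another related issue might be that Beautiful Soup does not find a CSS class the way you expect when the element has other classes, too: in bs4, a multi-class string such as class_="fsl fwb fcb" only matches a class attribute with exactly those classes in exactly that order.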
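Here is a small demonstration of that behaviour, assuming bs4 4.7 or later (for CSS selector support through `select()`). The sample markup is made up: it uses the same three classes as the question, but in a different order.

```python
from bs4 import BeautifulSoup

# Same three classes as in the question, but in a different order.
html = '<div class="fwb fsl fcb"><a href="#">Springfield</a></div>'
soup = BeautifulSoup(html, "html.parser")

# A multi-word class_ string is compared against the whole attribute value,
# so a different order (or an extra class) means no match.
exact = soup.find_all("div", class_="fsl fwb fcb")

# A single class name matches any element that has that class among others.
single = soup.find_all("div", class_="fsl")

# A CSS selector requires all three classes, in any order.
selected = soup.select("div.fsl.fwb.fcb")

print(len(exact), len(single), len(selected))  # → 0 1 1
```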
