python - Extracting arguments of JavaScript function from HTML page with BeautifulSoup
I'm parsing an HTML page that contains several script blocks:
    <script type="text/javascript">
        // code
    </script>
    <script type="text/javascript">
        foo(arg1, arg2);
        // code
    </script>
I need to extract the arguments of the foo function: 'arg1' and 'arg2'. I can obtain the inner content of the script tag:
    from bs4 import BeautifulSoup

    def parse_foo(pagecontent):
        # Return the text of the first <script> block that mentions foo
        soup = BeautifulSoup(pagecontent)
        scripttags = soup.find_all('script')
        for script in scripttags:
            tagcontent = script.get_text()
            if tagcontent.count('foo') > 0:
                return tagcontent
        return ''
Is there a way to get the arguments using BeautifulSoup, or should I use a regular expression?
pyesprima is a Python port of Esprima, "a high performance, standard-compliant ECMAScript parser written in ECMAScript". Fortunately, it's easy to use. Unfortunately, it's a bit slow.
There's an online parser tool you can use to try it out: http://esprima.org/demo/parse.html
When you input

    foo(arg1,arg2);

it comes back with:
    {
        "type": "Program",
        "body": [
            {
                "type": "ExpressionStatement",
                "expression": {
                    "type": "CallExpression",
                    "callee": {
                        "type": "Identifier",
                        "name": "foo"
                    },
                    "arguments": [
                        {
                            "type": "Identifier",
                            "name": "arg1"
                        },
                        {
                            "type": "Identifier",
                            "name": "arg2"
                        }
                    ]
                }
            }
        ]
    }
As a tree:

               ExpressionStatement
                        |
                    expression
                 /      |      \
    type=CallExpression  callee  arguments
                        |
                     name=foo
- Look for an ExpressionStatement whose expression.callee.name is "foo".
- Return that expression's arguments.raw (you need to set the "raw" option to true; see the docs).
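The two steps above can be sketched in plain Python, assuming you already have the parser output as nested dicts and lists shaped like the JSON shown earlier. This deliberately does not call pyesprima itself, so how you obtain the tree is left open. The helper names find_call_arguments and argument_text are hypothetical, made up for this illustration, and the fallback to a "raw" field assumes the parser was run with the raw option enabled.

```python
def find_call_arguments(node, function_name):
    """Return the argument nodes of the first call to function_name, or None."""
    if isinstance(node, dict):
        callee = node.get("callee")
        if (node.get("type") == "CallExpression"
                and isinstance(callee, dict)
                and callee.get("type") == "Identifier"
                and callee.get("name") == function_name):
            return node.get("arguments", [])
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        # Strings, numbers, None: nothing to descend into
        return None

    # Depth-first search through child nodes
    for child in children:
        found = find_call_arguments(child, function_name)
        if found is not None:
            return found
    return None


def argument_text(arg):
    """Identifiers carry "name"; other nodes are assumed to carry "raw"."""
    if arg.get("type") == "Identifier":
        return arg["name"]
    return arg.get("raw")


# The tree for foo(arg1,arg2); as shown above
ast = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "CallExpression",
            "callee": {"type": "Identifier", "name": "foo"},
            "arguments": [
                {"type": "Identifier", "name": "arg1"},
                {"type": "Identifier", "name": "arg2"},
            ],
        },
    }],
}

args = find_call_arguments(ast, "foo")
names = [argument_text(a) for a in args]
print(names)  # ['arg1', 'arg2']
```

Walking the whole tree rather than only the top-level body means a call to foo nested inside another function or block is still found. Only the first match is returned, so a page that calls foo several times would need a variant that collects every match.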