ruby - Parsing HTML from a defined start point to a defined end point?
I have this HTML:
<hr noshade>
<p><a href="#1">some text here</a></p>
<p style="margin-top:0pt;margin-bottom:0pt;line-height:120%;"><span style="color:#000000;font-weight:bold;">this description</span></p>
<hr noshade> <!-- <hr noshade> delimiter me -->
<p><a href="#2">some more text here</a></p>
<p style="margin-top:0pt;margin-bottom:0pt;line-height:120%;"><span style="color:#000000;font-weight:bold;">this description more text</span></p>
<hr noshade>
While parsing it with Nokogiri, I want to print the information between each of these sets of tags, which are separated by my own delimiter, <hr noshade>. So the first block should print the information in the "p" tags that lie between the first 2 <hr noshade> tags, and so on.
I'm using the accepted answer to the question about using XPath to select elements between 2 specific elements.
I have a semi-satisfactory solution.
You can use this XPath expression:
.//hr[1][@noshade] /following-sibling::*[not(self::hr[@noshade])] [count(preceding-sibling::hr[@noshade])=1]
for the first group, between <hr noshade> 1 and 2,
and then
.//hr[2][@noshade] /following-sibling::*[not(self::hr[@noshade])] [count(preceding-sibling::hr[@noshade])=2]
for the elements between <hr noshade> 2 and 3, etc.
What these expressions select:
- all following siblings of the <hr noshade> at the specified position n
- that have exactly n <hr noshade> preceding siblings, i.e. that are positioned in the n'th group
- and that are not <hr noshade> themselves
As these expressions select several elements between 2 <hr noshade>, you may have to loop over the results and extract the data from each sibling element.
Does anyone have a more generic solution?