
HTMLAgilityPack and separating on <br>

I have some HTML whose fields are separated by <br> tags, e.g.:

Jack Janson
309 123 456
My Special Street 43

What is the easiest way to retrieve the information?

Solution 1:

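In pure XPath over XML, you would use the preceding-sibling or following-sibling axis, with an expression such as //br/preceding-sibling::text()[1] or //br/following-sibling::text()[1], to select the text node next to each br element (see any reference on XPath axes for details).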

However, the XPath-over-HTML implementation in Html Agility Pack does not support selecting bare text nodes or attribute nodes: selection expressions such as //br/text() or //br/@blah do not work. Text and attribute tests do work inside predicates (filters), so //br[text()='blah'] and //br[@att='blah'] are fine.
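A minimal sketch of that distinction: the attribute test goes in a predicate so that the expression still selects an element, and the attribute value is then read through the node API. The markup, the class name PredicateDemo and the helper HrefByClass are made up for illustration, and cssClass is assumed not to contain quote characters.

```csharp
using System;
using HtmlAgilityPack;

static class PredicateDemo
{
    // Hypothetical helper: returns the href of the first <a> whose class
    // attribute equals cssClass, or "" when no such element exists.
    public static string HrefByClass(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The attribute test sits in a predicate, so the XPath selects an element.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[@class='" + cssClass + "']");

        // The value is read from the node instead of through //a/@href.
        return link == null ? "" : link.GetAttributeValue("href", "");
    }

    static void Main()
    {
        // Made-up markup, not taken from the question.
        string html = "<div><a href='/a' class='x'>first</a><a href='/b'>second</a></div>";
        Console.WriteLine(HrefByClass(html, "x"));
    }
}
```

SelectSingleNode returns null when nothing matches, which is why the helper checks for null before reading the attribute.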

So, back to the question, you need to combine XPATH and code, something like this:

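// Requires: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(myHtmlFile);

// SelectNodes returns null, not an empty collection, when nothing matches.
HtmlNodeCollection brNodes = doc.DocumentNode.SelectNodes("//br");
if (brNodes == null) return;

foreach (HtmlNode br in brNodes)
{
    // The sibling just before each <br> is the text that it terminates.
    if (br.PreviousSibling != null)
        Console.WriteLine(br.PreviousSibling.InnerText.Trim());
}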

That will output

Jack Janson
309 123 456
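Because the loop prints only the text that precedes each <br>, any text after the last <br> is not reported. One possible variant, sketched here under assumptions, walks the child nodes of the containing element and treats each <br> as a separator, so the trailing field is collected too. The enclosing div and the exact markup are assumed, since the question's HTML is not reproduced here, and BrSplitter and SplitOnBr are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;

static class BrSplitter
{
    // Splits the content of a container node into fields, using <br> as the separator.
    public static List<string> SplitOnBr(HtmlNode container)
    {
        var fields = new List<string>();
        var current = new StringBuilder();

        foreach (HtmlNode child in container.ChildNodes)
        {
            if (child.NodeType == HtmlNodeType.Element && child.Name == "br")
                Flush(fields, current);           // a <br> closes the current field
            else if (child.NodeType != HtmlNodeType.Comment)
                current.Append(child.InnerText);  // text or inline element content
        }

        Flush(fields, current);                   // text after the last <br>
        return fields;
    }

    // Stores the accumulated text as one field, skipping whitespace-only runs.
    static void Flush(List<string> fields, StringBuilder current)
    {
        string text = HtmlEntity.DeEntitize(current.ToString()).Trim();
        if (text.Length > 0) fields.Add(text);
        current.Clear();
    }
}

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Assumed markup, for illustration only.
        doc.LoadHtml("<div>Jack Janson<br/>309 123 456<br/>My Special Street 43</div>");

        HtmlNode div = doc.DocumentNode.SelectSingleNode("//div");
        foreach (string field in BrSplitter.SplitOnBr(div))
            Console.WriteLine(field);
    }
}
```

With the assumed markup this should yield three fields, including the street address that follows the last <br>.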
