How To Convert A Jsoup Document To A W3C Document?
Solution 1:
Jsoup provides the W3CDom class, whose fromJsoup method transforms a Jsoup Document into a W3C Document.
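import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Document;

Document jsoupDoc = ... // obtained via Jsoup.connect(...).get() or Jsoup.parse(...)
W3CDom w3cDom = new W3CDom();
org.w3c.dom.Document w3cDoc = w3cDom.fromJsoup(jsoupDoc);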
UPDATE:
- Up to Jsoup 1.10.2, the W3CDom class was still experimental.
- Since Jsoup 1.10.3, W3CDom is no longer experimental.
Solution 2:
To retrieve a jsoup document via HTTP, call Jsoup.connect(...).get(). To load a jsoup document from a local file, call Jsoup.parse(new File("..."), "UTF-8").
The call to DOMBuilder is correct.
When you say, "I used an available library DOMBuilder for this but when parsing I get org.w3c.dom.Document as null," I think you mean, "I used an available library, DOMBuilder, for this, but when printing the result I get [#document: null]." At least, that was the result I saw when I tried printing the w3cDoc object, but that doesn't mean the object is null. I was able to traverse the document by calling getDocumentElement and getChildNodes.
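import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
// DOMBuilder is the third-party helper from the question; import it from wherever it lives in your project.

public static void main(String[] args) throws IOException {
    // Fetch the page; let an IOException propagate instead of continuing with a null document.
    Document jsoupDoc = Jsoup.connect("http://stackoverflow.com/questions/17802445").get();
    org.w3c.dom.Document w3cDoc = DOMBuilder.jsoup2DOM(jsoupDoc);
    // Traverse the converted document to confirm that it is populated.
    Element root = w3cDoc.getDocumentElement();
    NodeList childNodes = root.getChildNodes();
    Node n = childNodes.item(2);
    System.out.println(n.getNodeName());
}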
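To see why a printed "null" does not mean the document is empty, here is a minimal sketch that uses only the JDK (no jsoup, no DOMBuilder). It assumes a small hand-built document as a stand-in for the converted page; the class name DomPrintDemo and the helpers buildSample and childElementNames are made up for illustration. The point is that the null in the printed output is the Document node's node value, which the DOM specification defines as null for every document, not the object reference.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomPrintDemo {

    // Builds a tiny <html><head/><body/></html> document (hypothetical stand-in for a converted page).
    static Document buildSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            html.appendChild(doc.createElement("head"));
            html.appendChild(doc.createElement("body"));
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Collects the names of the element children, skipping text and comment nodes.
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = buildSample();
        // toString() is implementation-specific; the JDK's bundled DOM usually shows
        // the node name and node value, and a Document node's value is always null.
        System.out.println(doc);
        System.out.println(doc.getNodeName() + " / " + doc.getNodeValue());
        // The document is populated even though its node value is null.
        System.out.println(childElementNames(doc.getDocumentElement()));
    }
}
```

Filtering on Node.ELEMENT_NODE is a deliberate choice here: with a real page, getChildNodes may also return whitespace text nodes, so a fixed index such as item(2) is not guaranteed to be an element.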