I started developing a web crawler as part of a bigger project, so I had to
choose which HTML parser library to use. I have used NekoHTML
in the past and it was pretty good, but it doesn’t have any helpers for selecting
DOM elements: you have to use XPath, which is very flexible but not so easy.
I have found JSoup to be a very cool library: its code is
well written and clean, and the interface is powerful. I love it. I was writing
a Scala crawler, though, and while the JSoup interface is pretty
cool, it is very Java-ish. I prefer better integration with Scala, so
I wrote my first Pimp My Library pattern.
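
For readers who have not met the pattern, here is a minimal sketch of what Pimp My Library can look like on top of JSoup. It is only an illustration, not the actual SSoup code: the names `SoupSyntax`, `RichElement`, `\\`, `attrOption` and `Demo` are made up for this example, and it assumes Scala 2.13 or later with the jsoup jar on the classpath. The only JSoup calls used are `Jsoup.parse`, `Element.select`, `hasAttr` and `attr`.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Element
import scala.jdk.CollectionConverters._

// Hypothetical enrichment of JSoup's Element (illustrative names, not SSoup's API).
object SoupSyntax {
  implicit class RichElement(private val self: Element) extends AnyVal {
    // Run a CSS selector and return an immutable Scala List
    // instead of JSoup's Java-style Elements collection.
    def \\(cssQuery: String): List[Element] =
      self.select(cssQuery).asScala.toList

    // Return the attribute wrapped in an Option; JSoup's attr() alone
    // would return an empty string when the attribute is missing.
    def attrOption(name: String): Option[String] =
      if (self.hasAttr(name)) Some(self.attr(name)) else None
  }
}

object Demo {
  import SoupSyntax._

  // Collect the href of every link in the given HTML.
  // Document extends Element, so the enrichment applies to it as well.
  def links(html: String): List[String] =
    (Jsoup.parse(html) \\ "a[href]").flatMap(_.attrOption("href"))
}
```

With the implicit class in scope, a parsed document can be queried with Scala collections and `Option` directly, for example `Demo.links(page)`, while the underlying JSoup types stay untouched.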
Let the code talk:
The code has been uploaded to the SSoup repository on GitHub.