I've just released the Java Sitemap Parser on SourceForge.net. The software can read Sitemaps in XML, Atom, RSS, and plain-text formats. As far as I can tell, it is the first open-source Sitemap-parsing software available on the Web.
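For readers who haven't looked at the formats, here is a minimal sketch of what reading two of them involves, using only the JDK. It is not the Java Sitemap Parser's code or API: the class name `SitemapSketch` and the methods `parseXml` and `parseText` are hypothetical, made up for illustration. It assumes an XML Sitemap lists each URL in a `<loc>` element and a text Sitemap lists one URL per line, as the Sitemaps protocol describes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical illustration only; not part of the Java Sitemap Parser.
public class SitemapSketch {

    // Collects the text of every <loc> element in an XML Sitemap.
    public static List<String> parseXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // Sitemaps declare a default namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches <loc> regardless of which namespace it is declared in.
            NodeList locs = doc.getElementsByTagNameNS("*", "loc");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                urls.add(locs.item(i).getTextContent().trim());
            }
            return urls;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a well-formed XML Sitemap", e);
        }
    }

    // Reads a text Sitemap: one URL per line, blank lines skipped.
    public static List<String> parseText(String text) {
        List<String> urls = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                urls.add(trimmed);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>http://example.com/</loc></url>"
                + "<url><loc>http://example.com/about</loc></url>"
                + "</urlset>";
        System.out.println(parseXml(xml));
        System.out.println(parseText("http://example.com/\nhttp://example.com/about\n"));
    }
}
```

A real parser has far more to deal with than this sketch does: detecting which format it was handed, Atom and RSS feeds, Sitemap index files, compressed files, and the protocol's size and location rules.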
The Java Sitemap Parser was the final project for my Search Engine Development class. A few weeks ago I wrote about the project and about how prevalent Sitemaps are becoming. Originally we wanted to add Sitemap support to Nutch, but developing just the parser proved to be quite a task. By releasing it as an independent project, I hope that Nutch, Heritrix, and other open-source crawlers will integrate it into their systems.
Wednesday, May 20, 2009