Wednesday, May 20, 2009

Java Sitemap Parser

I've just released the Java Sitemap Parser on SourceForge.net. The software can read Sitemaps in XML, Atom, RSS, and plain-text formats. As far as I can tell, this is the first open-source Sitemap-parsing software available on the Web.
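To give a sense of what reading two of these formats involves, here is a minimal sketch that uses only the JDK. It is not the Java Sitemap Parser's own API: the class name SitemapSketch and its two methods are hypothetical, made up for illustration. It assumes the XML format marks each URL with a `<loc>` element and the text format lists one URL per line, as the sitemaps.org protocol describes. Atom and RSS feeds are not covered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; NOT part of the Java Sitemap Parser's API.
class SitemapSketch {

    // Collects the text of every <loc> element in an XML Sitemap
    // (or Sitemap index), in document order.
    static List<String> parseXmlLocs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Sitemaps come from untrusted servers, so refuse DOCTYPE declarations.
            factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // "*" matches <loc> in any namespace.
            NodeList locs = doc.getElementsByTagNameNS("*", "loc");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                String url = locs.item(i).getTextContent().trim();
                if (!url.isEmpty()) {
                    urls.add(url);
                }
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML Sitemap", e);
        }
    }

    // Text Sitemaps list one URL per line; blank lines are skipped.
    static List<String> parseTextUrls(String text) {
        List<String> urls = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String url = line.trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
            + "<url><loc>http://www.example.com/</loc></url>"
            + "<url><loc>http://www.example.com/about</loc></url>"
            + "</urlset>";
        for (String url : parseXmlLocs(xml)) {
            System.out.println(url);
        }
    }
}
```

A real parser also has to detect which format it was handed, follow Sitemap index files, handle gzip-compressed Sitemaps, and cope with malformed input, which is a large part of why the parser alone turned out to be a full project.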

The Java Sitemap Parser was the final project for my Search Engine Development class. A few weeks ago I wrote about the project and about how prevalent Sitemaps are becoming. Originally we wanted to add Sitemap support to Nutch, but developing just the parser proved to be quite a task. By releasing it as an independent project, I'm hoping Nutch, Heritrix, and other open-source crawlers will integrate it into their systems.

3 comments:

  1. Thanks very much for your open-source Sitemap-parsing software. I downloaded the source and found that part of it is missing: the classes net.sourceforge.sitemaps.Sitemap.SitemapType and net.sourceforge.sitemaps.Sitemap.ChangeFrequency are not in the source. Can you upload it again? Thanks again.

  2. I'm not sure why you weren't able to find those classes, but I've had others use the software and even make modifications with no problem. Perhaps you didn't download everything.

  3. Hello Frank, could you give me some example Java source code showing how the Java Sitemap Parser works? Thank you.
