Friday, January 11, 2008

Search engine class, Nutch, and Wikia Search

This is the final week before classes begin, and I'm frantically preparing for my Search Engine Development class. I'm aware of only a handful of courses like this, and thankfully most of their lecture notes are available online.

I've really been struggling with how much development work to give my students. Do I require them to write the complete engine from scratch, or do I have them use existing components? There are advantages and disadvantages to both approaches, so I'm shooting for something in the middle.

I've decided that we'll write a few components ourselves, but we're also going to use Nutch, an open source search engine written in Java. I hope we'll be able to make a major contribution to Nutch, although I'm not sure exactly what that will be yet. By using a somewhat mature open source project, my students will get to experience what it's like to learn a large pre-existing code base and see how software is developed in the open source arena.

Just a few days ago, Wikia Search (alpha) was launched to less-than-stellar reviews. Wikia Search is Jimmy Wales' attempt to create an open source search engine that uses human feedback. Wales expects Wikia Search to compete with Google and hopes it will some day capture around 5% of all searches. Wikia Search is using Nutch, although they don't make that clear on their website. (I wrote a little about Wikia Search [or Wikiasari] about this time last year.)

I've tried out Wikia Search myself, and the results are pretty poor. But, as Wales points out, this is an attempt to build a search engine, not the final product. And had people judged Wikipedia's quality when it first launched, they would have thought it useless.