Monday, July 28, 2008

How cool is Cuil?

Today a new competitor enters the world of Web search: Cuil (pronounced "cool"). What's notable about this newcomer is that its president and founder, Anna Patterson, is an ex-Googler, as are several of Cuil's VPs.

In 2004, Patterson developed a search engine called Recall that was used to search the Internet Archive's massive corpus (apparently the search engine didn't last long... the Archive is only searchable by URL today). Shortly thereafter, she was hired by Google, only to leave in 2006 to start up her own Google competitor. How much of Google's intellectual property went with her? That's a tough one to answer.

So why does Cuil think it can compete with Google?
  1. Cuil supposedly indexes three times as much content as Google.
  2. Cuil presents results in a magazine-like, multi-column format with more snippet text than Google, including embedded images.
  3. Cuil has an "Explore by Categories" widget that attempts to categorize pages.
Considering Google doesn't index every page it knows about, it's hard to argue that the size difference is really significant. What will make or break Cuil is the quality of its results and its interface. Some have already done some testing and named Google the winner. I did a little test of my own, querying for my name, and the results were not quite up to par.


Here's how I scored it:
  • Result number 1 (top-left) links to my old website at Old Dominion University instead of my current site at Harding (next result to the right). -1 point
  • The photo of me in result 1 comes from a different website entirely, so I'm impressed they made the connection. +1 point
  • The photo in the Harding result is not me (wish I was that tan). -1 point
  • The result at the bottom-left is from DBLP which indexes academic papers. It's certainly relevant. +1 point
  • The photo in the DBLP result is not me (I'm a lot more buff); it's the actor Frank McCown, better known as Rory Calhoun. -1 point
  • The next result to the right points to a celebrity entry for Frank McCown on AOL's Television website. This is a website that does its own web mining and erroneously marked my blog and Harding website as belonging to Frank McCown the actor. (BTW, this is a really tough problem to solve.) -1 point
  • The first categorization labels in the upper-right under Digital Libraries were somewhat descriptive of my research interests or projects I've been involved with: Digital preservation, Open Archives Initiative, and LOCKSS. +1 point
  • But when I click on National Science Digital Library, I get 0 results. -1 point
So my overall score: -2 points. Using the same query at Google shows 8 of the top 10 results are about me (result #1 points to my blog, #3 to my Harding website), but Google is less ambitious and doesn't mix in photos or categories. Still, I'd have to give Google a higher score than -2.
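
For anyone who wants to check my arithmetic, here's a quick tally in Python. The labels are just my own shorthand for the bullets above, and the +1/-1 values are my subjective calls, not anything Cuil or Google reports.

```python
# Scores from the list above, in order.
# Each label is shorthand for one bullet; each value is my own +1/-1 judgment.
scores = [
    ("top result is my old ODU site, not Harding", -1),
    ("photo in result 1 really is me", +1),
    ("photo in Harding result is not me", -1),
    ("DBLP result is relevant", +1),
    ("photo in DBLP result is the actor", -1),
    ("AOL Television result is about the actor", -1),
    ("category labels match my research", +1),
    ("National Science Digital Library category returns 0 results", -1),
]

# Add up the points across all eight observations.
total = sum(points for _, points in scores)
print(f"Overall score: {total:+d} points")
```

Three pluses and five minuses, so the tally comes out to -2.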

Does anyone else have any thoughts on Cuil?

Update on 7/30/2008:

Someone at Java Rants has created a parody of Cuil using Yahoo's new BOSS Search API: Yuil.

6 comments:

  1. So, this is what you are doing at work all day?

  2. I searched for myself on Cuil and Google.

    Google has my site as the #1 result, while the top 5 results on Cuil are all bare-bones profile pages for people who aren't me.
    Cuil lists my site as results 6, 7, 8, and 9. The second page mostly consists of more links to my blog coupled with unrelated images.

    Google also lists some results that aren't me, but most of them are much more relevant. Also, it hides all of the hits to my blog under two results and a link for more from my site.

    Google is the clear winner for this search.

  3. i think its interesting that they do not rank by pagerank but by relevance. i just wonder what exactly that means... i hope we are not back to a ranking where a page with 5x "britney spears" in its title beats a page where it occurs just once!

    from http://www.cuil.com/info/:
    "Rather than rely on superficial popularity metrics, Cuil searches for and ranks pages based on their content and relevance. When we find a page with your keywords, we stay on that page and analyze the rest of its content, its concepts, their inter-relationships and the page’s coherency."

    i like the interface and the longer snippets but the fact that you can click on the incorrect images and get to the correct page is annoying...

  4. They're being deliberately vague about what "relevancy" is to protect their intellectual property. I'd be really interested to know what's going into their secret sauce. :-)

    I can't imagine them ignoring the web graph (i.e., algorithms like PageRank) since it's been so successful in improving web search. Maybe they are just using that property less significantly than other properties in their secret relevancy formula.

  5. We checked this out at our office today and found the results to be lacking. My coworker, who is about to go on vacation, searched for "Florida Spearfishing Regulations". The relevance of the results was very poor, especially compared to Google's results. I am underwhelmed by its capability at this point.

    I also think that they made a huge mistake by launching their marketing campaign this early. Until their search results are really close to or better than Google's, releasing early gives us a first look at an incomplete product. That in turn has me writing it off and not taking them seriously when they do get their act together at some point in the future.

  6. This morning I saw on Reddit/Programming the #1 story was entitled "I don't understand what Cuil is doing". Clicking on the link did a search for Cobol on Cuil. Guess how many results it returned?
