Tuesday. I split time between the Adversarial Information Retrieval on the Web (AIRWeb) and the Query Log Analysis workshops.
The clearest message I left AIRWeb with was that web spam and splogs can be detected in a number of ways, and those ways will likely change over time, but the one thing that won't change is that spam will always be motivated by financial gain. In other words: follow the money. The Query Log Analysis workshop had an interesting panel discussing issues surrounding the use of search engine query logs by academic researchers and the public. I especially liked Bernard Jansen's proposal of archiving search logs for posterity just as we archive the Web.
Wednesday. Tim Berners-Lee opened the conference with a talk about Web Science, a new initiative by MIT and the University of Southampton. An interesting comment that Tim made was that spam will not make its way into the Semantic Web. We’ll see about that...
The most impressive presentation on Wednesday was on CSurf. The presenter used lots of examples which helped clarify what they had done. One thing that I’ve noticed at every conference I’ve been to is that some of the best researchers are not necessarily the best communicators, so it was nice to actually see an effective presentation along with a good paper.
Another paper, The Discoverability of the Web, was also interesting, but I wondered whether widespread adoption of Sitemaps and mod_oai would make much of their work irrelevant.
Thursday. Prabhakar Raghavan of Yahoo gave an interesting talk about emerging technologies to fuel Web N.0. Raghavan pointed out the ESP Game, a creative game developed at Carnegie Mellon that encourages people to label images in an entertaining way.
Later I sat in on a talk by Bradley Horowitz, also from Yahoo, who discussed some initiatives at Yahoo to change how people search the Web. Essentially they want to make everyone creators, contributors, and consumers instead of the current model where 1% creates, 10% contributes, and 100% consume.
In the afternoon I attended Uri Schonfeld’s presentation on DUST, an extension of their work from last year when they had a poster. (Thanks for the citation.) I also sat in on the DevTrack and heard an interesting talk by Marc Hadley (Sun) about WADL, a way of creating Java stubs automatically for web applications.
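To give a flavor of what WADL looks like, here is a minimal, hypothetical fragment describing a single GET resource. The service URL, resource path, and parameter name are all made up for illustration, and the namespace URI has varied across WADL draft versions:

```xml
<!-- Hypothetical WADL sketch: a GET on /search taking one query parameter.
     Tools like Hadley's wadl2java could generate Java client stubs from
     a description like this. Namespace may differ by WADL version. -->
<application xmlns="http://wadl.dev.java.net/2009/02"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <resources base="http://example.com/api/">
    <resource path="search">
      <method name="GET">
        <request>
          <param name="q" style="query" type="xsd:string" required="true"/>
        </request>
        <response>
          <representation mediaType="application/xml"/>
        </response>
      </method>
    </resource>
  </resources>
</application>
```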
Fun dinner Thursday night.
Friday. Today's plenary speaker was Bill Buxton (Microsoft Research). Interesting guy with some interesting predictions: MySpace is just a fad, and pixels will be everywhere and cost nothing in 5 years. The crowd seemed to like Bill’s talk a lot.
I attended the DevTrack for two sessions. The biggest highlight was Yahoo Pipes; the Semantic Web Browser presentation by Tim Berners-Lee was so-so.
I spent the breaks manning my poster and talking to interested passersby. Quite a few showed interest, and I even had a few Microsoft and Yahoo guys stop by. The Google guys were unfortunately nowhere to be found. Note to self: next time I have a poster, bring business cards and have some copies of my papers available like Marko did. Also, I need to get a better spot; I'm not sure who Johan and Marko paid off.
Saturday. I was a little burned out by Saturday, but I still managed to attend two sessions and the plenary speaker, Dick Hardt (Sxip Identity). I don't think I've ever seen a talk quite like Dick's: he probably averaged 5 slides per sentence, and it was orchestrated perfectly.
My favorite talk of the day was by Luca de Alfaro: A Content-Driven Reputation System for the Wikipedia. Basically, they propose an elaborate system for measuring the contribution of each Wikipedia author, which would allow users to see which authors are the most reputable. Luca told me after the talk that Wikipedia was showing interest in his system.
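The core intuition can be sketched in a few lines. This is my own toy simplification, not the paper's actual algorithm: an author gains reputation when text they contributed survives later edits, weighted by the reviewing editor's own reputation. All function and variable names here are made up:

```python
from collections import defaultdict

def update_reputations(revisions, gain=0.1):
    """Toy content-driven reputation sketch (hypothetical, not the paper's
    algorithm). revisions is a time-ordered list of (author, set_of_words)
    snapshots; surviving text earns its author credit scaled by the
    reviewer's reputation."""
    reputation = defaultdict(lambda: 1.0)  # everyone starts at 1.0
    for i in range(1, len(revisions)):
        reviewer, current_words = revisions[i]
        for past_author, past_words in revisions[:i]:
            if past_author == reviewer:
                continue  # no credit for reviewing your own text
            survived = past_words & current_words
            # fraction of the old contribution still present, weighted
            # by how trusted the reviewer is
            reputation[past_author] += (
                gain * reputation[reviewer] * len(survived) / max(len(past_words), 1)
            )
    return dict(reputation)
```

For example, an author whose words keep surviving successive revisions ends up with a higher score than one whose words get trimmed away, which matches the paper's spirit of judging authors by the longevity of their content rather than by votes.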
At the closing ceremony, Johan and Marko were awarded best poster for Friday, even though Johan can't seem to spell "Scholarly". I accepted the award on their behalf and had a really nice dinner with the award money.
- Most Interesting Fact: The average WWW'07 paper was submitted 20 times.
- Most overused acronym: JSON
- Paper I’ll Probably Read Next: Detecting Near-Duplicates for Web Crawling
- Poster I’m Most Likely to Cite Soon: A Large-Scale Study of Robots.txt
- Paper I’d Most Like to Re-Title: Effort Estimation: How Valuable is it for a Web company to Use a Cross-company Data Set Compared to Using Its Own Single-company Data Set?
(How about “The Web Company’s Use of Data Sets in Effort Estimation” instead?)