Questio Verum: WWW2007 in Banff, Alberta

The WWW2007 conference is one of the best I’ve attended. The speakers were great, the papers were top-notch, the food was excellent, and you couldn’t beat the location. It was also one of the most expensive conferences I’ve been to, so I made myself attend every session I could.

Tuesday. I split time between the Adversarial Information Retrieval on the Web (AIRWeb) and the Query Log Analysis workshops.

The clearest message I took away from AIRWeb was that web spam and splogs can be detected in a number of ways, and those ways will likely change over time, but the one thing that won’t change is that spam will always be motivated by financial gain. In other words, follow the money. The Query Log Analysis workshop had an interesting panel on the issues surrounding the use of search engine query logs by academic researchers and the public. I especially liked Bernard Jansen’s proposal to archive search logs for posterity, just as we archive the Web.

Wednesday. Tim Berners-Lee opened the conference with a talk about Web Science, a new initiative by MIT and the University of Southampton. An interesting comment that Tim made was that spam will not make its way into the Semantic Web. We’ll see about that...

The most impressive presentation on Wednesday was on CSurf. The presenter used lots of examples which helped clarify what they had done. One thing that I’ve noticed at every conference I’ve been to is that some of the best researchers are not necessarily the best communicators, so it was nice to actually see an effective presentation along with a good paper.

Another paper, The Discoverability of the Web, was also interesting, but I wondered if widespread adoption of Sitemaps and mod_oai would make much of their work irrelevant.

Thursday. Prabhakar Raghavan of Yahoo gave an interesting talk about emerging technologies to fuel Web N.0. Raghavan pointed out the ESP Game, a creative game developed at Carnegie Mellon that encourages people to label images in an entertaining way.

Later I sat in on a talk by Bradley Horowitz, also from Yahoo, who discussed some initiatives at Yahoo to change how people search the Web. Essentially they want to make everyone creators, contributors, and consumers instead of the current model where 1% creates, 10% contributes, and 100% consume.

In the afternoon I attended Uri Schonfeld’s presentation on DUST, an extension of their work from last year when they had a poster. (Thanks for the citation.) I also sat in on the DevTrack and heard an interesting talk by Marc Hadley (Sun) about WADL, a way of creating Java stubs automatically for web applications.

We had a fun dinner Thursday night.

Friday. Today's plenary speaker was Bill Buxton (Microsoft Research). Interesting guy with some interesting predictions: MySpace is just a fad, and pixels will be everywhere and cost nothing in 5 years. The crowd seemed to like Bill’s talk a lot.

I attended the DevTrack for two sessions. The biggest highlight was Yahoo Pipes; the Semantic Web Browser presentation by Tim Berners-Lee was so-so.

I spent the breaks manning my poster and talking to interested passersby. Quite a few showed interest, and I even had a few Microsoft and Yahoo guys stop by. The Google guys were unfortunately nowhere to be found. Note to self: next time I have a poster, bring business cards and have some copies of my papers available like Marko did. Also, I need to get a better spot; I’m not sure who Johan and Marko paid off. ;-)

Saturday. I was a little burned out by Saturday, but I still managed to attend two sessions and the plenary speaker, Dick Hardt (Sxip Identity). I don’t think I’ve ever seen a talk quite like Dick’s: he probably averaged 5 slides per sentence, and it was orchestrated perfectly.

My favorite talk of the day was by Luca de Alfaro: A Content-Driven Reputation System for the Wikipedia. Basically they propose an elaborate system for measuring the input from each Wikipedia author, which would allow users to see which authors are the most reputable. Luca told me after the talk that Wikipedia was showing interest in his system.

At the closing ceremony, Johan and Marko were awarded best poster for Friday, even though Johan can't seem to spell "Scholarly". I accepted the award on their behalf and had a really nice dinner with the award money. :-)

Miscellaneous:

Most Interesting Fact: The average WWW'07 paper was submitted 20 times.

Most overused acronym: JSON

Paper I’ll Probably Read Next: Detecting Near-Duplicates for Web Crawling
Poster I’m Most Likely to Cite Soon: A Large-Scale Study of Robots.txt

Paper I’d Most Like to Re-Title: Effort Estimation: How Valuable is it for a Web company to Use a Cross-company Data Set Compared to Using Its Own Single-company Data Set?
(How about “The Web Company’s Use of Data Sets in Effort Estimation” instead?)

4 comments:

Anonymous, 7/09/2007 12:29 PM
What do you mean by "The average WWW'07 paper was submitted 20 times"?
Frank McCown, 7/09/2007 12:32 PM
I probably should have stated it this way: "A WWW'07 paper was submitted 20 times on average." This info came straight from the conference organizers.
Anonymous, 7/09/2007 1:27 PM
Sorry, I still don't get it...
To rephrase: was the average paper reviewed by PC members 20 times?

p.s. I have been reading this phrase as "an average paper was sent to other conferences before WWW'07 as many as 20 times". Looks suspicious to me :)
Frank McCown, 7/09/2007 1:35 PM
The WWW conference has a paper submission system, and normally a person submits their paper once it is ready. But if the submitter later catches a mistake in their paper, they may correct their mistake and submit the paper again.

In this case, some people submitted their paper over and over again which seemed very odd... like they kept finding lots of errors.

Only the last submission gets reviewed, but it makes you wonder why the individuals making multiple submissions didn't do a more thorough job of finding their mistakes the first time.

Hope that clears it up. :-)

Monday, May 14, 2007
