Here are a few highlights so far:
- In the opening talk Monday morning, Daniel Clancy, Engineering Director of Google Book Search, talked about Google’s efforts to digitize and index books from the G5, the five libraries that are cooperating with the digitization process. It was a very informative talk, and I certainly applaud Google for taking on such a massive and important project.
- Andrew McCallum presented a paper about leveraging topic analysis and introduced rexa.info, a website like Google Scholar that displays published papers. The cool thing is that it also shows co-authorship, authors that you cite, and authors that cite you. They had only 2 of my papers indexed, but I guess that isn’t bad for a research project.
- Carl Lagoze presented a paper that honestly addressed some of the shortcomings of the “low barrier” implementation of the NSDL. Turns out the implementation is rather people-intensive: problems include content providers unwilling to provide quality metadata and improperly implementing OAI-PMH. There was one notable absence from the references. At least one of the audience members publicly admitted being depressed at the current situation. I also wonder about the future of a digital library that can’t scale without an enormous amount of people-intensive work. How do you build a DL that in many ways is competing with Google?
- Johan gave a very in-your-face poster presentation: “Have any of you wondered about your funky JCDL reviews from last year?” Johan’s poster showed how the reviewers from last year’s JCDL were not reviewing papers based solely on their expertise. So why were non-experts judging papers that weren’t in their domain?
- Bill Arms introduced me to Andreas Paepcke, a researcher at Stanford who works with WebBase/WebVac. Looks like they are making all their crawls available to other researchers who want them, but they won’t work for my website reconstruction research since it depends on real-time search engine content.
- I talked some with Alesia Zuccala who presented her work with LexiURL, a piece of software written by Mike Thelwall. LexiURL uses the Yahoo API to report backlinks for a set of URLs. I really enjoy reading Thelwall's papers and hope to meet him at some point.
- This morning Jonathan Zittrain gave a very entertaining and informative presentation about redaction, restriction, and removal of open information. It was one of the best presentations that I’ve seen, and his PowerPoint slides were a fantastic example of how to put one together. Even Tufte would have approved. One of the most memorable slides showed the accidental grouping of two books on Amazon.com: a children’s book with “American Jihad”.
Technorati Tag: jcdl2006