Tuesday, August 31, 2010

Some computing history

The fall semester is in full swing here at Harding, and I've decided to convert some of my notes on historical events in computing into slides. If you are interested, here are my slides on Internet and Web history and on the history of graphical user interfaces (GUIs). I'll admit the GUI slides are slanted toward Microsoft because we focus on Windows programming in my GUI course.

I'm still working on my general history of computing and will post an update later.

Tuesday, August 17, 2010

Why I left Wikipedia

An article in this week's Newsweek reports that Wikipedia has been floundering since the spring: "Thousands of volunteer editors, the loyal Wikipedians who actually write, fact-check, and update all those articles, logged off-- many for good." The WSJ first reported the exodus almost a year ago, when it was discovered that 49,000 English-language editors left Wikipedia during the first three months of 2009, compared to a loss of 4,900 during the same period in 2008.

Update: As one of the comments below states, the WSJ was hasty in its conclusions. It all hinges on what you call an "editor", and a more balanced definition suggests that editors are not leaving Wikipedia in droves.

As the Newsweek article points out, there are a number of reasons why Wikipedia may be stagnating. There are so many articles already present that there is little new ground to break. Some may be scared away or frustrated by overly aggressive editors. Or perhaps "most people simply don't want to work for free."

Research at Georgia Tech shows that editing a Wikipedia article is very challenging for computing newbies. The "Editing this way will cause your IP address to be recorded publicly" message, for example, causes a lot of confusion, and this no doubt keeps many people from joining the ranks of Wikipedia editors.

I have always been a Wikipedia fan. I first started making serious contributions in 2004 when I was beginning my PhD research and discovered that many of the new concepts I was being introduced to simply didn't exist in Wikipedia.

I wrote a number of articles from scratch, such as web archiving, web search query, adversarial information retrieval, and URL normalization, and made a significant number of edits on other technical topics. I was motivated in part by the chance to be the first to write these articles and by the fact that I would likely refer back to them as reference material as I continued my research.

However, I found that keeping vandalism at bay and fighting poor edits was quite time-consuming. Some articles I valued highly, like web crawler, needed tons of work, and although the desire was there, I just didn't have the time... I was trying to complete my PhD, and maintaining Wikipedia articles was not paying the bills.

I had an aha moment at a conference a few years ago when someone quoted from Wikipedia's article on digital preservation, and I could have sworn I had been the sole author of the quoted piece. Wikipedia was given credit as the source, not me. That didn't bother me all that much, but it did make me realize that contributing to Wikipedia is often not in the interests of academics, who are typically judged by the amount of citable material they produce. Someone citing what you wrote in Wikipedia doesn't "count" like someone citing what you wrote in a journal article.

Over the past year or so, I have just lacked the motivation necessary to put time into an anonymous forum. My time is expensive, and Wikipedia is not paying. It's hard enough just to find time to edit my blog!

I still think Wikipedia is extremely valuable, and I hope it never goes away. I regularly send my students there and encourage them to make a serious contribution.

Have you seen The Book of Eli? At the end of the movie, a group of people are attempting to restore some of the greatest literary works of mankind. They are quite happy to have a nearly complete set of Britannica encyclopedias. No mention is made of the remnants of Wikipedia. :-(

Thursday, August 05, 2010

Students needed to work the WAC

I just received word that my grant proposal with the NSF has been funded. The project is called the "Web Archive Cooperative" or WAC. It's a 3-year grant with Hector Garcia-Molina (Stanford University), Andreas Paepcke (Stanford University), Michael L. Nelson (Old Dominion University), and myself.

In short, the WAC is our attempt to provide services, tools, and data access to web scientists. We are researching methods to provide access to web data like query logs, tag annotations, blogs, profiles, and Twitter messages that are often scattered across disparate archives. We are working on finding this data, building software tools for combining and analyzing it, and developing methods to preserve it for the long term.

What this means is that I will be looking for some highly talented/motivated CS students (currently enrolled at Harding) to work with me during the summers over the next 3 years. You will work closely with me and with our collaborators at Stanford and ODU, and you will receive a stipend. If you think this is something you'd like to get involved with, please let me know.