Monday, June 22, 2009

Elrod on Twitter and Iran

I just found out that Harding professor Mark Elrod was interviewed a few days ago by Jessica Dean on KATV-7 about Iranians using Twitter. A few months ago David Adams was interviewed by THV-11 about the history of the flu. It looks like the local press is starting to tap the vast expertise of our history department. ;-)

Thursday, June 18, 2009

I'm at JCDL 2009 in Austin

JCDL 2009 is about to wrap up. It's been a good conference with some interesting presentations, and I've enjoyed catching up with old friends. The conference is being held on the UT campus... short on grass but big on buildings. I think the UT football stadium is more impressive than many NFL stadiums I've visited. I guess that's what happens when you win a few national championships.

I especially enjoyed the two panels. The first panel, What should we preserve from a born-digital world?, basically came to the conclusion that everything should be saved. I concur... disk space is cheap, and it's hard to know what will truly be valuable years from now. I also enjoyed hearing about Megan Winget's work in preserving games.

The second panel, Google as Library Redux, discussed the unfortunate conclusion of the lawsuit publishers and authors brought against Google: Google agreed to a settlement instead of pressing the court to decide the bigger questions regarding copyright, orphaned works, etc. One of the more provocative statements came from Michael Lesk, who said JCDL was irrelevant because there were no attendees from Google, Amazon, Microsoft, etc. We are being ignored. Ouch. But he may be right. I see plenty of guys from Google et al. at the WWW and SIGIR conferences.

I gave a couple of talks this year. There was particular interest in my Facebook paper, What Happens When Facebook is Gone?, where I discuss the ramifications of having all our data locked up in the walled garden of Facebook. Carlton Northern, a graduate student at ODU, is currently working on a Facebook archiving add-on for Firefox, and hopefully it will be available soon.

My second paper, A Framework for Describing Web Repositories, is work pulled from my dissertation. In it I discuss how we can view web repositories (everything from a search engine cache to a web archive) in a more abstract manner. I propose some new terminology and an API that web repositories could (and should) implement to help clients access the repository's contents.

Tomorrow I'll be co-hosting the InDP 2009 workshop. It's an all-day event, and I'll be flying home late tomorrow night. It'll be good to be back with the family.

Tuesday, June 09, 2009

I think I'm going to be sick...

No need to blatantly lie to your professor anymore... a new "service" helps students deceive their professors by giving them a corrupted file to turn in, possibly buying them a few more hours or days to work on their assignment. When the professor tries to open the assignment and notices the submitted file is corrupted, he'll just ask the student to resubmit her file. The student is happy to oblige, and this time she submits the completed assignment to the unsuspecting professor.

I'm not sure if I'm more sickened by the thought of someone developing such a service or the thought that they are likely to be quite successful.

Update on 6/22/09

I thought about this problem a little more, and there's really a simple solution for the technically-inclined.
  1. Have the student produce an MD5 hash of the file before it is emailed or submitted to the professor, and have the student email the hash to the professor.

  2. If the received file is corrupted, the professor should produce an MD5 hash of it. If that hash matches the one the student sent, the professor received exactly the file the student hashed, so the student's original file was corrupted. Let the student bring in his laptop and show you that the file opens on his machine even though it won't open on yours. He probably won't be able to, so give him a zero.

  3. If the received file's hash does not match the hash the student emailed, either the file got garbled in transmission or the student emailed the wrong hash. The student should simply resubmit the file... eventually the received file's hash should match the original hash. If the student cannot produce a file that matches the original hash, he is either incompetent (he did not create the original hash properly), or he modified the original file (which he shouldn't do if it's finished), or he's trying to cheat. Either way, give him a zero. (Wow, I'm mean!)
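For the technically-inclined, here is a minimal sketch of the hashing and comparison in Python, assuming both student and professor can run Python's standard hashlib module. The function names md5_of_file and check_submission are hypothetical, made up for this illustration and not part of any existing tool. (On most Linux systems, running md5sum on the file produces the same hex digest from the command line.)

```python
import hashlib


def md5_of_file(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, reading it in chunks
    so large submissions don't have to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_submission(received_path, emailed_hash):
    """Compare the received file's hash with the hash the student emailed.

    "match":    the professor has exactly the file the student hashed,
                so any corruption was already in the student's original (step 2).
    "mismatch": the file changed in transit or the emailed hash is wrong,
                so the student should resubmit (step 3).
    """
    received_hash = md5_of_file(received_path)
    # Normalize the emailed hash in case it was pasted with spaces or in uppercase
    if received_hash == emailed_hash.strip().lower():
        return "match"
    return "mismatch"
```

The student runs md5_of_file on the finished assignment and emails the result; the professor runs check_submission on the file that actually arrived. MD5 is good enough for catching accidental corruption, though hashlib.sha256 would be the safer choice today since MD5 collisions can be deliberately constructed.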

Tweet this: Manor one of 20 developers to follow

Elijah Manor, one of our Harding CS graduates, was just listed in 20 Developers to Follow on Twitter. Very cool.

Thursday, June 04, 2009

Google Squared & Wolfram Alpha

Structuring the world's unstructured data... this is the future of search. These last few weeks have seen some impressive attempts to do just this by Wolfram Alpha and Google Squared.

Wolfram Alpha, which launched on May 18, pulls results from its own highly curated database, which is likely built atop massive (possibly unstructured) data sets. Google Squared, announced on May 12, pulls results straight from the unstructured Web. These two approaches are complementary, but they are also competitive.

I'll provide just a couple of examples.

Wolfram Alpha answers the query passing touchdowns Dallas Cowboys, Denver Broncos with a graph of data they probably acquired from a trusted source (they give some source information, but nothing specific).

The same query against Google Squared won't produce a very useful result. But a query for NFL teams returns a table of results pulled from a variety of websites. The data making up the first row comes from several sources, including a travel website and Wikipedia. Why Google isn't just taking the information from a single trusted site is anyone's guess... it likely has to do with making their search algorithms more generic.

Give these search engines a try and let me know what you think.