Thursday, July 16, 2009

Report on InDP in D-Lib Magazine

My report on the Innovation in Digital Preservation workshop (InDP 2009) has just been published in D-Lib Magazine. Overall I think the workshop was a success, although we really missed having Andreas Rauber there. I'm not sure if I'll be the one to lead the second InDP, but I hope there will be one in the future.

Thanks to Spencer Lee (Virginia Tech) who filmed the workshop and created a virtual presence for InDP in Second Life, where the memories of InDP will last forever (or five years, whichever comes first). Below are some screenshots from Second Life that Spencer sent me.




Wednesday, July 15, 2009

What are you doing this summer?

I've been asked a number of times what I'm doing this summer since I'm faculty and have no classes to teach. Last summer I was doing research in Los Alamos, but this summer has been very different. A lot of my time is spent at home, adjusting to life with a newborn and a toddler and helping Becky get some extra sleep in the mornings.

Professionally, I've presented a few papers at a conference, co-chaired a workshop, and am working on a paper about my search engine courses.

But most of my working days are spent producing a series of instructional videos for Introduction to Programming with C++ (2nd ed) by Y. Daniel Liang. You can sample a video on file I/O that I made just this week. I'm not sure if the videos will be available only to book owners or made freely available on the book's website. I'll hopefully wrap these up by the end of July and then start on videos for Liang's Introduction to Java Programming (8th ed).

I'll soon be preparing for my Games Programming course. This course has only been offered once before at Harding, and it was taught by Dana Steil, who is currently away working on his PhD. I'm excited about teaching this course, but it's a lot of work to teach a class for the first time, and it's a little disconcerting that I probably won't get to teach it again, since Dana will likely want the course back when he returns.

So that's my summer. What are you doing?

Friday, July 10, 2009

Power.com: Give me your Facebook data!

TechCrunch is reporting that Power.com is suing Facebook over its lack of data portability. Power.com is a service that lets you aggregate your various social networks in a single location, but Facebook's data, as indicated in its Terms of Service, is still off-limits to them. Disregarding these restrictions, Power.com used the Facebook API and screen scraping to retrieve its users' Facebook data until Facebook sued it earlier this year.

This is exactly what I've been working on (with a graduate student at ODU) for the last few months. But I'm doing this to preserve the data, not necessarily to aggregate it along with other social networks. However, there's no reason why a preserved Facebook account could not be uploaded into another service.

My guess is that Facebook won't look kindly on my approach, but they'll probably leave me alone since I'm only providing a service for individuals to archive their own accounts, and I'm not aggregating the data on my own server.

Tuesday, July 07, 2009

Email Preservation Parser

Here's an excerpt from an email announcement I received from Riccardo Ferrante (Smithsonian Institution Archives) about a tool for preserving email. It was one of the tools developed by the Collaborative Electronic Records Project (CERP).
The Email Parser migrates an email account and its messages into a single XML file using the Email Account XML Schema developed in collaboration with the North Carolina State Archives and the EMCAP project.

The CERP Email Parser migrates an email account in MBOX format into XML, using the schema to preserve the full body of messages, together with their attachments, and keeps intact the account’s internal organization (e.g., an Inbox containing subfolders labeled Policies, Special Events, and Projects). The CERP team successfully preserved email accounts from a variety of applications including Microsoft Outlook, AppleMail, LotusNotes, and Netscape. All email messages retain their full header content, in contrast to some tools produced in earlier research efforts.
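The excerpt doesn't describe how the CERP parser works internally or what the Email Account XML Schema looks like, so here is only a rough Python sketch of the general idea using the standard library's mailbox and ElementTree modules: each mbox file becomes a (possibly nested) folder, every header is kept in order, and attachments are stored base64-encoded so binary content survives. The element names (Account, Folder, Message, Header, Body, Attachment) and the function names are my own hypothetical choices, not the CERP schema.

```python
import base64
import mailbox
import xml.etree.ElementTree as ET


def message_to_xml(msg, parent):
    """Append one message, with all headers and all MIME parts, under parent.
    Element names here are hypothetical, not the CERP/EMCAP schema."""
    m_el = ET.SubElement(parent, "Message")
    headers_el = ET.SubElement(m_el, "Headers")
    # Keep every header (including repeated ones like Received) in original order.
    for name, value in msg.items():
        ET.SubElement(headers_el, "Header", name=name).text = str(value)
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers have no payload of their own
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename()
        if filename:
            # Attachments are base64-encoded so arbitrary bytes fit in XML.
            att = ET.SubElement(m_el, "Attachment", filename=filename,
                                contentType=part.get_content_type())
            att.text = base64.b64encode(payload).decode("ascii")
        else:
            body = ET.SubElement(m_el, "Body", contentType=part.get_content_type())
            charset = part.get_content_charset() or "utf-8"
            try:
                body.text = payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name
                body.text = payload.decode("utf-8", errors="replace")
    return m_el


def mbox_folders_to_xml(account_name, folders):
    """folders maps a folder path such as 'Inbox/Policies' to an mbox file."""
    account = ET.Element("Account", name=account_name)
    nodes = {}
    for folder_path, mbox_path in sorted(folders.items()):
        parent = account
        parts = folder_path.split("/")
        # Recreate the account's folder hierarchy one level at a time.
        for depth in range(1, len(parts) + 1):
            key = "/".join(parts[:depth])
            if key not in nodes:
                nodes[key] = ET.SubElement(parent, "Folder", name=parts[depth - 1])
            parent = nodes[key]
        box = mailbox.mbox(mbox_path)
        try:
            for msg in box:
                message_to_xml(msg, parent)
        finally:
            box.close()
    return ET.ElementTree(account)
```

The resulting tree can be saved with `tree.write("account.xml", encoding="utf-8", xml_declaration=True)`. A real preservation tool would also need to validate against a published schema and handle edge cases (for example, control characters that XML 1.0 doesn't allow), which this sketch ignores.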

Monday, June 22, 2009

Elrod on Twitter and Iran

I just found out that Harding Professor Mark Elrod was interviewed just a few days ago by Jessica Dean on KATV-7 about Iranians using Twitter (see the video below). Just a few months ago David Adams was interviewed by THV-11 about the history of the flu. Looks like the vast expertise of our history dept is starting to get tapped by the local press.

Thursday, June 18, 2009

I'm at JCDL 2009 in Austin

JCDL 2009 is about to wrap up. It's been a good conference with some interesting presentations, and I've enjoyed catching up with old friends. The conference is being held on the UT campus... short on grass but big on buildings. I think the UT football stadium is more impressive than many NFL stadiums I've visited. I guess that's what happens when you win a few national championships.

I especially enjoyed the two panels. The first panel, What should we preserve from a born-digital world?, basically came to the conclusion that everything should be saved. I concur... disk space is cheap, and it's hard to know what will truly be valuable years from now. I also enjoyed hearing about Megan Winget's work in preserving games.

The second panel, Google as Library Redux, discussed the unfortunate conclusion of the lawsuit publishers and authors brought against Google: the parties agreed to a settlement instead of pressing the court to settle the bigger questions regarding copyright, orphaned works, etc. One of the more provocative statements came from Michael Lesk, who said JCDL was irrelevant because there were no attendees from Google, Amazon, Microsoft, etc. We are being ignored. Ouch. But he may be right. I see plenty of guys from Google et al. at the WWW and SIGIR conferences.

I gave a couple of talks this year (see my slides below). There was a lot of interest, particularly in my Facebook paper, What Happens When Facebook is Gone?, where I discuss the ramifications of having all our data locked up in the walled garden of Facebook. Carlton Northern, a graduate student at ODU, is currently working on a Facebook archiving add-on for Firefox, and hopefully it will be available soon.




My second paper, A Framework for Describing Web Repositories, is work pulled from my dissertation. In it I discuss how we can view web repositories (everything from a search engine cache to a web archive) in a more abstract manner. I propose some new terminology and an API that web repositories could (and should) implement to help clients access the repository's contents.
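The post doesn't spell out the paper's terminology or API, so the following is only a hypothetical Python sketch of the general idea: one interface that works for both a search engine cache (which keeps only its latest copy of a URI) and a web archive (which keeps many timestamped copies). The class, field, and method names (`Capture`, `WebRepository`, `keeps_history`, `store`, `list_captures`, `get_capture`) are my own illustrations, not the API proposed in the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Capture:
    """One stored copy of a web resource (hypothetical structure)."""
    uri: str
    timestamp: datetime
    content: bytes


class WebRepository:
    """Hypothetical in-memory repository. keeps_history=False behaves like a
    search engine cache (one copy per URI); True behaves like a web archive."""

    def __init__(self, keeps_history: bool = True):
        self.keeps_history = keeps_history
        self._captures: Dict[str, List[Capture]] = {}

    def store(self, capture: Capture) -> None:
        caps = self._captures.setdefault(capture.uri, [])
        if not self.keeps_history:
            caps.clear()  # a cache replaces its previous copy
        caps.append(capture)
        caps.sort(key=lambda c: c.timestamp)  # keep captures in time order

    def list_captures(self, uri: str) -> List[datetime]:
        """Timestamps of every stored copy of uri, oldest first."""
        return [c.timestamp for c in self._captures.get(uri, [])]

    def get_capture(self, uri: str, at: Optional[datetime] = None) -> Optional[Capture]:
        """Most recent copy, or the latest copy made at or before `at`."""
        caps = self._captures.get(uri, [])
        if not caps:
            return None
        if at is None:
            return caps[-1]
        i = bisect_right([c.timestamp for c in caps], at)
        return caps[i - 1] if i > 0 else None
```

The point of a shared interface like this is that a client can ask any repository the same questions ("what copies do you have of this URI?", "give me the copy closest to this date") without caring whether it is talking to a cache or an archive.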




Tomorrow I'll be co-hosting the InDP 2009 workshop. It's an all-day event, and I'll be flying home late tomorrow night. It'll be good to be back with the family.

Tuesday, June 09, 2009

I think I'm going to be sick...

No need to blatantly lie to your professor anymore... a new "service" helps students deceive their professors by giving them a corrupted file to turn in, possibly buying them a few more hours or days to work on their assignment. When the professor tries to open the assignment and notices the submitted file is corrupted, he'll just ask the student to resubmit her file. The student is happy to oblige, and this time she submits the completed assignment to the unsuspecting professor.

I'm not sure if I'm more sickened by the thought of someone developing such a service or the thought that they are likely to be quite successful.



Update on 6/22/09

I thought about this problem a little more, and there's really a simple solution for the technically inclined.
  1. Have the student produce an MD5 hash of the file before it is emailed or submitted to the professor, and have the student email the hash to the professor.

  2. If the received file is corrupted, the professor should produce an MD5 hash of the file. If it matches the hash from the student, the professor received exactly the file the student hashed, so the student's original file was corrupted. Have the student bring in his laptop and show you that the file opens successfully on his machine even though it won't open on yours. He probably won't be able to, so give him a zero.

  3. If the received file's hash does not match the hash the student emailed, either the file got garbled in transmission or the student did not email the correct hash. The student should just resubmit the file... eventually the received file's hash should match the original hash. If the student is not able to produce a file that matches the original hash, he's either incompetent because he did not properly create the original hash, or he modified the original file (which he shouldn't do if it's finished), or he's trying to cheat. Either way, give him a zero. (Wow, I'm mean!)
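For the professor's side of steps 2 and 3, here's a minimal Python sketch using the standard hashlib module. The function names `md5_of_file` and `check_submission` are my own, hypothetical helpers; the hex digest they compute is the same one command-line tools such as md5sum print, so the student could produce the hash either way.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_submission(received_path: str, emailed_hash: str) -> str:
    """Compare the received file's hash with the hash the student emailed."""
    if md5_of_file(received_path) == emailed_hash.strip().lower():
        # Step 2: the professor has exactly the bytes the student hashed.
        return "match: the student's original file was itself corrupted"
    # Step 3: garbled in transit, wrong hash sent, or the file changed after hashing.
    return "mismatch: ask the student to resubmit"
```

Comparing lowercase hex strings keeps the check tolerant of how the student pasted the hash into an email. Swapping `hashlib.md5()` for `hashlib.sha256()` would work the same way if both sides agree on the algorithm.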

Tweet this: Manor one of 20 developers to follow

Elijah Manor, one of our Harding CS graduates, was just listed in 20 Developers to Follow on Twitter. Very cool.