Thursday, July 16, 2009

Report on InDP in D-Lib Magazine

My report on the Innovation in Digital Preservation workshop (InDP 2009) has just been published in D-Lib Magazine. Overall I think the workshop was a success, although we really missed having Andreas Rauber there. I'm not sure if I'll be the one to lead the 2nd InDP, but I hope there will be one in the future.

Thanks to Spencer Lee (Virginia Tech) who filmed the workshop and created a virtual presence for InDP in Second Life, where the memories of InDP will last forever (or five years, whichever comes first). Below are some screenshots from Second Life that Spencer sent me.

Wednesday, July 15, 2009

What are you doing this summer?

I've been asked a number of times what I'm doing this summer since I'm faculty and have no classes to teach. Last summer I was doing research in Los Alamos, but this summer has been very different. A lot of my time is spent at home, getting adjusted to life with a newborn and toddler and helping Becky get some extra sleep in the mornings.

Professionally, I've presented a few papers at a conference, co-chaired a workshop, and am working on a paper about my search engine courses.

But most of my working days are spent producing a series of instructional videos for Introduction to Programming with C++ (2nd ed) by Y. Daniel Liang. You can sample a video I made just this week on file I/O. I'm not sure if the videos will be available to book owners only or made freely available on the book's website. I'll hopefully wrap these up by the end of July and then start on videos for Liang's Introduction to Java (8th ed).

I'll soon be preparing for my Games Programming course. This course has only been offered once at Harding before, and it was taught by Dana Steil, who is currently away working on his PhD. I'm excited about teaching this course, but it's also a lot of work to teach a class for the first time, and it's a little disconcerting that I probably won't get to teach it again, since Dana will likely want the course back when he returns.

So that's my summer. What are you doing?


I'm no longer doing Liang's Java book. I didn't finish the C++ videos until Aug... where does the time go?

Friday, July 10, 2009

Give me your Facebook data!

TechCrunch is reporting that a social network aggregation service is suing Facebook over its lack of data portability. The service allows you to aggregate your various social networks into a single location, but Facebook's data, as indicated in its Terms of Service, is still off-limits to it. Disregarding the restrictions, the service tried using the Facebook API and screen-scraping to get the data until it was sued by Facebook earlier in the year.

This is exactly what I've been working on (with a graduate student at ODU) for the last few months. But I'm doing this to preserve the data, not necessarily to aggregate it along with other social networks. However, there's no reason why a preserved Facebook account could not be uploaded into another service.

My guess is my approach won't be looked at kindly by Facebook, but they'll probably leave me alone since I'm only providing a service for individuals to archive their account, and I'm not aggregating the data to my own server.

Tuesday, July 07, 2009

Email Preservation Parser

Here's an excerpt from an email announcement I received from Riccardo Ferrante (Smithsonian Institution Archives) about a tool for preserving email. It was one of the tools developed by the Collaborative Electronic Records Project (CERP).

The Email Parser migrates an email account and its messages into a single XML file using the Email Account XML Schema developed in collaboration with the North Carolina State Archives and the EMCAP project.

The CERP Email Parser migrates an email account in MBOX format into XML, using the schema to preserve the full body of messages, together with their attachments, and keeps intact the account’s internal organization (e.g., an Inbox containing subfolders labeled Policies, Special Events, and Projects). The CERP team successfully preserved email accounts from a variety of applications including Microsoft Outlook, AppleMail, LotusNotes, and Netscape. All email messages retain their full header content, in contrast to some tools produced in earlier research efforts.
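
To give a rough idea of what an MBOX-to-XML migration like this involves, here is a minimal Python sketch. It is not the CERP parser and it does not follow the Email Account XML Schema: the element names (Account, Folder, Message, Headers, Header, Body, Attachment) and the function names are made up for illustration, and I'm assuming each folder of the account is stored as its own MBOX file. It only uses Python's standard mailbox, base64, and xml.etree modules.

```python
import base64
import mailbox
import xml.etree.ElementTree as ET


def message_to_element(msg):
    """Convert one email message into a hypothetical <Message> element."""
    elem = ET.Element("Message")

    # Keep every header in its original order, including repeated ones
    # such as Received, so no header content is lost.
    headers = ET.SubElement(elem, "Headers")
    for name, value in msg.items():
        header = ET.SubElement(headers, "Header", {"name": name})
        header.text = str(value)

    # walk() yields the message itself and, for multipart mail, each part.
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no content of their own
        payload = part.get_payload(decode=True) or b""
        content_type = part.get_content_type()
        filename = part.get_filename()
        if filename:
            # Attachments are stored as base64 text so binary data survives.
            att = ET.SubElement(
                elem,
                "Attachment",
                {"name": filename, "contentType": content_type, "encoding": "base64"},
            )
            att.text = base64.b64encode(payload).decode("ascii")
        else:
            charset = part.get_content_charset() or "utf-8"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset label in the message
                text = payload.decode("utf-8", errors="replace")
            body = ET.SubElement(elem, "Body", {"contentType": content_type})
            body.text = text
    return elem


def account_to_xml(account_name, folders):
    """Build one XML tree for an account.

    `folders` maps a folder name (e.g. "Inbox/Policies") to the path of
    the MBOX file holding that folder's messages (an assumed layout).
    """
    root = ET.Element("Account", {"name": account_name})
    for folder_name, mbox_path in folders.items():
        folder = ET.SubElement(root, "Folder", {"name": folder_name})
        box = mailbox.mbox(mbox_path)
        try:
            for msg in box:
                folder.append(message_to_element(msg))
        finally:
            box.close()
    return root
```

Calling `account_to_xml("demo", {"Inbox": "Inbox.mbox"})` and passing the result to `ET.tostring(root, encoding="unicode")` would give a single XML document for the account. A real preservation tool would also need to deal with things this sketch ignores, such as control characters that are not legal in XML, nested folder hierarchies, and validation against the actual schema.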