Tuesday, July 07, 2009

Email Preservation Parser

Here's an excerpt from an email announcement I received from Riccardo Ferrante (Smithsonian Institution Archives) about a tool for preserving email, one of the tools developed by the Collaborative Electronic Records Project (CERP):
The Email Parser migrates an email account and its messages into a single XML file using the Email Account XML Schema developed in collaboration with the North Carolina State Archives and the EMCAP project.

The CERP Email Parser migrates an email account in MBOX format into XML, using the schema to preserve the full body of messages, together with their attachments, and keeps intact the account’s internal organization (e.g., an Inbox containing subfolders labeled Policies, Special Events, and Projects). The CERP team successfully preserved email accounts from a variety of applications including Microsoft Outlook, AppleMail, LotusNotes, and Netscape. All email messages retain their full header content, in contrast to some tools produced in earlier research efforts.
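
To make the general idea concrete, here is a rough sketch of an MBOX-to-XML migration using only Python's standard library. This is not the CERP Email Parser and it does not follow the Email Account XML Schema: the function name `mbox_to_xml` and the element names (`Account`, `Folder`, `Message`, `Header`, `Body`, `Attachment`) are hypothetical, chosen only for illustration. It assumes a single MBOX file represents one folder, keeps every header, and stores attachments as base64 text.

```python
import base64
import mailbox
import xml.etree.ElementTree as ET


def mbox_to_xml(mbox_path, folder_name="Inbox"):
    """Convert one MBOX file into an illustrative XML tree.

    Element names here are made up for this sketch; they are NOT the
    Email Account XML Schema used by the CERP parser.
    """
    account = ET.Element("Account")
    folder = ET.SubElement(account, "Folder", name=folder_name)

    # create=False: raise an error instead of creating a missing file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            message_el = ET.SubElement(folder, "Message")

            # Keep every header, in order, including repeated ones.
            for name, value in msg.items():
                header_el = ET.SubElement(message_el, "Header", name=name)
                header_el.text = str(value)

            # Walk all MIME parts; skip multipart containers themselves.
            for part in msg.walk():
                if part.is_multipart():
                    continue
                payload = part.get_payload(decode=True) or b""
                content_type = part.get_content_type()
                filename = part.get_filename()

                if filename:
                    # Attachments are stored as base64 so binary data survives.
                    att_el = ET.SubElement(
                        message_el,
                        "Attachment",
                        filename=filename,
                        contentType=content_type,
                        encoding="base64",
                    )
                    att_el.text = base64.b64encode(payload).decode("ascii")
                else:
                    charset = part.get_content_charset() or "utf-8"
                    try:
                        text = payload.decode(charset, errors="replace")
                    except LookupError:
                        # Unknown charset label: fall back to UTF-8.
                        text = payload.decode("utf-8", errors="replace")
                    body_el = ET.SubElement(
                        message_el, "Body", contentType=content_type
                    )
                    body_el.text = text
    finally:
        box.close()

    return account
```

Calling `ET.ElementTree(mbox_to_xml("Inbox.mbox")).write("account.xml", encoding="utf-8", xml_declaration=True)` would write the result to a single file (the file names are placeholders). A real preservation tool has to handle more than this sketch does, for example nested subfolders, control characters that are not legal in XML, and validation against the schema.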