Tuesday, February 06, 2007

Web Curator Tool, standardizing PDF, and orphaned works

Some notable events in the world of digital preservation:
  • The National Library of New Zealand and the British Library have collaborated to produce the Web Curator Tool (WCT), which makes website archiving simple enough for non-technical users. It’s essentially a wrapper around the Heritrix web crawler with numerous management functions added. In a recent article, Philip Beresford of the British Library discusses the history of WCT and shows how it can be used to crawl and archive a website.

  • In an effort to convince the world that PDF is ideal for long-term storage, Adobe is submitting the format to ISO for standardization. Microsoft has also submitted its Ecma-approved Office Open XML format to ISO, a radical departure from the "secret-sauce" mentality Microsoft has held for years. Governments and other organizations are slowly becoming aware of the problems created by storing their data in closed formats that change over time, and Microsoft and Adobe don’t want to be dropped by their largest customers. Standardizing these formats should make interoperability much less of an issue in the future.

  • Brewster Kahle, co-founder of the Internet Archive, recently lost a U.S. appeals court decision in Kahle v. Gonzales. Kahle, along with several notable companies such as Google, MSN, and Yahoo, is trying to get orphaned works (copyrighted works whose owners cannot be reached) into the public domain in order to remove the legal barriers that prohibit scanning and distributing those works digitally. Kahle rightly blames Disney for the mess:
    What happened is that some overzealous copyright laws got passed with heavy lobbying from folks like Disney and these are screwing things up... Instead of keeping just Mickey Mouse or just the profitable works under copyright for longer, they fundamentally changed the structure of copyright.