Saturday, October 24, 2009

Article in CACM

Check out my article, "Why web sites are lost (and how they're sometimes found)," in the November issue of Communications of the ACM. My co-authors were Cathy Marshall (Microsoft Research) and Michael Nelson (Old Dominion University).

If you don't have an ACM Digital Library subscription, you can access the pre-print here.

We surveyed individuals who had lost their websites (through hard drive crashes, ISP bankruptcies, etc.) or had tried to recover websites that once belonged to others. We investigated why these websites were lost and how individuals reconstructed them, including how they recovered data from search engine caches and web archives. The findings suggest that digital data loss is likely to continue, since backups are frequently neglected or performed incorrectly; furthermore, respondents perceive loss as uncommon and data safety as the responsibility of others. Finally, we suggest that this benign neglect be countered by lazy preservation techniques.