I've received numerous inquiries about Warrick over the past few months, so I wanted to let everyone know where it currently stands. For those of you who don't know about Warrick, it is a program I wrote that can automatically reconstruct a website that is no longer available on the Web by locating the missing web pages in various web repositories, such as the Internet Archive and Google's cache.
A lot has changed since I created Warrick about six years ago:
- The Internet Archive radically changed its web interface in the spring.
- Google deprecated its web search API and beefed up its ability to detect automated queries.
- Microsoft's Bing now powers Yahoo's search engine, which has made Yahoo's cache worthless as a separate source.
These changes have forced me to overhaul Warrick in the past, but it is still unable to access the Internet Archive. That's why a note on the Warrick website has been warning about Warrick's current state for several months.
Fortunately, a new development called Memento will help shield Warrick from some of these difficulties in working with various web repositories. Memento is an extension to HTTP that makes it easier to access old versions of web pages. If you keep up with this blog, you might remember that a year ago I implemented an Android browser that uses Memento to surf the Web. Warrick can use Memento to find archived web pages much more easily than with the current method, which requires custom code for each web repository.
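To give a rough idea of why Memento simplifies things, here is a minimal Python sketch of its datetime negotiation, written independently of Warrick's actual code (Warrick itself is written in Perl). A client asks a TimeGate for a page as it existed at a given time by sending an `Accept-Datetime` header, and the archive answers with a `Link` header whose `rel` values (such as `original`, `memento`, and `timemap`) point to the related resources. The TimeGate base URL below and the helper names `timegate_request` and `parse_link_header` are assumptions for illustration only; the sketch builds the request and parses a response header but does not perform any network I/O.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed TimeGate base URL (hypothetical for this sketch); check the
# archive's own documentation before relying on this URL pattern.
TIMEGATE = "http://web.archive.org/web/"

def timegate_request(original_url, when, timegate=TIMEGATE):
    """Return the (URL, headers) pair for a Memento datetime-negotiation request."""
    # Accept-Datetime carries an HTTP-date, which must be expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return timegate + original_url, {"Accept-Datetime": stamp}

# One link-value is <URI> followed by ;-separated parameters. Quoted values
# may contain commas (e.g. datetime="Thu, 31 May 2007 ..."), so match them whole.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')

def parse_link_header(value):
    """Map each rel token (e.g. 'original', 'memento') to the first URI carrying it."""
    links = {}
    for match in _LINK.finditer(value):
        uri, params = match.group(1), match.group(2)
        for name, quoted, bare in _PARAM.findall(params):
            if name.lower() == "rel":
                # A rel value may hold several tokens, e.g. "first memento".
                for rel in (quoted or bare).split():
                    links.setdefault(rel, uri)
    return links
```

For example, `timegate_request("http://example.com/", datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc))` yields the header `Accept-Datetime: Thu, 31 May 2007 20:35:00 GMT`. The point of the design is that the same two steps work against any Memento-compliant archive, instead of requiring custom scraping code for each repository's web interface.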
A PhD student at Old Dominion University, Justin Brunelle, is currently modifying Warrick to make it Memento-compliant. Hopefully Warrick will be up and running again soon. Once it's working, the old Warrick website will be replaced with a more up-to-date version, and it will be open to the public once again.
I appreciate everyone's patience while Warrick is being transformed.
UPDATE
Dec 12, 2011: Justin is still making progress on Warrick. I hope it will be available in a few weeks. I will keep updating this blog post when I know more.
Dec 20, 2011: Justin has given me a beta version of Warrick, which I am testing. I hope to release this version as soon as some documentation is ready. Unfortunately, this beta version will require some technical knowledge of how to install Perl libraries and run the tool from the command line. We plan to make Warrick run automatically from our website in the future.
Jan 24, 2012: Warrick 2.0 Beta is now available from Google Code! You can read more about the new version here. Right now Warrick only runs from the command line on *nix systems (Linux and Unix-like systems), but a Windows version is in the works. Work is also being done on a new web interface for less tech-savvy users, but I don't have an ETA for it yet.
Mar 6, 2012: Warrick's web interface is now available! That means you can just submit a job and get an email telling you where to pick up your recovered website when the job completes. Those of you who are tech-savvy can still download Warrick and run it locally on your own machine.