
Become.com actually developed two crawlers in 2004: one written entirely in Java and the other mostly in Java with some C++. The article states that the crawlers "may be the most sophisticated, massively scaled Java technology application in existence."
The article doesn’t mention Heritrix, a crawler that is also written entirely in Java. Although Heritrix doesn’t currently have a distributed architecture, it could still be deployed in such an environment. It would be really interesting to see the two crawlers compete at the National Java Crawling Championships.
