Tuesday, January 10, 2006

Google Is Sorry

Google has recently been confusing some of its users with its “Google is sorry” web page. The page reads like this:

We're sorry... but we can't process your request right now. A computer virus or spyware application is sending us automated requests, and it appears that your computer or network has been infected. We'll restore your access as quickly as possible, so try again soon. In the meantime, you might want to run a virus checker or spyware remover to make sure that your computer is free of viruses and other spurious software. We apologize for the inconvenience, and hope we'll see you again on Google.

This page seems to have started appearing en masse around November-December 2005. There are many discussions about it in online forums. Here are two that garnered a lot of attention:

  1. Webmasterworld.com
  2. Google groups

I ran into the error when modifying Warrick to use the “site:” parameter in order to better reconstruct a website. Unfortunately I had to drop the feature, and although I’m still making automated queries, I’ve yet to see the page again.
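For anyone else making automated queries, here is a minimal sketch of how a script might notice the sorry page and slow down instead of hammering away. This is not what Warrick does; it is only an illustration. The marker phrase comes from the page text quoted above, while `fetch`, the retry count, and the delays are assumptions made up for the example.

```python
import time

# Phrase taken from the sorry page text quoted earlier in this post.
SORRY_MARKER = "we can't process your request right now"


def is_sorry_page(html):
    """Return True if the response body looks like the "Google is sorry" page."""
    # Normalize a curly apostrophe and case so small markup changes don't matter.
    normalized = html.replace("\u2019", "'").lower()
    return SORRY_MARKER in normalized


def fetch_with_backoff(fetch, query, max_tries=4, base_delay=60.0, sleep=time.sleep):
    """Call fetch(query) until a non-sorry page comes back or tries run out.

    `fetch` is a hypothetical caller-supplied function that returns the
    response body as text. The delay doubles after each sorry page.
    Returns the page text, or None if every attempt hit the sorry page.
    """
    delay = base_delay
    for attempt in range(max_tries):
        html = fetch(query)
        if not is_sorry_page(html):
            return html
        if attempt < max_tries - 1:
            sleep(delay)  # wait before trying again
            delay *= 2
    return None
```

Passing `sleep` in as a parameter makes it easy to test the retry logic without actually waiting. Backing off is only a courtesy measure; it does not guarantee the page will go away.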

Google appears to be mum about the whole thing. The most credible explanation I found was here:


Apparently it is a new "feature" of Google that is getting back at bandwidth-hogging SEOs that use automated queries with "site:" or "allinurl:" in them. Their AI is a little overzealous and is hurting regular human users as well as users like me who perform very limited daily queries for no financial gain.

Update on 3/8/2006:

Google has caught me again! Although my scripts ran for a while without seeing the sorry page, they started getting caught again in early February. I talked with someone at Google about it, who basically said sorry, but there was nothing they could do, and that I should use their API.

The Google API is rather constrained for my purposes. I've noticed many API users venting their frustrations at the inconsistent results returned by the API when compared to the public search interface.

I finally decided to use a hybrid approach: page scraping when performing "site:" queries and the API to access cached pages. I haven't had any trouble from Google since.