Tuesday, June 20, 2006

Integer problems for the Google API

I’m not sure when it first started, but over the last few months the Google API has been bombing out whenever a query returns 2^31 (2,147,483,648) or more results. It has bombed out almost every day in June when my script searches for “database” and “list”, each of which returns several billion results. Apparently Google’s SOAP interface is using a 32-bit signed integer, which tops out at 2^31 - 1 (2,147,483,647), for the total result count, when it needs to be using a 64-bit long integer.
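To make the arithmetic concrete, here is a minimal Python sketch of why a count of several billion cannot fit in a 32-bit signed integer. The helper names (`fits_in_int32`, `wrap_to_int32`) and the 3-billion figure are made up for illustration, and the wrap-around is only one way such an overflow can show up; I don't know how Google's server actually fails internally.

```python
# Range of a 32-bit signed (two's-complement) integer.
INT32_MIN = -2**31       # -2,147,483,648
INT32_MAX = 2**31 - 1    #  2,147,483,647


def fits_in_int32(n: int) -> bool:
    """Return True if n can be stored in a 32-bit signed integer."""
    return INT32_MIN <= n <= INT32_MAX


def wrap_to_int32(n: int) -> int:
    """Simulate what happens if n is truncated to 32 bits and
    reinterpreted as a signed value (two's-complement wrap-around)."""
    return (n + 2**31) % 2**32 - 2**31


# Hypothetical result count for a very common search term.
estimated_results = 3_000_000_000

print(fits_in_int32(INT32_MAX))          # largest value that still fits
print(fits_in_int32(estimated_results))  # several billion does not fit
print(wrap_to_int32(estimated_results))  # what a silent overflow would yield
```

A 64-bit signed integer tops out at 2^63 - 1 (about 9.2 quintillion), which leaves plenty of headroom for result counts in the tens of billions.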

Michael Freidgeim noted the problem on his blog a few weeks ago, and others have seen it going back to April 2006. Who knows when Google will fix it? If it's not one thing, it's something else... ;)

While searching to find out when Google started reporting such large result totals, I came across a posting by Danny Sullivan describing a “trick” for revealing how many pages Google has indexed. Danny suggested issuing a query that says, “give me all the pages that don’t have the word asdkjlkjasd.” I just tried -asdkjlkjasd on Google, and it gives me back 20.7 billion results. MSN gives around 5.2 billion results, but Yahoo and Ask won’t accept the query. Interesting…