Thursday, July 10, 2008

Yahoo's new Search BOSS API

Yahoo has just released a new web search API called BOSS (Build your Own Search Service) which improves on their earlier API in several ways:
  1. No daily query limits.

  2. No restrictions on how the results are displayed, ordered, or mixed in with other proprietary results.

  3. Ability to make money showing paid results.

The BOSS API is REST-based. You can receive results in either JSON or XML format, and you can get 10-50 results back per query.
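Since the API is just a URL with a few parameters, here is a minimal sketch of how a request URL might be put together. The endpoint and the appid, count, and format parameters come from the full Java example later in this post. The helper name buildUrl, the "xml" format value, and treating 50 as a hard upper limit on count are my assumptions for illustration, so check them against Yahoo's documentation before relying on them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class BossUrlBuilder {

    // Endpoint taken from the full Java example later in this post
    private static final String BASE = "http://boss.yahooapis.com/ysearch/web/v1/";

    // Assumption: 50 is the largest count the service accepts per request
    private static final int MAX_COUNT = 50;

    // Hypothetical helper: builds a request URL without contacting the service.
    // The "xml" format value is an assumption; only format=json appears in this post.
    public static String buildUrl(String query, String appId, int count, String format) {
        if (count < 1 || count > MAX_COUNT) {
            throw new IllegalArgumentException("count must be between 1 and " + MAX_COUNT);
        }
        if (!format.equals("json") && !format.equals("xml")) {
            throw new IllegalArgumentException("format must be json or xml");
        }
        try {
            // Convert spaces to +, etc. so the query can be placed in the URL
            String encodedQuery = URLEncoder.encode(query, "UTF-8");
            String encodedKey = URLEncoder.encode(appId, "UTF-8");
            return BASE + encodedQuery + "?appid=" + encodedKey
                    + "&count=" + count + "&format=" + format;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Build a URL asking for 20 results as XML; no network request is made
        System.out.println(buildUrl("questio verum", "Your Key Here", 20, "xml"));
    }
}
```

Keeping the URL construction in one small method makes it easy to switch between JSON and XML or change the number of results without touching the code that reads the response.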

There is one item that appears to be missing without explanation: the cached URL of each search result. This URL is useful to the user when the result's live URL is not responding. The old Yahoo web search API did provide this, so I'm not sure why it was dropped in BOSS.

One thing that makes me a little nervous about the API from a researcher's perspective is the prohibition in their Terms of Service against analyzing their search results:
"You will not, will not attempt, or will not permit or take actions designed to enable other third parties to: ... perform any analysis, reverse engineering or processing of the Web Search Results"

Analyzing the Yahoo search results is exactly what I did in my paper "Agreeing to Disagree: Search Engines and their Public Interfaces". Well, better to do and ask forgiveness than get permission up front. ;-)

So here's a simple example in Java using the new BOSS API to search for the title of my blog "questio verum", the index status of my blog's root page, and all the pages indexed for my blog. To make this example work for you, simply put your Yahoo API key in API_KEY.

Note that this example is very similar to the Google AJAX example in Java from last month.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import org.json.JSONArray; // JSON library from http://www.json.org/java/
import org.json.JSONObject;

public class YahooQuery {

    // Yahoo API key
    private static final String API_KEY = "Your Key Here";

    public YahooQuery() {
        makeQuery("questio verum");
        makeQuery("url:http://frankmccown.blogspot.com/");
        makeQuery("site:frankmccown.blogspot.com");
    }

    private void makeQuery(String query) {

        System.out.println("\nQuerying for " + query);

        try {
            // Convert spaces to +, etc. to make a valid URL
            query = URLEncoder.encode(query, "UTF-8");

            // Give me back 10 results in JSON format
            URL url = new URL("http://boss.yahooapis.com/ysearch/web/v1/" + query +
                    "?appid=" + API_KEY + "&count=10&format=json");
            URLConnection connection = url.openConnection();

            // Read the whole response into a single string
            String line;
            StringBuilder builder = new StringBuilder();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream(), "UTF-8"));
            while ((line = reader.readLine()) != null) {
                builder.append(line);
            }
            reader.close();

            String response = builder.toString();

            JSONObject json = new JSONObject(response);

            // Report the hit count from the deephits field
            System.out.println("\nTotal results = " +
                    json.getJSONObject("ysearchresponse")
                        .getString("deephits"));

            JSONArray ja = json.getJSONObject("ysearchresponse")
                    .getJSONArray("resultset_web");

            // Print the title and URL of each result
            System.out.println("\nResults:");
            for (int i = 0; i < ja.length(); i++) {
                System.out.print((i + 1) + ". ");
                JSONObject j = ja.getJSONObject(i);
                System.out.println(j.getString("title"));
                System.out.println(j.getString("url"));
            }
        }
        catch (Exception e) {
            System.err.println("Something went wrong...");
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        new YahooQuery();
    }
}


Running this program produces the following results:


Querying for questio verum

Total results = 13600

Results:
1. Questio Verum
http://frankmccown.blogspot.com/
2. WikiAnswers - What does questio verum mean
http://wiki.answers.com/Q/What_does_questio_verum_mean
3. Questio Verum: URL Canonicalization
http://frankmccown.blogspot.com/2006/04/url-canonicalization.html
4. Questio Verum: WIDM 2006
http://frankmccown.blogspot.com/2006/11/widm.html
5. Questio Verum: Fav5
http://frankmccown.blogspot.com/2007/09/fav5_29.html
6. Questio Verum: Fav5
http://frankmccown.blogspot.com/2007/12/fav5.html
7. Questio Verum: August 2006
http://frankmccown.blogspot.com/2006_08_01_archive.html
8. Amazon.com: Profile for Questio Verum
http://www.amazon.com/gp/pdp/profile/A2Q6CLLQPXG55A
9. Questio Verum: JCDL 2007 - day 2
http://frankmccown.blogspot.com/2007/06/jcdl-2007-day-2.html
10. Questio Verum: OA debate - Eysenbach and Harnad
http://frankmccown.blogspot.com/2006/05/oa-debate-eysenbach-and-harnad.html


Querying for url:http://frankmccown.blogspot.com/

Total results = 1

Results:
1. Questio Verum
http://frankmccown.blogspot.com/


Querying for site:frankmccown.blogspot.com

Total results = 4080

Results:
1. Questio Verum
http://frankmccown.blogspot.com/
2. Questio Verum: OA debate - Eysenbach and Harnad
http://frankmccown.blogspot.com/2006/05/oa-debate-eysenbach-and-harnad.html
3. Questio Verum: JCDL 2007 - day 2
http://frankmccown.blogspot.com/2007/06/jcdl-2007-day-2.html
4. Questio Verum: No singles here
http://frankmccown.blogspot.com/2007/08/no-single-here.html
5. Questio Verum: Pledge Week and Insults
http://frankmccown.blogspot.com/2007/10/pledge-week-and-insults.html
6. Questio Verum: WIDM 2006
http://frankmccown.blogspot.com/2006/11/widm.html
7. Questio Verum: Fav5
http://frankmccown.blogspot.com/2007/09/fav5_29.html
8. Questio Verum: August 2006
http://frankmccown.blogspot.com/2006_08_01_archive.html
9. Questio Verum: Fav5
http://frankmccown.blogspot.com/2007/06/fav5.html
10. Questio Verum: Fav5
http://frankmccown.blogspot.com/2007/12/fav5.html


Thanks, Martin, for the heads-up on this.

Update on 7/28/2008:

The missing cached URL feature is apparently coming soon.