Thursday, July 10, 2008

Yahoo's new Search BOSS API

Yahoo has just released a new web search API called BOSS (Build your Own Search Service) which improves on their earlier API in several ways:
  1. No daily query limits.

  2. No restrictions on how the results are displayed, ordered, or mixed in with other proprietary results.

  3. Ability to make money showing paid results.

The BOSS API is REST-based. You can receive results in either JSON or XML format, and you can get 10-50 results back per query.
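As a rough sketch of what such a REST request looks like, the helper below assembles a query URL from the pieces just described. The parameter names appid, count, and format are the ones used in my example later in this post. The base URL, the class name BossUrlBuilder, and the range and format checks are placeholders and assumptions of mine, not something taken from the BOSS documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class BossUrlBuilder {

    // Placeholder endpoint (an assumption): substitute the real BOSS base URL.
    static final String BASE_URL = "http://example.invalid/websearch/";

    // Builds a request URL of the form <base><encoded query>?appid=...&count=...&format=...
    public static String buildUrl(String query, String apiKey, int count, String format) {
        // Assumed check mirroring the 10-50 results-per-query range
        if (count < 10 || count > 50) {
            throw new IllegalArgumentException("count must be between 10 and 50");
        }
        // Only the two response formats mentioned above are accepted
        if (!format.equals("json") && !format.equals("xml")) {
            throw new IllegalArgumentException("format must be json or xml");
        }
        try {
            // Convert spaces to +, etc. so the query is safe to put in a URL
            String encoded = URLEncoder.encode(query, "UTF-8");
            return BASE_URL + encoded
                    + "?appid=" + apiKey + "&count=" + count + "&format=" + format;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("questio verum", "YOUR_KEY", 10, "json"));
    }
}
```

Switching the last argument to "xml" is all it takes to ask for the other response format.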

There is one item that appears to be missing without explanation: the cached URL of each search result. This URL is useful to the user when the result's live URL is not responding. The old Yahoo web search API did provide this, so I'm not sure why it was dropped in BOSS.

One thing that makes me a little nervous about the API from a researcher's perspective is the prohibition in their Terms of Service against analyzing their search results:
You will not, will not attempt, or will not permit or take actions designed to enable other third parties to: ... perform any analysis, reverse engineering or processing of the Web Search Results
Analyzing the Yahoo search results is exactly what I did in my paper Agreeing to Disagree: Search Engines and their Public Interfaces. Well, better to do and ask forgiveness than get permission up front. ;-)

So here's a simple example in Java that uses the new BOSS API to query for the title of my blog ("questio verum"), the index status of my blog's root page, and all the pages indexed for my blog. To make this example work for you, simply put your Yahoo API key in API_KEY.

Note that this example is very similar to the Google AJAX example in Java from last month.

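import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

import org.json.JSONArray; // org.json JSON library
import org.json.JSONObject;

public class YahooQuery {

    // Yahoo API key
    private final String API_KEY = "Your Key Here";

    public YahooQuery() {
        makeQuery("questio verum");
    }

    private void makeQuery(String query) {

        System.out.println("\nQuerying for " + query);

        try {
            // Convert spaces to +, etc. to make a valid URL
            query = URLEncoder.encode(query, "UTF-8");

            // Give me back 10 results in JSON format.
            // The empty string is where the BOSS web search base URL goes.
            URL url = new URL("" + query +
                    "?appid=" + API_KEY + "&count=10&format=json");
            URLConnection connection = url.openConnection();

            // Read the whole response into a single string
            String line;
            StringBuilder builder = new StringBuilder();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()));
            while ((line = reader.readLine()) != null) {
                builder.append(line);
            }
            reader.close();

            String response = builder.toString();

            JSONObject json = new JSONObject(response);

            // Assumption: the field names "totalhits", "resultset_web", and
            // "title" are not confirmed here; check them against a real response.
            System.out.println("Total results = " +
                    json.getJSONObject("ysearchresponse").get("totalhits"));

            JSONArray ja = json.getJSONObject("ysearchresponse")
                    .getJSONArray("resultset_web");

            // Print a numbered list of result titles
            for (int i = 0; i < ja.length(); i++) {
                System.out.print((i+1) + ". ");
                JSONObject j = ja.getJSONObject(i);
                System.out.println(j.getString("title"));
            }
        }
        catch (Exception e) {
            System.err.println("Something went wrong...");
        }
    }

    public static void main(String args[]) {
        new YahooQuery();
    }
}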
Running this program produces the following results:

Querying for questio verum

Total results = 13600

1. Questio Verum
2. WikiAnswers - What does questio verum mean
3. Questio Verum: URL Canonicalization
4. Questio Verum: WIDM 2006
5. Questio Verum: Fav5
6. Questio Verum: Fav5
7. Questio Verum: August 2006
8. Profile for Questio Verum
9. Questio Verum: JCDL 2007 - day 2
10. Questio Verum: OA debate - Eysenbach and Harnad

Querying for url:

Total results = 1

1. Questio Verum

Querying for

Total results = 4080

1. Questio Verum
2. Questio Verum: OA debate - Eysenbach and Harnad
3. Questio Verum: JCDL 2007 - day 2
4. Questio Verum: No singles here
5. Questio Verum: Pledge Week and Insults
6. Questio Verum: WIDM 2006
7. Questio Verum: Fav5
8. Questio Verum: August 2006
9. Questio Verum: Fav5
10. Questio Verum: Fav5
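
The url: query above is how I check whether a single page is indexed. Because the query string is placed directly in the request URL, characters like : and / have to be encoded first. Here is a minimal sketch of that step, using example.com as a stand-in address and a hypothetical class name:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlQueryEncoding {

    // Builds an encoded "url:" query for the given page address.
    public static String encodedUrlQuery(String pageUrl) {
        try {
            return URLEncoder.encode("url:" + pageUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints url%3Ahttp%3A%2F%2Fexample.com%2F
        System.out.println(encodedUrlQuery("http://example.com/"));
    }
}
```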

Thanks, Martin, for the heads-up on this.

Update on 7/28/2008:

The missing cached URL feature is apparently coming soon.