Friday, January 27, 2006

Yahoo Reports URLs with No Slash

Yahoo does not properly report URLs that point to a directory: it leaves off the trailing slash. For example, the query "site:privacy.getnetwise.org" will yield the following URLs:

1) http://privacy.getnetwise.org/sharing/tools/ns6
2) http://privacy.getnetwise.org/browsing/tools/profiling

among others. Are ns6 and profiling directories or dynamic pages? You can’t tell just by looking at the URLs, because Yahoo strips the trailing slash (`/`) from URLs that are directories. The only way to tell is to actually visit the URL. Requesting URL 1 returns a 301 code (moved permanently) along with the correct URL:

http://privacy.getnetwise.org/sharing/tools/ns6/

URL 2 will respond with a 200 code because it is a dynamic page. This is no big deal for a user looking at search results, but it is a big deal for an application like Warrick, which needs to know whether a URL points to a directory without actually visiting the URL.
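
For illustration, here is a rough Python sketch of the "visit the URL" check described above. It is not Warrick's code; the function names `classify_url` and `probe` are made up for this example, and it assumes a server that answers a slash-less directory URL with a redirect to the same URL plus a trailing slash. A server configured differently could fool it.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def classify_url(url, status, location=None):
    """Guess what `url` points to from the response to a single request.

    Hypothetical helper: returns "directory", "page", "redirect" or "unknown".
    """
    if status in (301, 302) and location:
        # The Location header may be relative, so resolve it against the URL.
        target = urljoin(url, location)
        if target == url + "/":
            # Redirected to the same URL with a slash added: a directory.
            return "directory"
        return "redirect"
    if status == 200:
        return "directory" if url.endswith("/") else "page"
    return "unknown"


def probe(url, timeout=10):
    """Send one HEAD request (redirects are not followed) and classify the URL."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return classify_url(url, response.status, response.getheader("Location"))
    finally:
        conn.close()
```

Calling `probe()` on each URL a search engine returns costs one extra HTTP request per URL, which is exactly the overhead a tool like Warrick would like to avoid.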

I contacted Yahoo about the "problem" but have not received a response:
http://finance.groups.yahoo.com/group/yws-search-web/message/309

Google and MSN don’t have this problem.