Yahoo does not properly report URLs that point to a directory: it drops the trailing slash. For example, the query "site:privacy.getnetwise.org" returns the following URLs, among others:
1) http://privacy.getnetwise.org/sharing/tools/ns6
2) http://privacy.getnetwise.org/browsing/tools/profiling
Are ns6 and profiling directories or dynamic pages? You can’t tell just by looking at the URLs, because Yahoo strips the trailing slash (`/`) from URLs that are directories. The only way to tell is to visit each URL. URL 1 returns a 301 (Moved Permanently) status code along with the correct URL:
http://privacy.getnetwise.org/sharing/tools/ns6/
URL 2 returns a 200 (OK) status code because it is a dynamic page. This is no big deal for a user browsing search results, but it is a big deal for an application like Warrick, which needs to know whether a URL points to a directory without actually visiting it.
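The "visit the URL" check can be sketched in Python with the standard library's http.client, which never follows redirects on its own, so a 301 is seen as-is. This is a minimal sketch, not Warrick's code: the function names are hypothetical, it assumes the server answers a HEAD request the same way it answers GET, and it treats a 301 or 302 whose Location is the same URL plus a trailing slash as a directory. It still costs one request per URL, which is exactly what an application like Warrick would rather avoid.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Hypothetical helper names; this is not Warrick's actual implementation.
REDIRECT_CODES = (301, 302)  # assumption: servers signal a missing slash with one of these


def looks_like_directory_redirect(url, status, location):
    """True if the response redirects `url` to the same URL with a trailing slash."""
    if url.endswith("/") or status not in REDIRECT_CODES or not location:
        return False
    # Location may be relative, so resolve it against the requested URL.
    return urljoin(url, location) == url + "/"


def probe(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None).

    http.client never follows redirects, so a 301 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # Assumption: the server answers HEAD the same way it answers GET.
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_directory(url):
    """Visit `url` (one request) and report whether it names a directory."""
    status, location = probe(url)
    return looks_like_directory_redirect(url, status, location)
```

Calling `is_directory("http://privacy.getnetwise.org/sharing/tools/ns6")` should return True if the server still responds as described above. The decision logic is kept separate from the network call so it can be checked against recorded (status, Location) pairs without hitting the server.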
I’ve contacted Yahoo about the "problem" but have not received a response:
http://finance.groups.yahoo.com/group/yws-search-web/message/309
Google and MSN don’t have this problem.