Monday, February 16, 2009

Specifying canonical URLs

Last week the big three search engines (Google, Yahoo, and Live Search) announced their support for a new HTML attribute value which will help prevent search engines from indexing duplicate content. Search engines naturally want to avoid crawling and indexing duplicate content because it lessens the quality of search result pages. Google's Webmaster Central Blog has a good write-up about the new rel="canonical" attribute value.

Essentially, the new attribute value allows a webmaster to tell a web crawler which URL is the preferred one when the same page is accessible from several URLs. So if I have a single page that is accessible at URLs A, B, and C, I can tell the web crawler that URLs B and C point to the same content as A by placing the following code in the head element of the page:

<link rel="canonical" href="http://foo.com/A" />

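When the web crawler fetches the page using URL B or C, it will find the canonical URL A in the head element and can therefore treat B and C as duplicates of A rather than indexing them as separate pages. Note that the search engines treat this as a strong hint rather than a command, so they may still choose to ignore it.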
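
To make the crawler's side of this concrete, here is a minimal sketch in Python of how the canonical URL might be read from a fetched page and used to group duplicate URLs. It is only an illustration, not how any real search engine works: the names CanonicalLinkParser and canonical_url are made up for this example, and it only assumes the standard library's html.parser module.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Hypothetical helper: remembers the first <link rel="canonical"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attr = dict(attrs)
            # rel may hold several space-separated values, e.g. rel="canonical foo"
            rel_values = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_values and attr.get("href"):
                self.canonical = attr["href"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def canonical_url(fetched_url, html):
    """Return the canonical URL declared in the page, or the fetched URL if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical or fetched_url


# The same page content is served at URLs A, B, and C.
page = (
    '<html><head><link rel="canonical" href="http://foo.com/A" /></head>'
    "<body>Hello</body></html>"
)

# Group every fetched URL under the canonical URL it declares.
index = {}
for url in ["http://foo.com/A", "http://foo.com/B", "http://foo.com/C"]:
    index.setdefault(canonical_url(url, page), []).append(url)

print(index)
```

All three URLs end up grouped under http://foo.com/A, so only one entry would need to be indexed. A page with no canonical link simply falls back to the URL it was fetched from.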

Of course, the entire mechanism requires a willing and competent webmaster to implement it. Webmasters who are very concerned about SEO are likely to use it, since it should help bolster the PageRank of the canonical page by consolidating links that would otherwise be spread across its duplicate URLs. But the rest of us who aren't concerned about our rankings can safely ignore this new functionality.

See also rel="nofollow".