
X-Robots-Tag Directive in the HTTP Header

I did not write about Web development for quite a while. After reading an article about the use of the X-Robots-Tag today, I thought it's time again.

The Basics

You retrieve content from the Web by typing a Web address into your browser, and the addressed Web server sends you the requested resource: a regular HTML Web page, a PDF document, a JPEG image, a video or Flash movie, an XML file, etc.

The browser and Web server communicate using HTTP (the Hypertext Transfer Protocol), and before the requested data is actually sent, they exchange HTTP Request and HTTP Response headers with information about the document, the browser, and the server. The X-Robots-Tag Directive is an optional element (a directive) of such an HTTP Response header. It was introduced by Google this year, and two weeks ago Yahoo announced that they support it, too.
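To make that concrete, here is a sketch of what such an exchange might look like for a PDF document (the URL and header values are made up for illustration):

```http
GET /reports/2007-summary.pdf HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex
Content-Length: 54321
```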

You might recall that there are (X)HTML Meta Tags that allow you to restrict search engine access. But they only work for (X)HTML documents. The X-Robots-Tag Directive allows the same for any non-(X)HTML resource: video files, audio files, images, etc.

The X-Robots-Tag Directive explained

The following commands are supported:

  • X-Robots-Tag: noindex
    The document will not show up in the search results.
  • X-Robots-Tag: noarchive
    The document will be indexed, but not cached. I.e. no local copy at the search engine.
  • X-Robots-Tag: nosnippet
    No summary of the document in the search results page.
  • X-Robots-Tag: nofollow
    Links in the document will not be indexed (nofollow).
  • X-Robots-Tag: unavailable_after: 31 Dec 2007 16:30:00 GMT
    The document will be removed from the search results after the specified date and time.

The attributes are case-insensitive. “NOFOLLOW” is the same as “nofollow”. You can combine values in one line, e.g.
  • X-Robots-Tag: noarchive, nosnippet
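Since several directives can be combined on one header line, anything processing the header has to split the value on commas and normalize the case. A minimal sketch in Python (the parsing rules follow the description above; the sample values are for illustration):

```python
# Parse an X-Robots-Tag header value into a dictionary of directives.
# Directives are case-insensitive and comma-separated; a directive may
# carry an argument after a colon (e.g. "unavailable_after: <date>").

def parse_x_robots_tag(value):
    directives = {}
    for part in value.split(","):
        # Split only at the first colon, so dates keep their "16:30:00".
        name, _, arg = part.strip().partition(":")
        directives[name.strip().lower()] = arg.strip() or None
    return directives

print(parse_x_robots_tag("NOARCHIVE, nosnippet"))
# {'noarchive': None, 'nosnippet': None}

print(parse_x_robots_tag("unavailable_after: 31 Dec 2007 16:30:00 GMT"))
# {'unavailable_after': '31 Dec 2007 16:30:00 GMT'}
```

Note that this sketch would mis-split a date written with a comma ("Mon, 31 Dec…"); a real crawler has to be more careful there.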

The configuration for the HTTP header depends on your server. With Apache you can configure the X-Robots-Tag Directive in the main configuration files or within .htaccess files in each directory. Most Web authors might not have access to these configuration options. I won't go into the details, but a misconfiguration can mess up your Web site completely. That leaves the X-Robots-Tag Directive as a tool for the advanced folks.
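For those who do have access, a sketch of what such an .htaccess entry might look like with Apache's mod_headers (the file pattern and directive values are assumptions for illustration):

```apache
# Requires mod_headers. Sends the X-Robots-Tag header for every
# PDF file in this directory, keeping them out of the index.
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>
```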

Another mechanism for controlling search engine access are the (X)HTML Robots Meta Tags. They share the same attributes as the X-Robots-Tag Directive and look like this:
  • <meta name="robots" content="noarchive">
  • <meta name="robots" content="nosnippet">
  • <meta name="robots" content="nofollow">
  • <meta name="robots" content="noindex">

Now you understand why the X-Robots-Tag has been introduced by Google. The (X)HTML Meta Tags work only for (X)HTML documents.

However, there is another widely used standard: the robots.txt file. Its simple syntax allows you to exclude directories and files from being indexed. As an alternative to using the X-Robots-Tag Directive, you can set up particular Web directories and put all the PDF, audio, and video files that you don't want to have indexed there.
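A robots.txt entry for that approach might look like this (the directory name is made up for illustration):

```
# robots.txt in the site root
User-agent: *
Disallow: /no-index/
```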

Depending on your Web host, you might have access to the robots.txt file or to the HTTP header configuration via .htaccess files. Either way, you now have the choice, even with Yahoo.

Neither the X-Robots-Tag Directive, the (X)HTML Meta Tags, nor the robots.txt file will improve your rankings, but they make sure that you have a choice about what is being indexed.


John W. Furst

