Server-Robots-Header

What is the X-Robots-Tag?

The X-Robots-Tag is an HTTP header that allows website operators to give search engine crawlers precise instructions for how pages are indexed and served in search results. Unlike robots meta tags, which are placed in the HTML head of a page, the X-Robots-Tag is sent at the server level as an HTTP response header.
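For non-HTML resources such as PDFs there is no <head> in which to place a meta tag, so the header is the only way to deliver the directive. An illustrative response (the status line and other headers will vary by server):

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex

The HTML equivalent for a regular page would be <meta name="robots" content="noindex"> in the document head.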

Advantages of the X-Robots-Tag

The X-Robots-Tag offers several clear advantages over conventional robots meta tags:

  • Server-Side Control: Works for non-HTML files (PDFs, images, videos)
  • Early Processing: Crawlers receive the instructions with the HTTP response headers, before the document body is parsed
  • Flexibility: Can be set dynamically based on various conditions
  • Reliability: Not affected by HTML parsing errors

Syntax and Implementation

Basic Syntax

X-Robots-Tag: [directive1], [directive2], [directive3]

Common Directives

  • noindex: Prevents page indexing (typical use: test pages, internal areas)
  • nofollow: Prevents link following (typical use: user-generated content)
  • noarchive: Prevents caching of the page (typical use: dynamic content)
  • nosnippet: Prevents snippet display in search results (typical use: confidential content)
  • noodp: Prevents use of ODP/DMOZ directory descriptions, now deprecated (typical use: controlled meta descriptions)
  • notranslate: Prevents offering translations in search results (typical use: language-specific content)

Implementation Examples

Apache (.htaccess or server configuration)

# Single page (requires mod_headers)
<Files "test.html">
    Header set X-Robots-Tag "noindex, nofollow"
</Files>

# Directory-wide: <Directory> only works in the main server or vhost
# configuration, not in .htaccess; to cover a directory via .htaccess,
# put the Header directive in an .htaccess file inside that directory
<Directory "/var/www/html/admin">
    Header set X-Robots-Tag "noindex, nofollow"
</Directory>

# File type specific
<FilesMatch "\.(pdf|doc)$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>
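The Header directive requires Apache's mod_headers module. On Debian/Ubuntu-style installations it can typically be enabled as follows (assuming the a2enmod helper and systemd are in use):

sudo a2enmod headers
sudo systemctl restart apache2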

Nginx

# Single location
location /admin/ {
    add_header X-Robots-Tag "noindex, nofollow";
}

# File type specific
location ~* \.(pdf|doc)$ {
    add_header X-Robots-Tag "noindex";
}
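Two nginx details are worth keeping in mind: add_header is only applied to successful and redirect responses unless the always parameter (available since nginx 1.7.5) is added, and add_header directives are inherited from an outer block only if the current block defines none of its own. A sketch:

location /admin/ {
    # "always" also attaches the header to 4xx/5xx responses
    add_header X-Robots-Tag "noindex, nofollow" always;
}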

PHP (Dynamic)

<?php
// Conditional X-Robots-Tag setting; header() must be called before any output is sent
if ($user->isLoggedIn() && $user->isAdmin()) {
    // $user stands in for an application-specific authentication object
    header('X-Robots-Tag: noindex, nofollow');
}

// For specific pages
if (strpos($_SERVER['REQUEST_URI'], '/test/') !== false) {
    header('X-Robots-Tag: noindex');
}
?>

Combinations and Best Practices

Common Combinations

  • noindex, nofollow: Complete exclusion (admin areas, test pages)
  • noindex, noarchive: No indexing, no cached copy (dynamic, time-critical content)
  • nofollow, noarchive: Page may be indexed, but links are not followed and no cached copy is stored (pages with external links)
  • noindex, nosnippet: No indexing, no snippets (confidential documents)
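For example, a confidential report could be delivered with both restrictions combined in a single header:

X-Robots-Tag: noindex, nosnippet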

Best Practices

  1. Maintain Consistency: Avoid conflicting instructions between the X-Robots-Tag and robots meta tags; if both are present, Google applies the more restrictive directive
  2. Test: Check implementation with tools like Google Search Console
  3. Documentation: Keep track of all X-Robots-Tag implementations
  4. Monitoring: Monitor the impact on crawling behavior

Crawler-Specific Directives

Google-Specific Directives

X-Robots-Tag: googlebot: noindex, nofollow
X-Robots-Tag: googlebot-image: noindex

Bing-Specific Directives

X-Robots-Tag: bingbot: noindex
X-Robots-Tag: msnbot: nofollow

General Crawler Directives

Directives without a user-agent token apply to all crawlers; crawler-specific directives should be sent as separate headers:

X-Robots-Tag: noindex, nofollow
X-Robots-Tag: googlebot: noindex
X-Robots-Tag: bingbot: nofollow
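In Apache, several X-Robots-Tag headers can be attached to one response by using Header add instead of Header set; a sketch with a placeholder file name, assuming mod_headers is enabled:

<Files "internal-report.pdf">
    Header add X-Robots-Tag "googlebot: noindex"
    Header add X-Robots-Tag "bingbot: nofollow"
</Files>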

Common Errors and Solutions

Error 1: Duplicate Directives

Problem:

X-Robots-Tag: noindex, noindex, nofollow

Solution:

X-Robots-Tag: noindex, nofollow

Error 2: Wrong Syntax

Problem:

X-Robots-Tag: "noindex, nofollow"

Solution:

X-Robots-Tag: noindex, nofollow

The quotes in the Apache and Nginx configuration examples above are configuration-file syntax; they are not sent as part of the header value itself.

Error 3: Spaces in Directives

Problem:

X-Robots-Tag: no index, no follow

Solution:

X-Robots-Tag: noindex, nofollow

Testing and Checking

Tools for Verification

  1. Google Search Console: Monitor indexing status
  2. HTTP Header Checker: Online tools for header verification
  3. Browser Developer Tools: Network tab for header inspection
  4. cURL Commands: Command line tests

cURL Test Example

curl -I https://example.com/admin/
# Check the X-Robots-Tag header in the response, for example:
curl -sI https://example.com/admin/ | grep -i x-robots-tag
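For spot checks across several URLs, the headers can also be read with a small script. A minimal PHP sketch using the built-in get_headers() function; the URL list is only a placeholder:

<?php
// URLs whose X-Robots-Tag header should be inspected (placeholder values)
$urls = [
    'https://example.com/admin/',
    'https://example.com/test/report.pdf',
];

foreach ($urls as $url) {
    // Fetch the response headers as an associative array
    $headers = get_headers($url, true);

    if ($headers === false) {
        echo "$url: request failed\n";
        continue;
    }

    // Header names are case-insensitive, so normalize the keys
    $headers = array_change_key_case($headers, CASE_LOWER);
    $value = $headers['x-robots-tag'] ?? '(no X-Robots-Tag header)';

    // A header that is sent multiple times arrives as an array
    if (is_array($value)) {
        $value = implode(' | ', $value);
    }

    echo "$url: $value\n";
}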

Monitoring and Analytics

Important Metrics

  • Crawl Rate: How often are protected areas crawled?
  • Indexing Status: Are X-Robots-Tag directives being followed?
  • Error Rate: Are there parsing errors in headers?

Google Search Console Monitoring

  1. Coverage Report: Check that protected pages are not indexed
  2. Crawl Errors: Monitor crawling problems
  3. Sitemaps: Ensure protected URLs are not in sitemaps

Advanced Use Cases

Dynamic X-Robots-Tags

<?php
// Based on User-Agent
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (strpos($userAgent, 'Googlebot') !== false) {
    header('X-Robots-Tag: noindex');
}

// Based on IP address ($blockedIPs is an application-defined list)
$blockedIPs = ['203.0.113.10', '203.0.113.11'];
$clientIP = $_SERVER['REMOTE_ADDR'];
if (in_array($clientIP, $blockedIPs, true)) {
    header('X-Robots-Tag: noindex, nofollow');
}
?>

Content Management System Integration

WordPress:

// In the active theme's functions.php (or a small plugin)
add_action('send_headers', function() {
    if (is_admin() || is_user_logged_in()) {
        header('X-Robots-Tag: noindex, nofollow');
    }
});

Drupal (7):

// In the theme's template.php; replace "theme" with the theme's machine name
function theme_preprocess_html(&$variables) {
    // arg(0) returns the first segment of the internal path, e.g. "admin"
    if (arg(0) == 'admin') {
        drupal_add_http_header('X-Robots-Tag', 'noindex, nofollow');
    }
}

X-Robots-Tag Implementation Checklist

  • [ ] Goal Defined: Which pages should be protected?
  • [ ] Directives Chosen: Which X-Robots-Tag directives are needed?
  • [ ] Server Configuration: Apache/Nginx configured correctly?
  • [ ] Testing Done: Does the implementation work?
  • [ ] Monitoring Set Up: Google Search Console monitored?
  • [ ] Documentation Created: All implementations documented?
  • [ ] Team Informed: All stakeholders know about the changes?

Related Topics