Duplicate Content

What is Duplicate Content?

Duplicate content refers to identical or very similar content that is available on multiple URLs of a website or different domains. In e-commerce, this is a common problem that can lead to ranking losses and crawl budget waste.

Google's Definition

Google defines duplicate content as "substantive blocks of content within or across domains that either completely match other content or are appreciably similar". This means that not only identical texts, but also very similar content can be considered duplicate content.
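One way to make "appreciably similar" measurable is to compare two texts by their overlapping word sequences. The sketch below uses Jaccard similarity over three-word shingles. This is an illustrative metric, not Google's actual method, and the function names and sample texts are assumptions made for the example.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Short texts yield a single shingle (or none if the text is empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all distinct shingles of both texts."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical variant descriptions that differ only in the color word.
red = "Soft cotton t-shirt with a classic fit, available in red"
blue = "Soft cotton t-shirt with a classic fit, available in blue"
print(jaccard_similarity(red, blue))  # 0.8
```

A score near 1.0 indicates near-identical texts. Which threshold counts as "duplicate" is a judgment call that has to be tuned per shop.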

Common Causes in E-Commerce

1. Product Variants

Many online shops create separate pages for product variants with identical descriptions:

  • Different colors (e.g. "T-Shirt Red", "T-Shirt Blue")
  • Different sizes
  • Different materials
  • Different manufacturers with identical product descriptions

2. Manufacturer Descriptions

Copying manufacturer product descriptions verbatim produces the same text on every shop that sells the product, and often on several pages of the same shop.

3. Category Pages

Similar or identical category descriptions for related product categories.

4. URL Parameters

Different URLs show the same content:

  • product.html?color=red
  • product.html?color=blue
  • product.html?sort=price
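To see which parameter URLs collapse to the same page, a crawl list can be normalized by dropping parameters that do not change the content. This is a minimal sketch: the set of ignored parameters is an assumption and has to match the shop's own URL scheme (for example, whether "color" changes the content or not).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only affect ordering or tracking, not content.
NON_CONTENT_PARAMS = {"sort", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str) -> str:
    """Drop non-content parameters and sort the rest for a stable URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    kept.sort()
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(normalize_url("https://shop.com/product.html?sort=price"))
print(normalize_url("https://shop.com/product.html?sort=price&color=red"))
```

URLs that normalize to the same string are candidates for a shared canonical URL.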

Impact on SEO

Negative Consequences

| Problem              | Impact                                        | Severity |
|----------------------|-----------------------------------------------|----------|
| Ranking Losses       | Google cannot decide which URL should rank    | High     |
| Crawl Budget Waste   | Bots crawl identical content multiple times   | Medium   |
| Link Equity Dilution | Backlinks are distributed across multiple URLs | High     |
| User Experience      | Confusion with identical search results       | Medium   |

Positive Aspects

Duplicate content does not automatically trigger a Google penalty. The usual outcome is suboptimal indexing, with ranking signals spread across several URLs.

Detecting Duplicate Content

1. Manual Tools

Google Search Console

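  • "Pages" report (formerly "Coverage") → statuses such as "Duplicate without user-selected canonical" and "Duplicate, Google chose different canonical than user"
  • The "HTML improvements" report (duplicate meta descriptions) existed only in the old Search Console; use a crawler for this check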

Screaming Frog SEO Spider

  • Crawl analysis for duplicate content
  • Identical title tags and meta descriptions
  • Similar content areas
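The same check for identical title tags or meta descriptions can be run on any crawl export. The sketch below assumes the export has already been loaded as a list of dictionaries with hypothetical "url", "title" and "meta_description" keys. It does not reflect the export format of any particular tool.

```python
from collections import defaultdict


def find_duplicates(pages: list, field: str) -> dict:
    """Group URLs whose value for `field` is identical after normalization."""
    groups = defaultdict(list)
    for page in pages:
        # Normalize case and whitespace so trivial differences do not hide duplicates.
        key = " ".join(page[field].lower().split())
        groups[key].append(page["url"])
    return {value: urls for value, urls in groups.items() if len(urls) > 1}


# Hypothetical crawl data.
crawl = [
    {"url": "/t-shirt-red", "title": "T-Shirt Basic", "meta_description": "Cotton tee"},
    {"url": "/t-shirt-blue", "title": "T-Shirt  basic", "meta_description": "Cotton tee"},
    {"url": "/hoodie", "title": "Hoodie Classic", "meta_description": "Warm hoodie"},
]
print(find_duplicates(crawl, "title"))
```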

2. Automated Tools

Copyscape

  • Online duplicate content checker
  • Paid, but very accurate
  • Also checks external domains

Siteliner

  • Internal duplicate content analysis
  • Similarity score
  • Free and premium versions

3. Google Search

Use the site: operator:

site:your-domain.com "identical text"

Search exact phrases:

"Product description text"

Solution Strategies

1. Canonical Tags

Self-referencing Canonicals (on the main product page itself)

<link rel="canonical" href="https://shop.com/main-product-variant" />

Cross-Domain Canonicals

<link rel="canonical" href="https://original-shop.com/product" />
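To verify which canonical URL a page actually declares, the tag can be read from the HTML with Python's standard-library parser. This is a minimal sketch: the class and function names are made up for the example, and it returns only the first canonical link it finds.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens and any letter case.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def extract_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if absent."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


page = '<head><link rel="canonical" href="https://shop.com/main-product-variant" /></head>'
print(extract_canonical(page))
```

Comparing the extracted value with the URL that was requested shows whether a page is self-referencing or points to another page.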

2. 301 Redirects

Merge product variants:

/product-red → /product (main variant)
/product-blue → /product (main variant)
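A redirect map like the one above is easy to validate before it goes live. The sketch below resolves each old path to its final target and rejects loops. The mapping and the hop limit are illustrative assumptions, not part of any specific server configuration.

```python
# Hypothetical redirect map: old variant path -> main product path.
REDIRECTS = {
    "/product-red": "/product",
    "/product-blue": "/product",
}


def resolve(path: str, redirects: dict, max_hops: int = 10) -> str:
    """Follow redirects until a path without a redirect entry is reached."""
    seen = set()
    while path in redirects:
        if path in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or overly long chain at {path}")
        seen.add(path)
        path = redirects[path]
    return path


for old_path in REDIRECTS:
    print(old_path, "->", resolve(old_path, REDIRECTS))
```

Each old URL should ideally reach its target in a single hop, since chains add latency for users and crawlers.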

3. Parameter Handling

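Google retired the Search Console URL Parameters tool in 2022, so the former "No URLs" setting is no longer available. Handle parameters on the site instead:

  • Canonical tags on sorting and filter URLs that point to the parameter-free URL
  • Internal links that use only the canonical URL
  • Session IDs and tracking parameters kept out of crawlable links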

4. Content Differentiation

Unique product descriptions:

  • Highlight specific product features
  • Integrate customer reviews
  • Describe usage scenarios
  • Mention local availability

Best Practices for E-Commerce

1. Product Page Optimization

Create main product page:

  • One URL for the main product
  • Variants as parameters or dropdown
  • Unique description for each variant that keeps its own indexable URL

Example structure:

/product/t-shirt-basic
  - Color: Red, Blue, Green (parameters)
  - Size: S, M, L, XL (parameters)
  - Material: Cotton, Polyester (parameters)

2. Category Page Differentiation

Unique category descriptions:

  • Specific product features of the category
  • Local availability
  • Seasonal aspects
  • Target group-specific content

3. Adapt Manufacturer Descriptions

Content adaptation:

  • Use manufacturer text as basis
  • Add own additions
  • Integrate customer reviews
  • Add usage tips

4. Optimize URL Structure

Clean URL hierarchy:

/category/subcategory/product-name

Avoid parameters:

❌ /product?id=123&color=red
✅ /product/t-shirt-basic-red
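Readable paths of this kind are usually generated from the product name. The following sketch shows one possible slug function. The "/product/" prefix follows the example above, while the function names and the transliteration approach are assumptions.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase the text, strip accents, and join words with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def product_path(name: str, color: str = "") -> str:
    """Build a clean product path, optionally including the color."""
    label = f"{name} {color}" if color else name
    return "/product/" + slugify(label)


print(product_path("T-Shirt Basic", "Red"))  # /product/t-shirt-basic-red
```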

Technical Implementation

1. JSON-LD Markup

Product Schema with variants:

{
  "@context": "https://schema.org",
  "@type": "ProductGroup",
  "name": "T-Shirt Basic",
  "description": "High-quality cotton t-shirt",
  "variesBy": "https://schema.org/color",
  "hasVariant": [
    {
      "@type": "Product",
      "name": "T-Shirt Basic - Red",
      "color": "Red"
    },
    {
      "@type": "Product",
      "name": "T-Shirt Basic - Blue",
      "color": "Blue"
    }
  ]
}
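Markup like this is typically generated from product data rather than written by hand. schema.org defines hasVariant on the ProductGroup type, so the sketch below emits a ProductGroup whose variants are Product items. The function name, its parameters and the "tshirt-basic" group ID are hypothetical.

```python
import json


def product_group_jsonld(name: str, description: str, group_id: str, colors: list) -> str:
    """Serialize a schema.org ProductGroup with one Product per color."""
    data = {
        "@context": "https://schema.org",
        "@type": "ProductGroup",
        "name": name,
        "description": description,
        "productGroupID": group_id,  # hypothetical ID shared by all variants
        "variesBy": ["https://schema.org/color"],
        "hasVariant": [
            {"@type": "Product", "name": f"{name} - {color}", "color": color}
            for color in colors
        ],
    }
    return json.dumps(data, indent=2)


markup = product_group_jsonld(
    "T-Shirt Basic", "High-quality cotton t-shirt", "tshirt-basic", ["Red", "Blue"]
)
print(markup)
```

The resulting string belongs inside a script element of type "application/ld+json" on the main product page.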

2. XML Sitemap

Exclude product variants:

  • Only main product pages in sitemap
  • No variant URLs that canonicalize to a main product page
  • Exclude parameter URLs
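The rules above can be applied when the sitemap is generated. This sketch builds a sitemap with the standard library and skips every URL that carries a query string. The example URLs are hypothetical, and treating "has a query string" as "is a variant or parameter URL" is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list) -> str:
    """Return sitemap XML that lists only URLs without query parameters."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if urlsplit(url).query:
            continue  # parameter URL: leave it out of the sitemap
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


sitemap = build_sitemap([
    "https://shop.com/product/t-shirt-basic",
    "https://shop.com/product/t-shirt-basic?color=red",
    "https://shop.com/product/hoodie-classic",
])
print(sitemap)
```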

3. Robots.txt

Optimize crawl budget:

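User-agent: *

# Exclude parameter URLs (this blocks every URL with a query string)
# Note: crawlers cannot read canonical tags on blocked URLs
Disallow: /*?*
Disallow: /*&*

# Block session IDs
Disallow: /*sessionid=*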
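Wildcard rules are easy to get wrong, so it helps to test them against real paths before deploying. The sketch below is a hand-rolled matcher for a single rule, following the wildcard semantics used by Google and described in RFC 9309 ("*" matches any sequence of characters, "$" anchors the end). It does not implement Allow/Disallow precedence or user-agent groups.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check whether one robots.txt path pattern matches a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and turn each "*" into "match anything".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # Patterns are matched from the beginning of the path.
    return re.match(regex, path) is not None


for path in ["/product.html?color=red", "/product/t-shirt-basic", "/cart?sessionid=abc"]:
    print(path, rule_matches("/*?*", path))
```

Running the rule against a sample of important URLs shows quickly whether a pattern blocks more than intended.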

Monitoring and Control

1. Regular Audits

Monthly checks:

  • Check Google Search Console for duplicate content
  • Run Screaming Frog crawl
  • Copyscape analysis for critical pages

2. Automated Monitoring

Set up tools:

  • Google Alerts for own content
  • Automated duplicate content checks
  • Ranking monitoring for affected keywords

3. Performance Tracking

Monitor KPIs:

  • Indexing rate
  • Crawl budget distribution
  • Ranking development
  • SEO traffic

Avoid Common Mistakes

❌ Wrong Canonical Implementation

Error:

<!-- Wrong: duplicate variant page declares itself as canonical -->
<link rel="canonical" href="https://shop.com/product-variant" />

Correct:

<!-- Right: Canonical points to main variant -->
<link rel="canonical" href="https://shop.com/main-product-variant" />

❌ Indexing Parameter URLs

Problem: Sorting and filter URLs are indexed
Solution: Point these URLs to the parameter-free page with a canonical tag (the GSC URL Parameters tool was retired in 2022)

❌ Identical Meta Descriptions

Problem: Same meta descriptions for similar products
Solution: Unique descriptions with specific product features

Checklist: Avoid Duplicate Content

Content Strategy

  • ☐ Unique product descriptions for each variant
  • ☐ Adapt and expand manufacturer descriptions
  • ☐ Differentiate category descriptions
  • ☐ Integrate local and seasonal aspects

Technical Implementation

  • ☐ Canonical tags correctly implemented
  • ☐ 301 redirects for old URLs
  • ☐ Parameter URLs handled via canonical tags and robots.txt
  • ☐ Schema.org markup for product variants

Monitoring

  • ☐ Regular duplicate content audits
  • ☐ Monitor Google Search Console
  • ☐ Optimize crawl budget
  • ☐ Track performance metrics
