Duplicate Content
What is Duplicate Content?
Duplicate content is identical or very similar content that is reachable under multiple URLs, either on one website or across different domains. In e-commerce it is a common problem that can cost rankings and waste crawl budget.
Google's Definition
Google defines duplicate content as "substantive blocks of content within or across domains that either completely match other content or are appreciably similar". This means that not only identical texts, but also very similar content can be considered duplicate content.
Common Causes in E-Commerce
1. Product Variants
Many online shops create separate pages for product variants with identical descriptions:
- Different colors (e.g. "T-Shirt Red", "T-Shirt Blue")
- Different sizes
- Different materials
- Different manufacturers with identical product descriptions
2. Manufacturer Descriptions
Direct adoption of manufacturer product descriptions leads to identical content on different shop pages.
3. Category Pages
Similar or identical category descriptions for related product categories.
4. URL Parameters
Different URLs show the same content:
product.html?color=red
product.html?color=blue
product.html?sort=price
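A minimal sketch of how such URLs can be grouped in a crawl export: stripping the query string shows which crawled URLs collapse onto the same page. The domain shop.example and the helper names are assumptions made for illustration, and the sketch treats every parameter as irrelevant to the content, which is not true for all shops.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_by_base_url(urls):
    """Map each query-free URL to the crawled URLs that share it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_query(url)].append(url)
    return dict(groups)


# Hypothetical crawl export
crawled = [
    "https://shop.example/product.html?color=red",
    "https://shop.example/product.html?color=blue",
    "https://shop.example/product.html?sort=price",
]
groups = group_by_base_url(crawled)
for base, variants in groups.items():
    print(base, "<-", len(variants), "URLs")
```

Any group with more than one entry is a candidate for a canonical tag or a redirect.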
Impact on SEO
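Negative Consequences
Ranking signals are split across the duplicate URLs, and crawl budget is spent on pages that add nothing new to the index.
Positive Aspects
Duplicate content does not automatically trigger a Google penalty. The usual result is suboptimal indexing and diluted rankings.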
Detecting Duplicate Content
1. Manual Tools
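Google Search Console
- "Pages" report (formerly "Coverage") → statuses such as "Duplicate without user-selected canonical"
- The older "HTML improvements" report with duplicate meta descriptions is no longer part of Search Console; use a crawler for that check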
Screaming Frog SEO Spider
- Crawl analysis for duplicate content
- Identical title tags and meta descriptions
- Similar content areas
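The check for identical title tags or meta descriptions can also be run on any crawl export. The following is a rough sketch, not how Screaming Frog works internally; the input dictionary and the function name are assumptions.

```python
from collections import defaultdict


def find_duplicates(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs whose text (title or meta description) is identical
    after lowercasing and collapsing whitespace."""
    by_text = defaultdict(list)
    for url, text in pages.items():
        key = " ".join(text.lower().split())
        by_text[key].append(url)
    # Keep only texts that occur on more than one URL
    return {text: sorted(urls) for text, urls in by_text.items() if len(urls) > 1}


# Hypothetical crawl export: URL path -> title tag
titles = {
    "/product-red": "T-Shirt Basic | Shop",
    "/product-blue": "t-shirt basic |  shop",
    "/product/hoodie": "Hoodie Classic | Shop",
}
duplicate_titles = find_duplicates(titles)
print(duplicate_titles)
```

The same function works for meta descriptions if the dictionary values are swapped accordingly.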
2. Automated Tools
Copyscape
- Online duplicate content checker
- Paid, but very accurate
- Also checks external domains
Siteliner
- Internal duplicate content analysis
- Similarity score
- Free and premium versions
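To give an idea of what a similarity score can measure, here is a small sketch based on word shingles and the Jaccard index. It is one common approach and an assumption for illustration; the tools listed above do not document their scoring in this article, so it should not be read as their algorithm.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard index of the two shingle sets: 0.0 (disjoint) to 1.0 (identical)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical variant descriptions that differ in one word
red = "High-quality cotton t-shirt in red with a round neck"
blue = "High-quality cotton t-shirt in blue with a round neck"
print(round(similarity(red, blue), 2))
```

Short texts react strongly to a single changed word; on full product descriptions a one-word difference leaves the score close to 1.0, which is the situation that calls for a canonical tag or a rewritten description.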
3. Google Search
Use site operator:
site:your-domain.com "identical text"
Search exact phrases:
"Product description text"
Solution Strategies
1. Canonical Tags
Self-referencing Canonicals
<link rel="canonical" href="https://shop.com/main-product-variant" />
Cross-Domain Canonicals
<link rel="canonical" href="https://original-shop.com/product" />
2. 301 Redirects
Merge product variants:
/product-red → /product (main variant)
/product-blue → /product (main variant)
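When several variant URLs are merged, redirect maps tend to grow chains over time. The sketch below flattens such a map so that every source points directly at its final target; the legacy URL /old-product-blue and the function names are hypothetical.

```python
def resolve(path: str, redirects: dict[str, str]) -> str:
    """Follow the redirect map until a path with no further redirect is reached."""
    seen = set()
    while path in redirects:
        if path in seen:
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
        path = redirects[path]
    return path


def flatten(redirects: dict[str, str]) -> dict[str, str]:
    """Return a map in which each source redirects straight to its final target."""
    return {source: resolve(source, redirects) for source in redirects}


redirects = {
    "/product-red": "/product",
    "/product-blue": "/product",
    "/old-product-blue": "/product-blue",  # hypothetical legacy URL, creates a chain
}
flat = flatten(redirects)
print(flat["/old-product-blue"])
```

The flattened map can then be exported to whatever redirect configuration the shop system uses.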
3. Parameter Handling
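Google retired the URL Parameters tool in Search Console in 2022, so parameters can no longer be configured there. Handle them on the site instead:
- For sorting and filters: canonical tag pointing to the parameter-free URL
- For session IDs and tracking parameters: keep them out of internal links and canonicalize or block the resulting URLs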
4. Content Differentiation
Unique product descriptions:
- Highlight specific product features
- Integrate customer reviews
- Describe usage scenarios
- Mention local availability
Best Practices for E-Commerce
1. Product Page Optimization
Create main product page:
- One URL for the main product
- Variants as parameters or dropdown
- Unique description for each variant
Example structure:
/product/t-shirt-basic
- Color: Red, Blue, Green (parameters)
- Size: S, M, L, XL (parameters)
- Material: Cotton, Polyester (parameters)
2. Category Page Differentiation
Unique category descriptions:
- Specific product features of the category
- Local availability
- Seasonal aspects
- Target group-specific content
3. Adapt Manufacturer Descriptions
Content adaptation:
- Use manufacturer text as basis
- Add own additions
- Integrate customer reviews
- Add usage tips
4. Optimize URL Structure
Clean URL hierarchy:
/category/subcategory/product-name
Avoid parameters:
❌ /product?id=123&color=red
✅ /product/t-shirt-basic-red
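A short sketch of how a descriptive path like the one above might be derived from product data. The function names and the /product/ prefix are assumptions for illustration; most shop systems ship their own slug logic.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    normalized = unicodedata.normalize("NFKD", text)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def product_path(name: str, color: str) -> str:
    """Build a parameter-free product URL path from name and color."""
    return f"/product/{slugify(name)}-{slugify(color)}"


print(product_path("T-Shirt Basic", "Red"))
```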
Technical Implementation
1. JSON-LD Markup
Product schema with variants (schema.org defines hasVariant on ProductGroup, so the parent is typed accordingly):
{
  "@context": "https://schema.org",
  "@type": "ProductGroup",
  "name": "T-Shirt Basic",
  "description": "High-quality cotton t-shirt",
  "hasVariant": [
    {
      "@type": "ProductModel",
      "name": "T-Shirt Basic - Red",
      "color": "Red"
    },
    {
      "@type": "ProductModel",
      "name": "T-Shirt Basic - Blue",
      "color": "Blue"
    }
  ]
}
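Markup of this kind is usually generated from product data rather than written by hand. The sketch below assumes schema.org's ProductGroup type as the container for hasVariant and wraps the result in a script tag; the function name and its parameters are hypothetical.

```python
import json


def product_group_jsonld(name: str, description: str, colors: list[str]) -> str:
    """Return a JSON-LD script tag describing a product and its color variants."""
    data = {
        "@context": "https://schema.org",
        "@type": "ProductGroup",  # assumption: container type for hasVariant
        "name": name,
        "description": description,
        "hasVariant": [
            {"@type": "ProductModel", "name": f"{name} - {color}", "color": color}
            for color in colors
        ],
    }
    # Escape "<" so product text can never close the script tag early
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


markup = product_group_jsonld(
    "T-Shirt Basic", "High-quality cotton t-shirt", ["Red", "Blue"]
)
print(markup)
```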
2. XML Sitemap
Exclude product variants:
- Only main product pages in sitemap
- No variant URLs that canonicalize to a main page
- No parameter URLs
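The selection rules above can be sketched as a small sitemap generator. It assumes a mapping from each crawled URL to its canonical URL is already available; that mapping, the domain shop.example, and the function name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_by_url: dict[str, str]) -> str:
    """Return sitemap XML containing only parameter-free, self-canonical URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, canonical in canonical_by_url.items():
        if "?" in url or url != canonical:
            continue  # skip parameter URLs and variants that point elsewhere
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


main = "https://shop.example/product/t-shirt-basic"
pages = {
    main: main,
    "https://shop.example/product-red": main,
    main + "?sort=price": main,
}
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```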
3. Robots.txt
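Optimize crawl budget:
User-agent: *
# Exclude parameter URLs
Disallow: /*?*
Disallow: /*&*
# Block session IDs
Disallow: /*sessionid=*
Keep in mind that a blocked URL is never crawled, so Google cannot read a canonical tag on it. Block a parameter URL or canonicalize it, not both.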
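Before deploying wildcard rules it helps to test which paths they would cover. The following sketch translates Disallow patterns into regular expressions using the two special characters supported in robots.txt patterns, "*" and a trailing "$". It is a simplification: Allow rules and longest-match precedence are ignored, and the function names are made up for this example.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the path start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any sequence of characters; everything else is literal
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules) -> bool:
    """Return True if any Disallow pattern matches the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


rules = ["/*?*", "/*&*", "/*sessionid=*"]
for path in ["/product/t-shirt-basic", "/product?id=123&color=red"]:
    print(path, "blocked" if is_blocked(path, rules) else "allowed")
```

Running the sample shows that the first pattern already covers every URL with a query string, so the other two rules add nothing unless the first one is narrowed.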
Monitoring and Control
1. Regular Audits
Monthly checks:
- Check Google Search Console for duplicate content
- Run Screaming Frog crawl
- Copyscape analysis for critical pages
2. Automated Monitoring
Set up tools:
- Google Alerts for own content
- Automated duplicate content checks
- Ranking monitoring for affected keywords
3. Performance Tracking
Monitor KPIs:
- Indexing rate
- Crawl budget distribution
- Ranking development
- SEO traffic
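Two of these KPIs reduce to simple ratios. The sketch below uses made-up numbers and hypothetical function names; the inputs would come from a Search Console export and from server logs filtered to search engine crawlers.

```python
def indexing_rate(indexed: int, submitted: int) -> float:
    """Share of submitted URLs that are indexed."""
    return indexed / submitted if submitted else 0.0


def parameter_crawl_share(crawled_paths) -> float:
    """Share of crawler requests that went to URLs with a query string."""
    paths = list(crawled_paths)
    if not paths:
        return 0.0
    return sum("?" in path for path in paths) / len(paths)


# Hypothetical figures for illustration only
print(indexing_rate(indexed=450, submitted=500))
print(parameter_crawl_share(["/a", "/a?sort=price", "/b", "/b?color=red"]))
```

A rising parameter share alongside a falling indexing rate suggests that crawl budget is going to duplicate URLs.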
Avoid Common Mistakes
❌ Wrong Canonical Implementation
Error:
<!-- Wrong: a variant page with duplicated content points to itself -->
<link rel="canonical" href="https://shop.com/product-variant" />
Correct:
<!-- Right: Canonical points to main variant -->
<link rel="canonical" href="https://shop.com/main-product-variant" />
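A canonical audit can be automated by extracting the tag from each page and comparing it with the expected main URL. This is a minimal sketch using only the standard library; the class and function names are assumptions, and a real audit would fetch the HTML from the live shop.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_of(html: str):
    """Return the canonical URL, or None if it is missing or ambiguous."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonicals[0] if len(parser.canonicals) == 1 else None


variant_html = (
    '<head><link rel="canonical" '
    'href="https://shop.com/main-product-variant" /></head>'
)
expected = "https://shop.com/main-product-variant"
print(canonical_of(variant_html) == expected)
```

Returning None for pages with several canonical tags is a deliberate choice here, since conflicting tags are themselves an error worth flagging.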
❌ Indexing Parameter URLs
Problem: Sorting and filter URLs are indexed
Solution: Point sorting and filter URLs to the parameter-free page with a canonical tag (the GSC URL Parameters tool was retired in 2022)
❌ Identical Meta Descriptions
Problem: Same meta descriptions for similar products
Solution: Unique descriptions with specific product features
Checklist: Avoid Duplicate Content
Content Strategy
- ☐ Unique product descriptions for each variant
- ☐ Adapt and expand manufacturer descriptions
- ☐ Differentiate category descriptions
- ☐ Integrate local and seasonal aspects
Technical Implementation
- ☐ Canonical tags correctly implemented
- ☐ 301 redirects for old URLs
- ☐ Parameter URLs handled via canonical tags or robots.txt
- ☐ Schema.org markup for product variants
Monitoring
- ☐ Regular duplicate content audits
- ☐ Monitor Google Search Console
- ☐ Optimize crawl budget
- ☐ Track performance metrics