
Duplicate content is an often overlooked SEO issue that can quietly hurt your rankings and traffic. To help you avoid that fate, in this post I’ll walk you through what duplicate content is, how it happens, how to identify it on your website, and the exact steps to clean it up, including duplication issues caused by translating your website.
What’s Duplicate Content (And Why Does It Matter)?
Duplicate content refers to blocks of text that appear in more than one place on the internet in identical or near-identical form. This can be either on multiple pages of your own site (internal duplicate content) or across several domains (external duplicate content).
It’s a problem for SEO because search engines like Google want to rank distinct content. If their top search results all contained the same text, it wouldn’t be very helpful for their users, would it?
Therefore, when the same content shows up at multiple URLs, search engines may struggle to decide which version to index and rank. As a consequence, they may choose none of them.
That means, if you have pages with duplicate content on your site, they may hinder each other’s search visibility, leading to lower rankings, fewer indexed pages, and reduced authority. Backlinks to your site may also go to the page version you’re not trying to rank, diluting their positive influence. Plus, internal duplication also wastes crawl budget that could go to more important pages.
Much of that applies to external duplicated content as well. Here, you have the added risk that stolen content may end up ranking higher than your original. Or, if you copy content from someone else, it can hurt your website as a whole. In rare cases, Google may even penalize websites that copy and use content in a deceptive manner.
Where Does Duplicate Content Come From?
Duplicated content problems can arise for many reasons, many of them technical.
Using Different Domains for Your Website
One common cause is if your site is accessible through more than one domain. Google considers all of the following separate websites:
- http://example.com/
- http://www.example.com/
- https://example.com/
- https://www.example.com/
So, if you don’t clearly specify whether your site uses “www” and whether it runs on HTTP or HTTPS, you essentially create different site versions containing identical content.
This also applies if you use a separate subdomain for the mobile version of your site like so:
- https://example.com/
- https://m.example.com/
The same is true of staging and development sites that you might accidentally leave discoverable for search engines and that may live at separate URLs such as:
- https://dev.example.com/
- https://staging.example.com/
Including URL Parameters
Another factor is URL parameters, which e-commerce websites often use to filter product variations:
- http://example.com/shoes/
- http://example.com/shoes/?color=red
These, too, if set up to be indexed, can show similar content at different addresses.
The same applies if you allow search engines to index the result pages of your website’s internal search.
Order matters here as well. Consider these two addresses:
- http://example.com/shoes/?color=red&size=xl
- http://example.com/shoes/?size=xl&color=red
These will display the same products but represent distinct URLs for search engines.
Similar cases that lead to slightly different URLs with duplicate content are printer-friendly versions of pages, session IDs, or UTM codes.
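One way to keep parameter order and tracking codes from producing separate addresses is to normalize URLs before you link to them or compare crawl results. Here is a minimal Python sketch of that idea, not a definitive implementation. The `normalize_url` helper and the `IGNORED_PARAMS` list (including the `sessionid` name) are assumptions made for illustration, so adjust them to the parameters your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never change what the page shows.
IGNORED_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "sessionid",
}


def normalize_url(url):
    """Drop ignored parameters and sort the rest so order no longer matters."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    params.sort()
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


a = normalize_url("http://example.com/shoes/?color=red&size=xl")
b = normalize_url("http://example.com/shoes/?size=xl&color=red&utm_source=newsletter")
print(a == b)  # True
```

Both example addresses collapse to the same string, which is the behavior you want from internal links and sitemaps.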
Content Duplication Caused by CMS
CMS platforms sometimes generate duplicate pages through archives. WordPress creates a great number of them for categories, tags, authors, and dates. But if you only have one author publishing on your site, your blog page and the author archive will contain the exact same posts.
Pagination can also be an issue where content is split into several pages with separate URLs. For example, in WordPress, there’s an option for comment pagination that creates URLs like this:
- https://example.com/sample-post/
- https://example.com/sample-post/comment-page-2/
- https://example.com/sample-post/comment-page-3/
The post body will be the same on each page; only the comments will differ.
Manually Copying Content
Of course, it’s also possible to create duplicate content by hand—accidentally or on purpose.
For example, you may clone a post to update it and publish it separately by accident. Or, you engage in content syndication, meaning publishing the same article in different places for exposure.
Many bloggers re-post their articles on Medium to get access to its audience. If not marked properly, this may appear as duplicate content to search engines.
But it can also be much more harmless, like reusing boilerplate product descriptions provided by the manufacturer or re-using the same text in different business directories.
Another option is that you have service pages for different locations because your business operates in more than one area. Your services and offers stay the same, only the location mentioned on the page changes, which leaves the rest of the content very similar.
Finally, as mentioned, sometimes people simply copy someone else’s content and post it on their own site (not that you would ever engage in that).
Incorrectly Translating Web Content
As a last factor, multilingual websites are especially at risk for duplicate content. Translating your site basically means posting the same content on multiple pages, just in a different language.
This is compounded if you’re catering to different markets that share a language, such as Spain, Mexico, and Argentina. In that case, you may have three pages with the same content:
- http://www.example.com/es/
- http://www.example.com/mx/
- http://www.example.com/ar/
If you don’t mark them as targeted at different regions, Google may treat them as duplicates and show the wrong version to searchers in a given country.
How to Find Repeated Content on Your Site
The first step to fixing identical content on your site is to find out whether there is any and where it’s hiding.
A simple method for that is to compare the number of pages you created to the number of pages in Google’s index. For that, consult Indexing > Pages in Google Search Console.
If there are way more pages in the index, they are being added from somewhere. Search Console also shows duplicate content issues for pages it hasn’t indexed in case there are any.
Outside of that, you can use a duplicate content checker like Siteliner. The tool scans your site and highlights internal duplicates quickly.
Another option is Screaming Frog SEO Spider. It can crawl your site and flag duplicate titles, meta descriptions, and content blocks.
SEO tools like Semrush and Ahrefs will also tell you of similar problems.
To find external duplicate content, use Copyscape or Duplichecker.
Alternatively, copy distinct phrases from your popular content and input them into Google with quotation marks around them. This will surface pages in the index that use identical phrasing.
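If you want a rough idea of how such checkers compare pages, the following Python sketch measures the overlap between two texts using word shingles and Jaccard similarity. It is only an illustration under my own assumptions: I don't know how the tools named above work internally, and the `shingles` and `similarity` helpers as well as the sample texts are made up for this example.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given length."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical location pages that differ only in the city name.
page_a = "We offer plumbing services in Austin with same day appointments"
page_b = "We offer plumbing services in Dallas with same day appointments"
print(round(similarity(page_a, page_b), 2))  # 0.45
```

A score of 1.0 means the texts share every shingle, while 0.0 means they share none. Longer pages that differ by a single word score much closer to 1.0 than this short example does.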
7 Ways to Fix Duplicate Content Issues
Once you’ve identified problematic pages, there are several proven ways to address the issue. The right solution depends on what’s causing it in the first place.
1. Stop Producing It
If there is a clear underlying root cause to your site’s SEO issues, the most logical step is to simply address it:
- Settle on one website domain format and redirect the others (more on that next)
- Use responsive design for mobile users instead of a separate domain
- Make sure your staging and development websites aren’t discoverable and indexable
- Disable comment pagination in WordPress
- Switch off certain archives on your site and/or display excerpts instead of full posts
- Make sure URL parameters are always in the same format
- Remove session IDs from your URLs and use cookies instead
- Get rid of printer-friendly pages and switch to a print style sheet
2. Use 301 Redirects
A redirect tells browsers and search engines that a page has moved to a new URL. “301” refers to redirects that tell search spiders that the move is permanent (as opposed to 302 redirects, which denote only a temporary move). Permanent redirects pass nearly all of the original page’s SEO value to the new location, helping preserve rankings and traffic.
Redirects are best for duplicate content where you don’t need to keep one version, for example, if you’re switching from the www to non-www version of your site (or vice versa) or from HTTP to HTTPS.
Aside from that, use 301 redirects to fix duplicate content when two pages serve the same purpose and you only want to keep one live, for example:
- https://example.com/about-us/
- https://example.com/about/
If both are indexed, choose one, then delete the other and redirect its URL to the page you keep. You can do the same to clean up outdated URLs or consolidate multiple similar pages into one strong, canonical page.
Redirects are set up at the server level, e.g. via .htaccess on Apache or in the NGINX config. In .htaccess, they look like this:
Redirect 301 /about-us https://example.com/about
WordPress users can use a plugin like Redirection. Most SEO plugins also include features to set up redirects.
Avoid redirect chains and internal links to redirected URLs. Plus, always test your redirects with the URL inspection tool in Search Console to ensure they’re working as expected.
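To spot redirect chains before they go live, you can check your redirect rules offline. The Python sketch below assumes you have exported your rules into a simple source-to-target mapping. The `resolve` and `find_chains` helpers and the `/about-company/` path are hypothetical and only serve as an illustration.

```python
def resolve(path, redirects):
    """Follow redirects from path and return every hop, starting with path."""
    hops = [path]
    while hops[-1] in redirects:
        target = redirects[hops[-1]]
        if target in hops:
            raise ValueError("redirect loop: " + " -> ".join(hops + [target]))
        hops.append(target)
    return hops


def find_chains(redirects):
    """Return sources that need more than one hop to reach their final URL."""
    chains = {}
    for source in redirects:
        hops = resolve(source, redirects)
        if len(hops) > 2:
            chains[source] = hops
    return chains


# Hypothetical rules: /about-us/ reaches /about/ only via a second redirect.
redirects = {
    "/about-us/": "/about-company/",
    "/about-company/": "/about/",
    "/team/": "/about/",
}
print(find_chains(redirects))
# {'/about-us/': ['/about-us/', '/about-company/', '/about/']}
```

Every source the function reports should point straight at the last URL in its list instead.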
3. Set Up Canonical Tags
A canonical tag tells search engines which version of a page is the “master copy” you would like it to index and rank. This is essential when you have multiple URLs that show the same or very similar content but where you can’t get rid of the duplicates, e.g. in the case of URL parameters or pagination.
Canonical tags look like this:
<link rel="canonical" href="http://yourdomain/page-you-want-indexed">
They belong in the <head> section of both the page you want to rank and its duplicates. Every duplicate should point to the main page, which should also point to itself (this is called self-referencing).
In WordPress, SEO plugins like Yoast and Rank Math make it easy to set canonical URLs for each page.
Like redirects, verify your canonicals using Google Search Console after implementation.
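For a quick local check in addition to Search Console, you can read the canonical tag straight from a page's HTML. This Python sketch uses only the standard library. The `CanonicalFinder` class and `canonical_urls` helper are names I made up for illustration, and the sample markup is hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_urls(html):
    """Return all canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical markup of a filtered product page pointing to the main page.
html = """
<html><head>
<link rel="canonical" href="https://example.com/shoes/">
</head><body>Red shoes</body></html>
"""
print(canonical_urls(html))  # ['https://example.com/shoes/']
```

A page should return exactly one URL here. An empty list means the tag is missing, and more than one entry means conflicting canonicals.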
4. Take Advantage of Noindex Tags
A noindex tag tells search engines not to include a specific page in their index, even if they can crawl it. This is useful when a page doesn’t provide unique value and could trigger a duplicate content issue, like tag archives. Noindex tags look like this:
<meta name="robots" content="noindex">
They, again, belong in the <head> section of the page. You can also send the same directive as an X-Robots-Tag HTTP response header. Alternatively, most SEO plugins allow you to set individual pages or entire content types to noindex without touching code.
You can also use it together with canonical tags to guide search engines toward your most important content. In addition, set pages to noindex instead of blocking search engine crawlers from accessing them via robots.txt, because crawlers can only obey a noindex directive on pages they are allowed to fetch.
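To confirm that a page really carries the directive, you can look at both places where it may appear: the robots meta tag and the X-Robots-Tag response header. The following Python sketch shows one way to do that. The `is_noindex` helper is hypothetical, it expects the HTML and headers you have already fetched, and it ignores crawler-specific variants such as a header value prefixed with a bot name.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(part.strip().lower() for part in content.split(","))


def is_noindex(html, headers=None):
    """Return True if the meta tag or the X-Robots-Tag header says noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    header_directives = set()
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_directives.update(part.strip().lower() for part in value.split(","))
    return "noindex" in finder.directives or "noindex" in header_directives


print(is_noindex('<head><meta name="robots" content="noindex"></head>'))  # True
print(is_noindex("<head></head>", {"X-Robots-Tag": "noindex, nofollow"}))  # True
print(is_noindex("<head></head>"))  # False
```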
5. Rewrite Content
When two or more pages are similar in topic but need to stay online, the best solution is to update their content to make each page unique. A typical case is when duplicate content arises from similar service or product pages.
Rewriting doesn’t mean changing every word—focus on adding unique value, perspective, or examples to differentiate the pages. Use clear, original headings and tailor the content to specific use cases, locations, or audiences if possible.
In short, create original, fresh and authoritative content.
If rewriting isn’t possible and both pages offer little value, consider merging them into one stronger resource. Don’t forget to redirect!
6. Add Hreflang Tags to Translated Pages
The solution to duplicate content due to different language versions is hreflang tags. They tell search crawlers which language and geographical region a web page is meant for. You can place them in the <head> section of your pages, in HTTP headers, or in an XML sitemap, and they look like this:
<link rel="alternate" href="https://example.com/en" hreflang="en-us" />
An hreflang generator makes them easier to create.
Hreflang tags can reference both a language and locale (though just a language is enough). They allow you to signal to search engines that pages with the same translated content are aimed at particular regions:
<link rel="alternate" href="https://example.com/ar" hreflang="es-ar" />
<link rel="alternate" href="https://example.com/mx" hreflang="es-mx" />
This works across domains, if you have a separate one for different language versions:
<link rel="alternate" href="https://example.de" hreflang="de-de" />
<link rel="alternate" href="https://example.br" hreflang="pt-br" />
All available language versions need to have hreflang links to every available alternative, including themselves. Otherwise Google may ignore them.
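Because every version has to list all alternatives including itself, the simplest approach is to build one set of tags and output it on every language version. Here is a minimal Python sketch of that idea. The `hreflang_tags` helper is hypothetical, and the `es-es` code for the Spanish page is my assumption based on the URLs used earlier.

```python
from html import escape


def hreflang_tags(versions):
    """Build one <link> per language version from a code-to-URL mapping."""
    lines = [
        '<link rel="alternate" href="{}" hreflang="{}" />'.format(
            escape(url, quote=True), escape(code, quote=True)
        )
        for code, url in sorted(versions.items())
    ]
    return "\n".join(lines)


# Assumed mapping of hreflang codes to the regional Spanish pages.
versions = {
    "es-es": "https://example.com/es/",
    "es-mx": "https://example.com/mx/",
    "es-ar": "https://example.com/ar/",
}
block = hreflang_tags(versions)
print(block)
```

Placing the identical block in the head of all three pages gives each page a link to every alternative as well as to itself.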
7. Tackle External Copied Content
Finally, what do you do with external duplicate content that someone else stole and posted on their site?
Google is generally quite good at figuring out the original source of content but not always. If this is an issue for you, the first step is to consider adding a snippet to your RSS feed that links back to your site.
Scrapers often use RSS to automatically harvest content. When you add a link, they will scrape the link as well, letting Google know where the original is. It’s not foolproof but easy to do and can have a big impact. Yoast SEO makes it easy to do this.
As a next step, you can contact the owners of websites using your content and ask them to take it down. This can be enough because people want to avoid trouble.
If not, your last resort is to issue a DMCA (Digital Millennium Copyright Act) takedown through Google’s content report tool.
It will take Google a while to process the request, but if successful, it will remove the duplicated content from search results.
Avoid Duplicate Multilingual Content With TranslatePress
Are you looking for a way to streamline translating your WordPress site without duplicate content issues? Here’s how TranslatePress helps you do so.
Automatically Implement Hreflang Tags
As mentioned, your most important tool to avoid duplicate content on multilingual websites is hreflang tags. TranslatePress implements them for you automatically. All you have to do is configure your site’s default and target languages under Settings → TranslatePress.
Save your choices and TranslatePress adds all relevant hreflang tags to the HTML of every language version of your web pages as well as your XML sitemap.
The plugin can differentiate between regional versions of languages, such as Brazilian Portuguese, Mexican Spanish, or Swiss French and includes both the language and locale in the tags.
Translate Your Website With AI
In addition, you can translate your website quickly using TranslatePress AI. It’s part of every TranslatePress license, and you can use it with credits available through your account.
With the default and at least one target language in place, head over to Automatic Translation. Use the drop-down menu to enable machine translation, then save your changes at the bottom.
TranslatePress will immediately start converting your website to your desired language(s). By the time you go to your site’s front end and use the language switcher, the translation is probably already complete.
TranslatePress AI uses several sources for its machine translation and automatically picks the most accurate for your language pair. It also translates all WordPress content, including text coming from plugins and themes.
Use Additional Translation Options
If you want, you can also use Google Translate or DeepL as your translation engine. For that, change the setting in Automatic Translation under Alternative Engines. Note that you may have to set up an API and that DeepL is reserved for TranslatePress Pro.
Aside from that, you have the option to refine translations by hand. For that, just click on Translate Site/Page in the TranslatePress settings or the WordPress admin bar.
In the translation interface, pick content from the preview window and enter or edit the translation in the dedicated field.
You can use the same process to translate your entire website manually as well as set up translated or localized images.
Additional TranslatePress Features
Besides automatic hreflang tags, TranslatePress helps you further optimize your multi-language website for search engines by creating multilingual sitemaps. It also works with most of the popular WordPress SEO plugins.
The Pro version comes with the SEO pack that lets you convert your page URLs, SEO titles, meta descriptions, and other important SEO markers to other languages. It also offers many more useful features such as automatic user language detection, translator accounts, and navigation based on language.
You can start off with the free version of TranslatePress for one additional language. TranslatePress Pro offers three pricing tiers so you can find the right option for you.
Solve Duplicate Content Issues on Your Multilingual Site
Duplicate content can seriously impact your website’s SEO performance, but with the right approach, it’s entirely manageable.
The key is to first identify duplicate content on your site, understand what’s causing it, and then apply the appropriate solution. There are many tools at your disposal.
Duplicated content is also something a lot of people are afraid of before going multilingual. Our support team gets a lot of pre-sale questions on this topic from people trying to decide on the translation plugin they want to go with.
But, as shown above, if this is one of your concerns, TranslatePress is your safest pick.