Duplicate Content

Duplicate content (also abbreviated as DC) refers to web content that can be accessed in identical form under different URLs on the Internet.

Duplicate content, also known as “duplicated content,” is content that is very similar or completely identical across different pages or websites. Search engines like Google try to keep duplicates out of their results and may downgrade websites that carry too much duplicate content in their index. Pages with copied content can suffer ranking losses or even deindexation, especially if manipulation for SEO purposes is suspected.

Why is duplicate content bad?

Search engines rate duplicate content negatively because it offers the user no added value. Every duplicate page nevertheless has to be crawled and indexed, which consumes resources.

Because webmasters in the past often filled websites with duplicate content, partly for SEO purposes, Google began to take action against content that is used more than once. With algorithm changes such as the Panda update, the search engine provider ensured that pages with duplicate content were ranked lower.

What helps against duplicate content?

Duplicate content usually does not lead to an immediate downgrade by the search engine. However, since there is a risk that it will be rated negatively and no longer indexed, website operators should take the following measures to avoid it:

301 redirects

A redirect with status code 301 (permanent redirect) always leads both the search engine and the reader to the desired page and skips outdated content. If, for example, a page is completely replaced by another page with a different URL, as often happens in a relaunch, a 301 redirect is a good solution. Instead of two pages with identical content, there is only one: visitors who request the old URL are taken directly to the new, matching page.

Google sees this redirection as unproblematic. However, to make it as user-friendly as possible, webmasters should only redirect to pages that are an appropriate replacement for the original page.
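
The redirect described above can be sketched with Python's standard library. This is a minimal illustration, not a production setup: the paths in REDIRECTS, the helper resolve_redirect and the function serve are hypothetical names chosen for this example, and a real site would normally configure redirects in its web server or CMS.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from retired paths to their replacements.
REDIRECTS = {
    "/old-page": "/new-page",
    "/products/2014": "/products",
}


def resolve_redirect(path):
    """Return the replacement path for a retired path, or None."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = resolve_redirect(self.path)
        if target is not None:
            # 301 tells browsers and crawlers that the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        # Every other path is answered with a placeholder page.
        body = b"Current page\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start a local test server (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling serve() and requesting /old-page should answer with status 301 and a Location header pointing to /new-page, so only the new URL serves content.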

Ensure the use of correct URLs

Consistent URLs are particularly important for preventing duplicate content. Google itself advises using web addresses consistently, that is, always linking to a single version of a URL: www.example.com/name or www.example.com/name/ or www.example.com/name/index.htm, but not a mixture of them.

Website operators should also settle on one preferred address for the site, for example http://www.example.com or http://example.com. Google Webmaster Tools (now Google Search Console) used to offer a setting for this; today the preferred version is signaled mainly through redirects and the canonical tag (see below).

Google also advises using country-specific top-level domains to make clear which audience the content is intended for. For example, webmasters should prefer www.beispiel.de to URLs such as de.beispiel.com.

Many content management and tracking systems can inadvertently produce duplicate content by generating additional URLs for the same page. Through pagination or the creation of archives, the CMS may change the URL of a page (for example, example.com/text/022015 instead of example.com/text), so that the same page exists under different URLs. The same applies to automatically generated tracking parameters that are appended to the original URL. If the search engine does not recognize these parameters correctly, it may treat the tracked address as a new URL and count the page twice. Webmasters and SEO experts should therefore check their CMS and analytics system for these weak points.
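
One way to check a site for such URL variants is to reduce every address to a single normalized form and compare the results. The following Python sketch is only an illustration under stated assumptions: the preferred host, the list of tracking parameters, the index file names and the choice of "no trailing slash" are assumptions made for this example, and normalize_url is a hypothetical helper, not part of any tool.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for this example; adapt them to the site in question.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com"}
INDEX_FILES = ("index.htm", "index.html")
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign",
    "utm_term", "utm_content", "gclid", "fbclid",
}


def normalize_url(url):
    """Map the known variants of a URL to one consistent form."""
    parts = urlsplit(url)

    # Use one host name for the whole site.
    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST

    # /name/index.htm and /name/ are treated as /name.
    path = parts.path
    for index_file in INDEX_FILES:
        if path.endswith("/" + index_file):
            path = path[: -len(index_file)]
            break
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    if not path:
        path = "/"

    # Drop tracking parameters, keep everything else in its original order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]

    # The fragment (#...) is discarded because it is not sent to the server.
    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))


print(normalize_url("http://example.com/name/index.htm?utm_source=newsletter"))
# prints http://www.example.com/name
```

If two different crawled addresses normalize to the same string, they are candidates for a 301 redirect or a canonical tag.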

Minimize duplicate content

Website operators should avoid duplicate content as much as possible and produce unique content. On many pages, individual text modules must or should be reused, and occasionally even the duplication of complete pages cannot be ruled out. Webmasters should keep this to a minimum and, where duplication is unavoidable, tell the search engine via a canonical link element in the HTML code that a page with the same content already exists.

In addition to self-generated duplicate content, duplicates can also arise on other websites, either because a website operator passes on or sells its content to several websites or because other websites use the content without permission. In both cases, once the duplication is known, website operators should ask the operator of the other site to mark the copied content with a backlink to the original or with the noindex tag. This way, the search engine can recognize which version is the original and which content it should index.

Use canonical/hreflang/noindex tag or robots.txt disallow

Certain forms of duplicate content can be prevented with the help of various tags in the source code. The canonical tag in the <head> area, for example, signals to Google that it should index the page to which the tag points. The crawler should then disregard the copy, that is, the page in which the tag is embedded. Google treats the tag as a strong hint rather than a binding instruction.
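
To check which page a copy declares as its original, the canonical tag can be read from the HTML source. The sketch below uses Python's built-in HTML parser; the class CanonicalFinder, the function find_canonical and the sample markup are hypothetical and serve only as an illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Sample markup of a copy (for instance a print version) that points
# to the page that should be indexed.
SAMPLE = """<html><head>
<title>Name (print version)</title>
<link rel="canonical" href="http://www.example.com/name">
</head><body>...</body></html>"""

print(find_canonical(SAMPLE))  # prints http://www.example.com/name
```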

The noindex meta tag is used to tell the search engine that it should crawl the page but refrain from indexing it. Unlike the disallow entry in robots.txt, the webmaster thus allows the Googlebot to crawl the page and its content.
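
Whether a page carries the noindex instruction can be verified in the same way. This is a rough sketch that only looks at <meta name="robots"> elements; the names RobotsMetaParser and is_noindex are hypothetical, and crawler-specific meta tags or HTTP headers are deliberately left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of all <meta name="robots"> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def is_noindex(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindex(page))  # prints True
```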

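Disallow can be used in the robots.txt file to keep crawlers away from entire pages, page types or content types. The robots.txt is a file that regulates which content the crawler of a search engine may fetch and which it may not; a Disallow entry says that the crawler has no access to the defined content. Blocking crawling does not reliably prevent indexing, however: a blocked URL can still appear in the index without its content if other pages link to it, and the crawler cannot see a canonical or noindex tag on a page it is not allowed to fetch.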
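
The effect of Disallow entries can be tested with the robots.txt parser from Python's standard library. The rules and URLs below are invented for this example, and build_parser is a hypothetical helper. Note that the standard-library parser matches plain path prefixes and may not support every extension that individual search engines understand.

```python
from urllib.robotparser import RobotFileParser

# Example rules: keep all crawlers away from print versions and
# internal search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /search
"""


def build_parser(robots_txt):
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
print(rules.can_fetch("Googlebot", "http://www.example.com/print/name"))  # prints False
print(rules.can_fetch("Googlebot", "http://www.example.com/name"))        # prints True
```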

The hreflang attribute can be used to indicate to search engines that a page is a language or regional variant of another page. For example, if a site exists under both .co.uk for the UK market and .com for the US market, hreflang signals that each page is the counterpart of the other, which keeps the search engine from rating the pages as duplicate content.
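
The markup for such a pair of pages could be generated as follows. This is a sketch with assumed values: the URLs, the language codes en-gb and en-us and the function hreflang_links are chosen for illustration only. Each variant should carry the complete set of links, including one that points to itself.

```python
from html import escape


def hreflang_links(variants):
    """Build one <link rel="alternate"> line per language/region variant."""
    return "\n".join(
        '<link rel="alternate" hreflang="{}" href="{}">'.format(
            escape(code, quote=True), escape(url, quote=True)
        )
        for code, url in variants.items()
    )


# Assumed variants of the same page for the UK and the US market.
VARIANTS = {
    "en-gb": "http://www.example.co.uk/name",
    "en-us": "http://www.example.com/name",
}

# The same block belongs in the <head> of both pages.
print(hreflang_links(VARIANTS))
```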

Duplicate content can become a problem for webmasters and SEO experts because search engines are reluctant to spend resources on it. At the same time, Google wants to provide its users with unique content. As a result, duplicate content can be rated negatively and, in the worst case, the page can be downgraded in the ranking or, if manipulation is suspected, even deindexed. Website operators have various options for preventing or eliminating duplicate content, including clean redirects, tags in the source code and unique texts.

