Google can select a different canonical URL than the one you specified. You may notice this in Search Console (formerly Webmaster Tools).
Why did Google ignore your web page's canonical URL?
What is the canonical tag?
A canonical URL, according to Google, is the URL of the page on your site that Google believes is the most representative of a group of duplicate pages.
The pages don’t have to be exactly alike; minor changes in the sorting or filtering of list pages don’t make a page unique (for example, sorting by price or filtering by item color).
A duplicate URL does not have to be on the same domain as the canonical URL; the canonical can be on a different domain.
Google determines the canonical page based on a variety of factors (or signals), including the page’s quality, whether it is served over HTTP or HTTPS, the URL’s inclusion in a sitemap, and any rel=canonical tags.
Each duplicate page gets a <link rel="canonical"> tag in its HTML code that points search engines to the canonical page. The tag has to be added to every duplicate, however many there are, so that their link value is consolidated on the canonical page.
Sending the canonical as an HTTP response header is another way to give the same signal. This has the advantage of not increasing the size of the page.
The drawback is that on larger websites, or on websites where URLs change frequently, it can be difficult to keep the mapping up to date.
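To make the two options concrete, here is a minimal sketch of what each one produces. The function names and the example URL are hypothetical and only for illustration; how you attach the header depends on your server or framework.

```python
from html import escape


def canonical_link_tag(url: str) -> str:
    """Build the element that goes in the <head> of each duplicate page."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def canonical_link_header(url: str) -> tuple[str, str]:
    """Build the (name, value) pair of the equivalent HTTP response header."""
    return ("Link", f'<{url}>; rel="canonical"')


# Hypothetical canonical URL for a product page with several filtered duplicates.
canonical = "https://www.example.com/dresses/green-dress"

print(canonical_link_tag(canonical))
print(": ".join(canonical_link_header(canonical)))
```

The header variant is mostly useful for non-HTML files such as PDFs, where there is no <head> to put a tag in.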
Even though you can tell Google which page you prefer, it may select a different page as the canonical for a variety of reasons.
Why did Google disregard the canonical tag you used?
When you see the status "Duplicate, Google chose different canonical than user" in Google Search Console, it means that Google has ignored the rel=canonical preference you set as the user or webmaster. The status "Alternate page with proper canonical tag", by contrast, means that Google honored your canonical.
Regardless of the preference stated via the canonical tag placed on the page, there are a number of signals that can affect Google’s decision to select one page as the canonical over the other.
These signals fall into three categories: technical factors, page performance, and content relevance.
Let’s look at a few of them.
Google prefers HTTPS pages over HTTP pages as canonicals.
If one version of a page is served over HTTPS, it will be prioritized over the other versions of the URL unless there are conflicting signals (e.g. an invalid SSL certificate, insecure dependencies, a redirect to an HTTP page, or an existing canonical tag pointing to an HTTP page).
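Some of these conflicts can be spotted from URLs alone. The sketch below is a simplified check under that assumption: the function name and its inputs are hypothetical, and it does not inspect certificates or insecure dependencies, which need a real crawl.

```python
from urllib.parse import urlsplit


def https_conflicts(page_url, declared_canonical=None, redirect_chain=()):
    """List URL-level reasons an HTTPS page might lose out as canonical."""
    issues = []
    if urlsplit(page_url).scheme != "https":
        issues.append("page is not served over HTTPS")
    if declared_canonical and urlsplit(declared_canonical).scheme == "http":
        issues.append("rel=canonical points to an HTTP URL")
    # redirect_chain is the list of URLs the page redirects through, if any.
    if any(urlsplit(hop).scheme == "http" for hop in redirect_chain):
        issues.append("redirect chain passes through an HTTP URL")
    return issues


print(https_conflicts(
    "https://www.example.com/shoes",
    declared_canonical="http://www.example.com/shoes",
))
```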
Your sitemap does not contain a reference to your canonical page.
It’s not ideal if your sitemap is out of date and contains a lot of broken links.
What’s worse is having a page that you’ve designated as canonical but that isn’t listed in the sitemap. The worst case is a sitemap that lists duplicates of this page which point to the preferred canonical page but don’t actually exist.
All of this gives Googlebot a lot of uncertain and conflicting information, which is why it decides for itself.
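A quick way to catch the first problem is to compare the canonicals you have declared against the URLs in your sitemap. This is a minimal sketch that assumes a standard, non-index sitemap file; the sample sitemap, the URLs, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def canonicals_missing_from_sitemap(sitemap_xml, canonicals):
    """Return declared canonical URLs that the sitemap does not list."""
    return sorted(set(canonicals) - sitemap_urls(sitemap_xml))


# Hypothetical sitemap that lists a sorted duplicate instead of the canonical.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/shoes?sort=price</loc></url>
</urlset>"""

declared = ["https://www.example.com/", "https://www.example.com/shoes"]
print(canonicals_missing_from_sitemap(SAMPLE_SITEMAP, declared))
```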
You’ve used a canonical wildcard.
On sites with many subdomains (for example, where the www. host variant is left undefined), the implemented canonical tag can end up acting as a wildcard in its structure, which causes Google to ignore it.
Also avoid serving an SSL/TLS certificate for the wrong host variant, for example example.com serving a certificate issued for www.example.com. The certificate must either match your complete site URL or be a wildcard certificate that covers multiple subdomains of a domain.
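The host-variant rule is easy to get wrong, so here is a simplified sketch of how a single certificate name is matched against a host. It assumes the common convention that a wildcard covers exactly one leftmost label; real certificates usually list several names, and the function name is hypothetical.

```python
def certificate_covers(cert_name: str, host: str) -> bool:
    """Check whether one certificate name covers the given host name."""
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = host.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        # A wildcard stands in for exactly one label, the leftmost one.
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


for name, host in [
    ("www.example.com", "example.com"),
    ("*.example.com", "www.example.com"),
    ("*.example.com", "example.com"),
]:
    print(name, host, certificate_covers(name, host))
```

Note that under this convention a wildcard for *.example.com does not cover the bare example.com, so the bare domain still needs its own entry on the certificate.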
You haven’t told Google to ignore dynamic parameters in JS-intensive URLs.
If the URLs contain dynamic parameters, indicating that they should be disregarded can help with duplicate content problems and URL canonicalization mismatches.
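On your own side, one way to act on this is to compute a parameter-free URL and use it as the canonical target for every variant. The sketch below assumes a hand-maintained list of parameters that never change the page content; the list and the function name are hypothetical and need adapting to your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters that only sort, filter, or track.
IGNORED_PARAMS = {"sort", "color", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def strip_ignored_params(url: str, ignored=IGNORED_PARAMS) -> str:
    """Drop ignorable query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(strip_ignored_params("https://www.example.com/shoes?sort=price&page=2&utm_source=news"))
```

Parameters that do change the content, such as page in the example, are kept, so paginated URLs are not collapsed into one.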
Google has opted for a page that better matches the user’s intent.
In some cases, Google may decide that a page that is more relevant to the user’s search terms is the one that should be indexed and shown in search results.
The technical requirements of the user are not met by the canonical page.
Google will display the mobile version of a website if the majority of visitors access it through mobile devices and the website delivers pages in separate mobile, desktop, and AMP versions (even if the desktop one is marked as canonical).
You canonicalized a page that is set to noindex.
Noindex and rel=canonical should never be combined because they give search engines very different information. The rel=canonical tag will typically be chosen and used instead of the noindex, but you are then relying on a computer script’s interpretation, which diminishes the weight of your input (and SEO is mostly about informing computer scripts of your preferences).
On a page that carries both, the self-referencing canonical tells search engines that this is the preferred version of the page, while the robots noindex tag instructs them not to index it. But these pages are not the real problem.
In more extreme cases, you might have pages that are noindexed themselves but are designated as the canonical for other pages. The canonical instructions then send contradictory signals to the search engines: Page A is essentially pleading with them to index Page B, while Page B is pleading with them to ignore it.
In this situation the canonical configuration is completely unusable.
To fix this, first determine whether Page B is in fact the right canonical for Page A. If it is, remove the noindex directive from Page B. If Page B should indeed be noindexed, change the canonical tag on Page A so that it either is self-referential or points to another URL.
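If you already have the HTML of your pages, both conflicts described above can be detected with a small script. This is a minimal sketch using only the standard library; it assumes the noindex is set through a robots meta tag (not an X-Robots-Tag header) and that canonical URLs are absolute. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the rel=canonical target and the noindex flag of one page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = {key: (value or "") for key, value in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None
        elif tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def read_signals(html: str):
    parser = HeadSignals()
    parser.feed(html)
    return parser.canonical, parser.noindex


def canonical_conflicts(pages: dict) -> list:
    """pages maps URL -> HTML; returns (url, canonical, problem) tuples."""
    signals = {url: read_signals(html) for url, html in pages.items()}
    problems = []
    for url, (canonical, noindex) in signals.items():
        if canonical is None:
            continue
        if canonical == url and noindex:
            problems.append((url, canonical, "self-canonical page is noindexed"))
        elif canonical != url and signals.get(canonical, (None, False))[1]:
            problems.append((url, canonical, "canonical target is noindexed"))
    return problems


# Hypothetical Page A and Page B from the scenario above.
pages = {
    "https://example.com/a": '<head><link rel="canonical" href="https://example.com/b"></head>',
    "https://example.com/b": '<head><meta name="robots" content="noindex">'
                             '<link rel="canonical" href="https://example.com/b"></head>',
}
for problem in canonical_conflicts(pages):
    print(problem)
```

Canonical targets that are not part of the crawled set are skipped, since their noindex state is unknown.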
Check our Sitebulb tutorial to learn how to resolve this problem. Screaming Frog also allows for quick diagnoses (feel free to ask me how in the comments).
These kinds of serious crawlability problems usually need to be handled on a case-by-case basis.
Hopefully, this helps you comprehend Google’s perception of the pages you designate as canonicals.
If the page is hosted on your domain, the URL Inspection tool in Google Search Console can be used to determine which version of the page is the canonical one. Check out my free Data Studio dashboard design for the Google URL Inspection API if you want to migrate this kind of reporting into Data Studio.
Always ensure that non-canonical pages are still accessible to Google. That means not using noindex as a way to prevent a page from being selected as the canonical.
Visit the Google Search Central article on duplicate content for more reading on the subject.