Google announced earlier this month that, as of 1 September 2019, it will no longer support ‘noindex’ within robots.txt files. Webmasters are currently being notified to remove it and rely on other methods.
But what does this mean for businesses and SEO webmasters? Essentially, if you are relying solely on ‘noindex’ inside your robots.txt file to keep specific pages out of Google’s search results pages (SERPs), THIS WILL NO LONGER WORK.
What you should do to prepare for the Google change
You may now be wondering how to prevent content from being indexed on Google and other search engines going forward.
Well, don’t panic. Here are 5 ways (also recommended by Google) to prevent pages from being indexed without relying on ‘noindex’ in robots.txt, along with some guidance on how to implement them.
1. Amend your meta tags
You can prevent a specific page from being indexed by using the ‘noindex’ directive within a <meta> tag.
This works with most search engines, and all you need to do is place the following line of code inside the <head> section of the page’s HTML. Note that the page must not be blocked in robots.txt, otherwise crawlers will never see the tag. Web developers can help if you need technical advice.
<meta name="robots" content="noindex">
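If you want to confirm that a page really carries the tag, a short script can check the HTML for you. The following is a minimal sketch using only the Python standard library; the names RobotsMetaParser and has_noindex are made up for this illustration, and it only inspects HTML you pass in, it does not fetch pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attr_map.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(page))
```

You could run this over the saved HTML of each page you expect to be excluded, as a quick sanity check before the change takes effect.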
2. Use Disallow in robots.txt
The Disallow directive in the robots.txt file will still work as it should.
Google and other search engines can only index the content of pages they are able to crawl. Therefore, if access has been blocked, that content will usually stay out of the index. Be aware, though, that a blocked URL can still appear in results without a description if other pages link to it.
Moz has a handy guide on writing the perfect robots.txt file, which will show you how to add pages to the disallow list.
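To see how Disallow rules are interpreted, you can test them locally before publishing. The sketch below uses Python’s built-in urllib.robotparser; the rules, the /private/ and /drafts/ paths and the example.com URLs are assumptions made up for this illustration, and Python’s parser may not match Google’s own matching in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical paths, for illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/report.html",
        "https://example.com/blog/post.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

The first URL falls under a Disallow rule and is reported as blocked, while the second is not covered by any rule and is allowed.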
3. HTTP 404 and 410 status codes
These two status codes tell crawlers that the content cannot be found by the server (404) or has been removed for good (410). Google only wants to deliver functional pages. Consequently, any pages returning a 404 or 410 status will be dropped from the index once they have been crawled.
There is nothing to action here; this is just a little reminder that Google won’t index pages that are not found. So, if you don’t require the content, deleting it will ensure it isn’t indexed.
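For anyone curious what this looks like on the server side, here is a rough sketch built on Python’s standard http.server module. The PAGES and REMOVED sets and the status_for helper are hypothetical names for this illustration; a real site would normally configure this in its web server or CMS instead.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical site content: live pages and pages deleted on purpose.
PAGES = {"/": b"<h1>Home</h1>"}
REMOVED = {"/old-offer"}


def status_for(path: str) -> HTTPStatus:
    """Pick the status code a crawler should receive for this path."""
    if path in PAGES:
        return HTTPStatus.OK
    if path in REMOVED:
        return HTTPStatus.GONE  # 410: deliberately removed
    return HTTPStatus.NOT_FOUND  # 404: never existed or cannot be found


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Note: self.path includes any query string; kept simple here.
        status = status_for(self.path)
        body = PAGES.get(self.path, status.phrase.encode())
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, you could pass Handler to http.server.HTTPServer and call serve_forever(), then request /old-offer to see the 410 response.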
4. Password protected pages
Any pages that require a password or sit behind a login page will generally not be indexed. If Google can’t get to them, they won’t appear on search results pages.
Again, there is nothing to change here. You just don’t need to actively noindex these pages anymore.
5. Use the removal tool in Google’s Search Console
You can use Google’s removal tool, but be aware that these pages are only removed temporarily. The removal lasts for about 90 days, after which the pages can be indexed again. This is not a long-term solution but a useful quick fix while you implement the other approaches.
To make a removal permanent, it is recommended that you either rely on 404 and 410 status codes by removing the content completely from your website, add the ‘noindex’ meta tag from point 1, or implement the ‘Disallow’ directive mentioned in point 2. Remove ‘noindex’ from your robots.txt file, make sure you have considered the points above, and you’ll be fine.
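If you are not sure whether your robots.txt still contains any ‘noindex’ lines, a small script can flag them. This is a minimal sketch; find_noindex_rules is a name invented for this example, and it assumes the rules are written in the usual "Noindex: /path/" form.

```python
from typing import List, Tuple


def find_noindex_rules(robots_txt: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs for every noindex rule found."""
    hits = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        rule = line.split("#", 1)[0].strip()  # ignore comments
        field = rule.split(":", 1)[0].strip().lower()  # text before the colon
        if field == "noindex":
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /tmp/\nNoindex: /thanks/\n"
    for number, line in find_noindex_rules(sample):
        print(f"line {number}: {line}")
```

Any line it reports should be removed from robots.txt and replaced with one of the approaches above.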
Google’s algorithm is becoming increasingly clever. Taking the above steps into consideration will serve you well in keeping your pages off the search results pages. Once you have prepared for this change, you can then request that Google re-crawls your website through the Google Search Console.