Changes to prevent search engines indexing sites.

In WordPress 5.3, the method used to discourage indexing will change for sites that enable the “discourage search engines from indexing this site” option (Settings → Reading in the WordPress dashboard). These changes were made as part of ticket #43590.

These changes are intended to better discourage search engines from listing a site, rather than only preventing them from crawling it.

robots.txt file changes.

In previous versions of WordPress, the directive Disallow: / was added to the robots.txt file for sites with this option enabled, to prevent search engines from crawling them. As of WordPress 5.3, this directive is no longer added for non-public sites.

As Joost de Valk writes in an explainer on search engine exclusion, disallowing crawling can still allow a site to be indexed:

A site doesn’t have to be [crawled] to be listed. If a link points to a page, domain or wherever, Google follows that link. If the robots.txt on that domain prevents [crawling] of that page by a search engine, it’ll still show the URL in the results if it can gather … it might be worth looking at.
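Python's standard urllib.robotparser module gives a quick way to see why a site-wide Disallow: / works against noindex. A compliant crawler that is disallowed never fetches the page, so it never sees any robots meta tag on it. The two robots.txt bodies below are simplified, hypothetical examples, not the exact output of any WordPress version. example.com is a placeholder, and crawl_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt bodies for illustration only.
SITE_WIDE_DISALLOW = "User-agent: *\nDisallow: /\n"
ADMIN_ONLY_DISALLOW = "User-agent: *\nDisallow: /wp-admin/\n"

page = "https://example.com/sample-page/"
# Blocked: the crawler never downloads the page, so a noindex tag on it goes unseen.
print(crawl_allowed(SITE_WIDE_DISALLOW, page))  # False
# Allowed: the crawler can fetch the page and read its robots meta tag.
print(crawl_allowed(ADMIN_ONLY_DISALLOW, page))  # True
```

Real crawlers differ in details such as rule precedence, so treat this only as an illustration of the crawl-versus-index distinction.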

Meta tag changes.

Sites with the “discourage search engines from indexing this site” option enabled will output an updated robots meta tag in the <head> of each page, asking search engines not to list the site: <meta name='robots' content='noindex,nofollow' />.

This meta tag asks search engines not to index the page (noindex) and not to follow the links on it (nofollow), which discourages further crawling of the site.
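The sketch below uses Python's html.parser module to show roughly how a crawler might read this tag. It shows that noindex and nofollow arrive as separate, comma-separated directives. RobotsMetaParser and robots_directives are hypothetical names, and real search engines handle robots directives with more nuance, for example crawler-specific meta names such as googlebot.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <meta ... /> here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated; normalise whitespace and case.
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = "<head><meta name='robots' content='noindex,nofollow' /></head>"
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```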

Excluding development servers from search engines.

The most effective way to keep development sites out of search engine indexes is to send the HTTP header X-Robots-Tag: noindex, nofollow when serving every asset on the site, including images, PDFs, video and other files.
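In practice this header is configured in the web server itself (Apache, nginx and so on), and that server's documentation is the authority. As an illustration of the principle only, here is a minimal Python sketch, assuming a development directory served by Python's built-in http.server. It adds the header to every response, whatever the file type. NoIndexHandler and make_dev_server are hypothetical names and are not part of WordPress.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

NOINDEX_VALUE = "noindex, nofollow"


class NoIndexHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds X-Robots-Tag to every response (hypothetical)."""

    def end_headers(self) -> None:
        # end_headers() is called for every response this handler builds
        # (images, PDFs, video, error pages), so no asset type is skipped.
        self.send_header("X-Robots-Tag", NOINDEX_VALUE)
        super().end_headers()


def make_dev_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Create a localhost server for `directory` that marks all files noindex."""
    handler = partial(NoIndexHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

For example, make_dev_server("/path/to/site").serve_forever() would serve that directory on port 8000 with the header attached. Hooking end_headers() is used here because it covers every response, rather than relying on per-file-type rules.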

Because most non-HTML assets on a WordPress site are served directly by the web server rather than through WordPress, core cannot set this HTTP header for them. Consult your web server’s documentation or your host to ensure these assets are excluded on development sites.
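Once the server or host has been configured, you can spot-check the result by requesting a few assets and inspecting the header. This is a minimal sketch, assuming the assets answer HEAD requests. blocks_indexing and urls_missing_noindex are hypothetical helpers. The parsing is simplified: it treats none as implying noindex, but it ignores crawler-scoped values such as "googlebot: noindex".

```python
from typing import Iterable, List, Optional
from urllib.request import Request, urlopen


def blocks_indexing(header_value: Optional[str]) -> bool:
    """True if an X-Robots-Tag value includes noindex (or none, which implies it)."""
    if not header_value:
        return False
    directives = {part.strip().lower() for part in header_value.split(",")}
    return "noindex" in directives or "none" in directives


def urls_missing_noindex(urls: Iterable[str], timeout: float = 10.0) -> List[str]:
    """Send a HEAD request to each URL and list those lacking a noindex header."""
    missing = []
    for url in urls:
        # urlopen raises HTTPError for 4xx/5xx responses; those need a separate look.
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            # A response may carry several X-Robots-Tag headers; check them together.
            values = response.headers.get_all("X-Robots-Tag") or []
            if not blocks_indexing(", ".join(values)):
                missing.append(url)
    return missing
```

For instance, urls_missing_noindex(["https://dev.example.com/wp-content/uploads/sample.pdf"]) would return that placeholder URL if the development server does not send the header.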
