Changes to Twenty Nineteen's HTML structure in WordPress 5.3

In some templates, Twenty Nineteen placed a <main> element as a direct child of a <section> element. This produced invalid markup in the following templates:

  • 404.php
  • archive.php
  • image.php
  • index.php
  • page.php
  • search.php
  • single.php

For reference, see the HTML specification:

A hierarchically correct main element is one whose ancestor elements are limited to html, body, div, form without an accessible name, and autonomous custom elements.

Source: HTML Living Standard

WordPress 5.3 will fix this by wrapping <main> in a neutral <div> element instead of a <section>.

Previous HTML rendering example:

<section id="primary" class="content-area">
    <main id="main" class="site-main">
        …
    </main>
</section>

New HTML rendering example:

<div id="primary" class="content-area">
    <main id="main" class="site-main">
        …
    </main>
</div>
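The spec rule quoted above can be checked mechanically. The following Python sketch, built on the standard library's html.parser module, tracks the currently open elements and reports every <main> that has a disallowed ancestor. It is an illustration, not a validator: the helper names are hypothetical, omitted end tags are handled only loosely, and "form without an accessible name" is approximated by the absence of aria-label, aria-labelledby and title attributes.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are never pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


def allowed_main_ancestor(tag, attrs):
    """Simplified version of the spec's list of permitted <main> ancestors."""
    if tag in ("html", "body", "div"):
        return True
    if tag == "form":
        # Rough stand-in for "form without an accessible name".
        names = {name for name, _ in attrs}
        return not names & {"aria-label", "aria-labelledby", "title"}
    # Autonomous custom element names always contain a hyphen.
    return "-" in tag


class MainAncestorChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_elements = []  # (tag, allowed) for each currently open element
        self.problems = []       # one list of offending ancestors per bad <main>

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            bad = [name for name, ok in self.open_elements if not ok]
            if bad:
                self.problems.append(bad)
        if tag not in VOID_ELEMENTS:
            self.open_elements.append((tag, allowed_main_ancestor(tag, attrs)))

    def handle_endtag(self, tag):
        # Close back to the most recent matching element; ignore stray end tags.
        for index in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[index][0] == tag:
                del self.open_elements[index:]
                break


def invalid_main_ancestors(html):
    """Return, for each misplaced <main>, the disallowed ancestor tag names."""
    checker = MainAncestorChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Run against the two rendering examples above, it returns [['section']] for the previous markup and an empty list for the new one.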

Users running a child theme of Twenty Nineteen are encouraged to check that their child theme stylesheets do not use selectors like section#primary or section.content-area, and to update them to #primary or .content-area if needed.
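For child themes with many rules, a quick scan can help. The Python sketch below flags and rewrites the two selectors named above using a regular expression. It is a rough, hypothetical helper rather than a CSS parser: it does not understand comments or strings, so review its output before saving the stylesheet.

```python
import re

# Matches a bare `section` type selector only when it is immediately followed
# by #primary or .content-area (and not by a longer name such as
# #primary-sidebar). The lookbehind skips `.section`, `#section` and similar.
LEGACY_SELECTOR = re.compile(
    r"(?<![\w.#-])section(?=(?:#primary|\.content-area)(?![\w-]))"
)


def find_legacy_selectors(css):
    """Return the 1-based line numbers that use section#primary or section.content-area."""
    return [
        number
        for number, line in enumerate(css.splitlines(), start=1)
        if LEGACY_SELECTOR.search(line)
    ]


def rewrite_legacy_selectors(css):
    """Drop the `section` type qualifier, leaving #primary / .content-area."""
    return LEGACY_SELECTOR.sub("", css)


stylesheet = """section#primary { margin-top: 0; }
.site-header { padding: 1em; }
body section.content-area .entry { color: #111; }"""

print(find_legacy_selectors(stylesheet))  # [1, 3]
print(rewrite_legacy_selectors(stylesheet))
```

Dropping the type qualifier keeps the rules matching whether the wrapper is a <section> or a <div>, which is why the selectors are shortened rather than changed to div#primary.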

For reference, see the related Trac ticket: #47066

#5-3, #dev-notes

Changes to prevent search engines from indexing sites

In WordPress 5.3, the method used to discourage indexing will change for sites that enable the “Discourage search engines from indexing this site” option (Settings > Reading) in the WordPress dashboard. These changes were made as part of ticket #43590.

The goal is to discourage search engines from listing the site at all, rather than only preventing them from crawling it.

robots.txt file changes

In previous versions of WordPress, a Disallow: / rule was added to the robots.txt file of non-public sites to prevent search engines from crawling them. WordPress 5.3 removes this rule for non-public sites.

As Joost de Valk writes in an explainer on search engine exclusion, disallowing crawling can have the effect of allowing a site to be indexed:

A site doesn’t have to be [crawled] to be listed. If a link points to a page, domain or wherever, Google follows that link. If the robots.txt on that domain prevents [crawling] of that page by a search engine, it’ll still show the URL in the results if it can gather … it might be worth looking at.
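The effect described above can be reproduced with Python's standard urllib.robotparser module. Under the old rule, a crawler that honors robots.txt is told not to fetch any URL, so it never downloads the page and never sees a noindex tag on it. The "after" file below is an illustrative assumption, not WordPress 5.3's exact robots.txt output, and example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Blanket rule used by earlier WordPress versions on non-public sites.
BEFORE = """User-agent: *
Disallow: /
"""

# Illustrative only: an assumed robots.txt once the blanket rule is gone.
AFTER = """User-agent: *
Disallow: /wp-admin/
"""


def crawler_may_fetch(robots_txt, url, user_agent="Googlebot"):
    """Return True if a robots.txt-respecting crawler may download `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/sample-page/"
print(crawler_may_fetch(BEFORE, page))  # False: the page and its meta tag are never read
print(crawler_may_fetch(AFTER, page))   # True: the crawler can read the noindex tag
```

This is why the robots meta tag, rather than robots.txt, is the mechanism that actually keeps pages out of results: it only works if the crawler is allowed to fetch the page.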

Meta tag changes

Sites with the “Discourage search engines from indexing this site” option enabled will output an updated robots meta tag to keep the site out of search engine listings: <meta name='robots' content='noindex,nofollow' />.

This meta tag asks search engines not to index the page (noindex) and not to follow its links (nofollow), which discourages further crawling of the site.
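To confirm that a page actually carries the tag, a small parser can extract the robots directives from its HTML. The Python sketch below uses the standard html.parser module; the helper names are hypothetical, and fetching the page (for example with urllib.request) is left out so the example runs offline.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attributes without a value are reported as None; treat them as "".
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of lowercase robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


head = "<head><meta name='robots' content='noindex,nofollow' /></head>"
print(sorted(robots_directives(head)))  # ['nofollow', 'noindex']
```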

Excluding development servers from search engines

The most effective way to keep development sites out of search engine indexes is to send the HTTP header X-Robots-Tag: noindex, nofollow with every asset your site serves, including images, PDFs, videos and other files.

Because most non-HTML assets on a WordPress site are served directly by the web server, WordPress core cannot set this HTTP header for them. Consult your web server’s documentation or your host to make sure these assets are excluded on development sites.
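Once the web server is configured, you can spot-check that assets really carry the header. The Python sketch below sends a HEAD request with urllib.request and inspects every X-Robots-Tag line in the response. The function names and the example URL are hypothetical, and the sketch ignores refinements such as user-agent-specific values (for example googlebot: noindex).

```python
from urllib.request import Request, urlopen


def header_excludes(values):
    """True if the X-Robots-Tag values ask for both noindex and nofollow."""
    directives = set()
    for value in values or []:
        directives.update(part.strip().lower() for part in value.split(","))
    # `none` is shorthand for "noindex, nofollow".
    return "none" in directives or {"noindex", "nofollow"} <= directives


def asset_is_excluded(url, timeout=10):
    """Send a HEAD request and check the asset's X-Robots-Tag header(s)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        # get_all() returns every X-Robots-Tag line, or None if there is none.
        return header_excludes(response.headers.get_all("X-Robots-Tag"))


# Example (needs network access and a real development host):
# asset_is_excluded("https://dev.example.com/wp-content/uploads/report.pdf")
```

Checking a few representative files (an image, a PDF, a video) is usually enough to tell whether the server rule covers static assets and not only pages generated by WordPress.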
