Merge Announcement: Extensible Core Sitemaps

This proposal seeks to integrate basic, extensibleExtensible This is the ability to add additional functionality to the code. Plugins extend the WordPress core software. XML sitemaps functionality into WordPress CoreCore Core is the set of software required to run WordPress. The Core Development Team builds WordPress..

While web crawlers are able to discover pages from links within the site and from other sites, sitemaps supplement this approach by allowing crawlers to quickly and comprehensively identify all URLs included in the sitemap and learn other signals about those URLs using the associated metadata.

Purpose & Goals

Sitemaps help WordPress sites become more discoverable by providing search engines with a map of content that should be indexed. The Sitemaps protocol is a URLURL A specific web address of a website or web page on the Internet, such as a website’s URL www.wordpress.org inclusion protocol and complements robots.txt, a URL exclusion protocol.

A Sitemap is an XML file that lists the URLs for a site. Sitemaps can optionally include information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs of the site. This allows search engines to crawl the site more effectively and to discover every public URL the site has made available. 

This core sitemaps feature aims to provide the base required functionality for the Sitemaps protocol for core WordPress objects, then enables developers to extend this functionality with a robust and consistent set of filters. For example, developers can control which object types (posts, taxonomies, authors) or object subtypes (post types, taxonomies) are included, exclude specific entries, or extend sitemaps to add optional fields. See below for the full list.

Project Background

The idea of adding sitemaps to core was originally proposed in June 2019.  Since then, development has been ongoing in GitHubGitHub GitHub is a website that offers online implementation of git repositories that can easily be shared, copied and modified by other developers. Public repositories are free to host, private repositories require a paid subscription. GitHub introduced the concept of the ‘pull request’ where code changes done in branches by contributors can be reviewed and discussed before being merged be the repository owner. https://github.com/, and weekly meetings in the #core-sitemaps channel started this year to push development forward. Several versions of the feature plugin have been released on the WordPress.orgWordPress.org The community site where WordPress code is created and shared by the users. This is where you can download the source code for WordPress core, plugins and themes as well as the central location for community conversations and organization. https://wordpress.org/ pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party repository, with the latest 0.4.1 representing the state that is considered ready to merge into core. The team is currently working on preparing the final patch to include on the Trac ticket.

Implementation Details

XML Sitemaps will be enabled by default making the following object types indexable:

  • Homepage
  • Posts page
  • Core post types (i.e. pages and posts)
  • Custom post types
  • Core taxonomies (i.e. tags and categories)
  • Custom taxonomies
  • Author archives

Additionally, the robots.txt file exposed by WordPress will reference the sitemap index.

A crucial feature of the sitemap plugin is the sitemap index. This is the main XML file that contains the listing of all the sitemap pages exposed by a WordPress site. By default, the plugin creates a sitemap index at /wp-sitemap.xml which includes sitemaps for all supported content, separated into groups by type. Each sitemap file contains a maximum of 2,000 URLs per sitemap, when that threshold is reached a new sitemap file is added.

By default, sitemaps are created for all public post types and taxonomies, as well as for author archives. Several filters exist to tweak this behavior, for example to include or exclude certain entries. Also, there are plenty of available hooksHooks In WordPress theme and development, hooks are functions that can be applied to an action or a Filter in WordPress. Actions are functions performed when a certain event occurs in WordPress. Filters allow you to modify certain functions. Arguments used to hook both filters and actions look the same. for plugins to integrate with this feature if they want to, or to disable it completely if they wish to roll their own version.

Contributors and Feedback

The following people have contributed to this project in some form or another:

Adrian McShane, @afragen, @adamsilverstein, @casiepa, @flixos90, @garrett-eclipse, @joemcgill, @kburgoine, @kraftbj, @milana_cap, @pacifika, @pbiron, @pfefferle, Ruxandra Gradina, @swissspidy, @szepeviktor, @tangrufus, @tweetythierry

With special thanks to the docs, polyglots, and security teams for their thorough reviews.

Available Hooks and Filters

Check out the feature plugin page for a full list of filters and also a few usage examples.

Frequently Asked Questions

How can I disable sitemaps?

If you update the WordPress settings to discourage search engines from indexing your site, sitemaps will be disabled. Alternatively, use the wp_sitemaps_is_enabled filterFilter Filters are one of the two types of Hooks https://codex.wordpress.org/Plugin_API/Hooks. They provide a way for functions to modify data of other functions. They are the counterpart to Actions. Unlike Actions, filters are meant to work in an isolated manner, and should never have side effects such as affecting global variables and output., or use remove_action( 'init', 'wp_sitemaps_get_server' ); to disable initialization of any sitemap functionality.

How can I disable sitemaps for a certain object type or exclude a certain item?

Using the filters referred to above – check out the feature plugin page for examples.

Does this support lastmod, changefreq, or priority attributes for sitemaps?

By default, no. Those are optional fields in the sitemaps protocol and not typically consumed by search engines. Developers can still add those fields if they want to using the filters referred to above.

lastmod in particular has not been implemented due to the added complexity of calculating the last modified dates for all object types and sitemaps with reasonable performance. For a common website with less frequent updates, lastmod does not offer additional benefits. For sites that are updated very frequently and want to use lastmod, it is recommended to use a plugin to add this functionality.

What about image/video/news sitemaps?

These sitemap extensions were declared a non-goal when the project was initially proposed, and as such are not covered by this feature. In future versions of WordPress, filters and hooks may be added to enable plugins to add such functionality.

Are there any UIUI User interface controls to exclude posts or pages from sitemaps?

No. User-facing changes were declared a non-goal when the project was initially proposed, since simply omitting a given post from a sitemap is not a guarantee that it won’t get crawled or indexed by search engines. In the spirit of “Decisions, not options”, any logic to exclude posts from sitemaps is better handled by dedicated plugins (i.e. SEO plugins). Plugins that implement a UI for relevant areas can use the new filters to enforce their settings, for example to only query content that has not been flagged with a “noindex” option.

Are there any privacy implications of listing users in sitemaps?

The sitemaps only surface the site’s author archives, and do not include any information that isn’t already publicly available on a site.

Are there any performance implications by adding this feature?

The addition of this feature does not impact regular website visitors, but only users who access the sitemap directly. Benchmarks during development of this feature showed that sitemap generation is generally very fast even for sites with thousands of posts. Thus, no additional caching for sitemaps was put in place.

If you want to optimize the sitemap generation, for example by optimizing queries or even short-circuiting any database queries, use the filters mentioned above.

What about sites with existing sitemap plugins?

Many sites already have a plugin active that implements sitemaps. For most of them, that will no longer be necessary, as the feature in WordPress core suffices. However, there is no harm in keeping them. The core sitemaps feature was built in a robust and easily extensible way. If for some reason two sitemaps are exposed on a website (one by core, one by a plugin), this does not result in any negative consequences for the site’s discoverability.

#5-5, #feature-plugins, #feature-projects, #merge-proposals, #sitemaps, #xml-sitemaps