Feature Plugin: XML Sitemaps

Last year, a group of contributors posted a proposal to implement native XML Sitemaps in WordPress Core, which received a lot of interest and feedback from the community. Since then, we have been working on an XML Sitemaps feature plugin (a minimum viable product, or MVP), which is now available for testing and feedback.

Props to the contributors working on this plugin and co-authoring the content of this post: Sander van Dragt, Kirsty Burgoine, Adrian McShane, Ruxandra Gradina, Joe McGill, Thierry Muller, Pascal Birchler

Feature overview

As a quick reminder of what this project is trying to achieve, here are the main features as described in the initial project proposal, which we would encourage you to read in its entirety.

XML Sitemaps will be enabled by default, making the following content types indexable:

– Homepage
– Posts page
– Core Post Types (Pages and Posts)
– Custom Post Types
– Core Taxonomies (Tags and Categories)
– Custom Taxonomies
– Users (Authors)

Additionally, the robots.txt file exposed by WordPress will reference the sitemap index.
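
To illustrate what referencing the sitemap index means in practice, here is a minimal Python sketch (not the plugin's PHP code) that appends the standard "Sitemap:" directive from the sitemaps protocol to existing robots.txt content. The function name and the example.com site URL are hypothetical; only the /sitemap.xml location comes from this post.

```python
def add_sitemap_to_robots(robots_txt: str, site_url: str) -> str:
    """Return robots.txt content with a Sitemap line for /sitemap.xml appended.

    Hypothetical helper for illustration; the "Sitemap:" directive itself is
    defined by the sitemaps.org protocol.
    """
    index_url = site_url.rstrip("/") + "/sitemap.xml"
    directive = f"Sitemap: {index_url}"
    lines = robots_txt.splitlines()
    if directive in lines:
        # Already referenced; do not add a duplicate line.
        return robots_txt
    return "\n".join(lines + [directive]) + "\n"


if __name__ == "__main__":
    print(add_sitemap_to_robots("User-agent: *\nDisallow: /wp-admin/\n", "https://example.com/"))
```

Crawlers that read robots.txt can then discover the index without it being submitted separately to each search engine.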

Finally, the plugin ships with an XML Sitemaps API that developers can build on top of.

The approach

In order to fulfil these initial requirements, we researched the way several existing popular plugins implement this functionality, and came up with an approach that we believe combines many of the best ideas from each.

The sitemap index

A crucial feature of the sitemap plugin is the sitemap index. This is the main XML file that contains the listing of all the sitemap pages exposed by your WordPress site and the time each was last modified. By default, the plugin creates a sitemap index at /sitemap.xml which includes sitemaps for all supported content, separated into groups by post types, taxonomies, and users.
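
The index format itself is defined by the sitemaps.org protocol. As a rough illustration (in Python rather than the plugin's PHP), the sketch below serializes a list of sitemap page URLs and their last modified times into an index document. The function name and the example URL are hypothetical; the element names and namespace are those of the protocol.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries: Iterable[Tuple[str, datetime]]) -> str:
    """Serialize (loc, lastmod) pairs as a sitemaps.org sitemap index document."""
    # Write the namespace as a plain xmlns attribute so child elements
    # serialize without a prefix, matching the protocol's examples.
    root = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc  # ElementTree escapes &, <, >
        # lastmod uses W3C Datetime; isoformat() of an aware datetime qualifies.
        ET.SubElement(sitemap, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap_index([
        ("https://example.com/sitemap-posts-post-1.xml",
         datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc)),
    ]))
```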

Sitemap pages

Each sitemap page will be available at a URL using the following structure: sitemap-{object-type}-{object-subtype}-{page}.xml. Some examples of this structure applied to real content include:

  • Post type – posts: sitemap-posts-post-1.xml 
  • Post type – pages: sitemap-posts-page-1.xml
  • Taxonomy – categories: sitemap-taxonomies-category-1.xml
  • Users – sitemap-users-1.xml (note that the WP_User object doesn’t support sub-types)
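
The naming rule can be expressed as a small function. This Python sketch is illustrative only (the function name is hypothetical, not part of the plugin's API); it reproduces the examples listed above, including omitting the sub-type segment for users.

```python
from typing import Optional


def sitemap_filename(object_type: str, page: int, object_subtype: Optional[str] = None) -> str:
    """Build sitemap-{object-type}-{object-subtype}-{page}.xml.

    Users have no sub-type, so that segment is left out when object_subtype is None.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    parts = ["sitemap", object_type]
    if object_subtype:
        parts.append(object_subtype)
    parts.append(str(page))
    return "-".join(parts) + ".xml"


if __name__ == "__main__":
    print(sitemap_filename("posts", 1, "post"))
    print(sitemap_filename("users", 1))
```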

The official sitemaps protocol states that each sitemap can contain a maximum of 50,000 URLs and must be no larger than 50MB (52,428,800 bytes). However, in practice, we found that performance begins to degrade when a query returns more than a few thousand URLs, so we’ve limited the default implementation to a maximum of 2,000 URLs per sitemap. This limit can be modified using a filter on the core_sitemaps_max_urls hook.
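
The arithmetic behind this limit is simple. The following Python sketch shows how the number of sitemap pages and each page's query offset follow from the per-sitemap maximum; the function names are hypothetical, while the 2,000 default and the protocol limits are the figures quoted above.

```python
from typing import Tuple

# Limits from the sitemaps protocol.
PROTOCOL_MAX_URLS = 50_000
PROTOCOL_MAX_BYTES = 52_428_800  # 50MB

# The plugin's default per-sitemap limit (filterable via core_sitemaps_max_urls).
DEFAULT_MAX_URLS = 2_000


def page_count(total_urls: int, max_urls: int = DEFAULT_MAX_URLS) -> int:
    """Number of sitemap pages needed for total_urls, rounding up."""
    if not 1 <= max_urls <= PROTOCOL_MAX_URLS:
        raise ValueError("max_urls must be between 1 and 50,000")
    return (total_urls + max_urls - 1) // max_urls


def page_slice(page: int, max_urls: int = DEFAULT_MAX_URLS) -> Tuple[int, int]:
    """(offset, limit) for a 1-based page, as a paginated query would use."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * max_urls, max_urls


def within_protocol_limits(url_count: int, size_bytes: int) -> bool:
    """True if a single sitemap respects both protocol limits."""
    return url_count <= PROTOCOL_MAX_URLS and size_bytes <= PROTOCOL_MAX_BYTES
```

For example, with the default limit a site with 5,000 published posts would get three post sitemap pages, the last of which (offset 4,000) lists the remaining 1,000 posts. Raising the limit reduces the number of pages but makes each page's query larger, which is the trade-off described above.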

Sitemap pages will be generated for each public post type (except attachments) and include URLs to individual post pages. Likewise, sitemaps will be generated for all public taxonomies, which include URLs to taxonomy archive pages, and for all users with published public posts, which include the URL of each user’s author archive page. The list of supported sub-types for posts and taxonomies can be filtered using the core_sitemaps_post_types and core_sitemaps_taxonomies filters, respectively. Additionally, URLs for any object type can be added or removed using the following filters:

  • Post types: core_sitemaps_posts_url_list
  • Taxonomies: core_sitemaps_taxonomies_url_list
  • Users: core_sitemaps_users_url_list
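
For readers less familiar with WordPress filters, the following Python sketch models the general idea: each callback registered on a hook receives a value and returns it, possibly modified. It is not WordPress's PHP API. The registry, the single-argument callback signature, and the shape of each URL entry (a dict with a "loc" key) are assumptions for illustration; real WordPress filters also support priorities and extra arguments, which are omitted here.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

# Hypothetical, minimal filter registry: hook name -> callbacks in registration order.
_filters: DefaultDict[str, List[Callable[[Any], Any]]] = defaultdict(list)


def add_filter(hook: str, callback: Callable[[Any], Any]) -> None:
    _filters[hook].append(callback)


def apply_filters(hook: str, value: Any) -> Any:
    # Pass the value through every callback; with none registered it is unchanged.
    for callback in _filters[hook]:
        value = callback(value)
    return value


# Remove the "page" sub-type from the list of supported post types.
add_filter("core_sitemaps_post_types", lambda types: [t for t in types if t != "page"])

# Remove one URL and add another to the posts URL list.
add_filter("core_sitemaps_posts_url_list",
           lambda urls: [u for u in urls if not u["loc"].endswith("/private/")])
add_filter("core_sitemaps_posts_url_list",
           lambda urls: urls + [{"loc": "https://example.com/landing/"}])

urls = [{"loc": "https://example.com/hello-world/"}, {"loc": "https://example.com/private/"}]
result = apply_filters("core_sitemaps_posts_url_list", urls)
```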

Performance and scalability

Adding an XML Sitemaps caching mechanism was specifically listed as a non-goal of the project, so we have not included one. However, we did want to ensure that the initial version of the plugin took scalability into consideration, so we spent time researching the major scalability issues present in current popular implementations and ways of solving those problems.

By using best practices to make our main queries performant, we were able to eliminate most scalability problems from individual sitemap pages. The main remaining performance problem, however, is generating the last modified time for each page in the sitemap index. Calculating these values dynamically does not scale, so instead we’ve started with an implementation that updates them using a WP_Cron task that runs twice daily and saves them in the options table.
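
The underlying pattern is to precompute rather than calculate on every request. The Python sketch below illustrates it under stated assumptions: a dict stands in for the options table, the option name and function names are hypothetical, and a page's last modified time is assumed to be the latest modification time of the objects it lists. In the plugin the refresh is triggered by WP_Cron; here it is simply a function you would call on a schedule.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional

# Hypothetical in-memory stand-in for the options table.
options: Dict[str, Dict[str, str]] = {}
LASTMOD_OPTION = "sitemap_lastmod_cache"  # hypothetical option name


def refresh_lastmod_cache(pages: Dict[str, Iterable[datetime]]) -> Dict[str, str]:
    """Scheduled job: store the newest modification time per sitemap page."""
    cache: Dict[str, str] = {}
    for name, modified_times in pages.items():
        times = list(modified_times)
        if times:  # pages with no objects get no lastmod entry
            cache[name] = max(times).isoformat()
    options[LASTMOD_OPTION] = cache
    return cache


def lastmod_for(name: str) -> Optional[str]:
    """Index render time: read the cached value, with no recalculation."""
    return options.get(LASTMOD_OPTION, {}).get(name)


if __name__ == "__main__":
    refresh_lastmod_cache({
        "sitemap-posts-post-1.xml": [datetime(2020, 6, 1, tzinfo=timezone.utc)],
    })
    print(lastmod_for("sitemap-posts-post-1.xml"))
```

The cost of this design is staleness: a lastmod value can lag behind real changes by up to the interval between scheduled runs.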

We’ve also begun researching and writing up a more robust sitemap page caching mechanism that uses custom post types to store and update sitemap data. This can be explored further if the current approach proves insufficient as an initial implementation for Core. (See #1 and #39 for more details.)

Next steps

Announcing the first version of this feature plugin is a major milestone, but it is only an early step toward including this functionality in WordPress Core. Now we need your help to test, validate, and improve what we have built, to ensure that it meets the needs of the broad WordPress community. We also encourage authors of sitemap plugins to integrate with the plugin, in particular by leveraging the sitemaps API to extend its core functionality.

We will kick off weekly meetings on WordPress Slack in the very near future. In the meantime, we encourage anyone interested to join now and begin discussing this feature. You can also leave questions and feedback in the comments section of this post or as new issues on the GitHub repo.

Thanks for reading!

#feature-plugins, #feature-projects, #seo, #sitemaps, #xml-sitemaps