XML Sitemaps Meeting: February 25th, 2020

Last week we held the first of many weekly meetings for the XML Sitemaps feature project on Slack.

Meeting Recap: February 18th

We had quite a few people attending, not all of whom were familiar with the project. Thus, we started off with a small recap of the project’s scope and goals. After that we discussed various different topics:

  • How to modify the sitemaps to include/exclude certain URLS
    A pull request has been opened to add a FAQ section to the readme that aims to answer these kind of questions.
    Also, a new way to filter WP_Query instances used for sitemaps has been proposed.
  • Why are there no changefreq and priority fields?
    Those are optional fields in the sitemaps protocol and not typically consumed by search engines. The feature plugin follows other solutions like Yoast SEO who also don’t include those fields.
    Developers can still add those fields if they really want too.
  • Will there be UI controls to include/exclude content from sitemaps?
    Adding UI controls is currently a non-goal for the project.
  • Calculating the last modified date for URLs
    This is rather difficult and computationally expensive in WordPress. Given that sitemaps are first and foremost a discovery mechanism for content, having this data is not necessarily required. We will explore omitting this functionality (GitHub issue).
  • The default limit of 2000 URLs per sitemap is considered high and might need to be re-evaluated.
  • Potential compatibility issues with other XML Sitemaps plugins have been discussed.
    If a site ends up having two sitemaps by accident that wouldn’t be bad. However, the current /sitemap.xml URL might clash with other plugins. A GitHub issue has been opened to suggesting using /wp-sitemap.xml as the base. This would avoid conflicts in this regard.

Agenda: February 25th

The next meeting will be held on Tuesday, February 25 at 16.00 CET

For tomorrow’s meeting, the agenda is rather brief:

  • Updates since last week (merged changes, new issues)
  • Next steps for proposed lastmod changes
  • Next steps for URL naming change
  • Planning release of version 0.2.0

This meeting is held in the #core-sitemaps channel , to join the meeting, you’ll need an account on the Making WordPress Slack.

#agenda, #feature-plugins, #feature-projects, #seo, #xml-sitemaps

XML Sitemaps Kickoff Meeting Announcement

A few weeks ago an update was posted for the XML Sitemaps feature project to give everyone an idea of where it is heading.

Now, we want to gather more contributors around the feature plugin and get your feedback on the project. For this, we’re kicking off regular meetings in the brand new #core-sitemaps Slack channel.

The first meeting will be held on Tuesday, February 18 at 16.00 CET and will serve as an introduction to the project and an opportunity to discuss the next steps. As such, there is currently no formal agenda for this inaugural meeting.

However, if you have anything specific that you’d like to propose being discussed in this meeting, feel free to leave a comment below.

This meeting is held in the #core-sitemaps channel , to join the meeting, you’ll need an account on the Making WordPress Slack.

#feature-plugins, #feature-projects, #seo, #xml-sitemaps

Feature Plugin: XML Sitemaps

Last year, a group of contributors posted a proposal to implement native XML Sitemaps in WordPress Core which received lots of interest and feedback from the community. Since then, we have been working on a XML Sitemap feature plugin (MVP) which is now available for testing and feedback.

Props to the contributors working on this plugin and co-authoring the content of this post: Sander van Dragt, Kirsty Burgoine, Adrian McShane, Ruxandra Gradina, Joe McGill, Thierry Muller, Pascal Birchler 

Feature overview

As a quick reminder of what this project is trying to achieve, here are the main features as described in the initial project proposal, which we would encourage you to read in its entirety.

XML Sitemaps will be enabled by default making the following content types indexable

– Homepage
– Posts page
– Core Post Types (Pages and Posts)
– Custom Post Types
– Core Taxonomies (Tags and Categories)
– Custom Taxonomies
– Users (Authors)

Additionally, the robots.txt file exposed by WordPress will reference the sitemap index.

Additionally, an XML Sitemaps API ships with the plugin aiming for developers to build on top of it. 

The approach

In order to fulfil these initial requirements, we researched the way several existing popular plugins implement this functionality, and came up with an approach that we believe combines many of the best ideas from each.

The sitemap index

A crucial feature of the sitemap plugin is the sitemap index. This is the main XML file that contains the listing of all the sitemap pages exposed by your WordPress site and the time each was last modified. By default, the plugin creates a sitemap index at /sitemap.xml which includes sitemaps for all supported content, separated into groups by post types, taxonomies, and users.

Sitemap pages

Each sitemap page will be available at a URL using the following structure, sitemap-{object-type}-{object-subtype}-{page}.xml. Some examples of this structure applied to real content include:

  • Post type – posts: sitemap-posts-post-1.xml 
  • Post type – pages: sitemap-posts-page-1.xml
  • Taxonomy – categories: sitemap-taxonomies-category-1.xml
  • Users – sitemap-users-1.xml (note that the WP_User object doesn’t support sub-types)

The official sitemaps protocol asserts that each sitemap can contain a maximum of 50,000 URLs and must be no larger than 50MB (52,428,800 bytes). However, in practice, we found that performance begins to degrade when trying to generate a query that returns more than a few thousand URLs, so for that reason, we’ve decided to limit the default implementation to a maximum of 2,000 URLs per sitemap, which can be modified by using a filter on the core_sitemaps_max_urls hook.

Sitemap pages for each public post type (except attachments) will be generated, which include URLs to individual post pages. Likewise, sitemaps will be generated for all public taxonomies, which include URLs to taxonomy archive pages, and sitemaps will be generated for all users with published public posts, which includes the URL for each user’s author archive page. The list of supported sub-types for posts and taxonomies can be filtered using the core_sitemaps_post_types and core_sitemaps_taxonomies filters, respectively. Additionally, URLs for any object type can be added or removed using the following filters:

  • Post types: core_sitemaps_posts_url_list
  • Taxonomies: core_sitemaps_taxonomies_url_list
  • Users: core_sitemaps_users_url_list

Performance and scalability

Adding an XML Sitemaps caching mechanism was specifically listed as a non-goal of the project, so we have not included one. However, we did want to ensure that the initial version of the plugin took scalability into consideration, so we spent time researching the major scalability issues present in current popular implementations and ways of solving those problems.

By using best practices for making our main queries performant, we were able to eliminate most scalability problems from individual sitemap pages. However, the main performance problem is generating last modified times for each page in the sitemap index. It’s not scalable to calculate these values dynamically, so instead, we’ve started with an implementation that updates these values using a WP_Cron task that runs twice daily, and saves these values in the options table.

We’ve also begun researching and writing up an implementation for a more robust sitemap page caching mechanism, using custom post types to store and update sitemap data, which can be further explored if the initial implementation proves to be insufficient as an initial implementation for core. (See: #1 and #39 for more details).

Next steps

Announcing the first version of this feature plugin is a major milestone, but is only an early step in the process of having this functionality included in WordPress Core. Now is when we need your help to test, validate, and improve what we have built to ensure that we meet the needs of the broad WordPress community. We are also encouraging sitemaps plugins authors to integrate with the plugin, specifically leveraging the sitemaps API to extend its core functionalities.

We will kick start weekly meetings on WordPress Slack in the very near future. In the meantime, we would encourage anyone interested to join now and begin discussion about this feature. Additionally, you can leave questions and feedback in the comments section of this post or as new issues on the GitHub repo.

Thanks for reading!

#feature-plugins, #feature-projects, #seo, #xml-sitemaps

XML Sitemaps Feature Project Proposal

Note: a follow post was published with more recent information about this project.

While web crawlers usually discover pages from links within the site and from other sites, sitemaps supplement this approach by allowing crawlers to pick up all URLs included in the sitemap and learn about those URLs using the associated metadata.

Today, WordPress core does not generate XML Sitemaps by default, affecting a high number of WordPress websites search engine discoverability. 4 out of the top 15 plugins on WordPress plugin repository currently ship with their own implementation of XML sitemaps, pointing to a universal need for this feature and a great potential to join forces.

This post proposes integration of XML Sitemaps to WordPress Core as a feature project. The proposal was created as a collaboration between Yoast*, Google** and various contributors.

Proposed Solution

In a nutshell, the goal of the proposal is to integrate basic XML Sitemaps in WordPress Core and introduce an XML Sitemaps API to make it fully extendable. Below is a diagram of the proposed XML Sitemaps structure:

XML Sitemaps will be enabled by default making the following content types indexable

  • Homepage
  • Posts page
  • Core Post Types (Pages and Posts)
  • Custom Post Types
  • Core Taxonomies (Tags and Categories)
  • Custom Taxonomies
  • Users (Authors)

Additionally, the robots.txt file exposed by WordPress will reference the sitemap index.

Developers

An XML Sitemaps API will be introduced as part of the integration allowing extensibility. At a high level, below is a list of the ways the XML Sitemaps may be manipulated via the API:

  • Add extra sitemaps and sitemap entries
  • Add extra attributes to sitemap entries
  • Provide a custom XML Stylesheet
  • Exclude a specific post type from the sitemap
  • Exclude a specific post from the sitemap
  • Exclude a specific taxonomy from the sitemap
  • Exclude a specific term from the sitemap
  • Exclude a specific author from the sitemap
  • Exclude a specific authors with a specific role from the sitemap

Non Goals

While the initial XML Sitemaps integration will fulfill search engines minimum requirements and cover most WordPress content types, below is a list of features which will not be included in the initial integration:

  • Image sitemaps
  • Video sitemaps
  • News sitemaps
  • User-facing changes like UI controls to exclude individual posts or pages from the sitemap
  • XML Sitemaps caching mechanisms

i18n

The XML Sitemaps will leverage standard internationalization functionality provided by WordPress core.

Since there are plans by WordPress leadership to officially support multilingual websites in WordPress, the XML Sitemaps will be flexible enough to list localized content in the future as per web development best practices.

What’s next?

Your thoughts on this proposal would be greatly valued. Please share your feedback, questions or interest in collaboration by commenting on this post. After that we can decide on how to best proceed with this proposed project and set up a meeting on Slack to kick things off.

* @joostdevalk, @omarreiss, @jonoalderson, @herregroen

** @swissspidy @albertomedina @westonruter @flixos90 @tweetythierry

#feature-plugins, #feature-projects, #proposal, #seo, #xml-sitemaps

wp_title() now consistently handles reve …

wp_title() now consistently handles reverse order title breadcrumbs. So in “right” mode, you get “July (sep) 2008 (sep) Blogname.” SEO plugin authors may want to take note.

#seo, #title, #wp_title