XML Sitemaps Meeting: March 24th, 2020

In case you were looking for a blog post about the XML Sitemaps feature project last week, worry no more. Work on the plugin is progressing smoothly and steadily; we just didn’t publish an agenda post last week. That means it is time for a double update today!

Meeting Recap: March 10th & 17th

For reference, check my previous blog post from March 10th:

A lot has happened since then. Here’s the summary, not necessarily in the right order:

  • SimpleXML dependency
    We received great feedback from a variety of big hosting providers, all saying that this PHP extension is widely available and we can rely on it safely.
    Current status: no action needed.
  • Rewrite rule conflict with plugins
    As we realized that the new /wp-sitemap.xml URL format clashes with big existing plugins, we decided to look into alternate names for both the rewrite rules and the query params. See GitHub issue for details.
    Current status: needs contributors.
  • Rewrite rule issues with custom providers
    It was reported that adding custom sitemap providers might require flushing rewrite rules. Ideally, that shouldn’t be needed.
    Current status: needs decision.
  • Last modified date (lastmod)
    We decided to continue with the proposed PR to remove lastmod from sitemaps (at least for now), but need to make sure there is appropriate documentation. It’s something that can always be added back if needed.
    Current status: has PR, needs documentation.
  • Query Filters
    Valuable feedback emerged from testing, which led to the decision to close the existing PR to make query instances filterable in favor of a simpler approach. In its place, we should make the query arguments filterable, and also add filters to short-circuit queries.
    Current status: needs contributors.
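
To illustrate the "filterable query arguments plus short-circuit filter" idea from the last item, here is a minimal sketch in Python rather than the plugin's PHP. Everything in it is an assumption made for illustration: the hook names `pre_url_list` and `query_args`, the tiny filter registry, and the fake `run_query` function are hypothetical and do not mirror the plugin's actual code.

```python
from typing import Callable, Dict, List

# Hypothetical filter registry; names are illustrative, not the plugin's API.
_filters: Dict[str, List[Callable]] = {}


def add_filter(hook: str, callback: Callable) -> None:
    """Register a callback for a named hook."""
    _filters.setdefault(hook, []).append(callback)


def apply_filters(hook: str, value, *args):
    """Pass a value through every callback registered for the hook."""
    for callback in _filters.get(hook, []):
        value = callback(value, *args)
    return value


def run_query(args: dict) -> List[str]:
    """Stand-in for an expensive database query."""
    return [
        f"https://example.com/{args['post_type']}/{i}"
        for i in range(1, args["per_page"] + 1)
    ]


def get_url_list(post_type: str) -> List[str]:
    # 1. Short-circuit: any non-None result skips the query entirely.
    pre = apply_filters("pre_url_list", None, post_type)
    if pre is not None:
        return pre
    # 2. Otherwise callbacks may adjust the query arguments, not the query object.
    args = apply_filters(
        "query_args", {"post_type": post_type, "per_page": 3}, post_type
    )
    return run_query(args)
```

Filtering plain arguments keeps callbacks simple, while the short-circuit hook lets a plugin replace the query altogether, for example to serve a precomputed list.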

Please let me know in the comments if I got something wrong in this summary!

Agenda: March 24th

The next meeting will be held on Tuesday, March 24 at 16.00 CET.

Today’s agenda is rather straightforward so far:

Want to add anything to the above? Please leave a comment here or reach out on Slack.

This meeting is held in the #core-sitemaps channel. To join the meeting, you’ll need an account on the Making WordPress Slack.

#agenda, #feature-plugins, #feature-projects, #xml-sitemaps

XML Sitemaps Meeting: March 10th, 2020

A lot has happened since last week’s meeting for the XML Sitemaps feature project. Here’s a quick rundown of what we discussed and did, as well as a brief agenda for today’s meeting.

Meeting Recap: March 3rd

For reference, please check out last week’s agenda post:

The tl;dr of our discussion:

  • Disabling sitemaps for private sites
    We discussed the currently open PR and how making that process filterable could kill two birds with one stone, since it would also make it easier for plugins to disable the sitemaps feature.
    Current status: needs tests
  • Prefixing sitemap URLs
    The main PR for this change has been merged, and a new issue has been opened for @kraftbj to handle 404 requests.
  • SimpleXML dependency
    We went over potential alternatives to this extension, but ultimately settled on sticking with the status quo as initial feedback indicated a rather wide availability of SimpleXML. We then discussed how we should gracefully handle the unavailability of said extension and decided on using wp_die to output a nicely formatted error message in XML with HTTP status 501 (“Not implemented”).
    Current status: merged!
  • @joemcgill proposed looking into how to best transition the code base to something more in line with WordPress core. Something that we can discuss in a future meeting, once the plugin is more stable.
  • Added @pbiron, @kraftbj, and @pfefferle as new contributors to the GitHub repository. 🎉
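
The SimpleXML fallback described above (an XML-formatted error with HTTP status 501) can be sketched roughly as follows. This is a Python illustration only, not the plugin's wp_die call. The function name, the error message, and the shape of the error document are assumptions.

```python
from typing import Tuple
from xml.sax.saxutils import escape

# Hypothetical message text, chosen for this sketch.
ERROR_MESSAGE = "Could not generate XML sitemap: XML support is missing."


def render_sitemap_response(
    xml_extension_available: bool, sitemap_xml: str
) -> Tuple[int, str, str]:
    """Return (status code, content type, body) for a sitemap request."""
    if not xml_extension_available:
        # Assumed error format: a small XML document carrying the message.
        body = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            "<error><code>501</code>"
            f"<message>{escape(ERROR_MESSAGE)}</message></error>"
        )
        return 501, "application/xml", body
    return 200, "application/xml", sitemap_xml
```

Answering with well-formed XML means a crawler requesting the sitemap still receives a parseable document instead of an HTML error page.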

Agenda: March 10th

The next meeting will be held on Tuesday, March 10 at 16.00 CET.

PSA: Unfortunately I won’t be able to lead today’s meeting, but thankfully @tweetythierry stepped up to help out with this.

Today’s agenda is rather straightforward so far:

  • Released version 0.2.0 of the plugin (changelog)
  • Plugin compatibility with new URL structure
    Yoast SEO’s rewrite rules seem to clash with ours
  • SimpleXML dependency: blog post on make/hosting (@pbiron)
  • Currently open issues and pull requests
  • Open floor

Want to add anything to the above? Please leave a comment here or reach out on Slack.

This meeting is held in the #core-sitemaps channel. To join the meeting, you’ll need an account on the Making WordPress Slack.

#agenda, #feature-plugins, #feature-projects, #xml-sitemaps

XML Sitemaps Meeting: March 3rd, 2020

Another week passed by with quite a productive meeting for the XML Sitemaps feature project. Here’s a short summary, as well as the agenda for today’s meeting.

Meeting Recap: February 25th

In case you missed it, I recommend checking out last week’s post with everything that happened so far:

As planned, we went over some of the existing issues, but we also discussed some things that came up on short notice. Here’s the gist:

  • We revisited the idea of removing the lastmod field. @swissspidy offered to start a PR that explores this so it can actually be tested in the wild. @joemcgill offered to post some stats about the performance of this last modified date calculation.
  • There was a discussion, also after the meeting, about changing URLs to have a /wp- prefix and whether that prefix should be filterable. The consensus was that a filter is unnecessary. A new PR was added to implement this.
    @kraftbj offered his help to implement automatic redirects from /sitemap.xml to /wp-sitemap.xml for improved discoverability.
  • Next up was the SimpleXML dependency and how the plugin should behave when that PHP extension is missing.
    We tend towards just disabling sitemaps if that’s the case, but perhaps provide some messaging about it.
    @kraftbj offered to try to get some stats about the availability of SimpleXML via Jetpack, as well as to help with a PR.
    @pbiron reached out on the hosting community channel, and is looking for specific questions that we could ask in a make/hosting post.
  • Last but not least, there was an open question about leveraging the REST API for sitemaps. It was not fully clear though how that would be beneficial. As of now, there are no plans to explore this.

Agenda: March 3rd

The next meeting will be held on Tuesday, March 3 at 16.00 CET

This meeting is held in the #core-sitemaps channel. To join the meeting, you’ll need an account on the Making WordPress Slack.

#agenda, #feature-plugins, #feature-projects, #xml-sitemaps

XML Sitemaps Meeting: February 25th, 2020

Last week we held the first of many weekly meetings for the XML Sitemaps feature project on Slack.

Meeting Recap: February 18th

We had quite a few people attending, not all of whom were familiar with the project. Thus, we started off with a small recap of the project’s scope and goals. After that, we discussed various topics:

  • How to modify the sitemaps to include/exclude certain URLs
    A pull request has been opened to add an FAQ section to the readme that aims to answer these kinds of questions.
    Also, a new way to filter WP_Query instances used for sitemaps has been proposed.
  • Why are there no changefreq and priority fields?
    Those are optional fields in the sitemaps protocol and not typically consumed by search engines. The feature plugin follows other solutions like Yoast SEO, which also don’t include those fields.
    Developers can still add those fields if they really want to.
  • Will there be UI controls to include/exclude content from sitemaps?
    Adding UI controls is currently a non-goal for the project.
  • Calculating the last modified date for URLs
    This is rather difficult and computationally expensive in WordPress. Given that sitemaps are first and foremost a discovery mechanism for content, having this data is not necessarily required. We will explore omitting this functionality (GitHub issue).
  • The default limit of 2000 URLs per sitemap is considered high and might need to be re-evaluated.
  • Potential compatibility issues with other XML Sitemaps plugins have been discussed.
    If a site ends up having two sitemaps by accident, that wouldn’t be bad. However, the current /sitemap.xml URL might clash with other plugins. A GitHub issue has been opened suggesting /wp-sitemap.xml as the base. This would avoid conflicts in this regard.
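
To make the point about optional fields concrete: in the sitemaps protocol only loc is required for a URL entry, while lastmod, changefreq and priority may be left out. The following Python sketch (not plugin code; the helper name `build_url_entry` is made up for this example) builds an entry with and without the optional fields.

```python
import xml.etree.ElementTree as ET

# Optional child elements of <url> in the sitemaps protocol.
OPTIONAL_FIELDS = ("lastmod", "changefreq", "priority")


def build_url_entry(loc: str, **extra: str) -> ET.Element:
    """Build a <url> element; only <loc> is mandatory."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    for name in OPTIONAL_FIELDS:
        if name in extra:
            ET.SubElement(url, name).text = extra[name]
    return url


# Minimal entry, as discussed for the feature plugin.
minimal = ET.tostring(
    build_url_entry("https://example.com/hello-world/"), encoding="unicode"
)

# Entry where a developer opts in to the extra fields.
extended = ET.tostring(
    build_url_entry(
        "https://example.com/hello-world/", changefreq="weekly", priority="0.8"
    ),
    encoding="unicode",
)
```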

Agenda: February 25th

The next meeting will be held on Tuesday, February 25 at 16.00 CET

For tomorrow’s meeting, the agenda is rather brief:

  • Updates since last week (merged changes, new issues)
  • Next steps for proposed lastmod changes
  • Next steps for URL naming change
  • Planning release of version 0.2.0

This meeting is held in the #core-sitemaps channel. To join the meeting, you’ll need an account on the Making WordPress Slack.

#agenda, #feature-plugins, #feature-projects, #seo, #xml-sitemaps

XML Sitemaps Kickoff Meeting Announcement

A few weeks ago an update was posted for the XML Sitemaps feature project to give everyone an idea of where it is heading.

Now, we want to gather more contributors around the feature plugin and get your feedback on the project. For this, we’re kicking off regular meetings in the brand new #core-sitemaps Slack channel.

The first meeting will be held on Tuesday, February 18 at 16.00 CET and will serve as an introduction to the project and an opportunity to discuss the next steps. As such, there is currently no formal agenda for this inaugural meeting.

However, if you have anything specific that you’d like to propose being discussed in this meeting, feel free to leave a comment below.

This meeting is held in the #core-sitemaps channel. To join the meeting, you’ll need an account on the Making WordPress Slack.

#feature-plugins, #feature-projects, #seo, #xml-sitemaps

Feature Plugin: XML Sitemaps

Last year, a group of contributors posted a proposal to implement native XML Sitemaps in WordPress Core, which received lots of interest and feedback from the community. Since then, we have been working on an XML Sitemaps feature plugin (MVP), which is now available for testing and feedback.

Props to the contributors working on this plugin and co-authoring the content of this post: Sander van Dragt, Kirsty Burgoine, Adrian McShane, Ruxandra Gradina, Joe McGill, Thierry Muller, Pascal Birchler 

Feature overview

As a quick reminder of what this project is trying to achieve, here are the main features as described in the initial project proposal, which we would encourage you to read in its entirety.

XML Sitemaps will be enabled by default, making the following content types indexable:

– Homepage
– Posts page
– Core Post Types (Pages and Posts)
– Custom Post Types
– Core Taxonomies (Tags and Categories)
– Custom Taxonomies
– Users (Authors)

Additionally, the robots.txt file exposed by WordPress will reference the sitemap index.
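
As a rough illustration of that robots.txt reference, the sketch below appends a `Sitemap:` line pointing at the sitemap index. It is written in Python for brevity and is not how the plugin does it. The function name is hypothetical, and the other robots.txt rules are sample content.

```python
def robots_txt(site_url: str, index_path: str = "/sitemap.xml") -> str:
    """Return robots.txt content that advertises the sitemap index."""
    lines = [
        "User-agent: *",
        "Disallow: /wp-admin/",
        "",
        # The Sitemap directive takes the full URL of the sitemap index.
        f"Sitemap: {site_url.rstrip('/')}{index_path}",
    ]
    return "\n".join(lines) + "\n"


print(robots_txt("https://example.com/"))
```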

The plugin also ships with an XML Sitemaps API that developers can build on.

The approach

In order to fulfil these initial requirements, we researched the way several existing popular plugins implement this functionality, and came up with an approach that we believe combines many of the best ideas from each.

The sitemap index

A crucial feature of the sitemap plugin is the sitemap index. This is the main XML file that contains the listing of all the sitemap pages exposed by your WordPress site and the time each was last modified. By default, the plugin creates a sitemap index at /sitemap.xml which includes sitemaps for all supported content, separated into groups by post types, taxonomies, and users.
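
For readers unfamiliar with the format, a sitemap index is a small XML document listing each sitemap's location and, optionally, when it was last modified. Here is a minimal Python sketch of generating one. It follows the public sitemaps protocol, but the function name and sample URLs are assumptions and it is not the plugin's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Serialize (loc, lastmod) pairs into a <sitemapindex> document."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod:  # lastmod is optional in the protocol
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")


index_xml = build_sitemap_index(
    [
        ("https://example.com/sitemap-posts-post-1.xml", "2020-01-15T10:00:00+00:00"),
        ("https://example.com/sitemap-users-1.xml", None),
    ]
)
```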

Sitemap pages

Each sitemap page will be available at a URL using the following structure: sitemap-{object-type}-{object-subtype}-{page}.xml. Some examples of this structure applied to real content include:

  • Post type – posts: sitemap-posts-post-1.xml 
  • Post type – pages: sitemap-posts-page-1.xml
  • Taxonomy – categories: sitemap-taxonomies-category-1.xml
  • Users: sitemap-users-1.xml (note that the WP_User object doesn’t support sub-types)
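
The naming scheme above can be expressed as a pair of small helpers. This Python sketch is only meant to show how the pieces fit together. The function names and the regular expression are assumptions, not code from the plugin.

```python
import re
from typing import Optional, Tuple


def sitemap_filename(object_type: str, page: int, subtype: Optional[str] = None) -> str:
    """Build sitemap-{object-type}-{object-subtype}-{page}.xml (subtype optional)."""
    parts = ["sitemap", object_type]
    if subtype:
        parts.append(subtype)
    parts.append(str(page))
    return "-".join(parts) + ".xml"


_FILENAME = re.compile(
    r"^sitemap-(?P<type>[a-z]+)(?:-(?P<subtype>[a-z0-9_-]+?))?-(?P<page>\d+)\.xml$"
)


def parse_sitemap_filename(name: str) -> Optional[Tuple[str, Optional[str], int]]:
    """Return (object_type, subtype, page), or None if the name does not match."""
    match = _FILENAME.match(name)
    if not match:
        return None
    return match.group("type"), match.group("subtype"), int(match.group("page"))
```

For example, `sitemap_filename("posts", 1, "post")` yields `sitemap-posts-post-1.xml`, and `sitemap_filename("users", 1)` yields `sitemap-users-1.xml`.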

The official sitemaps protocol states that each sitemap can contain a maximum of 50,000 URLs and must be no larger than 50MB (52,428,800 bytes). In practice, however, we found that performance begins to degrade when a query returns more than a few thousand URLs. For that reason, we’ve decided to limit the default implementation to a maximum of 2,000 URLs per sitemap, which can be modified by using a filter on the core_sitemaps_max_urls hook.
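
The arithmetic behind these limits is simple. The following Python sketch shows how a URL count maps to a number of sitemap pages. The constant and function names are invented for illustration.

```python
import math
from typing import List, Sequence

PROTOCOL_MAX_URLS = 50_000  # hard limit from the sitemaps protocol
DEFAULT_MAX_URLS = 2_000    # default chosen by the feature plugin


def page_count(total_urls: int, max_urls: int = DEFAULT_MAX_URLS) -> int:
    """Number of sitemap pages needed for the given number of URLs."""
    if not 1 <= max_urls <= PROTOCOL_MAX_URLS:
        raise ValueError("max_urls must be between 1 and 50,000")
    return math.ceil(total_urls / max_urls)


def page_slice(urls: Sequence[str], page: int, max_urls: int = DEFAULT_MAX_URLS) -> List[str]:
    """URLs that belong on the given 1-based sitemap page."""
    start = (page - 1) * max_urls
    return list(urls[start:start + max_urls])
```

With the default limit, a site with 4,500 posts would get three post sitemap pages: two full pages of 2,000 URLs and a third with the remaining 500.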

Sitemap pages for each public post type (except attachments) will be generated, which include URLs to individual post pages. Likewise, sitemaps will be generated for all public taxonomies, which include URLs to taxonomy archive pages, and sitemaps will be generated for all users with published public posts, which include the URL for each user’s author archive page. The list of supported sub-types for posts and taxonomies can be filtered using the core_sitemaps_post_types and core_sitemaps_taxonomies filters, respectively. Additionally, URLs for any object type can be added or removed using the following filters:

  • Post types: core_sitemaps_posts_url_list
  • Taxonomies: core_sitemaps_taxonomies_url_list
  • Users: core_sitemaps_users_url_list
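
Conceptually, each of these filters receives the list of URL entries and returns a modified list. The Python sketch below mimics that pattern with plain functions. It is an analogy, not the WordPress filter API. The entry shape (a dict with a `loc` key) and all function names are assumptions.

```python
from typing import Callable, Dict, Iterable, List

UrlEntry = Dict[str, str]
UrlListFilter = Callable[[List[UrlEntry]], List[UrlEntry]]


def apply_url_list_filters(
    entries: List[UrlEntry], callbacks: Iterable[UrlListFilter]
) -> List[UrlEntry]:
    """Pass the URL list through each callback in order."""
    for callback in callbacks:
        entries = callback(list(entries))
    return entries


def drop_private(entries: List[UrlEntry]) -> List[UrlEntry]:
    """Example callback: remove URLs under a hypothetical /private/ path."""
    return [entry for entry in entries if "/private/" not in entry["loc"]]


def add_landing_page(entries: List[UrlEntry]) -> List[UrlEntry]:
    """Example callback: append an extra, hand-picked URL."""
    return entries + [{"loc": "https://example.com/landing/"}]


filtered = apply_url_list_filters(
    [
        {"loc": "https://example.com/hello-world/"},
        {"loc": "https://example.com/private/draft/"},
    ],
    [drop_private, add_landing_page],
)
```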

Performance and scalability

Adding an XML Sitemaps caching mechanism was specifically listed as a non-goal of the project, so we have not included one. However, we did want to ensure that the initial version of the plugin took scalability into consideration, so we spent time researching the major scalability issues present in current popular implementations and ways of solving those problems.

By using best practices for making our main queries performant, we were able to eliminate most scalability problems from individual sitemap pages. However, the main performance problem is generating last modified times for each page in the sitemap index. It’s not scalable to calculate these values dynamically, so instead, we’ve started with an implementation that updates these values using a WP_Cron task that runs twice daily, and saves these values in the options table.
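
The idea of precomputing last modified times on a schedule, rather than on every request, can be sketched as follows. This is a simplified Python model: the dictionary stands in for the options table, `refresh` stands in for the scheduled task, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional


class LastModifiedCache:
    """Stores last modified values that are recomputed only on a schedule."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute              # expensive lookup, e.g. a database query
        self._options: Dict[str, str] = {}   # stand-in for the options table

    def refresh(self, sitemap_names: Iterable[str]) -> None:
        """Meant to be called by a scheduled task (twice daily in the plugin)."""
        for name in sitemap_names:
            self._options[name] = self._compute(name)

    def get(self, sitemap_name: str) -> Optional[str]:
        """Cheap read used while rendering the sitemap index."""
        return self._options.get(sitemap_name)
```

The trade-off is freshness: a value can be stale until the next scheduled run, but rendering the index no longer triggers any expensive calculation.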

We’ve also begun researching and writing up an implementation for a more robust sitemap page caching mechanism, using custom post types to store and update sitemap data, which can be further explored if the current approach proves to be insufficient for core (see #1 and #39 for more details).

Next steps

Announcing the first version of this feature plugin is a major milestone, but is only an early step in the process of having this functionality included in WordPress Core. Now is when we need your help to test, validate, and improve what we have built to ensure that we meet the needs of the broad WordPress community. We are also encouraging sitemap plugin authors to integrate with the plugin, specifically leveraging the sitemaps API to extend its core functionality.

We will kick start weekly meetings on WordPress Slack in the very near future. In the meantime, we would encourage anyone interested to join now and begin discussion about this feature. Additionally, you can leave questions and feedback in the comments section of this post or as new issues on the GitHub repo.

Thanks for reading!

#feature-plugins, #feature-projects, #seo, #xml-sitemaps

XML Sitemaps Feature Project Proposal

Note: a follow-up post was published with more recent information about this project.

While web crawlers usually discover pages from links within the site and from other sites, sitemaps supplement this approach by allowing crawlers to pick up all URLs included in the sitemap and learn about those URLs using the associated metadata.

Today, WordPress core does not generate XML Sitemaps by default, which affects the search engine discoverability of a large number of WordPress websites. 4 out of the top 15 plugins in the WordPress plugin repository currently ship with their own implementation of XML sitemaps, pointing to a universal need for this feature and a great potential to join forces.

This post proposes integration of XML Sitemaps to WordPress Core as a feature project. The proposal was created as a collaboration between Yoast*, Google** and various contributors.

Proposed Solution

In a nutshell, the goal of the proposal is to integrate basic XML Sitemaps in WordPress Core and introduce an XML Sitemaps API to make it fully extendable. Below is a diagram of the proposed XML Sitemaps structure:

XML Sitemaps will be enabled by default, making the following content types indexable:

  • Homepage
  • Posts page
  • Core Post Types (Pages and Posts)
  • Custom Post Types
  • Core Taxonomies (Tags and Categories)
  • Custom Taxonomies
  • Users (Authors)

Additionally, the robots.txt file exposed by WordPress will reference the sitemap index.

Developers

An XML Sitemaps API will be introduced as part of the integration allowing extensibility. At a high level, below is a list of the ways the XML Sitemaps may be manipulated via the API:

  • Add extra sitemaps and sitemap entries
  • Add extra attributes to sitemap entries
  • Provide a custom XML Stylesheet
  • Exclude a specific post type from the sitemap
  • Exclude a specific post from the sitemap
  • Exclude a specific taxonomy from the sitemap
  • Exclude a specific term from the sitemap
  • Exclude a specific author from the sitemap
  • Exclude authors with a specific role from the sitemap

Non Goals

While the initial XML Sitemaps integration will fulfill search engines’ minimum requirements and cover most WordPress content types, below is a list of features which will not be included in the initial integration:

  • Image sitemaps
  • Video sitemaps
  • News sitemaps
  • User-facing changes like UI controls to exclude individual posts or pages from the sitemap
  • XML Sitemaps caching mechanisms

i18n

The XML Sitemaps will leverage standard internationalization functionality provided by WordPress core.

Since there are plans by WordPress leadership to officially support multilingual websites in WordPress, the XML Sitemaps will be flexible enough to list localized content in the future as per web development best practices.

What’s next?

Your thoughts on this proposal would be greatly valued. Please share your feedback, questions or interest in collaboration by commenting on this post. After that we can decide on how to best proceed with this proposed project and set up a meeting on Slack to kick things off.

* @joostdevalk, @omarreiss, @jonoalderson, @herregroen

** @swissspidy @albertomedina @westonruter @flixos90 @tweetythierry

#feature-plugins, #feature-projects, #proposal, #seo, #xml-sitemaps