XML Sitemaps Feature Project Proposal

While web crawlers usually discover pages from links within the site and from other sites, sitemaps supplement this approach by allowing crawlers to pick up all URLs included in the sitemap and learn about those URLs using the associated metadata.

Today, WordPress core does not generate XML Sitemaps by default, affecting a high number of WordPress websites search engine discoverability. 4 out of the top 15 plugins on WordPress plugin repository currently ship with their own implementation of XML sitemaps, pointing to a universal need for this feature and a great potential to join forces.

This post proposes integration of XML Sitemaps to WordPress Core as a feature project. The proposal was created as a collaboration between Yoast*, Google** and various contributors.

Proposed Solution

In a nutshell, the goal of the proposal is to integrate basic XML Sitemaps in WordPress Core and introduce an XML Sitemaps API to make it fully extendable. Below is a diagram of the proposed XML Sitemaps structure:

XML Sitemaps will be enabled by default making the following content types indexable

  • Homepage
  • Posts page
  • Core Post Types (Pages and Posts)
  • Custom Post Types
  • Core Taxonomies (Tags and Categories)
  • Custom Taxonomies
  • Users (Authors)

Additionally, the robots.txt file exposed by WordPress will reference the sitemap index.

Developers

An XML Sitemaps API will be introduced as part of the integration allowing extensibility. At a high level, below is a list of the ways the XML Sitemaps may be manipulated via the API:

  • Add extra sitemaps and sitemap entries
  • Add extra attributes to sitemap entries
  • Provide a custom XML Stylesheet
  • Exclude a specific post type from the sitemap
  • Exclude a specific post from the sitemap
  • Exclude a specific taxonomy from the sitemap
  • Exclude a specific term from the sitemap
  • Exclude a specific author from the sitemap
  • Exclude a specific authors with a specific role from the sitemap

Non Goals

While the initial XML Sitemaps integration will fulfill search engines minimum requirements and cover most WordPress content types, below is a list of features which will not be included in the initial integration:

  • Image sitemaps
  • Video sitemaps
  • News sitemaps
  • User-facing changes like UI controls to exclude individual posts or pages from the sitemap
  • XML Sitemaps caching mechanisms

i18n

The XML Sitemaps will leverage standard internationalization functionality provided by WordPress core.

Since there are plans by WordPress leadership to officially support multilingual websites in WordPress, the XML Sitemaps will be flexible enough to list localized content in the future as per web development best practices.

What’s next?

Your thoughts on this proposal would be greatly valued. Please share your feedback, questions or interest in collaboration by commenting on this post. After that we can decide on how to best proceed with this proposed project and set up a meeting on Slack to kick things off.

* @joostdevalk, @omarreiss, @jonoalderson, @herregroen

** @swissspidy @albertomedina @westonruter @flixos90 @tweetythierry

#feature-plugins, #feature-projects, #proposal, #seo

Preferred Languages: The Prototype

A bit over six months ago I set the foundation for the Preferred Languages feature project. After highlighting the problems many WordPress users around the world are facing, some time was spent on researching popular platforms and other systems to see how they handle the issue of setting multiple preferred languages. However, this didn’t really help to move one step forward and the project became dormant — until now.

Personally, I’ve been experiencing the same problems with so-called language fallbacks over and over again. During WordCamp Bilbao I decided to revive the Preferred Languages project. In order to do this I used the previously mentioned GitHub repository to build a super-simple proof-of-concept plugin to have a new basis for discussion. As originally envisioned, this repository can be used as a playground for discussions, prototypes, and eventually a working solution.

This new plugin lets you select multiple preferred languages in your settings. WordPress then tries to load the translations for the first language that’s available, falling back to the next language in your list.

A sneak peek of an early version quickly made the rounds on Twitter:

After some very positive feedback I worked a bit more on it and wanted to share the latest version here with a larger group of people. Here’s what it currently looks like on both the settings screen and when editing your profile:

Without much further ado, please check out the plugin on GitHub, test it on your local WordPress site, and leave some feedback!

#feature-projects, #i18n, #preferred-languages

Preferred Languages: Research

Following the first meeting of the Preferred Languages project, some time was spent on researching popular platforms and other systems to see how they handle the issue of setting multiple preferred languages. To recap, this is needed to show things in the most suitable language for users in case their requested language is not installed.

Browsers & Operating Systems

All major browsers and operating systems have some UI where the user can set their preferred languages. Mostly used for the Accept-Language HTTP header, one can set multiple languages and order them by preference. These systems usually know multiple variants like German and German (Switzerland), but not formal / informal variants.

Worth noting that on these systems you can often choose more settings that are influenced by the language, like temperature units and date formats. Related: #18146.

The Accept-Language HTTP Header

RFC 7231 about the HTTP/1.1 standard says the following about this header:

The “Accept-Language” header field can be used by user agents to indicate the set of natural languages that are preferred in the response. Each language-range can be given an associated quality value representing an estimate of the user’s preference for the languages specified by that range.

This header is quite powerful. For example, Accept-Language: da, en-gb;q=0.8, en;q=0.7 would mean: “I prefer Danish, but will accept British English and other types of English”. I can only recommend reading more about it in the RFC, as it helps getting a better understanding of the problems it tries to solve.

This header is usually used by websites to redirect users to the correct version or display a hint like “This content is also available in XY”. While each language in the header has a specific numeric priority, there is usually only a drag & drop interface to determine the order.

Unicode Common Locale Data Repository

The Unicode CLDR provides key building blocks for software to support the world’s languages, with the largest and most extensive standard repository of locale data available. It contains an interesting chart about Language Matching, data that is used to match the user’s desired language/locales against an application’s supported languages/locales.

There’s also a technical standard about Unicode Locale Data Markup Language (LDML), an XML format for the exchange of structured locale data. The section about Locale Inheritance and Language Matching is particularly interesting. For instance, it describes finding the most well suited language using a weighted graph and gives a better picture of dealing with more complex language fallbacks.

It also takes geographic “closeness” into account, arguing that English (Slovakia) should fall back to something within Europe (e.g. British English) in preference to something far away and unrelated like English (Singapore). It’s clearly stated that these fallbacks aren’t as simple as just saying Spanish (Mexico) -> Spanish (Spain) -> English (US).

Note that this technical standard is about finding the best supported locale based on the requested list of languages. The requested list could come from different sources, such as such as the user’s list of preferred languages in the OS Settings, or from a browser’s Accept-Language list.

Wikipedia

Wikipedia is built on the MediaWiki software, which has a hardcoded list of fallback chains for some locales. This is where MediaWiki will fall back on a different language if it cannot find what it needs. For example, French (Cajun) automatically falls back to French (France) when it doesn’t have all messages defined in it. Unfortunately, there’s only little documentation about this.

Apart from that, MediaWiki distinguishes between various kinds of languages: the site content language, the user interface language and the page content language. The latter can differ from the first two and it influences the language the user views the page in, which depends on the user’s preferences, the available languages, and the defined fallbacks.

It’s worth noting that on Wikipedia, you can not only select your preferred language, but also your preferred gender. In addition to that, you can select multiple different locales for both display and input.

Joomla

I tried the official Joomla Demo to test its language management which is a bit overwhelming at first. After installing all the available languages the user is eventually able to select their preferred language for the front end and the back end. There are a few other settings for language switching on multilingual sites, but that’s pretty much it. So basically the same settings as currently in WordPress 4.7.

Joomla User Settings

Joomla User Settings

 

Drupal

Drupal 8 has a rather powerful user interface text language detection mechanism. There is a per session, per user and per browser option in the detection settings. However, users can only choose one language, so they cannot say (in core at least) that they want German primarily and Spanish if German is not available. But the language selected by the user is part of the larger fallback system, so it may fall back further down to other options.

The Language fallback module allows defining one fallback for a language, while the Language Hierarchy module provides a GUI to change the language fallback system. It allows setting up language hierarchies where translations of a site’s content, settings and interface can fall back to parent language translations, without ever falling back to English. This module might be the most interesting one for our research.

Drupal Language Hierarchy

Drupal Language Hierarchy

TYPO3

TYPO3 has had a locale hierarchy since 2011. Quote:

TYPO3 4.6 comes with a clever fallback mechanism when a label is not found in the requested language; instead of returning the default (English) version, it allows you to define your own hierarchy of locales.

By default, French (Canada) will first use French before falling back to English. Similarly, missing labels in Brazilian Portuguese will first try to return Portuguese labels before the English ones. This feature also accommodates completely custom fallbacks.

The same goes for the underlying Flow Framework and the Neos CMS. This is only a front end thing though. On the back end, a user can only configure one language. If there’s no translation in that language, it will fall back to the site’s default and eventually to English.

TYPO3 Back end Language

TYPO3 Back end Language Setting

WordPress.com

On WordPress.com there’s a user interface language selection in the account settings. One cannot select multiple preferred languages though, but only one.

WordPress.com User Interface Language

WordPress.com User Interface Language Setting

Facebook

Facebook only has a really simple language settings. Although a user can select multiple languages they understand for use in the News Feed, they can only choose one specific locale for the overall UI.The locales that Facebook supports are referenced in the Facebook Locales XML file. That file includes multiple variants for various languages, but only one variant for others. For example, there’s only de_DE, but no de_CH. Plus, you can’t choose between formal and informal variants.

Facebook User Language Setting

Facebook User Language Setting

Summary

We’ve now covered a bunch of well-known web platforms and content management systems to see how they are handling this problem. Various techniques exist to assist with finding the right translation. Sometimes they are automated, but most of the time the choice is left up to the individual user.

This research should help with the next steps of the Preferred Languages project as these observations need to be adapted for WordPress so that we can learn from them. Feel free to leave any comments about this research in the comments. After that, I try to schedule a new meeting in December where the next steps can be discussed.

#feature-projects, #i18n, #preferred-languages

Preferred Languages Chat Summary: October 26

After introducing the Preferred Languages project less than two weeks ago, we held our first meeting on Wednesday. This is a summary of the Preferred Languages chat from October 26. (Slack log)

Attendees: @casiepa, @chantalc, @petya, @swissspidy

Topics

After going through initial feedback on the introduction post, there was a short recap of ideas that have been suggested so far:

  • Have a translated setting of preferred languages, where the translators would define the locales they should “fall back” to in case there’s no translation available. E.g. de_CH states to prefer de_DE and de_DE_formal after that.
    • Pros: simple solution, no UI changes needed
    • Cons: very subjective, unclear how the expected translations would be installed
  • Rely on the Accept-Language header every browser sends, which is set based on the user’s system preferences.
    • Pros: no UI changes needed, it’s a reliable standard
    • Cons: no support for formal language variants, unclear how the expected translations would be installed
  • Extend the UI to allow setting multiple preferred languages
    • Pros: leaves the choice to the user, clear how to install the expected translations, could be used together with Accept-Language
    • Cons: makes the UI option more complex

Besides that, it was suggested to display a note to the user when there’s a missing translation to encourage them to translate WordPress. However, this can be worked on independently in #23348.

Before we get into details and possible solutions, more (non-technical) research needs to be done. We’re mostly interested in how other platforms and content management systems handle this. Think Wikipedia, Facebook, Drupal, Joomla, and so on. WordPress isn’t the only system facing this situation.

Research results can be shared in the comments to this post. Every help in that regard is highly appreciated! So far I’ve collected some information about Drupal and Joomla. After that, I’ll publish the results in a separate blog post.

The next meeting will be on Wednesday, 9 November 2016, 17:00 UTC in the #core-i18n Slack channel.

#feature-projects, #i18n, #preferred-languages, #summary

Preferred Languages

With the introduction of language packs two years ago ([29630]), it’s easier than ever for users to change the main language of their site. However, in some cases a single locale (i.e. the language variant, like Canadian French) is not enough.

Let’s say a site of mine is running German (Switzerland), which there is a language pack for. However, most plugins only have a German (Germany) translation, or perhaps only even a de_DE_formal translation. As a native German speaker, I’d prefer to read a German (Germany) translation instead of English, if a German (Switzerland) version did not exist. Instead of getting translations as I’d wish (as the translations are very similar), WordPress falls back to the original English strings. That’s a poor user experience for many non-English speakers. Now, since the addition of a user-specific language (#29783), this issue is even more important.

There’s been a long discussion about this issue in #28197, where possible solutions have been suggested without any consensus. Instead of directly talking about how this can be technically implemented, we should first explore the actual user experience problems and see what’s possible and how it might look.

For this, I began researching how other software approach this problem. Those of us interested in this problem can learn from existing solutions and proceed from there.

These screenshots should give you a better understanding of what I’m talking about:

As you can see, these software products create a “fallback chain” for the user’s preferred languages. In theory, I could also set my preferred languages to es_ES -> de_DE -> en_GB -> en_US, if that was the order in which I preferred translations.

To keep momentum and continue thinking through this problem, I want to kick off a feature project, about improving the experience for WordPress users who use the product in non-English languages, of which multiple locales may exist.

Get Involved

Your feedback is incredibly important to ensuring we get this right. Leave any thoughts in the comments below. Would you like to help out? Awesome. Let’s have an initial chat on Wednesday, 26 October 2016, 17:00 UTC in the #core-i18n Slack channel and go from there.

I’ve set up a GitHub repository that can be used as a playground for discussions, prototypes, and eventually a working solution. For this, design and accessibility feedback would be very helpful. I’m confident that we can build something that we can propose for inclusion in a future WordPress release!

#feature-projects, #i18n, #preferred-languages

Shiny Updates v3 Kickoff Chat

While the biggest chunk of Shiny Updates v2 was merged into WordPress 4.6, there were still plenty of ideas that didn’t make it in time. I think it would be worthwhile to bring more shininess to WordPress 4.8 or 4.9. Regular chats have been dormant for a while, but I’d like to continue them starting Wednesday, 19 October 2016, 17:00 UTC in the #feature-shinyupdates Slack channel.

Topics for the first chat will include a brief update on current shiny updates bugs in WordPress core and the planned goals for Shiny Updates v3 (e.g. update-core.php). Some of the ideas can be found on GitHub.

Now is a great time to get involved and help making WordPress updates even more shiny. Please come join us next week and contribute to the continuing abolishment of The Bleak Screen of Sadness™.

#feature-projects, #shiny-updates

Customize Changesets (formerly Transactions) Merge Proposal

This is a merge proposal and overview of Customize Changesets (#30937), a project formerly known and proposed as Customizer Transactions back in January 2015. The customizer is WordPress’s framework for doing live previews of any change on your site. One of the biggest problems the customizer faces right now is that changes are ephemeral. If you navigate away, you lose what you are working on. Additionally, you can not share proposed changes with others, nor can you take the changes you are working on and save to continue working later.

Imagine a WordPress user named Tina. Tina is building a website for her daughter’s band. The band has been getting more and more popular, and Tina wants to experiment with some new widgets in the sidebar but also wants to be able to share the proposed solutions with her husband Matthew. Right now, Tina and Matthew would need to be in the same room to collaborate. Or Tina would need to take screenshots, but that kind of defeats the purpose of live preview. If only there was a way to make customizer changes persistent without publishing them.

Introducing Customize Changesets

Customize changesets make changes in the customizer persistent, like autosave drafts. Users can make changes to one theme and switch to another in the customizer without losing the changes upon switching. A customizer session can be bookmarked to come back to later or this URL can be shared with someone else to review and make additional changes (the URLs expire after a week without changes). The new APIs make possible many new user-facing features in future releases and feature plugins, including saving long-lived drafts, submitting changesets as pending for review, scheduling changes, seeing the previewed state on the frontend without being in an iframe, sharing preview URLs with others who do not have customizer access, and others.

Customize changesets allow each change you make in the customizer for a given live preview session to be persistent in the database. A unique identifier (a UUID like f67efbbf-c663-4271-ab1c-95ce1d447979) for each live preview session is generated and as soon as a change is made, the change setting value is sent in an Ajax request to be written into a custom post type whose post_name is the UUID for the customizer session. Once the changes have been written into the changeset post, then any request to WordPress (including to the REST API) can be made with a customize_changeset_uuid query param with the desired UUID and this will result in the customizer being bootstrapped and the changes from that changeset being applied for preview in the response. The unique UUID means that customizer sessions can be sent to other users and also that they can be used as query parameters on the front end.

Design and Technical Decisions

For the initial core merge, no UI changes are being proposed. The feature will only be exposed as the new query parameter on the URL. Adding a UI to this feature will happen in a future release. As such, the proposal for customize changesets is similar to the proposal for including the REST API infrastructure: it provides a foundation for new core features in future releases and a platform for plugins to add new features. Nevertheless, while the customize changesets patch doesn’t introduce any new features it does fix several long-standing issues related to incompatibilities between JavaScript running on the site’s frontend when previewed in the customizer. Under the hood, the customize changesets patch touches on many of the lowest level pieces of the customizer. Please check out the Customize Changesets Technical Design Decisions to see what is happening under the hood.

Testing

Please test! If you use any plugins that extend the customizer, please ensure that there aren’t any regressions. The patch is intended to be fully backwards compatible and users shouldn’t notice any difference in normal use. Two things to look for when testing is as soon as you make a change, you should see a customize_uuid query param added to the URL. You should be able to reload and find your changes persist (note the AYS dialog is retained because there is no UI yet for listing changesets). Also, when you navigate around the preview it should feel much more natural like normal browsing as opposed to having a fade effect. Otherwise, previewing settings that require refresh should still work as normal, as will settings that preview with JavaScript and selective refresh.

The patch is in a GitHub pull request and you can apply the patch via:

grunt patch:https://github.com/xwp/wordpress-develop/pull/161

If you’re using the Git repo from develop.git.wordpress.org then you can check out the branch directly via:

git remote add -f xwp https://github.com/xwp/wordpress-develop.git && git checkout xwp/trac-30937

I’d appreciate code review feedback directly on the pull request. For any revisions to the patch, please open a pull request to that trac-30937 branch if possible.

The Future

In future releases we can explore new UIs to take advantage of the new capabilities that changesets provide. New UIs can provide a way to schedule changes, the ability to undo the last change, show an audit log (revision history) for changes, collaborative editing of a customizer changeset, and so on. Future feature projects will explore many of these and feature plugins will start to prototype them.

Thanks to @jorbin who contributed to this proposal post.

#4-7, #customize, #feature-projects, #proposal

WP_Hook: Next Generation Actions and Filters

WordPress 4.7 introduces a significant reworking of action and filter iteration to address bugs that arose from recursive callbacks and from callbacks that changed the hooked callbacks on currently running actions/filters.

What does this mean for you, the plugin or theme developer? In almost all cases, nothing. Everything should continue to run as expected, and this should fix a number of hard-to-trace bugs when different plugins are stepping on each others callbacks.

Who is affected?

If your plugin directly accesses the $wp_filter global rather than using the public hooks API, you might run into compatibility issues.

Case 1: Directly setting callbacks in the $wp_filter array

$wp_filter['save_post'][10]['my_special_key'] = array( 'function' => 'my_callback_function', 'accepted_args' => 2 );

This will no longer work, and will cause a fatal error. $wp_filter['save_post'] is no longer a simple array. Rather, it is an object that implements the ArrayAccess interface.

You have two options to work around. The first (and preferred) method is to use the add_filter() or add_action() functions.

add_action( 'save_post', 'my_callback_function', 10, 2 );

If, for some reason, you absolutely cannot, you can still work around this.

if ( ! isset( $wp_filter['save_post'] ) ) {
    $wp_filter['save_post'] = new WP_Hook();
}
$wp_filter[ 'save_post' ]->callbacks[10]['my_special_key'] = array( 'function' => 'my_callback_function', 'accepted_args' => 2 );

Case 2: Directly unsetting callbacks in the $wp_filter array

unset( $wp_filter['save_post'][10][ $my_callback_id ] );

This will fail for the same reason as case one. To work around, you can use the standard remove_filter() / remove_action() functions.

remove_action( 'save_post', 'my_callback_function', 10, 2 );

Or, if you absolutely must access the array directly:

if ( isset( $wp_filter[ 'save_post' ] ) ) {
    unset( $wp_filter['save_post']->callbacks[10][ $my_callback_id ] );
}

Case 3: Checking if a hook is an array

if ( isset( $wp_filter['save_post'] ) && is_array( $wp_filter['save_post'] ) ) {
    // do something
}

This will always return false. $wp_filter['save_post'] is a WP_Hook object. To check if a hook has callbacks, use has_action() or has_filter().

if ( has_action( 'save_post' ) ) {
    // do something
}

Case 4: Moving the $wp_filter array pointer manually

If you’re calling next() or prev() on the $wp_filter array pointer to manually manage the order that callbacks are called in (or if you’re doing it to work around #17817), you will likely be unsuccessful. Use callback priorities, add_action() / add_filter(), and remove_action() / remove_filter() appropriately and let WordPress manage execution order.

Other Cases

Almost all other cases where you might manipulate the $wp_filter global directly should continue to function as expected. The WP_Hook object implements the ArrayAccess and IteratorAggregate interfaces so that, while it’s not recommended, you may continue to iterate over callbacks in $wp_filter or directly retrieve priorities from the callback array.

Props

While there were many contributors of the course of this ticket, all of whom are listed in the changeset, I’d like to especially thank @jbrinley, for his work in architecting this solution, researching and testing potential compatibility issues, and for authoring the bulk of this post.

What’s Next

We’re confident in how solid the code is for the majority of sites, and for the major edge cases, so now it’s time to add the code to the nightly build, so all y’all can test it against your sites and plugins.

This is being treated as an early commit of a feature project, so the final decision for whether this code will be kept in WordPress 4.7 will be made no later than beta 1, which gives us a month and a half to see how it effects usage in the wider world.

The current nightly build contains this update, for your testing enjoyment.

If you have any feedback on the code, please leave a comment on this post. Please post any bugs you find to Core Trac.

#4-7, #dev-notes, #feature-projects, #hooks, #plugins

Shiny Updates V2: Partial Merge Approved

During the extra meeting held today, Monday June 13th, 2016 at 13:00 MDT, Shiny Updates V2 was approved for a partial merge to core. The changes that will be merged are install, update, and delete for themes and plugins in single and multisite.

This affects:

  • plugin.php
  • plugin-install.php
  • import.php
  • themes.php
  • themes-install.php
  • The “more-detail” modals in those screens

The parts that will not be merged are the changes to update-core.php. The team now has time until Wednesday, June 15th 23:59 UTC to get the core patch committed.

Congratulations and thanks to the Shiny Updates team for all the hard work they’ve put in this release. Most notably @obenland, @swissspidy, and @mapk. Thanks to everyone who reviewed, helped test, or otherwise contributed to Shiny Updates V2, we hope to see you involved with Shiny Updates V3. 🙂

The full chat log is located here: https://wordpress.slack.com/archives/core/p1465844429000062

#4-6, #feature-projects, #shiny-updates

New time for feature projects chat on Tuesday, April 19

Time change

Per the comments on the introduction post, the next feature project chat will be held at April 20 01:00 UTC. This will alternate with the first meeting’s time of 17:00 UTC (1:00PM Eastern) (sorry for the initial error, math is hard). Since there are a number of people in place where only one of the times is at a reasonable hour, as activity picks up we will likely move to weekly meetings that alternate times; for now, we will remain at biweekly (i.e. each time will occur every 4 weeks).

Meeting Agenda

  • A look at feature project pages.
  • Check-ins with existing feature projects.
  • Call for feature projects and feature project ideas.
  • Open floor.

To help keep us on track and prevent us from missing things, please comment below with any feature projects/ideas and a brief statement of purpose, as seen on the feature projects page. If you have items for the open floor, also add those in the comments.

#feature-projects