Preferred Languages: The Prototype

A bit over six months ago I set the foundation for the Preferred Languages feature project. After highlighting the problems many WordPress users around the world are facing, some time was spent on researching popular platforms and other systems to see how they handle the issue of setting multiple preferred languages. However, this didn’t really help to move one step forward and the project became dormant — until now.

Personally, I’ve been experiencing the same problems with so-called language fallbacks over and over again. During WordCamp Bilbao I decided to revive the Preferred Languages project. In order to do this I used the previously mentioned GitHub repository to build a super-simple proof-of-concept plugin to have a new basis for discussion. As originally envisioned, this repository can be used as a playground for discussions, prototypes, and eventually a working solution.

This new plugin lets you select multiple preferred languages in your settings. WordPress then tries to load the translations for the first language that’s available, falling back to the next language in your list.

A sneak peek of an early version quickly made the rounds on Twitter:

After some very positive feedback I worked a bit more on it and wanted to share the latest version here with a larger group of people. Here’s what it currently looks like on both the settings screen and when editing your profile:

Without much further ado, please check out the plugin on GitHub, test it on your local WordPress site, and leave some feedback!

#feature-projects, #i18n, #preferred-languages

Preferred Languages: Research

Following the first meeting of the Preferred Languages project, some time was spent on researching popular platforms and other systems to see how they handle the issue of setting multiple preferred languages. To recap, this is needed to show things in the most suitable language for users in case their requested language is not installed.

Browsers & Operating Systems

All major browsers and operating systems have some UI where the user can set their preferred languages. Mostly used for the Accept-Language HTTP header, one can set multiple languages and order them by preference. These systems usually know multiple variants like German and German (Switzerland), but not formal / informal variants.

Worth noting that on these systems you can often choose more settings that are influenced by the language, like temperature units and date formats. Related: #18146.

The Accept-Language HTTP Header

RFC 7231 about the HTTP/1.1 standard says the following about this header:

The “Accept-Language” header field can be used by user agents to indicate the set of natural languages that are preferred in the response. Each language-range can be given an associated quality value representing an estimate of the user’s preference for the languages specified by that range.

This header is quite powerful. For example, Accept-Language: da, en-gb;q=0.8, en;q=0.7 would mean: “I prefer Danish, but will accept British English and other types of English”. I can only recommend reading more about it in the RFC, as it helps getting a better understanding of the problems it tries to solve.

This header is usually used by websites to redirect users to the correct version or display a hint like “This content is also available in XY”. While each language in the header has a specific numeric priority, there is usually only a drag & drop interface to determine the order.

Unicode Common Locale Data Repository

The Unicode CLDR provides key building blocks for software to support the world’s languages, with the largest and most extensive standard repository of locale data available. It contains an interesting chart about Language Matching, data that is used to match the user’s desired language/locales against an application’s supported languages/locales.

There’s also a technical standard about Unicode Locale Data Markup Language (LDML), an XML format for the exchange of structured locale data. The section about Locale Inheritance and Language Matching is particularly interesting. For instance, it describes finding the most well suited language using a weighted graph and gives a better picture of dealing with more complex language fallbacks.

It also takes geographic “closeness” into account, arguing that English (Slovakia) should fall back to something within Europe (e.g. British English) in preference to something far away and unrelated like English (Singapore). It’s clearly stated that these fallbacks aren’t as simple as just saying Spanish (Mexico) -> Spanish (Spain) -> English (US).

Note that this technical standard is about finding the best supported locale based on the requested list of languages. The requested list could come from different sources, such as such as the user’s list of preferred languages in the OS Settings, or from a browser’s Accept-Language list.

Wikipedia

Wikipedia is built on the MediaWiki software, which has a hardcoded list of fallback chains for some locales. This is where MediaWiki will fall back on a different language if it cannot find what it needs. For example, French (Cajun) automatically falls back to French (France) when it doesn’t have all messages defined in it. Unfortunately, there’s only little documentation about this.

Apart from that, MediaWiki distinguishes between various kinds of languages: the site content language, the user interface language and the page content language. The latter can differ from the first two and it influences the language the user views the page in, which depends on the user’s preferences, the available languages, and the defined fallbacks.

It’s worth noting that on Wikipedia, you can not only select your preferred language, but also your preferred gender. In addition to that, you can select multiple different locales for both display and input.

Joomla

I tried the official Joomla Demo to test its language management which is a bit overwhelming at first. After installing all the available languages the user is eventually able to select their preferred language for the front end and the back end. There are a few other settings for language switching on multilingual sites, but that’s pretty much it. So basically the same settings as currently in WordPress 4.7.

Joomla User Settings

Joomla User Settings

 

Drupal

Drupal 8 has a rather powerful user interface text language detection mechanism. There is a per session, per user and per browser option in the detection settings. However, users can only choose one language, so they cannot say (in core at least) that they want German primarily and Spanish if German is not available. But the language selected by the user is part of the larger fallback system, so it may fall back further down to other options.

The Language fallback module allows defining one fallback for a language, while the Language Hierarchy module provides a GUI to change the language fallback system. It allows setting up language hierarchies where translations of a site’s content, settings and interface can fall back to parent language translations, without ever falling back to English. This module might be the most interesting one for our research.

Drupal Language Hierarchy

Drupal Language Hierarchy

TYPO3

TYPO3 has had a locale hierarchy since 2011. Quote:

TYPO3 4.6 comes with a clever fallback mechanism when a label is not found in the requested language; instead of returning the default (English) version, it allows you to define your own hierarchy of locales.

By default, French (Canada) will first use French before falling back to English. Similarly, missing labels in Brazilian Portuguese will first try to return Portuguese labels before the English ones. This feature also accommodates completely custom fallbacks.

The same goes for the underlying Flow Framework and the Neos CMS. This is only a front end thing though. On the back end, a user can only configure one language. If there’s no translation in that language, it will fall back to the site’s default and eventually to English.

TYPO3 Back end Language

TYPO3 Back end Language Setting

WordPress.com

On WordPress.com there’s a user interface language selection in the account settings. One cannot select multiple preferred languages though, but only one.

WordPress.com User Interface Language

WordPress.com User Interface Language Setting

Facebook

Facebook only has a really simple language settings. Although a user can select multiple languages they understand for use in the News Feed, they can only choose one specific locale for the overall UI.The locales that Facebook supports are referenced in the Facebook Locales XML file. That file includes multiple variants for various languages, but only one variant for others. For example, there’s only de_DE, but no de_CH. Plus, you can’t choose between formal and informal variants.

Facebook User Language Setting

Facebook User Language Setting

Summary

We’ve now covered a bunch of well-known web platforms and content management systems to see how they are handling this problem. Various techniques exist to assist with finding the right translation. Sometimes they are automated, but most of the time the choice is left up to the individual user.

This research should help with the next steps of the Preferred Languages project as these observations need to be adapted for WordPress so that we can learn from them. Feel free to leave any comments about this research in the comments. After that, I try to schedule a new meeting in December where the next steps can be discussed.

#feature-projects, #i18n, #preferred-languages

Preferred Languages Chat Summary: October 26

After introducing the Preferred Languages project less than two weeks ago, we held our first meeting on Wednesday. This is a summary of the Preferred Languages chat from October 26. (Slack log)

Attendees: @casiepa, @chantalc, @petya, @swissspidy

Topics

After going through initial feedback on the introduction post, there was a short recap of ideas that have been suggested so far:

  • Have a translated setting of preferred languages, where the translators would define the locales they should “fall back” to in case there’s no translation available. E.g. de_CH states to prefer de_DE and de_DE_formal after that.
    • Pros: simple solution, no UI changes needed
    • Cons: very subjective, unclear how the expected translations would be installed
  • Rely on the Accept-Language header every browser sends, which is set based on the user’s system preferences.
    • Pros: no UI changes needed, it’s a reliable standard
    • Cons: no support for formal language variants, unclear how the expected translations would be installed
  • Extend the UI to allow setting multiple preferred languages
    • Pros: leaves the choice to the user, clear how to install the expected translations, could be used together with Accept-Language
    • Cons: makes the UI option more complex

Besides that, it was suggested to display a note to the user when there’s a missing translation to encourage them to translate WordPress. However, this can be worked on independently in #23348.

Before we get into details and possible solutions, more (non-technical) research needs to be done. We’re mostly interested in how other platforms and content management systems handle this. Think Wikipedia, Facebook, Drupal, Joomla, and so on. WordPress isn’t the only system facing this situation.

Research results can be shared in the comments to this post. Every help in that regard is highly appreciated! So far I’ve collected some information about Drupal and Joomla. After that, I’ll publish the results in a separate blog post.

The next meeting will be on Wednesday, 9 November 2016, 17:00 UTC in the #core-i18n Slack channel.

#feature-projects, #i18n, #preferred-languages, #summary

Preferred Languages

With the introduction of language packs two years ago ([29630]), it’s easier than ever for users to change the main language of their site. However, in some cases a single locale (i.e. the language variant, like Canadian French) is not enough.

Let’s say a site of mine is running German (Switzerland), which there is a language pack for. However, most plugins only have a German (Germany) translation, or perhaps only even a de_DE_formal translation. As a native German speaker, I’d prefer to read a German (Germany) translation instead of English, if a German (Switzerland) version did not exist. Instead of getting translations as I’d wish (as the translations are very similar), WordPress falls back to the original English strings. That’s a poor user experience for many non-English speakers. Now, since the addition of a user-specific language (#29783), this issue is even more important.

There’s been a long discussion about this issue in #28197, where possible solutions have been suggested without any consensus. Instead of directly talking about how this can be technically implemented, we should first explore the actual user experience problems and see what’s possible and how it might look.

For this, I began researching how other software approach this problem. Those of us interested in this problem can learn from existing solutions and proceed from there.

These screenshots should give you a better understanding of what I’m talking about:

As you can see, these software products create a “fallback chain” for the user’s preferred languages. In theory, I could also set my preferred languages to es_ES -> de_DE -> en_GB -> en_US, if that was the order in which I preferred translations.

To keep momentum and continue thinking through this problem, I want to kick off a feature project, about improving the experience for WordPress users who use the product in non-English languages, of which multiple locales may exist.

Get Involved

Your feedback is incredibly important to ensuring we get this right. Leave any thoughts in the comments below. Would you like to help out? Awesome. Let’s have an initial chat on Wednesday, 26 October 2016, 17:00 UTC in the #core-i18n Slack channel and go from there.

I’ve set up a GitHub repository that can be used as a playground for discussions, prototypes, and eventually a working solution. For this, design and accessibility feedback would be very helpful. I’m confident that we can build something that we can propose for inclusion in a future WordPress release!

#feature-projects, #i18n, #preferred-languages