Preferred Languages: Research

Following the first meeting of the Preferred Languages project, some time was spent on researching popular platforms and other systems to see how they handle the issue of setting multiple preferred languages. To recap, this is needed to show things in the most suitable language for users in case their requested language is not installed.

Browsers & Operating Systems

All major browsers and operating systems have some UIUI User interface where the user can set their preferred languages. Mostly used for the Accept-Language HTTPHTTP HTTP is an acronym for Hyper Text Transfer Protocol. HTTP is the underlying protocol used by the World Wide Web and this protocol defines how messages are formatted and transmitted, and what actions Web servers and browsers should take in response to various commands. headerHeader The header of your site is typically the first thing people will experience. The masthead or header art located across the top of your page is part of the look and feel of your website. It can influence a visitor’s opinion about your content and you/ your organization’s brand. It may also look different on different screen sizes., one can set multiple languages and order them by preference. These systems usually know multiple variants like German and German (Switzerland), but not formal / informal variants.

Worth noting that on these systems you can often choose more settings that are influenced by the language, like temperature units and date formats. Related: #18146.

The Accept-Language HTTP Header

RFC 7231 about the HTTP/1.1 standard says the following about this header:

The “Accept-Language” header field can be used by user agents to indicate the set of natural languages that are preferred in the response. Each language-range can be given an associated quality value representing an estimate of the user’s preference for the languages specified by that range.

This header is quite powerful. For example, Accept-Language: da, en-gb;q=0.8, en;q=0.7 would mean: “I prefer Danish, but will accept British English and other types of English”. I can only recommend reading more about it in the RFC, as it helps getting a better understanding of the problems it tries to solve.

This header is usually used by websites to redirect users to the correct version or display a hint like “This content is also available in XY”. While each language in the header has a specific numeric priority, there is usually only a drag & drop interface to determine the order.

Unicode Common LocaleLocale A locale is a combination of language and regional dialect. Usually locales correspond to countries, as is the case with Portuguese (Portugal) and Portuguese (Brazil). Other examples of locales include Canadian English and U.S. English. Data Repository

The Unicode CLDR provides key building blocks for software to support the world’s languages, with the largest and most extensive standard repository of locale data available. It contains an interesting chart about Language Matching, data that is used to match the user’s desired language/locales against an application’s supported languages/locales.

There’s also a technical standard about Unicode Locale Data Markup Language (LDML), an XML format for the exchange of structured locale data. The section about Locale Inheritance and Language Matching is particularly interesting. For instance, it describes finding the most well suited language using a weighted graph and gives a better picture of dealing with more complex language fallbacks.

It also takes geographic “closeness” into account, arguing that English (Slovakia) should fall back to something within Europe (e.g. British English) in preference to something far away and unrelated like English (Singapore). It’s clearly stated that these fallbacks aren’t as simple as just saying Spanish (Mexico) -> Spanish (Spain) -> English (US).

Note that this technical standard is about finding the best supported locale based on the requested list of languages. The requested list could come from different sources, such as such as the user’s list of preferred languages in the OS Settings, or from a browser’s Accept-Language list.

Wikipedia

Wikipedia is built on the MediaWiki software, which has a hardcoded list of fallback chains for some locales. This is where MediaWiki will fall back on a different language if it cannot find what it needs. For example, French (Cajun) automatically falls back to French (France) when it doesn’t have all messages defined in it. Unfortunately, there’s only little documentation about this.

Apart from that, MediaWiki distinguishes between various kinds of languages: the sitesite (versus network, blog) content language, the user interface language and the page content language. The latter can differ from the first two and it influences the language the user views the page in, which depends on the user’s preferences, the available languages, and the defined fallbacks.

It’s worth noting that on Wikipedia, you can not only select your preferred language, but also your preferred gender. In addition to that, you can select multiple different locales for both display and input.

Joomla

I tried the official Joomla Demo to test its language management which is a bit overwhelming at first. After installing all the available languages the user is eventually able to select their preferred language for the front end and the back end. There are a few other settings for language switching on multilingual sites, but that’s pretty much it. So basically the same settings as currently in WordPress 4.7.

Joomla User Settings

Joomla User Settings

 

Drupal

Drupal 8 has a rather powerful user interface text language detection mechanism. There is a per session, per user and per browser option in the detection settings. However, users can only choose one language, so they cannot say (in coreCore Core is the set of software required to run WordPress. The Core Development Team builds WordPress. at least) that they want German primarily and Spanish if German is not available. But the language selected by the user is part of the larger fallback system, so it may fall back further down to other options.

The Language fallback module allows defining one fallback for a language, while the Language Hierarchy module provides a GUI to change the language fallback system. It allows setting up language hierarchies where translations of a site’s content, settings and interface can fall back to parent language translations, without ever falling back to English. This module might be the most interesting one for our research.

Drupal Language Hierarchy

Drupal Language Hierarchy

TYPO3

TYPO3 has had a locale hierarchy since 2011. Quote:

TYPO3 4.6 comes with a clever fallback mechanism when a label is not found in the requested language; instead of returning the default (English) version, it allows you to define your own hierarchy of locales.

By default, French (Canada) will first use French before falling back to English. Similarly, missing labels in Brazilian Portuguese will first try to return Portuguese labels before the English ones. This feature also accommodates completely custom fallbacks.

The same goes for the underlying Flow Framework and the Neos CMS. This is only a front end thing though. On the back end, a user can only configure one language. If there’s no translationtranslation The process (or result) of changing text, words, and display formatting to support another language. Also see localization, internationalization. in that language, it will fall back to the site’s default and eventually to English.

TYPO3 Back end Language

TYPO3 Back end Language Setting

WordPress.comWordPress.com An online implementation of WordPress code that lets you immediately access a new WordPress environment to publish your content. WordPress.com is a private company owned by Automattic that hosts the largest multisite in the world. This is arguably the best place to start blogging if you have never touched WordPress before. https://wordpress.com/

On WordPress.com there’s a user interface language selection in the account settings. One cannot select multiple preferred languages though, but only one.

WordPress.com User Interface Language

WordPress.com User Interface Language Setting

Facebook

Facebook only has a really simple language settings. Although a user can select multiple languages they understand for use in the News Feed, they can only choose one specific locale for the overall UI.The locales that Facebook supports are referenced in the Facebook Locales XML file. That file includes multiple variants for various languages, but only one variant for others. For example, there’s only de_DE, but no de_CH. Plus, you can’t choose between formal and informal variants.

Facebook User Language Setting

Facebook User Language Setting

Summary

We’ve now covered a bunch of well-known web platforms and content management systems to see how they are handling this problem. Various techniques exist to assist with finding the right translation. Sometimes they are automated, but most of the time the choice is left up to the individual user.

This research should help with the next steps of the Preferred Languages project as these observations need to be adapted for WordPress so that we can learn from them. Feel free to leave any comments about this research in the comments. After that, I try to schedule a new meeting in December where the next steps can be discussed.

#feature-projects, #i18n, #preferred-languages