Internationalization goals for 4.0

Every few releases we’ve made a major push for improved internationalization. In 3.7, that was laying the groundwork for plugin and theme language packs (more on that soon). In 3.4, it was reducing all of the customizations that many localization teams needed to make. (There’s a page on that here.) In 4.0, I’d like to close the loop on a lot of this. Here’s what I’d like to accomplish, and I’ll need a lot of help.

  1. The first step installing WordPress should be to choose a language. The rest of the install process would then be in that language.
  2. You should be able to choose/switch a language from the general settings screen, after which the language pack should be downloaded.
  3. You should be able to search from the dashboard for plugins and themes that are available in your language.
  4. All localized packages should be able to be automatically generated and made available immediately as part of the core release process.
  5. Localized packages should only be used for initial downloads from WordPress.org. Instead, language packs should be transparently used for updates.

That’s the what. The how poses extensive challenges.

1) Choosing a language when installing WordPress. The rest of the install process would then be in that language.

osx-language

Want.

Split workflows: One issue that came up in #26879 (“friendlier welcome when installing WordPress”) is that setup-config.php is often not the initial entry point. A user installing WordPress through a hosting control panel is likely to be dumped onto the install screen (if not the dashboard directly).

Thus, we need to keep in mind that either screen might be the “intro” screen. This introduces technical challenges: If the user’s first step is setup-config.php, we don’t actually have WordPress fully loaded at this point, which makes actually installing a language pack more difficult. The install step has WordPress loaded in full, just without database tables. We should look into making setup-config.php load “all” of WordPress to make these environments easier to code.

Different approaches: There are two different ways to approach this: download language files immediately upon selection, or bundle barebones language files of the install screens for all supported languages. The latter is not the easiest thing to do (due to error messages and such, strings can come from all over) and also adds a delay to the adoption of new languages. The former would mean that we would want to pull a languages list from WordPress.org. (If we cannot reach WordPress.org, we would have no way of downloading the complete language pack, so this is not a big deal.) It’s still a bit of a challenge due to the split workflows problem.

Recommending a particular language: WordPress.org already recommends a language based on the browser’s settings (Accept-Language header) as well as using a more rough lookup based on IP address blocks. (HTML5 location could be used, but unless we’re also going to use that to set the user’s timezone, it seems excessive for a “choose your language” for the moment, especially since location is not as preferred anyway.)

Both would require the client to hit WordPress.org via JavaScript, versus a server HTTP request. We can do a server-side HTTP request to generate the list, then a client-side request to float recommended ones to the top. It’s possible to do it in two steps: the Accept-Language mapping can be done locally, while the IP-to-location table on WordPress.org has 2.9 million entries and would require a round-trip. Of course, if a user has downloaded a localized package directly from a local WordPress.org site, that would be the top recommendation.

Note that by language or locale we also must consider other translation variants, as there may be a language like Portuguese available in multiple locales (Brazil, Portugal) and further broken down by formal and informal variants. Each of these would be listed as their own “language.” c.f. #28303

2) You should be able to choose/switch a language from the general settings screen, after which the language pack should be downloaded.

WordPress MU had this feature and it still exists in multisite (though it’s pretty broken in terms of how it handles locales, #13069, #15677, #19760). This however only worked for already installed locales.

For single-site, the available languages should be fetched from WordPress.org and cached in a transient. (#13069 suggests using the GlotPress locale list, but as indicated above, we’d want to handle newly added or updated languages. This can then be displayed in a dropdown on the General Settings page. Upon selection and “Save Changes,” the language pack would be downloaded and enabled. We would likely use the oddly named “WPLANG” option since it’s what MU adopted and uses at both the site and network levels.

If we can’t reach WordPress.org, or if a filter is applied, then only installed languages will be listed. (An untrusted multisite network will likely have a default filter in place, to match existing behavior.) We should try to cache failures to .org (generally — not just here) so things aren’t terribly slow when developing a site without internet. We could possibly lazy-fetch the list only when the user expands the dropdown (or clicks a “Change” link or something).

Note this does not address two situations in particular: user-specific languages, or using the dashboard in English. These will be easier to do thanks to improvements in 4.0, but there still exist a number of edge cases where persistent translations can “bleed” into other situations. Examples include a comment moderation email being emailed to an English speaker but sent in the language of the logged-in commenter; or a string stored in the database like image metadata. When using a plugin, these are “edge cases.” When core adopts them, these become “bugs.” We will need to introduce a locale-switching method not unlike switch_to_blog() and also identify and work around any “persistence” issues. Not for 4.0, though we can get started on the groundwork.

3) You should be able to search from the dashboard for plugins and themes that are available in your language.

With language packs (more on that soon), plugins and themes will be able to be translated into any language. The goal would then be to have “localized” plugin and theme directories on the translated WordPress.org sites, listing just the plugins or themes available in their language. Note that even a plugin or theme without strings have a name and description that can be translated.

The plugin and theme install screens in the dashboard should similarly gain this ability. In all cases, there would be an option to expand the search to plugins/themes available in any language, of course — along with the invitation to translate them.

We will need to figure out how to make readme files translatable, which get parsed for the plugin directory (and eventually themes will gain these as well, possibly as part of these initiatives).

4) All localized packages should be able to be automatically generated and made available immediately as part of the core release process.

A localized package should merely consist of core WordPress plus a few translated files (the readme, the license, and the sample config file) and the PO/MO files. No other alterations should occur, which means translation teams will only need to translate, and builds can be automatically generated and made available immediately as part of the core release process.

Local modifications. Before 3.4, every install needed to make modifications to WordPress as part of their localized package, whether by actually replacing core files or using a {$locale}.php file. By my audit, this now applies to only 14 locales.

At the very least, we can allow the current system to work as-is for these locales. Ideally, though, we address the specific concerns of each locale and bring as many as possible into the fold. They include:

  • Some CSS modifications we can incorporate into core.
  • Workarounds for declension issues in a few locales, specifically regarding the names of months. #11226, also #21139, #24899, #22225
  • Adding major locale-specific oEmbed providers. #19980
  • Transliteration such as Romanization for Serbian. We can handle this by having two variants of Serbian, the way we have formal and informal European Portuguese. (But we’d automate these language packs, rather than forcing translations to happen twice.)
  • Significant changes in Uyghur (#19950), Farsi, Chinese, and Japanese (including the multibyte patch plugin).

Additional concerns:

License. Many locales also have an unofficial translation of the license. We should be able to collect any of these in a repository and include it automatically in a package. (Note that link is an explainer; no GPL version 2 translations are available from that page.)

Readme. Some locales directly translate readme.html. Others add a second file and use one of two forms: the translated word “readme” (example: leame.html would be Spanish) or using a suffix (“readme-ja.html” for Japanese). We should standardize this somehow. Technically the readme changes with each version due to the version bump, but we can automate that. Bigger readme changes are fairly rare, but we’ll still need to figure out how to have these translated and tracked.

wp-config-sample.php. This one is especially interesting. If a user has chosen a language on install, we don’t need the readme or license but we do need this file, as setup-config.php will display its contents in a textarea if we’re unable to write the file. We can store this file as a single translated string in a PO/MO file (in some automated fashion) or as part of the API response. We will also need some kind of way to translate and track this file to be included in localized packages that are downloaded.

5) Localized packages should only be used for initial downloads from WordPress.org. Instead, language packs should be transparently used for updates.

The WordPress.org API is set up to return a series of update “offers” — one of each type (“latest” a.k.a. reinstall, “update” and “autoupdate”), in both the site’s language and in English. This results in awkward double-upgrades even when no strings have been changed (updating first to English, then to the locale when it is available for that version) and is a lame experience. While auto-generating localized packages immediately will help, there’s no reason to actually use localized packages for updates. Any locale without local modifications can be set up as a language pack now.

As of 3.7, WordPress core supports receiving instructions for language packs. We’d simply stop issuing localized package offers, start issuing language pack offers, and rip out some code on update-core.php that has been handling the multiple-offer dance. This is the easiest of the five 4.0 tasks but is dependent on the very complex fourth task.

Next steps.

Here’s an overall outline of the things we need to do. I’ll be creating relevant core and meta Trac tickets in the coming days, and I’ve also referenced a few related tickets I was able to find.

Biggest problems to solve:

  • Figure out how to handle readmes, licenses, and wp-config-sample.php. This includes plugin (and theme) readmes as well. Remember we don’t want plugin developers to need to be involved in any way; and also that updates to these files will need to be handled elegantly (such that we have both ‘stable’ and ‘development’ tracks).
  • Eliminate all code modifications across all locales (as much as possible). Related, #20974.

WordPress.org/API work:

  • Create an API for core to use to fetch available locales for download.
  • Create an API for core to use to recommend a locale based on browser settings/location.
  • Auto-generate localized packages and core language packs.
  • Serve core language packs for updates, not localized packages. Related: #23113, #27164, #26914, #27752.
  • Mirror the plugin and theme directories to locale sites, with full translations (including readme data) and with selective searching.

Core work:

  • Add the ability to install a new language pack on demand.
  • Add a “switch to language” method, for languages that are installed. #26511
  • Add a screen that allows a locale to be chosen. May need to change how setup-config.php bootstraps WordPress.
  • Add a language chooser to the General Settings screen. #15677
  • Support searching for plugins/themes in a particular language, as well as leveraging translated data fields that come through from the API response. This mostly just means sending back the site’s language to WordPress.org. It also requires UI to reveal results from any languages. We could possibly filter/hide/show on the client side, if results were not paginated.
  • Take a machete to update-core.php’s localized build handling once everything else is done. Related: #25712, #28199.