JavaScript Internationalization: The Missing Pieces

Back in 2016, work started on building a proper JavaScript internationalization API and the tooling to support it throughout WordPress core and WordPress.org. Many ideas and patches were being discussed. A summary of that can be found in this blog post. With Gutenberg on the rise, JavaScript I18N is more urgent than ever. WordPress needs a robust solution for that, and some things have already been built. Let’s have a look at where we currently stand.

Status Quo

Right now, Gutenberg is using a custom-built JS I18N library that is similar to the one originally proposed in 2016 as part of #20491. It sits on top of a library called Jed, which brings Gettext functionality to JavaScript. This means developers can use the same __() function as in PHP and therefore don’t have to learn anything new. WordPress can take it from there.

Unfortunately, WordPress itself doesn’t support this JS I18N library yet. Gutenberg (or any other plugin that uses said library, really) has to jump through quite a few hoops to actually localize its JavaScript:

  1. Scan JavaScript files to extract internationalization functions and create a POT file using tools like babel-plugin-makepot, Poedit, or xgettext-js.
  2. Use that POT file to write the exact same internationalization functions in a “fake” PHP file that can be scanned by the WordPress.org translation platform. This will result in PO and MO files containing all of your plugin’s translations.
  3. Figure out a way to load these translations and make them available to your JavaScript using wp_add_inline_script(). Ideally you’d only load the ones needed by that specific script as you don’t want to print thousands of strings in that inline JS when you only need a few of them.

An example of that process can be found in my demo Gutenberg I18N Block plugin.
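
To give a rough idea of what step 3 involves, here’s a minimal sketch in PHP. It assumes a script handle foo-script, a Jed locale data file shipped with the plugin at languages/foo-plugin-{locale}.json, and an I18N library that exposes a setter such as wp.i18n.setLocaleData() (as Gutenberg’s bundled library does); none of this is a fixed API:

// Minimal sketch of step 3: hand a Jed locale data object to the script.
// Assumes the JSON file contains just the locale data object for the
// 'foo-plugin' text domain, not the full Jed wrapper.
function foo_plugin_set_script_translations() {
	$file        = plugin_dir_path( __FILE__ ) . 'languages/foo-plugin-' . get_locale() . '.json';
	$locale_data = is_readable( $file ) ? json_decode( file_get_contents( $file ), true ) : null;

	if ( empty( $locale_data ) ) {
		// Fall back to an "empty" Jed locale data object so lookups still work.
		$locale_data = array( '' => array( 'domain' => 'foo-plugin', 'lang' => get_locale() ) );
	}

	// Assumes 'foo-script' has already been registered at this point.
	wp_add_inline_script(
		'foo-script',
		sprintf( 'wp.i18n.setLocaleData( %s, "foo-plugin" );', wp_json_encode( $locale_data ) ),
		'before'
	);
}
add_action( 'wp_enqueue_scripts', 'foo_plugin_set_script_translations' );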

At this point you might want to go back to good old wp_localize_script() and simply keep using that for internationalization purposes. I don’t blame you.

However, this complicated process is only needed because the work on JavaScript internationalization is far from finished. Gutenberg has made it quite obvious where things need to be improved.

What’s Missing

Scanning JavaScript files for internationalization functions

First and foremost, the WordPress.org translation platform needs to be able to scan JavaScript files for internationalization functions in addition to just the PHP files. However, that’s not as straightforward as it sounds.

The platform uses a script called makepot.php to scan PHP files all across the WordPress.org ecosystem, i.e. core, meta, and all default themes. In addition to regular Gettext function calls, it also scans plugin and theme file headers. Since it is included in many other libraries, makepot is a widely used tool. Most recently, its functionality was ported to a WP-CLI command to make string extraction easier to use.

On the other side, we have babel-plugin-makepot, a tool written in JavaScript to scan JavaScript files. With the ECMAScript standard evolving so quickly, it is natural to write such a tool in the same language. However, it’s not a requirement, as this pull request for said WP-CLI command demonstrates. This raises some questions:

Can we simply use that Babel plugin on WordPress.org? What happens to makepot.php? What are the implications for all the developers out there not hosting their projects on WordPress.org? Not everyone uses the Babel transpiler, and certainly not everyone wants to use two separate tools just to extract some internationalization functions.

Loading only a specific set of translations

All translations for a plugin or theme are stored in a single PO / MO file per locale, and loading these translations is a slow process. We’ve made some improvements in that regard over the years, for example by introducing just-in-time loading of translations in WordPress 4.6.

However, if you only need a handful of translations for a single script in your plugin, it does not make sense to load the entire MO file which can be dozens of kilobytes in size. There’s currently no way to load only a specific set of translations in WordPress. This is something that came up in Gutenberg before, see issue 6015.

Binary MO files don’t make sense in a JavaScript context anyway. Lucky for us, GlotPress, the software that powers translate.wordpress.org, has been able to export translations in a Jed-compatible JSON format since 2016. We just need to use that to export a JSON file for all strings extracted from JavaScript files.

So, in theory, the WordPress.org translation platform could export PO and MO files as usual for strings extracted from PHP files, and a JSON file for all strings coming from JavaScript files. This alone would already be a huge improvement. But can we take this even further?

Option A

Use a different text domain per JavaScript module and export a JSON file per text domain. This is appealing, but has to be ruled out quickly: the text domain is not known to GlotPress, as it isn’t stored in the database at all.

Option B

GlotPress doesn’t know about the text domain, but it does know a string’s source file. What if it exported one JSON file per source file it has scanned? That way, WordPress would have full control over the translations, and one could specify which JSON files need to be loaded for a specific module.

The big drawback here: a single module might consist of dozens of source files. Having one JSON file for each of those is not going to scale well.

The built JavaScript file can’t be scanned either, because tools like UglifyJS rename functions and strip out comments.

Option C

Don’t do anything fancy. Sticking with a single JSON file for JavaScript strings already guarantees that a plugin doesn’t unnecessarily load the translations that are only needed in PHP, so the file size is definitely smaller. Still, this file alone can be very large for an application like Gutenberg.

Option D

Keep one single JSON file for all translations, but use some PHP code to only ever pass a module / script handle the strings it actually needs. However, there’s probably no real benefit in doing so.
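
Still, to make the idea concrete, a rough sketch of option D in PHP could look like this (the per-handle string list and the file name are made up; in practice they would have to come from a build step):

// Rough sketch of option D: one combined Jed locale data file, filtered per
// script handle in PHP. Assumes the JSON file contains the locale data object
// for the 'foo-plugin' domain, i.e. a '' header entry plus one entry per string.
function foo_plugin_locale_data_for_handle( $handle ) {
	// Which handle needs which strings; this list would have to be generated at build time.
	$wanted = array(
		'foo-script' => array( 'Hello World', 'Save' ),
	);

	$file    = plugin_dir_path( __FILE__ ) . 'languages/foo-plugin-' . get_locale() . '.json';
	$entries = is_readable( $file ) ? json_decode( file_get_contents( $file ), true ) : array();

	// Always keep the '' header entry (plural forms etc.), then copy only the needed strings.
	$subset = array( '' => isset( $entries[''] ) ? $entries[''] : array( 'domain' => 'foo-plugin' ) );

	if ( isset( $wanted[ $handle ] ) ) {
		foreach ( $wanted[ $handle ] as $original ) {
			if ( isset( $entries[ $original ] ) ) {
				$subset[ $original ] = $entries[ $original ];
			}
		}
	}

	return $subset;
}

The resulting subset could then be printed via wp_add_inline_script(), just like in the earlier sketch.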

Easily loading translations

Up until WordPress 4.6, developers needed to call load_plugin_textdomain() or load_theme_textdomain() to make sure translations were properly loaded. Now, you only need to use the various translation functions and the rest just works. The only requirement is that your translation files reside in wp-content/languages, which is usually the case when your project is hosted on WordPress.org.
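
For a plugin hosted on WordPress.org, that boils down to something like this (foo-plugin and the string are placeholders):

// No load_plugin_textdomain() call needed anymore: with just-in-time loading,
// WordPress looks for wp-content/languages/plugins/foo-plugin-{locale}.mo the
// first time a translation for the 'foo-plugin' text domain is requested.
echo esc_html__( 'Hello World', 'foo-plugin' );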

We should aim for a similar experience for JavaScript translations as well. While just-in-time loading of translation files via HTTP isn’t really possible due to the asynchronous nature of JavaScript, WordPress should still make it as easy as possible.

Imagine you have a plugin foo-plugin and you’re enqueuing its JavaScript like this:

wp_enqueue_script( 'foo-script', plugins_url( '/foo-script.js', __FILE__ ) );

Ideally, all you’d need to do to translate it is call a function like load_js_textdomain( 'foo-plugin' ). WordPress would then do all the heavy lifting.
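
Such a function doesn’t exist yet, so the following is purely a sketch of what it could do under the hood; the JSON file location, the wp.i18n.setLocaleData() call, and the extra $handle parameter are assumptions, not a settled API:

// Hypothetical sketch only: load a Jed-compatible locale data file from the
// languages directory and attach it to the given script handle.
function load_js_textdomain( $domain, $handle ) {
	$file = WP_LANG_DIR . '/plugins/' . $domain . '-' . get_locale() . '.json';

	if ( ! is_readable( $file ) ) {
		return false;
	}

	// The file is expected to contain the Jed locale data object as-is.
	wp_add_inline_script(
		$handle,
		sprintf(
			'wp.i18n.setLocaleData( %s, %s );',
			file_get_contents( $file ),
			wp_json_encode( $domain )
		),
		'before'
	);

	return true;
}

// Usage, matching the enqueue call above:
load_js_textdomain( 'foo-plugin', 'foo-script' );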

However, other options might exist, and this solution would need to be tested in the wild with bigger projects like Gutenberg.

Discussion

Bringing a JavaScript I18N API to WordPress will have a huge impact. We need to make sure we end up with a solid plan that works for as many plugins and themes as possible.

Ideally, we’d hold a separate JS I18N meeting with all the teams primarily involved: #core-i18n, #core-js, #core-editor, #meta-i18n, and #cli. Everyone is welcome to attend though 🎉

I suggest the following date for such a meeting: Tuesday, May 8, 15:00 UTC. Of course, I’m open to other suggestions. The Slack channel would be #core-i18n.

At this meeting we can discuss the missing pieces outlined in this post and the overall next steps for JavaScript I18N in WordPress.

If you have any questions or concerns about this post or the overall topic, please leave a comment below.

+make.wordpress.org/polyglots