GlotPress translation handling improvement

Current process

Currently, all officially prepared translations are stored in one table, where they can be translated per locale. This implies that for every single original line, there are also lines per locale that need translation. With the growing number of core/plugin/theme/meta/patterns projects, this results in millions of records in the main table. The main problem is that this table contains duplicates of the same original, and each of these duplicates has to be translated over and over for every locale. That is inefficient and needs improvement.

Improvement

To reduce both the number of records and the amount of work, the following change in procedure could be considered:

  • Create a second table that holds default translations.
  • Define a list of default translations.
  • Adapt the current editor to be able to choose between the two tables.
  • Adapt the WordPress update module to handle both tables.
  • Adapt the import procedure for translations.
  • Adapt the standard translation function in WordPress so default lines are searched in the second table first.
  • Define a method to store the default translations with a mark “Default”.
  • Instruct authors to use the new method for translations.
  • After implementation, remove the duplicate records from the main table.
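The lookup order from the list above (default table first, then the main table) can be sketched as follows. This is only a minimal illustration in Python, not WordPress or GlotPress code: the two dictionaries stand in for the two tables, and the names `DEFAULT_TRANSLATIONS`, `PROJECT_TRANSLATIONS`, and `translate` as well as the sample data are assumptions made for this example.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the proposed second table:
# one record per (locale, original).
DEFAULT_TRANSLATIONS: Dict[Tuple[str, str], str] = {
    ("nl_NL", "Save"): "Opslaan",
    ("nl_NL", "Next"): "Volgende",
}

# Hypothetical stand-in for the current main table:
# one record per (locale, project, original).
PROJECT_TRANSLATIONS: Dict[Tuple[str, str, str], str] = {
    ("nl_NL", "my-plugin", "Export settings"): "Instellingen exporteren",
}


def translate(original: str, locale: str, project: str) -> str:
    """Search the default table first, then the project's own records."""
    default = DEFAULT_TRANSLATIONS.get((locale, original))
    if default is not None:
        return default
    specific = PROJECT_TRANSLATIONS.get((locale, project, original))
    if specific is not None:
        return specific
    # No translation found: fall back to the original string.
    return original


print(translate("Save", "nl_NL", "my-plugin"))
print(translate("Export settings", "nl_NL", "my-plugin"))
print(translate("Unknown line", "nl_NL", "my-plugin"))
```

A real design would probably also need a context value in the key, since the statements from the polyglots meeting further down point out that the same string sometimes has to be translated differently depending on context.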

Explanation

If you have a second table with default translations, you save millions of unnecessary records, as each default exists only once per locale. The second advantage is that a default original only needs to be translated once per locale. The third advantage is that updates are smaller: only the new lines that are not defaults need to be sent to the sites. The fourth advantage is that less duplicate translating and copying needs to be done over and over. This also means less work for the PTE/GTE. If the implementation is done properly, it makes no difference to how translations are retrieved locally.
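The first advantage can be made concrete with a small calculation. The sketch below is an illustration only: the function `record_counts`, the three made-up themes, and the number of locales are assumptions, not figures from translate.wordpress.org.

```python
from typing import Dict, Set, Tuple


def record_counts(
    projects: Dict[str, Set[str]], defaults: Set[str], locales: int
) -> Tuple[int, int]:
    """Return (records now, records with a default table) across all locales."""
    # Now: every project stores every one of its originals, per locale.
    before = sum(len(strings) for strings in projects.values()) * locales
    # With a default table: projects keep only their non-default originals,
    # and each default is stored once per locale.
    remaining = sum(len(strings - defaults) for strings in projects.values())
    after = (remaining + len(defaults)) * locales
    return before, after


# Made-up sample data: three themes that share three common lines.
themes = {
    "theme-a": {"Next", "Previous", "Search", "Theme A footer"},
    "theme-b": {"Next", "Previous", "Search", "Theme B footer"},
    "theme-c": {"Next", "Previous", "Search", "Theme C footer"},
}
defaults = {"Next", "Previous", "Search"}

before, after = record_counts(themes, defaults, locales=2)
print(before, after)  # 24 12
```

In this toy case the record count is halved. The real saving depends entirely on how many lines projects actually share.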

Flow

Definition of default translations

There are several kinds of default translations to define. Here are a few examples:

  • Countries
  • Currencies
  • Languages
  • Single words such as “Submit”, “Save”, “Previous”, “Next”, “Theme”, “Button”
  • Complete strings like “%1$s (%2$s of %3$s)”, “%1$s <span>by %2$s</span>”, “%1$s comment<span class="screen-reader-text"> on %2$s</span>”
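One possible way to find candidates for such a list is to count in how many projects each original occurs. The sketch below assumes the originals per project are available as simple sets. The function `default_candidates`, the threshold, and the sample projects are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def default_candidates(
    projects: Dict[str, Set[str]], min_projects: int = 2
) -> List[str]:
    """Return originals that occur in at least `min_projects` projects."""
    counts: Counter = Counter()
    for strings in projects.values():
        # Each project is a set, so it counts an original at most once.
        counts.update(strings)
    return sorted(o for o, n in counts.items() if n >= min_projects)


# Made-up sample data.
projects = {
    "theme-a": {"Submit", "Next", "Theme A footer"},
    "theme-b": {"Submit", "Next", "Previous"},
    "plugin-c": {"Submit", "Export settings"},
}

print(default_candidates(projects))  # ['Next', 'Submit']
```

Such a count only gives candidates. A team would still have to review them, because an identical original may need a different translation in a different context.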

Comments

  • It means a lot of work, as changes need to be made in many places.
  • If the change is made, it will save a huge amount of work for the translators/PTE/GTE.
  • If the change is made and the database is sanitized, then it will decrease the number of records needed for translations.
  • A team should be formed to define the default list of translations.
  • Authors can use the default list, so they do not need to create their own originals for these lines.

Statements from the polyglots meeting of 08-09-2021

Below is a list of statements made during the polyglots meeting.

  • It is hard to start translating due to the number of lines present
  • The number of identical lines can be reduced by implementing the proposal above
  • Themes will benefit most, as they contain a lot of lines which are equal in every theme
  • Plugins will also benefit, but less than themes
  • Core will not benefit that much, as it contains fewer common lines
  • Jesús Amieiro will check the proposal; he is positive about the idea
  • The Spanish team has a local project to do this
  • Once upon a time we had automatic propagation between translation projects. That was a FAIL, since there are cases where exactly the same string needs to be translated differently, depending on context.
  • For a lot of blocks and block patterns it would be useful as well
  • A developer needs to actively look at the library and pick a string, instead of adding a new one for the same translation
  • Plugin/theme authors have an incentive to stick with the common list, if the list is extensive and well-translated.
  • With TM (translation memory) in place, it is less of a burden
  • TM contains a lot of poor translations for the same line
  • If the list is used there is less translation to do
  • Often users pick the wrong line from TM
  • There are too many suggestions made by TM, and they are often very bad
  • Would it take the glossary into account? Yes, it does, as the list is localized.
  • A default list should be defined which can be amended if necessary
  • Developers would need to be educated to start using this mechanism
  • Once the list is set up, the benefits are fewer corrections and fewer translations to be made

This list should help in considering and designing this proposal. Feel free to add comments, so we get a proper idea of how to implement it.