The utf8mb4 Upgrade

In WordPress 4.2, we’re upgrading tables to utf8mb4, when we can. Your site will only upgrade when the following conditions are met:

  • You’re currently using the utf8 character set.
  • Your MySQL server is version 5.5.3 or higher (including all 10.x versions of MariaDB).
  • Your MySQL client libraries are version 5.5.3 or higher. If you’re using mysqlnd, 5.0.9 or higher.

The difference between utf8 and utf8mb4 is that the former can only store 3 byte characters, while the latter can store 4 byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character. This greatly expands the language usability of WordPress, especially in countries that use Han character sets. Unicode isn’t without its problems, but it’s the best option available.

utf8mb4 is 100% backwards compatible with utf8.

Due to index size restrictions in MySQL, this does mean we need to re-create a handful of indexes to fit within MySQL’s rules. Using a standard configuration, MySQL allows 767 bytes per index, which for utf8 means 767 bytes / 3 bytes = 255 characters. For utf8mb4, that means 767 bytes / 4 bytes = 191 characters. The indexes that will be resized are:


wp_usermeta.meta_key
wp_terms.slug
wp_terms.name
wp_commentmeta.meta_key
wp.postmeta.meta_key
wp_posts.post_name

And from Multisite:


wp_site.domain
wp_sitemeta.meta_key
wp_signups.domain

Of course, the Multisite (and wp_usermeta) keys obey the DO_NOT_UPGRADE_GLOBAL_TABLES setting. The upgrade will only be attempted once, though we’ll probably add a check in a future WordPress version to see if we can upgrade now (say, if you’ve upgraded your MySQL server since upgrading to WordPress 4.2).

If you’re a plugin developer and your plugin includes custom tables, please test that your indexes fit within MySQL’s limits. MySQL won’t always produce an error when the index is too big, so you’ll need to manually check the size of each index, instead of relying on automated testing.

EDIT: One more thing…

If you’d like to upgrade your custom tables to utf8mb4 (and your indexes are all in order), you can do it really easily with the shiny new maybe_convert_table_to_utf8mb4( $tablename ) function. It’s available in `wp-admin/includes/upgrade.php`, and will sanity check that your tables are entirely utf8 before upgrading.

#4-2, #dev-notes, #utf8mb4, #wpdb