Emoji Chat Meeting Notes, February 12, 2015

The full meeting archive is available here.

1. Why we’re doing this

So, here’s a bit of back story.

As of r31349, WordPress partially supports emoji. ~60% of WordPress sites are running MySQL 5.5 or later (so can be upgraded to store emoji), and ~40% of browsers natively support emoji. Emoji are a wildly popular method of communication, so we can expect them to be heavily used as soon an they’re available. The problem is, 60%/40% means a really bad experience for a huge number of our users, who’ll try to use emoji, and fail.

This is where the emoji feature plugin comes in to play. In order to help the 40% of WordPress sites that can’t be upgraded to store emoji natively, the wp_encode_emoji() function will turn them into HTML entities. Due to the unimaginable joy that character sets brings me, this will only be applied to sites using the utf8 character set, which accounts for the vast majority of WordPress sites – utf8 has been the default character set since r4860.

To help the 60% of browsers that don’t display emoji natively, we’re using the Twemoji image set as a fallback. This lets us show emoji everywhere, without causing extra load where emoji are already supported, mobile browsers being the important example here.

Now, there have been some concerns brought up previously that I’d like to address.

“Is this really appropriate for core?”

Yes. (Obviously that’s my answer, or we wouldn’t be here.) WordPress is is the business of making communication simple and accessible for all. Tech users everywhere have clearly chosen emoji as a means of communication, so it’s up to us to make sure they can do that within WordPress as easily as possible, or risk being left behind.

“Should we be concerned about changing the images in the future? Wouldn’t we be altering users’ content?”

No. By using Twemoji only when we can’t provide native support for emoji, it’s a pretty clear message that while the general appearance of emoji stays the same, the actual sprite used can differ significantly between platforms. (For example, every emoji set except Android uses a left hand for :thumbsup:.) As more browsers add native support for emoji, Twemoji usage will drop, reducing even further any impact we can have on users.

And so, that brings us to today.

2. The current state of the plugin

The plugin is very close to done. The editor plugin needs some attention, which @azaozz will be providing soon. There are a few bugs to discuss, which are mostly around fallback behaviour in browsers that don’t support emoji natively. Apart from that, the basic functionality is pretty much how I would expect it to appear in core. It’s had a brief review from the accessibility team, with only some minor alterations needed. The Twemoji images won’t be included in wordpress.zip, as it’s a total of 3.4MB of images. They’re currently hosted on WP.com’s CDN, but we’re investigating other options for where to host them, probably the W.org CDN. Given that the wp-admin Dashboard also loads things from Google, I have no problem with hosting them on an external CDN. There will naturally be a filter on the URL, to allow local hosting for sites that don’t want to use the CDN.

One of the major concerns at the moment is that we’re going to be splitting data formats, depending on if the site uses the utf8mb4 or the utf8 character set. utf8mb4 stores emoji natively, while utf8 requires us to HTML encode the emoji characters. In the futures, we’ll look at upgrading sites to utf8mb4 if they’ve upgraded their MySQL since WordPress 4.2, but that leaves the potential for mixed encoding – old posts having HTML encoding, new posts having the native characters. A post will be automatically updated to native upon saving, but do we need to consider upgrade routines, to go through all old posts and convert them?

Export/import also needs thorough testing, particularly when importing and exporting between sites having different character sets.

3. Unicode 8.0: the future of emoji

To talk about the future of emoji, you need to know a little bit of history. At the basic level, emoji are all single characters defined within the Unicode standard. However, they also support modifiers. Modifiers are a second character following the first, which usually causes the two characters to be merged into a single character when rendered. A good example of this is flag emoji.

The character G is U+1F1EC. The character B is U+1F1E7 (these characters are different to their ASCII equivalent). When used individually, they’ll display as that letter. When combined next to each other, they’ll display as the British flag.

So, Unicode 8.0 will two interesting things: a set of 37 new emoji, and skin tone modifiers. When a skin tone modifier character is placed after any face or person emoji, the emoji will show with that skin tone. Unicode 8.0 is due to be finalised in August 2015, so we and (Twemoji) will be looking at adding support for these then.

From a technical perspective, it just means we need to be aware that emoji are not always one character, and the methods for detecting multi-character emoji are about to get more complex.

We’ll also be able to detect if a browser is able to render the new emoji and skin tones, and fall back to Twemoji if they can’t. I don’t have a timeline for when browsers will support the new emoji, so I think it’d be good for us to get ahead of the curve then.

utf8mb4 stores anything in the Unicode address space, including unallocated characters, so I don’t expect any problems with storage of new emoji.

#emoji, #x1f4a9