Emoji Chat Meeting Notes, February 12, 2015

The full meeting archive is available here.

1. Why we’re doing this

So, here’s a bit of back story.

As of r31349, WordPress partially supports emoji. ~60% of WordPress sites are running MySQLMySQL MySQL is a relational database management system. A database is a structured collection of data where content, configuration and other options are stored. https://www.mysql.com/. 5.5 or later (so can be upgraded to store emoji), and ~40% of browsers natively support emoji. Emoji are a wildly popular method of communication, so we can expect them to be heavily used as soon an they’re available. The problem is, 60%/40% means a really bad experience for a huge number of our users, who’ll try to use emoji, and fail.

This is where the emoji feature pluginFeature Plugin A plugin that was created with the intention of eventually being proposed for inclusion in WordPress Core. See Features as Plugins. comes in to play. In order to help the 40% of WordPress sites that can’t be upgraded to store emoji natively, the wp_encode_emoji() function will turn them into HTMLHTML HyperText Markup Language. The semantic scripting language primarily used for outputting content in web browsers. entities. Due to the unimaginable joy that character sets brings me, this will only be applied to sites using the utf8 character set, which accounts for the vast majority of WordPress sites – utf8 has been the default character set since r4860.

To help the 60% of browsers that don’t display emoji natively, we’re using the Twemoji image set as a fallback. This lets us show emoji everywhere, without causing extra load where emoji are already supported, mobile browsers being the important example here.

Now, there have been some concerns brought up previously that I’d like to address.

“Is this really appropriate for coreCore Core is the set of software required to run WordPress. The Core Development Team builds WordPress.?”

Yes. (Obviously that’s my answer, or we wouldn’t be here.) WordPress is is the business of making communication simple and accessible for all. Tech users everywhere have clearly chosen emoji as a means of communication, so it’s up to us to make sure they can do that within WordPress as easily as possible, or risk being left behind.

“Should we be concerned about changing the images in the future? Wouldn’t we be altering users’ content?”

No. By using Twemoji only when we can’t provide native support for emoji, it’s a pretty clear message that while the general appearance of emoji stays the same, the actual sprite used can differ significantly between platforms. (For example, every emoji set except Android uses a left hand for :thumbsup:.) As more browsers add native support for emoji, Twemoji usage will drop, reducing even further any impact we can have on users.

And so, that brings us to today.

2. The current state of the pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party

The plugin is very close to done. The editor plugin needs some attention, which @azaozz will be providing soon. There are a few bugs to discuss, which are mostly around fallback behaviour in browsers that don’t support emoji natively. Apart from that, the basic functionality is pretty much how I would expect it to appear in core. It’s had a brief review from the accessibilityAccessibility Accessibility (commonly shortened to a11y) refers to the design of products, devices, services, or environments for people with disabilities. The concept of accessible design ensures both “direct access” (i.e. unassisted) and “indirect access” meaning compatibility with a person’s assistive technology (for example, computer screen readers). (https://en.wikipedia.org/wiki/Accessibility) team, with only some minor alterations needed. The Twemoji images won’t be included in wordpress.zip, as it’s a total of 3.4MB of images. They’re currently hosted on WP.com’s CDN, but we’re investigating other options for where to host them, probably the W.org CDN. Given that the wp-adminadmin (and super admin) Dashboard also loads things from Google, I have no problem with hosting them on an external CDN. There will naturally be a filterFilter Filters are one of the two types of Hooks https://codex.wordpress.org/Plugin_API/Hooks. They provide a way for functions to modify data of other functions. They are the counterpart to Actions. Unlike Actions, filters are meant to work in an isolated manner, and should never have side effects such as affecting global variables and output. on the URLURL A specific web address of a website or web page on the Internet, such as a website’s URL www.wordpress.org, to allow local hosting for sites that don’t want to use the CDN.

One of the major concerns at the moment is that we’re going to be splitting data formats, depending on if the site uses the utf8mb4 or the utf8 character set. utf8mb4 stores emoji natively, while utf8 requires us to HTML encode the emoji characters. In the futures, we’ll look at upgrading sites to utf8mb4 if they’ve upgraded their MySQL since WordPress 4.2, but that leaves the potential for mixed encoding – old posts having HTML encoding, new posts having the native characters. A post will be automatically updated to native upon saving, but do we need to consider upgrade routines, to go through all old posts and convert them?

Export/import also needs thorough testing, particularly when importing and exporting between sites having different character sets.

3. Unicode 8.0: the future of emoji

To talk about the future of emoji, you need to know a little bit of history. At the basic level, emoji are all single characters defined within the Unicode standard. However, they also support modifiers. Modifiers are a second character following the first, which usually causes the two characters to be merged into a single character when rendered. A good example of this is flag emoji.

The character G is U+1F1EC. The character B is U+1F1E7 (these characters are different to their ASCII equivalent). When used individually, they’ll display as that letter. When combined next to each other, they’ll display as the British flag.

So, Unicode 8.0 will two interesting things: a set of 37 new emoji, and skin tone modifiers. When a skin tone modifier character is placed after any face or person emoji, the emoji will show with that skin tone. Unicode 8.0 is due to be finalised in August 2015, so we and (Twemoji) will be looking at adding support for these then.

From a technical perspective, it just means we need to be aware that emoji are not always one character, and the methods for detecting multi-character emoji are about to get more complex.

We’ll also be able to detect if a browser is able to render the new emoji and skin tones, and fall back to Twemoji if they can’t. I don’t have a timeline for when browsers will support the new emoji, so I think it’d be good for us to get ahead of the curve then.

utf8mb4 stores anything in the Unicode address space, including unallocated characters, so I don’t expect any problems with storage of new emoji.

#emoji, #x1f4a9