Extending Unicode support in email addresses.

Eleven years ago, in Core-31992, someone proposed adding support for non-US-ASCII email addresses in WordPress. The software world has changed considerably since then: internationalized domain names and paths are handled uniformly in browsers, email systems support a wide range of Unicode characters as raw UTF-8, and UTF-8 is the only recommended text encoding for interchange between systems. This means that people are free to use their own names when communicating with others, whether they are Jake, Klára, আরিয়া, അമൽ, or anyone else whose name contains letters outside the A-Z range. Unfortunately, WordPress has not kept up with these changes, and that’s what this post is all about.

This post is a request for comment on adding that support. There are a number of complications with potentially far-reaching implications.

TL;DR

  • WordPress’ email sanitization is based on US-ASCII characters and needs to be relaxed to allow valid UTF-8, but this introduces new risks, including but not limited to confusable characters, equivalence through normalization, and invisible characters.
  • Sites whose databases cannot store full UTF-8 may fail to save valid email addresses. Unless the failure is communicated clearly, this could confuse both the site owner and people attempting to sign up on the site.
  • Any other code that assumes email addresses are encoded as single-byte US-ASCII will need updating, because until now it has been an invariant that email addresses never contain multi-byte Unicode characters. Filters may start receiving characters they assumed were impossible.
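To make these risks concrete, here is a small sketch in Python. WordPress itself is written in PHP; Python is used here only because its standard `unicodedata` module makes the properties easy to demonstrate. The helper `describe_address()` is hypothetical and is not part of WordPress or of any proposed patch. For a single address it reports the code-point count versus the UTF-8 byte count (where byte-oriented code goes wrong), whether the string is already in Normalization Form C, whether it contains characters above U+FFFF (these need four bytes in UTF-8, so MySQL's three-byte `utf8`/`utf8mb3` character set cannot store them), and any invisible format characters (Unicode category Cf).

```python
import unicodedata


def describe_address(addr: str) -> dict:
    """Hypothetical helper: report properties of an address that ASCII-only code may mishandle."""
    return {
        # Number of Unicode code points, which is not the same as the number of bytes.
        "code_points": len(addr),
        # Length once encoded as UTF-8; byte-based length checks see this number.
        "utf8_bytes": len(addr.encode("utf-8")),
        # False when an equivalent, differently encoded spelling (NFC) exists.
        "is_nfc": addr == unicodedata.normalize("NFC", addr),
        # Code points above U+FFFF take four bytes in UTF-8 and do not fit in MySQL utf8mb3.
        "needs_4_byte_utf8": any(ord(ch) > 0xFFFF for ch in addr),
        # Format characters (category Cf), such as ZERO WIDTH SPACE, render as nothing.
        "invisible": [f"U+{ord(ch):04X}" for ch in addr if unicodedata.category(ch) == "Cf"],
    }


if __name__ == "__main__":
    composed = "kl\u00e1ra@example.com"     # "á" as one precomposed code point
    decomposed = "kla\u0301ra@example.com"  # "a" followed by COMBINING ACUTE ACCENT
    print(composed == decomposed)                                  # False
    print(unicodedata.normalize("NFC", decomposed) == composed)    # True
    print(describe_address(decomposed))
    print(describe_address("jake\u200b@example.com")["invisible"])  # ['U+200B']
    # Normalization does not catch confusables: Cyrillic "а" (U+0430) stays distinct from Latin "a".
    print(unicodedata.normalize("NFKC", "j\u0430ke@example.com") == "jake@example.com")  # False
```

The two spellings of Klára look identical but compare unequal until normalized, which is why a site could end up with two accounts for what a person sees as one address. The last line shows that even compatibility normalization (NFKC) leaves a lookalike address untouched, so confusable detection would need a separate mechanism.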

If you have experience with email issues, deploy email services, or know about certain critical aspects of this proposal, please share your thoughts here or in Core-31992.

#charset, #email, #unicode