{"id":121742,"date":"2026-05-22T19:46:43","date_gmt":"2026-05-22T19:46:43","guid":{"rendered":"https:\/\/make.wordpress.org\/core\/?p=121742"},"modified":"2026-05-22T20:00:48","modified_gmt":"2026-05-22T20:00:48","slug":"extending-unicode-support-in-email-addresses-usernames-and-slugs","status":"publish","type":"post","link":"https:\/\/make.wordpress.org\/core\/2026\/05\/22\/extending-unicode-support-in-email-addresses-usernames-and-slugs\/","title":{"rendered":"Extending Unicode support in email addresses."},"content":{"rendered":"<p class=\"wp-block-paragraph\">Eleven years ago, in <a href=\"https:\/\/core.trac.wordpress.org\/ticket\/31992\">Core-31992<\/a>, someone proposed allowing non-US-ASCII email address support in WordPress. The software world has changed considerably since then: internationalized domain names and paths are uniformly handled in browsers, email systems support the wide range of Unicode characters as raw UTF-8, and UTF-8 is the only recommended text encoding for interchange between systems. This means that people are free to use their own names when communicating with others, whether they are <em>Jake<\/em>, <em>Kl\u00e1ra<\/em>, <em>\u0986\u09b0\u09bf\u09af\u09bc\u09be<\/em>, <em>\u0d05\u0d2e\u0d7d<\/em>, or any other name containing letters outside the A-Z range. Unfortunately, WordPress has not kept up with these changes, and that\u2019s what this post is all about.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This post is a <em>request for comment<\/em> on adding that support. 
There are a number of complications with potentially far-reaching implications.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">TL;DR<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>WordPress\u2019 email sanitization is based on US-ASCII characters and needs to be relaxed to allow for valid UTF-8, but this introduces new risks, including but not limited to: confusable characters, equivalence through normalization, and non-visible characters.<\/li>\n\n\n\n<li>Sites whose databases cannot store full UTF-8 may fail to save valid email addresses. This could be confusing to the site owner and to people attempting to sign up on the site unless properly communicated.<\/li>\n\n\n\n<li>Any additional code that assumes email addresses are encoded as single-byte US-ASCII will need updating, because it was previously an invariant that email addresses would never contain multi-byte Unicode characters. Filters may start seeing characters they believed were impossible to receive.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">If you have experience with email issues, <span tabindex='0' class='glossary-item-container'>deploy<span class='glossary-item-hidden-content'><span class='glossary-item-header'>Deploy<\/span> <span class='glossary-item-description'>Launching code from a local development environment to the production web server, so that it's available to visitors.<\/span><\/span><\/span> email services, or know about certain critical aspects of this proposal, please share your thoughts here or in <a href=\"https:\/\/core.trac.wordpress.org\/ticket\/31992\">Core-31992<\/a>.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Unicode in email addresses was historically more complicated.<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">When email first emerged, servers were passing US-ASCII as a 7-bit encoding. The need to send text with characters beyond that range appeared shortly afterwards, and MIME text encoding was standardized in RFC 2047. 
This is what WordPress refers to in its <code>wp_iso_descrambler()<\/code> function: a mechanism for transmitting non-ASCII characters using only ASCII bytes. Critically, <em>it only applied to certain headers<\/em> and <em>could not be applied to email addresses<\/em>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">This funny-looking string indicates that it is encoded\u2026\n - with the ISO-8859-2 character set.\n - using the quoted form, with escaped hex-codes for non-ASCII characters.\n\n=?ISO-8859-2?Q?=A3=F3d=BC?=\n\nIt encodes the latin2 string \"\u0141\u00f3d\u017a\"<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">While MIME encoding alleviated the problem of sending non-English content, it did nothing to remove the need for people to <em>romanize<\/em> or <em>ASCIIize<\/em> their names and institutions. <a href=\"https:\/\/www.rfc-editor.org\/rfc\/rfc3492\">Punycode<\/a> opened the door for internationalized domain names, again by encoding non-US-ASCII bytes through all-ASCII characters, but this applied <em>only<\/em> to domain names and remained unrecognizable when not parsed.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">This indecipherable string encodes a state machine which, when decoded, produces a UTF-8 byte stream.\n\nxn--l8je6s7a45b.com\n\nIt encodes the Japanese domain \"\u3042\u30fc\u308b\u3044\u3093.com\"<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">As protocols gained more functionality for unescaped UTF-8, such as in IMAP\u2019s <a href=\"https:\/\/datatracker.ietf.org\/doc\/html\/rfc9755\">UTF-8 extension<\/a>, more and more servers started allowing non-US-ASCII bytes as long as they were valid UTF-8. 
Even still, this did not change the state for email addresses, unfortunately, as the old restrictions on that <span tabindex='0' class='glossary-item-container'>header<span class='glossary-item-hidden-content'><span class='glossary-item-header'>Header<\/span> <span class='glossary-item-description'>The header of your site is typically the first thing people will experience. The masthead or header art located across the top of your page is part of the look and feel of your website. It can influence a visitor\u2019s opinion about your content and you\/ your organization\u2019s brand. It may also look different on different screen sizes.<\/span><\/span><\/span> still applied.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Eventually, major <a href=\"https:\/\/archive.fosdem.org\/2025\/schedule\/event\/fosdem-2025-6235-how-email-addresses-are-growing-to-support-unicode\/\">email providers started allowing and passing valid UTF-8<\/a> sequences <em>as<\/em> email addresses, making them a <em>de-facto<\/em> supported feature. A comprehensive take is standardized in <a href=\"https:\/\/datatracker.ietf.org\/doc\/html\/rfc6530#section-4.2\">RFC 6530<\/a>. See last year\u2019s <a href=\"https:\/\/archive.fosdem.org\/2025\/schedule\/event\/fosdem-2025-6235-how-email-addresses-are-growing-to-support-unicode\/\">talk at FOSDEM<\/a> for more information.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is the proposal for WordPress?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Allow storing Unicode email addresses. (<a href=\"https:\/\/core.trac.wordpress.org\/ticket\/31992\">Core-31992<\/a>)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Functions like <code>is_email()<\/code>, <code>sanitize_email()<\/code> and <code>antispambot()<\/code> need to be extended to support non-ASCII addresses. 
<code>PHPMailer<\/code> updates in WordPress 6.9 already made it possible for WordPress to <em>send<\/em> to Unicode addresses, but it\u2019s not possible for users to use or store them on their account.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/github.com\/WordPress\/wordpress-develop\/pull\/5237\">PR#5237<\/a> unlocks saving Unicode email addresses by modifying these functions, as long as the database permits it. Its validation is locked to the behaviors of <code>&lt;input type=email&gt;<\/code> elements to ensure compatibility with the browser and a predictable experience.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Back in April, during <span tabindex='0' class='glossary-item-container'>WordCamp<span class='glossary-item-hidden-content'><span class='glossary-item-header'>WordCamp<\/span> <span class='glossary-item-description'>WordCamps are casual, locally-organized conferences covering everything related to WordPress. They're one of the places where the WordPress community comes together to teach one another what they\u2019ve learned throughout the year and share the joy. <a href=\"https:\/\/central.wordcamp.org\/about\/\">Learn more<\/a>.<\/span><\/span><\/span> Vienna, geoTLD.group and ICANN sponsored a <em>contributor challenge<\/em> to work on this very problem. <a href=\"https:\/\/profiles.wordpress.org\/agulbra\/\" class=\"mention\"><span class=\"mentions-prefix\">@<\/span>agulbra<\/a>, <a href=\"https:\/\/profiles.wordpress.org\/akirk\/\" class=\"mention\"><span class=\"mentions-prefix\">@<\/span>akirk<\/a>, <a href=\"https:\/\/profiles.wordpress.org\/benniledl\/\" class=\"mention\"><span class=\"mentions-prefix\">@<\/span>benniledl<\/a>, and <a href=\"https:\/\/profiles.wordpress.org\/dmsnell\/\" class=\"mention\"><span class=\"mentions-prefix\">@<\/span>dmsnell<\/a> worked together on this problem and proposed a new <code>WP_Email_Address<\/code> class which can parse email addresses and return the decoded local and domain parts. 
This class is then used by a <span tabindex='0' class='glossary-item-container'>filter<span class='glossary-item-hidden-content'><span class='glossary-item-header'>Filter<\/span> <span class='glossary-item-description'>Filters are one of the two types of Hooks <a href=\"https:\/\/codex.wordpress.org\/Plugin_API\/Hooks\">https:\/\/codex.wordpress.org\/Plugin_API\/Hooks<\/a>. They provide a way for functions to modify data of other functions. They are the counterpart to Actions. Unlike Actions, filters are meant to work in an isolated manner, and should never have side effects such as affecting global variables and output.<\/span><\/span><\/span> to replace the decisions from <code>is_email()<\/code> and <code>sanitize_email()<\/code> with their new counterparts: <code>wp_is_unicode_email()<\/code> and <code>wp_sanitize_unicode_email()<\/code>. This approach provides a path for interoperability with modern standards while preserving the ability to maintain the legacy behaviors, and it provides a helpful new class for structurally working with email addresses in various forms and places.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">While Unicode email addresses should be supported, it\u2019s still necessary to be able to apply legacy restrictions in some cases, such as for WordPress\u2019 own <em>sender address<\/em>\/<code>RETURN FROM<\/code> address, which must remain US-ASCII-only<sup data-fn=\"493e4114-45ee-4718-b686-9476ad5b82f4\" class=\"fn\"><a href=\"#493e4114-45ee-4718-b686-9476ad5b82f4\" id=\"493e4114-45ee-4718-b686-9476ad5b82f4-link\">1<\/a><\/sup>. 
This proposal is exclusively about supporting Unicode email addresses for WordPress user accounts.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">What could go wrong with storing Unicode email addresses?<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">If the <a href=\"https:\/\/core.trac.wordpress.org\/ticket\/62172\">database or site doesn\u2019t support UTF-8<\/a> then there is a problem, because there is no guarantee that the email addresses will be able to be stored and retrieved without corruption. The linked pull request includes a new <a href=\"https:\/\/github.com\/arnt\/wordpress-develop\/blob\/84cc59e1bbb593eed1e6f264d574c54919d0ecd6\/src\/wp-includes\/default-filters.php#L90\">filter which restricts Unicode email support<\/a> to sites with <code>utf8mb4<\/code> databases. That\u2019s a solid and simple restriction that nevertheless allows the overwhelming majority of sites to support the addresses. But this restriction needs to be communicated to site owners in a clear way.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Existing filters and <span tabindex='0' class='glossary-item-container'>plugin<span class='glossary-item-hidden-content'><span class='glossary-item-header'>Plugin<\/span> <span class='glossary-item-description'>A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory <a href=\"https:\/\/wordpress.org\/plugins\/\">https:\/\/wordpress.org\/plugins\/<\/a> or can be cost-based plugin from a third-party.<\/span><\/span><\/span> or theme code expecting all-US-ASCII email addresses might start receiving data that was never expected. 
Even simple calls to <code>strlen()<\/code> will return byte counts rather than character counts when applied to UTF-8 strings containing multi-byte characters, and validation and sanitization scripts need to be aware of the changes. For example, <code>antispambot()<\/code> needs updating because it assumes every byte is representable as a hex escape sequence, which is not the case for multibyte strings. Further, Unicode normalization means that two strings which are equivalent after normalization may still be treated as two distinct strings by <span tabindex='0' class='glossary-item-container'>PHP<span class='glossary-item-hidden-content'><span class='glossary-item-header'>PHP<\/span> <span class='glossary-item-description'>The web scripting language in which WordPress is primarily architected. WordPress requires PHP 7.4 or higher<\/span><\/span><\/span>, and various functions need to agree on how to handle these to avoid conflating addresses.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The task of adding full Unicode support to identifiers in WordPress is worthwhile, despite being a broad and fuzzy challenge.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>WordPress can start parsing addresses on supporting sites using modern standards.<\/li>\n\n\n\n<li>Plugins can disable the modern email parsing.<\/li>\n\n\n\n<li>An audit of <span tabindex='0' class='glossary-item-container'>Core<span class='glossary-item-hidden-content'><span class='glossary-item-header'>Core<\/span> <span class='glossary-item-description'>Core is the set of software required to run WordPress. 
The Core Development Team builds WordPress.<\/span><\/span><\/span> and plugins is necessary to uncover where assumptions about US-ASCII email characters will be broken when WordPress starts allowing Unicode email addresses.<\/li>\n\n\n\n<li>Your feedback will help make this process smooth and successful.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\">\n\n\n\n<p class=\"wp-block-paragraph\">Props to <a href=\"https:\/\/profiles.wordpress.org\/dmsnell\">Dennis Snell<\/a> for help with this <span tabindex='0' class='glossary-item-container'>blog<span class='glossary-item-hidden-content'><span class='glossary-item-header'>blog<\/span> <span class='glossary-item-description'>(versus network, site)<\/span><\/span><\/span> posting, as well as to <a href=\"https:\/\/profiles.wordpress.org\/sirlouen\/\">Manuel Camargo<\/a>, <a href=\"https:\/\/profiles.wordpress.org\/justlevine\/\">Dovid Levine<\/a>, <a href=\"https:\/\/profiles.wordpress.org\/tusharbharti\/\">Tushar Bharti<\/a>, <a href=\"https:\/\/profiles.wordpress.org\/mukesh27\/\">Mukesh Panchal<\/a>, and Dennis for help with the code.<\/p>\n\n\n<ol class=\"wp-block-footnotes\"><li id=\"493e4114-45ee-4718-b686-9476ad5b82f4\">Sender addresses may use non-US-ASCII characters as an email alias, but the actual address portion should remain US-ASCII compatible \u2013 for example <code>From: \"\u092e\u0947\u0930\u0940 \u0938\u093e\u0907\u091f\" &lt;noreply@mysite.in&gt;<\/code>, which most software displays as <code>From: \u092e\u0947\u0930\u0940 \u0938\u093e\u0907\u091f<\/code>. 
<a href=\"#493e4114-45ee-4718-b686-9476ad5b82f4-link\" aria-label=\"Jump to footnote reference 1\">\u21a9\ufe0e<\/a><\/li><\/ol>\n\n\n<p class=\"o2-appended-tags\"><a href=\"https:\/\/make.wordpress.org\/core\/tag\/charset\/\" class=\"tag\"><span class=\"tag-prefix\">#<\/span>charset<\/a>, <a href=\"https:\/\/make.wordpress.org\/core\/tag\/email\/\" class=\"tag\"><span class=\"tag-prefix\">#<\/span>email<\/a>, <a href=\"https:\/\/make.wordpress.org\/core\/tag\/unicode\/\" class=\"tag\"><span class=\"tag-prefix\">#<\/span>unicode<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Eleven years ago, in Core-31992, someone proposed allowing non-US-ASCII email address support in WordPress. 
The software world has changed considerably since then: internationalized domain names and paths are uniformly handled in browsers, email systems support the wide range of Unicode characters as raw UTF-8, and UTF-8 is the only recommended text encoding for interchange between [&hellip;]<\/p>\n","protected":false},"author":20976625,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"[{\"id\":\"493e4114-45ee-4718-b686-9476ad5b82f4\",\"content\":\"Sender addresses may use non-US-ASCII characters as an email alias, but the actual address portion should remain US-ASCII compatible \\u2013 for example <code>From: \\\"\\u092e\\u0947\\u0930\\u0940 \\u0938\\u093e\\u0907\\u091f\\\" &lt;noreply@mysite.in&gt;<\\\/code>, which most software displays as <code>From: \\u092e\\u0947\\u0930\\u0940 
\\u0938\\u093e\\u0907\\u091f<\\\/code>.\"}]","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[1174],"tags":[5848,100,5849],"class_list":["post-121742","post","type-post","status-publish","format-standard","hentry","category-proposals","tag-charset","tag-email","tag-unicode","mentions-agulbra","mentions-akirk","mentions-benniledl","mentions-dmsnell","author-agulbra"],"revision_note":"","jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2AvED-vFA","_links":{"self":[{"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/posts\/121742","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/users\/20976625"}],"replies":[{"embeddable":true,"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/comments?post=121742"}],"version-history":[{"count":68,"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/posts\/121742\/revisions"}],"predecessor-version":[{"id":123538,"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/posts\/121742\/revisions\/123538"}],"wp:attachment":[{"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/media?parent=121742"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/categories?post=121742"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/make.wordpress.org\/core\/wp-json\/wp\/v2\/tags?post=121742"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","template
d":true}]}}