Potential roadmap for multisite: Subdirectories, subdomains, open registration, and domain mapping

Following up on the potential roadmap for terms and taxonomy meta, here is a potential roadmap for multisite. It is based on years of discussions among core developers and other contributors, which have taken place on IRC, in tickets, blog posts, and comment threads, at WordCamps and contributor days, and at last October’s community summit.

The Current State of Multisite

Historically, WordPress MU was the “multi-user” version of WordPress. Any user could register their own subdomain, just as on WordPress.com. Most of multisite remains centered on open registration and signups.

The difficulty of setting up wildcard DNS led to the introduction of a subdirectory option for new sites, as a fallback. Indeed, the old constant that toggled between subdirectories and subdomains specifically referenced virtual hosts (VHOST, defined as ‘yes’ or ‘no’ rather than as a boolean). Wildcard DNS is still checked during setup whenever someone creates a subdomain network.

In a subdomain setup, the valid domains are quite limited. Subdomains cannot be nested and can contain only common characters, and top-level domains cannot use www or ports. When a new site is created in the network admin, WordPress still asks for an email address, and a user with that email address is created if one doesn’t already exist.

The paradigm of subdirectories versus subdomains is very restrictive. Over time, many have sought two related but distinct concepts that would diversify the address space used for sites.

Domain Mapping and Multiple Networks

Generally, domain mapping is the concept of a top-level domain, such as example.org, serving as a site inside another network, such as example.com. Otherwise, sites on example.com would simply be subdomains or subdirectories: blog.example.com and photos.example.com, or example.com/blog and example.com/photos. Either way, if the goal is for example.org to be a site on the same network, something must give.

By default, WordPress MU has only one network (in MU terminology, a network was a ‘site’ and sites were ‘blogs’), and constants are put in place during network creation to avoid even querying the wp_site table. Multiple networks are possible, but typically only one network is desired. The only things “global” to an entire multisite install (rather than to a single network) are must-use plugins (“mu-plugins”) and users. Sites are assigned to a particular network; network-activated plugins and network-enabled themes apply only to that network; and each network has its own settings, network admin, and even super administrators (unless they are defined globally using $super_admins).

There is no way to store an option that is accessible across all networks. switch_to_blog() (“blog” being MU terminology for a site) does not actually switch the current network ($current_site, per MU), and thus get_site_option() can only return information for the current network.

WordPress stores a fair amount of “global” information as meta against individual networks, which duplicates effort across networks. This includes plugin, theme, and core update checks, as well as user counts. (Site counts are per network.) Multiple networks still rely on a single installation of WordPress and a single wp-content directory.

Domain mapping should be our primary focus here. It would probably be best if the demand for better support for multiple networks (like a user interface) is reduced through robust domain mapping support.

On API and Naming

At some point, we should be able to introduce a get_network_option() function to replace get_site_option(); unlike the current function, it could follow context switches and accept a network argument. A network ID of “0” could be used in wp_sitemeta (network meta) to store truly global, cross-network options.
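To make the proposal concrete, here is a minimal sketch of network-scoped options in which ID 0 holds cross-network values. It is written in Python rather than WordPress's PHP, uses an in-memory dictionary as a stand-in for wp_sitemeta, and the class and method signatures (NetworkMeta, switch_to_network(), the network_id argument) are hypothetical illustrations, not an existing or planned core API.

```python
from typing import Any, Dict, Optional, Tuple

GLOBAL_NETWORK_ID = 0  # per the proposal, network ID 0 holds truly global options


class NetworkMeta:
    """In-memory stand-in for wp_sitemeta rows keyed by (network_id, meta_key)."""

    def __init__(self, current_network_id: int = 1) -> None:
        self._rows: Dict[Tuple[int, str], Any] = {}
        self.current_network_id = current_network_id

    def _resolve(self, network_id: Optional[int]) -> int:
        # No explicit network argument means "whatever network we are switched to".
        return self.current_network_id if network_id is None else network_id

    def update_network_option(self, key: str, value: Any, network_id: Optional[int] = None) -> None:
        self._rows[(self._resolve(network_id), key)] = value

    def get_network_option(self, key: str, default: Any = None, network_id: Optional[int] = None) -> Any:
        return self._rows.get((self._resolve(network_id), key), default)

    def switch_to_network(self, network_id: int) -> None:
        # Unlike switch_to_blog() today, this context switch changes which network's options are read.
        self.current_network_id = network_id
```

Reads without a network argument follow the current context, explicit IDs reach other networks, and passing GLOBAL_NETWORK_ID reads or writes the shared, cross-network row.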

A larger point on API is that we should not rename API for the sake of renaming API. A lot of functions still use “blog” when they mean sites, or “site” when they mean network, and many remain prefixed with “wpmu”.

We’ve tried really hard to move toward “network” wherever possible, at both the API and UI levels. We long ago stopped introducing “site” things when they mean network, as it is just too damn confusing. We haven’t quite made the switch to always using “site” over “blog” at the API level, though we have at the UI level, but that’s only because letting “site” mean two different things is more troublesome than starting to use “network”. Either way, you still need to figure out whether a particular reference to “site” means “blog” or “network”, so we might as well use “network” wherever possible and reduce the number of ambiguous “site” instances (especially those that refer to the old MU definition).

Not renaming these functions is actually key to the evolution of multisite. There are quite a few functions to rename, but renaming them without completely replacing their internals and functionality in the process would be a huge missed opportunity: it wrongly suggests things have improved while also eating into our valuable function namespace. Three-plus years later, we’re still finding oddities in old MU code, so it makes sense to bide our time and only undertake renames when they let us shed dead weight. The side effect, keeping the barrier to entry for multisite high, is not necessarily a bad one; even three years after the merge, the product as a whole is still very weak (which is being kind).

The Path Forward: Supporting Sites at Any URL

It should be abundantly clear that creating a second network is often overkill and not necessarily desired for many setups. WordPress.org uses multiple networks, but it is a mixed bag. Each subdomain is its own network, and each of those networks is a subdirectory setup. It makes a lot of sense for make.wordpress.org to be its own self-contained network, as its sites share similar characteristics. But other subdomains typically host just a single site.

But because of the subdomain/subdirectory paradigm, many people use multiple networks as a form of domain mapping. The best way to avoid this misuse of multiple networks (which, aside from being counter-intuitive, isn’t generally desired) is to introduce proper domain mapping into core. This requires a complete rewrite of ms-settings.php, the file that takes the requested URL and figures out which site it belongs to.

Essentially, it should take the entire domain and at most one path segment, and query the wp_blogs table to find the associated site, with something like domain = %s AND path IN ('/', '/first-segment/'). From the matching site, the current network (in MU terms, the current “site”) can then be inferred.
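The lookup can be sketched as follows, assuming a heavily simplified wp_blogs table (only blog_id, site_id, domain, and path) held in an in-memory SQLite database. The Python helpers make_demo_db(), candidate_paths(), and find_site() are hypothetical and are not the contents of ms-settings.php; the sketch only shows how a single query over the domain and at most one path segment selects the most specific site, and how site_id then identifies the network.

```python
import sqlite3
from typing import Optional, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory, heavily simplified wp_blogs table for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_blogs (blog_id INTEGER PRIMARY KEY, site_id INTEGER, domain TEXT, path TEXT)"
    )
    conn.executemany(
        "INSERT INTO wp_blogs VALUES (?, ?, ?, ?)",
        [
            (1, 1, "example.com", "/"),         # main site of network 1
            (2, 1, "example.com", "/photos/"),  # subdirectory site
            (3, 1, "example.org", "/"),         # mapped top-level domain
        ],
    )
    return conn


def candidate_paths(request_path: str) -> Tuple[str, ...]:
    """Return '/' plus, when present, the first path segment as '/segment/'."""
    segments = [s for s in request_path.split("/") if s]
    if not segments:
        return ("/",)
    return ("/", "/" + segments[0] + "/")


def find_site(conn: sqlite3.Connection, host: str, request_path: str) -> Optional[Tuple[int, int]]:
    """Return (blog_id, site_id) for the most specific matching site, or None.

    request_path is assumed to be the path only, without a query string.
    """
    paths = candidate_paths(request_path)
    placeholders = ", ".join("?" for _ in paths)
    query = (
        "SELECT blog_id, site_id FROM wp_blogs "
        f"WHERE domain = ? AND path IN ({placeholders}) "
        "ORDER BY LENGTH(path) DESC LIMIT 1"
    )
    # Longest path wins, so '/photos/' beats '/'; site_id identifies the network.
    return conn.execute(query, (host.lower(), *paths)).fetchone()
```

With the demo data, a request for example.com/2013/02/01/some-blog-post/feed/ resolves to the main site because no '/2013/' site exists, while example.com/photos/2013/02/ resolves to the photos site.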

We search only one path segment because anything deeper immediately becomes a caching problem. We don’t know how many levels are actually supported, which means we don’t know whether /2013/02/01/some-blog-post/feed/ is a site or a post on the main site. Literally any main site URL would thus require a query against the blogs table. This is not ideal. Nested sites are of course going to be desired in many situations, and it should be dead-simple for these URLs to be “trapped” via sunrise.php and mapped to a particular site.

If a network is “small” (fewer than 10,000 sites, per our wp_is_large_network() API), we could potentially run SELECT blog_id, path FROM $wpdb->blogs WHERE domain = %s AND path <> '/' and cache the result under a single key for lookups. Or we could break the cache down by the first character of the path: any paths that start with ‘a’ in one cache key, any paths that start with ‘b’ in another, and so on. That way, we can verify whether or not a site exists as long as the cache is in place. (The alternative would be to cache individual successes, but then we would run a lot of DB queries whenever we’re unsure whether something is simply uncached or isn’t a site; we’d have to cache negative results too.)
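A sketch of the per-letter variant follows, in Python, with a plain dictionary standing in for the object cache and a hypothetical loader standing in for the database query. Because each bucket holds the complete list of sub-sites whose first path segment starts with a given character, a missing path in a cached bucket is a trustworthy negative, which sidesteps the negative-caching problem described above. The class, function, and variable names are assumptions for illustration only.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Loader signature: (domain, first_char) -> iterable of (path, blog_id) for every
# non-root site on that domain whose first path segment starts with first_char.
BucketLoader = Callable[[str, str], Iterable[Tuple[str, int]]]


class PathBucketCache:
    """Cache complete per-letter listings of sub-site paths for a domain."""

    def __init__(self, load_bucket: BucketLoader) -> None:
        self._load_bucket = load_bucket
        self._cache: Dict[Tuple[str, str], Dict[str, int]] = {}
        self.db_queries = 0  # how many times we had to fall through to the database

    def lookup(self, domain: str, first_segment: str) -> Optional[int]:
        """Return the blog_id for /first_segment/ on domain, or None if no such site exists."""
        key = (domain, first_segment[:1])
        bucket = self._cache.get(key)
        if bucket is None:
            self.db_queries += 1
            bucket = dict(self._load_bucket(*key))
            # Cached even when empty: a complete listing doubles as a negative cache.
            self._cache[key] = bucket
        return bucket.get("/" + first_segment + "/")

    def invalidate(self, domain: str, first_segment: str) -> None:
        """Drop the bucket after a site is created, moved, or deleted on this domain."""
        self._cache.pop((domain, first_segment[:1]), None)


# Hypothetical demo data: (blog_id, domain, path).
SITES = [(2, "example.com", "/photos/"), (4, "example.com", "/press/"), (5, "example.com", "/blog/")]


def demo_loader(domain: str, first_char: str) -> Iterable[Tuple[str, int]]:
    # Stands in for a query like:
    # SELECT path, blog_id FROM wp_blogs WHERE domain = %s AND path LIKE '/p%'
    return [(path, blog_id) for blog_id, d, path in SITES if d == domain and path[1:2] == first_char]
```

Lookups for press and pages both reuse the 'p' bucket loaded for photos, so only the first of the three hits the database; a real implementation would also have to invalidate buckets reliably, which invalidate() only gestures at.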

Domain mapping in core requires two major things: a complete dissolution of the existing subdomain versus subdirectory paradigm, and cross-domain “remote” logins. Remote login requires some juggling to make sure that domain B can make a request to domain A to confirm someone is logged in, and set a cookie if so. Generally this requires that logins occur on an “unmapped” domain (typically the main site in a network), but there are numerous ways to set up remote logins. Unfortunately, most current implementations in the wild (including on WordPress.com) are fairly messy and can cause redirect loops, race conditions, and other problems. If done poorly, they can introduce security issues. So this must be done with care.
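One common way to make the hand-off between domains safer is a short-lived, single-use, signed token. The sketch below is a hypothetical Python illustration of that pattern, not how WordPress.com or any plugin actually implements remote login: domain A, after confirming the user's login cookie, redirects back to domain B with a token bound to the user ID, the target domain, an expiry time, and a random nonce, signed with an HMAC key both domains are assumed to share. Domain B verifies the signature, domain, expiry, and nonce before setting its own cookie.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Set


def _sign(secret: bytes, payload: str) -> str:
    return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(secret: bytes, user_id: int, target_domain: str, now: int, ttl: int = 30) -> str:
    """Run on the unmapped login domain (A) once it has confirmed the user is logged in."""
    nonce = secrets.token_hex(16)  # makes each token unique so it can be single-use
    payload = f"{user_id}|{target_domain}|{now + ttl}|{nonce}"
    return payload + "|" + _sign(secret, payload)


def redeem_token(secret: bytes, token: str, this_domain: str, now: int, used_nonces: Set[str]) -> Optional[int]:
    """Run on the mapped domain (B); returns the user ID to set a cookie for, or None."""
    parts = token.split("|")
    if len(parts) != 5:
        return None
    user_id, domain, expires, nonce, sig = parts
    expected = _sign(secret, "|".join(parts[:4]))
    # Compare as bytes in constant time; a str comparison would reject non-ASCII input with TypeError.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # forged or tampered
    if domain != this_domain:
        return None  # token was minted for a different domain
    if now > int(expires):
        return None  # too old
    if nonce in used_nonces:
        return None  # replayed
    used_nonces.add(nonce)
    return int(user_id)
```

A production version would also need HTTPS throughout, a shared store for used nonces (a Python set only works within one process), a real clock, and validation of the return URL to avoid open redirects.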

Dissolving the subdirectory/subdomain paradigm is actually not so bad. We need to stop thinking of a network as consisting only of differing subdomains, or only of differing paths. ms-settings.php would need to be rewritten, and site creation and management will need some changes.

Dealing with URL Conflicts

Perhaps the greatest change will be addressing the ‘/blog’ prefix that the main site gains. This prefix ostensibly exists to keep top-level pages on the main site from clashing with sub-sites. With arbitrary domain support (primarily via domain mapping, and secondarily via additional networks), any site with path ‘/’ can clash with any other site on the same domain but with a different path. With multiple path segments (nested sites), any site with path ‘/X/’ can have pages that clash with site ‘/X/Y/’.

Ultimately, this requires two-way blacklisting. Before a site is created, it must be checked against top-level URLs of the possibly conflicting site. And, before a page is created, it must be checked against sub-sites that already exist. If an ‘/about/’ page already exists on example.com/, an /about/ site cannot be created. But if an example.com/blog/ site already exists, a /blog/ page cannot be created on example.com. This gets complicated quickly, and is a very strong argument for only supporting one path segment in core by default, and allowing plugins to handle these potential conflicts on their own. In most cases, simply ignoring the potential conflicts is going to be sufficient.
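The two-way check can be sketched in Python with in-memory sets, assuming only one path segment (as recommended above) and hypothetical names (UrlRegistry, can_create_site(), can_create_page()). In WordPress, the page slugs would come from the root site's posts table and the sub-site paths from wp_blogs.

```python
from typing import Dict, Set


class UrlRegistry:
    """Two-way conflict checks between a domain's root-site pages and its sub-sites."""

    def __init__(self) -> None:
        self.root_pages: Dict[str, Set[str]] = {}  # domain -> top-level page slugs on the '/' site
        self.sub_sites: Dict[str, Set[str]] = {}   # domain -> first-segment slugs of sub-sites

    def can_create_site(self, domain: str, slug: str) -> bool:
        # A new sub-site must not shadow an existing top-level page or another sub-site.
        return (slug not in self.root_pages.get(domain, set())
                and slug not in self.sub_sites.get(domain, set()))

    def can_create_page(self, domain: str, slug: str) -> bool:
        # Only sub-site conflicts are checked; duplicate page slugs are the site's own concern.
        return slug not in self.sub_sites.get(domain, set())

    def create_site(self, domain: str, slug: str) -> None:
        if not self.can_create_site(domain, slug):
            raise ValueError(f"/{slug}/ on {domain} would clash with an existing page or site")
        self.sub_sites.setdefault(domain, set()).add(slug)

    def create_page(self, domain: str, slug: str) -> None:
        if not self.can_create_page(domain, slug):
            raise ValueError(f"/{slug}/ on {domain} is already a sub-site")
        self.root_pages.setdefault(domain, set()).add(slug)
```

Mirroring the paragraph's examples: once an about page exists on example.com, an /about/ site is refused, and once a /blog/ site exists, a blog page on the main site is refused.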

Registration and Open Networks

Supporting any URL opens up possible problems with registration, even after dealing with unique URLs. The concept of an open network where users can create their own sites was the driving force behind WordPress MU, but it is decidedly not the primary use case for multisite anymore. Most installs are closed, trusted networks, where site signups are disabled (and often user signups are as well). In a closed network, the blacklisting merely keeps administrators from shooting themselves in the foot. In an open registration network, there still needs to be a choice about what, exactly, someone can register for. Even in a world of domain mapping, open registration should still be expected to allow only subdomains or subdirectories, just as it does now. So when I refer to removing that paradigm, I actually mean downsizing it so that it applies to open registration only.

Trusting Users in a Closed Network

Despite being the narrower use case, open registration still implicitly dictates a lot of our current situation and “roadmap.” An administrator of a single site is considered wholly untrusted. Because multisite has the concept of a super administrator, there are of course some powers that should be reserved for them. I would argue these include the use of unfiltered HTML, as well as the administration of anything “global”: editing and deleting users; and editing, installing, and updating plugins and themes.

But when a network is not designed for “open registration,” a number of undue burdens on administrators should be lifted. Uploadable file types are severely restricted, and the amount they can upload is capped. They cannot activate installed plugins, though there is an option for this. They cannot add users to their sites without knowing those users’ email addresses (ostensibly to prevent spam), and the user must still go through a “confirmation” process. New sites must go through an “activation” process. They cannot create new users.

I don’t think WordPress needs to decide that the multisite feature is only geared for closed networks. Rather, a single option — set on install, and controllable via network settings — can control this entire paradigm very effectively. It will then be much easier to consolidate a lot of the signup, activation, and administration funkiness in multisite under this paradigm into an “open network” concept. The “Allow new registrations” setting is roughly analogous, but would largely be superseded by whether a network is open, as well as fine-grained registration controls.
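To illustrate how one option could drive the restrictions listed above, here is a Python sketch. Every field name and the policy_for() helper are hypothetical; the point is only that an open/closed flag supplies defaults, security-sensitive powers stay with super administrators either way, and individual settings can still be overridden, in the spirit of fine-grained registration controls.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NetworkPolicy:
    # Restrictions that exist mainly to protect open-registration networks.
    restrict_upload_file_types: bool
    cap_upload_space: bool
    require_email_to_add_existing_user: bool
    require_user_confirmation: bool
    require_site_activation: bool
    # Powers a closed network could hand to site administrators.
    site_admins_can_activate_plugins: bool
    site_admins_can_create_users: bool
    # Security-sensitive powers reserved for super admins regardless of the flag.
    unfiltered_html_for_super_admins_only: bool = True
    global_management_for_super_admins_only: bool = True


def policy_for(is_open: bool) -> NetworkPolicy:
    """Derive default per-network behavior from the single open/closed option."""
    return NetworkPolicy(
        restrict_upload_file_types=is_open,
        cap_upload_space=is_open,
        require_email_to_add_existing_user=is_open,
        require_user_confirmation=is_open,
        require_site_activation=is_open,
        site_admins_can_activate_plugins=not is_open,
        site_admins_can_create_users=not is_open,
    )


# A closed network that still wants an upload quota can override one default:
closed_with_quota = replace(policy_for(is_open=False), cap_upload_space=True)
```

Using a frozen dataclass keeps each derived policy immutable, so an override like closed_with_quota produces a new policy rather than silently changing the defaults for every network.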

One use case we should consider is the concept of a closed network and an open network sharing one multi-network multisite installation. For example, wordpress.org may be a closed network, but meetups.wordpress.org may be an open network to enable others to create meetups. And wordcamp.org may also be an “open” network on the same installation, even if individuals can’t create sites. “Open” is thus wider than registration alone.

Essentially, I am proposing we trust single-site administrators in a closed network to not be spammy, and to be given wide-ranging control of their own sites; but that we do not extend that trust to important areas of security.

A strong benefit here is that functionality in a closed network starts to more closely resemble a single site rather than the inherent restrictions of an open network. By shifting these paradigms, networks will become easier to manage and more straightforward to use over time. Eventually, closed networks can and should become easier to install — even if multisite never reaches the point where it is the recommended tool for creating a second blog for your cat.

Moving Forward

There is not a specific, multiple-release plan for this roadmap. Rather, every step we take should aim to work toward the outlined goals. While people may like domain mapping, it doesn’t make sense to bolt it on just yet; a plugin can already do that for them perfectly well. It would be ill-advised to implement it without also making WordPress handle domain and URL management significantly more smoothly across the board: subdirectory/subdomain installs, domain mapping, and multiple networks. The best approach is one that smartly and carefully balances the needs of users with our long-term objectives.