Potential roadmap for taxonomy meta and post relationships

In the days before the WordPress Community Summit in October, a number of contributing developers huddled together to discuss a number of long-term architecture ideas. Among them were taxonomy meta and post relationships. (Ah, I see I now have your attention!) This post is more or less a proposed roadmap. It’s an ambitious plan that is designed to take place over perhaps five or more releases.

(During WordCamp San Francisco’s keynote, @matt talked a little about a new aim to build teams of contributors and reviewers around individual core components. There will be a lot more on that in the coming days and weeks. For now, here’s a post that covers two components, post types and taxonomy.)

The discussion included all core committers at the time — @ryan, @markjaquith, @westi, @azaozz, @nacin, @dd32, @koopersmith, and @duck_ — and a number of contributing developers, including @aaroncampbell, @dh-shredder, @helen, and @scribu.

At the moment, terms are represented in WordPress using two different IDs: a term ID, and a term taxonomy ID. A term ID (and name and slug) can actually appear in multiple taxonomies, so to identify a particular term, you must have either the term ID and the corresponding taxonomy, or just the term taxonomy ID. This dates back to the original taxonomy schema in WordPress 2.3. At the time, the concept of “shared terms” seemed like it could be an important abstraction. Hindsight is 20/20, and shared terms are the bane of taxonomies in WordPress.

So when we talk about term meta, we’re actually talking about term taxonomy meta — meta associated with a term taxonomy ID, not a term ID. The problem is, the public ID used in the API and elsewhere is the term ID (and, by necessity, a taxonomy is also passed). This confusion — and the need for there to be only one object identifier in our metadata API (term taxonomy ID, not two, as in term ID and taxonomy) — has long forced us to table the discussion of term metadata.

(There are separate conceptual issues here — at what point does a term with metadata simply become a post-like object that can relate to other posts? And given post relationships, could terms and posts actually converge in their underlying schema? I’m not actually going to answer those questions in this post. Purely talking schema and architecture at this point.)

At WordCamp San Francisco last year, four of us — me, Gary Pendergast (@pento), @scribu, and @koopersmith — came up with a rather crazy way to make major changes to our table schema while still being backwards compatible. In fact, we came up with two ways to do it. This was the plan everyone heard and discussed at the summit.

It was clear that shared terms had to go. The first step is removing a UNIQUE index on the slug field in the wp_terms table. (This is dependent on #17689.) Then, we stop creating new shared terms. Step three, on an upgrade routine, we actively look for any existing shared terms and split them.

These three initial steps must happen over two to three major releases, as we’re talking about a bug fix, a schema change, an API change, and an upgrade routine — in that order.

Then comes the fun part, in yet another major release. With shared terms split, term ID and term taxonomy ID will be identical  on every install.  If we moved the slug and name fields from wp_terms to wp_term_taxonomy, we could actually drop wp_terms.

How can we remove an entire table but still be backwards compatible? We came up with two solutions:

  1. Because all fields in wp_terms will exist in wp_term_taxonomy, wp_terms can be recreated as a MySQL view to be a read-only mirror of term data, thus being compatible with all existing queries.
  2. Because all fields in wp_terms will exist in wp_term_taxonomy, and because table aliases like `t`and `tt` are always used when joining these two tables, $wpdb->terms can simply be set to $wpdb->term_taxonomy. A query that previously joined wp_terms with wp_term_taxonomy would just join itself.

In all: Using the second approach (yes, it works), it took about 20 lines of code to make WordPress run without a wp_terms table. Wow, right?

So by this point, we would finally have a sane taxonomy schema. Less joins, a cleaner API (probably helped by a new WP_Term object to model WP_Post and WP_User), no more shared terms headaches, and a single, sane ID for a single taxonomy’s term.

Once that is all finished, we can finally have term meta. Maybe. (Kidding. (Kind of.))

Where do post relationships come in? The existing Posts 2 Posts plugin by @scribu is fantastic and serves the niche well. But we’re not really comfortable making any architecture or API changes along these lines while our taxonomy schema is still in a far from ideal state.

The post relationships plugin supports posts to posts, and posts to users. Core taxonomy relationships supports posts to terms, but it can also be rigged to relate users to terms. (It also supported links to terms, yet another object type.) We didn’t fully iron out this idea yet, but one idea is to convert the current wp_term_relationships table to a more generic object relationships table, which can support posts to posts, posts to users, terms to users, and of course posts to terms (and, really, any arbitrary relationship).

A disclaimer: This post doesn’t promise anything. Do not rely on the contents of this post for future projects.  It will take us some time to lay out the proper groundwork and make sure we get this right. Do not expect this to happen in WordPress 3.7, or even 3.8. (Actually, do not expect this to happen at all.)

That said, I’m really glad to get this information out there and I’m excited to hear any feedback you may have. We are always thinking toward the future, and a lot of contributing developers have mile-long roadmaps in their heads — it’s long past time to get those on paper.