Taking advantage of UUIDs
Now that we require MySQL 4.1.2, we can look into using UUIDs in WordPress.
Universally unique identifiers are numbers that are unique in time and space. In the database world, they’re primarily used to work with data that is created in a decentralized manner. They make it possible to identify a piece of data that isn’t stored in the database yet, to merge data from separate databases without needing to work around ID conflicts, etc.
The lack of genuine UUIDs introduces all sorts of quirks in the WP editor when we’re dealing with yet to be saved posts. #9471, #10456, #11145, #11990, and many more underscore the architectural problem that we’re facing. The media-related tickets exacerbate the issue.
Mark recently opened #11889 and suggested that we a) always create a new post when hitting the new post button, and b) have a garbage collector clean things up periodically. It is a solution, but I feel that using UUIDs — which were designed to deal with this kind of stuff — would be simpler.
Using them would mean pre-assigning a UUID to new posts, and placing it in a hidden field. Since this UUID identifies our yet-to-be-saved post, we can safely pass it around to an autosave handler, a media thickbox, etc., allowing us to create the post on the fly when necessary.
Further down the stream, we introduce a slight change in our insert/update post handler: when saving a post whose ID is not assigned yet, we start by verifying if that post exists already — by querying the posts table’s post_guid field to know if that UUID exists.
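The flow described above can be sketched roughly as follows. This is only an illustration, written in Python rather than WordPress's PHP: `PostStore`, `save` and `new_post_form` are hypothetical names, and a dictionary stands in for the posts table.

```python
import uuid


class PostStore:
    """Toy stand-in for the posts table (hypothetical, not WordPress code)."""

    def __init__(self):
        self.rows = {}      # post ID -> row
        self.by_guid = {}   # UUID -> post ID, playing the role of an index on the guid column
        self.next_id = 1

    def save(self, guid, fields, post_id=None):
        """Insert or update a post; with no ID yet, look the post up by UUID first."""
        if post_id is None:
            post_id = self.by_guid.get(guid)
        if post_id is None:
            # No row carries this UUID yet, so create the post on the fly.
            post_id = self.next_id
            self.next_id += 1
            self.rows[post_id] = {"guid": guid}
            self.by_guid[guid] = post_id
        self.rows[post_id].update(fields)
        return post_id


def new_post_form():
    """Values for the hidden fields of a brand-new post screen."""
    return {"post_id": None, "post_guid": str(uuid.uuid4())}
```

Under these assumptions, an autosave and a later manual save that both carry the same hidden UUID end up on the same row instead of creating two posts.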
scribu 7:25 pm on January 24, 2010 Permalink
You’ve been pushing for this approach for some time now, Denis.
Since you’ve got the best grip on your idea, how about a patch?
Oncle Tom 7:34 pm on January 24, 2010 Permalink
Is it a good idea to rely on such a database-specific feature? How will it work if, one day, a database abstraction is used to support other DB engines?
It should be code logic, no?
Matt 12:52 am on January 25, 2010 Permalink
Philosophically, I’m a fan of fixing things with the smallest possible change that works, because it decreases the chance that an architecture change will introduce new and impossible-to-anticipate bugs.
Also worth pointing out two old but classic essays on abstraction and architecture:
http://www.joelonsoftware.com/articles/fog0000000018.html
http://www.codinghorror.com/blog/archives/000165.html
Mark Jaquith 5:59 pm on January 25, 2010 Permalink
That fix would probably be the “Create a draft whenever you visit the ‘Add New X’ screen” method. The bulk of the code will be the cleanup (change post_status on a real save, clean up the unused items after a long enough time).
And I suppose there is another benefit to this: it is DB agnostic.
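A minimal sketch of that draft-plus-cleanup method, in Python for illustration only. The `auto-draft` status name, the seven-day window and all function names are assumptions, not something settled in this thread.

```python
import itertools
from datetime import datetime, timedelta

AUTO_DRAFT_MAX_AGE = timedelta(days=7)  # assumed retention window
_next_id = itertools.count(1)           # stand-in for an auto-increment column


def create_auto_draft(posts, now):
    """Called when the 'Add New' screen loads: reserve a real post ID up front."""
    post_id = next(_next_id)
    posts[post_id] = {"status": "auto-draft", "created": now}
    return post_id


def real_save(posts, post_id, fields):
    """A genuine save promotes the placeholder so the cleanup leaves it alone."""
    posts[post_id].update(fields)
    posts[post_id]["status"] = "draft"


def collect_garbage(posts, now):
    """Delete placeholders that were never saved and are older than the window."""
    stale = [
        post_id
        for post_id, post in posts.items()
        if post["status"] == "auto-draft" and now - post["created"] > AUTO_DRAFT_MAX_AGE
    ]
    for post_id in stale:
        del posts[post_id]
    return stale
```

Nothing here depends on a database-specific function, which is the DB-agnostic benefit mentioned above.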
Ryan 6:25 pm on January 25, 2010 Permalink
If we keep guid as a varchar and wrap the MySQL syntax in a short-circuitable function we can remain DB agnostic. I don’t really want a schema change to the posts table in 3.0 anyway since we want to make MU upgrades to 3.0 as easy as possible.
Even if we introduce UUID, I kinda like auto-creating a draft. It is very simple and fairly clean. The only messy part is the GC, but that doesn’t bother me too much. Regardless, both ways seem like solid approaches that would allow us to clean up the gallery mess.
Brian Layman 8:49 pm on January 25, 2010 Permalink
A big difference I see is that one causes a DB write while the other doesn’t.
Denis pointed out two advantages of the UUID:
1. It allows to characterize a piece of data that isn’t stored in the database yet,
2. to merge data from separate databases without needing to work around ID conflicts, etc.
And I guess that falls under the first one, because it avoids the need to store something. I discount the second one because we are way too far down the line for ppl not to worry about ID conflicts when merging DBs, IMHO.
(BTW, can we get some cookies around here to save our name, email and site for commenting?)
hakre 6:23 pm on January 28, 2010 Permalink
An easy one would be: the user clicks Add New Post and comes to a screen where the title of the new post must be entered before continuing to the editor. The post is then created within the request in which the user enters the editor.
hakre 1:44 pm on February 17, 2010 Permalink
That’s really an important thing, the smallest possible change. To achieve that in the long term, this resource is valuable as well:
http://www.joelonsoftware.com/articles/fog0000000043.html
Especially the point that explains when it makes sense to fix bugs and how to handle changes in the code.
Mark Jaquith 4:28 am on January 25, 2010 Permalink
I wasn’t quite getting this at first, but now I think I see how it could work. Is this about right?:
Instead of populating a post ID field for a new draft, you’d populate a post_uuid field. The add attachment links, thumbnail selection link, postmeta forms, etc., would all pass this UUID if a post_id isn’t yet available. Whenever something ([whatever]) is created in relationship to a UUID and a post with that UUID doesn’t exist, you create it, and give the post that UUID (and then add the [whatever] pointing to the post ID of that newly created post). Forms that are still open and reference the UUID for the newly created post will continue to work, except the UUID lookup will succeed, and they’ll attach [whatever] to the post ID of the post row with that UUID.
We’ll need a function that accepts a UUID and returns either the existing post object, or if-not-exists, creates one and returns the new object. get_post_by_uuid( $uuid ) or something.
The advantage is that we’re not creating post rows that add overhead and need cleaning up. We’d have to write code to handle the “did we get a UUID or a post ID?” stuff, but it’d probably allow us to fix all those bugs and get rid of a lot of hairy JS code.
Denis, can you affirm or deny my stated understanding?
hakre 12:38 pm on January 25, 2010 Permalink
I understand it this way: The UUID names a post that is created or referred to by a user performing a post-related action on a post that may not exist in the database yet.
The UUID captures the simple fact that the user knows about a post before our current post-related data structures in core do. I like that idea.
Denis de Bernardy 3:12 am on February 5, 2010 Permalink
“Whenever something ([whatever]) is created in relationship to a UUID and a post with that UUID doesn’t exist, you create it, and give the post that UUID (and then add the [whatever] pointing to the post ID of that newly created post). Forms that are still open and reference the UUID for the newly created post will continue to work, except the UUID lookup will succeed, and they’ll attach [whatever] to the post ID of the post row with that UUID.”
sorry I didn’t reply earlier (sick, then busy), but yes, that pretty much is exactly it…
miqrogroove 1:54 pm on January 25, 2010 Permalink
A third and very simple strategy would be to have the front end keep track of the image IDs whenever there is no parent ID, and then include that ID list with the first autosave. That way you have no GC, no schema change, and no empty drafts.
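One way to picture this third strategy, sketched in Python for illustration. `EditorSession` and `handle_autosave` are hypothetical names, and plain dictionaries stand in for the posts and attachments tables.

```python
class EditorSession:
    """Front-end state for one open editor screen (hypothetical)."""

    def __init__(self):
        self.post_id = None
        self.pending_attachment_ids = []

    def attachment_uploaded(self, attachment_id):
        # While there is no parent post, just remember the upload.
        if self.post_id is None:
            self.pending_attachment_ids.append(attachment_id)

    def autosave_payload(self, fields):
        return {
            "post_id": self.post_id,
            "fields": fields,
            "attachment_ids": list(self.pending_attachment_ids),
        }

    def autosave_done(self, post_id):
        self.post_id = post_id
        self.pending_attachment_ids.clear()


def handle_autosave(posts, attachments, payload):
    """Server side: create the post if needed, then adopt the listed attachments."""
    post_id = payload["post_id"]
    if post_id is None:
        post_id = len(posts) + 1  # toy ID allocation; nothing is ever deleted here
        posts[post_id] = {}
    posts[post_id].update(payload["fields"])
    for attachment_id in payload["attachment_ids"]:
        attachments[attachment_id]["parent"] = post_id
    return post_id
```

In this sketch the uploads stay parentless until the first autosave, so an abandoned editor screen leaves orphaned attachments behind rather than empty drafts.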
Brian Layman 5:30 pm on January 25, 2010 Permalink
Coming from a database background where 50 GB DBs were not that unusual, I still cringe thinking about PHP’s loose typing and its handling of numbers as strings. Reading up on UUIDs, it is neat to see that one is actually stored as a 128-bit integer, which I thought was a really good thing. But because it will be passed around as a string, this is not the end of the story:
There’s a warning in the documentation that character sets play a significant role in UUIDs. I’m not sure how that will affect us, but you can set up a case where the character set of the column doesn’t match the values stored in that column, and therefore the indexes fail.
Maybe this can be fixed by regenerating the index, I don’t know. Is it a concern? It might be if the blog is set to a different character set than the database, right? I’ve run into problems like that before when the data was a different character set than the structure in which it was stored. Maybe it was only because we do a lot of tossing around of blogs…
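To illustrate the string-versus-integer point, here is a small Python sketch using the standard `uuid` module. The helper names `to_storage` and `from_storage` are made up for this example, and packing the value into 16 raw bytes is only one possible way to sidestep character-set and letter-case differences, not something proposed in this thread.

```python
import uuid


def to_storage(text):
    """Parse the 36-character text form and return the 16 raw bytes."""
    return uuid.UUID(text).bytes


def from_storage(raw):
    """Turn 16 raw bytes back into the canonical lowercase text form."""
    return str(uuid.UUID(bytes=raw))


upper = "6CCD780C-BABA-1026-9564-0040F4311E29"
lower = upper.lower()

# The two spellings differ as strings, but they name the same 128-bit value.
same_value = to_storage(upper) == to_storage(lower)
```

Comparing the parsed values rather than the raw strings means a lookup does not depend on how the column's character set or collation treats letter case.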
Joseph Scott 6:17 pm on January 25, 2010 Permalink
Which type of UUID was being considered for this?
Brian Layman 7:14 pm on January 25, 2010 Permalink
I don’t think the MySQL UUID() function has various types.. http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid
Joseph Scott 8:30 pm on January 25, 2010 Permalink
From the discussion so far it seemed implied that the UUID would be generated in PHP, not MySQL. That said, it looks like the UUID() function showed up in MySQL 4.1.2, so we could potentially use it.
Peter Westwood 9:49 pm on January 25, 2010 Permalink
I am really against trying to use UUIDs as the linkage between posts and their metadata/attachments.
We have a perfectly good method of providing this linkage and a simple solution has been proposed to resolve the issue with autosave prior to a draft existing.
All we need is for get_default_post_to_edit() to ensure something exists in the database and most of the rest of the code won’t need touching.
I am with Matt: the small change is better.
UUIDs aren’t designed to solve the issue we have here. They are designed as a way of letting you generate IDs that are unique across multiple sites/machines, and they are a candidate for use as the GUID in the posts table, but not as the linkage/post_id.
wpmuguru 5:58 pm on January 26, 2010 Permalink
+1
The problems being addressed in those tickets occur infrequently. Reinventing the post ID management system seems to be an excessive solution to the problem.
hakre 5:02 pm on January 28, 2010 Permalink
Westi, from what you write in your comment, it seems you got it wrong. The UUID would not be used to replace the post ID; it would only be in use as long as there is no post ID. Do you understand the difference?
Peter Westwood 5:50 pm on January 28, 2010 Permalink
Yes. But it would effectively replace post ID.
Either we would have to have all the code cope with matching based on ID or UUID or we would have to switch solely to using UUID.
Introducing UUID for posts that aren’t saved yet just introduces unnecessary complexity.
hakre 6:11 pm on January 28, 2010 Permalink
The unnecessary (?) complexity was introduced by users who upload attachments for non-existing posts because the UI made them think they are actually uploading images to a new post.
The UUID is only a technical thingy to make the software capable of handling that situation again. The UUID itself does not introduce any more complexity. The opposite is the case: it makes things simpler and more usable. The current data structure is not able to cope with the situation we have in the post editor.
Or what is your analysis of why this bug has stayed open and unfixed for so long?
Peter Westwood 7:52 pm on January 28, 2010 Permalink
If we introduce a UUID just for this situation, we have to write a lot of extra code for WordPress to handle the action on an unsaved post, and every plugin author who wants to do something similar has to do this too.
If we just always create a draft then the current code in core and plugins will just work.
Brian Layman 5:13 pm on January 28, 2010 Permalink
I wasn’t going to reply because comments on this had slowed, but I think I got it wrong too. I don’t think that Denis implied any changes at all to the data structure. If you read the last paragraph, he is envisioning the UUID being stored in the post_guid field. So his proposal is one for the editing screens only and not an alteration of the post table’s structure. If I understood it more correctly on re-reading, that is.
Peter Westwood 5:51 pm on January 28, 2010 Permalink
This would still affect a lot of code which wouldn’t need changing if we just created a draft before outputting the page.
Jeremy Stark 5:53 pm on February 19, 2010 Permalink
I wrote a UUID plugin in order to integrate WordPress with another publishing system:
http://wordpress.org/extend/plugins/simple-uuid/installation/
It just adds a UUID to each post’s metadata.