Denis de Bernardy 6:42 pm on January 24, 2010
Tags: featured bugs ( 4 )

Taking advantage of UUIDs

Now that we require MySQLMySQL MySQL is a relational database management system. A database is a structured collection of data where content, configuration and other options are stored. https://www.mysql.com/. 4.1.2, we can look into using UUIDs in WordPress.

Universally unique identifiers are numbers that are unique in time and space. In the database world, they’re primarily used in order to work with data that is created in a decentralized manner. It allows to characterize a piece of data that isn’t stored in the database yet, to merge data from separate databases without needing to work around ID conflicts, etc.

The lack of genuine UUIDs introduces all sorts of quirks in the WP editor when we’re dealing with yet to be saved posts. #9471, #10456, #11145, #11990, and many more underscore the architectural problem that we’re facing. The media-related tickets exacerbate the issue.

Mark recently opened #11889 and suggested that we a) always create a new post when hitting the new post button, and b) have a garbage collector clean things up periodically. It is a solution, but I feel that using UUIDs — which were designed to deal with this kind of stuff — would be simpler.

Using them would mean pre-assigning a UUID to new posts, and placing it in a hidden field. Since this UUID characterizes our yet to be saved post, we can safely pass it around to an autosave handler, a media thickbox, etc., allowing to create the post on the fly when necessary.

Further down the stream, we introduce a slight change in our insert/update post handler: when saving a post whose ID is not assigned yet, we start by verifying if that post exists already — by querying the posts table’s post_guid field to know if that UUID exists.

#featured-bugs

scribu 7:25 pm on January 24, 2010

You’ve been pushing for this aproach for some time now, Denis.

Since you’ve got the best grip on your ideea, how about a patchpatch A special text file that describes changes to code, by identifying the files and lines which are added, removed, and altered. It may also be referred to as a diff. A patch can be applied to a codebase for testing.?
Oncle Tom 7:34 pm on January 24, 2010

Is this a good idea to rely on such a database specific feature? How will it works if, one day, a database abstraction is used, to support other DB engines?

It should be code logic no?
Matt 12:52 am on January 25, 2010

Philosophically, I’m a fan of fixing things with the smallest possible change that works, because it decreases the chance that an architecture change will introduce new and impossible-to-anticipate bugs.

Also worth pointing out two old but classic essays on abstraction and architecture:

http://www.joelonsoftware.com/articles/fog0000000018.html
http://www.codinghorror.com/blog/archives/000165.html
- Mark Jaquith 5:59 pm on January 25, 2010
  
  That fix would probably be the “Create a draft whenever you visit the ‘Add New X’ screen” method. The bulk of the code will be the cleanup (change post_status on a real save, cleanup the unused items after a long enough time).
  
  And I suppose there is another benefit to this: it is DB agnostic.
  - Ryan 6:25 pm on January 25, 2010
    
    If we keep guid as a varchar and wrap the MySQLMySQL MySQL is a relational database management system. A database is a structured collection of data where content, configuration and other options are stored. https://www.mysql.com/. syntax in a short-circuitable function we can remain DB agnostic. I don’t really want a schema change to the posts table in 3.0 anyway since we want to make MU upgrades to 3.0 as easy as possible.
    
    Even if we introduce UUID, I kinda like auto-creating a draft. It is very simple and fairly clean. The only messy part is the GC, but that doesn’t bother me too much. Regardless, both ways seem like solid approaches that would allow us to clean up the gallery mess.
  - Brian Layman 8:49 pm on January 25, 2010
    
    A big difference I see is that one causes a DB write while the other doesn’t.
    
    Denis made two points advantages of the UUID:
    1. It allows to characterize a piece of data that isn’t stored in the database yet,
    2. to merge data from separate databases without needing to work around ID conflicts, etc.
    
    And I guess that qualifies in the first one because it avoids the need to store something. I discount the second one because we are way too far down the line for ppl not to worry about ID conflicts when merging DBs, IMHO.
    (BTW can we get some cookies around here to save our name, email and site for commenting 🙂 )
- hakre 6:23 pm on January 28, 2010
  
  An easy one would be: User clicks Add New Post, comes to a screen where it’s a need to enter the title of your new post before user can continue to the editor. Post will be created within the request where the user enters the editor.
- hakre 1:44 pm on February 17, 2010
  
  Philosophically, I’m a fan of fixing things with the smallest possible change that works, because it decreases the chance that an architecture change will introduce new and impossible-to-anticipate bugs.
  
  That’s really an important thing, the smallest possible change. To achieve that in the long term, this resource is valuable as well:
  
  http://www.joelonsoftware.com/articles/fog0000000043.html
  
  Especially the point that explains when it makes sense to fix bugbug A bug is an error or unexpected result. Performance improvements, code optimization, and are considered enhancements, not defects. After feature freeze, only bugs are dealt with, with regressions (adverse changes from the previous version) being the highest priority. and how to handle changes in the code.
Mark Jaquith 4:28 am on January 25, 2010

I wasn’t quite getting this at first, but now I think I see how it could work. Is this about right?:

Instead of populating a post ID field for a new draft, you’d populate a post_uuid field. The add attachment links, thumbnail selection link, postmeta forms, etc, would all pass this UUID if a post_id isn’t yet available. Whenever something ([whatever]) is created in relationship to a UUID and a post with that UUID doesn’t exist, you create it, and give the post that UUID (and then add the [whatever] pointing to the post ID of that newly created post). Forms that are still open and reference the UUID for the the newly created post will continue to work, except the UUID lookup will succeed, and they’ll attach [whatever] to the post ID of the post row with that UUID.

We’ll need a function that accepts a UUID and returns either the existing post object, or if-not-exists, creates one and returns the new object. get_post_by_uuid( $uuid ) or something.

The advantage is that we’re not creating post rows that add overhead and need cleaning up. We’d have to write code to handle the “did we get a UUID or a post ID?” stuff, but it’d probably allow us to fix all those bugs and get rid of a lot of hairy JSJS JavaScript, a web scripting language typically executed in the browser. Often used for advanced user interfaces and behaviors. code.

Denis, can you affirm or deny my stated understanding?
- hakre 12:38 pm on January 25, 2010
  
  I understand it this way: The UUID names post that is created / refered to by a user doing a specific post related action to a possibly non-existing-in-database post.
  
  The UUID is used to describe the simple fact, that a user knows about a post first compared to our current post-related-data-structure(s) in coreCore Core is the set of software required to run WordPress. The Core Development Team builds WordPress.. I like that Idea.
- Denis de Bernardy 3:12 am on February 5, 2010
  
  “Whenever something ([whatever]) is created in relationship to a UUID and a post with that UUID doesn’t exist, you create it, and give the post that UUID (and then add the [whatever] pointing to the post ID of that newly created post). Forms that are still open and reference the UUID for the the newly created post will continue to work, except the UUID lookup will succeed, and they’ll attach [whatever] to the post ID of the post row with that UUID.”
  
  sorry I didn’t reply earlier (sick, then busy), but yes, that pretty much is exactly it…
miqrogroove 1:54 pm on January 25, 2010

A third and very simple strategy would be to have the front end keep track of the image IDs whenever there is no parent ID, and then include that ID list with the first auto save. That way you have no GC, no schema change, and no empty drafts..
Brian Layman 5:30 pm on January 25, 2010

Coming from a database background where 50gb dbs were not that unusual, I still cringe thinking at PHPPHP The web scripting language in which WordPress is primarily architected. WordPress requires PHP 5.6.20 or higher’s loose typing and handling numbers as strings. Reading up on UUIDs it is neat to see that it is a actually stored as a 128bit integer. Which I thought was a really good thing, but because this will be passed around as a string, this is not the end of the story:

There’s a warning in the documentation that character sets play a significant role in UUIDs. I’m not sure how that will affect us, but you can setup a case where the character set of the column doesn’t match the values stored in that column and therefore the indexes fail.

Maybe this can be fixed by regenerating the index, I don’t know. Is it a concern? It might be if the blogblog (versus network, site) is set to a different character set than the database right? I’ve run into problems like that before when the data was a different character set than the structure in which it was stored. Maybe it was only because we do a lot of tossing around of blogs…
Joseph Scott 6:17 pm on January 25, 2010

Which type of UUID was being considered for this?
Brian Layman 7:14 pm on January 25, 2010

I don’t think the MySQLMySQL MySQL is a relational database management system. A database is a structured collection of data where content, configuration and other options are stored. https://www.mysql.com/. UUID() function has various types.. http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid
- Joseph Scott 8:30 pm on January 25, 2010
  
  From the discussion so far it seemed implied that the UUID would be generated in PHPPHP The web scripting language in which WordPress is primarily architected. WordPress requires PHP 5.6.20 or higher, not MySQLMySQL MySQL is a relational database management system. A database is a structured collection of data where content, configuration and other options are stored. https://www.mysql.com/.. That said it looks like the UUID showed up in MySQL 4.1.2 so we could potentially use it.
Peter Westwood 9:49 pm on January 25, 2010

I am really against trying to use UUIDs as the linkage between posts and there metadata/attachments.

We have a perfectly good method of providing this linkage and a simple solution has been proposed to resolve the issue with autosave prior to a draft existing.

All we need is for get_default_post_to_edit() to ensure something exists in the database and most of the rest of the code won’t need touching.

I am with Matt on the small change is better.

UUIDs aren’t designed to solve the issue we have here – they are designed as a way of letting you generate IDs that are unique accross multiple sites/machines and are a candidate for using as the GUID in the posts table but not as the linkage/post_id
- wpmuguru 5:58 pm on January 26, 2010
  
  +1
  The problems being addressed in those tickets occur infrequently. Reinventing the post ID management system seems to be an excessive solution to the problem.
- hakre 5:02 pm on January 28, 2010
  
  Westi, from what you write in your comment is that you did get it wrong. The UUID would not be used to replace the Post ID, it would be only in use as long as there ain’t no Post ID. You understand the difference?
  - Peter Westwood 5:50 pm on January 28, 2010
    
    Yes. But it would effectively replace post ID.
    
    Either we would have to have all the code cope with matching based on ID or UUID or we would have to switch solely to using UUID.
    
    Introducing UUID for posts that aren’t saved yet just introduces unnecessary complexity
  - hakre 6:11 pm on January 28, 2010
    
    The unnecessary (?) complexity was introduced by users who do upload attachments for non-existing posts because the UIUI User interface made them think, they are actually uploading images to a new post.
    
    The UUID is only a technical thingy to make the software again capable of that situation. The UUID itself is not introducing any more complexity. The opposite is the case, it makes things more simple and useable. The current data structure is not able to cope with the situation we have in the post editor.
    
    Or what do you suggest as analysis why this bugbug A bug is an error or unexpected result. Performance improvements, code optimization, and are considered enhancements, not defects. After feature freeze, only bugs are dealt with, with regressions (adverse changes from the previous version) being the highest priority. is staying open so long unfixed?
  - Peter Westwood 7:52 pm on January 28, 2010
    
    If we introduce a UUID just for this situation we have to write a lot of extra code for WordPress to handle the action on an un-saved post to handle the action and every pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party author that wants to do something similar has to do this too.
    
    If we just always create a draft then the current code in coreCore Core is the set of software required to run WordPress. The Core Development Team builds WordPress. and plugins will just work.
Brian Layman 5:13 pm on January 28, 2010

I wasn’t going to reply because comments on this had slowed, but I think I got it wrong too. I don’t think that Denis implied any changes at all to the data structure. If you read the last paragraph, he is envisioning the UUID being stored in the post_guid field. So his proposal is one for the editing screens only and not an altering of the post table’s structure. If I understood it more correctly upon re-read that is.
- Peter Westwood 5:51 pm on January 28, 2010
  
  This would still affect a lot of code which wouldn’t need changing if we just created a draft before outputting the page.
Jeremy Stark 5:53 pm on February 19, 2010

I wrote a UUID pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party in order to integrate WordPress with another publishing system:

https://wordpress.org/extend/plugins/simple-uuid/installation/

It just adds a uuid to each posts metadata.

Comments are closed.

Welcome!

Communication

Taking advantage of UUIDs

Welcome!

Communication

Share this: