The Umbraco Database – and Content Versioning (Rollback)

Take a look at the lovely picture below. Int’it pretty?
Umbraco Database Diagram - Core Tables

Now working with Umbraco you really don’t (or shouldn’t) care too much about the database and the structure of this, the content is published to the cached XML structure (as described elsewhere), and this is what you work of.
However, I was looking at the very impressive rollback feature of the Umbraco CMS, where you can not only view all previous versions of your document, but also review “diffgrams” of all the document property values before selecting a version to roll back to.
I wanted to see how Umbraco achieved this at a database level (I’ve worked on version systems before, and it’s always a great opportunity to write some hugely over-complicated and convoluted code).

Referring to the tables in the diagram above (click it for full size version) – umbracoNode is your node tree in the content pane of the CMS, from here cmsDocuments get published, and then the actual XML content that gets published is stored in the cmsContent table (which has a 1-to-1 relation over to the cmsContentXML table that holds the actual XML).
Now you’d have thought that different versions of cmsContent would then get saved off in the cmsContentVersion table right…?
But no, this only holds a FK to the cmsContent table, a versionDate and then a “VersionId” which is a GUID.
This GUID turned out to be the key to the puzzle. Even though there’s no relational integrity enforcing this, the versionId guid links into both the cmsDocument and cmsPropertyData tables, both of which also hold versionId GUIDs.
So the cmsPropertyData table is really key in the Umbraco database – this is where your data (the actual values of the document properties) is saved, versioned and published to XML from – if you query it in SQL you can see a rolling history of all the little changes you have made to all the properties of all your documents, and if you order by the versionId – you will see that all properties get saved for each version (not just updated ones).

Overall I would say it’s a slightly surprising design, I’d expect them to version of the actual published XML data, rather than painstakingly save off the history of each property – but of course the problem with just saving the XML blop is that the document structure might change – users can of course add/remove/change properties which will cause problems when you then afterwards want to try and roll back.
What the cmsPropertyData table provides is a sort of dynamic history – where deleting a property from your document type, will also remove (I assume, as you have RI between cmsPropertyData and cmsPropertyType) the version history for that property’s data (across all documents).
The end results being that you always have a version history that is comparable to the current document type (save of course for added properties, but these can of course be compared to their default value).

Anyways – that’s guessing at the reason for the design – but it’s an interesting approach for sure.


Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: