13 March, 2012

Umbraco No Node Exists / cmscontentxml out of sync

For some time now we have had a problem when modifying document type aliases in Umbraco (4.x.x), because doing so causes a complete flush of the XML caches used in Umbraco. If an unexpected error occurs during one of these flushes, such as a lost DB connection, an IIS timeout, an AppPool flush or similar, the cmscontentxml database table can become out of sync. This leads to strange errors on the frontend of the sites in the installation:
  • No node exists with id 'xxxx'
  • Missing menu items
  • Macros throwing out errors
  • etc.
The problem has become almost a show-stopper for our Umbraco installation as the number of nodes has increased. We currently have 6000+ nodes and host 200+ websites on a single Umbraco install.

The errors do not show up right after a change of the doctype alias, but roughly 10-15 minutes later, because the caches are updated by a separate thread and Umbraco automatically flushes the cmscontentxml table to the disk and memory caches if either of the two caches becomes invalid. Having "random" errors happening on 6000+ nodes is not acceptable.

So far we have "fixed" the issue by calling http://myumbraco.com/Umbraco/dialogs/republish.aspx?xml=true, which forces a full republish of all pages/nodes to the cmscontentxml table and after that flushes the disk and memory caches. The normal "right-click > republish" on the root node only flushes the cmscontentxml table to disk and memory; it does not update the database table, which is the part that is out of sync.

We have found several posts about the issue, but so far no good solution:
So we decided to propose a solution that would ensure that, regardless of any connectivity issues or AppPool recycles, the cmscontentxml table does not get out of sync (this was the main reason for the errors). Also, modifying a DocumentType alias should NOT be a 10-15+ minute wait-while-saving operation on Umbraco installs with a large number of nodes.

Our solution modifies the DocumentType save event/method: for each node that uses the DocumentType, it creates a new Umbraco Task, which adds the node id to the cmstask database table. These tasks are then loaded one by one by the umbraco.presentation.publishingService, which flushes the database, disk and memory caches for that node. Only after the flush has completed for a node is its Task deleted.
This should ensure that even an AppPool flush will just cause the publishingService to publish the node one more time.
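
The pattern can be sketched outside Umbraco like this. It is only an illustration in Python with SQLite, not the actual patch: the `task` table, `open_queue`, `enqueue_nodes`, `process_next` and the `republish` callback are hypothetical stand-ins for the cmstask table, the DocumentType save event and the publishingService.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open the (hypothetical) task store, a stand-in for the cmstask table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS task ("
        "id INTEGER PRIMARY KEY, node_id INTEGER NOT NULL)"
    )
    return db


def enqueue_nodes(db, node_ids):
    """Record one pending task per node that uses the changed DocumentType."""
    db.executemany(
        "INSERT INTO task (node_id) VALUES (?)", [(n,) for n in node_ids]
    )
    db.commit()


def process_next(db, republish):
    """Republish the oldest pending node and delete its task only on success.

    Returns False when no tasks are left. If republish() raises (or the
    process dies), the task row is still there, so the node is simply
    picked up and published again on the next run.
    """
    row = db.execute(
        "SELECT id, node_id FROM task ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    task_id, node_id = row
    republish(node_id)  # hypothetical callback: flush db, disk and memory cache
    db.execute("DELETE FROM task WHERE id = ?", (task_id,))
    db.commit()
    return True
```

The important design choice is the ordering: the task is deleted after the node has been republished, never before. An interruption can therefore cause a node to be published twice, but it cannot cause a node to be skipped.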

An Umbraco work item already exists for this, see:
Our patch has also been uploaded to it. (download patch)

####### December 20 - 2012 Update. #######
Due to the high number of requests about this bug in Umbraco, both here and from other sources, I have created some additional release builds of the Umbraco source with this patch applied. Just click on the version you need the patched files for to download it (patch, DLLs and source are all available).
Umbraco Version 4.7.2
Umbraco Version 4.9.1
Umbraco Version 4.10.1
Umbraco Version 4.11.1

####### July 20 - 2013 Update. #######
On Friday I received a distress request to help a company whose site was down, and I spent my vacation day off helping them patch Umbraco version 4.11.10 with this fix. I reviewed the code done by Shannon at Umbraco HQ, and although his code was a HUGE improvement, it was not a fix for this issue.
So here is the patch bundle for Umbraco. The DLLs are built in release mode, but with pdb files.
Umbraco Version 4.11.10

####### August 2 - 2013 Update. #######
Be very careful with the latest 4.11.10 version of Umbraco when republishing the XML: there is a bug which causes the sortOrder property on the nodes to be set to "0" in the output/cache XML. I'm currently testing a new patch and will release it after some testing.
see: http://issues.umbraco.org/issue/U4-2527#comment=67-8808
A quick Google search for "umbraco sort order 0" reveals that it is an old bug that affects many areas of Umbraco: media, content, basically all objects that inherit from CMSNode.