Umbraco Examine – Indexing Large Sites – ThreadAbort Timeout Error Workaround

This post covers a problem with Umbraco Examine when indexing very large Umbraco sites. The core Examine team is looking into the issue, but if, like us, you can’t wait, below is the workaround we used.

As part of an Umbraco 4.7 upgrade (which ended up being a 4.7.1 upgrade for other reasons), we found that UmbracoExamine indexing timed out when initially creating the internal indexes for Members and Content. This initial indexing happens on application start, as the UmbracoEventManager constructor calls EnsureIndexesExist().
The issue is logged as a work item here: http://examine.codeplex.com/workitem/10324 and the good people at The Farm have already started working on a solution.
The current solution is to split the Lucene .add files into folders of 500 files each, which improves IO performance. Another very welcome addition is a new RebuildOnAppStart configuration setting in the ExamineSettings.config file:

<Examine RebuildOnAppStart="false">


We used this setting together with the Examine Index Admin package (http://our.umbraco.org/projects/backoffice-extensions/examine-index-admin), which lets you kick off the index sets one at a time and monitor their progress.

Some other potential issues:

  • In a large site you might get more than 9 batch folders in the queue folder; currently this breaks (bear in mind we’re using untested source code here). This will likely be fixed soon, but if you still hit it, you need to add an Int32.Parse to the ordering statements in the LuceneIndexer.cs file.
  • The processing does not index all the queued batch folders. Again, look at LuceneIndexer.cs: in ForceProcessQueueItems() you need to order the names using Int32.Parse and then convert the sorted result to a list before iterating.
  • The Examine Index Admin package may time out before generating all the .add files (in the batched folders), as this is not done asynchronously. To solve this we added the following to the package file at \usercontrols\packages\umbraco\ExamineIndexAdmin.ascx.cs (note this does not have to be compiled, as the .ascx uses the CodeFile attribute):

    private int _orgTimeOut;

    private void Page_Init(object sender, System.EventArgs e)
    {
        _orgTimeOut = Server.ScriptTimeout;
        Server.ScriptTimeout = 3600; // 1 hour (in seconds)
    }

    private void Page_PreRender(object sender, System.EventArgs e)
    {
        Server.ScriptTimeout = _orgTimeOut;
    }

    This is adapted from here: http://codebetter.com/petervanooijen/2006/06/15/timeout-of-an-asp-net-page/.
    I'm not sure it's a great solution, as it will affect all requests until the PreRender event reverts the value, but I wanted a solution that did not require us to modify the web.config.
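
The numeric ordering fix described in the first two bullets can be sketched roughly as below. This is not the actual Examine source: the class and method names are hypothetical, and it assumes the batch folder names are plain integers ("1", "2", ... "10"). Sorted as strings, "10" comes before "2", which is presumably why things go wrong once there are more than 9 folders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Examine.
public static class BatchFolderOrdering
{
    // Orders folder names by their integer value instead of as strings.
    // Assumes every name parses as an integer; Int32.Parse throws otherwise.
    public static List<string> OrderNumerically(IEnumerable<string> folderNames)
    {
        return folderNames
            .OrderBy(name => Int32.Parse(name))
            .ToList(); // materialise the sorted result before iterating over it
    }
}
```

For example, OrderNumerically(new[] { "10", "9", "1", "2" }) returns the names in the order 1, 2, 9, 10, whereas a plain string sort would give 1, 10, 2, 9. Calling ToList() means the loop that follows works on a fixed snapshot of the ordering rather than a lazily evaluated query.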

All in all, the above is a viable interim solution for what is likely to be a temporary problem in the Umbraco Examine project. Once these indexes have been generated, the day-to-day indexing is unaffected by the size of your content (or number of members) and works perfectly.
As mentioned, the Umbraco Examine folks at The Farm are already aware of these issues, so I'm sure they will have an updated release that fixes this properly very soon.
