<CharlieDigital/> Programming, Politics, and uhh…pineapples

6Sep/13Off

SharePoint, Large Lists, Content Iterator, and an Alternative

When retrieving large datasets from large SharePoint lists, SharePoint 2010 provides a new class called ContentIterator (CI) that is supposed to help make large dataset retrievals from large lists possible.

There's a bunch of great documentation on the web regarding this class, but one interesting observation that I've made is that it seems that it limits your query to one field only.  This means that in your query's where clause, you can only include one field, even when used with the order clause generated by ContentIterator.ItemEnumerationOrderByNVPField.

I tested with a list containing over 22,000 items with the default thresholds:

list-settings

And randomly generated data like so:

random-data

It turns out that if I use more than one field in the query, even with an index on each field in the query and using SPQueryThrottleOption.Override, the CI will fail the query with a threshold error.

What's one to do if you need to get all of the items in a list?

It seems that you should be able to just simply write a loop that executes the query and retrieves data, page-by-page, until you reach the end of the set.  So I rigged up the code myself:

/// <summary>
///     Executes a query and returns the result in batches.
/// </summary>
public class BatchQueryExector
{
    private SPQuery _query;

    private BatchQueryExector() {}

    /// <summary>
    ///     Creates an instance of the executor against the specified query.
    /// </summary>
    /// <param name="query">The query to execute.</param>
    /// <returns>The instance of the executor.</returns>
    public static BatchQueryExector WithQuery(SPQuery query)
    {
        BatchQueryExector executor = new BatchQueryExector();

        executor._query = query;

        return executor;
    }

    /// <summary>
    ///     Specifies the list the query will be executed over.
    /// </summary>
    /// <param name="list">The SharePoint list that contains the data.</param>
    /// <returns>An instance of <c>ExecutionContext</c>.</returns>
    public ExecutionContext OverList(SPList list)
    {
        return new ExecutionContext(_query, list);
    }

    /// <summary>
    ///     Inner class used to encapsulate the execution logic.
    /// </summary>
    public class ExecutionContext
    {
        private readonly SPList _list;
        private readonly SPQuery _query;

        /// <summary>
        ///     Creates a new instance of the context.
        /// </summary>
        /// <param name="query">The query to execute.</param>
        /// <param name="list">The SharePoint list that contains the data.</param>
        public ExecutionContext(SPQuery query, SPList list)
        {
            _query = query;
            _list = list;
        }

        /// <summary>
        ///     Retrieves the items in the list in batches based on the <c>RowLimit</c> and 
        ///     invokes the handler for each item.
        /// </summary>
        /// <param name="handler">A method which is invoked for each item.</param>
        public void GetItems(Action<SPListItem> handler)
        {
            string pagingToken = string.Empty;

            while (true)
            {
                _query.ListItemCollectionPosition = new SPListItemCollectionPosition(pagingToken);

                SPListItemCollection results = _list.GetItems(_query);

                foreach (SPListItem item in results)
                {
                    handler(item);
                }

                if (results.ListItemCollectionPosition == null)
                {
                    break; // EXIT; no more pages.
                }

                pagingToken = results.ListItemCollectionPosition.PagingInfo;
            }
        }
    }
}

This can be invoked like so:

internal class Program
{
    private static void Main(string[] args)
    {
        Program program = new Program();

        program.Run();
    }

    private void Run()
    {
        using (SPSite site = new SPSite("http://internal.dev.com"))
        using (SPWeb web = site.OpenWeb())
        {
            SPList list = web.Lists.TryGetList("TestPaging");

            SPQuery query = new SPQuery();
            query.Query = @"
<Where>        
        <And>
            <Eq>
                <FieldRef Name=""TstProgram"" />
                <Value Type=""Text"">Program 1</Value>
            </Eq>
            <Eq>
                <FieldRef Name=""TstDocumentType"" />
                <Value Type=""Text"">15 Day SUSAR</Value>
            </Eq>
        </And>      
</Where>";
            query.RowLimit = 100; // Effective batch size.
            query.QueryThrottleMode = SPQueryThrottleOption.Override;

            query.Query += ContentIterator.ItemEnumerationOrderByNVPField;

            Stopwatch stopwatch = new Stopwatch();

            stopwatch.Start();

            int count = 0;

            BatchQueryExector.WithQuery(query).OverList(list).GetItems(i => { count++; });

            stopwatch.Stop();

            Console.Out.WriteLine("{0}ms, {1} items", stopwatch.ElapsedMilliseconds, count);
        }
    }
}

It turns out that this works just fine provided that at least one of the columns in your query has an index. I tested with indices on all columns, on two columns, on one column, and on no columns.  With no indices, this query will fail as well (must have at least one index on one of the columns in your query in my testing, but my guess is that you will need more as the number of items increases).  With one to three indices, it made no difference in performance.  In fact, it got a little slower with three indices.

The batch size also had an impact.  Larger batch sizes were more efficient, which makes sense given that another database roundtrip is made for each batch.  For 32,000 items (I added more to test), a batch of 1000 (seems to be a sweet spot in my testing) completed in 2487ms for 7902 items.  A batch of 500 completed in 2550ms.  A batch of 100 completed in 3116ms.

It doesn't have the fancy bits of the CI, but it will work with multiple columns in your where clause.

To test for yourself, you can download this handy-dandy, multi-threaded list filler

Posted by Charles Chen

Filed under: SharePoint Comments Off