<CharlieDigital/> Programming, Politics, and uhh…pineapples

5Dec/13Off

A Simple Way to Improve CAML Query Performance

There are many ways to improve the performance of your CAML queries, but I've recently found that in some cases, it's as easy as switching the order of your filter operations.

In this case, I was searching across a list of 1,000,000 items for a set of 41.

The list consists of tasks with, among other fields, a Status and Assigned To field.

Both of these fields were indexed, but the following query was still running in the 10 second range:

<Where>
    <And>
        <And>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Completed</Value>
            </Neq>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Terminated</Value>
            </Neq>
        </And>                                             
        <Eq>
            <FieldRef Name='AssignedTo' LookupId='TRUE' />
            <Value Type='Int'>159</Value>
        </Eq>
    </And>
</Where>

One small tweak and the same query ran in 1.5s:

<Where>
    <And>
        <Eq>
            <FieldRef Name='AssignedTo' LookupId='TRUE' />
            <Value Type='Int'>159</Value>
        </Eq>	
        <And>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Completed</Value>
            </Neq>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Terminated</Value>
            </Neq>
        </And>                                             
    </And>
</Where>

All that was done was to shift the order of the query conditions.

The first query reads as "All tasks that are not Completed and not Terminated and Assigned To user 159".

The second query reads as "All tasks that are Assigned To user 159 that are not Completed and not Terminated".

I didn't trace the generated SQL, but it's not hard to imagine that the SQL now performs an initial filter on the data set against the user ID and returns a much smaller data set for subsequent operations (filter by Status).

So the lesson learned is that for large lists, you need to follow Microsoft's guidance on large lists, but also ensure that your queries are written to take advantage of the indexes and reduce the data set as early as possible (preferably against an indexed field.).

Posted by Charles Chen