<CharlieDigital/> Programming, Politics, and uhh…pineapples


SQL Query for Multi-Values In An Encoded String

Posted by Charles Chen

Consider a table with an textual column that encodes multi-values like so:


How can we query for all rows using an OR criteria?

For a single value, it's quite easy by searching for the string (in this case, an ISO 2 code).  But what if we need to search for the occurrence of one of n strings?

The following query achieves this using the Microsoft SQL XML data type, the nodes() function, and CROSS APPLY:

-- Create data source; this is just for demo purposes
    id int,
    contact nvarchar(100),
    countries nvarchar(100) 

-- Insert sample test data.
INSERT INTO @contacts VALUES (1, 'Charles', '1|US;2|CA;3|MX') -- US, CA, MX
INSERT INTO @contacts VALUES (2, 'Steven', '1|US;3|MX;2|CA') -- US, MX, CA
INSERT INTO @contacts VALUES (3, 'Arturo', '3|MX') -- MX
INSERT INTO @contacts VALUES (4, 'Silvia', '4|FR') -- FR
INSERT INTO @contacts VALUES (5, 'John', '2|CA;1|US') -- CA, US
INSERT INTO @contacts VALUES (5, 'Johan', '5|DE') -- DE

-- Query for all contacts in US OR MX OR CA (Charles, Steven, Arturo, John)
    DISTINCT T1.id,
        CAST('<a><c>' + REPLACE(countries, ';','</c><c>') + '</c></a>' AS XML) AS countriesXml
    FROM @contacts  
    ) AS T1
CROSS APPLY T1.countriesXml.nodes('/a/c') T2(c)
WHERE CAST(T2.c.query('string(.)') AS varchar(max)) IN ('1|US', '3|MX', '2|CA')

This should yield the values Charles, Steven, Arturo, and John by first converting the delimited values into XML by simply using string replacement.  Next, the XML is "shredded" using nodes().  For each base row, the shredding generates one row per node (for example, for Charles, we would have one row for US, one row for CA, and one row for MX).

Here is the result of the inner sub-select:

1	Charles	<a><c>1|US</c><c>2|CA</c><c>3|MX</c></a>
2	Steven	<a><c>1|US</c><c>3|MX</c><c>2|CA</c></a>
3	Arturo	<a><c>3|MX</c></a>
4	Silvia	<a><c>4|FR</c></a>
5	John	<a><c>2|CA</c><c>1|US</c></a>
5	Johan	<a><c>5|DE</c></a>

And here is the resultset after shredding:

1	Charles	1|US
1	Charles	2|CA
1	Charles	3|MX
2	Steven	1|US
2	Steven	3|MX
2	Steven	2|CA
3	Arturo	3|MX
4	Silvia	4|FR
5	John	2|CA
5	John	1|US
5	Johan	5|DE

You can see the intermediate resultset using this query:

		CAST('<a><c>' + REPLACE(countries, ';','</c><c>') + '</c></a>' AS XML) AS countriesXml
	FROM @contacts 	
	) AS T1
CROSS APPLY T1.countriesXml.nodes('/a/c') T2(c)

Finally, the DISTINCT clause collapses the resultset once again.

From my own testing, better performance can be achieved by creating a table variable with the target values and using a JOIN instead of an IN (about 1/5 of the time). For 100k records, using IN takes about 13.430s. Using a JOIN to a table variable takes about 2.293s.


Has SharePoint Jumped the Shark?

Posted by Charles Chen

I've worked with SharePoint now for some 7 years starting from SharePoint 2007 beta 2 and I've worked with portals and CMS  systems in some form or another since 2003.

Since 2006, most of my domain experience has been in life sciences, but I think that what I've observed applies to most other industries and verticals as well.

So the question is whether SharePoint has jumped the shark?  Is it the second coming of Lotus Notes?  Have businesses soured on it as an enterprise application platform?  Is it a failed vision?  Is it time for your organization to put it out to pasture?

One of my observations in the last two years or so is that increasingly, I have been called upon to defend the choice of building a solution of top of SharePoint (I've even heard audible snickers).  You can see the eyes rolling from certain folks when we mention that our solution at InnovoCommerce is built on top of SharePoint.  The skepticism and questions about scalability soon follow, but it's not unfounded: it's most often based on firsthand experience with the pains of owning and managing SharePoint.  As the saying goes: fool me once, shame on you; fool my twice, shame on me.

Indeed, I think that the popularity and rise of SharePoint has been its own worst enemy in a sense.  As adoption of SharePoint increased with 2007 and 2010 (in part, thanks to the incredible marketing of Microsoft that often oversold the ease and value), organizations rushed to roll it out and make it available to their users.  In many cases, proper governance and information architecture were either an afterthought or poorly thought out from the onset, leading to deployments which were a nightmare to manage and maintain as rogue site collections and sub-sites cropped up like toadstools after an early summer shower.

Now, after one or two upgrade cycles, and maybe half a decade of ownership, organizations are starting to sour on shelling out for another (multi-) million dollar project to upgrade to 2013.

Organizations eventually -- more or less -- run into the same key problems:

Scalability and Performance.

One of the most troublesome byproducts of poor information architecture and planning is a considerable degradation of performance.  SharePoint provides endless ways with which to shoot yourself in the foot with regards to building scalable, performant applications; it's a field of landmines that even experienced SharePoint architects have to navigate carefully on each project as understanding the scale and intended usage patterns of data is key to coming up with a plan to build a system that will scale.  The indiscriminate use of and reliance on item level permissions is a good example of this.  Failure to break up data into manageable scopes is another.  In the case of custom code, bad or lazy coding practices can also have severe consequences in terms of performance.

Fortunately, most performance issues can be managed -- if use cases are well designed and well understood -- through some thoughtful design up front to build an information architecture that will align with how SharePoint is built to scale.  On the custom code side, following guidelines and best practices along with code reviews can help prevent or alleviate many of the most common traps that developers fall into.  For this, experienced, battle-tested SharePoint architects and developers are worth their weight in gold (wink, wink)!

Of course, this feeds well into the next point...


Getting a large SharePoint architecture and deployment right is expensive (getting it wrong, even more so!).  From the cost of the hardware to the cost of the licenses of SharePoint (don't forget Windows Server, SQL Server, etc.) to the cost of the human resources required to plan the infrastructure, gather the requirements, design the architecture, manage the deployment, and offer on-going support.

It is a huge endeavor made all the more challenging when you factor in custom solutions that get built and installed on top of SharePoint and all that entails -- including risk.  Of course, many companies I've worked with have tried hard to minimize these costs through governance policies that severely limit the amount of customizations available -- often requiring several rounds of evaluation, validation, and approvals to get custom solutions installed.

While this does tend to help keep costs down from an ongoing maintenance perspective, I contend that it also severely limits the ROI and utility of SharePoint by crippling it.  Strip away the ability to build sophisticated business solutions on top of SharePoint, and you are left with a very expensive portal; it's like going to Chipotle and getting a rice bowl with only rice in it

In a way you could say that overzealous organizations addressed the cost of risk by squashing creativity and innovation in terms of mapping business processes into powerful solutions in SharePoint; IT simply didn't allow teams to come up with novel, creative, and innovative ways to extract ROI from SharePoint because of fear (I don't blame them!).  Oddly, by limiting SharePoint to basic ECM functionality, I believe that IT organizations simultaneously decreased their own value in terms of delivering real solutions to the business users and creating a justification for the investment in SharePoint -- everyone loses if you just treat SharePoint like a giant file share or a merely a portal.

Planned Obsolescence.

While I'm a huge advocate of SharePoint as an enterprise application platform (it's what I've been doing for years now) and not just a portal, working in the solution and product development side means that I also have an appreciation of the pain and costs associated with planned obsolescence.  You see, every three years or so, Microsoft releases a new version of SharePoint that will eventually "force" you to upgrade.  Perhaps it's a new feature of Office that can only be used with a certain version of SharePoint.  Or it's a critical fix or feature that addresses a key issue of your current deployment.  Or the product simply reaches end-of-life.  Or, from a product development perspective, you are simply forced to upgrade your codebase to capture or keep your customer base that is moving onto a new version.  It's expensive (I could imagine circumstances where migrations could cost as much or more than the original platform itself) and you are kind of forced to do it every few years.

It creates so much impedance and hand-wringing that some companies simply skip the migration and leave little zombie SharePoint deployments sitting around (our own document management portal at InnovoCommerce is still on SharePoint 2007, and we're a SharePoint shop with deep SharePoint experience).

From the product development side, it is a particularly challenging balancing act to decide how to allocate resources and time to upgrading your codebase while still developing capabilities and maintaining support for existing customers who may not move off of their current version of SharePoint for another two or three years (because of the aforementioned issue of cost and time).  When the fundamental architecture of the platform changes so drastically -- as it has with the shift from 2010 to 2013 -- this makes it all the more challenging (and expensive).

Analytics (Or Lack Thereof).

While SharePoint does offer many great features for building dashboards and KPIs, integration with SQL Server Reporting Services, and other reporting capabilities, I find it severely lacking from an analytics perspective.  A good example is that, for a platform that bills document management as one of the key features, you would think it would be simple to see a document access report that shows all documents, the number of times they were accessed, who accessed them, what actions each user took, etc.  You'd like to see it with some basic charts and visualizations that give you a good overview of the data (in an increasingly data driven and yet data saturated world).

Nope.  Instead, to get this type of basic document metrics, you have to build quite a complex solution (especially so given the barely sufficient audit log query APIs).

Another challenge from an analytics perspective is basically complexity.  For any significant deployment, it is a huge challenge to extract data from SharePoint to build sophisticated reports, especially if you follow best practices and segment data into separate site collections and databases not to mention the challenges and pitfalls of working directly against the raw SharePoint databases (practically a must if you plan on doing any serious analytics across your data).

What's the Future?

For many organizations, I can see SharePoint 2010 being their last dip into the SharePoint pool, at least as an IT owned and managed solution.  I believe that the experience that organizations had with owning 2007 and then migrating to 2010 has left a bad taste in the mouths of many IT organizations (not to mention the business sponsors).  Microsoft is making a strong push with Office 365, but many organizations are understandably reluctant to put their data into Microsoft's cloud (or any cloud, for that matter).  Other organizations might opt to migrate onto less costly, open source solutions like Alfresco, Liferay, or perhaps any number of cloud-based platforms including Google (one of our very large pharma customers has already moved their mail systems from Exchange and also use Google Docs).

It will be interesting to see how SharePoint 2013 and the future of SharePoint unfurls given much of the challenges organizations have had with 2007 and 2010.  In life sciences -- and I suspect with many other industries as well -- there has also been a large movement to downsize IT budgets and IT ownership of platforms, making ownership of SharePoint and delivering useful business solutions on top of it an increasingly challenging task.

While SharePoint 2013 is still plenty expensive when you factor in all of the licenses and expertise required to deploy and manage it, I think it addresses many of the key problems with regards to the deployment of custom solutions -- where I think the most business value is realized -- into SharePoint and is a huge leap forward from 2010 (a much bigger gap, in my opinion, than from 2007 to 2010).

At InnovoCommerce, we have yet to see any of our customers move in this direction (but then again, life sciences companies tend to move slowly anyways -- some of our customers are just going live with 2010) so it will be interesting to see how the landscape of enterprise portals and application platforms evolves over the next year or so.

We are increasingly mulling moving more and more of our codebase out of SharePoint as a mechanism for insulating against the cost of upgrade cycles as well as ensuring that our platforms can be deployed, managed, and integrated more readily and preparing ourselves for a future where our customers may be looking the other way when it comes to their enterprise portal and business application platform of choice.


Use ContentIterator to Read Items from Large SharePoint Lists

Posted by Charles Chen

New to me: ContentIterator for reading items from large lists.