<CharlieDigital/> Programming, Politics, and uhh…pineapples

4Apr/14

FluentNHibernate and SQL Date Generation

Posted by Charles Chen

So you'd like your SQL entry to have a system generated date/time, eh?

Here is a sample table:

CREATE TABLE [dbo].[AuditLog] (
	Id int IDENTITY(1,1) PRIMARY KEY,
	EventUtc datetime2(7) DEFAULT(SYSUTCDATETIME()) NOT NULL,
	EventOffsetUtc datetimeoffset(7) DEFAULT(SYSDATETIMEOFFSET()) NOT NULL,
	EntityContextUid uniqueidentifier,
	EntityContextName nvarchar(256),
	EntityContextType varchar(128),
	UserLogin nvarchar(128),
	EventName varchar(128),
	AppContext varchar(64),
	EventData nvarchar(max)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

To spare you hours dealing with this error:

System.Data.SqlTypes.SqlTypeException:
   SqlDateTime overflow. Must be between
   1/1/1753 12:00:00 AM and 12/31/9999
   11:59:59 PM.

To fix it, use the following mapping for your date/time columns:

Map(a => a.EventUtc).Column("EventUtc")
	.CustomSqlType("datetime2(7)")
	.Not.Nullable()
	.Default("SYSUTCDATETIME()")
	.Generated.Insert();
Map(a => a.EventOffsetUtc).Column("EventOffsetUtc")
	.CustomSqlType("datetimeoffset(7)")
	.Not.Nullable()
	.Default("SYSDATETIMEOFFSET()")
	.Generated.Insert();
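
For context, the mapping above assumes an entity along these lines (a minimal sketch; the class shape just mirrors the mapping and the table above and is not code from the original project):

public class AuditLog
{
    public virtual int Id { get; protected set; }

    // Populated by SQL Server via the column defaults; because of
    // Generated.Insert(), NHibernate leaves these columns out of the INSERT
    // and reads the database-generated values back after the insert.
    public virtual DateTime EventUtc { get; protected set; }
    public virtual DateTimeOffset EventOffsetUtc { get; protected set; }

    public virtual Guid? EntityContextUid { get; set; }
    public virtual string EntityContextName { get; set; }
    public virtual string UserLogin { get; set; }
    public virtual string EventName { get; set; }
    public virtual string EventData { get; set; }
}

The key is the combination of CustomSqlType and Generated.Insert(): the uninitialized DateTime value on the entity is never written as part of the INSERT, which is what typically triggers the SqlDateTime overflow in the first place.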

Special thanks to this Stack Overflow thread.

2Apr/14

What Alan Watts Can Teach Us About Leadership

Posted by Charles Chen

I was listening to a talk by Alan Watts and found one bit of advice that really connected to what I've learned about leading others.

The principle is that any time you -- as it were -- voluntarily let up control, in other words, cease to cling to yourself, you have an excess of power, because you are wasting energy all the time in self-defense: trying to manage things, trying to force things to conform to your will.

The moment you stop doing that, that wasted energy is available.

Therefore, in that sense -- having that energy available -- you are one with the divine principle; you have the energy.

When you are trying, however, to act as if you were god -- that is to say, you don't trust anybody, you are the dictator, and you have to keep everybody in line -- you lose the divine energy, because what you are simply doing is defending yourself.

One mistake that I've been guilty of is trying to force things to conform to my will on various projects (I still do it to varying degrees!).  It is usually with the best of intentions -- for a cleaner framework, a better product, a more efficient process -- but at the same time, it is true that a lot of energy is wasted in doing so.

What is the alternative, then?

I think Watts is right: a level of trust has to exist that the team around you can help you achieve your project goals.  Instead of expending energy on controlling the members of the team, spend that energy on building trust through training, mentorship, guidance, and giving up not just control, but responsibility.

Sometimes that trust will be unwarranted, but sometimes, that trust will pay itself back many-fold.

26Dec/13

Setting Batch Variables Dynamically

Posted by Charles Chen

One of the challenges of managing builds and deployments of source across multiple developer machines is that it can be complicated to contain and manage the variations between developer environments.

For example, it is often useful to capture the server name or connection string information locally so that these machine-specific settings don't make it into source control.

How I've often tackled this is to add a batch file that every developer executes when getting the source for the first time.  This batch file asks for various settings and then saves the results to a text file which is then read back when executing other batch files.

Here is an example of such a batch file:

@ECHO OFF

ECHO ================================================================
ECHO Creates the developer specific local configuration for builds
ECHO that will allow developers to use local settings for specific
ECHO aspects of the deployment.
ECHO ================================================================

:SPURL
SET SPURL=
SET /P SPURL=Enter SharePoint site URL (ex. http://mymachine:2345/): %=%
IF "%SPURL%"=="" GOTO SPURL

ECHO ----------------------------------------------------------------

:SQLCONN
SET SQLCONN=
ECHO Enter SQL connection string to the IC membership database 
ECHO (ex. Server=VM1;Database=IPDB;User ID=membershipUser;Password=P@ssw0rd;Application Name=HeCoreServices):
SET /P SQLCONN=
IF "%SQLCONN%"=="" GOTO SQLCONN

ECHO ----------------------------------------------------------------

> build-configuration.txt ECHO SPURL=%SPURL%
>> build-configuration.txt ECHO SQLCONN=%SQLCONN%

ECHO Completed; created file build-configuration.txt

PAUSE

This batch file will prompt the developer for two settings: the URL of a site and the connection string that the developer is using locally (which can vary by the database name, login, etc.).  The contents get written to a file called build-configuration.txt that looks like this:

SPURL=http://mymachine:2345
SQLCONN=Server=VM1;Database=IPDB;User ID=membershipUser;Password=P@ssw0rd;Application Name=HeCoreServices

This file is excluded from source control and developers can, of course, manually edit this file as well to create local settings.

Now when I'm ready to use these settings in another batch file, I can read them back in like so:

@ECHO OFF

SETLOCAL ENABLEDELAYEDEXPANSION

FOR /F "tokens=*" %%n IN (build-configuration.txt) DO (	
	ECHO %%n

	SET %%n
)

ECHO %SPURL%
ECHO %SQLCONN%

PAUSE

There are other ways to do this as well, but the downside to most approaches is that you have to know how many parameters you have or use less meaningful names.  This approach will let you set variables to your heart's content and read them in dynamically at execution.

Filed under: Awesome, Dev, Self Note
5Dec/13

A Simple Way to Improve CAML Query Performance

Posted by Charles Chen

There are many ways to improve the performance of your CAML queries, but I've recently found that in some cases, it's as easy as switching the order of your filter operations.

In this case, I was searching across a list of 1,000,000 items for a set of 41.

The list consists of tasks with, among other fields, a Status and Assigned To field.

Both of these fields were indexed, but the following query was still running in the 10 second range:

<Where>
    <And>
        <And>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Completed</Value>
            </Neq>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Terminated</Value>
            </Neq>
        </And>                                             
        <Eq>
            <FieldRef Name='AssignedTo' LookupId='TRUE' />
            <Value Type='Int'>159</Value>
        </Eq>
    </And>
</Where>

One small tweak and the same query ran in 1.5s:

<Where>
    <And>
        <Eq>
            <FieldRef Name='AssignedTo' LookupId='TRUE' />
            <Value Type='Int'>159</Value>
        </Eq>	
        <And>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Completed</Value>
            </Neq>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Terminated</Value>
            </Neq>
        </And>                                             
    </And>
</Where>

All that was done was to shift the order of the query conditions.

The first query reads as "All tasks that are not Completed and not Terminated and Assigned To user 159".

The second query reads as "All tasks that are Assigned To user 159 that are not Completed and not Terminated".

I didn't trace the generated SQL, but it's not hard to imagine that the SQL now performs an initial filter on the data set against the user ID and returns a much smaller data set for subsequent operations (filter by Status).

So the lesson learned is that for large lists, you need to follow Microsoft's guidance, but you should also ensure that your queries take advantage of the indexes and reduce the data set as early as possible (preferably by filtering on an indexed field first).
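
For reference, here is roughly how the reordered query gets executed through the object model (a sketch only; the site URL and list name are made up, and this is just the standard SPQuery pattern rather than code from the original project):

using (SPSite site = new SPSite("http://server/sites/demo"))
using (SPWeb web = site.OpenWeb())
{
    SPList tasks = web.Lists["Tasks"];

    SPQuery query = new SPQuery();

    // Filter on the indexed AssignedTo field first so that the Status
    // filters operate on a much smaller intermediate result set.
    query.Query = @"
<Where>
    <And>
        <Eq>
            <FieldRef Name='AssignedTo' LookupId='TRUE' />
            <Value Type='Int'>159</Value>
        </Eq>
        <And>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Completed</Value>
            </Neq>
            <Neq>
                <FieldRef Name='Status' />
                <Value Type='Choice'>Terminated</Value>
            </Neq>
        </And>
    </And>
</Where>";

    SPListItemCollection items = tasks.GetItems(query);

    foreach (SPListItem item in items)
    {
        Console.Out.WriteLine(item.Title);
    }
}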

26Nov/13

Preventing the Garbage Collector From Ruining Your Day

Posted by Charles Chen

If you're working with ZeroMQ, you may run into an exception with the message "Context was terminated".

It turns out that this is due to the garbage collector cleaning up (or attempting to clean up?) the ZmqContext.

Found this out via this handy thread on Stack Overflow, but what about cases where you can't use a using statement?

For example, in a Windows Service, I create the context on the OnStart method and destroy the context on the OnStop method.

In this case, an alternative is to use the GC.KeepAlive(Object obj) method, which prevents the garbage collector from collecting the object until after the call to this method.  It seems counterintuitive, but the call is really a signal that the object must remain reachable up to that point; the garbage collector is then free to collect it at any point afterward.
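
Here is a rough sketch of the Windows Service case; the clrzmq-style ZmqContext calls and the worker wiring are assumptions on my part, not code from the actual project:

using System;
using System.ServiceProcess;
using System.Threading;
using ZeroMQ;

public class MessagingService : ServiceBase
{
    private ZmqContext _context;
    private Thread _worker;

    protected override void OnStart(string[] args)
    {
        _context = ZmqContext.Create();

        _worker = new Thread(() => Listen(_context)) { IsBackground = true };
        _worker.Start();
    }

    protected override void OnStop()
    {
        // ...signal the worker to stop and join it here...

        _context.Dispose();

        // Signals that _context must be treated as reachable up to this
        // point; the garbage collector is free to collect it any time
        // after this call.
        GC.KeepAlive(_context);
    }

    private static void Listen(ZmqContext context)
    {
        // ...create sockets from the context and process messages...
    }
}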

Filed under: .Net, Self Note, ZeroMQ
16Oct/13

Watch Out For SPListItemCollection.Count and Judicious Use of RowLimit

Posted by Charles Chen

This seemingly innocuous call can be quite dangerous when used incorrectly.

The reason is that this property invocation actually executes the query.

This is OK if you plan on iterating the results because the results are cached, but costly if you don't plan on iterating the results.  The following code sample can be used to test this effect for yourself:

static void Main(string[] args)
{
    using(SPSite site = new SPSite("http://internal.dev.com/sites/oncology"))
    using (SPWeb web = site.OpenWeb())
    {
        SPList list = web.Lists.TryGetList("General Tasks");

        SPQuery query = new SPQuery();
        query.RowLimit = 1;
        query.Query = @"
<Where>
<Contains>
<FieldRef Name='Title'/>
<Value Type='Text'>500KB_1x100_Type_I_R1</Value>
</Contains>
</Where>";
        query.QueryThrottleMode = SPQueryThrottleOption.Override;

        SPListItemCollection items = list.GetItems(query);

        Stopwatch timer = new Stopwatch();

        timer.Start();

        Console.Out.WriteLine("{0} items match the criteria.", items.Count);

        var timeForCount = timer.ElapsedMilliseconds;

        Console.Out.WriteLine("{0} milliseconds elapsed for count.", timer.ElapsedMilliseconds);

        foreach (var i in items)
        {
            Console.Out.WriteLine("{0} milliseconds elapsed for start of iteration.", timer.ElapsedMilliseconds - timeForCount);

            break;
        }
    }
}

(And of course, you can check the implementation of Count in Reflector or dotPeek)

You will see that the start of iteration will be very fast once you've invoked Count once.

Now here is where it gets interesting:

  1. The total time it takes to execute the query is longer when invoking Count versus just iterating (~3200ms vs. ~3000ms, about 5-10% in my tests).
  2. When I set the RowLimit to 1, I can reduce the time by roughly 40-50% (~1600ms vs ~3200ms for a resultset of 230 out of a list of 150,000 items).

Try it yourself by commenting and uncommenting the RowLimit line and commenting and uncommenting the line that invokes Count.

What does this mean for you?  Well, if you don't need the count, then don't use it; it's slower than just iterating the results.  If you plan on iterating the results anyways, don't invoke Count.  And if you do need the count, you are better off keeping a counter yourself during the iteration.

And in a use case where you don't plan on iterating the result set (for example, checking to see if there is at least one occurrence of a type of object), be sure to set the RowLimit in your query!
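
For example, here is a minimal sketch of such an existence check (the method and parameter names are just illustrative):

static bool HasMatchingItem(SPList list, string camlWhere)
{
    SPQuery query = new SPQuery();
    query.Query = camlWhere;

    // Only one row is needed to answer "is there at least one match?"
    query.RowLimit = 1;

    SPListItemCollection items = list.GetItems(query);

    foreach (SPListItem item in items)
    {
        // Iterating executes the query; returning here avoids ever
        // invoking Count.
        return true;
    }

    return false;
}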

26Jul/13

SQL Query for Multi-Values In An Encoded String

Posted by Charles Chen

Consider a table with a textual column that encodes multiple values like so:

1|US;2|CA;3|MX

How can we query for all rows where the column contains any one of several values (an OR criteria)?

For a single value, it's quite easy: just search for the string (in this case, an ISO 2 country code).  But what if we need to search for the occurrence of any one of n strings?

The following query achieves this using the Microsoft SQL XML data type, the nodes() function, and CROSS APPLY:

-- Create data source; this is just for demo purposes
DECLARE @contacts TABLE
(
    id int,
    contact nvarchar(100),
    countries nvarchar(100) 
)

-- Insert sample test data.
INSERT INTO @contacts VALUES (1, 'Charles', '1|US;2|CA;3|MX') -- US, CA, MX
INSERT INTO @contacts VALUES (2, 'Steven', '1|US;3|MX;2|CA') -- US, MX, CA
INSERT INTO @contacts VALUES (3, 'Arturo', '3|MX') -- MX
INSERT INTO @contacts VALUES (4, 'Silvia', '4|FR') -- FR
INSERT INTO @contacts VALUES (5, 'John', '2|CA;1|US') -- CA, US
INSERT INTO @contacts VALUES (6, 'Johan', '5|DE') -- DE

-- Query for all contacts in US OR MX OR CA (Charles, Steven, Arturo, John)
SELECT
    DISTINCT T1.id,
    T1.contact
FROM (
    SELECT 
        id,
        contact,        
        CAST('<a><c>' + REPLACE(countries, ';','</c><c>') + '</c></a>' AS XML) AS countriesXml
    FROM @contacts  
    ) AS T1
CROSS APPLY T1.countriesXml.nodes('/a/c') T2(c)
WHERE CAST(T2.c.query('string(.)') AS varchar(max)) IN ('1|US', '3|MX', '2|CA')

This should yield Charles, Steven, Arturo, and John.  The query first converts the delimited values into XML using simple string replacement.  Next, the XML is "shredded" using nodes(); for each base row, the shredding generates one row per node (for example, for Charles, we would have one row for US, one row for CA, and one row for MX).

Here is the result of the inner sub-select:

1	Charles	<a><c>1|US</c><c>2|CA</c><c>3|MX</c></a>
2	Steven	<a><c>1|US</c><c>3|MX</c><c>2|CA</c></a>
3	Arturo	<a><c>3|MX</c></a>
4	Silvia	<a><c>4|FR</c></a>
5	John	<a><c>2|CA</c><c>1|US</c></a>
6	Johan	<a><c>5|DE</c></a>

And here is the resultset after shredding:

1	Charles	1|US
1	Charles	2|CA
1	Charles	3|MX
2	Steven	1|US
2	Steven	3|MX
2	Steven	2|CA
3	Arturo	3|MX
4	Silvia	4|FR
5	John	2|CA
5	John	1|US
6	Johan	5|DE

You can see the intermediate resultset using this query:

SELECT
	T1.id,
	T1.contact,
	T2.c.query('string(.)')
FROM (
	SELECT 
		id,
		contact,		
		CAST('<a><c>' + REPLACE(countries, ';','</c><c>') + '</c></a>' AS XML) AS countriesXml
	FROM @contacts 	
	) AS T1
CROSS APPLY T1.countriesXml.nodes('/a/c') T2(c)

Finally, the DISTINCT clause collapses the resultset once again.

From my own testing, better performance can be achieved by creating a table variable with the target values and using a JOIN instead of an IN (about 1/5 of the time). For 100k records, using IN takes about 13.430s. Using a JOIN to a table variable takes about 2.293s.
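
Here is a sketch of that variant (the @targets table variable name is my own); it drops into the same demo setup as above:

-- Put the target values in a table variable...
DECLARE @targets TABLE (code varchar(20))
INSERT INTO @targets VALUES ('1|US'), ('3|MX'), ('2|CA')

-- ...and JOIN against the shredded rows instead of using IN.
SELECT
    DISTINCT T1.id,
    T1.contact
FROM (
    SELECT 
        id,
        contact,        
        CAST('<a><c>' + REPLACE(countries, ';','</c><c>') + '</c></a>' AS XML) AS countriesXml
    FROM @contacts  
    ) AS T1
CROSS APPLY T1.countriesXml.nodes('/a/c') T2(c)
INNER JOIN @targets T3
    ON CAST(T2.c.query('string(.)') AS varchar(max)) = T3.code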

11Jul/13

Use ContentIterator to Read Items from Large SharePoint Lists

Posted by Charles Chen

New to me: ContentIterator for reading items from large lists.
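
I haven't dug deeply into it yet, but usage looks roughly like this (a sketch based on my reading of the API in Microsoft.Office.Server.Utilities; verify the exact overloads and ordering constants for your SharePoint version):

using (SPSite site = new SPSite("http://server/sites/demo"))
using (SPWeb web = site.OpenWeb())
{
    SPList tasks = web.Lists["General Tasks"];

    SPQuery query = new SPQuery();

    // Ordering by an indexed field lets ContentIterator page through the
    // list in chunks instead of pulling everything in one throttled query.
    query.Query =
        "<Where><Neq><FieldRef Name='Status' /><Value Type='Choice'>Completed</Value></Neq></Where>"
        + ContentIterator.ItemEnumerationOrderByNVPField;

    ContentIterator iterator = new ContentIterator();

    iterator.ProcessListItems(
        tasks,
        query,
        item =>
        {
            // Process each item; items are delivered batch by batch.
            Console.Out.WriteLine(item.Title);
        },
        (item, e) =>
        {
            // Error callout; returning false here swallows the error and
            // continues (check the semantics for your version).
            return false;
        });
}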

3Jun/13

Understanding Billing for Amazon EBS Provisioned IOPs

Posted by Charles Chen

I've been experimenting with Amazon EC2 and SharePoint to better understand how an enterprise architecture would be built on top of it as a platform and, as a part of this exercise, better understand the cost structure.

One key number that escaped me during my initial review of the pricing structure is the one highlighted below:

[Screenshot: EBS pricing table with the $0.10 per provisioned IOPS-month line item highlighted]

Now before we get into the details of what to watch for here, I should preface this by saying that for use cases that demand high-performance disk I/O, provisioned IOPS is probably well worth it and, in fact, I definitely foresee us using it in select scenarios (e.g. a high-I/O database server instance).

In the screen cap below, you can see that I've got three provisioned IOPS volumes configured with 2000, 2000, and 1000 IOPS respectively (note that maximum IOPS on a volume is a multiple of the size of the volume):

[Screenshot: three EBS volumes provisioned with 2000, 2000, and 1000 IOPS]

The question is what does "$0.10 per provisioned IOPS-month" actually mean?

Well, here's the skinny from the Amazon web site:

For example, if you provision a volume with 1000 IOPS, and keep this volume for 15 days in a 30 day month, then in the Virginia Region, you would be charged $50 for the IOPS that you provision ($0.10 per provisioned IOPS-Month * 1000 IOPS Provisioned * 15 days/30).

So in my case, for a 30 day month, it would work out to be (2000 + 2000 + 1000) * 0.10 * 1 = $500/mo. for the provisioned IOPS volumes.

This is something to keep your eye on: if you are only using EC2 instances with provisioned IOPS volumes sporadically, you may want to take snapshots or create an AMI and discard the volumes when not in use.  Comparatively speaking, snapshot storage is much friendlier on the wallet when your system isn't under active use.

4Dec/12

Thoughts on Successful Team Management

Posted by Charles Chen

Reflecting on the past year and a half, I've come to some conclusions on how development teams can be successful in delivering software.  I don't think any of them are major or revelatory, but I think each team and each project has a different take on the same ideas.  Here are mine:

Iterate, Quickly

During our active build phase, we release a new build every week and each build is functional software that incrementally incorporates requirements and defect fixes from previous builds.  This has given us the benefit of allowing our customer to preview the software and help us identify flaws or areas requiring additional clarity in the requirements continuously and early in the process.

Automate

It might sound insane, but it is possible to release weekly builds because our solution incorporates a heavy dose of automation where it counts, on many levels.

  • We've removed raw SQL from the equation, relying purely on FluentNHibernate and NHibernate to automatically generate our schema
  • We've invested in building tools to automate the export and re-import of configuration and data, allowing us to easily and quickly reset our development environments entirely with standard test data (bonus: the same tool allows us to move configuration and data from environment to environment)
  • We've invested in idiot-proofing our installs so that they are boiled down to a few scripts
  • We've built automated build scripts that package everything up neatly and even FTPs a build to our integration and test environments
  • Our domain data model is 90% generated automatically from content schemas (SharePoint content types) which we have to create anyways.

Because of the automation, tasks which would otherwise be expensive are cheap to execute.

It also cuts down on mistakes and missed steps.

Meet Infrequently

Our team is 100% geographically dispersed with developers and team members in Vietnam, Mexico, Virginia, New Jersey and California.

But relatively speaking, we meet very infrequently.  We hold two development team meetings a week: one at the start of the week -- our "A" meeting -- and one towards the end of the week -- our "B" meeting.  We use the "A" meeting to discuss our deliverables for the week and the "B" meeting to discuss the outcome of our weekly sprint walkthroughs, any adjustments that need to be made, and so on.

We also use these sessions as show-and-tell to let everyone see the progress and changes being made by different team members as well as to inform of upcoming breaking changes and mitigating actions required downstream.

Otherwise, developers are encouraged to have long spans of uninterrupted work time instead of constantly being pulled into meetings.  One-on-one sessions and communications occur as necessary, but this recipe has been very successful in minimizing the amount of time the team spends huddled up in conference calls and gives everyone more time to solve problems.

Meet with a Purpose

Every meeting should have an agenda and an outcome (an action, a decision, or an issue for followup).  Demand a bullet-listed agenda when someone sends you a meeting request and provide one if you need to schedule a meeting.  Ensure that the goal and outcome of the meeting are clear to all parties, and schedule new meetings for items that are not on the agenda or that do not contribute to the outcome.

Additionally, create a record of every meeting.  Who attended?  What was covered?  What was not covered?  What was left open?  What are the action items?  Ensure that this record is easily accessible (a wiki or forum system is perfect for recording these details) and email a copy to all participants and other relevant parties to ensure that everyone has the same understanding of what was just discussed.  This basic task can often clear up misunderstandings before they become systemic issues.  I take the burden on myself to record and follow up on the major bullet points from each meeting, and it has saved my butt many times when following up with customers.

This is the simple art of not running a terrible meeting.

Lead by Example

A bit of career advice for those with a passion for software development: never remove yourself from the process of creation.

I have watched the career ladder move individuals up and up, further and further from the pure act of creation that is software development.  For those of us who feel invigorated when we solve a difficult programming task, for those of us who feel a great rush of exhilaration when a machinery of thousands of lines of code executes in harmony, it is our burden to tinker, to learn, and to create.

When you "ascend" from developer to architect or team lead or such, never leave your passion for creation behind; authority flows naturally from understanding, knowledge, and mastery -- not just a title.

I was inspired to reflect on this by an interview with James Dyson in Wired:

Wired: Now that Dyson is a sprawling, multinational corporation, how do you keep the spirit of innovation alive?

Dyson: We try to make the corporation like the garage. We don’t have technicians; our engineers and scientists actually go and build their own prototypes and test the rigs themselves. And the reason we do that—and I don’t force people to do that, by the way, they want to do it—is that when you’re building the prototype, you start to really understand how it’s made and what it might do and where its weaknesses might be. If you merely hand a drawing to somebody and say, “Would you make this, please?” and in two weeks he comes back with it and you hand it to someone else who does the test, you’re not experiencing it. You’re not understanding it. You’re not feeling it. Our engineers and scientists love doing that.

As a team lead, never be just a middleman between the developers and the requirements; be an active participant in the process.  Work on the hard problems.  Understand the creation process, understand the challenges of the team from the bottom up, and build your authority from your ability to innovate and solve problems.

If you watch shows like Iron Chef or Chopped, every one of the judges and every one of the Iron Chefs can be considered the vanguard of their craft and it is from there that their authority flows.  You would not watch Iron Chef if all the Iron Chefs did was design the menu and then watch their team cook.  You would not trust the judges on Chopped if they weren't great chefs in their own right that understood the ingredients, the techniques, and the skill required to pull off a dish.

The better you understand the system, the better you understand your team, the more effective you will be in understanding the root cause of an issue and how to route defects within the team.

Push Your Team Incrementally

As a young developer, I always found great satisfaction in solving new problems and new challenges.  I think it's important that throughout the process, you push your team members and give them tasks that challenge their knowledge and abilities to push them just a little bit.

Of course, there will be plenty of mundane code and defect fixing, but don't box in your team members intellectually.  Understand their capabilities individually and push them to try things that are just beyond their current level of capability, understanding, and comfort zone.  This will keep them engaged and improve their skills to boot.

Invest in Code Quality, Especially Early On

It's a lot easier to write good code earlier on in the process than it is to come in and try to refactor later on.  Additionally, code written early on tends to be reused more often and patterns and solutions are copied by other developers later on.  So early in the process, it is important to keep code quality in mind and correct bad behaviors since the most influential code will be written earlier in the process rather than later.

What this means is that detailed code reviews are far more important at the beginning than at any other time in the project.  If you can correct bad programming practices or show a developer a better, more modular way of writing a piece of functionality early on, she will carry that knowledge throughout the project.

We rarely do code reviews now (1 year in) as I focused heavily on working one-on-one with developers as they joined the team to ensure that the code was up to standards.  I frequently rejected code and asked developers to rewrite whole pieces of functionality to help them understand why one design was better than another.

Put Your Best Developers on Your Least "Visible" Code

What this boils down to is the framework-level components.  Your best developers should be working on the deepest innards of the system that power what the rest of the developers do at the presentation and business logic layers.  This code will be the most invoked and the most reused, so it is important that it is built right by your most capable people.

Do not waste your best developers' time with defect fixes (unless there is sufficient bandwidth), even if they can fix defects better than anyone else on the team, because it will throw off the balance of the team: your more junior developers might not be able to fix a low-level defect as quickly, but there are many design issues and higher-priority problems that they cannot solve effectively yet -- and that is where your best developers' time belongs.

Document, Document, Document

Early on in our process, I had to decide between a wiki system or a Word document for our documentation.  Because of the fast, iterative nature of the project, I decided to use a wiki/forum system as it was more flexible and -- in a sense -- more "visible".

While our formal documentation is trailing, it is easy to assemble it from our weekly sprint guides which document every new feature with screenshots, details, and examples.

But at any given time, our customer and our delivery partner can load up the wiki and see exactly when we delivered a feature, how to install and configure it, how to use it, and so on.  By putting it all out there in lock-step with the weekly delivery, it is easy to keep the entire team aware of the status and progress of the project, and it allows the test teams to progress iteratively so that by the end of the project, most features have been tested for weeks.

Midway through the project, we moved from "status"-focused meetings to "demo"-focused meetings where we would do a weekly writeup and walkthrough of what changed, what was added, and what was fixed.  This also allowed for open forums where the test and business teams could ask questions and get clarifications.

This approach has allowed the customer to see progress continuously, and the customer will never be surprised at the end of the project because they will have seen the progress and documentation updates on a weekly basis.


So far, we have done well with these basic guiding principles.

I'm sure I will revise and add to my own lessons learned as the project continues, but I think that this is a good starting point!