Dynamic SQL: Yea or Nay?
I’ve always been on the side of stored procedures in the classic debate over the merits of dynamic SQL. In reality, I can only think of one good scenario where dynamic SQL at the application layer should be used: programmatic batch inserts.
I won’t go into the performance debate, since there are tons of articles that already cover this area, but rather, I’d like to discuss the usability and development and architectural aspect of it.
In almost all other cases, it seems like the best choice is to have the application not generate dynamic SQL and use a stored procedure…always. There are certainly times when dynamic SQL is necessary, for example, when generating selects against a dynamic table structure, but in those cases, the variable portions of the query can be parameterized into the stored procedure and the procedure should generate the dynamic SQL.
Some would argue that if the underlying data models change, the application layer will usually be forced to change are ignoring other aspects of model changes that don’t necessitate application model changes. These include performance tuning, filtering by table JOINs and reuse of the data logic in nested stored procedures or functions.
When working with compiled code like .NET, the core issue is that fixing query errors involves a recompile and redeploy, which in most cases, is much more difficult than just fixing a completely disconnected (but not completely decoupled since there is a quasi-interface (the return result type and structure)) stored procedure.
For example, if a dataset today contains data from table A and tomorrow it needs to include data from table A and B (let’s say they both contain the same elements, but one is used for archives), it would be easy to update the procedure to UNION the results from the two datasets without affecting the application layer.
This isn’t the only scenario, for example, let’s say the requirement changes and now the data needs to be filtered by another table. It would be easy to add a new INNER JOIN to the query without affecting the application layer. Not only that, it also allows for the recombination of fields (for example a user name field today only needs to show first and last name, but tomorrow, it may need to show the middle initial as well – this change can be done at the database level and not affect the application or UI layers). It can also make it easy to change the underlying table structure so long as the return data isn’t expected to change: it provides a layer of decoupling between the application layer and the raw data storage.
In addition, having a stored procedure allows for easier testing of the data layer without the added overhead of having to execute the application runtime and walk through the debugger line by line just to figure out if the return data is correct; it is much more efficient to simply execute the query and simulate the use case to find if the data that is returned is correct. It becomes much easier and much less painful to simulate data access tests since they can be run, observed, and analyzed nearly instantly.
In larger organizations with dedicated DBAs, stored procedures have the added benefit of allowing SQL experts to add performance tuning to eek out extra performance without requiring the application to be rewritten or recompiled. Again, we see this decoupling of the application layer from the data layer. Of course you could always have templated SQL stored in XML files or something that would get rid of that recompile, but it is still likely to necessitate more redeployment if the application in question is distributed. This key point is not to be taken lightly since — as an example — an error in string formatting may require the replacement of binaries and services across dozens of servers. Not only that, testing in such a scenario still requires interaction with the application layer, adding to the possible failure points, time required, and general development pain.
My own conclusion is that using dynamic SQL (including LINQ) creates too tight of a coupling between the application layer binaries and the underlying data store; it’s great for RAD and testing, but in any application of significance (especially in highly distributed environments), dynamic SQL at the application layer seems like it’s a maintenance and testing disaster waiting to happen.
Nice post, Chuck! I am in no particular side and see pros and cons of both approaches. You have articulated excellently examples of situations when stored proc can be useful, but on the other hand, these things should be considered as well:
– Stored procs reduce reusability of data logic if it is to be shared among different database systems. I used to work on a project in which we had to maintain the data logic separately for SQL Server and Oracle just because stored procs were used. If we were to do differently, dynamic SQL (or a framework like Hibernate) might be a better idea.
– Regarding testing, from the unit testing perspective, I think it is easier to compile the tests and run the whole thing if those tests are in the binaries instead of in form of stored procedures. This is even more beneficial with the integration between unit test tools like JUnit/NUnit with CI build system and code coverage tools.
– Unless you put a lot of processing logic into the stored proc, then usually when the data logic change, the application layer will be affected in one way or another. In your scenario of the name field, I think a good design would not rely on the application logic, instead of the stored procedure, to determine the display form of the data. In other words, first name and last name fields are to be retrieved separately and stored in some objects or dataset – in that case, if middle name is introduced, the code has to be changed anyway. On the other hand, if such logic is to be put into the stored procs, the application logic is scattered among different layers and becomes difficult to test (#2), and much less portable (see #1). (And maybe the domain-driven design mindset makes me biased on this, but when I code stored procs, I don’t use those syntax that I consider more application than data logic like if/else, concatenation and so on.)
– Finally, it is easier to stay in one environment to do the development work instead of switching back and forth between various environments (or if there’s an DBA, having to come to him/her to request him/her to change even the most simplest things).
Of course, the point here is not to use anything over another, but about using the right tool for the right situation. For example if there is no constraint of multiple databases, the team is very comfortable with stored procs, you have excellent DBAs etc. then using stored procs can be a good idea.