Introduction
Code that looks harmless can often lead to negative consequences. Let’s dive into how a simple data copy with Entity Framework Core can create a performance bottleneck, why a stored procedure is sometimes a better solution, and what it has to do with team conventions.
I once had a curious discussion with a friend of mine (let’s call him Ben) about database operations. He challenged me to pinpoint the potential issues in the following code snippet:
public class GraphReplicator
{
    public void CreateCopy(Guid sourceId, Guid targetId)
    {
        IEnumerable<Node> sourceNodes = dbContext.Nodes
            .Where(node => node.StructureId == sourceId)
            .AsEnumerable();

        foreach (var node in sourceNodes)
        {
            Node newNode = CreateBasedOn(node, targetId);
            dbContext.Nodes.Add(newNode);
        }

        dbContext.SaveChanges();
    }
}
Listing 1
I deliberately kept the code synchronous for readability.
These few lines of code look simple and quite harmless. Is there a serious problem here? I don’t know how, by whom, and under what circumstances the CreateCopy
method is called, how often it is called, how many records are in the database, where the database is located relative to the application, whether it’s on a remote machine, how fast the network is, and so on. In short, challenge accepted! As Schrödinger’s cat would say, the problem both exists and doesn’t exist until we dig deeper.
Digging deeper
What does the CreateCopy
method do?
Using Entity Framework Core, the method gets some number of nodes from the database, creates copies of each, and saves them back to the database.
“gets” → “saves them back”.
The first thing that bothered me: two database queries, one to fetch the data and another to send the copies back. It’s no secret that network communication, database queries included, is among the most time-consuming things an application does.
Rule of thumb: The fewer network requests (of any kind), the better.
Impact: at least a 2x slowdown.
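By the way, you don’t have to take the number of queries on faith: EF Core can log every command it sends to the database. A minimal sketch, assuming SQL Server; the context class name and connection string are placeholders, not the project’s real ones (LogTo is available since EF Core 5):
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

// Hypothetical context name; plug this into whatever DbContext the project actually uses.
public class AppDbContext : DbContext
{
    public DbSet<Node> Nodes => Set<Node>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer("<connection string>")              // assuming SQL Server
            .LogTo(Console.WriteLine, LogLevel.Information);  // prints every SQL command EF Core sends
}
Run CreateCopy with this enabled and you should see the SELECT followed by the batched INSERT commands.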
“some number of”
Network communication is expensive, but things get worse when the volume of data increases. Since we don’t know how many entities are being copied, this could be a serious issue. What if sourceNodes
contains 1,000 nodes? Is it reasonable to transfer 1,000 nodes from the database over the network? What about 10,000? And remember—we then send their copies back to the database.
Rule of thumb: The smaller the volume of data transferred over the network, the better.
Impact: at least a 2x slowdown. (actually, for this case, we don’t need to transfer any data at all)
“creates copies”
Another issue: in-memory materialization. Fetching 1,000 nodes creates 1,000 objects in memory; copying them creates another 1,000. This increases pressure on the garbage collector, and we all remember that those objects eventually have to be cleaned up, right?
Rule of thumb: The fewer objects allocated on the heap, the less work the garbage collector has to do. (GC docs)
Impact: deterioration of overall system performance.
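This doesn’t remove the need to allocate the copies, but if the fetch is kept at all, one small mitigation is available: AsNoTracking() keeps the source nodes out of EF Core’s change tracker, since they are only read and never modified. A sketch of Listing 1’s query with that one change:
IEnumerable<Node> sourceNodes = dbContext.Nodes
    .AsNoTracking()                              // no change-tracking snapshots for entities we only read
    .Where(node => node.StructureId == sourceId)
    .AsEnumerable();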
Look around
What about the calling code? I managed to uncover some important details. Turns out the CreateCopy
method is just one of three similar steps in a broader copy process (let’s call it CopyStructure
). Besides Graph
, there are two related entities: Permission
and Attribute
, each with its own replicator. The calling code invokes each replicator’s CreateCopy
method sequentially:
void CopyStructure(Guid sourceId, Guid targetId)
{
    _graphReplicator.CreateCopy(sourceId, targetId);
    _permissionReplicator.CreateCopy(sourceId, targetId);
    _attributeReplicator.CreateCopy(sourceId, targetId);
}
Listing 2
They all follow the same algorithm:
- Fetch data from the database
- Create copies
- Save copies to the database
Diagram 1
Thus, the actions shown in Diagram 1 will be executed 3 times, which means 6 database queries!
Why this matters
You might think that transferring 10/100/1,000 (remember, I made these numbers up) objects, allocating 20/200/2,000 objects for GC, and making 6 database queries with delays of tens or hundreds of milliseconds is negligible compared to modern computing power. A single-user console app with a local database might not suffer much. But a multi-user web app with external integrations, background workers, and hundreds or thousands of users? All the issues above (and there may be more, I’ll share one more below) could signal systemic problems.
What I’m saying is: one simple method with a potential issue quickly turns into three methods. How long until we find more? What’s the chance there are dozens of similar spots? That this code is called thousands of times across the system? We build systems based on existing code, sometimes copying patterns from other parts or even other developers. And yes, sometimes we just “copy-paste.” We also have conventions for solving “typical” tasks. That’s why extrapolation here isn’t far-fetched.
Moreover, we often evaluate code in isolation, forgetting about, for example, multi-user and multi-threaded contexts. From that perspective, nothing prevents the CopyStructure
method from competing for resources (connections, network, database) even with itself. I don’t know the business logic behind this code, but I can imagine two, ten, or a hundred users triggering this copy process simultaneously.
In my opinion, evaluation must be done in the context of the entire system. Two queries here could mean thousands system-wide. “Extra” data transfers could become gigabytes per month. An extra 100 ms delay here could add up to seconds or even tens of seconds of user wait time (don’t be surprised, I’ve seen this many times). But hey, you can always add caching ;) Resources captured by “bad” code are resources that won’t be available to someone else, someone who might be doing something genuinely important.
Any code exists in the context of its application and is connected to dozens or hundreds of classes, methods, conditions, and system states. That’s why programming is hard.
The forgotten transaction
Question: Shouldn’t the code in listing 2 perform all three steps inside a single transaction? That would be logical. If _permissionReplicator
fails, shouldn’t we roll back the previous step? If yes, things get even worse.
void CopyStructure(Guid sourceId, Guid targetId)
{
    // TransactionScope comes from System.Transactions.
    using (TransactionScope scope = new())
    {
        try
        {
            _graphReplicator.CreateCopy(sourceId, targetId);
            _permissionReplicator.CreateCopy(sourceId, targetId);
            _attributeReplicator.CreateCopy(sourceId, targetId);
            scope.Complete();
        }
        catch
        {
            // handle
        }
    }
}
Listing 3
(Using Transactions in Entity Framework Core docs)
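A side note: TransactionScope is not the only way to get this behavior. If all three replicators share the same DbContext instance, EF Core’s own transaction API would work just as well. A sketch, not code from the real project:
void CopyStructure(Guid sourceId, Guid targetId)
{
    // Requires all three replicators to use this same _dbContext;
    // otherwise their SaveChanges calls won't join this transaction.
    using var transaction = _dbContext.Database.BeginTransaction();
    try
    {
        _graphReplicator.CreateCopy(sourceId, targetId);
        _permissionReplicator.CreateCopy(sourceId, targetId);
        _attributeReplicator.CreateCopy(sourceId, targetId);
        transaction.Commit();
    }
    catch
    {
        transaction.Rollback();
        throw;
    }
}
Either way, the problem described next does not go away.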
There is a great rule of thumb when working with a database: “A transaction should be as short-lived as possible”. Long transactions mean:
- Longer lock durations
- Higher chance of deadlocks
- Lower overall system performance
Waiting for network communication and data transfer between the application and database, in our example, significantly increases transaction duration.
Diagram 2 shows the resulting process. I’ve slightly reduced the number of arrows so the diagram wouldn’t look cluttered, but if you look closely, you’ll count 7 database queries (Copy Nodes: 2 + Copy Permissions: 2 + Copy Attributes: 2 + Commit).
Diagram 2
It’s hard to say definitively how bad it is.
Some systems are not very demanding in terms of performance. Perhaps this method is only intended to run once a week, with everyone perfectly fine waiting minutes or even hours for it to finish.
On the other hand, this could be a critical flow in a high-traffic e-commerce system where the database handles dozens or hundreds of queries per second. The method might end up being called very frequently. In many scenarios, that would be completely unacceptable, especially in a multi-threaded, multi-user environment.
Solution dilemma
And here’s where Ben trapped me (remember, he started this). I hate stored procedures. I can’t stand them at all. But the most fitting solution to all the problems described above would be to move the logic into a stored procedure. Here, the database is both the source and the destination for the data, and that data isn’t being transformed or enriched along the way. Why not just “ask” the database to perform the copy for us?
Fun fact
Have you ever heard of correspondence chess (check Wikipedia)? Two hundred years ago, people literally mailed each other paper letters describing their moves. For example, “Ng1-f3” means the knight moved from g1 to f3. That’s all the info the other player needed to make the same move on their own board.
Imagine how ridiculous it would be if they actually mailed the entire chessboard with all the pieces every time someone made a move. A little impractical, don’t you think?
Of course, there are several reasons for my “dislike” of stored procedures: they are difficult to debug, they reduce portability, they make the system harder to test, and SQL is not as expressive as “normal” languages. Business logic gets scattered between application code and stored procedure code, which is inconvenient and sometimes unsafe during development. Versioning and change control can also become problematic.
But despite all that, there are plenty of scenarios where stored procedures are a much better solution than the alternatives. For the example we’re discussing here, and for the conditions we’ve come up with in the process, a stored procedure would actually be an excellent choice!
Diagram 3
An example of stored procedure code:
CREATE PROCEDURE CopyStructure
    -- The ids are Guids on the application side, so UNIQUEIDENTIFIER here, not INT.
    @SourceStructureId UNIQUEIDENTIFIER,
    @TargetStructureId UNIQUEIDENTIFIER
AS
BEGIN
    BEGIN TRAN

    INSERT INTO Nodes (StructureId, SomeId1, SomeId2, SomeBool1, SomeString, Created1)
    SELECT @TargetStructureId, SomeId1, SomeId2, SomeBool1, SomeString, Created1
    FROM Nodes
    WHERE StructureId = @SourceStructureId

    INSERT INTO Permissions (StructureId, SomeId1, SomeId2, SomeBool1, SomeString, Created1)
    SELECT @TargetStructureId, SomeId1, SomeId2, SomeBool1, SomeString, Created1
    FROM Permissions
    WHERE StructureId = @SourceStructureId

    INSERT INTO Attributes (StructureId, SomeId1, SomeId2, SomeBool1, SomeString, Created1)
    SELECT @TargetStructureId, SomeId1, SomeId2, SomeBool1, SomeString, Created1
    FROM Attributes
    WHERE StructureId = @SourceStructureId

    COMMIT
END;
Listing 4
And the code to call it:
public void CallSpCopyStructure(Guid sourceId, Guid targetId)
{
    // ExecuteSql turns the interpolated values into SQL parameters,
    // so this is not vulnerable to SQL injection.
    _dbContext.Database.ExecuteSql($"EXEC CopyStructure {sourceId}, {targetId}");
}
Listing 5
What we gained:
- 1 database query instead of 7
- No data transfer between application and database (just a tiny SQL instruction and response).
- Significantly reduced transaction time, no unnecessary waiting, only useful actions inside the transaction.
- Reduced pressure on the garbage collector.
To be fair, this adds complexity to the system, but not much.
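One way to keep that complexity in check is to address the versioning concern I mentioned earlier: deploy the procedure through an EF Core migration, so its definition lives in source control next to the rest of the schema. A sketch (the migration name is mine, and the body is abbreviated):
using Microsoft.EntityFrameworkCore.Migrations;

public partial class AddCopyStructureProcedure : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // The full procedure body from Listing 4 goes here.
        migrationBuilder.Sql(@"
            CREATE PROCEDURE CopyStructure
                @SourceStructureId UNIQUEIDENTIFIER,
                @TargetStructureId UNIQUEIDENTIFIER
            AS
            BEGIN
                -- ...body from Listing 4...
            END");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.Sql("DROP PROCEDURE CopyStructure");
    }
}
The procedure is then created, changed, and dropped together with the rest of the schema changes.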
And that’s all. Yep, it’s really that simple!
Ben’s response to why they don’t do this in their project was expected: “We don’t use stored procedures. The DB should only store data. We use only Entity Framework Core and no raw SQL.” That’s their convention, which is understandable. Maybe their reasons overlap with mine for disliking stored procedures.
Conventions are a good thing. They define how a project evolves and help discipline developers. I sincerely believe that nothing is worse than having no conventions at all. However, we shouldn’t forget that the conditions under which these conventions were adopted can change.
For example, as your product grows, there may be new performance and resource consumption requirements. The number of users could skyrocket, or maybe at some point you realize that maintaining your servers is too expensive and decide to focus on optimization to save costs. In business, time is often a critical factor: you need to release your new product before your competitors do. It is logical to choose the technologies and approaches that yield the fastest results, but that doesn’t mean you should stay locked into that paradigm after release and success. It can be even worse if the conditions didn’t change but were assessed incorrectly from the start, and you simply picked a tool that was never the most appropriate one in the first place.
In any case, I’m not here to pass judgment. I just want to say: it is important to remain flexible, explore alternative solutions, and not be afraid to change the rules if they start harming your project, even if you don’t like the new rules. (You might want to read about the Semmelweis effect on Wikipedia.)
In general, it’s better to think less in terms of “like” and “dislike,” and more in terms of how useful a given tool or solution is for your project and whether it’s the best fit for the task.
Alternative solutions?
It’s important to clarify: there are almost no situations where a single solution is the only valid one, and the one I have proposed is no exception. For example, you don’t necessarily need a stored procedure: it is not the procedure itself that solves the problem, but the fact that we delegate the work to the database using SQL. We could just as well call _dbContext.Database.ExecuteSql
and pass the same SQL code that I placed inside the stored procedure. The effect would be exactly the same.
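For completeness, here is what that could look like, a sketch assuming SQL Server and the same made-up columns as in Listing 4 (ExecuteSql parameterizes the interpolated ids):
public void CopyStructure(Guid sourceId, Guid targetId)
{
    // The same set-based copy as Listing 4, just with the SQL kept in the application.
    _dbContext.Database.ExecuteSql($@"
        BEGIN TRAN

        INSERT INTO Nodes (StructureId, SomeId1, SomeId2, SomeBool1, SomeString, Created1)
        SELECT {targetId}, SomeId1, SomeId2, SomeBool1, SomeString, Created1
        FROM Nodes
        WHERE StructureId = {sourceId}

        INSERT INTO Permissions (StructureId, SomeId1, SomeId2, SomeBool1, SomeString, Created1)
        SELECT {targetId}, SomeId1, SomeId2, SomeBool1, SomeString, Created1
        FROM Permissions
        WHERE StructureId = {sourceId}

        INSERT INTO Attributes (StructureId, SomeId1, SomeId2, SomeBool1, SomeString, Created1)
        SELECT {targetId}, SomeId1, SomeId2, SomeBool1, SomeString, Created1
        FROM Attributes
        WHERE StructureId = {sourceId}

        COMMIT");
}
Whether the SQL lives in a procedure or in the application then becomes a question of taste and conventions, not of performance.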
Fun fact
Just for fun, I fed a draft of this article to several LLMs and asked them to critique it. Each rightly noted the lack of alternative approaches, such as:
- SqlBulkCopy
- Batch operations via raw SQL
- Dapper
- Temporary tables
- CQRS or event-driven patterns
Notably, the last two (especially the event-driven one) are like a tooth growing in your nose, completely irrelevant to the problem described. The first three should indeed be considered in similar scenarios, but for this particular case, they are not applicable, as they don’t address the root problem.
So, it’s an expected but ultimately useless answer, thank you AI.
“Being able to search for and evaluate all possible alternatives” is an incredibly valuable skill.
Conclusion
This entire article is based on guesses and assumptions, just like much of our work often is.
My goal was to demonstrate the need to “dig deeper”: to question the code, the tools we use, and the conventions within which development is conducted. Dogmatic adherence to rules can lead to negative consequences. Technical decisions should be made based on context, not principles or preferences. Conventions must remain flexible and adapt to changing conditions.
I work on systems where the issues I’ve pointed out are extremely important: if we don’t take these factors into account, the system may simply stop functioning. Your situation may be, and probably is, different. You work under unique conditions on a unique project. The success of the project is determined by your ability to find the optimal balance between different solutions.
Only you can decide whether the next optimization is worth the complexity it adds to the system. At the same time, it is important to consider whether neglecting that optimization in favor of certain conventions might end up ruining your system.