Handling concurrency – Aggregate Pattern and EF Core

Introduction

In this post I would like to discuss a frequently overlooked, though in some circumstances very important, topic – concurrency handling in the context of protecting the so-called Domain Invariants.

In other words, the question is the following: how can we ensure that, even in a multi-threaded environment, we always guarantee immediate consistency of our business rules?

It’s best to describe this problem with an example…

Problem

Let’s assume that our domain has the concepts of an Order and an Order Line. The domain expert said that:

One Order can have a maximum of 5 Order Lines.

In addition, he said that this rule must hold at all times (without exception). We can model it in the following way:

[Figure: Conceptual Model]

The Aggregate

One of the main goals of our system is to enforce business rules. One way to do this is to use the Domain-Driven Design tactical building block – an Aggregate. The Aggregate is a concept created to enforce business rules (invariants). Its implementation may vary depending on the paradigm we use, but in object-oriented programming it is a graph of objects, as Martin Fowler describes it:

A DDD aggregate is a cluster of domain objects that can be treated as a single unit.

And Eric Evans describes it this way in the DDD Reference:

Use the same aggregate boundaries to govern transactions and distribution. Within an aggregate boundary, apply consistency rules synchronously

Back to the example. How can we ensure that our Order will never exceed 5 Order Lines? We must have an object that takes care of it. To do this, it must have information about the current number of Order Lines. So it must have state, and based on this state its responsibility is to decide whether the invariant is broken or not.

In this case, the Order seems to be the perfect object for this. It will become the root of our Aggregate, which will contain the Order Lines. It will be its responsibility to add an Order Line and to check that the invariant has not been broken.

[Figure: Order Aggregate]

Let’s see what the implementation of such a construct may look like:
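The full code is in the sample repository linked at the end of this post. A minimal sketch of the Aggregate Root, with simplified and purely illustrative names, could look like this:

```csharp
using System;
using System.Collections.Generic;

// Minimal sketch of the Order Aggregate Root (illustrative, simplified names).
public class Order
{
    private const int MaxOrderLinesCount = 5;

    private readonly List<OrderLine> _orderLines = new List<OrderLine>();

    public Guid Id { get; private set; } = Guid.NewGuid();

    public void AddOrderLine(Guid productId, int quantity)
    {
        // The business rule is checked inside the Aggregate, based on its current state.
        if (_orderLines.Count >= MaxOrderLinesCount)
        {
            throw new InvalidOperationException(
                "An Order cannot have more than 5 Order Lines.");
        }

        _orderLines.Add(new OrderLine(productId, quantity));
    }
}

// Order Lines are reachable only through the Order Aggregate Root.
public class OrderLine
{
    public Guid ProductId { get; }
    public int Quantity { get; }

    internal OrderLine(Guid productId, int quantity)
    {
        ProductId = productId;
        Quantity = quantity;
    }
}
```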

Everything is fine so far, but this is only a static view of our model. Let’s see what the typical system flow is when the invariant is not broken and when it is broken:

[Figure: Process of adding an Order Line – success]
[Figure: Process of adding an Order Line – rule broken]

A simplified implementation is shown below:
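For example, the application-layer flow from the diagrams could be sketched as follows (the OrdersContext and handler names are assumptions for illustration, not the repository's exact code; the Order class comes from the previous sketch):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Assumed EF Core DbContext for the examples in this post.
public class OrdersContext : DbContext
{
    public OrdersContext(DbContextOptions<OrdersContext> options) : base(options) { }

    public DbSet<Order> Orders { get; set; }
}

// Sketch of the flow from the diagrams: load the Aggregate, execute the
// business operation, persist the whole Aggregate.
public class AddOrderLineHandler
{
    private readonly OrdersContext _context;

    public AddOrderLineHandler(OrdersContext context)
    {
        _context = context;
    }

    public async Task Handle(Guid orderId, Guid productId, int quantity)
    {
        // Load the whole Aggregate (Order Lines are assumed to be mapped so
        // that they are loaded together with the Order).
        var order = await _context.Orders.SingleAsync(o => o.Id == orderId);

        order.AddOrderLine(productId, quantity); // the invariant is checked here

        await _context.SaveChangesAsync();       // persist the whole Aggregate
    }
}
```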

Concurrency problem

Everything works nicely as long as we operate in a lightly loaded environment. However, let’s see what can happen when two threads want to perform our operation at almost the same time:

[Figure: Process of adding an Order Line – business rule broken]

As you can see in the diagram above, two threads load exactly the same Aggregate at the same time. Let’s assume that the Order has 4 Order Lines. The Aggregate with 4 Order Lines will be loaded by both the first and the second thread. The exception will not be thrown in either of them, because 4 < 5. Finally, depending on how we persist the Aggregate, the following scenarios are possible:

a) If we have a relational database and a separate table for Order Lines then 2 Order Lines will be added giving a total of 6 Order Lines – the business rule is broken.

b) If we store the Aggregate in one atomic place (for example in a document database as a JSON object), the second thread will overwrite the first thread’s changes and we (and the User) won’t even know about it.

The reason for this behavior is that the second thread read data from the database (point 2.1) before the first one committed it (point 1.4.3).

Let’s see how we can solve this problem.

Solution

Pessimistic concurrency

The first way to ensure that our business rule will not be broken is to use Pessimistic Concurrency. In this approach, we allow only one thread at a time to process a given Aggregate. This means that the processing thread must block other threads from reading by taking a lock. Only when the lock is released can the next thread get the object and process it.

[Figure: Pessimistic concurrency]

The main difference from the previous approach is that the second thread waits for the first one to finish (the time between points 2.1 and 1.3). This approach costs us performance, because we can only process transactions one after the other. Moreover, it can lead to deadlocks.

How can we implement this behavior using Entity Framework Core and SQL Server?

Unfortunately, EF Core does not support Pessimistic Concurrency. However, we can do it easily by ourselves using raw SQL and the query hint mechanism of the SQL Server engine.

Firstly, a database transaction must be started. Then the lock must be acquired. This can be done in two ways – either by reading the data with a query hint (XLOCK, PAGLOCK) or by updating the record at the beginning:
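A sketch of the first option with EF Core 3.x raw SQL and an explicit transaction might look as follows (the table name and the chosen hints are assumptions; the snippet would live inside the Handle method from the earlier sketch):

```csharp
// Sketch: take an exclusive lock on the Order row inside an explicit transaction,
// so that other transactions cannot read this row until we commit.
using (var transaction = await _context.Database.BeginTransactionAsync())
{
    var order = (await _context.Orders
            .FromSqlInterpolated(
                $"SELECT * FROM orders.Orders WITH (XLOCK, ROWLOCK) WHERE Id = {orderId}")
            .ToListAsync())
        .Single();

    order.AddOrderLine(productId, quantity);

    await _context.SaveChangesAsync();

    await transaction.CommitAsync(); // the lock is released here
}
```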

In this way, the transaction on the first thread will acquire an Exclusive Lock (on write) and, until it releases it (through the transaction commit), no other thread will be able to read this record. Of course, this assumes that our queries operate in at least the READ COMMITTED transaction isolation level to avoid so-called dirty reads.

Optimistic concurrency

An alternative, and most often preferred, solution is to use Optimistic Concurrency. In this case, the whole process takes place without locking the data. Instead, the data in the database is versioned and, during the update, it is checked whether the version has changed in the meantime.

[Figure: Optimistic Concurrency]

What does the implementation of this solution look like? Most ORMs support Optimistic Concurrency out of the box. It is enough to indicate which fields should be checked when writing to the database and they will be added to the WHERE clause of the UPDATE statement. If the statement updates 0 records, it means that the version of the record has changed in the meantime and we need to roll back. Often, instead of checking the current fields, a special column called “Version” or “Timestamp” is added and used for the version check.

Back to our example: does adding a column with a version and incrementing it every time the Order entity changes solve the problem? Well, no. The Aggregate must be treated as a whole, as a boundary of transaction and consistency.

Therefore, the version increment must take place whenever we change anything in our Aggregate. If we add an Order Line and keep Order Lines in a separate table, the ORM’s optimistic concurrency support alone will not help us, because it works on updates of the Order record, while here we are left with inserts and deletes of Order Lines.

How can we know that the state of our Aggregate has changed? If you follow this blog, you already know the answer – by Domain Events. If a Domain Event was published from the Aggregate, it means that its state has changed and we need to increase the Aggregate version.

The implementation can look like this:

First, we add a _version field to each Aggregate Root and the method to increment that version.
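A sketch of such a base class (names are illustrative; the Domain Events collection is included here because it will be needed in the last step, and the Order from the earlier sketch would inherit from this class):

```csharp
using System.Collections.Generic;

// Sketch of a base class for Aggregate Roots (illustrative names). Besides the
// concurrency version, it also exposes the Domain Events raised by the Aggregate.
public interface IDomainEvent { }

public abstract class AggregateRoot
{
    private int _version;

    private readonly List<IDomainEvent> _domainEvents = new List<IDomainEvent>();

    public IReadOnlyCollection<IDomainEvent> DomainEvents => _domainEvents;

    public void IncreaseVersion()
    {
        _version++;
    }

    protected void AddDomainEvent(IDomainEvent domainEvent)
    {
        _domainEvents.Add(domainEvent);
    }
}
```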

Secondly, we add a mapping for the version attribute and indicate that it is an EF Concurrency Token:
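A sketch of the mapping, assuming EF Core’s field-only property mapping can reach the _version field from the base class above (the table, schema, and column names are illustrative):

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

// Sketch: map the private _version field and mark it as a concurrency token.
internal sealed class OrderEntityTypeConfiguration : IEntityTypeConfiguration<Order>
{
    public void Configure(EntityTypeBuilder<Order> builder)
    {
        builder.ToTable("Orders", "orders");
        builder.HasKey(o => o.Id);

        // "_version" is the field added to the Aggregate Root base class.
        builder.Property<int>("_version")
            .HasColumnName("Version")
            .IsConcurrencyToken();
    }
}
```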

The last thing to do is to increment the version if any Domain Events have been published:
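One possible place to do this is just before the unit of work saves changes. A sketch, using the DomainEvents collection and IncreaseVersion method from the base class above:

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Sketch: before saving changes, bump the version of every Aggregate that
// raised at least one Domain Event in this unit of work.
public static class AggregateVersioning
{
    public static void IncreaseVersions(DbContext context)
    {
        var changedAggregates = context.ChangeTracker
            .Entries<AggregateRoot>()
            .Where(entry => entry.Entity.DomainEvents.Any())
            .Select(entry => entry.Entity)
            .ToList();

        foreach (var aggregate in changedAggregates)
        {
            aggregate.IncreaseVersion();
        }
    }
}
```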

With this setup, all updates will execute this statement:
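Conceptually, the generated SQL looks roughly like this (column and parameter names are illustrative):

```sql
-- Roughly what EF Core generates for an entity with a concurrency token.
UPDATE [orders].[Orders]
SET [Version] = @p0
WHERE [Id] = @p1 AND [Version] = @p2; -- @p2 = the version read when the Aggregate was loaded

SELECT @@ROWCOUNT; -- 0 affected rows means another transaction has modified the row
```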

In our example, no record will be updated for the second thread (@@ROWCOUNT = 0), so Entity Framework will throw the following exception:

Microsoft.EntityFrameworkCore.DbUpdateConcurrencyException: Database operation expected to affect 1 row(s) but actually affected 0 row(s). Data may have been modified or deleted since entities were loaded. See http://go.microsoft.com/fwlink/?LinkId=527962 for information on understanding and handling optimistic concurrency exceptions.

and our Aggregate will be consistent – the 6th Order Line will not be added. The business rule is not broken, mission accomplished.

Summary

In summary, the most important issues here are:

  • The Aggregate’s main task is to protect invariants (business rules, the boundary of immediate consistency)
  • In a multi-threaded environment, when multiple threads are running simultaneously on the same Aggregate, a business rule may be broken
  • A way to solve concurrency conflicts is to use Pessimistic or Optimistic concurrency techniques
  • Pessimistic Concurrency involves the use of a database transaction and a locking mechanism. In this way, requests are processed one after the other, so basically concurrency is lost and it can lead to deadlocks.
  • Optimistic Concurrency technique is based on versioning database records and checking whether the previously loaded version has not been changed by another thread.
  • Entity Framework Core supports Optimistic Concurrency. Pessimistic Concurrency is not supported
  • The Aggregate must always be treated and versioned as a single unit
  • Domain Events are an indicator that the state has changed, so the Aggregate version should be changed as well
GitHub sample repository

Especially for the needs of this article, I created a repository that shows the implementation of 3 scenarios:

– without concurrency handling
– with Pessimistic Concurrency handling
– with Optimistic Concurrency handling

Link: https://github.com/kgrzybek/efcore-concurrency-handling


Using Database Project and DbUp for database management

Introduction

In the previous post I described two popular ways to manage database changes.

The first one was state versioning, where you keep the whole current design of your database in the repository, and when you need to change something you only change this state. Later, when you want to deploy the changes, your schema and the target database are compared and a migration script is generated.

The second way is to version transitions to the desired state, which means creating a migration script for every change.

In this post I want to show an implementation of these two approaches combined together in the .NET environment – which I think is the best way to manage database changes.

Step one – Database Project

The first thing to do is to create a Database Project. This type of project is available only when you have SQL Server Data Tools installed. It can be installed together with Visual Studio 2017 or separately – see this page for more information.

When you have SQL Server Data Tools, you can add a new Database Project in the standard way:

Now we can add database objects to our project in the form of SQL scripts. Each script should define one database object – a table, view, procedure, function and so on. It is common to create root folders named after the schemas.
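For example, a single script defining one table could look like this (the schema, table, and columns are just an illustration):

```sql
-- orders\Tables\Orders.sql – one database object per script
CREATE TABLE [orders].[Orders]
(
    [Id]   INT         NOT NULL,
    [Name] VARCHAR(50) NOT NULL,
    CONSTRAINT [PK_orders_Orders] PRIMARY KEY ([Id])
)
```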

TIP: I do not recommend creating database objects in the “dbo” schema. I advise creating well-named schemas per module/purpose/functionality. Creating your own schemas also allows you to better manage your object namespaces.

The sample database project may look like this:

What is worth noticing is that the Build Action setting of every script is set to Build. This is the setting thanks to which Visual Studio recognizes database objects among ordinary scripts and builds them together. If, for example, we remove the script defining the orders schema, VS will not be able to build our project:

This is great behavior, because we get a compile-time check and can avoid many runtime errors.

When we have finished the database project, we can compare it to another project or database and create a migration script. But as I described in the previous post, this is not the optimal way to migrate databases. We will use the DbUp library instead.

Step two – DbUp

DbUp is an open-source .NET library that provides a way to deploy changes to a database. Additionally, it tracks which SQL scripts have already been run, offers many SQL script providers, and has other interesting features like script pre-processing.

You may ask: why DbUp and not EF Migrations or FluentMigrator? I have used all of them and I have to say that DbUp seems to me the purest solution. I don’t like C# “wrappers” that generate SQL for me. DDL is an easy language and I think we don’t need a special tool to generate it.

DbUp is a library, so we can reference it from any application we want. What we need is a simple console application that can be executed both in a developer’s environment and on a CI build server. Firstly, we need to reference the DbUp NuGet package. Then we can add simple code to the Main method:
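A minimal sketch of such a Program could look like this (the app.MigrationsJournal journal table matches the one mentioned later; the remaining names and the error handling are illustrative):

```csharp
using System;
using System.IO;
using DbUp;
using DbUp.Helpers;

public static class Program
{
    public static int Main(string[] args)
    {
        var connectionString = args[0];
        var scriptsPath = args[1];

        var preDeployment  = CreateUpgrader(connectionString, Path.Combine(scriptsPath, "PreDeployment"), journaled: false);
        var migrations     = CreateUpgrader(connectionString, Path.Combine(scriptsPath, "Migrations"), journaled: true);
        var postDeployment = CreateUpgrader(connectionString, Path.Combine(scriptsPath, "PostDeployment"), journaled: false);

        foreach (var upgrader in new[] { preDeployment, migrations, postDeployment })
        {
            var result = upgrader.PerformUpgrade();
            if (!result.Successful)
            {
                Console.Error.WriteLine(result.Error);
                return -1;
            }
        }

        return 0;
    }

    private static DbUp.Engine.UpgradeEngine CreateUpgrader(string connectionString, string scriptsDirectory, bool journaled)
    {
        var builder = DeployChanges.To
            .SqlDatabase(connectionString)
            .WithScriptsFromFileSystem(scriptsDirectory)
            .LogToConsole();

        // Migrations are journaled in the app.MigrationsJournal table;
        // "pre" and "post" deployment scripts use a NullJournal, so they run every time.
        builder = journaled
            ? builder.JournalToSqlTable("app", "MigrationsJournal")
            : builder.JournalTo(new NullJournal());

        return builder.Build();
    }
}
```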

This console application accepts two parameters: a connection string to the target database and a file system path to the scripts directory. It assumes the following directory layout:
/PreDeployment
/Migrations
/PostDeployment

For “pre” and “post” deployment scripts we define a NullJournal – this way those scripts will be run every time.

We should keep the scripts directory in the Database Project created earlier. DbUp executes scripts in alphabetical order. It can look like this:

Finally, we run the migrations by running our console application:
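Assuming the console application is called DatabaseMigrator (an illustrative name, as are the connection string and path), the call could look like this:

```
DatabaseMigrator.exe "Server=(local);Database=App;Integrated Security=True" "C:\Src\App\Database\Scripts"
```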

Executed scripts are listed in the app.MigrationsJournal table:

And that’s all! We can develop and change our database in an effective way now. 🙂

Summary

In this post I described how to implement both state and transitions versioning using a Database Project and the DbUp library. What has been achieved is:
– Compile-time checks (Database project)
– Ease of development (Both)
– History of definition of all objects (Database project)
– Quick access to schema definition (Database project)
– Ease of resolving conflicts (Database project)
– IDE support (Database project)
– Full control of defining transitions (DbUp)
– Pre and post deployment scripts execution (DbUp)
– Deployment automation (DbUp)
– The possibility of manual deployment (DbUp)
– History of applied transitions (DbUp).

Using this machinery, database development should definitely be easier and less error-prone.

Database change management

Introduction

Database change management is not an easy task. It is even more difficult when we are at the beginning of a project and the data model is constantly changing. More people on the team is another difficulty, because we have to coordinate our work without interfering with each other. Of course, we can delegate database change management to one person, but this creates a bottleneck. In addition, we should take care of deployment automation, because without automation we cannot have truly continuous integration and delivery.

In this post I will describe:
– what we need for a good database change management mechanism
– two common approaches to solving this problem
– a recommendation on which approach should be taken

The ideal solution

I think we should stick to the same rules as when managing code changes, which means that we should implement the following practices:

Everything should be in the source control

We should be able to track what the change was, who made it and when. There is no better tool for this than source control. Every little change, every little script should be committed to the repository.

Every developer should be able to run migrations

Every developer should have a local database in their own environment and should always be able to upgrade it after pulling the latest changes. Without this, a developer will have a new version of the application and an old version of the database, which will cause unpredictable behavior. Shared development databases – say no! What is also important, the deployment of changes should be very fast.

Easy conflict resolution

Conflicts on the same database objects should be easy to solve and should not require a lot of work.

Versioning

We should always know which version our database is in. In addition, we should know which migrations have already been deployed and which have not.

Deployment automation

Our mechanism should provide us with the ability to automate the deployment of changes to test and production environments. We should use the same mechanism during development to make sure everything will be fine and will work as expected.

The possibility of manual deployment

Sometimes changes can only be deployed manually by a human, due to procedures or regulations in the company. For this reason, we should be able to generate a set of scripts for manual deployment.

The ability to run “pre” and “post” deployment scripts

It is common to execute some logic before and/or after deployment. Sometimes we need to regenerate some data, sometimes we need to check something (constraint integrity, for example). This type of feature is very useful.

IDE support

It would be perfect to have full support from our IDE. Quick access to object schemas (without a connection to the database), code completion, compile-time checking – these are the things we need to be very productive during database development.

That is quite a lot of requirements; let’s see what solutions we have.

Versioning the State

The first approach to database change management is state versioning. We hold the whole database definition in our source control repository and when we need to change something, we change the object definitions. When the time for an upgrade comes, our tool compares our current definitions with the target database and generates a migration script for us (and it can execute it right away). The process looks as follows:

As I described, we only change object definitions. For example, we have an orders table and in our repository we have its definition:
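For illustration only, with hypothetical columns:

```sql
CREATE TABLE [orders].[Orders]
(
    [Id]   INT         NOT NULL,
    [Name] VARCHAR(50) NOT NULL,
    CONSTRAINT [PK_orders_Orders] PRIMARY KEY ([Id])
)
```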

When we need to change something, we just change the definition:
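Continuing the hypothetical example:

```sql
CREATE TABLE [orders].[Orders]
(
    [Id]          INT          NOT NULL,
    [Name]        VARCHAR(100) NOT NULL,  -- altered length
    [Description] VARCHAR(255) NULL,      -- new column
    CONSTRAINT [PK_orders_Orders] PRIMARY KEY ([Id])
)
```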

I altered the Name column and added a Description column. That’s all – I changed the state of my schema. The transition to this state is on the tool’s side.

Pros

– Compile-time checks (whole schema in one place)
– Ease of development
– History of definition of all objects
– Quick access to schema definition
– Ease of resolving conflicts
– IDE support

Cons

The biggest downside of this approach is that sometimes the tool you are using may not be able to generate a migration script based on the differences. This happens when you:
– try to add a new non-nullable column without a default value
– have a different order of columns in the project vs. the target database – the tool then tries to change this order by creating temp tables, copying data and renaming them, something that we do not always want because, from a theoretical point of view, the order of columns doesn’t matter
– try to rename an object or a column
– change the type of a column (how should the data be converted?)

All of these problems (despite the many advantages) mean that, on its own, this is not a good approach to database change management.

Versioning transitions

Another approach is to version transitions instead of state. In this approach we create a sequence of migration scripts which lead us to the desired database state.

For the example with the orders table, instead of changing the definition we just add a migration script as follows:
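An illustrative script for the same hypothetical change as above (the file name and numbering convention are also illustrative):

```sql
-- Migrations\0002_Orders_AlterName_AddDescription.sql
ALTER TABLE [orders].[Orders] ALTER COLUMN [Name] VARCHAR(100) NOT NULL;

ALTER TABLE [orders].[Orders] ADD [Description] VARCHAR(255) NULL;
```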

In this approach we should be aware of two things:
– the order in which scripts are executed does matter
– it is required to store which scripts have already been executed

Pros

– Full control of defining transitions
– Executing scripts one by one in the correct order guarantees successful deployment to other environments
– Possibility to implement undo/downgrade features

Cons

– Not as easy as changing a state definition
– Lack of object history
– Lack of IDE support
– Conflicts are hardly visible (two developers add changes to the same table in two separate files)
– Need to maintain the order of scripts
– Need to track which scripts have been executed

So this is not a perfect solution either – it has some disadvantages – but it is still better than versioning state because we have more control and possibilities. And now we are coming to…

Best solution

The best solution is simple but not easy – versioning transitions AND versioning states. 🙂

If you look at the pros and cons of both solutions, you have to admit that they are complementary. By using both approaches you can get rid of all the cons, leaving only the advantages.

Well, almost… 🙁 Combining the two approaches costs a little more time and work. You always need to create a migration script and change the definition as well. In addition, you need to be careful about the consistency and integrity of the two models. Transitions should always lead to the defined state. If you have a mismatch, you do not know what the final schema should look like.

Fortunately, a little more work earlier saves you a lot of work later. I have worked with each approach separately in many projects before. Now that I have combined them, I truly feel confident about the development and deployment of my databases. I wish I had known this before. 🙂

Summary

In this post I described:
– what an ideal database change management mechanism should look like
– state versioning approach
– transitions versioning approach
– the best solution that is a combination of both

In the next post I will show how to set up database change management in .NET based on these assumptions.