Using Database Project and DbUp for database management

Introduction

In the previous post I described two popular ways to manage database changes.

The first one was state versioning, where you keep the whole current design of your database in the repository, and when you need to change something, you change only this state. Later, when you want to deploy the changes, your schema and the target database are compared and a migration script is generated.

The second way is versioning transitions to the desired state, which means creating a migration script for every change.

In this post I want to show an implementation of these two approaches combined together in the .NET environment – what I consider the best way to manage database changes.

Step one – Database Project

The first thing to do is to create a Database Project. This type of project is only available when you have SQL Server Data Tools installed. It can be installed together with Visual Studio 2017 or separately – see this page for more information.

When you have SQL Server Data Tools, you can add a new Database Project in the standard way:

Now we can add database objects to our project in the form of SQL scripts. Each script should define one database object – a table, view, procedure, function and so on. It is common to create root folders named after the schemas.

TIP: I do not recommend creating database objects in the “dbo” schema. I advise creating well-named schemas per module/purpose/functionality. Creating your own schemas also allows you to better manage your object namespaces.

The sample database project may look like this:
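As an illustrative sketch (all folder and object names below are hypothetical), the structure could be:

OrdersDatabase.sqlproj
/orders
    orders.sql          (schema definition)
    /Tables
        Order.sql
        OrderLine.sql
/shipping
    shipping.sql        (schema definition)
    /Tables
        Shipment.sql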

What is worth noticing is that the Build Action setting of every script is set to Build. This is the setting thanks to which Visual Studio distinguishes database objects from ordinary scripts and builds them together. If we, for example, remove the script defining the orders schema, VS will not be able to build our project:

This is great behavior because we get a compile-time check and can avoid many runtime errors.

When we have finished the database project, we can compare it to another project or database and create a migration script. But as I described in the previous post, this is not the optimal way to migrate databases. We will use the DbUp library instead.

Step two – DbUp

DbUp is an open-source .NET library that provides you with a way to deploy changes to a database. Additionally, it tracks which SQL scripts have already been run, offers many SQL script providers and other interesting features like script pre-processing.

You may ask: why DbUp and not EF Migrations or Fluent Migrator? I have used all of them and I have to say that DbUp seems to me the purest solution. I don’t like C# “wrappers” generating SQL for me. DDL is an easy language and I think we don’t need a special tool to generate it.

DbUp is a library, so we can reference it from any application we want. What we need is a simple console application which can be executed both in the developer environment and on a CI build server. Firstly, we need to reference the DbUp NuGet package. Then we can add simple code to the Main method:
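Below is a minimal sketch of such a Main method, assuming the SQL Server flavour of DbUp and the directory layout described next; the journal schema and table names match the app.MigrationsJournal table mentioned later, and error handling is reduced to the minimum:

using System;
using System.IO;
using DbUp;
using DbUp.Helpers;

public static class Program
{
    public static int Main(string[] args)
    {
        var connectionString = args[0];
        var scriptsPath = args[1];

        // "Pre" and "post" deployment scripts use the NullJournal,
        // so they are executed on every run.
        var preDeployment = DeployChanges.To
            .SqlDatabase(connectionString)
            .WithScriptsFromFileSystem(Path.Combine(scriptsPath, "PreDeployment"))
            .JournalTo(new NullJournal())
            .LogToConsole()
            .Build();

        // Migrations are journaled, so each script is executed exactly once.
        var migrations = DeployChanges.To
            .SqlDatabase(connectionString)
            .WithScriptsFromFileSystem(Path.Combine(scriptsPath, "Migrations"))
            .JournalToSqlTable("app", "MigrationsJournal")
            .LogToConsole()
            .Build();

        var postDeployment = DeployChanges.To
            .SqlDatabase(connectionString)
            .WithScriptsFromFileSystem(Path.Combine(scriptsPath, "PostDeployment"))
            .JournalTo(new NullJournal())
            .LogToConsole()
            .Build();

        foreach (var upgrader in new[] { preDeployment, migrations, postDeployment })
        {
            var result = upgrader.PerformUpgrade();
            if (!result.Successful)
            {
                Console.Error.WriteLine(result.Error);
                return -1;
            }
        }

        return 0;
    }
}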

This console application accepts two parameters: a connection string to the target database and a file system path to the scripts directory. It assumes the following directory layout:
/PreDeployment
/Migrations
/PostDeployment

For “pre” and “post” deployment scripts we define a NullJournal – this way the scripts will be run every time.

We should keep the scripts directory in the Database Project created earlier. DbUp executes scripts in alphabetical order, so a sortable naming convention is important.
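For example (the script names here are hypothetical):

/Migrations
    0001_CreateOrdersSchema.sql
    0002_CreateOrderTable.sql
    0003_AddOrderDescriptionColumn.sql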

Finally, we run the migrations by executing our console application:
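For example, from the command line (the executable name, connection string and path are placeholders):

DatabaseMigrator.exe "Server=localhost;Database=MyApp;Integrated Security=true" "C:\MyApp\Database\Scripts"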

Executed scripts are listed in the app.MigrationsJournal table:
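We can inspect them with a simple query (Id, ScriptName and Applied are the columns DbUp creates by default):

SELECT [Id], [ScriptName], [Applied]
FROM [app].[MigrationsJournal]
ORDER BY [Applied];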

And that’s all! We can now develop and change our database in an effective way. 🙂

Summary

In this post I described how to implement both state and transitions versioning using a Database Project and the DbUp library. What has been achieved is:
– Compile-time checks (Database project)
– Ease of development (Both)
– History of definition of all objects (Database project)
– Quick access to schema definition (Database project)
– Ease of resolving conflicts (Database project)
– IDE support (Database project)
– Full control of defining transitions (DbUp)
– Pre and post deployment scripts execution (DbUp)
– Deployment automation (DbUp)
– The possibility of manual deployment (DbUp)
– History of applied transitions (DbUp).

With this machinery, database development should be definitely easier and less error-prone.

Database change management

Introduction

Database change management is not an easy task. It is even more difficult at the beginning of a project, when the data model is constantly changing. More people on the team is another difficulty, because we have to coordinate our work and not interfere with each other. Of course, we can delegate database change management to one person, but this creates a bottleneck. In addition, we should take care of deployment automation, because without automation we cannot have truly continuous integration and delivery.

In this post I will describe:
– what we need for a good database change management mechanism
– two common approaches to this problem
– a recommendation on which approach should be taken

The ideal solution

I think we should stick to the same rules as when managing code changes, which means we should implement the following practices:

Everything should be in the source control

We should be able to track what the change was, who made it and when. There is no better tool for this than source control. Every little change, every little script, should be committed to the repository.

Every developer should be able to run migrations

Every developer should have a local database in his own environment and should always be able to upgrade it after pulling the changes. Without this, a developer will have a new version of the application and an old version of the database, which will cause unpredictable behavior. Shared development databases – say no! What is also important, the deployment of changes should be very fast.

Easy conflict resolution

Conflicts on the same database objects should be easy to solve and should not require a lot of work.

Versioning

We should always know what version our database is in. In addition, we should know which migrations have already been deployed and which have not.

Deployment automation

Our mechanism should provide us with the ability to automate the deployment of changes to test and production environments. We should use the same mechanism during development to make sure everything works as expected.

The possibility of manual deployment

Sometimes changes can only be deployed manually by a human, due to procedures or regulations in the company. For this reason, we should be able to generate a set of scripts for manual deployment.

The ability to run “pre” and “post” deployment scripts

It is common to execute some logic before and/or after deployment. Sometimes we need to regenerate some data, sometimes we need to check something (constraint integrity, for example). This type of feature is very useful.

IDE support

It would be perfect to have the full support of our IDE. Quick access to object schemas (without a connection to the database), prompts, compile-time checking – these are the things we need to be very productive during database development.

That is a lot of requirements; let’s see what solutions we have.

Versioning the State

The first approach to database change management is state versioning. We keep the whole database definition in our source control repository, and when we need to change something, we change the object definitions. When upgrade time comes, the tool compares our current definitions with the target database and generates a migration script for us (and it can execute it right away). The process looks as follows:

As I described, we only change object definitions. For example, we have an orders table and in our repository we have:
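A sketch of such a definition (the column names and types are assumptions for illustration):

CREATE TABLE [orders].[Order]
(
    [Id] INT NOT NULL PRIMARY KEY,
    [Name] NVARCHAR(50) NOT NULL
)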

When we need to change something, we just change the definition:
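The same file after the change (again, the exact types are assumptions):

CREATE TABLE [orders].[Order]
(
    [Id] INT NOT NULL PRIMARY KEY,
    [Name] NVARCHAR(100) NOT NULL,
    [Description] NVARCHAR(1000) NULL
)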

I altered the Name column and added a Description column. That’s all – I changed the state of my schema. The transition to this state is on the tool’s side.

Pros

– Compile-time checks (whole schema in one place)
– Ease of development
– History of definition of all objects
– Quick access to schema definition
– Ease of resolving conflicts
– IDE support

Cons

The biggest downside of this approach is that sometimes the tool you are using may not be able to generate a migration script from the differences. This happens when you:
– try to add a new non-nullable column without a default value
– have a different order of columns in the project vs the target database. Then the tool tries to fix this order by creating temp tables, copying data and renaming – something we do not always want, because from a theoretical point of view the order of columns doesn’t matter
– try to rename an object or a column
– change the type of a column (how should the data be converted?)

All of these problems (despite the many advantages) make this, on its own, not a good approach to database change management.

Versioning transitions

Another approach is versioning transitions instead of state. In this approach we create a sequence of migration scripts which lead us to the desired database state.

For the example with the orders table, instead of changing the definition, we just add a migration script as follows:
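Continuing the hypothetical orders example from the previous section, such a script could look like this:

ALTER TABLE [orders].[Order]
    ALTER COLUMN [Name] NVARCHAR(100) NOT NULL;

ALTER TABLE [orders].[Order]
    ADD [Description] NVARCHAR(1000) NULL;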

In this approach we should be aware of two things:
– the order of executing the scripts matters
– we have to store information about which scripts have been executed

Pros

– Full control of defining transitions
– Executing scripts one by one in the correct order guarantees successful deployment to other environments
– Possibility to implement undo/downgrade features

Cons

– Not as easy as changing the state definition
– Lack of object history
– Lack of IDE support
– Hardly visible conflicts (two developers add changes to the same table in two separate files)
– Need to maintain the order of scripts
– Need to track which scripts were executed

So this is not a perfect solution either – it has some disadvantages – but it is still better than versioning state, because we have more control and more possibilities. And now we come to…

Best solution

The best solution is simple but not easy – versioning transitions AND versioning states. 🙂

If you look at the pros and cons of both solutions, you have to admit that they are complementary. By using both approaches you can get rid of all the cons while keeping the advantages.

Well, almost… 🙁 Combining the two approaches costs a little more time and work. You always need to create a migration script and change the definition as well. In addition, you need to take care of the consistency and integrity of the two models. Transitions should always lead to the defined state. If you have a mismatch, you do not know what the final schema should look like.

Fortunately, a little more work up front saves you a lot of work later. I have worked with each approach separately in many projects. Now that I have combined them, I truly feel confident about the development and deployment of my databases. I wish I had known this before. 🙂

Summary

In this post I described:
– what an ideal database change mechanism should look like
– state versioning approach
– transitions versioning approach
– the best solution that is a combination of both

In the next post I will show how to set up database change management in .NET based on these assumptions.

How to store sensitive configuration data

Introduction

In this post I would like to discuss the places where we can store our application’s configuration and warn against keeping sensitive configuration data in the code repository. Additionally, I will show an example of configuration based on User Secrets and Environment Variables in .NET Core.

Types of configuration data

There are plenty of configuration data types. I will try to name a few of them:
– store configurations (connection strings, providers, timeouts)
– external API configurations (urls, endpoints)
– caching
– hosting configuration (url, port, schema)
– file system paths (storing files, loading other resources)
– framework configuration
– libraries configuration
– business logic parameters
– …

Apart from the configuration type, we can add an additional categorization:

Sensitive vs Insensitive

Sensitive data must be protected from unauthorized access to safeguard the security of our application and our users’ data. Examples of sensitive data are the private key for an external API or the username and password in a database connection string. An example of insensitive configuration data could be a database timeout setting.

Local vs Global

Local configuration depends on the environment. For example, the database connection string is different for the local development, test and production environments, because each uses its own storage.

Global configuration is independent of the environment. For example, it could be the list of predefined file extensions a user can upload.

Required vs Optional

Configuration data for a specific component can be required or optional. For a database connection string, the server and database name are required parameters, but, for example, the maximum number of connections per pool is not.

Handling type

There are three ways of handling configuration data:

1) Fetched once, used many times
2) Fetched every now and then, used many times
3) Fetched each time it is used

Configuration sources

There are as many sources of configuration as there are configuration types. The diagram below presents the possible variations:

Application configuration file

This is the most common option. Configuration data is shipped with the application and can be changed by editing or replacing the configuration file.

External configuration file

This is also a configuration file, but situated outside the application – on the same computer or somewhere on the network.

Environment variable

Configuration data is stored in operating system environment variables. This option has become popular with the trend towards containerization.

Store

Sometimes there is a need to store configuration data in a database. The reason for this is the externalization of configuration (when more than one process needs this data) or the need for an easy way to edit it.

System / Service

When the configuration is more complicated and dynamic, it is a good approach to get it from an external system/service. An example of such a service is Eureka, a service discovery mechanism in microservices environments.

Storing sensitive configuration data

If you have to remember one thing from this post, let it be this: don’t store sensitive data in your code repositories. Never. Period.

Even if you have private repositories on GitHub or Bitbucket.
Even if you have private repositories hosted on your internal servers and the whole infrastructure is hidden behind NAT.
Even if you created only a local repository which is not published to any remote server.

Don’t do it.

How many times have you thought this way:

Hmm… OK, I have to add the credentials configuration for the database/API/whatever now. For now I will simply add them to the configuration file, and when I am ready to deploy the new feature to test/prod I will change it. Or I will do it before committing my changes. Yes, I will just check if this connection works…

…and the data suddenly appears in the repository. Because we forgot. Because we took care of something else. Because…

Too many “becauses”. The right solutions are simple, and I will show two of them: storing configuration in an external configuration file (User Secrets) and in environment variables.

User Secrets

User Secrets is nothing more than a JSON configuration file stored in a system-protected user profile folder on the local machine, under the path %APPDATA%\Microsoft\UserSecrets\<user_secrets_id>\secrets.json. The user secrets id is a GUID generated by the Secret Manager tool.

The process of adding User Secrets is pretty simple. Firstly, we need to right-click on the project and select the Manage User Secrets option.

After this step, Visual Studio will generate a secrets.json file and add an entry to the .csproj file:
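The entry is the UserSecretsId property (the GUID below is just a placeholder):

<PropertyGroup>
    <UserSecretsId>a1b2c3d4-1111-2222-3333-444455556666</UserSecretsId>
</PropertyGroup>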

Then we can add the needed configuration to the generated secrets.json file, for example:
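For instance, a connection string with credentials (all names and values below are placeholders):

{
    "ConnectionStrings": {
        "OrdersDatabase": "Server=localhost;Database=Orders;User Id=app_user;Password=S3cret!"
    }
}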

The last thing to do is to reference the Microsoft.Extensions.Configuration.UserSecrets configuration provider package and register it in the application startup:
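A minimal sketch of the registration (note that in ASP.NET Core 2.x the default host builder already registers this provider for the Development environment; the connection string name matches the placeholder above):

using Microsoft.Extensions.Configuration;

public static class Program
{
    public static void Main()
    {
        // User Secrets are registered like any other configuration provider.
        var configuration = new ConfigurationBuilder()
            .AddJsonFile("appsettings.json", optional: true)
            .AddUserSecrets<Program>()
            .Build();

        var connectionString = configuration.GetConnectionString("OrdersDatabase");
    }
}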

Environment variables

User Secrets are good for development scenarios. For other scenarios, when we need to deploy our application to other environments (test, stage, prod), we should use environment variables instead. The first step is to define a variable in the hosting operating system or container. On Windows it looks like this:

setx SampleKey SampleValue /M

The second and last step is to configure the provider referenced from the Microsoft.Extensions.Configuration.EnvironmentVariables package:
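A minimal sketch:

using Microsoft.Extensions.Configuration;

public static class Program
{
    public static void Main()
    {
        var configuration = new ConfigurationBuilder()
            .AddEnvironmentVariables()
            .Build();

        // Reads the variable defined with setx above.
        var sampleValue = configuration["SampleKey"];
    }
}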

The AddEnvironmentVariables() extension method has an overload with a prefix parameter. It can be useful for filtering variables or when we host multiple environments on a single machine. This is an example configuration:
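A sketch using a prefix – GetEnvironment() is the helper discussed below; for the “Test” environment, the provider loads variables such as Test_SampleKey and strips the prefix from the keys:

using Microsoft.Extensions.Configuration;

public static class Program
{
    private static string GetEnvironment()
    {
        // Hypothetical implementation: resolve the environment name
        // from the build configuration, an external file or another factor.
        return "Test";
    }

    public static void Main()
    {
        var configuration = new ConfigurationBuilder()
            .AddEnvironmentVariables(GetEnvironment() + "_")
            .Build();
    }
}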

How to implement the GetEnvironment() method depends on your preferences. It can be determined based on the project configuration or another factor like an external file.

Summary

In this post I described the diversity of configuration data as well as its possible sources. In addition, I showed how to keep sensitive configuration data secure in the .NET Core runtime environment using User Secrets and environment variables. The security of applications and data is often overlooked in the initial phase of a project, which is a big mistake and can have tragic consequences. It’s better to think about it from the very beginning.