Over the past year, I've had to migrate multiple legacy applications to new applications. These new applications had their own database structure, so we also needed to migrate the data from the legacy applications to the new ones.
With each migration, the code quality improved thanks to the experience gained from previous migrations and my growing overall knowledge.
Not only did the code quality improve, there was also a very noticeable speed improvement, even though the amount of data to migrate increased.
In this post, we'll incrementally refactor slow inserts into blazing fast inserts.
If you're only interested in the fastest technique, please navigate to SQL Bulk Copy.
We do this by inserting a collection of N customers (that have an id, a name, and some contact information) into a SQL table.
customers.cs
record Customer(
    Guid Id,
    string FirstName,
    string LastName,
    string Street,
    string City,
    string State,
    string PhoneNumber,
    string EmailAddress);
In this example, the Customers table has the same structure as the C# object.
Okay, that's already better, but this approach has a drawback: we have to escape the SQL command text manually, which doesn't look pretty.
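To sketch what that manual command text looks like (an illustration, not the exact migration code; EscapeSql is a hypothetical helper that escapes single quotes in the values):

// Illustrative sketch: build the insert command text ourselves, one statement per customer.
// EscapeSql is a hypothetical helper; requires System.Text and System.Data.SqlClient.
var builder = new StringBuilder();
foreach (var customer in customers)
{
    builder.AppendLine(
        "insert into dbo.Customers (Id, FirstName, LastName, Street, City, State, PhoneNumber, EmailAddress) " +
        $"values ('{customer.Id}', '{EscapeSql(customer.FirstName)}', '{EscapeSql(customer.LastName)}', " +
        $"'{EscapeSql(customer.Street)}', '{EscapeSql(customer.City)}', '{EscapeSql(customer.State)}', " +
        $"'{EscapeSql(customer.PhoneNumber)}', '{EscapeSql(customer.EmailAddress)}');");
}

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var command = new SqlCommand(builder.ToString(), connection))
    {
        command.ExecuteNonQuery();
    }
}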
And spoiler... we can make it faster.
Don't forget to use the AddRange method instead of iterating through the whole collection and adding the records one by one with the Add method. The AddRange method is significantly faster because, unlike Add, it doesn't trigger change detection for every single entity.
using (var context = new CustomersContext())
{
    context.Customers.AddRange(customers);
    context.SaveChanges();
}
When we take a look at the generated SQL commands (by using SQL Extended Events), we notice that Entity Framework generates multiple SQL insert statements, where each statement inserts multiple customers at once. This batching seems to be the cause of our next speed gain.
Using Entity Framework speeds up the inserts again, and I think this is because it inserts multiple records at once instead of record per record.
We can verify this by tweaking our earlier approach, where we generate the SQL command text ourselves, to insert multiple rows per statement.
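As a sketch of that tweak (an illustration, not Entity Framework's exact output; the batch size of 100 and the EscapeSql helper are assumptions):

// Illustrative sketch: one insert statement per batch of rows, similar in spirit
// to the statements Entity Framework generates. Requires System.Linq.
var batches = customers
    .Select((customer, index) => (customer, index))
    .GroupBy(x => x.index / 100, x => x.customer);

foreach (var batch in batches)
{
    var values = string.Join(",", batch.Select(c =>
        $"('{c.Id}', '{EscapeSql(c.FirstName)}', '{EscapeSql(c.LastName)}', '{EscapeSql(c.Street)}', " +
        $"'{EscapeSql(c.City)}', '{EscapeSql(c.State)}', '{EscapeSql(c.PhoneNumber)}', '{EscapeSql(c.EmailAddress)}')"));

    var cmdText =
        "insert into dbo.Customers (Id, FirstName, LastName, Street, City, State, PhoneNumber, EmailAddress) " +
        $"values {values}";

    // Execute cmdText with a SqlConnection and SqlCommand, as in the earlier sketch.
}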
But this also quickly shows why Entity Framework chops the insert statements into batches: when we try to insert more than 1,000 records in a single statement, SQL Server throws an error, because an insert statement is limited to 1,000 row value expressions.
We can work around this restriction by using a Table-Valued Parameter. For this solution, the first step is to create a User-Defined Table Type.
We'll use this table type (read: populate it with the customers) later when we execute the SQL query.
CREATE TYPE [dbo].[CustomersTableType] AS TABLE (
[Id] [uniqueidentifier] NOT NULL,
[LastName] [nvarchar](255) NOT NULL,
[FirstName] [nvarchar](255) NOT NULL,
[Street] [nvarchar](255) NOT NULL,
[City] [nvarchar](255) NOT NULL,
[State] [nvarchar](255) NOT NULL,
[PhoneNumber] [nvarchar](255) NOT NULL,
[EmailAddress] [nvarchar](255) NOT NULL
)
Next, we must convert the customer list into a DataTable to be able to pass the data to the SQL query. Do not forget to set the SQL parameter's type name to the name given to the table type above.
In the SQL query, we select all the customers assigned to the table-valued parameter and insert them into the Customers table.
This way, we execute a single SQL query that inserts all of the customers at once.
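A conversion helper could look roughly like this (a sketch; the column names and their order must match the table type we created above):

// Sketch: convert the customer list into a DataTable whose columns match
// the dbo.CustomersTableType user-defined table type (same order, same names).
// Requires System.Data (DataTable).
private static DataTable ToDataTable(IEnumerable<Customer> customers)
{
    var table = new DataTable();
    table.Columns.Add("Id", typeof(Guid));
    table.Columns.Add("LastName", typeof(string));
    table.Columns.Add("FirstName", typeof(string));
    table.Columns.Add("Street", typeof(string));
    table.Columns.Add("City", typeof(string));
    table.Columns.Add("State", typeof(string));
    table.Columns.Add("PhoneNumber", typeof(string));
    table.Columns.Add("EmailAddress", typeof(string));

    foreach (var customer in customers)
    {
        table.Rows.Add(
            customer.Id, customer.LastName, customer.FirstName, customer.Street,
            customer.City, customer.State, customer.PhoneNumber, customer.EmailAddress);
    }

    return table;
}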
var cmdText = @"
    insert into dbo.Customers (Id, FirstName, LastName, Street, City, State, PhoneNumber, EmailAddress)
    select Id, FirstName, LastName, Street, City, State, PhoneNumber, EmailAddress
    from @customers";
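Executing the query could then look something like this (a sketch): @customers is the name we give to the table-valued parameter, its TypeName must match the table type created above, and ToDataTable is the helper from the previous sketch.

// Sketch: pass the customers as a table-valued parameter and execute the query.
// Requires System.Data and System.Data.SqlClient.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var command = new SqlCommand(cmdText, connection))
    {
        var parameter = command.Parameters.AddWithValue("@customers", ToDataTable(customers));
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "dbo.CustomersTableType";

        command.ExecuteNonQuery();
    }
}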
SQL Server has a built-in mechanism to import large volumes of data, called Bulk Insert. Luckily for us, dotnet supports a Bulk Insert with the SqlBulkCopy class.
Besides the visible performance advantage over the other solutions, we can also easily tweak the behavior with some Options.
To use the SqlBulkCopy, we need to create a new instance of the class and provide the SQL destination table.
We write the customer list into the Customers table with the WriteToServer method on the SqlBulkCopy class; this method expects the collection to be a DataTable.
using (var copy = new SqlBulkCopy(connectionString))
{
    copy.DestinationTableName = "dbo.Customers";
    // Add mappings so that the column order doesn't matter
    copy.ColumnMappings.Add(nameof(Customer.Id), "Id");
    // ... map the remaining columns in the same way
    copy.WriteToServer(ToDataTable(customers)); // reuse the DataTable conversion from earlier
}
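And, as mentioned earlier, the behavior can be tweaked with some options. For example (a sketch, the values are only illustrative), we can take a table lock during the copy and send the rows in batches:

// Sketch: tweak SqlBulkCopy's behavior with SqlBulkCopyOptions and a batch size.
// TableLock takes a bulk update lock for the duration of the copy, which usually
// speeds up large imports; the batch size and timeout values are just examples.
using (var copy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
{
    copy.DestinationTableName = "dbo.Customers";
    copy.BatchSize = 10_000;   // rows sent to the server per round trip
    copy.BulkCopyTimeout = 60; // seconds before the operation times out

    // ... add the same ColumnMappings as in the previous snippet

    copy.WriteToServer(ToDataTable(customers));
}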
When we take another look at what this does under the hood (with SQL Extended Events), we notice that SqlBulkCopy doesn't generate individual insert statements, but a single bulk import (insert bulk) statement.
We've seen that there are multiple tactics to make a bulk insert fast(er), and why one technique is faster than another.
We don't always have to use the fastest solution out there, but I think it's good to know which options can be used in different scenarios. For smaller collections, it's probably good enough to use the solution that offers the best developer ergonomics.
By using a SQL Bulk Copy in our migration, we've saved many hours that would otherwise have been wasted waiting for a migration run to complete.
If you know another way to insert hundreds of thousands of records in a fast way, feel free to send me a DM at @tim_deschryver.
I'd love to hear and learn from you.