Monday, November 28, 2011

Speeding up inserts using Linq-To-Sql - Part 1


I posted a followup on this at http://blog.tanneryd.com/2011/12/speeding-up-inserts-using-linq-to-sql.html.

I use linq-to-sql quite a lot and I really like it. But, if you need to insert huge amounts of data into a table, linq-to-sql is not your best friend. I had a table I needed to initialize with more than a million rows and it took forever, something like 10 hours. I scanned the net for solutions but couldn't find anything I really liked. There were various tips about SqlBulkCopy but the proposed solutions were all really complex and mostly involved generating regular SQL code and I really did not want to do that. I just wanted the linq-to-sql datacontext to have a method like InsertAllOnSubmit that would bulk insert my millions of linq-to-sql entities. So, I wrote my own. The solution I ended up using is the one posted below. It extends the linq-to-sql generated DataContext class and should work just fine for you as well. The CommandTimeout setting isn't necessary but I added it just to show how to set it if you need to.
    partial class MyDataContext
    {
        partial void OnCreated()
        {
            CommandTimeout = 5*60;            
        }

        public void BulkInsertAll<T>(IEnumerable<T> entities)
        {
            string cs = Connection.ConnectionString;
            var conn = new SqlConnection(cs);
            conn.Open();

            Type t = typeof (T);

            var properties = t.GetProperties();
            var table = new DataTable();
            foreach (var property in properties)
            {
                table.Columns.Add(new DataColumn(property.Name, property.PropertyType));
            }

            foreach (var entity in entities)
            {
                table.Rows.Add(properties.Select(property => property.GetValue(entity, null)).ToArray());
            }
            var tableAttribute = (TableAttribute)t.GetCustomAttributes(typeof(TableAttribute), false).Single();
            var bulkCopy = new SqlBulkCopy(conn) { DestinationTableName = tableAttribute.Name };
            bulkCopy.WriteToServer(table);
            conn.Close();
        }
    }

5 comments:

Måns Tånneryd said...

Trying this code out I noticed that I needed to make some adjustments. It does not work very well with foreign keys and you have to be careful to use only dedicated data contexts otherwise the password will get stripped from the connection string before you try to use it to create the SqlConnection. I'll post an update later today.

Bikram said...

Thanks for posting this solution. It's a cool way to bulk insert and I was looking to do something similar. But when I tried running your code within my project, I am getting the following error. Any idea why is this happening?? Thanks.

{"Failed to obtain column collation information for the destination table. If the table is not in the current database the name must be qualified using the database name (e.g. [mydb]..[mytable](e.g. [mydb]..[mytable]); this also applies to temporary-tables (e.g. #mytable would be specified as tempdb..#mytable)."}

Måns Tånneryd said...

Bikram:
Not sure why you get that. It's been working very well for me. According to MS that error message is what you get if you try to bulk copy to a table with a dot in the table name. Any chance that that is what is happening for you?

rs said...

Thank you for the article!
In my case it worked after I have added colum mapping to itself like this:

foreach ( var col in properties )
{
bulkCopy.ColumnMappings.Add(col.Name, col.Name);
}
And changed to map only the public properties that have a colum attribute
---------------
here is full code:

public LinqContextExtensions(string connStr)
: base(connStr)
{
_connString = connStr;
}

public void BulkInsertAll(IEnumerable entities, String destinationTbl)
{
#region populate datatable with entities

entities = entities.ToArray();
Type t = typeof(T);
//var tableAttribute = (TableAttribute)t.GetCustomAttributes(typeof(TableAttribute), false).Single();
var properties = t.GetProperties(BindingFlags.DeclaredOnly |
BindingFlags.Instance |
BindingFlags.Public
).Where(EventTypeFilter).ToArray();
var table = new DataTable();

foreach ( var property in properties )
{
Type propertyType = property.PropertyType;
if ( propertyType.IsGenericType &&
propertyType.GetGenericTypeDefinition() == typeof(Nullable<>) )
{
propertyType = Nullable.GetUnderlyingType(propertyType);
}
table.Columns.Add(new DataColumn(property.Name, propertyType));
}

foreach ( var entity in entities )
{
try
{
var row = properties.Select(property => GetPropertyValue(
property.GetValue(entity, null))).ToArray();
table.Rows.Add(row);
}
catch ( Exception e )
{

throw e;
}
}
#endregion

#region Bulk Insert Data table
var sqlconn = new SqlConnection(_connString);
sqlconn.Open();
var bulkCopy = new SqlBulkCopy(sqlconn) { DestinationTableName = destinationTbl };

//specify the mapping for SqlBulk Upload
foreach ( var col in properties )
{
bulkCopy.ColumnMappings.Add(col.Name, col.Name);
}

// bulkCopy.BatchSize = 200;
try
{
bulkCopy.WriteToServer(table);
}
catch ( Exception e )
{
throw e;
}
finally
{
sqlconn.Close();
}
#endregion
}

private bool EventTypeFilter(System.Reflection.PropertyInfo p)
{
//Exclude ID column for bulk upload
if ( p.Name == "ID" )
return false;

var columAttr = Attribute.GetCustomAttribute(p, typeof(ColumnAttribute)) as ColumnAttribute;
if ( columAttr != null )
return true;
else
return false;
//var assocAttr = Attribute.GetCustomAttribute(p, typeof(AssociationAttribute)) as AssociationAttribute;
//if ( assocAttr.IsForeignKey == false )
// return true;
}

private object GetPropertyValue(object o)
{
if ( o == null )
return DBNull.Value;
return o;
}
Roma

Lars Schmidt Weinreich said...

Thank you sooooo much for this. I have a lot of insert statements and this solution really helped me a lot. Things are going a lot faster around here now :D