Skip to content

SSIS SCD vs MERGE Statement – Performance Comparison

July 3, 2012

I wouldn’t class myself as an expert in SSIS but I certainly know my way around but came across something today which I thought I’d share. As with a lot of things there are “many ways to skin a cat”, none of which is something I’ll go into at the moment but what i will concentrate on is updating columns in a table where the data has changed in the source.

One of the projects I’m currently working on requires this very process and when i set about doing so I created the T-SQL Merge statement to do the business. However, the question was raised as to why I didn’t use SSIS’s built in component Slowly Changing Dimension (SCD)? I didn’t really have an answer other than personal preference but decided to delve into it a bit further and compare the performance of each method.

As a test, I created a source table with a Key and Name column:

USE TempDB;

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.iSource') AND type in (N'U'))
	DROP TABLE dbo.iSource;

CREATE TABLE dbo.iSource
(
   ID INT,
   Name varchar(100)
);

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.iTarget') AND type in (N'U'))
	DROP TABLE dbo.iTarget;
	
CREATE TABLE dbo.iTarget
(
   ID INT,
   Name varchar(100)
);

and populated it with some dummy data:

INSERT INTO dbo.iSource (ID,Name)
SELECT TOP 10000
ROW_NUMBER() OVER (ORDER BY t.object_id) AS rownumber
,'Name_'+convert(varchar(4),ROW_NUMBER() OVER (ORDER BY t.object_id))
FROM sys.tables t
CROSS JOIN sys.stats s;

INSERT INTO dbo.iTarget (ID,Name)
SELECT TOP 10000 
ROW_NUMBER() OVER (ORDER BY t.object_id DESC) AS rownumber --Done in descending order
,'Name_'+convert(varchar(4),ROW_NUMBER() OVER (ORDER BY t.object_id))
FROM sys.tables t
CROSS JOIN sys.stats s;

SELECT ID, Name FROM iSource;
SELECT ID, Name FROM iTarget;

So we now have a source and target table with different Names and we’ll look to update the iTarget table with the information coming from iSource.

Method 1 – MERGE Statement

MERGE dbo.iTarget AS target
	USING (
	SELECT ID, Name
	FROM dbo.iSource
	 ) AS  source (ID, Name)
		ON (target.ID = source.ID)
		WHEN MATCHED AND target.Name <> source.Name 
		THEN 
			UPDATE SET Name = source.Name
	 WHEN NOT MATCHED THEN 
		 INSERT (ID, Name)
		 VALUES (source.ID, source.Name); 

Using this method simply in SSMS for simplicity, profiler output 2 rows for Batch Starting and Batch Completing, CPUTime of 125ms and Duration of 125ms and it updated 6678 records. Top stuff, as expected.

Method 2 – SSIS SCD Component
I rebuilt the tables to put them back to where we started and set about creating the same thing in SCD setting ID as the business key and Name as the changing attribute and not setting inferred members, below is a screen dump of the outcome of this:

BEFORE:

I clear down the profiler and run the ssis package and the outcome is quite astounding.

DURING/AFTER:

The profiler output 13456 rows including 6678 rows of queries like this:

exec sp_executesql N'SELECT [ID], [Name] FROM [dbo].[iTarget] WHERE ([ID]=@P1)',N'@P1 int',8

as well as 6678 rows of queries similar to this:

exec sp_execute 1,'Name_3304',3304

Total Duration of 37 seconds (yes that’s seconds not ms!!)…….and this is on a table of only ~7k rows!

Well I’ll be damned, the SCD basically runs a cursor looping each record checking for a match on ID and updating that record if so. I can’t actually believe that MS have built a component which performs in this way.

So, to answer the question asked ” why I didn’t use SSIS’s built in component Slowly Changing Dimension (SCD)?”, I now have a definitive answer, it doesn’t perform!

I’m sure SCD has its place but for me, the requirements and the datasets I’m working on I think I’ll stick with MERGE for now….. :)

NOTE: This was done on SQL Server 2008R2 Developer Edition running on Windows 7 Ultimate, not sure if SQL Server 2012 has improved the SCD performance but I’ll leave that for another day.

About these ads

From → SQL Server, SSIS

9 Comments
  1. Phil permalink

    I’ve used the Codeplex SSIS Dimension Merge SCD component in 2005 and this performed very weel – compared with the Microsoft out-of-the-box offering. Unfortunately, there is a maintenance overhead if your adding new attributes or dimensions.

    Now in 2008, the MERGE statement has to be the best option – especially if you tie it into a stored procedure into which you pass dimension information so that you can us to generate dynamic MERGE statements.

    • Thanks Phil, thats interesting!

      I’m gonna take a look at that component on Codeplex and see how that compares.

    • Phil, I downloaded that component and setup the same test and the output is far quicker than the standard SCD component but still exceptionally slow in comparison to the Merge Statement. The Codeplex component took 14 seconds which is far better than the 37 seconds for the Standard SCD but no where near as good as the 125ms for the Merge statement.

      For those who care, the output in profiler was similar to that of the standard SCD in that it does an update for each record it matches…..this may be my next challenge, write a SCD component that performs as well as the Merge statement :)

  2. Phil permalink

    Hi Chris, Yep, I’d agree – if the MERGE statement is available to you (i.e. 2008 on) use it in favour of any components. As I say, by incorporating it into a dynamic statement, perfromance is increased and maintenance is much reduced…..

  3. Chris,

    SCD transformation is slow, I won’t argue there.

    Merge statement (or any t-SQL construct) is faster than SSIS, but it depends on certain things – location of source and destination, for example. If they are on the same server t-SQL might be faster. Otherwise SSIS does the perfect job. I wouldn’t use SCD transformation still. Lookup transformation and set based updates is the approach I prefer. (http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/62063/)

  4. If you want to create and maintain MERGE statement for SCD quickly, use http://scdmergewizard.codeplex.com/

    cheers

  5. Build-in SCD IS FAST … This was my “shocker” at my SSIS Webinar I did last Thursday.

    250k input rows in 200k dimension took 90 seconds on a standard laptop… without extra optimization … with optimization 30 seconds.

    http://katieandemil.com/ssis-scd-checklist

    Take care
    Emil

    • It’s be interesting to see your results for a similar test using the merge statement as a comparison…..feel free to report your results back here, I’d be very keen to see them!

      • Hi

        I’ve been discussing that with SSISJoost (MSDN Forum / Twitter) and he will perform comparison between Optimized Build-in SCD and T-SQL Merge. I’ll let you know the results once I get them but I presume difference in seconds will be very small for most SCD loads so that will give two good options for SSIS Developers.

        Regards
        Emil

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: