SQLGeordie's Blog

Helping the SQL Server community……where i can!

SSIS SCD vs MERGE Statement – Performance Comparison — July 3, 2012

SSIS SCD vs MERGE Statement – Performance Comparison

I wouldn’t class myself as an expert in SSIS but I certainly know my way around but came across something today which I thought I’d share. As with a lot of things there are “many ways to skin a cat”, none of which is something I’ll go into at the moment but what i will concentrate on is updating columns in a table where the data has changed in the source.

One of the projects I’m currently working on requires this very process and when i set about doing so I created the T-SQL Merge statement to do the business. However, the question was raised as to why I didn’t use SSIS’s built in component Slowly Changing Dimension (SCD)? I didn’t really have an answer other than personal preference but decided to delve into it a bit further and compare the performance of each method.

As a test, I created a source table with a Key and Name column:

USE TempDB;

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.iSource') AND type in (N'U'))
	DROP TABLE dbo.iSource;

CREATE TABLE dbo.iSource
(
   ID INT,
   Name varchar(100)
);

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.iTarget') AND type in (N'U'))
	DROP TABLE dbo.iTarget;
	
CREATE TABLE dbo.iTarget
(
   ID INT,
   Name varchar(100)
);

and populated it with some dummy data:

INSERT INTO dbo.iSource (ID,Name)
SELECT TOP 10000
ROW_NUMBER() OVER (ORDER BY t.object_id) AS rownumber
,'Name_'+convert(varchar(4),ROW_NUMBER() OVER (ORDER BY t.object_id))
FROM sys.tables t
CROSS JOIN sys.stats s;

INSERT INTO dbo.iTarget (ID,Name)
SELECT TOP 10000 
ROW_NUMBER() OVER (ORDER BY t.object_id DESC) AS rownumber --Done in descending order
,'Name_'+convert(varchar(4),ROW_NUMBER() OVER (ORDER BY t.object_id))
FROM sys.tables t
CROSS JOIN sys.stats s;

SELECT ID, Name FROM iSource;
SELECT ID, Name FROM iTarget;

So we now have a source and target table with different Names and we’ll look to update the iTarget table with the information coming from iSource.

Method 1 – MERGE Statement

MERGE dbo.iTarget AS target
	USING (
	SELECT ID, Name
	FROM dbo.iSource
	 ) AS  source (ID, Name)
		ON (target.ID = source.ID)
		WHEN MATCHED AND target.Name <> source.Name 
		THEN 
			UPDATE SET Name = source.Name
	 WHEN NOT MATCHED THEN 
		 INSERT (ID, Name)
		 VALUES (source.ID, source.Name); 

Using this method simply in SSMS for simplicity, profiler output 2 rows for Batch Starting and Batch Completing, CPUTime of 125ms and Duration of 125ms and it updated 6678 records. Top stuff, as expected.

Method 2 – SSIS SCD Component
I rebuilt the tables to put them back to where we started and set about creating the same thing in SCD setting ID as the business key and Name as the changing attribute and not setting inferred members, below is a screen dump of the outcome of this:

BEFORE:

I clear down the profiler and run the ssis package and the outcome is quite astounding.

DURING/AFTER:

The profiler output 13456 rows including 6678 rows of queries like this:

exec sp_executesql N'SELECT [ID], [Name] FROM [dbo].[iTarget] WHERE ([ID]=@P1)',N'@P1 int',8

as well as 6678 rows of queries similar to this:

exec sp_execute 1,'Name_3304',3304

Total Duration of 37 seconds (yes that’s seconds not ms!!)…….and this is on a table of only ~7k rows!

Well I’ll be damned, the SCD basically runs a cursor looping each record checking for a match on ID and updating that record if so. I can’t actually believe that MS have built a component which performs in this way.

So, to answer the question asked ” why I didn’t use SSIS’s built in component Slowly Changing Dimension (SCD)?”, I now have a definitive answer, it doesn’t perform!

I’m sure SCD has its place but for me, the requirements and the datasets I’m working on I think I’ll stick with MERGE for now….. 🙂

NOTE: This was done on SQL Server 2008R2 Developer Edition running on Windows 7 Ultimate, not sure if SQL Server 2012 has improved the SCD performance but I’ll leave that for another day.

Advertisements
Powershell and Dynamic SSIS (or any for that matter) Config files — December 5, 2011

Powershell and Dynamic SSIS (or any for that matter) Config files

As I imagine that the majority of people who are reading this will have some level of SSIS knowledge, I’ll not go into explanations about package configurations in SSIS and its various methods but rather jump straight to it.

The majority of environments I’ve worked in where SSIS packages are utilised, tend to sway down the XML Config package configuration method. As many of you are aware, in multi-tier SQL Environments (ie. Integration, QA, UAT etc etc) this can be a pain when deploying the packages to the relevant environments because you have to at some stage reconfigure the XML configuration file to have the correct parameters to pass into the package for each environment. This can become an even worse scenario when you have tens if not hundreds of SSIS packages (and corresponding dtsConfig’s) and are upgrading your infrastructure with new servers (and/or instance names) as well as drive configurations.

If you don’t have the time to be re-working the SSIS packages to use a SQL table (depending on your process this could take a while to get through development, testing etc) to hold the configuration parameters which makes it easy to script, deploy and update then here’s a simple trick using Powershell (i’m still at the very basic Powershell level so bare with me!!) you can use for your existing dtsConfig files. The sample i’ll be using is for a dtsConfig used in a Restore package.

Unfortunately you’re still going to have to do some initial amending of one config file here :(.

Firstly, lets amend the relevant parameter values to have a standard name for each, as an (snipit) example:

<Configuration ConfiguredType=”Property” Path=”\Package.Connections[master – destination].Properties[ServerName]” ValueType=”String”>
<ConfiguredValue>*SQLInstance*
</Configuration>
<Configuration ConfiguredType=”Property” Path=”\Package.Variables[User::DataFilesDir].Properties[Value]” ValueType=”String”>
<ConfiguredValue>*DataFilesDir*
</Configuration>

I’ve used a * to prefix and suffix the parameter so that you don’t amend anything that may have a similar name.

By doing this, you can run a bit of Powershell to update the relevant element parameter values for each instance by using the code below, NOTE i’ve excluded the setting of the parameters etc as I use this as a module and don’t want to waste space:

#Create Copy for dtsConfigs by piping the date from the static file (with *'d parameters to the new file to be used
(Get-Content $RestoreDatabasesDynamicConfig) |
Set-Content $RestoreDatabasesDynamicConfig_New

#Amend the dtsConfig's and create new files to use
$restoreArray = ("SQLInstance", "DataFilesDir", "LogFilesDir", "SSISRestoreLocationDBs")

#Loop through the array to replace each parameter listed
foreach($Arr in $restoreArray){
	$Replace = "\*"+$Arr+"\*" #This sets the parameter with the * prefix and suffix
	$WithThis = "$"+$Arr #What parameter name passed in to replace the text with, ie. $SQLInstance, $DataFilesDir

	switch ($Arr)
	{
		# Use $ExecutionContext.InvokeCommand.ExpandString($WithThis) to evaluate the string (ie $SQLInstance)
		"DataFilesDir" {$WithThis = $SQLInstallLocation+$ExecutionContext.InvokeCommand.ExpandString($WithThis)}
		"LogFilesDir" {$WithThis = $SQLInstallLocation+$ExecutionContext.InvokeCommand.ExpandString($WithThis)}

		#I've left the above in as an example of how to set the files to a folder location passed in as a Parameter ($SQLInstallLocation)
		default {$WithThis = $ExecutionContext.InvokeCommand.ExpandString($WithThis)}
	}

	#Now create the new dtsConfig file with updated Parameter information
   (Get-Content $RestoreDatabasesDynamicConfig_New) |
	Foreach-Object {$_ -replace "$Replace", $WithThis} |
	Set-Content $RestoreDatabasesDynamicConfig_New

}

In short, this script is taking the amended dynamic file with all the parameters with their *’s prefixed and creating a new dtsConfig file. It then builds an array of parameters to work through and do the replacement, the values of these are(or indeed can be…) passed through to the function/module. I’ve put a switch in there to check for particular items in the array as in my example i wanted to append a folder location to the value passed in. you don’t necessarily have to do this but left it in to show it can be done.

Another example of using this is for a silent SQL Server 2008 install. Once you have a ConfigurationFile.ini then you can follow the same process to put a standard tag in the file and use powershell to find and replace it with a parameter value – works an absolute treat when installing many instances

I’m sure there’ll be someone far more clever than me that can somehow the file for the element and replace any value without standardising the parameter values but I’m no expert with Powershell and learning everyday and hope some others out there can get some use out of this…..and yes, I also realise that you can do a “Find and Replace in Files” with Notepad++ but this technique is great for automating!