SQLGeordie's Blog

Helping the SQL Server community……where i can!

Azure DevOps – Job Dependencies — March 12, 2019

Azure DevOps – Job Dependencies

I wanted to through together a quick post here to help out others who may be having issues with running multiple jobs in a build pipeline and there being no consistency in what order they run in.

The Problem

When I first started with VSTS and ultimately Azure DevOps, I went through many failed builds because the order of the jobs in your pipeline don’t run in the order that you’ve built them and how you would logically believe them to run. The image below shows two Build Pipeline jobs but when the build is queued, whether this be manual or via CI, the second job is running before job #1. In this example the build will fail because Job #2 is to deploy a dacpac to a SQL Server on Linux Docker Container (Using Ubuntu Agent Host) but obviously this cannot be done until the dacpac has been created in Job #1 which is running on a VS2017 Agent Host:

1 – Job running in wrong order

The reason for this is that when you specify multiple jobs in a build pipeline , by default they run in parallel. In my experience I never found that both jobs ran in parallel, always one after the other so it doesn’t seem to quite match what the Microsoft docs states but it’s not something I’ve ever spent the time investigating further.

This can obviously be very frustrating especially as (from my testing) there is no consistency in which order they run but I did find that cancelling a build or re-running a build straight after a failure seemed to through it out of sync whereas previously it was all running in the correct order.

The Fix

To stop this sporadic order of job running you can set Job dependencies in Azure DevOps. The process is so simple to setup and if you didn’t know about it and have been tearing your hair out over failed builds due to ordering then you’re going to kick yourself when you see the simplicity.

All you need to do is select the job you wish to start after another (other specific conditions can be applied) job completes and scroll to Dependencies, click the drop down and (in this example there is only one) the job you want to depend on will be in the list:

Select Job Dependency

That’s it, couple of clicks and your job ordering is sorted 😉

Advertisements
CI/CD with Docker and Azure DevOps – Part 2 (Creating an Azure DevOps Pipeline) — February 14, 2019

CI/CD with Docker and Azure DevOps – Part 2 (Creating an Azure DevOps Pipeline)

For video demos please see: Database CI/CD with Containers (Docker) and Azure DevOps (Demo’s – YouTube)

Introduction

In part 1 of this series we went about setting up our Azure DevOps account, creating a project and adding a Database Project to it. In Part 2 we will look to run through creating a new build pipeline creating a new Docker Image and pushing it to DockerHub.

NOTE: In Part 3 we will change to using Microsoft Container Registry (MCR) for two reasons:

  • To highlight the issues with using DockerHub with Azure DevOps
  • Because we can 🙂

Before we begin creating our build pipeline, it is advised that a Service Connection to Docker Hub (we will also be creating one for Kubernetes in Part 2) is created. This means we aren’t entering passwords / other secure information into our YAML file.

We can do this by selecting Service Connections from the Project Settings. From the image below you can see that there are a large variety of service connections that can be created but we will be choosing Docker Registry:

Simply give the connection a name and enter the Docker ID and Password:

NOTE: once you have created this you need to re-save your pipeline before it will work. This Resource Authorization link provides more information but Microsoft are working improving the user experience on this.

Now that we have created the service connection, we can now look to create our Build Pipeline. Select Builds from the menu on the left and click “New Pipeline”

Select the repository where the project you wish to build resides, in our case it is Azure Repos:

Select the project we have created – TestProj”:

You will now be presented with the option to use a Build Pipeline template or start from scratch.

One of the templates is “Docker Image” so we will choose that one:

This will auto generate some YAML code to get you started:

As we are using DockerHub as opposed to MCR we have to make a change to the azure-pipelines.yml file which will be used.

This link provides more information but in short we need to change the filename:

If you have a Docker Hub account, and you want to push the image to your Docker Hub registry, use the web UI to change the YAML file in the build pipeline from azure-pipelines.yml to azure-pipelines.docker.yml. This file is present at the root of your sample repository.

https://docs.microsoft.com/en-gb/azure/devops/pipelines/languages/docker?view=azure-devops&tabs=yaml#example

Once you have made the change, annoyingly you don’t seem to be able to exit from the file with a simple “Save”, you have to “Save And Run”, which will initiate a failed build.

You can pull the latest changes locally and view/change the file in VS if you prefer:

NOTE: You will also need to update the Pipeline to use the new file. You can do this using the Visual Editor:

So, we now have our YAML file name updated and commited as well as the build pipeline updated to use it. However, before we proceed we need an actual Docker image and push that to our Docker Hub repo

Pull latest SQL Server 2019 on Linux image to local repository:

docker pull mcr.microsoft.com/mssql/server #This will pull the latest version 

Now push this image up to Docker Hub giving it the tag “testprojsql2019“:

docker tag mcr.microsoft.com/mssql/server:latest sqlgeordie/azuredevops:testprojsql2019
docker push sqlgeordie/azuredevops:testprojsql2019

Using VSCode for output:

We’re not quite ready to run our build, the build pipeline doesn’t create a Dockerfile so we need to create this ourselves. If we don’t we get this error:

"unable
to prepare context: unable to evaluate symlinks in Dockerfile path:" lstat
/home/vsts/work/1/s/Dockerfile: no such file or directory

Dockerfile

FROM sqlgeordie/azuredevops:testprojsql2019
RUN mkdir -p /usr/src/sqlscript
WORKDIR /usr/src/sqlscript
CMD /bin/bash  

Now, we have to amend the YAML file to login in to DockerHub for us to be able to pull down the image in order to build using the Dockerfile. You will notice in the image below that i have highlighted “build an image”, the reason for this is relevant in the next section.

Build input:

steps:
 task: Docker@1 
 displayName: 'Build an image'
 inputs:
 containerregistrytype: 'container Registry'
 dockerRegistryEndpoint: sqlgeordiedockerhub
 imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)'
 command: build an image
 dockerFile: '**/Dockerfile' 

Login input:

 task: Docker@1
 displayName: Login
 inputs:
     containerregistrytype: 'container Registry'
     dockerRegistryEndpoint: sqlgeordiedockerhub
     command: login 

Push Input:

task: Docker@1
   displayName: 'Push an image'
   inputs:
     command: push an image
     imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)' 

There are examples on GitHub docs which have (in my opinion) errors. For example, I mentioned earlier that I highlighted “build an image” for a reason, that reason is that it is incorrectly stated as “build” (also the same for “push”) on GitHub and this gives errors.

Complete YAML File

#Docker image
#Build a Docker image to deploy, run, or push to a container registry.
#Add steps that use Docker Compose, tag images, push to a registry, run an image, and more:
#https://docs.microsoft.com/azure/devops/pipelines/languages/docker

trigger:
 - master 
pool:
 - vmImage: 'Ubuntu-16.04'
steps:
 - task: Docker@1 
   displayName: 'Build an image'
   inputs:
     containerregistrytype: 'container Registry'
     dockerRegistryEndpoint: sqlgeordiedockerhub
     imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)'
     command: build an image
     dockerFile: '**/Dockerfile'
 - task: Docker@1
   displayName: Login
   inputs:
     containerregistrytype: 'container Registry'
     dockerRegistryEndpoint: sqlgeordiedockerhub
     command: login
 - task: Docker@1
   displayName: 'Push an image'
   inputs:
     command: push an image
     imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)' 

The “incorrect” example in the docs is:

- task: Docker@1
     displayName: Build image
     inputs:
     command: build
          azureSubscriptionEndpoint: $(azureSubscriptionEndpoint)
          azureContainerRegistry: $(azureContainerRegistry)
          dockerFile: Dockerfile
          imageName: $(Build.Repository.Name) 

The strange thing is that if you edit the file directly online there is no error:

However, if you edit the pipeline you see the red syntax error “squiggle“:

Please make sure you change this otherwise you will receive and error in your build.

To get back on track, in theory, we should now be able to run the build which will pull the image from our DockerHub repository, and initiate building a new Docker image from our Dockerfile (granted a very basic build) and push it back up to DockerHub

We can now check the image exists in Docker Hub:

Pull it down:

Check local images:

There we have it, we have successfully built a new Docker image from a Dockerfile which resides on DockerHub and pushed the newly created image back to DockerHub.

In the Part 3 we will look to expand on this and look to incorporate it into the TestProj we created in Part 1 and show how we can push changes made to our TestDBProj to Azure DevOps to initiate a build process to provide a new Docker Image with our changes applied.

I have created some videos (no audio I’m afraid) of this process which was used as part of my sessions at Data In Devon and DataGrillen earlier this year. I will be looking to replace these with either a full blog post (ie. Part 3) or perhaps re-record the videos with audio.

CI/CD with Docker and Azure DevOps – Part 1 (Creating our Database Project) —

CI/CD with Docker and Azure DevOps – Part 1 (Creating our Database Project)

For video demos please see: Database CI/CD with Containers (Docker) and Azure DevOps (Demo’s – YouTube)

Introduction

I decided to throw together some steps on how you can start utilising SQL Server running on Linux in a Docker container a little more than just doing a docker run, connecting via SSMS/sqlcmd, running a few commands then blowing it away and saying you now know docker 🙂

DevOps, CI/CD, automation, whatever you want to call it is a massively hot topic at the moment and I’ll be running through how you can get yourself started with a database project (.sqlproj) and source control using Azure Repo (Repository). We will then run through how to take a Docker image (this can be stored in Dockerhub or Microsoft Container Registry) and push any changes to this via an Azure DevOps build pipeline to create a new Docker image with your changes applied. For the purposes of this blog post I will be using DockerHub but Part 3 will be using the Microsoft Container Registry (MCR) due to issues I have discovered in the Deployment pipeline in Azure DevOps meaning that I can’t use DockerHub.

This is quite a lengthy post so I will save the Deployment pipeline to run the newly created image in a Kubernetes cluster ready for UAT, Pre-Prod or even Production (if you’re brave enough) for part 3. Or, if you’re in Exeter for Data in Devon on Saturday April 27th, or DataGrillen on Thursday/Friday 20th/21st June then come and see my session where we will delve a bit more into this.

Seeing as most of this will be run in Azure, there are small costs associated with running the Kubernetes cluster but everything else in part 1 and part 2 are (at the time of writing) completely free – so no excuses! the list of tools used are listed below:

  • SSDT (Free)
  • Azure DevOps including Azure Repo (Free)
  • Docker for Windows Community Edition (CE)
  • DockerHub (Free, Ts&Cs apply)
  • Azure Kubernetes Service (AKS) (Not Free but costs can be kept low)

Install SSDT

First off, in order to create our Database Project (sqlproj) then we will need to download and install SQL Server Data Tools. This will install a Visual Studio Shell so you don’t necessarily need a full blown edition of Visual Studio. You can download it here.

Setup Azure DevOps Account

Azure DevOps is a one stop shop for your Development life cycle requirements and can simplify the automation of an entire continuous integration (CI) and continuous delivery (CD) pipeline to Azure . There are several tools within it including:

  • Boards
  • Pipelines
  • Repos
  • Test Plans
  • Artifacts

There are paid for aspects of Azure DevOps. For example if you wish to make an Artifact available to teams larger than 5 users then you will have to pay $30/mth. Everything we will be using here is free!

To get started, you will need to create an account or if you have one already, sign in:

Once you have created your account and went through the various authentication/verification, you can start creating a project in Azure Repo. In the screen shots below you can see that even a Private repo is free which is fantastic!

Give it a name and Description if you wish and choose a version control type. I have chosen Git because that’s what all the cool kids are using.

Congratulations! Your project is now created and ready to connect to using VS / SSDT.

Connect using Visual Studio 2017

You can connect directly to your Azure Repo (Git) project directly from VS, this is done by selecting the Team Explorer menu option:

Enter your login details and select the repo

You will have to “Map” the Git Repo to a folder on your local machine, this can be anywhere so choose somewhere suitable.

Enter the details you used to create the Azure DevOps account and Save Git Credentials so you won’t have to re-enter these later.

That is it, you are now connected to your Azure Repo so any project you make changes to locally can now be pushed to an online repository!

For this demo I have created a folder in the project called “TestDBProj” in Solution Explorer, this isn’t necessary but I like to try and keep things a little tidy:

Looking at Team Explorer, everything is now setup and you have now cloned the TestProj project to your local development machine. We can now get started with making a few changes.

Create Database Project

Seeing as TestProj literally has nothing in it, we need to create a new Database project and we can call it TestDBProj:

Fantastic, our TestDBProj is now created and we can start adding objects to it. At this point if you have a database schema you need to import you can do that using the schema compare but again, keeping it simple we’ll just add a new table called TestTable1:

Add a new column called [descr]:

We’ve made some changes but at the moment our project has not been added to Source Control so let’s add it:

Check the changes and commit making sure you add a commit message:

NOTE: This can also be done via command line but as this is an introduction / getting started we are using the Gui.

You should now get a message saying that the changes have now been committed locally:

To get the changes into our Azure Repo we need to push them, or in the case of the Gui “Sync” our changes and using “Push”:

There we have it, our changes have now successfully pushed to our Azure Repo and we can check online and it should now be available:

In Part 2 of this series we will start looking at the Azure DevOps Build Pipeline and how to create a new (very basic changes made) Docker Image from an Image stored in DockerHub and push it back.

I have created some videos (no audio I’m afraid) of building an Azure DevOps Pipeline which will create a new docker image with the changes made to the project and deploy it to Kubernetes. These are part of the demos used as part of my sessions at Data In Devon and DataGrillen earlier this year.

Creating a Power BI demo environment with SQL Server 2019 on Linux in a Docker Container as the data source. — December 5, 2018

Creating a Power BI demo environment with SQL Server 2019 on Linux in a Docker Container as the data source.

I (and many others) have done a series of docker blog posts over the last couple of years but they’ve all tended to evolve around spinning up a SQL Server 2017+ container for testing or demo purposes. This is only really the start, think of the bigger picture here, once you have your database environment the world is your oyster.

This blog post will show how we can use SQL Server 2019 CTP2.1 running on Linux (Ubuntu) in a docker container as our data source for a Power BI environment in next to no time!

These steps show a very manual process for completing this setup, if it is something you are looking to do frequently then I suggest creating a Dockerfile and/or yml file and use docker-compose. This way you can have all your setup in one file and it will be a single statement to get your SQL Server 2019 environment up and running.

Prerequisites:

Setup

Once you have Power BI installed you will have to enable Hyper-V windows feature before installing Docker for Windows. If you don’t, the docker install will ask you if you wish to enable Hyper-V but in my experience this has rarely (ie. never) worked well – could just be me but I play it safe and enable it before installing. Running Linux containers on Windows 10 inevitably means that the Container will run a Isolated / Hyper-V Container which basically runs a cut down, lightweight Hyper-V VM.

Once DfWCE is installed, you will have to modify the Memory settings to be at least 3250MB. By default this is 2048MB but this isn’t enough to run SQL Server On Linux Containers and needs to be amended.

 

Make sure this is min 3250MB

Once configured, we’re now ready to create our SQL Server 2019 CTP2.1 on Ubuntu. As we are going to be creating a container with a database that we can use the data from (ie. AdventureWorks2016CTP3), we’re first going to create a data volume and copy the bak file (from C:\Docker\SQLServer\Backups) to it so it is available to restore:

#Create dummy container to define and copy backup file
docker container create --name dummycontainer -v sqldatavolume:/sqlserver/data/ sqlgeordie/sqlonlinuxdemo:demo

#Check that the container exists
docker ps -a --format $psformat

#Copy AdventureWorks or whatever you like ;)
docker cp C:\Docker\SQLServer\Backups\AdventureWorks2016CTP3.bak dummycontainer:/sqlserver/data/AdventureWorks2016CTP3.bak

#Check what volumes you have
docker volume ls

#Don't need the dummycontainer anymore so get rid
docker rm dummycontainer

NOTE: We have “created” a container but don’t need to run it so the status will indicate that it is “Created” and not the up-time:

You may be asking why I’ve created a datavolume as opposed to just bind mounting my local folder to the container – good question. There are many blog posts out there explaining the pros/cons of both but for me the biggest reason is that mounting a host folder goes against everything Containers are about – portability. The container shouldn’t care about where it is or anything outside of it, bind mounting means that wherever this container is created, you will need that host folder setup and available. Using a data volume means we can reuse this for other containers, even on another host if we were in an environment that has access.

Have a read here and here for more information.

Now we have our data volume with a copy of AdventureWorks2016CTP3.bak, we can proceed to creating the container:

#Create SQLLinuxLocalPersist container
docker run  -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=P@ssword1' `
--name SQLLinuxLocalPersist -d `
-p 1433:1433 `
-v sqldatavolume:/sqlserver/data `
mcr.microsoft.com/mssql/server:2019-CTP2.1-ubuntu

#Check it is running
docker ps -a

All going well, your container should be created and running:

The next step is to create the database from the backup, we will do this manually via Azure Data Studio (Other SQL Server Management GUI’s are available).

Simply connect to our locally created container. I have used the default SQL Server port of 1433 so we can use “Localhost”. As windows authentication is not currently supported, we have to use a SQL Login, in this example I’m using “sa”:

Next, we want to restore the DB:

You should now have a restored version of AdventureWorks2016CTP3 ready to use and abuse in Power BI. Simply setup a connection to (again) Localhost, no port mapping is required in this instance but if you had a different port to 1433 you’d have to use something like: Localhost,15105.

Once connected, you can play around with the data til your hearts content 🙂

There we have it, a fully working Power BI demo / sandbox environment ready for you to play around with. The longest aspect of this setup will be the downloading/installing of Power BI Desktop and DfWCE. Once setup, the scripting of creating the container can be done in a minute or two -AMAZING!!!!!

SQLNorthEast is now Newcastle DPaC! — November 21, 2018

SQLNorthEast is now Newcastle DPaC!

Having recently gone past our 5th birthday, we have decided to make some changes for 2019. Myself and Mike (t) have been extremely busy this year and perhaps not dedicated as much time to SQLNorthEast as we would have liked and have therefore made a decision to bring in additional help with the organising so I would like to announce two new additions to the committee in Glen Leck (t)and Mark Steinberg (t) – I hope they know what they’re letting themselves in for 😉

We are also in the process of re-branding the Usergroup from SQLNorthEast to Newcastle DPaC (Data Platform and Cloud) as we are not just SQL Server and cover most if not all areas of Microsoft’s Data Platform. The PASS website and associated details have also been amended and we have also setup a new Meetup which will be where our communications will be done from now on. Twitter has also been updated so please now use the handle @DPaCNewc.

This means that if you wish to find out the latest news and about the next meetup then please join our Meetup and make sure you follow us on twitter otherwise you may miss out!

Feel free to join our Newcastle DPaC slack workspace.

Our next meetup will be on Tuesday November 27th with Craig Porteous (b|t) and Paul Broadwith (b|t)talking Git Fundamentals & PowerShell Standards Agency. Both sessions are fantastic and definitely not one to be missed!!

Our schedule will be changing slightly to accommodate our new Power BI Meetup so we will be running these on alternate months:

Power BIThursday 17th January
DPaCTuesday 19th February
Power BITuesday 19th March
DPaCTuesday 16th April
Power BIWednesday 29th May
DPaCTuesday 18th June
Power BITuesday 16th July
DPaCTuesday 20th August
Power BITuesday 17th September
DPaCTuesday 15th October
Power BITuesday 19th November
DPaC/Power BI SocialTBC

The month’s without a date are up for grabs so get in touch with Chris Taylor (b|t)  or any of the other organisers if you wish to speak next year.

Newcastle Power BI Usergroup — November 4, 2018

Newcastle Power BI Usergroup

The people of the North East have asked and we have listened, introducing the Newcastle Power BI Usergroup

A newly created meetup spun off from Newcastle Data Platform and Cloud meetup to help people across the North East get up and running with Power BI by offering a huge range of talks from the very basics to some of the more advanced analytics. So whether you know nothing about Power BI and looking to learn or a seasoned Pro there will be something for everyone.

We will be running these meetup’s bi-monthly initially to gauge interest with an aim to have these running every month at Quorum Business Park.

With 25 new members within a couple of days of setting it up and before we even officially announced it, we are expecting great things from this new Usergroup 🙂

Join us on our Newcastle-PowerBI slack channel

 

 

Workaround for – The transaction log for database ‘master’ is full due to ‘NOTHING’ – Docker — April 24, 2018

Workaround for – The transaction log for database ‘master’ is full due to ‘NOTHING’ – Docker

As with anything open source you’re prone to things changing without you necessarily being aware and the error message in the blog title is something that has been sprung upon me whilst tidying up demo’s for my latest session on Docker and SQL Server On Linux. I hadn’t touched these particular demo’s since October 2017 so as to when things stopped working, I don’t know but they certainly worked the last time I did the session but doing a re-pull of the images, things broke.

With the micorosft/mssql-server-linux images, if you wish to mount a volume to a container then you will experience an error message I’ve never seen in SQL Server:

The transaction log for database ‘master’ is full due to ‘NOTHING’.

The transaction log for database 'master' is full due to 'NOTHING'

Apparently I’m not the only one experiencing this and there is an issue opened (Since October 2017!!!) on GitHub and social.msdn.microsoft.com describing the issue.

I have tested:

  • micorosft/mssql-server-linux:latest
  • micorosft/mssql-server-linux:2017-GA

and the issue was apparent on both images. The reason as highlighted on GitHub seems to be related to the filesystem, seeing as I’m running Docker for Windows I can’t change my file system from ext3 to ext4 like what others have done.

There is a workaround for this. The issue (certainly for me) only seems to be apparent if you try to mount a volume to a folder that already exists (ie. /var/opt/mssql/data/), mount to a new folder (ie. /sqlserver/data) then it will work:

Works:

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=P@ssword1' `
--cpus="2" `
--name SQLLinuxLocal1 -d -i `
-p 1433:1433 `
-v C:\Docker\SQLServer\Linux\SQLLinuxLocal:/sqlserver/data/ `
microsoft/mssql-server-linux:latest

Does not work:

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=P@ssword1' `
--cpus="2" `
--name SQLLinuxLocal2 -d -i `
-p 15106:1433 `
-v C:\Docker\SQLServer\Linux\SQLLinuxLocal:/var/opt/mssql/data/ `
microsoft/mssql-server-linux:latest

Or if you can, create a data volume instead and use that 😉

Once you have the files you need (ie. .bak file) in the /sqlserver/data folder within the container, you can then restore it using WITH MOVE to the /var/opt/mssql/data/ folder and this will work.

Introduction to Docker – Slides and Demo’s —
Delayed Durability in the wild… — November 14, 2016

Delayed Durability in the wild…

Background

We have recently been working on large data migration project for one of our clients and thought I would share how Delayed Durability helped us overcome a performance issue when the solution was moved to the client’s Development domain.

I won’t go into details of the project or the finer detail of our proposed solution as I have plans to put some more content together for that but in short the migration of the data was to be run by a (large) number of BIML generated SSIS (Child) packages for each table to be migrated, derived from a meta data driven framework with each stage being run by a master package, all of which run by a MasterOfMaster packages.

To maximize throughput, utilise as much processing power as possible, reduce the time it would take to run the migration and control the flow we built a series of sequence containers each running it’s own collection of Child Packages. We built the framework in such a way that these could be run in parallel or linear and each master package could contain as many containers (no pun intended) of child packages as required. This also allowed us to handle the order that packages were run in, especially those with dependencies whilst keeping the potential for parallelising (is that a word? No idea but I like it) the whole process as much as possible. Leaving the MaxConcurrentExecutables property to -1 mean’t we could push the processing to run up to 10 packages at once due to the VM having 8 cores (on Integration, 4 cores on Development) and this value of -1 allows the maximum number of concurrently running executables to equal the number of processors plus two.

An small example of how the MasterOfMaster and a Master Package for a stage looked is shown below:

Each container number could have Parallel and/or Linear Containers and both must succeed before the next Container level can start.

NOTE that this is just an example representation, naming conventions shown do not reflect the actual solution.

Problem

During development and initial testing on our own hardware, we had the migration at the time running at ~25minutes for around 600 packages (ie. tables) covering (what we termed) RawSource–>Source–>Staging which was well within the performance requirements for the stage that development was at and for what was initially set out. The rest of this blog post will hone in specifically on Source–>Staging only.

However, once we transferred the solution to the clients development environment things took a turn for the worse. In our environment we were running VMs with 8 cores, 16GB RAM and utlising SSDs. The client environment was running SQL Server 2016 Enterprise on VMWare vSphere 5.5, 8 vCPUs, 32GB RAM (for Integration, Development was half this) but the infrastructure team have done everything in their power to force all VMs onto the lower tier (ie. slow disks) of their 3-PAR SAN and throttle them in every way possible, just to make things more of a challenge. Even though the VM’s themselves were throttled we were confident that we wouldn’t see too much of a performance impact, especially as this was only a subset of the processing to be done so we needed it to be quick and it will only ever get longer and longer.

How wrong we were. On the first run the processing (for Source–>Staging) took around 141 minutes, yes you read that right, a full 116 minutes longer than the whole process took on our hardware! Wowza, didn’t see that one coming. I won’t delve too much into the investigations as again that will be saved for another blog post but essentially we were seeing a huge amount of the WRITELOG wait type since moving to the client environment. We believed the reason for this was due to the significant amount of parallel processing (running of SSIS packages in parallel loading to the same DB) we were doing and the SAN didn’t seem to be able to handle it. One other thing to note, due to truncations not being flagged as error’s in OLE DB Destination fast load data mode access, some of the packages that weren’t a direct copy where we knew the schema was exactly the same were run in non-fast load, ie row-by-row which puts additional stress on the system as a whole.

I will be blogging at a later date regarding how we managed to get everything running in fast load and handle the truncation via automated testing instead.

Solution

Enter Delayed Durability.

I won’t enter into too much detail regarding what this is or how it specifically works as this has been blogged by many others (Paul Randal, Aaron Bertrand to name just a couple) but my favourite description of delayed durability is comes from the msdn blogs and they refer to it as a “lazy commit“. Before you ask, yes we understood the issues of implementing such a change but the migration process was always a full drop and reload of the data so we didn’t care if we lost anything as we could simply run the process again.

Setting delayed durability at the database level we were able to control which Databases involved in the process we wished to have this without altering the BIML framework or code itself to handle it at the transaction level. By simply applying this to the Source and Staging databases we reduced the processing time from 141 minutes to 59 minutes. This wasn’t exactly perfect but shaving more than half the time off with one simple change and pushing the WRITELOG wait stat way way down the list was a great start.

As a side not, we have managed to get the processing from ~59mins to ~30mins without changing the VM/hardware configuration but I will leave that for another post.

Proof

When I first set out with this blog post it was only going to be a few paragraphs giving an insight into what we did however, I thought that all this would be pointless without some visualisation of the processing both before and after.

Row-by-Row with no Delayed Durability

We needed to get a baseline and where better to start than capturing the metrics through SentryOne and using Adam Mechanic’s spWhoIsActive we can see what I was talking about with the WRITELOG wait stat:

spwhoisactive_output

Granted the wait time themselves was relatively low, these were apparent almost every time we hit F5 and running our wait stat scripts in was in the top 3. A sample of the processing indicating this wait stat can also be seen below:

waitstats_writelog

As stated previously, overall the Source–>Staging process took 141 minutes and the overall processing from SentryOne PA was captured:

so_fullprocess_output

Row-by-Row with Delayed Durability

So when we ran the same process with Delayed Durability we can see straight away that the transactions/sec ramp up from ~7000 to ~12500. Top left shows without Delayed Durability, bottom left with Delayed Durability and right shows them side by side:

The overall process for Source–>Staging took only 59 minutes. I’ve tried to capture the before/after in the image below, the highlighted section being the process running with Delayed Durability forced:

SO_FullProcess_withDD_Output.png

You can see from this the drastic increase in Transactions/sec and reduction in Log Flushes.

Two package execution time examples (trust me that they are the same package) showing that with Delayed Durability the processing time was only 43% (166sec down to 72sec and 991sec to 424sec) ) of that without Delayed Durability set. Apologies for the poor image quality….

To me that is a huge reduction for such a simple change!

Conclusion

So should you go out and apply this to all your production databases right this second? No, of course you shouldn’t. We applied this change for to fix a very specific problem in an isolated environment and were willing to take the hit on losing data if the server crashed – are you, or more importantly your company willing to lose that data? I’m taking an educated guess that this will be a no but for certain situations and environments this configuration could prove to be very useful.

Links

Why would you never use SSIS Fast Load…? —

Why would you never use SSIS Fast Load…?

We all know that if you want SQL Server to push data into a table then you want to batch the inserts / use a bulk insert mechanism but is there a time when performance isn’t everything?

Background

Although it has its critics, SSIS is a very powerful tool for Extracting, Transforming and ultimately Loading data from and to various systems. I kind of have a love / hate relationship with SSIS, I love it but it seemingly hates me with a passion.

During a recent data migration project we had a series of packages using a stored procedure as the source and a SQL Server table as the destination. By using the OLE DB Destination task you have a series of options Data Access Modes which can provide various additional configurations. I won’t delve into all of these but have a look at the msdn link provided at the end for further information.

The ones I want to concentrate on are:

  • Table or view
  • Table or view – Fast Load

In short, fast load does exactly what it says on the tin, it loads data fast! This is because it is optimised for bulk inserts which we all know SQL Server thrives on, it isn’t too keen on this row-by-row lark.

Problem

Now, I won’t be providing performance figures showing the difference between running a package in fast load compared to row-by-row, this has been done to death and it is pretty much a given (in most cases) that fast load will out perform row-by-row.

What I do want to bring to your attention is the differences between the two when it comes to redirecting error rows, specifically rows that are truncated. One of the beauties of SSIS is the ability to output rows that fail to import through the error pipeline and push them into an error table for example. With fast load there is a downside to this, the whole batch will be output even if there is only 1 row that fails, there are ways to handle this and a tried and tested method is to push those rows into another OLE DB Destination where you can run them either in smaller batches and keep getting smaller or simply push that batch to run in row-by-row to eventually output the 1 error you want. Take a look at Marco Schreuder’s blog for how this can be done.

One of the issues we have exerienced in the past is that any truncation of a column’s data in fast load will not force the package to fail. What? So a package can succeed when in fact the data itself could potentially not be complete!?! Yes this is certainly the case, lets take a quick look with an example.

Truncation with Fast Load

Setup

I have provided a script to setup a table where we can test this. I will attempt through SSIS to insert data which is both below and above 5 characters in length and show the output.

USE tempdb;
GO

DROP TABLE IF EXISTS dbo.TruncationTest;
DROP TABLE IF EXISTS dbo.TruncationTest_error;

CREATE TABLE dbo.TruncationTest
(
TruncationTestID INT IDENTITY(1,1),
TruncationTestDescription VARCHAR(5)
)
GO

CREATE TABLE dbo.TruncationTest_error
(
TruncationTestID INT,
TruncationTestDescription VARCHAR(1000) --Make sure we capture the full value
)
GO

This code will set up 2 tables, one for us to import into (TruncationTest) and another to capture any error rows that we will output (TruncationTest_error).

I set up a very quick and dirty SSIS package to run a simple select statement to output 3 rows and use the fast load data access mode:

SELECT  ('123') AS TruncationTestDescription UNION ALL
SELECT  ('12345') UNION ALL
SELECT  ('123456789');

The OLE DB Source Editor looks like this:

sourceeditor

the OLE DB Destination data access mode:

desteditor_fastload

Finally, this is how the package looks:

package_fastload

Note the truncation warning. This is easy to see when viewing a package in Visual Studio, not so easy to pick up when you are dynamically generating packages using BIML.

Let’s run it……

package_fastload_success

Great, 3 rows populated into the TruncationTest table, everything worked fine! So let’s check the data:

SELECT * FROM dbo.TruncationTest

results_1

Eh? What happened there???? Where’s my ‘6789’ gone from row 3???

From this example you can see that the package succeeds without error and it looks as though all rows have migrated entirely but by querying the data after the package has completed you can see that the description column has indeed been truncated.

Let’s try the same test but changing the Data Access Mode to non-fast load (ie. Row-By-Row)

Truncation with row-by-row

In this example you can see that the row with truncation is in fact pushed out to the error pipeline as you would hope and expect.

desteditor_nonfastload

package_nonfastload_success

We now have 3 rows being processed but one row pushing out to the error pipeline which is what we would expect and hope for.

Let’s take a look at the output:

SELECT * FROM dbo.TruncationTest ORDER BY TruncationTestID
SELECT TruncationTestDescription FROM TruncationTest_error

results_2

The results highlighted in red are those from the fast load, in green are the results from the row-by-row indicating that the error row was piped out to the error table.

Solution(?)

You have a few different options here:

  1. Not really care and push the data through in fast load and suffer the concequences
  2. Run in row-by-row and suffer the performance hit
  3. Amend the OLE DB Source Output to be the same length as the destination column and redirect error rows from there.
  4. Probably loads of others involving conditional splits, derived columns and/or script tasks
  5. Apply option #1 and make sure that relevant (automated or otherwise) testing is applied

During the recent data migration project we were involved in we chose option #5. The reasons for this are:

  1. We wanted to keep the BIML framework, the code and the relevant mappings as simplistic as possible
  2. Performance was vital….
  3. …..but more importantly was the validity of the data we were migrating

We already had a series of automated tests setup for each package we were running and table we were migrating and we had to add to this a series of additional automated tests to check that no data itself was being truncated.

NOTE: Option #4 was also a very valid choice for us but due to the nature of the mapping between source and destination this was not something that was easily viable to implement.

I will leave the how we implemented these test this for another blog post 🙂

Conclusion

Taking a look at the error redirect in the OLE DB Destination we can clearly see that Truncation is greyed out and no option is provided so I have to assume that it simply isn’t an option to configure it here.

errorredirect

I used to have a link to an article which mentions that truncation cannot be deemed an error in a bulk import operation via SSIS due to the mechanics of how it all works but for the life of me I cannot find it :(. I am hoping someone who reads this will be able to provide me with this but for now I will have to draw my own conclusions from this. The closest thing I can find is an answer from Koen Verbeeck (b|t) in an msdn forum question where he states:

The only thing you get is a warning when designing the package.

You get truncation errors when you try to put data longer than the column width in the data flow buffer, i.e. at the source or at transformations, but not at the destination apparently.

What I still don’t understand is why in tSQL you will get an error when trying to “bulk insert” (loose sense of the term……ie. using an INSERT….SELECT) data that will truncate data but SSIS does not. Hopefully someone far cleverer than me will be able to shed some light on this!

The idea behind this blog post was not to focus too much on the importance of testing any data that is moved from one place to another but I wanted to highlight how easy it is to believe that what you are migrating is all fine n dandy because the SSIS package told you so but in actual fact you could be losing some very very important data!!

You have been warned 😉

Links