SQLGeordie's Blog

Helping the SQL Server community……where I can!

tempdb – size matters! — January 9, 2022

tempdb – size matters!

tldr: Over the years I’ve read a lot of blog posts and watched a lot of videos where they mention that you should have your tempdb files all the same size. What I haven’t seen much of (if any) is what performance impact you actually see if they are not configured optimally. This blog post aims to address that 😉

Introduction

We had a customer experiencing significant performance issues leading to application timeouts (30s), who called on Datamasterminds to investigate. Although the problem wasn't constant and only hit this extreme very infrequently, we were fortunate that they had invested in a SQL Server monitoring tool (SQL Sentry) which captured the historical performance.

Analysis

We looked at various dates/times where they had noted performance issues; an example is shown below:

It is obvious to see that it wasn't pretty during these times: very high PAGELATCH_UP (tempdb PFS) waits and the dreaded THREADPOOL waits can be seen, so we got to work setting up additional monitoring and analysing the database(s) and queries which were running at the time. Long story short, this led to a select few stored procedures creating/dropping a lot of temporary objects and in some cases running a very high number of inserts/deletes against them in loops/cursors. With the queries continuing to come in, the wait times got higher, ultimately leading to queries being queued and eventually hitting the 30s application timeout.
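If you don't have a monitoring tool capturing this history for you, a rough sketch along the following lines (run while the contention is actually happening) will show which tempdb pages sessions are queuing on. It simply filters sys.dm_os_waiting_tasks to page latch waits against database id 2, ie. tempdb:

--A minimal sketch: show PAGELATCH waits against tempdb (database id 2) while they are occurring
SELECT owt.session_id,
       owt.wait_type,
       owt.wait_duration_ms,
       owt.resource_description  --dbid:fileid:pageid, eg. 2:1:1 = the PFS page of tempdb file 1
FROM sys.dm_os_waiting_tasks AS owt
WHERE owt.wait_type LIKE N'PAGELATCH%'
AND owt.resource_description LIKE N'2:%';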

A good explanation of what Page Free Space (PFS) is can be found over at Microsoft Docs.

At the time, there were 8 tempdb files but they were all different sizes, so the usage was skewed. This is because of SQL Server's proportional fill algorithm, which will (more often than not) write to the file with the most free space. In this case, file id 1 was significantly larger (117GB) than any of the others (25-50GB), so it became the "de facto standard" when writing to tempdb, ultimately causing the contention we were seeing.
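A quick sketch to check how evenly (or not) your tempdb data files are sized and used:

USE tempdb;
GO
--Data files only; an evenly configured tempdb should show the same CurrentSizeMB for every file
SELECT file_id,
    name AS LogicalFileName,
    size/128.0 AS CurrentSizeMB,
    CAST(FILEPROPERTY(name, 'SpaceUsed') AS INT)/128.0 AS SpaceUsedMB
FROM sys.database_files
WHERE type_desc = 'ROWS';
GO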

Tempdb usage during the high wait times can be seen below (taken from SQL Sentry); note the variation in reads/writes to each file as well as the size differences:

Below is some of the output from sp_whoisactive during the high PAGELATCH_UP wait times. You will see the majority relate to the INSERTS and DELETES to temporary objects…..all in tempdb file id 1.

NOTE: This is just a snippet of the output, the number of queries was in the 1000’s 😲

The Fix

The interim fix was very straightforward: simply resizing the tempdb files to be the same size meant the proportional fill algorithm worked far better 💪
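As a rough sketch of what that resize looks like (the logical file names and the 50000MB figure below are placeholders, not the customer's actual values – check yours in tempdb.sys.database_files first):

USE master;
GO
--Repeat for every tempdb data file so they all end up the same size and growth increment
ALTER DATABASE tempdb MODIFY FILE (NAME = N'tempdev', SIZE = 50000MB, FILEGROWTH = 1024MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = N'temp2',   SIZE = 50000MB, FILEGROWTH = 1024MB);
--...and so on for the remaining files; any file that needs to get smaller rather than larger
--will need a DBCC SHRINKFILE (or a restart after changing the size) rather than MODIFY FILE
GO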

We’re still working with the customer on the performance tuning efforts to reduce resource usage and contention seen throughout.

The following list from Microsoft explains how increasing the number of equally sized tempdb data files reduces contention:

  • If you have one data file for the tempdb, you only have one GAM page, and one SGAM page for each 4 GB of space.
  • Increasing the number of data files that have the same sizes for tempdb effectively creates one or more GAM and SGAM pages for each data file.
  • The allocation algorithm for GAM allocates one extent at a time (eight contiguous pages) from the number of files in a round robin fashion while honouring the proportional fill. Therefore, if you have 10 equally sized files, the first allocation is from File1, the second from File2, the third from File3, and so on.
  • The resource contention of the PFS page is reduced because eight pages at a time are marked as FULL because GAM is allocating the pages.

Hopefully this blog post gives you an insight into what sort of issue you can see if you don’t take the advice of Microsoft, Consultants or indeed anyone telling you to size all your tempdb files the same 🤔

DBCC ShrinkFile Error Message: File ID 1 of database ID ‘nn’ cannot be shrunk as it is either being shrunk by another process or is empty —

DBCC ShrinkFile Error Message: File ID 1 of database ID ‘nn’ cannot be shrunk as it is either being shrunk by another process or is empty

We’re currently working with a customer on an Archiving project and as part of it trying to reduce their 8.5tb database down to where it should/needs to be (~5tb) in order to be able to restore it in their dev/test environments. Unfortunately adding more disk space in these environments is not an option so as we remove data we are forced to shrink the database files.

I’ll be posting some issues you may encounter when trying to shrink a very large database filled with a ton of (B)LOB data but this post focuses on and issue experienced whilst trying to get the shrink to run.

Although we haven’t pinpointed the cause 100% yet, occasionally there were times when the shrink process would just bomb out after ~2mins with the message below:

File ID 1 of database ID 10 cannot be shrunk as it is either being shrunk by another process or is empty. [SQLSTATE 01000] (Message 5240)

This is usually a message seen when a backup is currently running against that database, but in this case it was not. Coincidentally, the last full backup had taken significantly longer than it should have while this shrink process was trying to run (and bombing out), so the two could be linked. To get the process running again there is a little trick you can use: simply increase the database size by "a very small amount".

In our case we just used 1MB and the script is below:

USE DBNameHere;
GO

SELECT DB_NAME() AS DbName, 
    name AS FileName, 
    type_desc,
    size/128.0 AS CurrentSizeMB,  
    size/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS INT)/128.0 AS FreeSpaceMB
FROM sys.database_files
WHERE type IN (0,1)
AND DB_NAME() = 'DBNameHere'
GO

-- Use the CurrentSizeMB and add 1MB
USE [master]
GO
ALTER DATABASE [DBNameHere] MODIFY FILE ( NAME = N'DBNameHere', SIZE = 7820225MB )
GO
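
With the file bumped up by that 1MB, the shrink can then be kicked off again. A minimal example is below – the logical file name and the target size (specified in MB) are placeholders for your own values:

USE [DBNameHere];
GO
--Pick a realistic target size rather than shrinking to the minimum
DBCC SHRINKFILE (N'DBNameHere', 5000000);
GO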

Hopefully this will be useful for others that may be stuck in a similar situation.

Should I split my SQL Server Drives on a SAN in 2021? — February 7, 2021

Should I split my SQL Server Drives on a SAN in 2021?

NOTE: This blog post references HPE as our example but is relevant to other storage vendors out there. I am in no way affiliated with HPE 😉

Back in the day, "when I was a lad", the recommendation for SQL Server was to split your data, log and tempdb files onto separate drives/LUNs to get the most out of your storage. Jump forward to 2021: is this still relevant, and should I be splitting my SQL Server drives onto separate LUNs on new SAN storage? It's a question often asked not just by customers but also by their 3rd party managed service providers / hosting companies. The question can also be along the lines of, "Why can't we just put everything on C:\ because the backend is all on the same LUN?". This is slightly different as they're questioning the drive lettering more than creating separate LUNs, but it's still relevant to this topic.

The majority (not all) of SQL Servers will have a SAN to host their storage, and SANs these days are super duper quick, especially those that have tiered SSD or, even fancier, flash storage. The technical differences between the older spinning rust and the new fancy dan flash storage are not something we'll delve into here as there are plenty of other blogs out there and it's not really in the scope of this blog post.

Each storage vendor will (or should) provide their own documentation on how the SAN should be configured for SQL Server to get the best bang for your buck. Taking HPE as an example, they have PDFs for their various offerings including 3PAR/Primera as well as Nimble. Although there are some slight differences, each of them suggests that you SHOULD split your drives onto separate volumes.

I won’t disect the documents in their entirety but some of the sections which will help with answering the question but these mostly relate to which performance policy to set for your data, logs and tempdb based on the workload (ie. OLTP / OLAP and size of files):

  1. Array Snapshots and remote replication
    • You may not (won’t) want tempdb as part of this due to its large rate of data change
  2. Volume Block Size
    • According to the documentation, depending on the workload, you may (or may not?) want 8kb for data and 4kb for logs as per their default policy
  3. Caching
  4. Deduplication
  5. Compression
  6. Number of disks available

To provide a great overview, below is a snippet from the HPE Nimble recommendations:

Storage Layout for SQL Server Volumes
In general, using multiple volumes for SQL Server deployments offers several advantages:
• The ability to apply custom performance policy settings to specific volumes
• The ability to control cache use for volumes on adaptive flash arrays
• The choice of limiting replication of certain volumes, such as the tempdb volume
• A gain in performance for I/O-intensive databases, especially if the queue depth of a device is at risk of becoming saturated
• The ability to group databases in separate volume collections and apply custom backup and replication schedules
• The ability to quickly recover specific databases with volume-based restores
Before designing the storage layout, carefully consider the administrative overhead of managing multiple volumes. The ideal solution provides a balance between optimization and administration.

Allocation Unit Size

Something that often comes up during these conversations is the configuration of the volume formatting. Regardless of the chosen performance policy and indeed volume block size, the default recommendation from HPE is to use a 64KB allocation unit size for your data, log and tempdb volumes. This is only a recommendation; testing in your specific environment will truly give you the answer as to what allocation unit size to set.

Additional Information

Below are some further snippets from the HPE documentation regarding default configurations:

HPE 3PAR/Primera:

Nimble:

Comparison of SSD / tiering uses for SQL Server files:

Conclusion

Should you split your SQL Server drives in 2021? The Nimble documentation gives a sentence which sums it up very well:

The ideal solution provides a balance between optimization and administration.

Having everything thrown into a single pool will make life a lot easier for the SAN guy; splitting things up could lead to an administrative nightmare but may be required to get the best out of the storage for your workload.

What I will say is: review your storage vendor's documentation / recommendations, compare it with the sort of environment / workload you have and, if it fits your specific setup, configure it, test it and use it as a baseline to compare with other configurations.

If testing proves that their guide/recommendations do not provide optimal performance for your environment then it is perfectly fine to deviate from them, as long as the admin overhead is manageable. All I would add is that whatever configuration decision is made, make sure the reasoning is fully documented, as someone several years later will no doubt ask why 🙂

Docker in 12 Steps – YouTube — December 30, 2020

Docker in 12 Steps – YouTube

I have eventually got around to starting to tidy up my YouTube channel since I moved it and the process made a proper mess of everything 🤦‍♂️

Want to get started with Docker, containers and even using SQL Server on Linux in just 12 easy, hands-on steps (short videos)? If you've moved onto this sentence then the answer to the previous question must have been YES!

Have a look, I’ve purposely kept them as short as possible, the total time for all 12 videos is less than 90 minutes so there really is no excuse 😉

SQLGeordie – Docker in 12 Steps

Creating a Power BI demo environment with SQL Server 2019 on Linux in a Docker Container as the data source. — December 5, 2018

Creating a Power BI demo environment with SQL Server 2019 on Linux in a Docker Container as the data source.

I (and many others) have done a series of Docker blog posts over the last couple of years, but they've all tended to revolve around spinning up a SQL Server 2017+ container for testing or demo purposes. This is only really the start; think of the bigger picture here: once you have your database environment, the world is your oyster.

This blog post will show how we can use SQL Server 2019 CTP2.1 running on Linux (Ubuntu) in a docker container as our data source for a Power BI environment in next to no time!

These steps show a very manual process for completing this setup. If it is something you are looking to do frequently then I suggest creating a Dockerfile and/or yml file and using docker-compose. This way you can have all your setup in one file and it will be a single statement to get your SQL Server 2019 environment up and running.

Prerequisites:

Setup

Once you have Power BI installed you will have to enable the Hyper-V Windows feature before installing Docker for Windows. If you don't, the Docker install will ask you if you wish to enable Hyper-V, but in my experience this has rarely (ie. never) worked well – it could just be me but I play it safe and enable it before installing. Running Linux containers on Windows 10 inevitably means that the container will run as an Isolated / Hyper-V container, which basically runs inside a cut-down, lightweight Hyper-V VM.

Once DfWCE is installed, you will have to modify the memory settings to be at least 3250MB. By default this is 2048MB, which isn't enough to run SQL Server on Linux containers, so it needs to be amended.


Make sure this is min 3250MB

Once configured, we're now ready to create our SQL Server 2019 CTP2.1 on Ubuntu container. As we are going to be creating a container with a database whose data we can use (ie. AdventureWorks2016CTP3), we're first going to create a data volume and copy the .bak file (from C:\Docker\SQLServer\Backups) to it so it is available to restore:

#Create dummy container to define and copy backup file
docker container create --name dummycontainer -v sqldatavolume:/sqlserver/data/ sqlgeordie/sqlonlinuxdemo:demo

#Check that the container exists ($psformat is a custom --format string; a plain 'docker ps -a' works just as well)
docker ps -a --format $psformat

#Copy AdventureWorks or whatever you like ;)
docker cp C:\Docker\SQLServer\Backups\AdventureWorks2016CTP3.bak dummycontainer:/sqlserver/data/AdventureWorks2016CTP3.bak

#Check what volumes you have
docker volume ls

#Don't need the dummycontainer anymore so get rid
docker rm dummycontainer

NOTE: We have "created" a container but don't need to run it, so its status will show as "Created" rather than an up-time:

You may be asking why I've created a data volume as opposed to just bind mounting my local folder to the container – good question. There are many blog posts out there explaining the pros/cons of both, but for me the biggest reason is that mounting a host folder goes against everything containers are about – portability. The container shouldn't care about where it is or anything outside of it; bind mounting means that wherever this container is created, you will need that host folder set up and available. Using a data volume means we can reuse it for other containers, even on another host if we were in an environment that has access.

Have a read here and here for more information.

Now we have our data volume with a copy of AdventureWorks2016CTP3.bak, we can proceed to creating the container:

#Create SQLLinuxLocalPersist container
docker run  -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=P@ssword1' `
--name SQLLinuxLocalPersist -d `
-p 1433:1433 `
-v sqldatavolume:/sqlserver/data `
mcr.microsoft.com/mssql/server:2019-CTP2.1-ubuntu

#Check it is running
docker ps -a

All going well, your container should be created and running:

The next step is to create the database from the backup. We will do this manually via Azure Data Studio (other SQL Server management GUIs are available).

Simply connect to our locally created container. I have used the default SQL Server port of 1433 so we can use "Localhost". As Windows authentication is not currently supported, we have to use a SQL login; in this example I'm using "sa":

Next, we want to restore the DB:
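If you'd rather script it than click through the GUI, a sketch of the equivalent T-SQL is below. The logical file names in the MOVE clauses are assumptions, so run the FILELISTONLY first and adjust accordingly:

--Check the logical file names inside the backup first
RESTORE FILELISTONLY FROM DISK = N'/sqlserver/data/AdventureWorks2016CTP3.bak';

--Restore, relocating the files to the default SQL Server on Linux data folder
RESTORE DATABASE AdventureWorks2016CTP3
FROM DISK = N'/sqlserver/data/AdventureWorks2016CTP3.bak'
WITH MOVE N'AdventureWorks2016CTP3_Data' TO N'/var/opt/mssql/data/AdventureWorks2016CTP3_Data.mdf',
     MOVE N'AdventureWorks2016CTP3_Log' TO N'/var/opt/mssql/data/AdventureWorks2016CTP3_Log.ldf',
     STATS = 10;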

You should now have a restored version of AdventureWorks2016CTP3 ready to use and abuse in Power BI. Simply set up a connection to (again) Localhost; no port mapping is required in this instance, but if you had used a port other than 1433 you'd have to use something like: Localhost,15105.

Once connected, you can play around with the data to your heart's content 🙂

There we have it, a fully working Power BI demo / sandbox environment ready for you to play around with. The longest aspect of this setup will be downloading/installing Power BI Desktop and DfWCE. Once set up, scripting the creation of the container can be done in a minute or two – AMAZING!!!!!

Docker, Kubernetes and SQL Server on Linux series — June 20, 2018

Docker, Kubernetes and SQL Server on Linux series


I've been playing around with Docker (more specifically with SQL Server on Linux) since the beginning of 2017 to see if indeed I can "Build, Ship, Run, Any App Anywhere" and will be putting together a series of how-tos and issues experienced, so you guys don't spend countless hours trying to investigate and work out what is going on when things do not work.

I’ll be adding links to this main page as and when new posts are added as a central point of reference.

Workaround for – The transaction log for database ‘master’ is full due to ‘NOTHING’ – Docker — April 24, 2018

Workaround for – The transaction log for database ‘master’ is full due to ‘NOTHING’ – Docker

As with anything open source you're prone to things changing without you necessarily being aware, and the error message in the blog title is something that was sprung upon me whilst tidying up demos for my latest session on Docker and SQL Server on Linux. I hadn't touched these particular demos since October 2017, so as to when things stopped working I don't know, but they certainly worked the last time I did the session; after re-pulling the images, things broke.

With the microsoft/mssql-server-linux images, if you wish to mount a volume to a container then you will experience an error message I'd never seen in SQL Server before:

The transaction log for database ‘master’ is full due to ‘NOTHING’.

The transaction log for database 'master' is full due to 'NOTHING'

Apparently I'm not the only one experiencing this and there is an issue open (since October 2017!!!) on GitHub and social.msdn.microsoft.com describing the problem.

I have tested:

  • microsoft/mssql-server-linux:latest
  • microsoft/mssql-server-linux:2017-GA

and the issue was apparent on both images. The reason, as highlighted on GitHub, seems to be related to the filesystem; seeing as I'm running Docker for Windows, I can't change my file system from ext3 to ext4 as others have done.

There is a workaround for this. The issue (certainly for me) only seems to be apparent if you try to mount a volume to a folder that already exists (ie. /var/opt/mssql/data/); mount to a new folder (ie. /sqlserver/data) and it will work:

Works:

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=P@ssword1' `
--cpus="2" `
--name SQLLinuxLocal1 -d -i `
-p 1433:1433 `
-v C:\Docker\SQLServer\Linux\SQLLinuxLocal:/sqlserver/data/ `
microsoft/mssql-server-linux:latest

Does not work:

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=P@ssword1' `
--cpus="2" `
--name SQLLinuxLocal2 -d -i `
-p 15106:1433 `
-v C:\Docker\SQLServer\Linux\SQLLinuxLocal:/var/opt/mssql/data/ `
microsoft/mssql-server-linux:latest

Or if you can, create a data volume instead and use that 😉

Once you have the files you need (ie. the .bak file) in the /sqlserver/data folder within the container, you can then restore it using WITH MOVE to the /var/opt/mssql/data/ folder and this will work.
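A minimal sketch of that restore (the database and logical file names below are placeholders – check them with RESTORE FILELISTONLY against your own backup):

RESTORE DATABASE MyDemoDB
FROM DISK = N'/sqlserver/data/MyDemoDB.bak'
WITH MOVE N'MyDemoDB'     TO N'/var/opt/mssql/data/MyDemoDB.mdf',
     MOVE N'MyDemoDB_log' TO N'/var/opt/mssql/data/MyDemoDB_log.ldf';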

Introduction to Docker – Slides and Demos —
Docker and SQL Server – Useful links — July 23, 2017

Docker and SQL Server – Useful links

Docker

Below is a collection of links I've used as reference throughout my journey with Docker and running SQL Server containers. These have been invaluable for knowledge gathering and troubleshooting throughout this (at times) painstaking process.

First and foremost I would like to put a huge shout out to Andrew Pruski (b|t) for his Container Series which if you haven’t had a look at then I suggest you do, like now! He has put together some great content for anyone looking to get started with Containers.

I have tried to section them to make things easier to refer to, and I'm more than happy for others to suggest links to add:

Getting Started with Docker:

Tutorials

SQL Server on Linux:

Use cases:

General:

Licensing:

Installing:

Setup on Windows 10 VM:

SQL Server On Linux:

Connecting to SQL Server via sqlcmd:

Legacy SQL Server Containers

Performance

Issues and Troubleshooting

Using Volumes

Hyper-V containers

Videos

Output SQL Server data from multiple tables to Tab Delimited text files using Powershell — December 20, 2013

Output SQL Server data from multiple tables to Tab Delimited text files using Powershell

I had a request this morning for something I thought was actually very simple:

Client: “Can you extract all data for these particular tables including column headers to a tab delimited .txt file?”
Chris: “Sure, no problem, I’ll just run bcp querying sys.tables using a COALESCE loop to output the statements”
Client: “Top stuff, let me know when it’s done”

So, away I went generating my script, which took a matter of minutes, and ran it…….where's the column headers? Bugger, I forgot that bcp doesn't output column headers without doing some funky stuff like creating a header record in a separate file and merging that with the file of data.

With this in mind I knew creating an SSIS package (or using the export data wizard – very manual unless I delved into the realms of BIML) could do this, but I thought I'd have a look at PowerShell invoking sqlcmd.

Again, this all seemed to be going very well until I came to outputting the data to a tab-delimited .txt file. As far as I'm aware PowerShell does not have an Export-Txt, so I had to look into how I could use Export-Csv to actually output a tab-delimited .txt file as opposed to comma separated, and found the parameter -Delimiter "`t" – excellent!!! Added this in and ran the script…………and the first row consisted of "#TYPE System.Data.DataRow" – wtf!?!?!?!?!

A quick search on my search engine of choice showed that there is a parameter you can pass in to remove this from the export: -NoTypeInformation.

Ran it again with -NoTypeInformation and everything worked as expected, apart from all column headers and data having quotes (") around them, which was not part of the requirement. Unfortunately (as far as I know) there is no switch or parameter that removes these, so I had to change Export-Csv to ConvertTo-Csv and run a -replace to strip the double quotes, which managed to do the trick.

I’ve included the script below which can be tailored to your needs:

$server = 'ServerInstanceHere'
$database = 'DBNameHere'
$path = 'C:\work\ToDelete\'
$query = "SELECT name FROM sys.tables WHERE name IN (
    'TableNameHere_1',
    'TableNameHere_2'
    --etc etc
    )"

#Get list of table names to output data
$Tables = Invoke-Sqlcmd -Query $query -Database $database -ServerInstance $server
foreach ($Table in $Tables)
{
    $TableName = $Table["name"]
    #Build the per-table query inside the loop so $TableName has a value when the string is expanded
    $queryToOut = "SELECT * FROM $TableName"
    Write-Host -ForegroundColor Green "Creating File $TableName.txt"
    #ConvertTo-Csv (as opposed to Export-Csv) so the quotes can be stripped before writing the file
    Invoke-Sqlcmd -Query $queryToOut -Database $database -ServerInstance $server |
        ConvertTo-Csv -NoTypeInformation -Delimiter "`t" |
        ForEach-Object { $_ -replace '"','' } |
        Out-File "$path$TableName.txt"
    #Export-Csv -NoTypeInformation -Delimiter "`t" -Path "$path$TableName.txt"
}
