SQLGeordie's Blog

Helping the SQL Server community……where I can!

Microsoft Ignite 2020 – Book of News — September 24, 2020

Microsoft Ignite 2020 – Book of News

Microsoft Ignite 2020 is currently underway and if, like me, you can't attend every session you'd like to, Microsoft have you covered with their Book of News.

There's a ton of new stuff, and the Book of News gives a great overview of all the latest announcements.

Enjoy the conference and if you miss any of it, enjoy the Book of News 😃

Free DataSets – Kaggle — July 5, 2020

Free DataSets – Kaggle

Pretty much anything and everything you would need for a SQL Server, Azure, AWS, GCP, Power BI etc. demo dataset can be found on kaggle. The search facility is easy to use and you can also filter on the particular file type and license you want. I also like the "quicklook" option, which lets you quickly check whether it really is the sort of data you want rather than having to download it, open it and then realise it's nothing like what you are after.

I never knew a lot about kaggle and to be honest, other than being able to get datasets from it, I never had a need to. Taken from Wikipedia: "Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges."

It's actually no surprise that they're a subsidiary of Google when you look at Google's datasets: most, if not all, of them seem to link back to kaggle….

I’ve included a couple of useful datasets that I’ve used in the past, hopefully you’ll find them useful also:

Back to dataset main page

Free DataSets – Google —

Free DataSets – Google

For most people on the planet with internet access, the first place to visit when looking for anything on the web is Google……….other search engines are available. Not only can you search using the standard search engine, but Google also has various spin-offs specifically for searching datasets.

Google tracks dataset searches, allows you to explore them and provides the ability to download the data either directly from one of their charts or via backlinks to the site the data came from. Chances are, kaggle will be one of them, which is hardly surprising given it's a subsidiary of Google LLC 🙂

There are various places within Google to search for datasets, depending on what it is you're looking for. Below are some examples:

Back to dataset main page

Free DataSets…..for SQL Server, Azure, AWS, Power BI etc etc Demo Purposes —

Free DataSets…..for SQL Server, Azure, AWS, Power BI etc etc Demo Purposes

This is just a short overview page of the various dataset sources I've used in the past for my SQL Server, Azure, AWS and Power BI demos. Most, if not all, of these are free as I don't like paying for stuff, and there are some where you can generate your own data if you need something more specific……..just be careful: although it's "randomly" generated, I have had issues where I pushed the files to GitHub and, as it happens, one of the randomly generated email addresses was that of an employee of a company I'd never heard of. Pure chance (I believe), but something to be mindful of if that is your intended use.


Kaggle

Pretty much anything and everything you would need for a SQL Server, Azure, AWS, GCP, Power BI etc. demo dataset can be found on kaggle. The search facility is easy to use and you can also filter on the particular file type and license you want. I also like the "quicklook" option, which lets you quickly check whether it really is the sort of data you want rather than having to download it, open it and then realise it's nothing like what you are after.


Google

For most people on the planet with internet access, the first place to visit when looking for anything on the web is Google……….other search engines are available. Not only can you search using the standard search engine, but Google also has various spin-offs specifically for searching datasets.


Power BI Sample DataSets

Not only datasets but also Power BI sample pbix files for anyone looking to get started with it.


GenerateData – Create your own

If you are looking for a specific dataset, with specific column names and data types, then this is a useful tool. Data is restricted to 100 rows initially unless you sign up/donate, but it's worth a look:


data.gov.uk

At the time of writing there are 54,846 datasets available on here. Some are a bit bizarre (See below) and are specific to the UK so unless you’re looking for that kind of thing, it may not be the place for you.

Other very specific examples (apart from that below) I've found are "NHS Bassetlaw CCG Expenditure over £25K" and "Planned road works on the HE road network"………..


Super Bizarre DataSets

I've not used any of these but saw this article via Brent Ozar's weekly newsletter – https://thestacker.com/. Some of them sound pretty cool, while others are a bit out there, such as "Abandoned Shopping Trolleys in Bristol Rivers" from data.gov.uk and "The Great British Toilet Map"…….

MSBuild 2020 – Build Book of News — May 20, 2020
“Kubernetify” your Containers — February 23, 2020

“Kubernetify” your Containers

Adding the link to GitHub, which contains the slides and demos from the various events I have delivered this session at:

Github – Kubernetify Your Containers

To see them all I have linked the root folder; if you search for "kubernetify" you should see everything needed 👍

Abstract

We have all now had a play around with Docker and Containers or at least heard about them.

This demo heavy session will walk through some of the challenges around managing container environments and how Kubernetes orchestration can help alleviate some of the pain points.

We will be talking about what Kubernetes is and how it works and through the use of demos we will:

  • Highlight some of the issues with getting set up (specifically Minikube on Ubuntu)
  • Deploying/updating containers in Kubernetes (on-prem as well as AKS using Azure DevOps) – see the minimal sketch below
  • Persisting data
  • How to avoid making the same mistakes as I have
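
For anyone who hasn't seen one before, the sketch below shows the rough shape of the sort of SQL Server deployment the demos revolve around. It is a minimal, illustrative example only; the names, image tag and password are assumptions and not the actual scripts from the GitHub repo:

# Minimal, illustrative SQL Server on Linux deployment (not the demo scripts)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-demo                 # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql-demo
  template:
    metadata:
      labels:
        app: mssql-demo
    spec:
      containers:
        - name: mssql
          image: mcr.microsoft.com/mssql/server:2019-latest
          ports:
            - containerPort: 1433
          env:
            - name: ACCEPT_EULA
              value: "Y"
            - name: SA_PASSWORD          # use a Kubernetes secret for anything beyond a demo
              value: "P@ssw0rd123!"      # assumption: demo-only password

Applying this with kubectl apply -f works the same against Minikube or AKS, which is what the demos build on once volumes, upgrades and namespaces come into play.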
Database CI/CD with Containers and Azure DevOps — September 29, 2019
AKS SQL Server Error – 0/1 nodes are available: 1 node(s) exceed max volume count — September 21, 2019

AKS SQL Server Error – 0/1 nodes are available: 1 node(s) exceed max volume count

Background

Whilst playing around with my session for Techorama.nl I encountered an error I hadn't seen previously whilst deploying SQL Server on Linux in Azure Kubernetes Service (AKS):

0/1 nodes are available: 1 node(s) exceed max volume count

The YAML I used was only slightly modified (mainly names) from scripts used on minikube and docker-desktop, so I was a little confused as to why I was getting this in AKS.

As it happens, the reason is that I am tight and don't like spending money! During testing I drop my AKS node size to as small as I can have it, in this case a Standard_B2s (2 CPU / 4GB), which I'd never had issues with until this particular demo.

When playing around with AKS you may have used a single PersistentVolume (or no volumes at all) but this particular setup had:

  • 1 for systemdbs
  • 1 for sql data files
  • 1 for sql log files

Which, if you can do maths, equals 3 persistent volumes (i.e. 3 Azure data disks). That is fine for this particular Azure VM size as you can attach 4 disks. However, the issue arises once you start adding additional deployments with the same setup but in a different namespace: this takes you over the threshold of the 4 allowed disks and gives you the error that you have exceeded the max volume count 😦
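
In PersistentVolumeClaim terms the setup looked roughly like this (the claim names and sizes below are illustrative assumptions, not the exact YAML from my demo), with each claim backed by its own Azure managed disk:

# Three separate claims = three Azure disk attachments per deployment
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-systemdbs        # hypothetical names
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-data
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-log
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi

Two deployments like this in different namespaces means six disk attachments, which is already past the four-disk limit of a Standard_B2s.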

So how do you fix this?

The options are either to scale up your VM size or to alter your deployment to use fewer disks. In my case I could get away with having just 1 disk for systemdbs/data/logs. This is a demo environment, so I can do this 🙂
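
For illustration, here is a minimal sketch of the consolidated version (again, the name, storage class and size are assumptions rather than my exact demo YAML). A single PersistentVolumeClaim is mounted at /var/opt/mssql, which is where SQL Server on Linux keeps its system databases, data and log files by default, so each deployment only consumes one Azure disk:

# One PVC instead of three
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-storage              # hypothetical name
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: managed-premium   # assumption: a built-in AKS storage class
  resources:
    requests:
      storage: 10Gi

and in the deployment's pod spec the single claim replaces the three separate volumes:

      containers:
        - name: mssql
          # ...
          volumeMounts:
            - name: mssql-storage
              mountPath: /var/opt/mssql   # systemdbs, data and log files all land here
      volumes:
        - name: mssql-storage
          persistentVolumeClaim:
            claimName: mssql-storage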

Azure DevOps – Job Dependencies — March 12, 2019

Azure DevOps – Job Dependencies

I wanted to throw together a quick post here to help out others who may be having issues with multiple jobs in a build pipeline running in no consistent order.

The Problem

When I first started with VSTS and ultimately Azure DevOps, I went through many failed builds because the jobs in your pipeline don't run in the order that you've built them and would logically believe them to run in. The image below shows two build pipeline jobs, but when the build is queued, whether manually or via CI, the second job runs before Job #1. In this example the build will fail because Job #2 deploys a dacpac to a SQL Server on Linux Docker container (using an Ubuntu agent host), but obviously this cannot be done until the dacpac has been created by Job #1, which runs on a VS2017 agent host:

1 – Job running in wrong order

The reason for this is that when you specify multiple jobs in a build pipeline, by default they run in parallel. In my experience I never found that both jobs actually ran in parallel, always one after the other, so it doesn't quite seem to match what the Microsoft docs state, but it's not something I've ever spent the time investigating further.

This can obviously be very frustrating, especially as (from my testing) there is no consistency in which order they run, but I did find that cancelling a build or re-running a build straight after a failure seemed to throw it out of sync, whereas previously it was all running in the correct order.

The Fix

To stop this sporadic job ordering you can set job dependencies in Azure DevOps. The process is so simple to set up that if you didn't know about it and have been tearing your hair out over failed builds due to ordering, you're going to kick yourself when you see it.

All you need to do is select the job you wish to start after another job completes (other specific conditions can also be applied), scroll to Dependencies and click the drop-down; the job you want to depend on will be in the list (in this example there is only one):

Select Job Dependency

That's it, a couple of clicks and your job ordering is sorted 😉
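
The steps above use the classic (visual) pipeline editor. For reference, if you are using YAML pipelines instead, the same ordering is expressed with dependsOn; the sketch below is illustrative only, with made-up job names and placeholder steps:

jobs:
  - job: BuildDacpac              # hypothetical job building the dacpac on a VS2017 agent
    pool:
      vmImage: 'vs2017-win2016'
    steps:
      - script: echo "build the dacpac here"

  - job: DeployDacpac             # will not start until BuildDacpac completes successfully
    dependsOn: BuildDacpac
    pool:
      vmImage: 'ubuntu-16.04'
    steps:
      - script: echo "deploy the dacpac to the SQL Server container here"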

CI/CD with Docker and Azure DevOps – Part 2 (Creating an Azure DevOps Pipeline) — February 14, 2019

CI/CD with Docker and Azure DevOps – Part 2 (Creating an Azure DevOps Pipeline)

For video demos please see: Database CI/CD with Containers (Docker) and Azure DevOps (Demo’s – YouTube)

Introduction

In Part 1 of this series we went about setting up our Azure DevOps account, creating a project and adding a Database Project to it. In Part 2 we will run through creating a new build pipeline, building a new Docker image and pushing it to DockerHub.

NOTE: In Part 3 we will change to using Microsoft Container Registry (MCR) for two reasons:

  • To highlight the issues with using DockerHub with Azure DevOps
  • Because we can 🙂

Before we begin creating our build pipeline, it is advisable to create a Service Connection to Docker Hub (we will also be creating one for Kubernetes later in the series). This means we aren't entering passwords or other secure information into our YAML file.

We can do this by selecting Service Connections from the Project Settings. From the image below you can see that there are a large variety of service connections that can be created but we will be choosing Docker Registry:

Simply give the connection a name and enter the Docker ID and Password:

NOTE: once you have created this you need to re-save your pipeline before it will work. This Resource Authorization link provides more information, but Microsoft are working on improving the user experience here.

Now that we have created the service connection, we can look to create our Build Pipeline. Select Builds from the menu on the left and click "New Pipeline":

Select the repository where the project you wish to build resides, in our case it is Azure Repos:

Select the project we have created – "TestProj":

You will now be presented with the option to use a Build Pipeline template or start from scratch.

One of the templates is “Docker Image” so we will choose that one:

This will auto generate some YAML code to get you started:

As we are using DockerHub as opposed to MCR we have to make a change to the azure-pipelines.yml file which will be used.

This link provides more information but in short we need to change the filename:

If you have a Docker Hub account, and you want to push the image to your Docker Hub registry, use the web UI to change the YAML file in the build pipeline from azure-pipelines.yml to azure-pipelines.docker.yml. This file is present at the root of your sample repository.

https://docs.microsoft.com/en-gb/azure/devops/pipelines/languages/docker?view=azure-devops&tabs=yaml#example

Once you have made the change, annoyingly you don't seem to be able to exit from the file with a simple "Save"; you have to "Save And Run", which will initiate a failed build.

You can pull the latest changes locally and view/change the file in VS if you prefer:

NOTE: You will also need to update the Pipeline to use the new file. You can do this using the Visual Editor:

So, we now have our YAML file name updated and committed, as well as the build pipeline updated to use it. However, before we proceed we need an actual Docker image pushed to our Docker Hub repo.

Pull latest SQL Server 2019 on Linux image to local repository:

docker pull mcr.microsoft.com/mssql/server #This will pull the latest version 

Now tag the image as "testprojsql2019" and push it up to Docker Hub:

docker tag mcr.microsoft.com/mssql/server:latest sqlgeordie/azuredevops:testprojsql2019
docker push sqlgeordie/azuredevops:testprojsql2019

Using VSCode for output:

We're not quite ready to run our build: the build pipeline doesn't create a Dockerfile, so we need to create this ourselves. If we don't, we get this error:

"unable
to prepare context: unable to evaluate symlinks in Dockerfile path:" lstat
/home/vsts/work/1/s/Dockerfile: no such file or directory

Dockerfile

# Base the image on the SQL Server 2019 image we pushed to Docker Hub earlier
FROM sqlgeordie/azuredevops:testprojsql2019
# Create a folder for SQL scripts and make it the working directory
RUN mkdir -p /usr/src/sqlscript
WORKDIR /usr/src/sqlscript
CMD /bin/bash

Now we have to amend the YAML file to log in to DockerHub so we can pull down the image in order to build using the Dockerfile. You will notice in the image below that I have highlighted "build an image"; the reason for this is relevant in the next section.

Build input:

steps:
  - task: Docker@1
    displayName: 'Build an image'
    inputs:
      containerregistrytype: 'container Registry'
      dockerRegistryEndpoint: sqlgeordiedockerhub
      imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)'
      command: build an image
      dockerFile: '**/Dockerfile'

Login input:

  - task: Docker@1
    displayName: Login
    inputs:
      containerregistrytype: 'container Registry'
      dockerRegistryEndpoint: sqlgeordiedockerhub
      command: login

Push Input:

  - task: Docker@1
    displayName: 'Push an image'
    inputs:
      command: push an image
      imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)'

There are examples in the docs on GitHub which have (in my opinion) errors. For example, I mentioned earlier that I highlighted "build an image" for a reason: the command is stated as "build" (and similarly "push") on GitHub, and this gives errors.

Complete YAML File

# Docker image
# Build a Docker image to deploy, run, or push to a container registry.
# Add steps that use Docker Compose, tag images, push to a registry, run an image, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/docker

trigger:
  - master

pool:
  vmImage: 'Ubuntu-16.04'

steps:
  - task: Docker@1
    displayName: 'Build an image'
    inputs:
      containerregistrytype: 'container Registry'
      dockerRegistryEndpoint: sqlgeordiedockerhub
      imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)'
      command: build an image
      dockerFile: '**/Dockerfile'
  - task: Docker@1
    displayName: Login
    inputs:
      containerregistrytype: 'container Registry'
      dockerRegistryEndpoint: sqlgeordiedockerhub
      command: login
  - task: Docker@1
    displayName: 'Push an image'
    inputs:
      command: push an image
      imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)'

The “incorrect” example in the docs is:

- task: Docker@1
  displayName: Build image
  inputs:
    azureSubscriptionEndpoint: $(azureSubscriptionEndpoint)
    azureContainerRegistry: $(azureContainerRegistry)
    command: build
    dockerFile: Dockerfile
    imageName: $(Build.Repository.Name)

The strange thing is that if you edit the file directly online there is no error:

However, if you edit the pipeline you see the red syntax error "squiggle":

Please make sure you change this, otherwise you will receive an error in your build.

To get back on track: in theory, we should now be able to run the build, which will pull the image from our DockerHub repository, build a new Docker image from our Dockerfile (granted, a very basic build) and push it back up to DockerHub.

We can now check the image exists in Docker Hub:

Pull it down:

Check local images:

There we have it, we have successfully built a new Docker image from a Dockerfile, using a base image which resides on DockerHub, and pushed the newly created image back to DockerHub.

In Part 3 we will expand on this and incorporate it into the TestProj we created in Part 1, showing how we can push changes made to our TestDBProj to Azure DevOps to initiate a build process that provides a new Docker image with our changes applied.

I have created some videos (no audio, I'm afraid) of this process, which were used as part of my sessions at Data In Devon and DataGrillen earlier this year. I will be looking to either replace these with a full blog post (i.e. Part 3) or perhaps re-record the videos with audio.