SQLGeordie's Blog

Helping the SQL Server community……where i can!

Free DataSets – Kaggle — July 5, 2020

Free DataSets – Kaggle

Pretty much anything and everything that you would need for a SQL Server, Azure, AWS, GCP, Power BI etc demo dataset can be found on kaggle. The search facility is easy to use and you can also filter on the particular file type and license you want. I also like the “quicklook” option so you can quickly check to see if indeed it is the sort of data you want as opposed to having to download the data, open it and then realise it’s nothing like what you are after.

I never knew a lot about kaggle and to be honest, other than being able to get datasets from it, I never had a need to. Taken from wikipedia, “Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

It’s actually no surprise that they’re a subsidiary of Google when you look at most if not all of Google’s datasets, they all seem to link to kaggle….

I’ve included a couple of useful datasets that I’ve used in the past, hopefully you’ll find them useful also:

Back to dataset main page

Free DataSets – Google —

Free DataSets – Google

For most people on the planet with internet access, the first place to visit when looking for anything on the web is Google……….other search engines are available. Not only can you search using the standard search engine but they have various spin offs specific for searching datasets.

Google tracks dateset searches, allows you to explore these and provide the ability to download the data either direct from one of their charts or by providing back links to the site in which it came from. Chances are, kaggle will be one of them which is quite obvious when they are a subsidiary of Google LLC 🙂

There are various places within google to search the datasets, depending on what it is you’re looking for, below are some examples:

Back to dataset main page

Free DataSets…..for SQL Server, Azure, AWS, Power BI etc etc Demo Purposes —

Free DataSets…..for SQL Server, Azure, AWS, Power BI etc etc Demo Purposes

This is just a short overview page of various dataset sources I’ve used in the past for usage in my SQL Server, Azure, AWS and Power BI demo’s. Most if not all of these are free as I don’t like paying for stuff and there are some whereby you can generate your own data if there is something more specific you need……..just be careful as although its “randomly” generated, I have had issues where I pushed the files to github and as it happens, one of the email addresses randomly generated was that of an employee of a company which I’d never heard of. Pure chance (I believe) but something to be mindful of if that is your intention for use.


Kaggle

Pretty much anything and everything that you would need for a SQL Server, Azure, AWS, GCP, Power BI etc demo dataset can be found on kaggle. The search facility is easy to use and you can also filter on the particular file type and license you want. I also like the “quicklook” option so you can quickly check to see if indeed it is the sort of data you want as opposed to having to download the data, open it and then realise it’s nothing like what you are after.


Google

For most people on the planet with internet access, the first place to visit when looking for anything on the web is Google……….other search engines are available. Not only can you search using the standard search engine but they have various spin offs specific for searching datasets.


Power BI Sample DataSets

Not only datasets but also Power BI sample pbix files for anyone looking to get started with it.


GenerateData – Create your own

If there is a specific dataset you are looking for, with specific column names and data types then this is a useful tool. Data is restricted to 100 rows initially unless you sign up / donate but worth a look:


data.gov.uk

At the time of writing there are 54,846 datasets available on here. Some are a bit bizarre (See below) and are specific to the UK so unless you’re looking for that kind of thing, it may not be the place for you.

Other very specific examples (apart from that below) I’ve found are “NHS Bassetlaw CCG Expenditure over £25K” and “Planned road works on the HE road network“………..


Super Bizarre DataSets

I’ve not used any of these but saw this article via Brent Ozar’s weekly newsletter – https://thestacker.com/. Some of them sound pretty cool, others are a bit out there such as: “Abandoned Shopping Trolleys in Bristol Rivers” from data.gov.uk and “The Great British Toilet Map“…….

MSBuild 2020 – Build Book of News — May 20, 2020
“Kubernetify” your Containers — February 23, 2020

“Kubernetify” your Containers

Adding the link to github which contain the slides and demo’s from the various events I have delivered this session to:

Github – Kubernetify Your Containers

To see them all I have given the root folder and if you search for “kubernetify” you should see everything needed 👍

Abstract

We have all now had a play around with Docker and Containers or at least heard about them.

This demo heavy session will walk through some of the challenges around managing container environments and how Kubernetes orchestration can help alleviate some of the pain points.

We will be talking about what Kubernetes is and how it works and through the use of demos we will:

  • Highlight some of the issues with getting setup (Specifically Minikube on Ubuntu),
  • Deploying/Updating containers in Kubernetes (on-Prem as well as AKS using Azure DevOps)
  • Persisting data
  • How to avoid making the same mistakes as I have
Database CI/CD with Containers and Azure DevOps — September 29, 2019
AKS SQL Server Error – 0/1 nodes are available: 1 node(s) exceed max volume count — September 21, 2019

AKS SQL Server Error – 0/1 nodes are available: 1 node(s) exceed max volume count

Background

Whilst playing around with my session for Techorama.nl I encountered an error I hadn’t seen previously whilst deploying SQL Server on Linux in Azure Kubernetes Service (AKS)

0/1 nodes are available: 1 node(s) exceed max volume count

The yaml I used was only slightly modified (mainly names) from scripts used on minikube and docker-desktop so I was a little confused as to why I was getting this in AKS.

As it happens, the reason is because I am tight and don’t like spending money! During testing etc I drop my AKS node size to as small as I can have it, in this case it was a Standard_B2s (2CPU / 4GB) which previously I’ve never had issues with until this particular demo.

When playing around with AKS you may have used a single PersistentVolume (or no volumes at all) but this particular setup had:

  • 1 for systemdbs
  • 1 for sql data files
  • 1 for sql log files

Which if you can do maths, equals 3 files. Which is fine for this particular Azure VM size as you can attach 4 disks. However, the issue arises once you start adding additional deployments with the same setup but in a different namespace. This would take me over the threshold of the 4 allowed disks and gives you the error that you have exceeded the max volume count 😦

So how do you fix this?

The options are to either scale up your VM size or alter your deployment to use fewer disks. In my case I could get away with only having 1 disk for systemdbs/data/logs. This is a demo environment so I can do this 🙂

Azure DevOps – Job Dependencies — March 12, 2019

Azure DevOps – Job Dependencies

I wanted to through together a quick post here to help out others who may be having issues with running multiple jobs in a build pipeline and there being no consistency in what order they run in.

The Problem

When I first started with VSTS and ultimately Azure DevOps, I went through many failed builds because the order of the jobs in your pipeline don’t run in the order that you’ve built them and how you would logically believe them to run. The image below shows two Build Pipeline jobs but when the build is queued, whether this be manual or via CI, the second job is running before job #1. In this example the build will fail because Job #2 is to deploy a dacpac to a SQL Server on Linux Docker Container (Using Ubuntu Agent Host) but obviously this cannot be done until the dacpac has been created in Job #1 which is running on a VS2017 Agent Host:

1 – Job running in wrong order

The reason for this is that when you specify multiple jobs in a build pipeline , by default they run in parallel. In my experience I never found that both jobs ran in parallel, always one after the other so it doesn’t seem to quite match what the Microsoft docs states but it’s not something I’ve ever spent the time investigating further.

This can obviously be very frustrating especially as (from my testing) there is no consistency in which order they run but I did find that cancelling a build or re-running a build straight after a failure seemed to through it out of sync whereas previously it was all running in the correct order.

The Fix

To stop this sporadic order of job running you can set Job dependencies in Azure DevOps. The process is so simple to setup and if you didn’t know about it and have been tearing your hair out over failed builds due to ordering then you’re going to kick yourself when you see the simplicity.

All you need to do is select the job you wish to start after another (other specific conditions can be applied) job completes and scroll to Dependencies, click the drop down and (in this example there is only one) the job you want to depend on will be in the list:

Select Job Dependency

That’s it, couple of clicks and your job ordering is sorted 😉

CI/CD with Docker and Azure DevOps – Part 2 (Creating an Azure DevOps Pipeline) — February 14, 2019

CI/CD with Docker and Azure DevOps – Part 2 (Creating an Azure DevOps Pipeline)

For video demos please see: Database CI/CD with Containers (Docker) and Azure DevOps (Demo’s – YouTube)

Introduction

In part 1 of this series we went about setting up our Azure DevOps account, creating a project and adding a Database Project to it. In Part 2 we will look to run through creating a new build pipeline creating a new Docker Image and pushing it to DockerHub.

NOTE: In Part 3 we will change to using Microsoft Container Registry (MCR) for two reasons:

  • To highlight the issues with using DockerHub with Azure DevOps
  • Because we can 🙂

Before we begin creating our build pipeline, it is advised that a Service Connection to Docker Hub (we will also be creating one for Kubernetes in Part 2) is created. This means we aren’t entering passwords / other secure information into our YAML file.

We can do this by selecting Service Connections from the Project Settings. From the image below you can see that there are a large variety of service connections that can be created but we will be choosing Docker Registry:

Simply give the connection a name and enter the Docker ID and Password:

NOTE: once you have created this you need to re-save your pipeline before it will work. This Resource Authorization link provides more information but Microsoft are working improving the user experience on this.

Now that we have created the service connection, we can now look to create our Build Pipeline. Select Builds from the menu on the left and click “New Pipeline”

Select the repository where the project you wish to build resides, in our case it is Azure Repos:

Select the project we have created – TestProj”:

You will now be presented with the option to use a Build Pipeline template or start from scratch.

One of the templates is “Docker Image” so we will choose that one:

This will auto generate some YAML code to get you started:

As we are using DockerHub as opposed to MCR we have to make a change to the azure-pipelines.yml file which will be used.

This link provides more information but in short we need to change the filename:

If you have a Docker Hub account, and you want to push the image to your Docker Hub registry, use the web UI to change the YAML file in the build pipeline from azure-pipelines.yml to azure-pipelines.docker.yml. This file is present at the root of your sample repository.

https://docs.microsoft.com/en-gb/azure/devops/pipelines/languages/docker?view=azure-devops&tabs=yaml#example

Once you have made the change, annoyingly you don’t seem to be able to exit from the file with a simple “Save”, you have to “Save And Run”, which will initiate a failed build.

You can pull the latest changes locally and view/change the file in VS if you prefer:

NOTE: You will also need to update the Pipeline to use the new file. You can do this using the Visual Editor:

So, we now have our YAML file name updated and commited as well as the build pipeline updated to use it. However, before we proceed we need an actual Docker image and push that to our Docker Hub repo

Pull latest SQL Server 2019 on Linux image to local repository:

docker pull mcr.microsoft.com/mssql/server #This will pull the latest version 

Now push this image up to Docker Hub giving it the tag “testprojsql2019“:

docker tag mcr.microsoft.com/mssql/server:latest sqlgeordie/azuredevops:testprojsql2019
docker push sqlgeordie/azuredevops:testprojsql2019

Using VSCode for output:

We’re not quite ready to run our build, the build pipeline doesn’t create a Dockerfile so we need to create this ourselves. If we don’t we get this error:

"unable
to prepare context: unable to evaluate symlinks in Dockerfile path:" lstat
/home/vsts/work/1/s/Dockerfile: no such file or directory

Dockerfile

FROM sqlgeordie/azuredevops:testprojsql2019
RUN mkdir -p /usr/src/sqlscript
WORKDIR /usr/src/sqlscript
CMD /bin/bash  

Now, we have to amend the YAML file to login in to DockerHub for us to be able to pull down the image in order to build using the Dockerfile. You will notice in the image below that i have highlighted “build an image”, the reason for this is relevant in the next section.

Build input:

steps:
 task: Docker@1 
 displayName: 'Build an image'
 inputs:
 containerregistrytype: 'container Registry'
 dockerRegistryEndpoint: sqlgeordiedockerhub
 imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)'
 command: build an image
 dockerFile: '**/Dockerfile' 

Login input:

 task: Docker@1
 displayName: Login
 inputs:
     containerregistrytype: 'container Registry'
     dockerRegistryEndpoint: sqlgeordiedockerhub
     command: login 

Push Input:

task: Docker@1
   displayName: 'Push an image'
   inputs:
     command: push an image
     imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)' 

There are examples on GitHub docs which have (in my opinion) errors. For example, I mentioned earlier that I highlighted “build an image” for a reason, that reason is that it is incorrectly stated as “build” (also the same for “push”) on GitHub and this gives errors.

Complete YAML File

#Docker image
#Build a Docker image to deploy, run, or push to a container registry.
#Add steps that use Docker Compose, tag images, push to a registry, run an image, and more:
#https://docs.microsoft.com/azure/devops/pipelines/languages/docker

trigger:
 - master 
pool:
 - vmImage: 'Ubuntu-16.04'
steps:
 - task: Docker@1 
   displayName: 'Build an image'
   inputs:
     containerregistrytype: 'container Registry'
     dockerRegistryEndpoint: sqlgeordiedockerhub
     imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)'
     command: build an image
     dockerFile: '**/Dockerfile'
 - task: Docker@1
   displayName: Login
   inputs:
     containerregistrytype: 'container Registry'
     dockerRegistryEndpoint: sqlgeordiedockerhub
     command: login
 - task: Docker@1
   displayName: 'Push an image'
   inputs:
     command: push an image
     imageName: 'sqlgeordie/azuredevops:$(Build.BuildId)' 

The “incorrect” example in the docs is:

- task: Docker@1
     displayName: Build image
     inputs:
     command: build
          azureSubscriptionEndpoint: $(azureSubscriptionEndpoint)
          azureContainerRegistry: $(azureContainerRegistry)
          dockerFile: Dockerfile
          imageName: $(Build.Repository.Name) 

The strange thing is that if you edit the file directly online there is no error:

However, if you edit the pipeline you see the red syntax error “squiggle“:

Please make sure you change this otherwise you will receive and error in your build.

To get back on track, in theory, we should now be able to run the build which will pull the image from our DockerHub repository, and initiate building a new Docker image from our Dockerfile (granted a very basic build) and push it back up to DockerHub

We can now check the image exists in Docker Hub:

Pull it down:

Check local images:

There we have it, we have successfully built a new Docker image from a Dockerfile which resides on DockerHub and pushed the newly created image back to DockerHub.

In the Part 3 we will look to expand on this and look to incorporate it into the TestProj we created in Part 1 and show how we can push changes made to our TestDBProj to Azure DevOps to initiate a build process to provide a new Docker Image with our changes applied.

I have created some videos (no audio I’m afraid) of this process which was used as part of my sessions at Data In Devon and DataGrillen earlier this year. I will be looking to replace these with either a full blog post (ie. Part 3) or perhaps re-record the videos with audio.

CI/CD with Docker and Azure DevOps – Part 1 (Creating our Database Project) —

CI/CD with Docker and Azure DevOps – Part 1 (Creating our Database Project)

For video demos please see: Database CI/CD with Containers (Docker) and Azure DevOps (Demo’s – YouTube)

Introduction

I decided to throw together some steps on how you can start utilising SQL Server running on Linux in a Docker container a little more than just doing a docker run, connecting via SSMS/sqlcmd, running a few commands then blowing it away and saying you now know docker 🙂

DevOps, CI/CD, automation, whatever you want to call it is a massively hot topic at the moment and I’ll be running through how you can get yourself started with a database project (.sqlproj) and source control using Azure Repo (Repository). We will then run through how to take a Docker image (this can be stored in Dockerhub or Microsoft Container Registry) and push any changes to this via an Azure DevOps build pipeline to create a new Docker image with your changes applied. For the purposes of this blog post I will be using DockerHub but Part 3 will be using the Microsoft Container Registry (MCR) due to issues I have discovered in the Deployment pipeline in Azure DevOps meaning that I can’t use DockerHub.

This is quite a lengthy post so I will save the Deployment pipeline to run the newly created image in a Kubernetes cluster ready for UAT, Pre-Prod or even Production (if you’re brave enough) for part 3. Or, if you’re in Exeter for Data in Devon on Saturday April 27th, or DataGrillen on Thursday/Friday 20th/21st June then come and see my session where we will delve a bit more into this.

Seeing as most of this will be run in Azure, there are small costs associated with running the Kubernetes cluster but everything else in part 1 and part 2 are (at the time of writing) completely free – so no excuses! the list of tools used are listed below:

  • SSDT (Free)
  • Azure DevOps including Azure Repo (Free)
  • Docker for Windows Community Edition (CE)
  • DockerHub (Free, Ts&Cs apply)
  • Azure Kubernetes Service (AKS) (Not Free but costs can be kept low)

Install SSDT

First off, in order to create our Database Project (sqlproj) then we will need to download and install SQL Server Data Tools. This will install a Visual Studio Shell so you don’t necessarily need a full blown edition of Visual Studio. You can download it here.

Setup Azure DevOps Account

Azure DevOps is a one stop shop for your Development life cycle requirements and can simplify the automation of an entire continuous integration (CI) and continuous delivery (CD) pipeline to Azure . There are several tools within it including:

  • Boards
  • Pipelines
  • Repos
  • Test Plans
  • Artifacts

There are paid for aspects of Azure DevOps. For example if you wish to make an Artifact available to teams larger than 5 users then you will have to pay $30/mth. Everything we will be using here is free!

To get started, you will need to create an account or if you have one already, sign in:

Once you have created your account and went through the various authentication/verification, you can start creating a project in Azure Repo. In the screen shots below you can see that even a Private repo is free which is fantastic!

Give it a name and Description if you wish and choose a version control type. I have chosen Git because that’s what all the cool kids are using.

Congratulations! Your project is now created and ready to connect to using VS / SSDT.

Connect using Visual Studio 2017

You can connect directly to your Azure Repo (Git) project directly from VS, this is done by selecting the Team Explorer menu option:

Enter your login details and select the repo

You will have to “Map” the Git Repo to a folder on your local machine, this can be anywhere so choose somewhere suitable.

Enter the details you used to create the Azure DevOps account and Save Git Credentials so you won’t have to re-enter these later.

That is it, you are now connected to your Azure Repo so any project you make changes to locally can now be pushed to an online repository!

For this demo I have created a folder in the project called “TestDBProj” in Solution Explorer, this isn’t necessary but I like to try and keep things a little tidy:

Looking at Team Explorer, everything is now setup and you have now cloned the TestProj project to your local development machine. We can now get started with making a few changes.

Create Database Project

Seeing as TestProj literally has nothing in it, we need to create a new Database project and we can call it TestDBProj:

Fantastic, our TestDBProj is now created and we can start adding objects to it. At this point if you have a database schema you need to import you can do that using the schema compare but again, keeping it simple we’ll just add a new table called TestTable1:

Add a new column called [descr]:

We’ve made some changes but at the moment our project has not been added to Source Control so let’s add it:

Check the changes and commit making sure you add a commit message:

NOTE: This can also be done via command line but as this is an introduction / getting started we are using the Gui.

You should now get a message saying that the changes have now been committed locally:

To get the changes into our Azure Repo we need to push them, or in the case of the Gui “Sync” our changes and using “Push”:

There we have it, our changes have now successfully pushed to our Azure Repo and we can check online and it should now be available:

In Part 2 of this series we will start looking at the Azure DevOps Build Pipeline and how to create a new (very basic changes made) Docker Image from an Image stored in DockerHub and push it back.

I have created some videos (no audio I’m afraid) of building an Azure DevOps Pipeline which will create a new docker image with the changes made to the project and deploy it to Kubernetes. These are part of the demos used as part of my sessions at Data In Devon and DataGrillen earlier this year.