A colleague of mine asked for some help building out a DevOps pipeline that would provision a web app and a database, upload default data held in csv files into the database, and then deploy the web app and subsequent schema changes.
Piece Of Cake. Totally simple. How hard could this be? And what could possibly go wrong?
Apparently everything. It took me forever to figure this out. For those who don’t want to read my hilariously ridiculous adventure and just want to know how I did this, skip to the end to the TLDR section.
Approaches I Could Take
This was ultimately for a MS Learn module, so we wanted to use all out-of-the-box Microsoft solutions. I used PowerShell scripts with the Azure CLI to provision my infrastructure (why not ARM templates? Oh, you totally can use ARM. I just happened to use PowerShell with the Azure CLI because it’s easy and, for this sample, it would be fine). I used SSDT to capture my schema, used the default deploy tasks for Azure SQL dacpacs and the default deploy tasks for the web app. That just left uploading the default data from the csv files.
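If you’re curious, the provisioning part boils down to a handful of az commands, roughly like this (a sketch, not my actual script — every resource name below is a placeholder, and it assumes you’re already logged in with az login):

```shell
# Provision a resource group, an Azure SQL server + database, and a web app.
# All names are made up - pick your own (SQL server names must be globally unique).
az group create --name learn-rg --location eastus

az sql server create --name learndbserver --resource-group learn-rg --admin-user sqladmin --admin-password '<your-password>'
az sql db create --name learndb --server learndbserver --resource-group learn-rg --service-objective S0

# Let Azure services (like the pipeline's deploy tasks) reach the SQL server
az sql server firewall-rule create --resource-group learn-rg --server learndbserver --name AllowAzure --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0

az appservice plan create --name learn-plan --resource-group learn-rg --sku B1
az webapp create --name learn-webapp --resource-group learn-rg --plan learn-plan
```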
To upload my csv files, I could either use
- BULK INSERT – SQL command that bulk inserts a bunch of data into tables
- bcp.exe – the bulk copy program from the Microsoft Command Line Utilities for SQL Server
- Write-SqlTableData – PowerShell cmdlet to upload a bunch of data into databases
BULK INSERT part 1
I first looked at BULK INSERT. Apparently that works if you are local to the db server. But if you are trying to use it with Azure SQL, you first have to upload your csv files into blob storage, and only then can BULK INSERT read them from there. That sounded like a lot of extra steps, so I decided to look at the next option.
bcp.exe works great, but first I didn’t know if I could even access it on our build/release agents. A quick test in my PowerShell script showed nope: bcp.exe was either not on the hosted build agent or the path to bcp.exe wasn’t known. However, I thought I could probably install all the SQL Server command-line tools via Chocolatey or something like that. A quick look at Chocolatey and cool. I can totally install the SQL Server command-line utilities.
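The install plus the bcp call looked roughly like this (a sketch — the server, database, and credentials are placeholders; -c is character mode, -t sets the field terminator, and -F 2 skips the csv header row):

```shell
# Install the SQL Server command-line utilities (includes bcp.exe) on the agent
choco install sqlserver-cmdlineutils -y

# Bulk copy courses.csv into the Courses table of an Azure SQL database
bcp dbo.Courses in courses.csv -S learndbserver.database.windows.net -d learndb -U sqladmin -P '<your-password>' -c -t ',' -F 2
```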
I added that in, kicked off my build/release and……
No errors, no problems. It just said:
0 rows copied
Whaaaat???? Ok. So I tried the same command on my desktop and it worked fine. I thought maybe, since I have full-blown SQL Server installed on my dev box, that’s why it worked on my machine. So I built a Windows Server VM with nothing on it. Just a fresh install of Windows. Installed the SQL Server command-line utilities via Chocolatey, ran my script and everything worked fine.
So…. why wasn’t this working? Usually when I see 0 rows copied with no errors, it’s because the format of the file was wrong, but… I literally pulled the file down from GitHub onto my VM and it worked just fine. So clearly it wasn’t a file format problem!
After banging my head against the wall for a lot more hours, I decided to abandon bcp.exe and try out something else.
The Write-SqlTableData PowerShell cmdlet looked super promising. I was already using Invoke-Sqlcmd in my PowerShell script to run SQL commands, so how hard could using Write-SqlTableData be?
When I tried using Write-SqlTableData to upload a bunch of data from my csv files to Azure SQL, it had authentication problems. After again spending forever googling this, I finally pinged Steven (@stevenmurawski) and he sent me some blog posts that showed this was something many people have run into, and he even included a workaround. Cool.
Here’s the workaround:
$sqlConnection = new-object ('System.Data.SqlClient.SqlConnection') "Server=tcp:abellearndbserver1.database.windows.net,1433;Initial Catalog=learndb;Persist Security Info=False;User ID=abel;Password=mypassword;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"
$serverConnection = new-object ('Microsoft.SqlServer.Management.Common.ServerConnection') $sqlConnection
$server = new-object ('Microsoft.SqlServer.Management.Smo.Server') $serverConnection
$db = $server.Databases["learndb"]
$table = $db.Tables["Courses"]
,(Import-Csv -Path "D:\a\r1\a\_LearnDB-ASP.NETCore-CI\drop\courses.csv" -Header "ID","Course") | Write-SqlTableData -InputObject $table
And guess what? It worked perfectly on my machine. But when I tried to run it on the build agent, it couldn’t find the classes Microsoft.SqlServer.Management.Smo.Server and Microsoft.SqlServer.Management.Common.ServerConnection. Great! How was I supposed to get the right DLLs loaded correctly on the build agent? There probably was a way, but I was so frustrated by now that I thought, screw this. I’ll just use BULK INSERT and upload those csv files to blob storage. I know how to do that. That’s not hard. And there were plenty of articles that showed how to BULK INSERT csv files into Azure SQL.
BULK INSERT part 2
All right. So I set everything up. Dynamically created a storage account, uploaded my csv files to it and called BULK INSERT. Aaaaand….. it didn’t work. It kept complaining that the external data source for the csv file didn’t exist. Even though I created it one line above the line where I was using it!!! Plus, when I went into the query editor in the Azure portal, everything ran just fine!!!!
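For reference, this is roughly the shape of the pattern I was using (the SAS token, storage account, container, and table names here are all placeholders):

```sql
-- Credential for the blob container (SAS token is a placeholder)
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sas-token>';

-- External data source pointing at the container holding the csv files
CREATE EXTERNAL DATA SOURCE CsvBlobStorage
WITH (TYPE = BLOB_STORAGE,
      LOCATION = 'https://<storageaccount>.blob.core.windows.net/<container>',
      CREDENTIAL = BlobCredential);

-- Bulk insert the csv from blob storage into the table, skipping the header row
BULK INSERT dbo.Courses
FROM 'courses.csv'
WITH (DATA_SOURCE = 'CsvBlobStorage',
      FORMAT = 'CSV',
      FIRSTROW = 2);
```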
Uggghhhh!!!! By this time, days had passed with me failing at everything I was trying. Countless blogs, docs with samples, whatever. I could not get this to work. It got to the point that I literally sent an email out to my teammates admitting defeat (y’all need to understand what a monumental task this was for me. I DO NOT like admitting defeat when it comes to code. EVER). Steven even told me he would start looking at this as soon as he was done with a meeting.
Right around the time Steven told me he would help in a bit, I couldn’t shake the thought that the bcp problem I was seeing REALLY looked like a csv file format problem. And on a whim, I looked at my pipelines and noticed my build pipeline used a Linux agent and my release pipeline used a Windows agent.
Was this another one of those line-ending problems between Linux and Windows? I knew that my csv files were formatted correctly in GitHub (where my source is). But… could just downloading them using a Linux agent screw with that?
Of course it could! Git tries to be smart and converts the line endings for you when you download onto a Linux machine. And all I did was use git to download my csv files and then proceed to upload those csv files as build artifacts. And of course… those csv files no longer had carriage return line feed (CRLF) line endings. They just had LF. Which made my release wonky because now bcp couldn’t parse the csv files correctly!
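You can see the damage with a couple of shell commands (a local sketch; the file names and contents are made up):

```shell
# Reproduce the difference locally: same csv content, two line-ending styles.
printf '1,Math\r\n2,History\r\n' > courses-crlf.csv   # Windows-style CRLF endings
printf '1,Math\n2,History\n' > courses-lf.csv         # what a Linux checkout can leave you with

# Same logical content, but not the same bytes (the CRLF file is 2 bytes longer):
wc -c courses-crlf.csv courses-lf.csv

# Count lines containing a carriage return: 2 in the CRLF file, 0 in the LF file.
# Anything that parses rows on \r\n finds zero complete rows in the LF-only file.
grep -c $'\r' courses-crlf.csv
grep -c $'\r' courses-lf.csv || true
```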
I changed my build agent to a Windows agent, kicked off a build and bam! Everything worked.
It sounds like this all occurred over a couple of hours. In reality, this took me DAYS to figure out!!!!! DAYS of failure. DAYS of banging my head into the wall. All because of line endings.
WHEN WORKING WITH FILES, BE CAREFUL ABOUT THE AGENTS YOU USE AS LINUX AGENTS AND GIT CAN SCRUB THOSE LINE ENDINGS FOR YOU!!!!!!!!!!!!!!!!!!!!!
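One way to guard against this, assuming the csv files are committed to your repo with CRLF endings you want preserved, is a .gitattributes rule telling Git to never touch their line endings:

```
# .gitattributes (in the repo root)
# -text disables all line-ending conversion for csv files on checkout and commit
*.csv -text
```

With that in place, the files come out of a Linux agent's checkout byte-for-byte identical to what's in the repo.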
Stupid line endings. Jump to my Links section to see how to use bcp to load csv files into Azure SQL.
TLDR
When using bcp.exe to upload my default data from a csv file into Azure SQL, it worked perfectly fine on my dev box. Worked perfectly fine from a brand-new VM that I created with nothing on it. But it wouldn’t work on the hosted build agents. It didn’t error out or anything, it just kept reporting that 0 rows copied.
Turns out I was using a Linux build agent and a Windows release agent. And when the Linux agent used git to download the files, guess what it did with the line endings? Changed all the CRLF (carriage return, line feed) endings to LF. When I uploaded the csv files as build artifacts, guess what all those csv files had as endings? Just LF. Which caused the parsing of my csv file to break. But instead of an error message, I just got
0 rows copied
I changed the build agent to a Windows agent and bam! Everything worked fine.