Automation journey of a legacy organization
TL;DR: introducing basic CI/CD automation in an organization with lots of history is a lot of work, but the benefits can be huge. Here's a brief overview of the steps in my current workplace, which can fairly be called a 'legacy organization'.
In the past few years, I've been increasingly focused on improving our team's dev process through automation: your bog-standard CI/CD pipelines that do, at least on the surface, nothing fancy. Since we already use a lot of Microsoft products, the platform of choice is Azure DevOps Pipelines.
Humble beginnings
For context, the organization (government, research) has been around for decades, and currently has a focused 10-person team doing software development plus a dozen or so developers around the organization. The SD team has around 40 systems to maintain, the earliest from the 80s, and a few new projects each year. Since we have a long history and a portfolio of systems that really show their age, I think it's fair to characterize us as a "legacy" organization.
Having had no previous experience to speak of, and with the organization new to these things as well, we've all learned as we went. Things kicked off in 2019, when we started using Azure DevOps for new projects instead of Team Foundation Server (TFS, as it was called at the time; now it's Azure DevOps Server). This opened the door for cloud-hosted agents to do our bidding, and off to the races we went.
CI - continuous integration - build validation
The first step of this automation journey was a simple dotnet build pipeline, run automatically whenever a PR was submitted. Since even enforced code reviews (and git, Azure DevOps, etc.) were new to the team, there was so much going on that these humble build validation pipelines were the whole of our automation for a while.
As obvious as it seems now, introducing automated build pipelines encouraged our devs to implement, and then rely on, automated tests, which were not really used in the earlier generation of our projects. Having a CI pipeline puts tests front and center in the development process, and it's easier to get motivated to actually write them.
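For illustration, here's a minimal sketch of what such a build validation pipeline can look like; the SDK version and build configuration are placeholders rather than our actual setup:

# Runs as a branch policy build validation on PRs, so no CI trigger is needed
trigger: none

pool:
  vmImage: windows-latest

steps:
  - task: UseDotNet@2
    inputs:
      packageType: sdk
      version: 8.x
  - script: dotnet build --configuration Release
    displayName: Build
  - script: dotnet test --configuration Release --no-build
    displayName: Run tests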
CD - continuous deployment - Azure and on-prem
Since the organization did not (and still doesn't) have a good grasp on automation, all production deployments have been manual: build the thing on your machine, copy the files over, take backups (manually), stop the application, update the files, start the application, notice you copied the wrong files, repeat... A familiar story to be sure, and one that causes errors and stress, and takes a tremendous amount of time.
Our first automatic deployments were to dev/test environments in our developers' personal Azure playgrounds. Again, the beginnings were humble: initialize the app and environment manually, after which the pipeline would automatically deploy updates. This was a very easy win for efficiency.
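As a sketch of that kind of dev/test deployment, assuming an App Service target and an existing service connection (the names below are placeholders):

trigger:
  branches:
    include: [main]

pool:
  vmImage: windows-latest

steps:
  - script: dotnet publish -c Release -o $(Build.ArtifactStagingDirectory)
    displayName: Publish
  - task: AzureWebApp@1
    inputs:
      azureSubscription: 'dev-playground-connection'   # placeholder service connection
      appName: 'my-dev-app'                            # placeholder App Service name
      package: '$(Build.ArtifactStagingDirectory)'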
Since the org was, and still is, mainly an on-prem shop, next we had to figure out how to do deployments to Windows VMs. Without going into too much detail here, the solution we finally ended up with was self-hosted Azure agents combined with PowerShell Just Enough Administration (JEA) and Desired State Configuration (DSC). I'll give an overview of the solution in a later post, since this trio has been the single most painful part of the journey thus far, with many a dead end discovered.
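Just to give a flavour of the JEA part before that post: the core idea is a restricted endpoint that only exposes the specific deployment scripts. Roughly along these lines, where the paths, group names and endpoint name are illustrative rather than our actual configuration:

# Role capability: only the deployment script is visible to the connecting account
New-PSRoleCapabilityFile -Path .\DeployOperator.psrc `
    -VisibleExternalCommands 'C:\scripts\shut-down-iis-app-pool.ps1'

# Session configuration: restricted endpoint running under a virtual account
New-PSSessionConfigurationFile -Path .\Deploy.pssc `
    -SessionType RestrictedRemoteServer `
    -RunAsVirtualAccount `
    -RoleDefinitions @{ 'DOMAIN\deploy-agents' = @{ RoleCapabilityFiles = 'C:\jea\DeployOperator.psrc' } }

Register-PSSessionConfiguration -Name 'DeployEndpoint' -Path .\Deploy.pssc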
After a few years of head -> wall
A few years of this head-meets-wall development process later, we now have all the necessary building blocks in place:
- variable configuration per environment (for dotnet and node applications; sketched below)
- handling manual database migrations during the pipeline
- backups from on-prem servers
- IaC on Azure

It will be interesting to see how the overall software development process changes with these new capabilities. We're already seeing varied discussions on e.g. when manual intervention is needed, how to give visibility into what's deployed, and when exactly each environment should be updated.
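For the per-environment configuration, one way to express it is a variable template per environment, picked by a pipeline parameter. The file names and parameter values here are illustrative:

parameters:
  - name: environment
    type: string
    values: [dev, test, prod]

variables:
  # e.g. variables/dev.yml holding the environment-specific values
  - template: variables/${{ parameters.environment }}.yml

steps:
  - pwsh: Write-Host "Deploying to ${{ parameters.environment }}"
    displayName: Show target environment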
Communication
One not-so-obvious benefit has been bringing IT and software developers closer together. (I know, we missed the whole DevOps boat, but isn't that rolling back right now? We'll meet you halfway.) Implementing deployments to on-prem services through firewalls, network layers and access restrictions has made it necessary for our automation devs to understand at least parts of the stack, which in turn has improved communication of needs and limitations. Our recent push to do actual IaC in Azure also enforces a certain structure, defined in a (somewhat) collaborative process.
Security
Having automated deployments has interesting security tradeoffs. On one hand, developer access to servers can be reduced or removed completely, which shrinks both the total number of accounts with access and the risk of manually messing things up. On the other, there is reduced visibility into the environment, and making hotfixes gets more difficult.
The deployment accounts need powerful permissions, but they can be constrained with JEA to limit the blast radius if an account gets hijacked. It's pretty wild that PowerShellOnTargetMachines requires an administrator account, when it seems this could be changed in a single line. To work around this, you can use a construct such as:
- pwsh: |
    $securePassword = ConvertTo-SecureString -AsPlainText '${{ parameters.password }}' -Force
    $credential = New-Object System.Management.Automation.PSCredential('${{ parameters.username }}', $securePassword)
    # Run the shutdown script through the JEA endpoint, so the account never needs a full admin session
    $stop = @{
      ComputerName      = '${{ parameters.machineName }}'
      Credential        = $credential
      ConfigurationName = '${{ parameters.jeaConfig }}'
      ScriptBlock       = {
        & C:\scripts\shut-down-iis-app-pool.ps1 -PoolName $args[0]
      }
      ArgumentList      = '${{ parameters.appPoolName }}'
    }
    Invoke-Command @stop
  displayName: Shut down application pool
Standardizing environments is pretty much a requirement for working automation, and it has security benefits as well, since management is simplified. Then again, enforcing the new structure on a mishmash of older servers is a lot of work.
Other aspects
Since we have so many projects to keep track of, we very quickly identified the need to have templates for common tasks. This was easy enough to achieve:
- set up a new repo
- create (and tweak, ad infinitum) a directory structure
- implement your templates
- publish releases to e.g. a release-branch
- reference the template with
resources:
  repositories:
    - repository: templates
      type: git
      name: RepoName
      ref: refs/heads/release/v6

jobs:
  - template: dotnet/jobs/build_dotnet_core_app.yml@templates
Having a centralized repo for these templates helps to keep the pipelines in the actual app repos simple and allows us to work around the various quirks easily.
In addition to CI/CD, there are a couple of other things that we've automated:
- package updates for node and dotnet applications (since Dependabot is not a first-class citizen in ADO)
  - dotnet-outdated does a good job of working around the limitations of the NuGet system; see the sketch after this list
- static code analysis using either Infer# or GHAS
- maintaining our own NuGet and npm libraries
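As a sketch of the package update idea (the schedule is a placeholder, and the PR creation step depends on your setup):

schedules:
  - cron: '0 3 * * 1'        # e.g. Mondays at 03:00
    displayName: Weekly package update
    branches:
      include: [main]
    always: true

steps:
  - pwsh: |
      dotnet tool install --global dotnet-outdated-tool
      dotnet outdated --upgrade     # bumps the package references in place
    displayName: Update NuGet packages
  # Committing the changes and opening the PR (e.g. with 'az repos pr create')
  # is left out here, since that part is specific to each setup.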
Some headway has also been made on initializing new Azure DevOps projects straight from a pipeline, which would allow for easy standardization of projects.
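A rough sketch of that idea, using the Azure DevOps CLI (the organization URL, project name and process template are placeholders):

steps:
  - pwsh: |
      az extension add --name azure-devops
      az devops project create `
        --name 'NewProject' `
        --organization 'https://dev.azure.com/our-org' `
        --process Agile
    env:
      AZURE_DEVOPS_EXT_PAT: $(devops_pat)   # PAT stored as a secret pipeline variable
    displayName: Create Azure DevOps project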
Worth it?
Having spent quite a lot of time on this in the past five years, I've often considered how many deployments we could have done manually in the time it has taken to get these pipelines working. Obviously, the answer is 'all of them, for a few years', so the price has been pretty steep.
The easiest wins came from the build pipelines and the biggest from automated deployments. We've managed to keep most of the complexity hidden away in the templates, so adding these pipelines to projects has become easier on each iteration, but it still takes work when starting a new project.
Just today I had a new project deployment that unfortunately had to be done the old way, and luckily it reminded me of the stark difference between "trigger a release branch and be done with it" and "spend two hours with three people updating the same app on three distinct machines". For all its exquisite pains, insanity and horrors, the end results, at least for me, have been worth it a few times over.