This session will present new options available with Terraform Cloud to support GitOps workflows.
Thank you very much and thanks for being here, both in person and online. Like the title says, today we're going to talk about GitOps workflows in Terraform Cloud, including some enhancements we've made recently to make that a little smoother of a process.
First, we'll do a level set definition of GitOps, what we mean when we talk about GitOps workflows, and what we try and get out of adopting a GitOps workflow.
Then we'll see how we can put that into practice in Terraform Cloud, primarily with our VCS-connected workflow.
We'll also take a look at the Cloud Development Kit for Terraform (CDKTF) and how we can bring those projects into the same kind of workflows with Terraform Cloud as the backend for that.
To start off, a simple-seeming question: What is GitOps? If you do a search for that, you'll find a lot of different definitions. Everyone has their own take on that definition. Not quite as contentious as the, “What is DevOps?” question, but there are some different flavors of it. What really emerges is a pretty core set of themes and principles.
First, we have two core principles to a GitOps workflow. The first thing to understand is that GitOps is not a thing. You don't buy a GitOps. It is not a feature, it is a model. It is a way of structuring your infrastructure automation process, and really centered around two core principles. The first being that the Git repository is the authoritative, declarative source of truth for your infrastructure state, and so everything derives from that central Git repository, and as much as possible, we attempt to drive the process and the workflow from Git and from the Git workflow — that's the GitOps part of this.
The second is that we're borrowing development practices from the software development world and bringing them into our infrastructure automation practices: things like collaborative development, collaborative review processes, automated testing, CI/CD workflows, and, of course, version control, which is the core of the whole thing.
Some core components that we need to actually achieve this? Obviously infrastructure as code. This is a Terraform session and it all needs to start with declaring our infrastructure in code. That way we have something to version that is human readable, that we can differentiate between versions, and examine what the changes are going to be.
The next core element here is that pull request or merge request, depending on your Git platform and what they call it. That becomes the center of the interactivity and the collaboration around changes to your infrastructure.
And then, of course, we need our CI/CD automation to tie the whole thing together and bring that automation to the entire process.
What we get out of adopting this kind of workflow is, again, I've used the word, collaboration. That really is a central tenet of this. We get that centralized collaboration. We can involve everyone we need in the process of not only defining, but changing, our infrastructure through that pull request as the mechanism for enabling that collaboration.
We also get increased deployment velocity through that automation, and by bringing everyone together at the right time without long waits between potential reviews.
And then also the ability to recover and restore more quickly and roll back. Because everything is versioned, we have that last-known good state. If something goes wrong in a new change, we have that prior state we can roll back to. Not only roll back our code, but then also through the declarative nature of infrastructure as code, roll back our infrastructure state to that known good working state.
We've level-set on what GitOps is and what we're trying to get out of it. How do we actually put this into practice with Terraform Cloud? That's where we have a few different options in terms of the workflows in Terraform Cloud.
If you've set up workspaces in Terraform Cloud, you've seen there are three types of workspaces. The first being the VCS or version control workflow workspace, and the other two are the CLI- or API-based workflows. For today's discussion I'm lumping them together, but we're going to focus on that VCS-connected workflow.
In that workflow we have our Terraform Cloud workspaces directly connected to that Git repo, and actively linked to a branch in that repo. Runs are initiated automatically by actions occurring in that repo. We can also initiate them through the UI and at least there we know what version of the code we're going to be executing that run against.
Whereas in a CLI or API workflow, we don't have as direct a connection. We're initiating runs by a user on the CLI on their laptop, maybe. We have this potential that two different team members might be operating on a different set of code against the same workspace if someone hasn't pulled the latest changes down. We lose that direct connection to the authoritative version of the code in the CLI and API workflows, potentially.
So the VCS workflow is our prescriptive, preferred workflow in Terraform Cloud. It is our out-of-the-box idea of what a GitOps workflow looks like in Terraform Cloud. That said, yes, we have opinions and the defaults reflect those opinions, but we also know that not every team operates the same way. Not everyone is structured the same way, not everyone's repos look the same, so we also have a number of options for controlling how and when those jobs get triggered and what that workflow looks like, and we'll take a look at those.
This is, at a very high level, what that standard VCS workflow looks like:
We start with our supported version control provider integrations to hook up to that workspace. We start with opening a feature branch to make some changes, and then once we're ready to review and potentially merge those changes, that's where the pull request gets opened. And again, in the default settings on a VCS-backed workspace, that is going to initiate an automatic speculative plan, or plan-only run, against that Terraform Cloud workspace. That way, prior to actually merging the change, you can see that it's going to have the effect that you intended and not have any undesirable results.
And then of course, once all the reviews are done, we might have a security review as part of that Git process, so whatever we need to go through in terms of testing and review, we then go ahead and merge that back to our main branch and that is when the full plan and apply run will kick off.
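The default flow just described can be sketched as a small decision function. This is only an illustrative model: the `VcsEvent` and `Run` types and the `runForEvent` function are hypothetical names, not part of any Terraform Cloud API.

```typescript
// Hypothetical event and run names, used only to model the default workflow.
type VcsEvent =
  | { kind: "pull_request"; targetBranch: string }
  | { kind: "merge"; branch: string };

type Run = "speculative-plan" | "plan-and-apply" | "none";

// Default behavior described in the talk: a pull request against the tracked
// branch gets a plan-only run, and a merge into that branch gets a full
// plan and apply run. Anything else triggers nothing.
function runForEvent(event: VcsEvent, trackedBranch: string): Run {
  if (event.kind === "pull_request") {
    return event.targetBranch === trackedBranch ? "speculative-plan" : "none";
  }
  return event.branch === trackedBranch ? "plan-and-apply" : "none";
}

console.log(runForEvent({ kind: "pull_request", targetBranch: "main" }, "main"));
console.log(runForEvent({ kind: "merge", branch: "main" }, "main"));
```

The point of the model is that nothing is applied until the merge lands on the tracked branch, so the pull request stays a safe place to review the plan.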
I mentioned that automatic speculative plan, so let's get into some of the enhancements we've made. In the past, the only place to actually see the results of that automatic speculative plan was on the pull request itself. If you wanted to see the history of those, you had to dig through all the old closed pull requests, but we're happy to now also show those plans in the main run list in your Terraform workspace. Those now show up alongside all of your other runs, so you don't have to go digging through those old pull requests to see all those prior speculative plans.
It also opens up a new feature, which is the capability to go ahead and rerun that same plan against a different version of Terraform in your workspace. That way you can preview the effect of upgrading your Terraform version without any surprises there as well.
I mentioned we can customize how we trigger runs. These are the main mechanisms for that in the default workflow:
A change in the working directory for our workspace is always going to trigger the run. We know a pattern that we see in the wild — and it's not necessarily our recommended pattern, but we see a lot of monorepo patterns in the wild: a single large repository that may contain many different workspaces. We know they exist and we want to be able to support those as well.
We've had for a while the ability to also trigger runs based on path prefixes, so if you have code elsewhere in the repository, outside the working directory, that also needs to trigger a run, we have that.
We've also introduced some new capabilities to do glob-based patterns around that now.
We also have a new capability to trigger based on a Git tag.
Taking a look at those pattern-based triggers, now instead of just a static prefix for a directory to watch, we can use glob-style syntax. This would be similar syntax to what you do in a .gitignore file. We can use the double star to indicate any number of potential subdirectories or potentially, like in this example, only look at changes in *.tf files because I don't care if someone changes the README in a module that is part of the monorepo.
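To make the glob idea concrete, here is a minimal sketch of pattern-based triggering. It is not Terraform Cloud's matcher: `globToRegExp` and `shouldTriggerRun` are hypothetical helpers, the pattern `modules/**/*.tf` is an assumed example, and only `*`, `**`, and `**/` are handled.

```typescript
// Converts a small subset of glob syntax into a regular expression.
// "**/" matches zero or more directories, "**" matches anything, and
// "*" matches any run of characters that does not cross a "/".
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      pattern += "(?:[^/]+/)*";
      i += 3;
    } else if (glob.startsWith("**", i)) {
      pattern += ".*";
      i += 2;
    } else if (glob.charAt(i) === "*") {
      pattern += "[^/]*";
      i += 1;
    } else {
      // Escape regex metacharacters so they match literally.
      pattern += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${pattern}$`);
}

// A run is triggered when any changed file matches any configured pattern.
function shouldTriggerRun(changedFiles: string[], patterns: string[]): boolean {
  const matchers = patterns.map(globToRegExp);
  return changedFiles.some((file) => matchers.some((re) => re.test(file)));
}

console.log(shouldTriggerRun(["modules/network/README.md"], ["modules/**/*.tf"]));
console.log(shouldTriggerRun(["modules/network/main.tf"], ["modules/**/*.tf"]));
```

With a pattern like this, a README edit inside a module is ignored while a change to any .tf file under modules/ counts as a trigger.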
That's some new flexibility for those monorepo environments, but another very requested feature has been tag-based triggering: the ability to not always initiate that full plan and apply run on any merge to our branch, but instead wait for a tag to be published against it. We've introduced that as well recently. You'll see this in the version control settings of your workspaces. We have some default options there to look for a semantic versioning style pattern, or you can supply your own custom regex that fits whatever your style is for tagging.
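As a rough illustration of tag-based triggering, the sketch below tests a tag name against a regular expression. The pattern is an assumption about what "semantic version with a prefix" means, not the expression Terraform Cloud uses, and `tagTriggersRun` is a hypothetical helper.

```typescript
// Assumed pattern for a prefixed semantic version such as v1.0.0.
// Case-insensitive so that both "v1.0.0" and "V1.0.0" are accepted.
const PREFIXED_SEMVER = /^v\d+\.\d+\.\d+$/i;

// Returns true when a published tag should start a run. A custom regex can
// be passed in place of the default, mirroring the custom-regex option.
function tagTriggersRun(tag: string, pattern: RegExp = PREFIXED_SEMVER): boolean {
  return pattern.test(tag);
}

console.log(tagTriggersRun("v1.0.0"));
console.log(tagTriggersRun("nightly-build"));
console.log(tagTriggersRun("release-42", /^release-\d+$/));
```

A workspace configured this way ignores ordinary merges and only runs when a matching tag shows up, which is what lets one environment lag behind the others on purpose.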
Let's take a look at this in action. Before this starts, I want to set up something you can see here that's highlighted in purple, and this is a very common pattern.
You'll notice I have three workspaces here, a dev, a test, and a prod. They're all connected to the same underlying GitHub repository. This is a common and recommended pattern for having multiple environments deployed all based on the same underlying code. This could be dev, test, prod like I have, it could be multiple cloud accounts getting the same set of infrastructure deployed, or multiple cloud regions. You can have many workspaces connected to a single underlying Git repository.
If we start this up, I have these three workspaces, all connected to this repo. If we go over to GitHub, I've prepared changes introducing new resources and changes to this workspace into a feature branch. I'm going to go ahead and open a pull request against that feature branch, and we'll see that once we've checked, we have no merge conflicts, everything's happy.
We will go ahead and initiate that speculative plan for all three workspaces that are connected to that repo, and right here in the GitHub interface in the pull request interface (and this also happens with our other supported VCS providers) we will see the results of those checks or of those plans, and we'll see that they all succeeded. We get a little summary of how many resources would be added, changed, deleted, and we also have a “details” link to go over and see the full results of that plan.
Previously that would've been the only place to get to those full details of that run, but like I said, now we've added speculative plans to the run list in the actual Terraform Cloud UI, so I can also go back to the workspace right here and to the runs list in my prod workspace in this case, and I can see that speculative plan that we just ran. We can see that it was triggered from GitHub, and we can also see the details of exactly what pull request triggered it, exactly what commit it was against, and then the full output of exactly what would've changed if this were to actually be merged.
So let's say we're ready to merge this, everything looks good, we're happy with it. If I merge this right now, it's going to be applied right away to all three workspaces, and maybe I want to be a little more controlled in how I release to prod. I'm going to go to my version control settings in the prod workspace here, and I'm going to turn on our new tag-based triggering option. In my case, I'm just going to use a semantic version with a prefix, so it'll be something like V1.0.0, and it will look for that pattern in a tag published and pushed to my branch that I'm monitoring.
Once we save that, we'll head back over to GitHub, and again, everything looks good. We'll go ahead and merge this pull request. And what we'll see happen back in the Terraform Cloud workspaces list is that our dev and test workspaces have automatically kicked off their plan and apply run, but prod has not. Prod is waiting for that tag pattern that it's looking for now. Now we can kick the tires on things, make sure dev and test look okay, let it bake for a few hours, few days, whatever our process needs, and then wait maybe for a maintenance window and wait to apply that tag for the prod trigger.
I'm going to do that over here in GitHub. In GitHub, I'm going to publish a release, and releases and tags are linked together in GitHub. So I'll pick my branch, I'll publish a new V2.0.0 tag that'll match that semantic versioning with a prefix pattern that I set up.
We can give some release notes here, let other people know what has gone on with this, and we'll see that as soon as I publish this tag and this release, we will see the prod workspace has now kicked off its run and we've been in total control of exactly when that happened, thanks to that tag publishing.
So that is the standard VCS workflow in Terraform Cloud.
Switching gears, I want to talk some about the CDKTF. This is the Cloud Development Kit for Terraform, and we'll see how we can adopt those same kinds of patterns in Terraform Cloud for projects that are using the CDKTF as well. Some of these techniques can apply to HCL-based projects as well.
A little bit of background on the CDKTF: It is a toolkit that allows developers to use their familiar programming languages — TypeScript, Python, Java, C#, and Go — to define infrastructure, still with access to the entire Terraform ecosystem of modules and providers in the registry, but they can now use their preferred programming language instead of HCL.
I won't go into too much detail on the CDK itself. We have a lot of materials out there about it. We did a webinar in August that went into much more depth on it, so if you're interested, definitely check that out. If you caught Cole's session yesterday [When, Why, and How to Use the CDK for Terraform], he went into a lot of depth there too.
The idea is, we write this application code in the programming language and it gets synthesized into Terraform-compatible JSON. That's the key part of the process we're going to take advantage of in this GitOps workflow because that code that is synthesized down can be directly consumed by Terraform or Terraform Cloud.
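To give a feel for what "Terraform-compatible JSON" means, here is a minimal sketch that builds a document in Terraform's JSON configuration syntax by hand. It does not use the CDKTF library, and real `cdktf synth` output contains more than this (for example metadata and backend settings). The `synthInstanceStack` function, the resource name `web`, and the AMI value are all hypothetical.

```typescript
interface InstanceArgs {
  ami: string;
  instanceType: string;
}

// Builds a plain object shaped like Terraform JSON configuration:
// a provider block plus a single aws_instance resource.
function synthInstanceStack(region: string, name: string, args: InstanceArgs) {
  return {
    terraform: {
      required_providers: { aws: { source: "hashicorp/aws" } },
    },
    provider: { aws: [{ region }] },
    resource: {
      aws_instance: {
        [name]: { ami: args.ami, instance_type: args.instanceType },
      },
    },
  };
}

// Placeholder AMI ID; substitute a real one for your region.
const stack = synthInstanceStack("us-east-1", "web", {
  ami: "ami-00000000000000000",
  instanceType: "t3.micro",
});
console.log(JSON.stringify(stack, null, 2));
```

Because the result is ordinary JSON configuration, anything that can run Terraform against a directory can consume it, which is the property the workflows that follow rely on.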
And also a note — as of version 0.12 which was released in the beginning of August, the CDKTF is now GA, so it has moved out of its beta phase and it is now generally available.
Similar to the standard workflows, we have two choices here of how we could integrate a CDKTF project:
The first one is using that version control workflow where we're going to automate that synthesis process via a pre-commit hook or GitHub action, which is what I'll use in this case. Then we're actually going to publish that synthesized JSON right back to the repo, and that will be the working directory and the repo that my Terraform Cloud VCS workspace is watching.
Then we'll also take a look at another option here, as part of the GA release. When they released version 0.12, the CDKTF team also released a new GitHub action specifically for CDKTF. They published that to the GitHub Actions Marketplace, so that's out there too. We'll take a look at that one too.
Because that one is actually running the CDK processes itself in the GitHub action, it's technically an API-based integration. But again, because it's all running in the context of GitHub, we still have that direct link to the code and to the specific version of the code that we want to be operating on.
So more demo, we'll see more of this in action. Here, this is a very, very simple CDKTF application. It's using TypeScript just to set up the AWS provider and deploy a single EC2 instance.
I'm going to show the GitHub action first, and because that's an API-based workspace, I need to define the cloud backend in this project. That would be equivalent to using the cloud backend block in an HCL project to connect it to a workspace in Terraform Cloud.
Looking at our GitHub workflow files, these are what define the GitHub actions. The first one here is going to trigger on pull requests. This setup code here was all taken from the CDKTF GitHub action README file, so that's all documented there for you.
We're going to run that new action. It supports a few different modes, so we're going to run on a new or updated pull request. We're going to run that plan-only mode, so again, that aligns to the automatic speculative plan that a VCS-connected workspace would do, but instead we're now doing it in the context of a GitHub action.
The second action I've defined here is going to happen when a merge happens into my main branch or a direct push — which we would never, of course, directly push to main. We would never do that, but it will fire in that case too.
It's going to do all the same setup. Instead of the plan-only mode, we're going to run the auto-approve apply mode, so that is, again, going to automate that full plan and apply run once we've merged back.
Those are the two actions that I've set up on this project. GitHub will read those once the code is checked in. We can see here I have my workspace that is handling this project. You'll note it is not a VCS-backed workspace, it's not directly connected to a repo, but I am using remote execution mode because I want the run to actually occur in the Terraform Cloud environment to take advantage of all the features of Terraform Cloud, like RBAC, the audit trail, policy as code, and our new continuous validation features that we announced this morning. So we'll still get all that power even using this GitHub action workflow with a CDK project.
I've got a few changes prepped here, and I will go ahead and open that pull request. And now, instead of seeing that direct Terraform Cloud integration take flight, we're going to see the GitHub action that I defined run the plan, basically do the same thing that the VCS workflow would do, and comment those results back to the pull request. So we see that action, and the details of that action running. It sets up the prerequisites, it sets up the CDKTF environment, and then starts the plan, including connecting to the Terraform Cloud workspace to actually execute the plan in the cloud environment.
Once that's done, we get a very similar result here. We see that the plan succeeded, we had no errors. We also get a link out to the full plan run details and what those full results were. I can also now get to that right from the workspace runs list too, where we can see that it was initiated by the GitHub action.
Once again, it's time to merge, everybody's happy, everything looks good. We'll go ahead and merge that and we'll see that second GitHub action now take effect, and back on our workspace, we'll see that our full plan and apply run now kicks off.
That's the GitHub action, and that's great. We're still taking advantage of everything in Terraform Cloud, but we've lost that really direct VCS-backed integration of a VCS-backed workspace. So what can we do to still use the direct VCS integration even with the CDKTF projects?
That's what we'll do here. You'll see in this case, I've got a workspace that is connected to the repo, so this is a VCS-backed workspace. Importantly, I have set the working directory to cdktf.out/stacks/ followed by the stack name.
In CDKTF parlance, a stack is basically a related set of resources. It maps one-to-one to a Terraform working directory and state file. So with that stack name, we need to watch that cdktf.out directory because that is where the JSON code is synthesized to when we run the synthesis.
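The working directory layout described above can be sketched as a pair of small helpers. Both functions are hypothetical and the stack name `my-stack` is an assumed example. The second helper models the idea that a VCS-backed workspace only reacts to changes under its working directory.

```typescript
// Path layout described in the talk: cdktf.out/stacks/<stack name>.
function stackWorkingDirectory(stackName: string, outDir = "cdktf.out"): string {
  return `${outDir}/stacks/${stackName}`;
}

// Returns true when at least one changed file lives under the working directory.
function touchesWorkingDirectory(changedFiles: string[], workingDirectory: string): boolean {
  const prefix = workingDirectory.endsWith("/") ? workingDirectory : `${workingDirectory}/`;
  return changedFiles.some((file) => file.startsWith(prefix));
}

const workingDir = stackWorkingDirectory("my-stack");
console.log(workingDir);
// A commit that only edits the TypeScript source does not touch the working directory.
console.log(touchesWorkingDirectory(["main.ts"], workingDir));
// A commit that includes the synthesized JSON does.
console.log(touchesWorkingDirectory(["cdktf.out/stacks/my-stack/cdk.tf.json"], workingDir));
```

This is why the synthesized output has to be committed in this workflow: a change to the application code alone never lands inside the directory the workspace is watching.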
Again, this is the same TypeScript project. The one difference is because this is a VCS-backed workspace, I no longer need the cloud backend block because we're directly integrating into Terraform Cloud there.
In this case, I have just a single GitHub action that I'm going to define to take care of this. It's going to operate on a pull request, so when a new pull request is opened or when one is updated, I'm doing all the same setup steps as the CDKTF action, but instead of using the actual CDKTF action, which does not currently have a synthesis-only mode, I'm just running the CDKTF CLI synth command. That's the command that will take that TypeScript application code and turn it into the Terraform-compatible JSON code.
And then I'm using a second action here, which is an auto-commit action that I found on the GitHub Marketplace, and that's going to then take the contents that are generated into the cdktf.out directory and commit that right back in as a new commit on the pull request.
One important thing to note here in our .gitignore file: When you initialize a new CDKTF project, it generates the .gitignore file for you as well. Normally, the cdktf.out directory is in the .gitignore file. For this workflow we need to make sure it actually gets checked back into the repo. I needed to comment that out of the .gitignore to make sure it gets captured on that commit because that's how we're actually enabling this VCS-backed workflow. And a quick look here, this is what the synthesized JSON looks like. It represents the end result of that application code.
This can be run by the CDKTF itself; you can run the apply through the CDKTF. What we're going to do here instead is supply that JSON to Terraform Cloud to run. So again, I'm going to go ahead and open a pull request with some changes here. We'll see that the VCS integration there has not executed because it says, "No changes have been detected in the working directory yet," but instead, we're going to run that GitHub action to set up the environment, run that CDKTF synth process that will generate the JSON, and then do the auto-commit back into our repo here.
Once that's done and successful, back on the pull request we see we have a second commit now as part of the pull request, which was that synthesized code being checked back in. Now, the pull request has been updated. The Terraform Cloud VCS integration gets fired again. Now, the working directory has changed, and we're going to actually run that plan. Here we see that plan kicking off.
We can see all the details of what triggered it and in this case, it was triggered by the commit that the GitHub action made, not something that I did myself. And again, we merge it and we kick off our final plan and apply run. Now, the merge has happened back to the main branch.
That's how we can take a CDKTF project and still use a VCS-backed workflow with just a little bit of automation wiring through that GitHub action, and get that same experience that we do with an HCL project, also with a CDKTF project.
That's about my time. We have a lot of resources to learn more. If you haven't been to the HashiCorp Developer site, you really need to. It has a lot of great tutorials. We have tutorials on Terraform Cloud and VCS-backed workspaces there, and also on the CDKTF. So you can get started with CDKTF projects, and the docs as well.
And then, of course, head over to app.terraform.io to get started with Terraform Cloud. You can get started for free and check out some of these potential workflows.