Learn about the Scale Factory's Terraform Cloud adoption journey and how they leveraged API access to the platform using the TFE provider to automate workflows.
Welcome, everyone. Today I'll be talking about managing Terraform Cloud workspaces using the Terraform Enterprise (TFE) provider. Let's start with what Terraform Cloud is and how it compares to the open source version of Terraform, which most people are familiar with.
Terraform Cloud is a completely managed SaaS solution that exposes a web GUI and API and can streamline the workflow of running your infrastructure as code. It also offers a private module registry. It delegates away managing the state files and the locking of those. It also knows how to trigger and run remote operations in the cloud for you. When you do your CLI Terraform runs, it will execute them in the SaaS solution instead of your local environment.
It also offers a granular way to collaborate within the team. You can set specific permissions for your team members to collaborate, and it also knows how to connect to your version-control systems in order to trigger your pipelines based on changes that you push into your code repositories.
Who am I? I'm a senior consultant at the Scale Factory. My name is Marko Bevc, and we are a remote-first AWS consultancy working exclusively with SaaS businesses. My background is mostly ops. I also did some programming in my previous life. I'm also a HashiCorp ambassador, an open source contributor and supporter, and I'm a big fan of simplifying and automating things as well as hiking, cycling, and traveling.
Today, we're going to look at what using SaaS platforms and shifting left mean for empowering users and how we can utilize those kinds of platforms in order to get a better and smoother experience for the end engineers running the infrastructure as code.
We're going to look at the Scale Factory's Terraform Cloud adoption journey. We're going to look at how we leveraged API access to the platform and used the TFE provider to automate our workflows. And we're going to look at how we approached module abstraction and created reusable code in order to scale more easily.
I also have a demo and an implementation example, and I'm going to wrap it up with some takeaways and learnings from our experience adopting the TFE provider and modularizing that using the Terraform modules.
Terraform Cloud helped us shift left and move responsibility toward the end users, in our case the engineers that are provisioning infrastructure as code.
Mostly we would use SaaS platforms to delegate responsibility and complexity away from the users. They usually come with a streamlined workflow and built-in checks, which reduce the margin for human error and usually reduce team friction as well.
We push ownership down to the users who are responsible for provisioning infrastructure, while at the same time breaking down silos, so interaction between teams is minimized to the necessary level.
Using SaaS platforms such as Terraform Cloud empowers users to deliver more with less and moves the ownership of their changes to self-provisioned services, where they don't need to seek a lot of approvals and other interactions with other teams.
But is the platform itself enough? Is it enough to just have a SaaS platform such as Terraform Cloud to fully automate and delegate all the things away from ops?
When Terraform Cloud was initially announced at HashiConf 2019, and then shortly afterwards made generally available, we, as a SaaS consultancy, immediately jumped on the opportunity to give it a go and start testing and seeing how it all works, how it all connects, how it all fits within our story, and how this can help our customers achieve more.
Naturally, we started with manual configuration and twisting some knobs and buttons to see how it all works. As our projects grew and as our customers grew with the infrastructure that needs to be provisioned to Terraform Cloud, the complexity quickly increased.
As part of that, we noticed that operational overhead, managing the configuration of Terraform Cloud itself, such as workspaces and configuration related to that, quickly outgrew the initial tinkering with it.
We thought this was a good point to stop a bit, rethink, circle back, and try to see if there was anything else that we could do to optimize this workflow and automate it a bit more.
Shortly afterwards, we found out about the TFE provider, which is the official provider for Terraform Cloud, or for Terraform Enterprise that you might run on premises.
It's an open source project. It supports most of the Terraform Cloud or Terraform Enterprise resources or data sources. It offers really nice programmatic access using the API endpoints of the SaaS platform, and also enables us to automate the manual configuration that we had done before.
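As a sketch of what that programmatic access looks like, here is a minimal, hypothetical provider configuration. The variable name and version constraint are illustrative assumptions, not taken from the talk:

```hcl
terraform {
  required_providers {
    tfe = {
      source  = "hashicorp/tfe"
      version = "~> 0.26"
    }
  }
}

# The token can also be supplied via the TFE_TOKEN environment variable.
provider "tfe" {
  hostname = "app.terraform.io" # the default; change for Terraform Enterprise
  token    = var.tfe_token      # hypothetical variable name
}

variable "tfe_token" {
  type      = string
  sensitive = true
}
```

The token is an API token generated in Terraform Cloud; keeping it in a sensitive variable (or the environment) keeps it out of the code and the state output.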
At the same time—and this is a really powerful feature for us—because it's a Terraform provider, we can use an approach that we all are familiar with and comfortable with, which is infrastructure as code, by using HCL2, or HashiCorp Configuration Language, version 2, to configure Terraform Cloud itself.
If you think of it, using Terraform to configure Terraform Cloud seems kind of weird, but at the same time, it's a cool feature, right?
As we started using the TFE provider, everything fell into place. We were using Terraform to configure Terraform Cloud itself, but we noticed that this alone doesn't eliminate the code duplication: the same blocks still needed to be repeated whenever we configured a resource within Terraform Cloud.
Naturally, because this is Terraform code, we thought, "Let's package it as a module." That's a good way of keeping your code DRY and easily accessible. And because a module is a container for lightweight abstraction of complexity, you can easily reuse the code.
But at the same time, because we are using Terraform Cloud, we can publish it either in the public or private registry within Terraform Cloud. If you publish it on the public registry, you can widen your community reach.
In our case, most of our customers were constrained to what they can publish. So they used the private registry, but still it enabled them to use the repository for easy, reusable code that they used for configuring the rest of the workspaces.
A simple way of versioning your modules is to use Git tags and publish those in your Git repo. They are immediately available in the Terraform registry, and your code is easily available and shareable across your team, or even beyond the organization.
As I mentioned before, this is a good way of keeping your code DRY and easily accessible, and it definitely eliminates all the concerns we had before.
First, we need to define a backend. In our case, we're using Terraform Cloud, so the backend would be a remote backend. We need to specify an organization and a workspace where this is going to be run. And it will use the Terraform module that we create, and that will be feeding it parameters in order to configure the rest of the workspaces for those.
Before we dive into the demo, let's look at how it's going right now for us. By using a Terraform module and a TFE provider, we ended up with a fully automated self-provisioning process that is Git-driven.
It offers us a central configuration point using HCL2. The code is reproducible and highly reusable: by just calling up a module from the Terraform registry, you can easily consume it for whatever specific configuration or workloads you need. It's built with security in mind, so all the sensitive values are safely stored, either within Terraform Cloud itself, which uses Vault in the background, or in an external secrets provider such as Vault or AWS Secrets Manager.
For the demo today, we are going to use a version control system, in our case GitHub, where we're going to have a repository where our configuration code for Terraform Cloud will live. We will have the Terraform Cloud workspace defined, which is going to be used to configure the rest of the workspaces.
In order to achieve that, we're going to use a private TFE registry, where our Terraform module will be published. And we are going to use that as a reusable piece of code to provision the rest of the workspaces that we are going to link to the same VCS repo, but they're going to be triggered from different subfolders. Of course, everything will be run and triggered in a Terraform SaaS managed solution.
Before we look at the specifics of how we organized our GitHub repository, let's switch to the Terraform Cloud console and look at what we configured here.
I have defined a TFE workspace, which is going to be used to configure the rest of the workspaces within Terraform Cloud.
This workspace is linked to my GitHub repository. It's going to be triggered on a Terraform Cloud subfolder. We are passing down a couple of tokens. Specifically, we're using the TFE token, which is used to access the Terraform Cloud API endpoints. We're also passing down a couple of other variables in order to access our AWS environment.
Let's have a look at the module that we're going to use as part of this demo. We are going to use a module called "workspaces," which also lives in the GitHub repo.
We're not going to go into details about how everything was implemented, but let's have a quick overview of how this works. From a high-level perspective, we are feeding it a workspaces variable, which it uses to define all the other workspaces. It links each of them to a specific working directory within the same VCS repo, using the same OAuth identifier that we use for the configuration workspace. There's also an option to configure notifications and triggers.
Our repo is a simple structure with a terraform top-level folder. Within that, we define a cloud folder, which is used to configure our Terraform Cloud workspaces, and we have 2 other test folders that are linked to the workspaces we are going to provision using our Terraform code.
If you look at our code in the Terraform Workspaces workspace, we are using the module previously shown that lives in the registry. We are feeding it down with some of the parameters needed to be used in order to provision and connect to Terraform Cloud.
We're also defining a couple of variables, such as the test variable for "Hello HashiConf Global," and we're passing down some secure variables, which are safely encrypted and encoded so people cannot snoop on your sensitive credentials for AWS, for example.
With the variables that we're passing down to that module, we are defining 2 workspaces linked to different folders in the Git repo, and we are setting a trigger that will trigger the workspace No. 2 when workspace No. 1 successfully applies.
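That trigger relationship maps onto the TFE provider's `tfe_run_trigger` resource; the workspace references below are illustrative:

```hcl
# Sketch of a run trigger: a successful apply in test-1 queues a run
# in test-2. The workspace resource addresses are hypothetical.
resource "tfe_run_trigger" "test_2_after_test_1" {
  workspace_id  = tfe_workspace.this["test-2"].id # workspace to run
  sourceable_id = tfe_workspace.this["test-1"].id # workspace that triggers it
}
```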
Let's have a look at the code we're using in our workspace. It's really simple. We are provisioning an S3 bucket and defining a random name for it.
Let's commit that code into our repo and push it to the main branch, though obviously that's not suggested for production workloads. As you can see, as soon as we push it to the main branch, the TFE Workspaces workspace is being triggered and it immediately starts planning our code from that repository or subfolder, which we define.
As you can see, the plan succeeded and it's trying to provision 2 Terraform Cloud workspaces linked to the same repo via different folders. It's also setting a couple of parameters, such as the test variables and the AWS access and secret keys.
We are also setting a trigger so when the workspace test 1 is successfully run, it's going to trigger the workspace test 2. Let's apply that.
As this one is applying, let's observe what happens in the Terraform Cloud console. As soon as this applies, it'll use the TFE token in order to connect with the API endpoint and immediately create 2 workspaces for us, which are called test 1 and test 2. In the test 2 workspace, let's see if everything was set in the order we want.
First, let's see the version-control system. It's linked to the same GitHub repo linked to the different folder, which is test 2, where the code's going to live and is currently being planned. Then there are variables that we set in our configuration repo that also have been set for this particular Terraform workspace for us.
In the meantime, it successfully planned. Let's inspect the plan for this current run. As you can see, it's provisioning the S3 bucket, and it's generating a random name using a random_pet resource.
Let's apply this specific plan and let it provision our infrastructure using the keys that we set in the AWS cloud.
As this workspace is running, let's go back and inspect the test 1 workspace, which also successfully planned. This plan is much simpler: we're using a random_pet resource and are not provisioning anything in AWS. But this workspace is linked to test 2, so with the trigger on a successful run that we configured before, this one will trigger the next one as well. Let's approve this one.
You can see the previous one successfully applied, and this one is currently running and applying the code. But what we are interested in here is to see, after this one applies, whether it will trigger the run on test 2. As you can see, the trigger was successfully configured, and it triggered the planning phase for our test 2 workspace. Because we haven't pushed any code changes to that specific folder, the plan won't produce anything; it will just stop and report that there are no new infrastructure changes, so nothing else is needed for this phase.
Before we clean up the workspaces, let's queue a destroy plan for test 2, because just destroying our workspace doesn't really clean up the resources. We don't want to keep our bill racking up in the background.
Let's queue the destroy plan and clean up all the resources that we provisioned during our first apply in the test 2 workspace.
As the plan runs, we're going to notice that it's going to connect to our AWS provider. It's going to check for the resource that we have, and it's going to produce a plan saying that it's going to destroy 3 resources, exactly the same amount we provisioned before, and it's going to end up with a clean account afterwards. Let's approve the plan and trigger the apply stage.
This run will apply the destroy. It'll remove all the resources.
As this is going ahead, let's switch back to our console and open the terraform.auto.tfvars again and remove whatever we enabled before, the trigger and both of the workspaces. Again, let's commit that to the repo and push the changes back in, which will trigger again the Terraform Cloud run for the configuration workspace.
The tfe-workspace, as you can see, has started planning. Let's see the plan output. It should be fairly quick, as it's only talking to the Terraform Cloud API.
As you can see, it's planned out the destruction of 9 resources, exactly the same amount that we provisioned before. It's going to remove the triggers, the variables, and all the workspaces that we provisioned initially. Let's trigger the apply.
As this is running, let's go back to the workspaces and see what's happening. As soon as this runs and it connects back to the API, it'll go and remove all the workspaces programmatically.
To summarize, by using the TFE provider, we increased deployment velocity and automation while provisioning Terraform Cloud workspaces. It helps us reduce friction when rolling out infrastructure changes, while using an infrastructure-as-code approach and thus utilizing HCL2.
There was no need to adopt a different approach or to learn an API, which would be needed to configure those using the programmatic access. And while the TFE provider is still beta and subject to change and still at version 0.26, it proved to be quite stable. A lot of our customers successfully adopted it and used it in production in order to provision their highly scalable and elastic Terraform Cloud infrastructures.
If you're operating in more of a cloud-native space, or if you're using Kubernetes for running your workloads, there is also a Terraform Cloud operator, which takes a slightly different approach than the TFE provider: you use cloud-native manifests and CRDs in order to provision your Terraform Cloud workspaces.
There are alternatives to provision and configure your Terraform Cloud or Terraform Enterprise using the cloud-native constructs.
By using the Terraform Cloud TFE provider and modularizing for reusability, we follow the GitOps paradigm, where all the changes are centrally managed and triggered from our Git repo. This brings our Terraform runs closer to being a commodity, like electricity: you just expect it to be there, and it runs without you needing to know all the nitty-gritty details of how it works in the background.
Thank you for listening. I hope you enjoy the rest of the conference.