Want to be better informed about the threat vectors around Terraform? Marko Bevc from the Scale Factory provides an expert overview.
Today we're going to cover two of my favorite topics: security and infrastructure as code. It warms my heart to hear from the likes of Zurich, as we did just now. They're using things like Terraform Cloud, and they're aware of the different security approaches you need to take when using infrastructure as code. Without further ado, I'll dive into it. I'm going to start with a few graphics and some information here.
Coincidentally, this aligns nicely with what we heard this morning with the keynote. I'm glad to hear that my research showed the same thing that state of the cloud from HashiCorp revealed as well. When I was preparing for this talk, I looked at some of the research out there — specifically, the Gartner and Snyk reports to see where the industry is going and what the major concerns are going forward.
If you look at the Gartner research, luckily, the cloud is still on the rise, so we are in the right room and the right industry. But more importantly, security is still the number two concern for most companies in the cloud, at 78%.
On the other side of the spectrum, if you look at the research from Snyk, they found that more than 80% of organizations have actually seen a major or significant security incident during their cloud journey. It's quite worrying information that we're seeing. Even though the cloud is a nice way of scaling up your company and getting your product out in front of your customers, there are still a couple of things you need to take care of as you embark on this journey.
Quickly breaking it down a bit: if we look at this graph, one of the major cloud incidents people see is downtime due to misconfiguration, and that's 34%. If you break it down even more, it's usually something like IAM misconfiguration, object storage misconfiguration, insecure API keys, and all the good things.
Even though this is a well-known fact, and we are all aware of this, we still do that. Can I get a raise of hands, whoever has done that or seen that in the past? A lot of you. Even though security is important, we still try to take a lot of shortcuts — maybe try to get things deployed quicker and increase the velocity, even though it might not be the most secure way of doing it.
Maybe a takeaway at this point, and we're going to dive into the details later, is that security through obscurity should never be the only security mechanism you have. Even if you're putting your key under the doormat, people will still get around that and find it eventually. Maybe just by tripping on it, or something.
So quickly, something about me. As I was introduced, my name is Marko. I work as head of consultancy at The Scale Factory. We are an AWS consultancy focusing on SaaS, B2B businesses. I have a strong ops background, so I have worn different hats throughout my career. Everything from sysops to consultancy now, and during my career, I've seen things as well.
I hope that qualifies me to share some of our experience and what to watch out for in the cloud native space, specifically around security. I'm also an open source contributor. You can find me in a lot of projects, and I'm a maintainer of a few as well, such as the Argo CD Helm charts, and I'm also a HashiCorp ambassador.
We are a friendly bunch of people. There are a couple of us around here. So, we are definitely happy to help you with any questions you have around the technology as well. When I'm not pressing the little plastic squares, I also like hiking and traveling, which I can do more of now. Please feel free to reach out through any social platform if you'd like. I'm probably the most active on Twitter. That's still around, I guess. If you're on the other ones, that works as well.
We're going to initially start with Terraform workflows on a fundamental level to set the scene first.
Then we'll look at where sensitive values can come from, where you can store them, what the attack vectors are, and what the potential threats are when storing sensitive values at different points. We'll look at some remediation and prevention, and I'm going to wrap it up at the end with some takeaways.
Let's have a quick look at the workflows. As mentioned throughout the day — multiple times — you can run your Terraform using the Community binary, so that would be a simple workflow of running Terraform in your pipeline somewhere or using any wrappers. I think we probably heard Terragrunt or something like that earlier.
We have Terraform Cloud, currently my favorite platform. We can split the workflows into three basic ones if you're using Terraform Cloud-driven runs.
You can run your Terraform as a VCS-driven run, which means it's triggered by your code in your version control system.
You can use the CLI type, which is literally using Terraform Cloud just as the backend.
Then the API, which requires a little bit more tooling on your end.
All of those can leverage and use the secrets available through Terraform Cloud or any other HashiStack tooling out there.
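As a rough illustration of the CLI-driven workflow, where Terraform Cloud acts as the backend for a locally invoked run, a configuration might look something like the sketch below. The organization and workspace names are hypothetical placeholders, and the `cloud` block assumes Terraform 1.1 or later.

```hcl
# Minimal sketch of a CLI-driven Terraform Cloud setup.
# "example-org" and "example-workspace" are hypothetical names.
terraform {
  cloud {
    organization = "example-org"

    workspaces {
      name = "example-workspace"
    }
  }
}
```

With a block like this in place, `terraform plan` and `terraform apply` are still started from the command line, while state and workspace variables live in Terraform Cloud.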
If you operate in the cloud native space and Kubernetes is your name of the game, you can definitely go for the Terraform Kubernetes Operator. To be fair, I haven't heard a lot about that lately, but it's an interesting way to provision your workspaces through CRDs and then run your Terraform within that as well.
If you're more inclined to use general-purpose programming languages, you might go with CDKTF. That gives you the opportunity to use strongly typed languages such as Go, C#, or TypeScript. That gives you the opportunity to store your secrets in a slightly different way as well.
For example, with the Terraform Kubernetes Operator, you can store them as Kubernetes secrets or, as we've seen earlier, use the Vault Operator, which syncs the secrets between your secrets managers and your Kubernetes clusters. There are third-party tools as well, such as Atlantis, env0, Scalr, Spacelift, and the like.
I'm not going to dwell on that too much. I'm sure most of you are familiar with how that looks. Best to look at the open source workflow. It's similar to what you would get out of Terraform Cloud, which has additional functionality. But, if we break it down and look at where the inflection points are, you would usually have your code stored somewhere in your Git repo.
Then as soon as you trigger a Terraform run, you would initialize your modules and providers, and do the planning, applying, and provisioning through your cloud providers. The secrets can come from the cloud providers. They can live in code (I hope not) and in external systems as well, but let's dive into those right now.
If we have a look at sensitive values and threats using Terraform specifically — where can those come from or live? We can break it down into four major areas.
Provider tokens (execution authentication)
Hardcoded passwords or API keys
Logs and outputs
State content and access
First, to run your Terraform, you need to authenticate with your backend, with your providers. So, you need to use provider tokens for that. That's the execution authentication you're doing.
Specific resources, such as databases, would require you to set initial secrets or passwords. Those can live hardcoded as passwords in HCL, or you can refer to them as API keys.
Slightly less obvious is logs and outputs. If you are using your Terraform runs as part of your pipelines, sensitive values might actually be emitted as part of the output and logs. That might be a potential attack vector as well for people that have access to logs but might not necessarily have access to the resources you're provisioning.
The last one, and probably the most important, is the state content and access to the state as well. It's not just about the content of the state but also ensuring people that have access to the state are properly controlled, and you're using the least privilege access for those resources as well.
Let's have a quick look at some code samples and how we can use those.
Like I mentioned, this is a very simplistic example of using the AWS provider. To access that, you need to provide an access key and a secret access key, which can either live as part of the HCL or be injected through your environment or external variables.
Don't get me wrong, that can be stored securely as well. So, it's not necessarily wrong to use environment variables or inject them from other environments. But, as we heard earlier, and it's good to see that my presentation is matching what we heard so far, the next improvement on that would definitely be using something like dynamic provider credentials using the OpenID Connect (OIDC) protocol.
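The two options for static keys can be sketched roughly as follows. The key values and the region are placeholders, not real credentials; the first variant is commented out because it is the pattern to avoid.

```hcl
# Option 1 (avoid): static keys written into HCL end up in version control.
# provider "aws" {
#   region     = "eu-west-1"
#   access_key = "AKIAEXAMPLE"        # placeholder value
#   secret_key = "example-secret-key" # placeholder value
# }

# Option 2: keep keys out of the code. The AWS provider reads
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment.
provider "aws" {
  region = "eu-west-1" # assumed region for illustration
}
```

In the second variant the keys are still long-lived; they are only moved out of the repository and into whatever sets the environment for the run.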
I'm not going to dive into that in detail. We've seen a lot of graphics in the previous presentation. Also, there is a more in-detail workshop following a couple of sessions this afternoon. If you're interested in that, definitely attend.
But the main advantage and improvement versus static keys is not just that you don't have to provide the keys, but that you avoid using long-lived keys altogether. That's, I think, the strong point of using dynamic credentials.
You can use it in a similar way to keys. The only difference is, as Kyle showed us earlier, there is literally nothing you need to hide. You just need to specify a role, session name, and identity token if that's the approach you're taking. Just apply your code, and it automatically works.
OIDC establishes the trust relationship between your Terraform execution environment on the one end and the IAM identity provider on the other. If you're using the AWS provider, that would be AWS IAM. Terraform Cloud supports that as well now, as we heard. In a similar way, you can inject that as part of your Terraform Cloud workflow.
Let's quickly have a look at some code. Normally my presentations would have demos, but this time, unfortunately, no demos. But we will be looking at some code at least. Let's look at this simple example. It's slightly less simple than I wanted. Nevertheless, as I mentioned before, if you're provisioning something like aws_db_instance, you always need to provide an initial admin username and password.
If you were to store that in the HCL code, that would be literally putting the key on the doormat, not underneath. I hope you don't do that, but there are other ways to get around it. You can use variables, and I hope you're not using defaults. But, I wanted to demonstrate that it can either live in your code or you can actually inject it as part of your run environment using something like Terraform variable files (tfvars) or environment variables, if this is part of your workflow.
Since 0.14, you can also set the sensitive value to true. There are certain arguments of the resources, such as in aws_db_instance, that would automatically mark passwords as sensitive. There is no explicit need to do that. But if using variables, make sure that you mark them as sensitive if this is something you don't want to emit in your outputs.
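A provider configuration along the lines described, with a role, a session name, and an identity token but no static keys, might look like the sketch below. The account ID, role name, and token file path are hypothetical; the token file is assumed to be written by whatever OIDC-capable environment executes the run.

```hcl
# Sketch: assuming a role via a web identity (OIDC) token instead of static keys.
provider "aws" {
  region = "eu-west-1" # assumed region

  assume_role_with_web_identity {
    # Hypothetical role that trusts the OIDC identity provider.
    role_arn     = "arn:aws:iam::123456789012:role/terraform-execution"
    session_name = "terraform-run"

    # Hypothetical path; the execution environment provides the token.
    web_identity_token_file = "/path/to/oidc/token"
  }
}
```

Nothing in this block is secret: the role ARN and session name are identifiers, and the token is short-lived and issued per run.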
As we can see in this screen, since we marked the value as sensitive, it'll be omitted and masked out in the Terraform run outputs. In this case, the password is marked as a sensitive value, and we cannot see what it says.
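A minimal sketch of that setup, assuming a small PostgreSQL instance, could look like this. The resource names, sizes, and engine are illustrative choices rather than the exact code from the slides.

```hcl
variable "db_username" {
  type = string
}

variable "db_password" {
  type      = string
  sensitive = true # masks the value in plan/apply output (Terraform 0.14+)
  # Deliberately no default: supply it via a tfvars file kept out of Git
  # or via the TF_VAR_db_password environment variable.
}

resource "aws_db_instance" "example" {
  identifier          = "example-db"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = var.db_username
  password            = var.db_password
  skip_final_snapshot = true # convenient for a demo, not for production
}
```

Marking the variable as sensitive only affects what is printed during a run; it does not change how the value is stored in state.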
But at this point, the question is, is this really enough? You cannot see that from the Terraform run, but if we have a quick look at the Terraform state — there's just a snippet of that state — there is more to it. But if you look at the snippet of that state for that db_instance, you'll see the password is stored as clear text. Even though we mark that as a sensitive attribute, it's still clear text. You might ask yourself, that's a weird way of storing that. But thinking back, for Terraform to apply the resources and see if you have drift or if it needs to change anything, it needs to have access to that as well.
There you have it, we have clear text passwords there, but at this point, you may ask yourself, is there anything more we can do? Luckily, there is. As you're probably aware, we can use something like AWS Secrets Manager, Vault, or any external secrets manager. In this example, we're randomizing the password to reduce human error in terms of people storing the passwords locally on their laptops or, god forbid, on post-its.
We're feeding that into the secrets manager, ingesting that as a resource for the password field, and using `jsondecode` to get it from the JSON form that AWS Secrets Manager is returning. We're using the external secrets manager now, and passwords are completely stored securely. Everything is nice and fine. We can call it a day, but let's have a look at the state again.
It's a snippet, so it's not the full state, but I wanted to show specifically in this case — for the AWS Secrets Manager secret version — it still holds the secret string as a plain text, and the reasons are still the same. Even though using an external secrets provider, Terraform needs to be aware of the secret because otherwise, it cannot detect the drift, it cannot provision it on the change, and things like that. You still need to store it like that.
So, we haven't sorted it out, and the reason why I wanted to get to this point is I was really surprised when I talked to customers or even when I was preparing for this talk. I was reading a couple of blogs from even security companies, and a lot of people mention scenarios where they say: “use the external secrets provider, and you have everything encrypted," and that's not true. The state is still plain text, regardless of what you're doing. It's definitely good to keep that in mind going forward. The secret might be encrypted at the ingestion point, but the state is still plain text.
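The Secrets Manager pattern discussed above can be sketched roughly as follows, assuming the `hashicorp/random` and `hashicorp/aws` providers. The secret name, username, and instance settings are hypothetical.

```hcl
# Generate the password so nobody has to invent or store one by hand.
resource "random_password" "db" {
  length  = 32
  special = false # avoids characters that RDS rejects in passwords
}

resource "aws_secretsmanager_secret" "db" {
  name = "example/db-credentials" # hypothetical secret name
}

resource "aws_secretsmanager_secret_version" "db" {
  secret_id = aws_secretsmanager_secret.db.id
  secret_string = jsonencode({
    username = "dbadmin"
    password = random_password.db.result
  })
}

resource "aws_db_instance" "example" {
  identifier          = "example-db"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  skip_final_snapshot = true

  # Read the credentials back out of the JSON stored in Secrets Manager.
  username = jsondecode(aws_secretsmanager_secret_version.db.secret_string)["username"]
  password = jsondecode(aws_secretsmanager_secret_version.db.secret_string)["password"]
}
```

Even in this arrangement, `random_password.db.result` and the `secret_string` attribute are both recorded in the Terraform state in plain text, which is exactly the point being made here.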
Before we look at what the recommended approach is, and probably many of you are sensing where I'm going with that, I would like to pose another question: Are there any other threat vectors or possible threats from running your Terraform? Have we covered it all with those four?
The answer is definitely no. In the world of fast-paced development, you have probably heard of a thing called supply chain attacks. That also applies to your Terraform runs. It might apply in a different way depending on which workflow you're using. Still, we heard Zurich mentioning before that they're using their private module registry.
But for example, if you're using a public module registry, are you sure what code you're ingesting? Are you pinning your module versions? Maybe the upstream gets compromised, or they introduce functionality that might break or, god forbid, expose your secrets. So, definitely, something to be aware of going forward. It's not something we've already mentioned.
There are binary sources as well. If you're running Terraform as part of your pipeline and not using something like Terraform Cloud, are you checking signatures for the binary? Are you checking the provenance? Are you sure where the code is coming from? Somebody could pretend they're the source of the binaries that you are pulling down. Or if you're upgrading your version of the binary, do you know where it's coming from? So, it's something to keep in mind as well.
There is dependency injection as well. So, if you're using any third-party plugins or providers, I would definitely urge you, not necessarily to check the source code, but at least to know where it's coming from and establish some trust in the plugins you're ingesting.
Because as we know, when you're doing `terraform init`, that wouldn't pull in the third-party plugins. You need to make sure you're putting them in place. And this is the point when you need to make sure the sources or artifacts you are using are secure, and that they're not going to expose you in any other way.
With all the supply chain attacks, the main aim, at least from what we see working with customers, is usually to provision additional resources and abuse them in different ways. For example, Bitcoin mining (I'm not sure what the most popular abuse is today), or they can be used to elevate permissions further. Definitely, something to keep in mind going forward.
We covered a lot, but I think Rob insinuated earlier something about AI, so I definitely wanted to mention that as well. I want to mention that because we've seen a big uptick in that. The first thing I want to say: generative is not intelligent. Keep that in mind.
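Pinning a module might look like the sketch below. The registry module shown is a commonly used public one, the version number is purely illustrative, and the Git URL is a hypothetical placeholder.

```hcl
# Public registry module pinned to one exact version.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.2" # illustrative version; pin to one you have reviewed

  name = "example"
  cidr = "10.0.0.0/16"
}

# Module pulled from Git, pinned to a tag via ?ref= (hypothetical repository).
module "network_baseline" {
  source = "git::https://example.com/org/terraform-modules.git//network?ref=v1.2.0"
}
```

An exact version or a fixed ref means an upstream change only reaches your runs when you deliberately bump the pin.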
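One way to be explicit about where providers come from is to declare their source addresses and versions, as in this sketch. The version numbers are illustrative only.

```hcl
terraform {
  required_version = "~> 1.5" # illustrative constraint on the Terraform binary

  required_providers {
    aws = {
      source  = "hashicorp/aws" # explicit registry namespace
      version = "5.17.0"        # illustrative exact pin
    }
    random = {
      source  = "hashicorp/random"
      version = "3.5.1" # illustrative exact pin
    }
  }
}
```

Running `terraform init` with a configuration like this also records provider checksums in `.terraform.lock.hcl`; committing that file lets later runs verify that they received the same artifacts.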
I think this is a common misconception. Even though the code is being generated, it doesn't mean it's correct, secure, or going to work. Even though it's very popular, at least from what we've seen, it usually raises more questions than answers. It can accelerate your path or make development slightly easier. But at the same time, you have to be aware of the potential risks or threats coming from that direction.
For example, do you know what models are being used for your code prediction, how they're being trained, and what data has been used? I think those are the questions that are usually skipped when people say there is a bunch of code coming out of thin air, and it works right, and it looks about right.
If you're using a third party, make sure that you trust them. Their models can be exposed as well. Maybe they have leaks of their own, so that's a thing. Also, I've seen a lot of prompt injections lately. People are getting creative about how they ask things like ChatGPT to get what they want. If you're training your own models, make sure that you don't put secrets in the training data; that's definitely a no-go.
Let's have a quick look at the remediation and prevention, so how would we remediate against those things that I've talked about so far.
The first one is to use secrets managers. Use things like Vault, KMS, and any dedicated secrets management solution out there. It can store your secrets much more securely than you can on your own. I'm sure of that.
There are a lot of SaaS products out there that can ease the transition if you're not familiar with those things. We've seen HashiCorp releasing Vault Secrets now as part of the HCP platform. I think that's a nice way of getting familiar with that if you're not using it already. All the major cloud providers provide something of their own as well.
When you are creating roles and IAM permissions for your Terraform execution roles, make sure that you use least privilege. When I say that, I mean it doesn't necessarily have to always be full administrator access, which I see quite often. A lot of people go, "Administrator access, it works? It's fine." Do you need that, or can you scope it down, maybe using something like IAM permissions boundaries and things like that? It's quite easy to scope it down and limit the blast radius in case something goes wrong.
The next thing: definitely encrypt your state. We mentioned state earlier. We are all aware that everything is plain text in it, so if you are using a remote state, definitely make sure that the remote backend is encrypted at rest.
Terraform automatically ensures encryption in transit, so you don't have to make sure that happens, but at least make sure the backend is encrypted. Another question there is content encryption. We've seen customers, for example, using something like git-crypt. I wouldn't say it's the worst thing, and it keeps your secrets secure, but it's probably not something I would suggest using. It always makes me twitch when somebody says, "I'm committing my secrets to Git." It makes me a little bit uncomfortable even though they're using PGP.
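A scoped-down execution role might be sketched as below. The trusted principal, the boundary policy, and the bucket name are all hypothetical; the actual set of allowed actions depends entirely on what your Terraform code provisions.

```hcl
variable "trusted_principal_arn" {
  type        = string
  description = "Hypothetical: ARN of the identity allowed to assume the execution role"
}

variable "permissions_boundary_arn" {
  type        = string
  description = "Hypothetical: ARN of a managed policy used as the permissions boundary"
}

data "aws_iam_policy_document" "trust" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = [var.trusted_principal_arn]
    }
  }
}

resource "aws_iam_role" "terraform_execution" {
  name                 = "terraform-execution"
  assume_role_policy   = data.aws_iam_policy_document.trust.json
  permissions_boundary = var.permissions_boundary_arn # caps the maximum permissions
}

# Grant only what this configuration needs instead of administrator access.
data "aws_iam_policy_document" "scoped" {
  statement {
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::example-app-bucket/*"] # hypothetical bucket
  }
}

resource "aws_iam_role_policy" "scoped" {
  name   = "scoped-access"
  role   = aws_iam_role.terraform_execution.id
  policy = data.aws_iam_policy_document.scoped.json
}
```

The boundary limits the blast radius even if someone later attaches a broader policy to the role.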
There is a longstanding GitHub issue on Terraform, the famous 9556, open since 2016 on encrypting the Terraform state. That issue is still open, by the way. That only proves it's a hard problem to solve, and there are reasons why it's not encrypted. That's why I cannot stress enough to encrypt the backend.
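For an S3 remote backend, encryption at rest can be switched on roughly as in this sketch. The bucket, key path, KMS key ARN, and lock table name are hypothetical placeholders.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket
    key    = "prod/terraform.tfstate"
    region = "eu-west-1"

    encrypt    = true # server-side encryption of the state object
    kms_key_id = "arn:aws:kms:eu-west-1:123456789012:key/EXAMPLE-KEY-ID" # hypothetical key

    dynamodb_table = "example-terraform-locks" # hypothetical table for state locking
  }
}
```

Encrypting the backend protects the stored object, but anyone who can read the state through the backend still sees the plain-text values, so access to the bucket and the key should follow least privilege as well.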
Moving on, I would suggest using company-wide governance to set up policies and guardrails for people provisioning things. We heard during the previous talk that they're deploying things using the landing zone across their estate, which is something we would suggest as well. From working with customers, this is something that would give you the best control and central governance over your estate. There are solutions for that, like Sentinel, the native HashiCorp tool, and OPA, which you can now use with Terraform Cloud as well.
Definitely use code scanning. I was glad to hear that during the previous talk as well. When you're pushing your code through, even at pre-commit, make sure you're using tools such as checkov, tfsec, terrascan, etc. Those are all valid tools that would raise issues early on, so it all aligns with the general shift-left security mantra. I would definitely recommend using those.
Then, as mentioned earlier, make sure that you use trusted sources for your artifacts, such as binaries or modules and things like that, and make sure that you're pinning versions. You wouldn't believe how many modules I see people using without version pinning. It's like they're using pessimistic pinning, and they think that's fine until it's not.
Something like Terraform Cloud or HCP, I think, ticks a lot of boxes. I'm suggesting that constantly, not just because I'm a HashiCorp ambassador, but because I think it also gives you a nice streamlined workflow and sets you up for success early on in your journey. Regardless of where you are in your journey, or your cloud adoption story, I think this is a good foundation to start with. And it manages the whole lifecycle for your infrastructure as code as well, which is a good thing.
To wrap it up, we didn't talk about securing your infrastructure as such. It's an important topic, not something we are covering today, but also worth keeping in mind. But if there is one thing I would like you to take away from this talk, it's this: protect your crown jewels, and that definitely means state safekeeping. Make sure that it's encrypted. Use secrets managers, streamlined workflows, and central governance, and avoid long-lived keys by using things like OIDC. And don't be the person hiding your secrets under the doormat.
Thank you. Hope you're going to enjoy the rest of the conference.