FAQ

Cloud Compliance & Management with Terraform - Typical Adoption Challenges at Scale

Why does it get tougher to use open source Terraform as the number of users grows?

Speakers

  • Corrigan Neralich, Solutions Engineer, HashiCorp

Transcript

Hi, I'm Corrigan Neralich, a solutions engineer at HashiCorp, and I'm here today to talk to you about Terraform and some of the challenges that organizations can face when they look to adopt it at scale.

I'm going to walk you through the typical adoption patterns we see internally in organizations.

Typical Terraform adoption

Typically you'll see a single developer writing Terraform configuration files that live locally on their laptop. They have a set of credentials that give them the ability to deploy to the cloud or hybrid cloud environment, whatever platform they may be using.

First, they will run what we call a plan, in order to see what Terraform intends to build based on what they've defined in their configuration files. Then, if everything looks good, they will run a terraform apply, which will go and build those resources in those various platforms.

Once this is complete, Terraform generates what we call the state file, which is essentially the source of truth, a living record of what Terraform has gone and built in those platforms.

This is tremendously powerful, as now it becomes an iterative process, where anytime there are updates to that configuration, Terraform is able to essentially perform a diff and understand what's been added, what's been removed, what's been changed, and only concern itself with those differences. So it provides a degree of idempotency.
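The diff described above can be sketched in a few lines of Python. This is only a toy model, assuming resources are plain dictionaries keyed by name; the function names `plan` and `apply` and the sample resources are made up for illustration and do not reflect Terraform's real algorithm or state format.

```python
# Illustrative only: a toy model of diffing desired configuration against
# recorded state. This is not Terraform's real algorithm or data format.

def plan(desired, state):
    """Return the changes needed to make `state` match `desired`.

    Both arguments map a resource name to a dict of attributes.
    """
    to_add = sorted(name for name in desired if name not in state)
    to_remove = sorted(name for name in state if name not in desired)
    to_change = sorted(
        name for name in desired
        if name in state and desired[name] != state[name]
    )
    return {"add": to_add, "change": to_change, "remove": to_remove}


def apply(desired):
    """Pretend to build everything in `desired` and return the new state."""
    return {name: dict(attrs) for name, attrs in desired.items()}


# Hypothetical configuration and previously recorded state.
desired = {
    "web_server": {"size": "medium"},
    "database": {"size": "large"},
}
state = {
    "web_server": {"size": "small"},
    "old_queue": {"size": "small"},
}

first_plan = plan(desired, state)   # one addition, one change, one removal
state = apply(desired)
second_plan = plan(desired, state)  # nothing left to do
```

Running the plan a second time against the updated state yields no changes, which is the idempotent behavior the state file makes possible.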

Teams are the norm

But typically you're never going to see a single developer working alone. They don't operate in a vacuum. You're going to want to enable multiple teams, or multiple members of the same team, to work with this same tool. You need a certain degree of collaboration.

Phase 1 is usually, "Let's push these files to version control," so that at a minimum, developers can work off of these configurations and follow the same Git workflows that they do for their application code development.

The second challenge they typically encounter is that there are now 2 state files, one on each developer's machine, and you want to make sure that they are working off of the same, corresponding state file. In order to solve for this, the next phase is usually pushing the state to some sort of remote storage.

Whether this is S3 or Azure Blob storage or Consul, the key idea here is now these developers are able to collaborate on the same configuration files and they also can ensure that they are working off of the same, corresponding state files.

As you can see, the number of developers usually starts to grow rather quickly. You never have just 2 team members; you have 100, or maybe your organization scales to 1,000.

When configs and state files proliferate

Now that you've solved this initial challenge of how to enable collaboration, ensuring that these developers are working off of the same configuration files and the same, corresponding state file, you start to see a proliferation of these configurations and state files, each of which might represent an individual project or set of independently managed infrastructure.

From a security perspective, this introduces a number of different challenges. First off, you ultimately want to limit access to these state files and adhere to least privilege internally. You want to make sure that you establish your own role-based access control (RBAC) around these to ensure that team A can access team A's state files, and team B can access team B's. You really don't want to give everyone access to everything.

Second, everyone has their own sets of credentials to enable them to provision, since these commands are happening locally on their machines.
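One way to picture that per-team rule is a simple prefix check. This is a minimal sketch, assuming each team's state files sit under their own path prefix; the `TEAM_PREFIXES` table, the paths, and the `can_access` helper are hypothetical names, and a real setup would rely on the storage platform's own access policies rather than application code.

```python
# Hypothetical layout: each team's state files live under its own prefix.
TEAM_PREFIXES = {
    "team-a": "states/team-a/",
    "team-b": "states/team-b/",
}


def can_access(team, state_path):
    """Allow access only to state files under the team's own prefix."""
    prefix = TEAM_PREFIXES.get(team)
    # Unknown teams get no access at all (least privilege by default).
    return prefix is not None and state_path.startswith(prefix)


own_state = can_access("team-a", "states/team-a/network.tfstate")
other_state = can_access("team-a", "states/team-b/network.tfstate")
unknown_team = can_access("team-c", "states/team-a/network.tfstate")
```

Denying by default, so that a team with no entry gets nothing, is what keeps the rule aligned with least privilege.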

What you start to see is what we refer to as "secrets sprawl," where everyone has their own sets of credentials and it becomes incumbent upon the individual to adhere to security best practices.

You're ultimately relying on everyone to keep these secure and to not check them into version control, which unfortunately does happen. That just becomes a significant increase in your risk and your security exposure.
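As a rough illustration of catching a credential before it reaches version control, the sketch below scans text for something that looks like an AWS access key ID. It assumes the common "AKIA" plus 16 uppercase letters or digits shape; the `find_suspect_lines` helper and the sample configuration are hypothetical, and a naive check like this would miss most other kinds of secrets.

```python
import re

# Assumption: long-term AWS access key IDs start with "AKIA" followed by
# 16 uppercase letters or digits. Other secrets need other rules.
ACCESS_KEY_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_suspect_lines(text):
    """Return the 1-based line numbers that appear to contain a key ID."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if ACCESS_KEY_PATTERN.search(line)
    ]


# Hypothetical configuration with a hard-coded (documentation-style) key.
sample_config = '''provider "aws" {
  region     = "us-east-1"
  access_key = "AKIAIOSFODNN7EXAMPLE"
}
'''

suspect_lines = find_suspect_lines(sample_config)
```

A check like this only reduces the chance of a mistake; it still leaves each individual holding their own credentials.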

The second challenge you start to experience is one of workflows. Because all of these commands and deployments are happening locally, oftentimes it becomes very difficult for you to enforce your desired process. In this case, this could represent a developer, say, forking a repository and deploying against that. Or, let's say, accidentally forgetting to pull the latest version from master, or to switch off of their test branch locally, before they deployed.

Now you have to go through what amounts to a very painful reconciliation process to ensure that what exists in the master branch of version control is in fact reflective of what exists in your environment. This is really paramount for your security, as any resources that are provisioned out of band, or outside of your prescribed workflow, potentially aren't adhering to your best practices.

The rogue Terraformer

Finally, the last instance that I'll highlight today is potentially an individual sidestepping this entirely. Because every individual in this case likely will have their own sets of credentials, there's really nothing stopping them from, say, logging in directly to these cloud platforms and spinning up their own infrastructure outside of the Terraform workflow. Now you've lost all visibility and control over what these individuals are doing.

The best case here is that you now have unaccounted-for costs. Maybe they're not tagging their infrastructure, so it becomes very difficult, once you do discover this, to track it back to the appropriate business unit. But you also introduce significant risk, as these individuals may or may not be, say, applying best-practice security groups, or they may be deploying VMs that are open to the internet.

As you can see, Terraform adoption across large organizations presents certain challenges related to security, such as managing credentials or establishing RBAC around your very sensitive state files. It also presents challenges in controlling your prescriptive workflows internally: making sure that individuals are following, end to end, exactly the process that you have defined for your organization, and ensuring that they aren't doing anything risky in the cloud or on premises.
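To make the visibility problem concrete, here is a small sketch of the kind of audit you would need once resources can appear outside the workflow. It assumes you can already list what exists in the platform and which resource IDs Terraform manages; the `audit` function, the `business_unit` tag name, and the sample inventory are all hypothetical.

```python
REQUIRED_TAG = "business_unit"  # hypothetical tag used for cost tracking


def audit(inventory, managed_ids):
    """Report resources Terraform does not manage, and resources that
    are missing the tag needed to trace them to a business unit."""
    unmanaged = sorted(
        r["id"] for r in inventory if r["id"] not in managed_ids
    )
    untagged = sorted(
        r["id"] for r in inventory if REQUIRED_TAG not in r.get("tags", {})
    )
    return {"unmanaged": unmanaged, "untagged": untagged}


# Hypothetical inventory of what actually exists in the cloud platform.
inventory = [
    {"id": "vm-001", "tags": {"business_unit": "payments"}},
    {"id": "vm-002", "tags": {}},
    {"id": "vm-003", "tags": {"business_unit": "search"}},
]
# Hypothetical set of resource IDs recorded in Terraform state.
managed_ids = {"vm-001", "vm-002"}

report = audit(inventory, managed_ids)
```

Here `vm-003` was created outside of Terraform and `vm-002` cannot be traced to a business unit, which are exactly the two gaps described above.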

To learn more, visit HashiCorp.com/Terraform.
