Controlling Costs with Terraform Enterprise

This blog explores Terraform features including modules, policy as code, and automated governance that organizations can use to optimize cost savings and management.

Anthony Davanzo

Terraform

May 6, 2019

Terraform is a tool for provisioning and managing infrastructure, and it is particularly useful for multi-cloud deployments. Many organizations adopting a cloud operating model use Terraform to provision infrastructure. One approach organizations take is a “lift and shift” of existing on-premises deployments into the cloud. However, this strategy doesn’t take into account the unique cost-saving benefits of the cloud, such as ephemeral workloads or auto-scaling machines that adapt to demand. Terraform has features, including modules, Sentinel policies, and automated policy enforcement, that organizations can use to take advantage of these benefits and better control cloud costs.

»Codification Through Modules

To begin controlling costs, organizations can codify infrastructure components into modules. Modules are packaged Terraform configurations that can be quickly reused and customized through variables. Modules facilitate cost savings by allowing organizations to build operational best practices into reusable templates, so operators provisioning infrastructure can work quickly without sacrificing quality. Through these codified best practices, organizations control cloud spend and avoid costly, unwieldy, non-standardized solutions.
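As a rough illustration of this idea, a module can pin a cost-conscious default while still letting callers override it through a variable. The sketch below is hypothetical: the module path `./modules/compute-instance`, the variable names, and the default machine type are all assumptions made for the example, not part of any published module. It uses Terraform 0.12-style syntax.

```hcl
# modules/compute-instance/variables.tf (hypothetical module)
variable "name" {
  description = "Name of the instance."
  type        = string
}

variable "machine_type" {
  description = "Machine type; defaults to a small size to keep costs low."
  type        = string
  default     = "n1-standard-1" # assumed default, chosen for illustration
}

# main.tf in a calling configuration
module "web_server" {
  source = "./modules/compute-instance" # assumed local path

  name         = "web-dev"
  machine_type = "n1-standard-2" # optional override of the module default
}
```

Callers who omit `machine_type` get the small default, so the inexpensive choice is also the easiest one.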

A look at modules in the Terraform Enterprise UI.

»Cost Sensitive Policy as Code

Sentinel is a policy as code framework that allows users to codify guardrails. In the context of Terraform Enterprise, Sentinel enables organizations to codify guardrails around infrastructure provisioning. You might think of software applications as factories and infrastructure costs as the utility costs of those factories. As the factories get more complex, it becomes difficult to manage and understand the utility costs; Sentinel helps organizations attack that challenge in a programmatic, easy-to-administer fashion. By writing cost-sensitive Sentinel policies, organizations can better manage costs even as applications become more complex. There are many cost-sensitive axes one could write Sentinel policies for, but the main ones we’ve identified are machine size and machine lifespan.

»Limiting Machine Size

Businesses adopting the cloud frequently run into “cloud waste”: cloud capacity that is paid for but goes unused. Writing Sentinel policies that control machine size is a common approach to preventing cloud waste and optimizing costs. To write these policies, you create a list of allowed machine types and write a policy that prevents machine types outside of that list from being provisioned. Here is an example limiting machine size in Google Cloud:

import "tfplan"

allowed_machine_types = [
  "n1-standard-1",
  "n1-standard-2",
  "n1-standard-4",
  "n1-standard-8",
]

main = rule {
  all tfplan.resources.google_compute_instance as _, instances {
    all instances as _, r {
      r.applied.machine_type in allowed_machine_types
    }
  }
}

With this policy, only the four listed machine types (n1-standard-1 through n1-standard-8) can be provisioned. Any other type, including n1-standard-16 and larger, fails the policy check.

»Environment Right-Sizing

Terraform and Sentinel can also help save costs by controlling what infrastructure is used for development and test environments. Development and test environments often use the same infrastructure as production, while bearing much lighter or more infrequent loads. There are a few ways to right-size them. One possibility is tagging resources as “prod”, “dev”, or “test” and then creating a Sentinel policy like the one above that limits machine size based on those tags. Another option is to create a “promotion” variable in your Terraform configuration that corresponds to the ideal set of infrastructure for that environment. For example, setting that variable to “prod” would correspond to a map of large, high-performing compute and storage, while setting it to “dev” would correspond to smaller, lower-performing ones. The tag-based approach only works if resources actually carry tags, which a policy like the following can enforce:

import "tfplan"

main = rule {
  all tfplan.resources.aws_instance as _, instances {
    all instances as _, r {
      (length(r.applied.tags) else 0) > 0
    }
  }
}

This policy requires every aws_instance resource in the plan to have at least one tag.
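The “promotion” variable approach described earlier in this section might be sketched as follows. This is a minimal example under stated assumptions: the variable name `promotion`, the size map, the zone, and the boot image are all illustrative choices, and the syntax assumes Terraform 0.12 or later.

```hcl
variable "promotion" {
  description = "Target environment: dev, test, or prod (hypothetical variable)."
  type        = string
  default     = "dev"
}

locals {
  # Assumed mapping from environment to machine size, for illustration only.
  machine_type_by_env = {
    dev  = "n1-standard-1"
    test = "n1-standard-2"
    prod = "n1-standard-8"
  }
}

resource "google_compute_instance" "app" {
  name = "app-${var.promotion}"
  zone = "us-central1-a" # assumed zone

  # Look up the size for the selected environment.
  machine_type = local.machine_type_by_env[var.promotion]

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11" # assumed image
    }
  }

  network_interface {
    network = "default"
  }
}
```

Because the size comes from a direct map lookup, a value of `promotion` that is not a key in the map should fail at plan time rather than silently fall back to some other size.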

»Limiting Machine Lifespan

Another cost-saving strategy is limiting the lifespan of machines by ensuring that unused machines are shut down. A basic method for achieving that result is to set a TTL (time to live) variable for your Terraform configurations and then set up a mechanism that checks those TTLs and queues destroys of configurations that have expired. We have two resources that go into more depth on this strategy. The first is an open source reaper bot, which is available here. The second is this guide, which walks through how to get the same reaper bot functionality using AWS Lambda functions.

»Automated Workflows

On top of codified infrastructure and enforcement of cost-sensitive policies, Terraform provides automation that removes manual effort and reduces risk. Terraform Enterprise can be called via its API from within a deployment pipeline, and the guardrails of Sentinel policy as code are automatically applied to the applicable infrastructure. This automation means fewer operators focused on infrastructure and more operators working on an organization’s core business.

»Conclusion

Terraform Enterprise users can realize cost savings by ensuring provisioned machines are right-sized for their loads, creating maps of infrastructure sizes fit for their respective environments, and destroying unused resources. For more help getting started with Sentinel, review our guide, docs, and example policies.