Detecting and Managing Drift with Terraform

Jun 07 2018 · Christie Koehler

This guide exists for historical purposes, but a more up-to-date guide can be found on the HashiCorp Developer page: Manage Resource Drift.

HashiCorp Terraform enables you to safely and predictably manage the lifecycle of your infrastructure using declarative configuration files. One challenge when managing infrastructure as code is drift: the real-world state of your infrastructure differs from the state defined in your configuration. Drift can arise for many reasons. Within your configuration, it happens when you add or remove resources or change resource definitions. Outside your configuration, it occurs when resources are terminated or fail, or when changes are made manually or through other automation tools.

Terraform cannot detect drift in resources, or in resource attributes, that it does not manage. For example, Terraform will not detect changes made inside a virtual machine by installing applications locally or by running a configuration management tool such as Chef or Ansible.

This post explains how to use Terraform to detect and manage configuration drift. We will cover:

Terraform State. The state file and how Terraform tracks resources.
Terraform Refresh. The refresh command and reconciling real-world drift.
Terraform Plan. The plan command and reconciling desired configuration with real-world state.
Terraform Config. Useful config options for managing drift.

For the rest of this post, we will use this example resource configuration snippet to illustrate different scenarios and features of Terraform:

# AWS EC2 VM with AMI and tags
resource "aws_instance" "example" {
  ami           = "ami-656be372"
  instance_type = "t1.micro"
  tags = {
    drift_example = "v1"
  }
}

»Terraform state: How Terraform tracks resources

In order to create and apply plans, Terraform stores information about your infrastructure. By default this information is stored locally in a file named terraform.tfstate. It can also be stored remotely, for use in a team environment. The state file will not exist until you have completed at least one terraform apply.
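Remote state is configured with a backend block inside the terraform settings block. The sketch below assumes the S3 backend and an S3 bucket you already own; the bucket name and key are hypothetical placeholders, not values from this post.

```hcl
# Hypothetical remote state settings: the bucket and key are placeholders.
terraform {
  backend "s3" {
    bucket = "example-terraform-state"         # assumed pre-existing S3 bucket
    key    = "drift-example/terraform.tfstate" # object path of the state file in the bucket
    region = "us-east-1"
  }
}
```

After adding or changing a backend, run terraform init so Terraform can initialize it; if local state already exists, init offers to copy it to the new backend.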

The state file is essential to Terraform and performs these functions:

Map resources defined in the configuration to real-world resources.
Track metadata about resources, such as dependencies and dependency order.
Cache resource attributes to improve performance when managing very large infrastructures.
Sync state among team members, enabling better collaboration.
Track which resources Terraform manages, so that it can ignore other resources in the same environment.
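To illustrate dependency tracking, here is a hypothetical second resource, an Elastic IP, that refers to the example instance (written in Terraform 0.12+ reference syntax; 0.11 wrapped references in "${...}"). The reference tells Terraform to create the instance first. Terraform also keeps a copy of the most recent dependencies in the state, so it can still destroy resources in the correct order after one of them has been removed from the configuration.

```hcl
# Hypothetical resource, not part of this post's example configuration.
resource "aws_eip" "example" {
  # Referring to the instance's ID creates an implicit dependency on it
  instance = aws_instance.example.id
}
```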

The state file is in JSON format and is intended for Terraform's internal use only, so interacting with it directly is discouraged. Instead, use terraform show to display the current state of your entire configuration:

$ terraform show
aws_instance.example:
  id = i-011a9893eff09ede1
  ami = ami-656be372
  availability_zone = us-east-1d
  instance_state = running
  instance_type = t1.micro
  tags.drift_example = v1
  ...

You can also use terraform state show to inspect a specific resource:

$ terraform state show aws_instance.example
id                                        = i-011a9893eff09ede1
ami                                       = ami-656be372
availability_zone                         = us-east-1d
instance_state                            = running
instance_type                             = t1.micro
tags.drift_example                        = v1
...

»Terraform refresh: Reconciling real-world drift

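Before a plan or apply operation, Terraform performs a refresh to bring its view of your infrastructure up to date with real-world status; during a plan, this refreshed state is kept in memory and is not saved. You can also refresh at any time with terraform refresh, which does write the result to the state file: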

$ terraform refresh
aws_instance.example: Refreshing state... (ID: i-011a9893eff09ede1)

Here, Terraform reconciles the resources tracked by the state file with the real world. It queries your infrastructure providers to find out what is actually running and how it is configured, and updates the state file with this information. Because Terraform is designed to coexist with other tools and with manually provisioned resources, it refreshes only the resources under its management.

The output of a refresh is minimal: Terraform lists each resource it is refreshing along with its ID. Running refresh does not modify infrastructure, but it does modify the state file. If your infrastructure has drifted since Terraform last ran, refresh records that drift in the state so it can be detected.

By default, Terraform also writes a backup of the previous state to terraform.tfstate.backup, which simplifies recovery if the state file is lost or corrupted.

»Terraform plan: Reconciling desired configuration with real-world state

Now that the state file is up to date, Terraform can compare the desired state, defined in your configuration, with the actual state of your existing resources. From this comparison, Terraform determines which resources need to be created, modified, or destroyed, and forms a plan.

A Terraform plan describes everything Terraform will do to implement your desired configuration when you apply it. Terraform generates a plan automatically during an apply, but you can also run it explicitly. If you are just starting out and have not deployed any infrastructure, the plan will be to create all of the resources in your configuration. If you have existing infrastructure, Terraform may need to update existing resources, or destroy them and create new ones. Running terraform plan creates this plan and tells you what changes Terraform will make to your infrastructure.

Using the same example, here is the output of plan after manually updating the tags on the instance in the AWS console:

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

aws_instance.example: Refreshing state... (ID: i-011a9893eff09ede1)

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  ~ aws_instance.example
      tags.drift_example: "v2" => "v1"

Plan: 0 to add, 1 to change, 0 to destroy.

------------------------------------------------------------------------

Terraform will update the value of the tag from v2 back to v1, correcting the drift so that the tag matches the value in the configuration.

Not all drift can be fixed by updating a resource; sometimes resources need to be recreated. Using the same example, here is the output of terraform plan after manually terminating the instance in the AWS console:

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

aws_instance.example: Refreshing state... (ID: i-011a9893eff09ede1)

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + aws_instance.example
      id:                           <computed>
      ami:                          "ami-656be372"
      availability_zone:            <computed>
      instance_state:               <computed>
      instance_type:                "t1.micro"
      tags.drift_example:           "v1"
      ...

Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

We can see that Terraform, having detected that the resource specified in the configuration no longer exists, will create a new instance of it with the values specified in the configuration.

When a resource still exists but differs in an attribute that cannot be updated in place, Terraform destroys the original resource and then re-creates it.

Using our same example configuration, we specify a new AMI value:

resource "aws_instance" "example" {
  # updated AMI
  ami           = "ami-14c5486b"
  instance_type = "t1.micro"
  tags = {
    drift_example = "v1"
  }
}

Running terraform plan with this updated configuration results in the following:

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

aws_instance.example: Refreshing state... (ID: i-06641647ef59e4304)

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

-/+ aws_instance.example (new resource required)
      id:                           "i-06641647ef59e4304" => <computed> (forces new resource)
      ami:                          "ami-656be372" => "ami-14c5486b" (forces new resource)
      associate_public_ip_address:  "" => <computed>
      availability_zone:            "us-east-1c" => <computed>
      instance_state:               "running" => <computed>
      instance_type:                "t1.micro" => "t1.micro"
      tags.drift_example:           "v1" => "v1"
      …

Plan: 1 to add, 0 to change, 1 to destroy.

------------------------------------------------------------------------

We see that to reconcile our configuration with real-world state, Terraform will first destroy the existing instance, built with the original AMI, and then recreate it with the new AMI.

»Lifecycle options: Configuring how Terraform manages drift

Terraform provides some lifecycle configuration options for every resource, regardless of provider, that give you more control over how Terraform reconciles your desired configuration against state when generating plans.

One of these options is prevent_destroy. When this is set to true, any plan that includes a destroy of this resource will return an error message. Use this flag to provide extra protection against the accidental deletion of any essential resources.

In the last example, where we updated the AMI of our resource, terraform plan indicated that the existing instance would be destroyed. To prevent this behavior, add the following to the resource’s definition:

  lifecycle {
    prevent_destroy = true
  }
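For context, here is a sketch of the complete example resource with that lifecycle block nested inside it, using the updated AMI from the previous section:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-14c5486b" # updated AMI from the previous example
  instance_type = "t1.micro"

  tags = {
    drift_example = "v1"
  }

  # Make any plan that would destroy this instance fail with an error
  lifecycle {
    prevent_destroy = true
  }
}
```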

Running terraform plan now generates an error, alerting us that applying this plan would destroy resources:

$ terraform plan
Error: Error running plan: 1 error(s) occurred:

* aws_instance.example: 1 error(s) occurred:

* aws_instance.example: aws_instance.example: the plan would destroy this resource, but it currently has lifecycle.prevent_destroy set to true. To avoid this error and continue with the plan, either disable lifecycle.prevent_destroy or adjust the scope of the plan using the -target flag.

Returning an error whenever a plan would destroy a resource that has prevent_destroy = true is useful for preventing accidental destruction, but it also blocks the rest of the plan: Terraform won't let us apply any other changes while this error occurs.

Instead, another option for managing drift is the ignore_changes parameter, which tells Terraform which individual attributes to ignore when evaluating changes.
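A sketch of the example resource with ami listed in ignore_changes alongside the existing prevent_destroy setting. It is written in Terraform 0.12+ syntax, where attribute names are unquoted; the 0.11 releases current when this post was written used quoted strings instead.

```hcl
resource "aws_instance" "example" {
  ami           = "ami-14c5486b"
  instance_type = "t1.micro"

  tags = {
    drift_example = "v1"
  }

  lifecycle {
    prevent_destroy = true
    # Ignore differences in ami when planning (0.11 syntax: ["ami"])
    ignore_changes = [ami]
  }
}
```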

Using the same example, we add ignore_changes = ["ami"] to the lifecycle block and re-run terraform plan:

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

aws_instance.example: Refreshing state... (ID: i-06641647ef59e4304)

------------------------------------------------------------------------

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed.

This time, rather than returning an error, Terraform reports that no changes are needed, even though the instance's AMI differs from the one specified in the configuration. This is because, when reconciling the configuration with real-world state, Terraform ignored the value of ami.

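Another lifecycle option is create_before_destroy, which controls the ordering of resource creation and destruction when a resource must be replaced. Setting it to true makes Terraform create the replacement before destroying the original, which helps achieve zero downtime.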
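A minimal sketch of create_before_destroy on the example instance. With this set, a change that forces replacement, such as the AMI update earlier in this post, would create the new instance before destroying the original instead of using the default destroy-then-create order.

```hcl
resource "aws_instance" "example" {
  ami           = "ami-14c5486b"
  instance_type = "t1.micro"

  tags = {
    drift_example = "v1"
  }

  lifecycle {
    # When a change forces replacement, create the new instance first,
    # then destroy the old one
    create_before_destroy = true
  }
}
```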

»Summary

Drift is the term for when the real-world state of your infrastructure differs from the state defined in your configuration. Terraform helps you detect and manage drift. Information about the real-world state of infrastructure managed by Terraform is stored in the state file. The terraform refresh command updates this state file, reconciling what Terraform believes is running, and how it is configured, with what actually exists. Plan and apply commands run a refresh first, before any other work. Detect drift with terraform plan, which reconciles your desired configuration with real-world state and tells you what Terraform will do during terraform apply. For finer-grained control over how Terraform manages drift, use the lifecycle options prevent_destroy and ignore_changes.
