8 Terraform continuous validation use cases for AWS, Google Cloud, and Azure
Learn how you can use Terraform check blocks and continuous validation with AWS, Google Cloud, and Azure services.
Just because everything worked when you provisioned your infrastructure, you can’t assume everything will continue to work properly after deployment. Continuous validation is a foundational feature for HashiCorp Terraform Cloud Plus that helps make sure infrastructure is working as expected. Use cases for continuous validation include closing security gaps, controlling budgets, dealing with certificate expiration, or even just knowing whether a virtual machine (VM) is up and running.
Continuous validation for Terraform Cloud Plus provides long-term visibility into your infrastructure’s health and security, including service configuration, identity and access management, and anything else used by an application’s business logic. Specifically, it lets users add assertions to their configurations and modules; Terraform then proactively monitors whether those assertions pass and notifies users if any fail. This helps users identify issues when they first appear, rather than discovering a change only during a future Terraform plan/apply or once it causes a user-facing problem.
Users can create assertions in their Terraform configuration using check blocks, which are available in Terraform 1.5 and later. Check blocks are a Terraform language feature that lets users define assertions, each made up of a custom condition expression and an error message. When the condition expression evaluates to true, the check passes. When it evaluates to false, Terraform shows a warning that includes the user-defined error message so users can take immediate action to remedy the situation.
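As a minimal sketch of that structure, the check below pairs one condition with one error message. It assumes the hashicorp/http provider is available; the URL and the check name are placeholders chosen for illustration, not part of any example in this post.

```hcl
check "service_health" {
  # Scoped data source: it is only visible inside this check block.
  # The URL is a hypothetical placeholder; point it at your own endpoint.
  data "http" "example" {
    url = "https://example.com/health"
  }

  assert {
    # The check passes while this expression evaluates to true.
    condition = data.http.example.status_code == 200

    # Shown as part of the warning when the condition evaluates to false.
    error_message = "${data.http.example.url} returned an unhealthy status code"
  }
}
```

A failing assertion is reported as a warning, so it surfaces the problem without blocking the plan or apply.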
Custom conditions can be created using data from Terraform providers’ resources, data sources, and scoped data sources. Data can also be combined from multiple sources. For example, you can use checks to monitor expirable resources by comparing a resource’s expiration date attribute to the current time returned by Terraform’s built-in time functions.
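The expiration comparison can be sketched without any provider at all. In the snippet below, local.expires_at and local.warn_window are hypothetical values standing in for a real resource’s expiration attribute and your own alerting threshold; plantimestamp(), timeadd(), and timecmp() are Terraform built-in functions.

```hcl
locals {
  # Hypothetical expiry timestamp (RFC 3339); in practice this would come
  # from a resource or data source attribute.
  expires_at = "2030-01-01T00:00:00Z"

  # Hypothetical warning window: 30 days expressed in hours.
  warn_window = "${24 * 30}h"
}

check "expiry_window" {
  assert {
    # timeadd() with a negative duration moves the expiry back by the window.
    # timecmp() returns -1 while the plan timestamp is still earlier than
    # that point, so the check passes until the window is entered.
    condition = timecmp(
      plantimestamp(),
      timeadd(local.expires_at, "-${local.warn_window}")
    ) < 0

    error_message = format("Resource expires on %s, which is within the next 30 days.", local.expires_at)
  }
}
```

The certificate checks later in this post follow the same pattern, with the expiry taken from a provider attribute instead of a local value.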
This blog post shows how you can define checks in your Terraform configuration to address a number of use cases, using data returned from Amazon Web Services (AWS), Google Cloud, and Microsoft Azure. You can use the list below to jump directly to the use cases for the cloud providers that interest you:
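AWS use cases
- Ensure your AWS account is within budget
- Detect threats with detailed findings for AWS accounts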
Google Cloud use cases
- Assert that a VM is in a running state
- Check if a certificate will expire within a certain timeframe
- Validate the status of a Cloud Function
Azure use cases
- Assert that a VM is in a running state
- Monitor if an App Service certificate will expire within a certain timeframe
- Check if an App Service Function or Web App has exceeded its usage limit
» AWS: Ensure your AWS account is within budget
AWS Budgets allows you to track and take action on your AWS costs and usage. It monitors your aggregate utilization and coverage metrics for your Reserved Instances (RIs) or Savings Plans.
Examples of what you can do with AWS Budgets include:
- Setting a monthly cost budget with a fixed target amount to track all costs associated with your account.
- Setting a monthly cost budget with a variable target amount, with each subsequent month growing the budget target by 5%.
- Setting a monthly usage budget with a fixed usage amount and forecasted notifications to help ensure that you are staying within the service limits for a specific service.
- Setting a daily utilization or coverage budget to track your RI or Savings Plans.
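The first item in the list above, a monthly cost budget with a fixed target, might be declared as in the sketch below. The budget name and the 1200 USD limit are assumptions chosen for illustration; the resource label "example" is picked so that a check can refer to it as aws_budgets_budget.example.

```hcl
resource "aws_budgets_budget" "example" {
  # Hypothetical name and amount; adjust both to your account.
  name         = "monthly-cost-budget"
  budget_type  = "COST"
  limit_amount = "1200"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"
}
```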
The example below shows how a check block can be used to assert that spending stays within the budgets that have been set up. (Note: This example works with version 5.2 and later of the AWS provider.)
check "check_budget_exceeded" {
  data "aws_budgets_budget" "example" {
    name = aws_budgets_budget.example.name
  }

  assert {
    condition = !data.aws_budgets_budget.example.budget_exceeded
    error_message = format("AWS budget has been exceeded! Calculated spend: '%s' and budget limit: '%s'",
      data.aws_budgets_budget.example.calculated_spend[0].actual_spend[0].amount,
      data.aws_budgets_budget.example.budget_limit[0].amount
    )
  }
}
If spending exceeds the budget limit, the check block assertion will return a warning similar to this:
│ Warning: Check block assertion failed
│
│ on main.tf line 43, in check "check_budget_exceeded":
│ 43: condition = !data.aws_budgets_budget.example.budget_exceeded
│ ├────────────────
│ │ data.aws_budgets_budget.example.budget_exceeded is true
│
│ AWS budget has been exceeded! Calculated spend: '1550.0' and budget limit: '1200.0'
» AWS: Detect threats with detailed findings for AWS accounts
Amazon GuardDuty is a threat-detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts, workloads, and data stored in Amazon S3. AWS simplifies the collection and aggregation of account and network activities, but it can be time consuming for security teams to continuously analyze event log data for potential threats.
The following example shows how a check block can be used to assert that Amazon GuardDuty has not identified any threats. (Note: This example works with version 5.2 and later of the AWS provider.)
data "aws_guardduty_detector" "example" {}

check "check_guardduty_findings" {
  data "aws_guardduty_finding_ids" "example" {
    detector_id = data.aws_guardduty_detector.example.id
  }

  assert {
    condition = !data.aws_guardduty_finding_ids.example.has_findings
    error_message = format("AWS GuardDuty detector '%s' has %d open findings!",
      data.aws_guardduty_finding_ids.example.detector_id,
      length(data.aws_guardduty_finding_ids.example.finding_ids),
    )
  }
}
If findings are present, the check block assertion will return a warning similar to this:
│ Warning: Check block assertion failed
│
│ on main.tf line 24, in check "check_guardduty_findings":
│ 24: condition = !data.aws_guardduty_finding_ids.example.has_findings
│ ├────────────────
│ │ data.aws_guardduty_finding_ids.example.has_findings is true
│
│ AWS GuardDuty detector 'abcdef123456' has 9 open findings!
» Google Cloud: Assert that a VM is in a running state
VM instances provisioned using Google Compute Engine can pass through several states as part of the VM instance lifecycle. Once a VM is provisioned, it could experience an error or a user could suspend or stop that VM in the Google Cloud console. That change might not be detected until the next Terraform plan is generated. Continuous validation can be used to assert the state of a VM and detect if there are any unexpected status changes that occur out-of-band.
The example below shows how a check block can be used to assert that a VM is in the running state. You can force the check to fail in this example by provisioning the VM, manually stopping it in the Google Cloud console, and then triggering a health check in Terraform Cloud. The check will fail and report that the VM is not running. (Note: This example works with version 4.70 and later of the Google provider, configured with a default project, region, and zone.)
data "google_compute_network" "default" {
  name = "default"
}

resource "google_compute_instance" "vm_instance" {
  name         = "my-instance"
  machine_type = "f1-micro"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = data.google_compute_network.default.name
    access_config {
    }
  }
}

check "check_vm_status" {
  # Note: in this example we reference the resource directly instead of using a data source (or a data source that is scoped to this check block)
  assert {
    condition = google_compute_instance.vm_instance.current_status == "RUNNING"
    error_message = format("Provisioned VMs should be in a RUNNING status, instead the VM `%s` has status: %s",
      google_compute_instance.vm_instance.name,
      google_compute_instance.vm_instance.current_status
    )
  }
}
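The comment in the check above mentions the alternative of a data source scoped to the check block. A sketch of that variant follows; it assumes the google_compute_instance data source exposes the same current_status attribute as the resource, which is worth confirming against the provider documentation for your version. The check name is hypothetical.

```hcl
check "check_vm_status_scoped" {
  # Scoped data source: looked up only for this check. If the lookup itself
  # fails, Terraform reports a warning instead of blocking the operation.
  data "google_compute_instance" "vm_instance" {
    name = google_compute_instance.vm_instance.name
    zone = google_compute_instance.vm_instance.zone
  }

  assert {
    condition = data.google_compute_instance.vm_instance.current_status == "RUNNING"
    error_message = format("Provisioned VMs should be in a RUNNING status, instead the VM `%s` has status: %s",
      data.google_compute_instance.vm_instance.name,
      data.google_compute_instance.vm_instance.current_status
    )
  }
}
```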
» Google Cloud: Check if a certificate will expire within a certain timeframe
Certificates can be provisioned using the Google Certificate Manager, Google Certificate Authority Service, or Google Compute Engine APIs. In this example, we provision a certificate via the Google Certificate Authority Service with a user-supplied lifetime argument. After the lifetime duration passes, the certificate is automatically deleted in Google Cloud. To prepare for this event and avoid being caught off guard, we can create a check asserting that the certificate’s expiration date is more than 30 days away. When the certificate’s expiration is approaching, health check notifications from Terraform Cloud will alert users so that manual intervention can be completed in time.
In the example below, we provision a certificate with a lifetime of 30 days and 2 minutes (see local.month_and_2min_in_second_duration) and create a check that asserts certificates should be valid for the next 30 days (see local.month_in_hour_duration).
We can see the check begin to fail by waiting 2 minutes after the certificate is provisioned and then triggering a health check in Terraform Cloud. The check will fail and report that the certificate is due to expire in less than a month. (Note: This example works with version 4.70 and later of the Google provider, configured with a default project, region, and zone.)
locals {
  # 30 days expressed in hours: "720h"
  month_in_hour_duration = "${24 * 30}h"
  # 30 days plus 2 minutes expressed in seconds: "2592120s"
  month_and_2min_in_second_duration = "${(60 * 60 * 24 * 30) + (60 * 2)}s"
}

resource "tls_private_key" "example" {
  algorithm = "RSA"
}

resource "tls_cert_request" "example" {
  private_key_pem = tls_private_key.example.private_key_pem

  subject {
    common_name  = "example.com"
    organization = "ACME Examples, Inc"
  }
}
resource "google_privateca_ca_pool" "default" {
  name     = "my-ca-pool"
  location = "us-central1"
  tier     = "ENTERPRISE"

  publishing_options {
    publish_ca_cert = true
    publish_crl     = true
  }

  labels = {
    terraform = true
  }

  issuance_policy {
    baseline_values {
      ca_options {
        is_ca = false
      }
      key_usage {
        base_key_usage {
          digital_signature = true
          key_encipherment  = true
        }
        extended_key_usage {
          server_auth = true
        }
      }
    }
  }
}
resource "google_privateca_certificate_authority" "test-ca" {
  deletion_protection      = false
  certificate_authority_id = "my-authority"
  location                 = google_privateca_ca_pool.default.location
  pool                     = google_privateca_ca_pool.default.name

  config {
    subject_config {
      subject {
        country_code        = "us"
        organization        = "google"
        organizational_unit = "enterprise"
        locality            = "mountain view"
        province            = "california"
        street_address      = "1600 amphitheatre parkway"
        postal_code         = "94109"
        common_name         = "my-certificate-authority"
      }
    }
    x509_config {
      ca_options {
        is_ca = true
      }
      key_usage {
        base_key_usage {
          cert_sign = true
          crl_sign  = true
        }
        extended_key_usage {
          server_auth = true
        }
      }
    }
  }

  type = "SELF_SIGNED"

  key_spec {
    algorithm = "RSA_PKCS1_4096_SHA256"
  }
}
resource "google_privateca_certificate" "default" {
  name                  = "my-certificate"
  pool                  = google_privateca_ca_pool.default.name
  certificate_authority = google_privateca_certificate_authority.test-ca.certificate_authority_id
  location              = google_privateca_ca_pool.default.location
  lifetime              = local.month_and_2min_in_second_duration # lifetime is 2mins over the threshold in the check block below
  pem_csr               = tls_cert_request.example.cert_request_pem
}
check "check_certificate_state" {
  assert {
    # Passes while the current time is earlier than (expiry - 30 days)
    condition = timecmp(plantimestamp(), timeadd(
      google_privateca_certificate.default.certificate_description[0].subject_description[0].not_after_time,
    "-${local.month_in_hour_duration}")) < 0
    error_message = format("Provisioned certificates should be valid for at least 30 days, but `%s` is due to expire on `%s`.",
      google_privateca_certificate.default.name,
      google_privateca_certificate.default.certificate_description[0].subject_description[0].not_after_time
    )
  }
}
» Google Cloud: Validate the status of a Cloud Function
Google Cloud Functions can have multiple statuses (ACTIVE, FAILED, DEPLOYING, DELETING) depending on issues that occur during deployment or when triggering the function. In the example below we create a second-generation cloud function that uses source code stored as a .zip file in a Google Cloud Storage bucket. A .zip file containing the files needed by the function is uploaded by Terraform from the local machine.
In the check, we use the google_cloudfunctions2_function data source’s state attribute to access the function’s state and assert that the function is active. (Note: This example works with version 4.70 and later of the Google provider, configured with a default project, region, and zone.)
resource "google_storage_bucket" "bucket" {
  name                        = "my-bucket"
  location                    = "US"
  uniform_bucket_level_access = true
}

resource "google_storage_bucket_object" "object" {
  name   = "function-source.zip"
  bucket = google_storage_bucket.bucket.name
  source = "./function-source.zip"
}

resource "google_cloudfunctions2_function" "my-function" {
  name        = "my-function"
  location    = "us-central1"
  description = "a new function"

  build_config {
    runtime     = "nodejs12"
    entry_point = "helloHttp"
    source {
      storage_source {
        bucket = google_storage_bucket.bucket.name
        object = google_storage_bucket_object.object.name
      }
    }
  }

  service_config {
    max_instance_count = 1
    available_memory   = "1536Mi"
    timeout_seconds    = 30
  }
}
check "check_cf_state" {
  data "google_cloudfunctions2_function" "my-function" {
    name     = google_cloudfunctions2_function.my-function.name
    location = google_cloudfunctions2_function.my-function.location
  }

  assert {
    condition = data.google_cloudfunctions2_function.my-function.state == "ACTIVE"
    error_message = format("Provisioned Cloud Functions should be in an ACTIVE state, instead the function `%s` has state: %s",
      data.google_cloudfunctions2_function.my-function.name,
      data.google_cloudfunctions2_function.my-function.state
    )
  }
}
» Azure: Assert that a VM is in a running state
An Azure Virtual Machine gives you the flexibility of virtualization without having to buy and maintain the physical hardware that runs it. However, you still need to maintain the virtual machine by performing tasks, such as configuring, patching, and installing the software that runs on it. Provisioned VM instances can pass through several different power states as part of the VM-instance lifecycle. Once a VM is provisioned, it could experience an error, or a user could suspend or stop that VM and that change might not be detected until the next Terraform plan is generated. Continuous validation can be used to assert the state of a VM and detect if there are any unexpected status changes that occur out-of-band.
The example below shows how a check block can be used to assert that a VM is in the running state. You can force the check to fail in this example by provisioning the VM, manually stopping it, and then triggering a health check in Terraform Cloud. The check will fail and report that the VM is not running:
data "azurerm_virtual_machine" "example" {
  name                = azurerm_linux_virtual_machine.example.name
  resource_group_name = azurerm_resource_group.example.name
}

check "check_vm_state" {
  assert {
    condition = data.azurerm_virtual_machine.example.power_state == "running"
    error_message = format("Virtual Machine (%s) should be in a 'running' status, instead state is '%s'",
      data.azurerm_virtual_machine.example.id,
      data.azurerm_virtual_machine.example.power_state
    )
  }
}
The full example can be found in the AzureRM provider's examples/tfc-checks/vm-power-state folder on GitHub.
» Azure: Monitor if an App Service certificate will expire within a certain timeframe
Azure App Service Certificates (and other resources) can be provisioned using a user-supplied certificate. The example below shows how to check that a certificate will remain valid for the next 30 days (see local.month_in_hour_duration):
locals {
  # 30 days expressed in hours: "720h"
  month_in_hour_duration = "${24 * 30}h"
}

data "azurerm_app_service_certificate" "example" {
  name                = azurerm_app_service_certificate.example.name
  resource_group_name = azurerm_app_service_certificate.example.resource_group_name
}

check "check_certificate_state" {
  assert {
    # Passes while the current time is earlier than (expiry - 30 days)
    condition = timecmp(plantimestamp(), timeadd(
      data.azurerm_app_service_certificate.example.expiration_date,
    "-${local.month_in_hour_duration}")) < 0
    error_message = format("App Service Certificate (%s) should be valid for at least 30 days, but is due to expire on `%s`.",
      data.azurerm_app_service_certificate.example.id,
      data.azurerm_app_service_certificate.example.expiration_date
    )
  }
}
The full example can be found in the AzureRM provider's examples/tfc-checks/app-service-certificate-expiry folder on GitHub.
» Azure: Check if an App Service Function or Web App has exceeded its usage limit
App Service Function Apps and Web Apps can exceed their usage limits. The example below shows how a check block can be used to assert that a Function or Web App has not exceeded its usage limit:
data "azurerm_linux_function_app" "example" {
  name                = azurerm_linux_function_app.example.name
  resource_group_name = azurerm_linux_function_app.example.resource_group_name
}

check "check_usage_limit" {
  assert {
    # A check passes when its condition is true, so assert that usage is NOT "Exceeded"
    condition = data.azurerm_linux_function_app.example.usage != "Exceeded"
    error_message = format("Function App (%s) usage has been exceeded!",
      data.azurerm_linux_function_app.example.id,
    )
  }
}
The full example can be found in the AzureRM provider's examples/tfc-checks/app-service-app-usage folder on GitHub.
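The same assertion can be sketched for a Web App. The snippet below assumes a Linux Web App managed elsewhere in the configuration as azurerm_linux_web_app.example (a hypothetical resource not shown in this post) and that the azurerm_linux_web_app data source reports a usage attribute in the same way as the Function App data source; confirm this against the provider documentation for your version.

```hcl
check "check_web_app_usage_limit" {
  # Scoped data source for the hypothetical Web App resource.
  data "azurerm_linux_web_app" "example" {
    name                = azurerm_linux_web_app.example.name
    resource_group_name = azurerm_linux_web_app.example.resource_group_name
  }

  assert {
    # The condition is true (the check passes) while usage is anything
    # other than "Exceeded".
    condition = data.azurerm_linux_web_app.example.usage != "Exceeded"
    error_message = format("Web App (%s) usage has been exceeded!",
      data.azurerm_linux_web_app.example.id
    )
  }
}
```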
» Getting started with Terraform Cloud
Terraform is the industry standard for provisioning and managing any infrastructure. Continuous validation gives users the visibility to ensure that their infrastructure works as expected, and will notify users if it fails. For more information, visit the Workspace Health page in the Terraform Cloud documentation.
Try these new features today — and if you are new to Terraform, sign up for Terraform Cloud and contact sales for a trial of Terraform Cloud Plus.