Monitoring as Code with Terraform Cloud and Checkly

Learn how to use the HashiCorp Terraform Provider for Checkly to automate infrastructure-monitoring setup and configuration.

This guest post was written by Hannes Lenke, co-founder and CEO at Checkly, the active monitoring platform for developers.

HashiCorp Terraform Cloud enables you to seamlessly provision infrastructure as code, consolidating configurations in source control and bringing transparency and replayability to a previously manual workflow. The same approach can be used to define the way our APIs and web apps are monitored.

In this blog post, you will learn how to use Terraform together with Checkly to provision your monitoring setup as code.

»Manual Provisioning is a Bottleneck

Before we dive into how to use Terraform and Checkly together, it’s important to understand the issues created by manual provisioning of monitoring checks.

The story of the world’s fourth-largest retailer, Germany’s Schwarz Group, which operates the Lidl and Kaufland grocery store and hypermarket chains, mirrors that of many Checkly customers. At Schwarz Group, several teams relied on manual procedures to manage their monitoring checks across many websites and connected backends, even when Terraform was already being used to manage infrastructure. This clash of approaches presented multiple challenges.

»Handling Checks at Scale Produces Large Overheads

The need to provision monitoring checks for multiple large APIs and websites meant internal users had to spend large amounts of time going through repetitive manual flows. With changes being rolled out to the target applications on a daily basis, the cost was significant.

»Low Transparency Makes Cross-Team Collaboration Harder

Manual flows meant users had to create tickets in order to have new monitoring resources provisioned for them, or request permission in advance to apply the changes themselves. In turn, the central IT team needed to work through different UIs and flows based on the service provider, and the resulting monitoring configurations then lived on separate platforms.

This made it difficult to maintain consistency across the entire infrastructure while avoiding duplication of effort across teams. It also complicated the task of auditing changes, making it difficult to review wrongly configured monitoring checks, thereby lengthening an important feedback loop.

»Non-Agile Workflow Slows Delivery

Eventually, the speed of checks-provisioning could not match the pace at which the target applications were evolving. This was the result of a mismatch of approaches: the CI/CD workflow through which the websites and APIs were iterated upon on one side vs. the fully manual approach on the other. In response to these issues, Schwarz Group’s central services team went looking for an approach that mirrored the existing infrastructure-as-code (IaC) workflow.

»Monitoring as Code

Applying lessons learned from IaC, monitoring as code (MaC) brings check definitions closer to the source code of the application by having them written as code. A declarative approach means that the user does not need to specify how the provisioning happens, or which specific actions and calls need to be made, but rather what the final results should look like.

This method allows check definitions to live in source control, which in turn boosts cross-team visibility, as sharing access to a repository is often simpler and cheaper than sharing seats on different monitoring-service providers. Additionally, having a history detailing every change increases transparency and makes it easier to roll back changes in case of incidents.

With software such as Terraform taking over the provisioning of monitoring checks, hundreds or thousands of checks can be created or edited in a matter of seconds. This is a game-changer for development, operations, and DevOps teams, allowing them to reallocate time spent on manual configuration towards improving the coverage and robustness of their monitoring setup.

According to Andreas Lehr, Team Lead at Schwarz Group IT, "Checkly integrated with Terraform enables us to quickly create, modify, and deploy API and browser checks for a broad and diverse audience of internal customers. The codified workflow ensures full transparency, thanks to built-in auditing and documentation!"

To summarize, MaC is revolutionizing the way monitoring is configured by providing:

  1. Better scalability through faster, more efficient provisioning
  2. Increased transparency and easier rollbacks via source control
  3. Unification of previously fragmented processes in a CI/CD workflow

Just like the Schwarz Group, any Terraform user can reap the benefits of monitoring as code. In the next section of this post, we will guide you through configuring a MaC setup with Terraform Cloud and Checkly.

»The HashiCorp Terraform Verified Provider for Checkly

The HashiCorp Terraform Verified Provider for Checkly allows users to configure API and synthetic monitoring checks as part of their existing infrastructure codebase. These checks then run on a schedule or on demand to monitor individual features or end-to-end user scenarios over time, alerting the responsible contacts as soon as any misbehavior is detected.

Terraform Cloud to Checkly workflow diagram

»Monitoring APIs as Code

Here’s how it works: A Checkly API check makes an HTTP request to an API endpoint and examines the response, ensuring it is both correct and quick enough, according to parameters specified by the user. If these conditions are not met, the user is alerted through channels such as OpsGenie, PagerDuty, Slack, and SMS.

As an example, let’s create an API check against a demo website. The goal is to ensure the users of the webshop can request a list of available books. The first step is to add the Checkly Terraform provider, which we will use to define every aspect of the check, to our Terraform file. In this tutorial, we will do all our work in the main.tf file:

variable "checkly_api_key" {}

terraform {
  required_providers {
    checkly = {
      source = "checkly/checkly"
      version = "0.8.1"
    }
  }
}

provider "checkly" {
  api_key = var.checkly_api_key
}

We also need to add a resource for the API check. Let's keep things simple and specify a few key parameters, including the name, schedule, locations, and assertions:

resource "checkly_check" "webstore-list-books" {
  name                      = "list-books"
  type                      = "API"
  activated                 = true
  should_fail               = false
  frequency                 = 1
  double_check              = true
  ssl_check                 = true
  use_global_alert_settings = true
  degraded_response_time    = 5000
  max_response_time         = 10000

  locations = [
    "eu-central-1",
    "us-west-1"
  ]

  request {
    url              = "https://danube-webshop.herokuapp.com/api/books"
    follow_redirects = true
    assertion {
      source     = "STATUS_CODE"
      comparison = "EQUALS"
      target     = "200"
    }
    assertion {
      source     = "JSON_BODY"
      property   = "$.length"
      comparison = "EQUALS"
      target     = "30"
    }
  }
}

We have two assertions against the response here:

  1. We are asserting that the status code is 200
  2. We are checking the number of items returned as part of our response to make sure all the data we expect is being sent back.
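
The description of API checks earlier also mentions that a response should be quick enough. As a sketch only, and assuming the provider accepts `RESPONSE_TIME` as an assertion source and `LESS_THAN` as a comparison (neither appears in the example above, so verify both against the provider documentation for your version), the `request` block could carry a third assertion. The 2000 ms threshold is an illustrative value, not one from the original setup:

```hcl
  # Replacement for the request block inside checkly_check.webstore-list-books.
  # The first two assertions are unchanged; the third is a hypothetical addition.
  request {
    url              = "https://danube-webshop.herokuapp.com/api/books"
    follow_redirects = true

    # The endpoint must answer with HTTP 200.
    assertion {
      source     = "STATUS_CODE"
      comparison = "EQUALS"
      target     = "200"
    }

    # The JSON array in the body must contain exactly 30 items.
    assertion {
      source     = "JSON_BODY"
      property   = "$.length"
      comparison = "EQUALS"
      target     = "30"
    }

    # Assumption: RESPONSE_TIME / LESS_THAN are supported by the provider.
    # The target is in milliseconds; 2000 is an illustrative threshold.
    assertion {
      source     = "RESPONSE_TIME"
      comparison = "LESS_THAN"
      target     = "2000"
    }
  }
```

This would sit alongside the `degraded_response_time` and `max_response_time` limits the resource already sets, rather than replacing them.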

Terraform Cloud can automatically apply configuration changes stored in GitHub to a linked workspace as soon as they are merged into the master branch. Because we want every successful push to master on our Git repository to be applied to our Terraform Cloud workspace without manual confirmation, we set the apply method to Auto apply under Settings > General.

Apply method selection

The Apply Method section of the Terraform Cloud workspace's General Settings.

We also need to create a free account on Checkly. Once that is done, we can fetch our Checkly API key from our Checkly Account Settings…

...and add it as an environment variable on Terraform Cloud, under the Variables section, with the key TF_VAR_checkly_api_key:

Environment variables
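
Terraform reads environment variables prefixed with `TF_VAR_` as values for the input variable of the same name, so `TF_VAR_checkly_api_key` populates `checkly_api_key`. Because the key is a secret, one optional refinement is to swap the empty variable declaration in main.tf for a typed, sensitive one. This is a sketch that assumes Terraform 0.14 or later, where the `sensitive` argument is available; the description text is illustrative:

```hcl
# Replaces the empty `variable "checkly_api_key" {}` declaration in main.tf
# (declaring the variable twice in the same module is an error).
# Assumes Terraform 0.14+: `sensitive = true` redacts the value in plan and apply output.
variable "checkly_api_key" {
  type        = string
  description = "API key used by the Checkly provider (illustrative description)"
  sensitive   = true
}
```

Marking the variable as Sensitive in the Terraform Cloud Variables section is a separate setting and complements this declaration.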

We can now commit our changes. As soon as we have them merged into the master branch, the current run will appear on our Terraform Cloud dashboard:

Terraform run dashboard

Once that is done, we will see our new API check appear on our Checkly dashboard:

Checkly first check

The check will now run every minute, monitoring the status of our endpoint from the locations we selected. Should it fail, it will immediately alert us on our channel(s) of choice:

Checkly channel list

»Monitoring E2E Scenarios

In order to make sure our web app is functional for end users, we need to monitor key user journeys on our frontend as well. Checkly leverages Puppeteer and Playwright to automatically go through the most important flows of your web app, just like a user would. As soon as one breaks, it will alert you, just like with API checks.

Let's look at an example: we want to keep an eye on the login flow of our online bookstore, so we write or record the following script using Playwright:

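const { chromium } = require("playwright");

// The script uses top-level await, which assumes a runner (such as a Checkly
// browser check) that executes it inside an async context.
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();

// Open the webshop and go to the login form.
await page.goto("https://danube-webshop.herokuapp.com/");
await page.click("#login");

// Fill in the credentials and submit.
await page.type("#n-email", "user@email.com");
await page.type("#n-password2", "supersecure1");
await page.click("#goto-signin-btn");

// Wait for the login confirmation. Playwright expects `state: "visible"`;
// `visible: true` is the Puppeteer form of this option.
await page.waitForSelector("#login-message", { state: "visible" });

await browser.close();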

Let's save the file in scripts/login.js, and then reference it in our main.tf file:

resource "checkly_check" "login" {
  name                      = "Login Flow"
  type                      = "BROWSER"
  activated                 = true
  should_fail               = false
  frequency                 = 10
  double_check              = true
  ssl_check                 = false
  use_global_alert_settings = true

  locations = [
    "us-west-1",
    "eu-central-1"
  ]

  script = file("${path.module}/scripts/login.js")
}

Let's also commit and merge these changes to have them reflected on Checkly:

Checkly 2nd check

Our check is now fully configured and will run every 10 minutes as specified, informing us if anything goes wrong with our login flow.
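
As more user journeys are scripted, declaring one resource per file becomes repetitive. A possible alternative to the hand-written `checkly_check.login` resource is to generate one browser check per script with Terraform's `for_each` and `fileset`. The sketch below assumes that every `*.js` file in scripts/ is a complete Playwright script like login.js; the resource name `browser` is a hypothetical choice:

```hcl
# Sketch: one browser check per script in scripts/, used instead of
# (not in addition to) checkly_check.login, to avoid a duplicate login check.
resource "checkly_check" "browser" {
  # fileset returns the set of matching paths relative to path.module,
  # for example "scripts/login.js".
  for_each = fileset(path.module, "scripts/*.js")

  # Derive the check name from the file name: "scripts/login.js" becomes "login".
  name                      = trimsuffix(basename(each.value), ".js")
  type                      = "BROWSER"
  activated                 = true
  should_fail               = false
  frequency                 = 10
  double_check              = true
  ssl_check                 = false
  use_global_alert_settings = true

  locations = [
    "us-west-1",
    "eu-central-1"
  ]

  # Load the script body from the matched file.
  script = file("${path.module}/${each.value}")
}
```

With this layout, adding a new browser check means committing a new script file; no further changes to main.tf are needed.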

We can keep going and add as many checks as we want: Checkly and Terraform Cloud scale seamlessly together, and many users manage thousands of checks.
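
To illustrate how a larger set of API checks might be kept manageable, the following sketch drives a single resource block from a map of endpoints using `for_each`. The names `api_endpoints` and `api` are hypothetical, and the second URL is a placeholder rather than a real endpoint from this post:

```hcl
locals {
  # Map of check name to endpoint URL. The first entry comes from this post;
  # the second is a hypothetical placeholder to replace with a real endpoint.
  api_endpoints = {
    "list-books"  = "https://danube-webshop.herokuapp.com/api/books"
    "example-api" = "https://example.com/api/health"
  }
}

# One API check per map entry, all sharing the same settings.
resource "checkly_check" "api" {
  for_each = local.api_endpoints

  name                      = each.key
  type                      = "API"
  activated                 = true
  should_fail               = false
  frequency                 = 10
  double_check              = true
  ssl_check                 = true
  use_global_alert_settings = true

  locations = [
    "eu-central-1",
    "us-west-1"
  ]

  request {
    url              = each.value
    follow_redirects = true

    # Every endpoint in the map is expected to answer with HTTP 200.
    assertion {
      source     = "STATUS_CODE"
      comparison = "EQUALS"
      target     = "200"
    }
  }
}
```

Adding or removing a map entry then creates or destroys the matching check on the next apply. Checks that need their own assertions, such as the item-count check on the books endpoint, are probably still better declared individually.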

»Conclusion

Monitoring as code extends the benefits of infrastructure as code to monitoring, making complex real-world infrastructure more resilient and delivering a better end-user experience through better observability.

As shown in the Schwarz Group example, MaC with Terraform Cloud and Checkly helps improve reliability through an efficient and transparent workflow, which is a force multiplier for large IT organizations.

That is it. No matter the size of your team or business, you now know all you need to get started and combine the power of Terraform Cloud and the Checkly Terraform provider. Happy monitoring!
