Modern Infrastructure Automation with Packer, Terraform, and Consul
Mar 19, 2021
HashiCorp co-founder and CTO Armon Dadgar walks through the modern infrastructure automation workflows of HashiCorp Packer, Terraform, and Consul in this whiteboard video.
- Armon Dadgar, Co-founder & CTO, HashiCorp
In this whiteboard video, Armon Dadgar explores the state of modern infrastructure automation, and the urgency of adopting...
Learn which HashiCorp open source products can help include these patterns in organizations' provisioning workflows as they move to public cloud + on-premises hybrid cloud models, as well as multi-cloud workflows.
Hello. I want to spend some time today talking about modern infrastructure automation.
When we talk about infrastructure automation today, a lot has changed about what we consider best practice. In HashiCorp's view, I think there are 2 core tenets that we believe in.
One is this notion of infrastructure as code, the idea that we're going to capture all the aspects of the process of managing our infrastructure, defining it, configuring it, etc., in a codified way. That allows us to automate against it, apply version control, etc., get all these great benefits.
The other big one is the notion of immutability. With an immutable approach, we're baking images or artifacts that are versioned. Then, when we need to make changes, instead of trying to upgrade our infrastructure in place, we bake a new version of that image.
So we go from version 1 to version 2, upgrading atomically rather than trying to in-place migrate.
Why do we care about these 2 approaches? What it's about is, How do we manage complexity, how do we manage risk? And how do we go quickly?
Going Faster with Infrastructure as Code
By virtue of capturing things as infrastructure as code, now we can automate the standup and provisioning of that infrastructure, and that allows us to go a lot faster. We can repeatedly stand up and provision new infrastructure and stamp it out without manually having to point and click and reconfigure it each time.
While that allows us to go fast, it also removes a source of human error, because the process is machine-automated rather than manual. This is where we start to reduce risk.
From a complexity viewpoint, each manual deviation, even if it doesn't cause a failure, starts to create special snowflakes of infrastructure. If we have 50 servers that we manually provisioned, all 50 might be slightly different. We've introduced a bunch of accidental complexity to our infrastructure, versus if all 50 were identical.
Simplifying with Immutability
Immutability is the same concept. It simplifies things so we don't have to think about these 50 different web servers that are slightly different in their configuration. We maybe have 40 that are running version 1 and 10 that are running version 2.
We can think about it in this discrete way that simplifies the complexity but also reduces the risk that things break in unpredictable ways.
If we're installing packages and running configuration management at runtime, the risk is that those things could fail in ways that we don't expect. That introduces risks that we don't have in an immutable world, where we move toward an atomic question: Did it boot or did it not boot? It reduces and simplifies the problem space.
How do we realistically apply this? In our view, there are 3 different areas where we'd apply the HashiCorp tooling. The first is with Packer. Packer looks at, How do we start by building a Packer configuration file?
We define, through a set of infrastructure as code configuration, all the inputs that we need. This might be source code, it might be configuration management tooling, it might be various security controls, might be compliance controls that we care about, etc.
We'll define all of these things as the inputs to our Packer file, all of the components that we need. And then we feed this into Packer itself, and Packer's job is ultimately to generate for us an artifact. And this artifact can vary.
The goal of Packer is that we have a common workflow, but ultimately it can generate a whole bunch of different things. This could be, for example, a VM image. It could be an Amazon Machine Image in the cloud, on Amazon. This could be a Docker container. The list goes on.
In some sense, we don't care what the artifact is; we just care that we have a consistent process of translating these inputs of source code and configuration management through a repeatable process where we build these images, and ultimately we probably land it into some sort of artifact management store.
Maybe this goes into something like Artifactory, as an example.
Let's say it's a web server that we're building.
We're going to define our source code and our config. We're going to go through this process and ultimately end up with, let's say, version 1. This might be a Docker container that lands in the artifact store, or it might be a machine image.
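The build step described above can be sketched as a function from declared inputs to an immutable, versioned artifact. This is only a toy model in Python, not Packer's configuration format or API; the names `Artifact`, `build_artifact`, `artifact_store`, and all input values are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an artifact cannot be modified after it is built
class Artifact:
    name: str
    version: int
    digest: str  # content hash of every declared build input


def build_artifact(name: str, version: int, inputs: dict) -> Artifact:
    """Turn declared inputs into a versioned artifact record, repeatably."""
    canonical = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return Artifact(name, version, hashlib.sha256(canonical).hexdigest())


# Hypothetical inputs: source code, configuration management, compliance controls.
inputs_v1 = {
    "source": "web-app@commit-a",
    "config_mgmt": "base-hardening",
    "compliance": ["baseline"],
}
inputs_v2 = dict(inputs_v1, source="web-app@commit-b")  # only the source changed

artifact_store = {}  # stands in for an artifact management store
v1 = build_artifact("web", 1, inputs_v1)
v2 = build_artifact("web", 2, inputs_v2)
artifact_store[(v1.name, v1.version)] = v1
artifact_store[(v2.name, v2.version)] = v2
```

The point of the sketch is that the same inputs always yield the same artifact, and a change to any input yields a new version rather than a modification of the old one.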
Infrastructure as Code: Terraform
Next is Terraform. You're going to start to notice there's a recurring pattern here, because it starts with infrastructure as code.
Again, we're going to take this infrastructure as code definition of all of the pieces of my infrastructure that I care about. I might need a load balancer that goes to a set of web servers that talk to a database, as an example.
I'm going to declare through my infrastructure as code definition that I have some infrastructure layout that looks like this. Then I'm going to feed that into Terraform. You're going to notice there's a pattern here, which is Terraform doesn't care whether I am talking to AWS, Google, Azure, VMware on premises, etc.
It doesn't matter to Terraform. It can handle all of these different environments.
AWS, GCP, Azure, VMware etc. Terraform supports, I think, hundreds of different providers. But ultimately it's going to go out and create that sort of infrastructure.
These things might be paired together as part of this configuration. We say, "I want to deploy version 1 of my web server using this image that we have." These 3 web servers that come up, web server 1 through web server 3, are all running version 1.
This becomes great. We've now built our immutable image. We've used Terraform using infrastructure as code to deploy and define how those things go out.
Network Infrastructure: Consul
Where Consul fits into this is, How do we pair this infrastructure, which might be dynamic and change over time, with things like our load balancers?
Again, what we might want to do is define this using Terraform with an infrastructure as code approach, and say, "What I care about is, in this case, my set of web servers that are coming in. And I'm going to define, using Terraform, how the load balancer should be configured for them." In this case, this load balancer might be my F5 BIG-IP.
Integrating the HashiCorp Stack
How these things are then all integrated is: I can have an image-building pipeline that uses Packer. Say my source code gets modified and I now have version 2 of it. I rerun Packer. When it goes through, it's going to pick up the new source code, the existing configuration, security controls, etc.
And now we might build this V2 artifact image. Then we decide that's what we want to deploy instead of V1. We make a change with Terraform. Terraform, when we run its plan, will tell us, "You have an existing set of VMs. These that I've already provisioned will need to be destroyed. And in favor of them, we're going to create a new set, a replacement set. These will run V2."
Terraform by default takes this very immutable-driven approach to infrastructure. When you make this change, it's not going to try and upgrade those web servers in place. It's going to create a new set and replace the old set.
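A minimal sketch of that planning behavior, assuming a toy model where each server is just a name plus an image version. This is not Terraform's actual planning algorithm or data model; `Server` and `plan` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    image: str


def plan(current: dict, desired: dict) -> list:
    """Return (action, server name) steps that move `current` to `desired`."""
    steps = []
    for name, want in desired.items():
        have = current.get(name)
        if have is None:
            steps.append(("create", name))
        elif have.image != want.image:
            # An image change is never applied in place: the server is replaced.
            steps.append(("replace", name))
    for name in current:
        if name not in desired:
            steps.append(("destroy", name))
    return steps


current = {f"web-{i}": Server(f"web-{i}", "web-v1") for i in (1, 2, 3)}
desired = {f"web-{i}": Server(f"web-{i}", "web-v2") for i in (1, 2, 3)}

for action, name in plan(current, desired):
    print(action, name)
```

Running the plan against an unchanged definition produces no steps, which is what makes it safe to rerun repeatedly.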
Along the way, this might not be atomic. We have to first bring up the new version 2s. They're going to operate at a different set of IPs. Then, as we feel comfortable about that, we'll tear down the version 1 things.
Consul operates in runtime. It's a runtime service. The first thing it does is it has a bird's-eye view of all of the applications. When we had web server 1, 2, and 3, they were all registered with Consul.
Then, based on this infrastructure as code definition, the fact that it was subscribed to a set of web servers, Consul would trigger Terraform and say, "Web servers 1, 2, and 3 exist." Terraform went and configured the load balancer to say, "These are the 3 web servers that should sit behind the load balancer."
As Terraform is executing and we bring up these new version 2 instances—this is web 1 prime, web 2 prime, web 3 prime—these get registered with Consul. Consul will then re-execute Terraform, because the set of web servers has changed.
We'll update the load balancer and add the additional 3 instances to it. Then when Terraform destroys these 3 and they get de-registered from Consul, we'll execute Terraform, update the load balancer, and remove the old 3 that we had.
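The register, notify, and reconfigure loop described above can be sketched as a small publish/subscribe model. This is an in-memory illustration only; it does not use Consul's or Terraform's real APIs, and `ServiceCatalog`, `LoadBalancer`, and the IP addresses are all hypothetical.

```python
class ServiceCatalog:
    """Toy stand-in for a service registry that notifies subscribers on change."""

    def __init__(self):
        self._instances = {}    # service name -> set of addresses
        self._subscribers = {}  # service name -> list of callbacks

    def subscribe(self, service, callback):
        self._subscribers.setdefault(service, []).append(callback)

    def _notify(self, service):
        members = sorted(self._instances.get(service, set()))
        for callback in self._subscribers.get(service, []):
            callback(members)

    def register(self, service, address):
        self._instances.setdefault(service, set()).add(address)
        self._notify(service)

    def deregister(self, service, address):
        self._instances.get(service, set()).discard(address)
        self._notify(service)


class LoadBalancer:
    """Toy load balancer whose pool is rewritten on every notification."""

    def __init__(self):
        self.pool = []
        self.history = []  # pool size after each reconfiguration

    def set_pool(self, members):
        self.pool = list(members)
        self.history.append(len(self.pool))


catalog = ServiceCatalog()
lb = LoadBalancer()
catalog.subscribe("web", lb.set_pool)

v1_addrs = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]
v2_addrs = ["10.0.2.1", "10.0.2.2", "10.0.2.3"]

for addr in v1_addrs:   # initial version 1 deployment
    catalog.register("web", addr)
for addr in v2_addrs:   # version 2 instances come up alongside version 1
    catalog.register("web", addr)
for addr in v1_addrs:   # version 1 instances are destroyed
    catalog.deregister("web", addr)
```

In this model the pool briefly holds all six instances before shrinking back to the three version 2 addresses, which mirrors the non-atomic rollout the speaker describes.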
A Modern Way of Doing Provisioning
This starts to show you how we can pair these different technologies together into a modern way of doing provisioning.
Historically, what you would have had is a manual build process where you built a golden image and revved that maybe once a quarter. Now we can build Packer into a continuous integration system. Anytime we have a change on the source code or the configuration management, we can rerun and rebuild those core images.
With Terraform, we can define our infrastructure and reference these different versions that we need. As we make changes over time, Terraform can evolve that, and we can apply the infrastructure as code best practice.
But Terraform operates in this immutable way. It's not going to try and upgrade these things in place. It's going to do this in an immutable fashion and try and create new resources and destroy the old ones.
In this example, I use a VM-type scenario, but it could be the same thing if you're using Terraform to define a workload, for example, running on top of Kubernetes.
We might be defining a deployment or a set of pods on top of Kubernetes—same sort of thing. We would create the new deployment running the new version, make sure that succeeds, and then tear down the old one. Still applying this immutable upgrade methodology.
As we bring that to networking, what we often see is that networking changes tend to be behind a ticket queue. You deploy your new application, and then you file a ticket for someone to update the firewall, the load balancer, or the API gateway, etc.
The idea here is, Can we split this? You're publishing anytime you deploy a new instance of your application. Consul knows there's a new web server running at this IP. And what it enables is that, as a networking operations team or as a network team, we can subscribe to those changes in an automated way. We can define what happens and say, "Every time you see a change to the web server, here's how that should be reflected in, for example, the load balancer."
But it could just as well be that we say, "We're going to use Terraform to update our Palo Alto firewall," as an example.
We might have a Palo Alto firewall and say, "Anytime you see a new web server show up, update the firewall and allow that web server to talk to the database."
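As a rough sketch of that firewall case, the allow rules can be derived from the current set of web servers, and the change to apply is the difference between the old and new rule sets. This is plain Python for illustration and does not reflect any real firewall's API; the function names, addresses, and database port are assumptions.

```python
def allow_rules(web_servers, database, port):
    """One allow rule (source, destination, port) per web server."""
    return {(ip, database, port) for ip in web_servers}


def rule_changes(old_servers, new_servers, database, port):
    """Return (rules to add, rules to remove) when the web server set changes."""
    old = allow_rules(old_servers, database, port)
    new = allow_rules(new_servers, database, port)
    return sorted(new - old), sorted(old - new)


DATABASE = "10.0.0.50"  # hypothetical database address
DB_PORT = 5432          # hypothetical database port

before = {"10.0.1.1", "10.0.1.2"}
after = {"10.0.1.2", "10.0.2.1"}  # one server replaced by a new one

to_add, to_remove = rule_changes(before, after, DATABASE, DB_PORT)
```

Deriving the rules from the registered servers, rather than editing them by hand, is what removes the ticket from the loop.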
In this case, what we're trying to do is move away from manually filing tickets to update our network and really think about a modern infrastructure automation that's end to end. It's everything from, How does the network get updated and reconfigured? How do we think about the underlying infrastructure that the applications are running on?
And even at the application layer, how do those things get defined, packaged, and deployed on top? That can be using modern platforms like Nomad and Kubernetes. It could be a serverless environment, could be VMs in a more traditional architecture.
But it's really about thinking about that end-to-end experience of network, infrastructure, application, and applying these best practices of infrastructure as code and immutability.
Hopefully this gives you a sense of how Packer, Terraform, and Consul can be used in conjunction for that.