Recorded Webinar

Create, Change, and Orchestrate AWS Infrastructure with Terraform

The co-founder of HashiCorp and an AWS solutions architect talk about using Terraform with AWS.

Transcript

Jana Boruta: Hello, everyone. Thank you all for joining us. My name is Jana Boruta. I’m the director of events at HashiCorp. We’re really thrilled to have this webinar today. Today’s webinar will include a presentation from Brandon Chavis. He’s a solutions architect at Amazon Web Services. And then, a presentation from Mitchell Hashimoto, our co-founder and CTO of HashiCorp.

If you have any questions during the webinar, please enter them on the side panel, and folks from our team will try to answer as many as we can during the webinar. Then, there are some questions that we’ll assign to Mitchell and Brandon while they’re presenting. Then, some of those questions we’ll keep to the very end, and we’ll hold a Q&A.

Without further ado, first up is Brandon.

Brandon Chavis: Thanks, Jana. Hey, everyone. I’m Brandon Chavis. I’m a solutions architect with Amazon Web Services. I work in the Amazon Partner Network, which means that I work with our technology and consulting partners that bring products and solutions to our AWS customers.

You might be wondering why AWS is here, but what I’m here to talk about is how AWS and HashiCorp work together to bring interesting solutions to our customers that solve for their cloud computing needs.

The first thing that I want to talk about is how fast things change at AWS. I’ve been here for about four years. Maybe the most impressive thing has been just the constant rate of innovation. This graph is interesting to look at. This is the number of features we release year-over-year. This only goes to 2015. It’s even crazier this year. Basically, I work here and I can’t keep up. We have over 70 services to interact with. We’re constantly adding new features to all of these services. At the time this graph was produced, we had released over 2,300 new features and/or services since the inception of AWS. Basically, what this means is, it’s crazy to try to keep up with this rate of change. This just stresses the need for effective tooling to interact with AWS.

Another important thing to think about with Amazon Web Services is that we have over a million customers in 190 countries. We provide many options for global deployment. We do that while also providing for extremely high reliability. We currently provide 13 regions and 35 availability zones.

To elaborate on that availability zone concept, an individual region will always contain at least two availability zones, and those AZs are on completely separate power grids and floodplains and are always composed of multiple data centers. This is to provide redundancy within that availability zone. We don’t build regions that only contain a single data center, because it just wouldn’t offer the reliability that our customers require to run enterprise businesses, for example. This highly available global network allows you to deploy your applications near your customers much faster and in a simpler way than what can be achieved through a traditional model.

We also give you a lot of options in how you deploy. AWS has supported hybrid cloud deployments ever since our inception. We found that with legacy systems, there’s always going to be a period where you’re in both environments. You’re running on-premises and you’re running in the cloud. We’ve put a lot of time and effort into making sure that your on-prem resources can operate seamlessly with the cloud.

We’ve been working with enterprises since our inception in 2006 to use AWS in all manner of hybrid architectures. We built a set of services and capabilities that provide the broadest and deepest hybrid functionality of any cloud provider today. We give you functionality to extend your network into AWS, ways of controlling both on-premises and AWS resources at the same time.

Terraform is a fantastic tool for our customers, because whether they’re all-in on the cloud or in transition, Terraform provides a way of creating and managing the resources used in AWS, as well as on-premises.

We’ll dig in to Terraform a little bit more, especially once Mitchell comes on, but real quick, I wanted to talk a little bit about the HashiCorp and AWS relationship and what we’re doing to bring solutions to our customers.

AWS and HashiCorp working together

I wanted to frame this by talking real quickly about the Amazon Partner Network, which is basically an organization inside AWS that’s specifically for working with our technology and consulting partners. We work together with these partners to build products that solve needs for our customers. We also use the APN as a mechanism to help our customers identify which partners can help them in their specific cloud journey.

All customers have individual needs. One of the things we’ve done is we’ve built and operated this program called the competency program. We use this to maintain what’s basically like an all-star list of technology companies and consulting partners that are best in class. We’ve got a variety of categories like life sciences or big data or DevOps in particular. I’d recommend you take a look at these competencies if you’re trying to identify a partner that could solve for a specific need on AWS.

HashiCorp is a DevOps competency partner. That means that they offer some of the premier DevOps tools that work with AWS products. More specifically, our DevOps competency partners provide solutions to or have deep experience working with businesses that help them do things like implement continuous integration and continuous delivery practices or they help them automate infrastructure provisioning and management with configuration management tools in AWS. Also part of these competencies, we ensure that these competency partners have public references, which helps us ensure that our partners have proven success in solving problems for AWS customers. HashiCorp is a great example of one of our DevOps competency partners.

Another interesting component of the partnership is there’s a huge active community of developers that use AWS and Terraform together. I think if you look at the project on GitHub, it reflects the amount of enthusiasm that Terraform users have for AWS and vice versa. AWS and HashiCorp customers and just the community contribute back to the codebase. The result of this is that we see AWS features supported extremely rapidly. Despite the massive rate of change you see in the AWS platform as we talked about a little bit ago, the Terraform community does a great job of keeping up with it. This ensures that Terraform can help you keep up with the rate of change at AWS.

It’s fun to see all this development done in the open. The maintainers are excellent at being responsive. It’s a wonderful community. We’re excited to be a part of that.

It’s important to call out, though, that it’s not just the community that provides the AWS integrations. For example, AWS and HashiCorp work together to bring a number of integrations in these products. One that’s interesting to call out is HashiCorp Vault’s AWS-specific secrets backend, which brings customers some helpful features, one of which is called secure introduction, in which you can automatically retrieve a Vault token by trusting AWS as a trusted third party.

Essentially, this uses some cryptographically signed dynamic metadata information that uniquely represents each EC2 instance and trusts that as proof of identity, which is an awesome way of interacting with Vault. You can also use it to generate AWS access credentials dynamically based on IAM policies. Basically, you can generate policies on the fly and credentials on the fly, and then automatically revoke them when the Vault lease expires.
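A rough sketch of the dynamic-credentials side, using Terraform's own Vault provider: the AWS secrets engine is mounted with a short lease, and a role carries an IAM policy that Vault uses to mint credentials on the fly. Resource and attribute names vary by provider version, and the role name and policy here are made up for illustration.

```hcl
# Mount Vault's AWS secrets engine; credentials it issues are
# automatically revoked when their lease expires.
resource "vault_aws_secret_backend" "aws" {
  access_key                = "${var.vault_aws_access_key}"
  secret_key                = "${var.vault_aws_secret_key}"
  default_lease_ttl_seconds = 3600
}

# A role whose IAM policy is applied to dynamically generated users.
resource "vault_aws_secret_backend_role" "deploy" {
  backend         = "${vault_aws_secret_backend.aws.path}"
  name            = "deploy"
  credential_type = "iam_user"

  policy_document = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["ec2:Describe*"], "Resource": "*"}
  ]
}
EOF
}
```

Reading a credential from the `aws/creds/deploy` path would then return a fresh access key pair scoped to that policy.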

We’re working together to solve some of these challenging problems across the HashiCorp suite of products. Vault is simply the most recent and most interesting collaboration, I think, but what we want to convey here is that if there’s something that you think that AWS and HashiCorp could help you resolve by working together, by all means let us know. At AWS, we’re extremely receptive to feature requests. If there’s something that you need resolved and there’s something that you need in order to achieve a specific use case, definitely give us that feedback. We’re always excited to hear what our customers want.

Helping the community

You’ll also occasionally find AWS and HashiCorp collaborating on blog posts and other technical material just to help out our community. A good example of this is earlier this year, we posted a blog. It’s about getting started with Terraform on AWS specifically.

In this particular blog, we talked about some best practices with handling secrets, for example, like, how you can use the standard AWS credential providers. This includes IAM roles, which is really cool. If you run Terraform from an EC2 instance with an IAM role applied, Terraform will use it automatically, which is great. We love IAM roles.
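The IAM-role behavior described above means the configuration itself needs no credentials at all. A minimal sketch (the AMI ID is a placeholder): when this is run on an EC2 instance with an IAM role attached, the AWS provider discovers credentials from the instance metadata service automatically.

```hcl
# No access_key/secret_key here on purpose: the provider falls back
# to the standard AWS credential chain, including the IAM role of
# the EC2 instance Terraform is running on.
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "example" {
  ami           = "ami-0d729a60"   # placeholder AMI ID
  instance_type = "t2.micro"
}
```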

We also in this blog post talk about some aspects of structuring your Terraform templates to logically correlate to the structure of your AWS infrastructure. We talked about how to use variables and modules, but Mitchell will probably talk about all these specifics in a little bit here.

Then, finally, this is maybe the latest aspect of our collaboration. We’ve built two new Quick Starts. Quick Start is a program that we run here at AWS to help our customers quickly deploy production-ready reference architectures. We worked with HashiCorp to build both Consul and Vault Quick Starts. We think this is great because these are two widely used HashiCorp products on AWS.

We wanted to help our customers get up and running quickly on AWS while providing a little bit of AWS common sense, as I like to call it, built into the templates. These are available as CloudFormation templates. They’re open source, so you can deploy them just as they are, or you can modify them, make them your own, and use them as a base for your own deployments.

The Consul Quick Start can be deployed into your existing VPC or you can let the Quick Start set up a VPC for you. That VPC will be configured with some best practices. In this case, it deploys the VPC across multiple availability zones with both private and public subnets. In the public subnets, we’ve configured some NAT gateways.

Then, all the Consul nodes get deployed into the private subnets. We let you configure three, five, or seven Consul servers and then as many Consul clients as you want. They’re configured in an auto-scaling group. Then, the Consul servers have EC2 auto-recovery turned on, which is a nice feature in case of failure of your instance.

A lot of the heavy lifting of getting this up and running is taken care of for you. Essentially it’s one click and filling out a couple of parameters and then you have a fully running Consul cluster. Then, to extend that, we’ve also built a Vault Quick Start. The Vault Quick Start is built to take advantage of the Consul Quick Start. It will use Consul as the back end. We chose that because of the integration between Vault and Consul. It provides high availability for Vault, which is great.

If you deploy the Vault Quick Start, you can choose to deploy the Consul Quick Start at the same time. Then, in the Vault Quick Start specifically, we deploy and configure two Vault instances. They both have EC2 auto-recovery enabled. Then, we’ve done some of the work to set up monitoring with CloudWatch Logs, pushing Vault audit logs to CloudWatch Logs, for example.

Then, we’ve written a nice deployment guide to walk you through the whole process, from spinning up the infrastructure to actually logging in and getting started with Vault. It gives you the complete steps to get this running in production in your account. Go check these out. You can find a summary blog post on the Amazon Partner Network blog, with links to all the assets from there.

Case study

Finally, the last thing I want to do is just talk about a customer that’s using AWS and Terraform Enterprise to provision and manage their AWS infrastructure. Red Bull Media, it’s a case study that we’re working on right now. Excuse me. Red Bull Media is the arm of Red Bull that provides all the live streaming TV and audio services. They’re running a fairly large deployment on AWS, between 300 and 500 instances at any given time. They’re using Terraform Enterprise to automate and manage all this infrastructure at once. But what moving to Terraform Enterprise enabled for them was essentially codified infrastructure that they can store in version control. Then, it helped to establish common and repeatable workflows that allow them to iterate on this infrastructure across their development teams, which is fantastic.

Essentially, what’s happening is Red Bull Media House is able to leverage the HashiCorp ecosystem to take advantage of many of the features in AWS and improve their agility and their ability to create and manage their infrastructure, all from a single source.

That’s it for me. I’m going to pass it off to Mitchell now to dive into some of the specifics of Terraform. Thanks, everyone.

Why Terraform

Mitchell Hashimoto: Hello, everybody. I’m going to be talking about Terraform, obviously, introducing Terraform. Thanks, Brandon, for the high-level information about how HashiCorp works with AWS. I’m going to focus today on introducing you to Terraform. If you’ve already used Terraform in some capacity, some of this information is probably redundant, but I want to start with just an introduction to why we built Terraform, what problems it’s solving, and how you get started with it. Let’s get going.

To start, I want to talk about the problem, what inspired the creation of Terraform. It could be represented in a couple ways. This is one way I like to look at it, which is the problem of rising data center complexity. This diagram is one we use in a number of our talks to represent our view of what I like to call the modern data center. What’s funny is we call it the data center, but the word data center is becoming more and more abstract. It’s becoming less of a physical thing you could point at or look at and just representing, to me, the collection of all the things you need to run in order to run an application.

Before, it used to just be servers in the data center, but nowadays it’s a lot more nebulous. There are still data centers. In Amazon, there might be regions and availability zones. Within those, there’s still servers running. You’re running VMs on those servers. On those VMs, you might be running containers. You might have servers that you’re just running containers on directly. There’s a mishmash of a ton of different things going on there that you all have to manage. It’s pretty complex.

Then, at the same time, all those things communicate with other resources. Represented by these cloud icons, there’s a ton of just service providers out there that run what used to be core software of your data center. They run that now as a service. Things like DNS, CDNs, databases. These are things that you can now just buy basically and get full management. Obviously, Amazon provides all those things. When you ask somebody now to spin up the minimal infrastructure needed to run your application, it’s not just as easy as starting one server anymore. It usually involves starting a number of things and connecting them together. It only gets worse as you go from minimal infrastructure necessary to real production-scale infrastructure with multiple applications and multiple business units and things like that in your company. We wanted to build a tool to basically automate all of this for you.

One other way to look at it is this diagram here. This diagram is just a diagram of a fairly standard vanilla web application. You basically have a CDN in front of a load balancer, which is in front of web server tiers and a database and some static stuff in S3. There’s nothing in this diagram that’s particularly, I would say, too advanced. It’s obviously not super beginner level, but it’s not something that only an expert should be able to do. This is a pretty standard web architecture. When you look at this diagram, you can see here, too, that if someone asked you to spin up a new web application that is more than just for development, that’s meant to serve real traffic, you’re faced with spinning up all these resources. It could be daunting. We wanted to automate that.

In summary, the problem really is that the data center is a complex multi-provider problem, the minimum infrastructure required for deployment is high, and manual creation becomes too time-intensive very, very quickly. Which leads us to Terraform, which is a tool we created to solve this. The goals of Terraform from the beginning were to unify the view of resources using code—so being able to actually code your infrastructure no matter whether you’re using ELBs or Route 53 or instances, or maybe even another provider alongside AWS. It is to support the modern data center, which is what I just defined, but this involves all layers of infrastructure as a service, platform as a service, and software as a service. It’s pretty rare today to find any reasonably sized deployment that doesn’t use all three of these things.

As an example, IaaS in Amazon is EC2. Platform as a service could be a number of things in Amazon. It could be something like Beanstalk, it could be something like OpsWorks, or it could be ECS even in some cases. Then, software as a service is things like RDS and Route 53 and so on. You’re usually running all three. You need to be able to communicate with all three. It’s not enough to have a tool, for example, that just spins up virtual machines. That’s not going to be enough for this use case.

We also wanted a way to safely and predictably change infrastructure. At the time before Terraform existed, when I was looking at this problem, there were a lot of tools out there that made it pretty nice to create infrastructure, but there weren’t many that went further. It was very, very rare to find a tool that stuck around as you changed your infrastructure going forward. Even with the tools that did that, I saw a lot of people who, the way they were using the tools was, they completely codified how they create their infrastructure. But once it’s created, they go back to normal manual modification and change. That isn’t very scalable. We wanted a tool that you would feel confident could safely change your infrastructure.

Then, we want to provide a workflow that’s technology-agnostic. This goes in line with multiple providers. We live in a world where a lot of your things might be on something like Amazon, but there’s always one or two things that aren’t. An example for me—I won’t name any specific vendors. I don’t even know if I’m allowed to—but I use Amazon for most things, but I don’t use Route 53 for DNS. I use something else for DNS. I still want to be able to use a tool to spin up my Amazon resources, and then configure the DNS with those resources, even though the DNS isn’t on Amazon. I wanted one tool to do that. Terraform can do that.
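That mixed-provider workflow looks something like the following sketch. Cloudflare stands in here for "some other DNS provider"; the domain, AMI, and variables are all placeholders, and attribute names vary by provider version.

```hcl
# Compute on AWS...
provider "aws" {
  region = "us-east-1"
}

# ...DNS somewhere else entirely.
provider "cloudflare" {
  email = "${var.cloudflare_email}"
  token = "${var.cloudflare_token}"
}

resource "aws_instance" "app" {
  ami           = "ami-0d729a60"   # placeholder AMI ID
  instance_type = "t2.micro"
}

# The DNS record lives outside AWS, but still references the
# AWS instance's public IP via interpolation.
resource "cloudflare_record" "app" {
  domain = "example.com"
  name   = "app"
  type   = "A"
  value  = "${aws_instance.app.public_ip}"
}
```

One `terraform apply` creates the instance, waits for its IP, and then points the external DNS record at it.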

Then, the final thing is a little bit more broad, forward-thinking, visioning about the project. What we recognize is that everything today has an API. Pretty much everything has lifecycle associated with it. You create something. You update something. You destroy something. Anything with an API usually has those lifecycles. We viewed Terraform as a more abstract tool to pretty much manage anything with a lifecycle API. That was the goal here.

Terraform versus other tools

In the abstract, we wanted to do better. To do that, we aim to provide a high-level description of infrastructure. You’ll see that with the language that I’ll show you shortly. We wanted to allow for composition and combination of multiple providers, but also multiple levels. One of the first demos we ever tried with Terraform 0.1, way back, just to see if it was solving the problems we had, was a demo which, on bare metal, created an OpenStack-type installation, requested VMs from that sort of thing, installed a scheduler on top of that, and then deployed applications on top of that. It’s like a four-level system. The only reason we didn’t use AWS for that was to just try to go as low as we can, just to see if it was possible. It was. That composition was really important with Terraform. The other thing is parallel execution. Terraform parallelizes as well as it can and ends up doing things very, very efficiently.

Then, the last thing was the most important thing—I remember saying when we first created Terraform, “If it can’t do this, we might as well not create it at all”—which is separating planning from execution. The only way to get to that goal that I stated in the previous slide of being able to trust Terraform to change your infrastructure over time is to be able to ask Terraform what it’s going to do before it does it. Terraform supports this notion of planning, which shows you what it’s going to do before it does it, so you could be confident that it’s going to do the right thing. We’ll talk more about that in a second.

Then, finally just before we dive into how you use Terraform, I just want to talk about the state of Terraform. First of all—Brandon mentioned a lot of this, too—is it’s open source. I’m surprised how many talks I give on our tools where, at the end of the hour, people come up to me and think I just gave a talk on commercial software for the whole hour, but I wanted to say that Terraform and everything I talk about here is open source, completely free.

The first release was a little over two years ago. It was in July 2014. It’s not a super old project, of course, but it is also very mature. There are over 700 contributors currently. It’s very active, a very large community. Over 6,200 GitHub stars. That’s a vanity metric, but even though it’s a vanity metric, it’s usually a pretty good measure of community acceptance of something. Take that as you will. We average a release about every two weeks. We’re very, very active in shipping. We do that very often.

Just to show the power of the community, especially when we started Terraform—Brandon, again, talked about this—but one of the biggest concerns we got from people was, How can a tool that aims to control all these different providers and not just AWS, but even just AWS, how can a third party ever keep up with them? It comes down to the power of community, the fact that at HashiCorp, we have a number of people, a dozen or so, working on Terraform, but they don’t even need to keep up because we have hundreds of community members. It’s like having hundreds of people working on making sure Terraform supports the latest features.

Since Terraform is a little over two years old now, we went back and looked and saw the average time between an announcement of an AWS feature and the pull request opening to support it is about 30 minutes. That could be when people at HashiCorp are asleep. It doesn’t need to be an official thing. We get a pull request really quickly to support new features. Also, as part of that, when there are announcements that are particularly impactful—for example, when Amazon announced application load balancers, people wanted those really, really quick. We’ll prioritize releases for special announcements just to make sure that new features are available in Terraform.

We do average a release every two weeks. For example, when ALB was announced, we did a release the next day, just to make sure that the community had access to manage ALB resources.

The basics of Terraform

Let’s dive into the basics of Terraform. To get started, I’m just going to show you what Terraform looks like when you write the code to manage resources. Here’s valid Terraform syntax. This is managing an AWS instance. As you can see, it’s human-readable. It’s all code. You just describe what you want. We’ll jump back to that code in a second.
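The slide itself isn't reproduced in the transcript, but a minimal configuration along the lines described (AMI ID and instance type are placeholders) looks like this:

```hcl
# A single managed AWS instance, described declaratively.
resource "aws_instance" "web" {
  ami           = "ami-0d729a60"   # placeholder AMI ID
  instance_type = "t2.micro"
}
```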

The goal of infrastructure as code here is to provide a codified workflow to create and manage infrastructure. You get benefits like integrating with application code workflows. Since it’s code, you just treat it like an application. You put it in source control management. You’re able to do CI on it. You’re able to do code review with it. It’s not something that a person just manually does or you follow a bunch of checklists. It’s actually like you see the code and how it evolves.

The most important thing with this, this last point, is distribution of knowledge. I remember in the first operations position I took at a company, the process was I was hired. There were various wiki pages around of how the infrastructure worked, but they were usually outdated. How I ended up learning how everything worked was the more senior operations people over months and months would just slowly tell me how things work. When something failed, they’d be like, “Oh, yeah. You haven’t seen this before. This does this. This is how I manage it. This is how it works.” It’s this oral tradition of knowledge getting passed down just orally, just locally. It’s really inefficient.

It’s also prone really easily to mistakes. If somebody says, “Oh, this is how it works,” but somebody changed it since that person said that, it’s no longer true. You’re just relying on human memory, which I think is well shown to be pretty fallible at this point.

Being able to codify your infrastructure is really important, because with something like Terraform, what happens is you hire somebody, and that new person says, “Okay, how does the infrastructure work? How does the networking work? How do things communicate?” The more senior people can say, “Just look at the code. We don’t have to tell you anything.” The more senior person could quit, something could happen, and all the knowledge is still there. You don’t need to rely basically on more senior people to get your more junior people up and running.

That’s super important mostly from being able to grow, because the number of applications and servers that we’re deploying is just growing at an insane pace and relying on just manual education isn’t enough. We need to allow people to self-educate on this stuff. Terraform and infrastructure as code does a good job of that.

Back to the configuration syntax, though. The syntax that I just showed you and I’ll show you on the next slide is called HCL. It stands for the HashiCorp Configuration Language. It’s actually a language we use across all our tooling, or most of it. It’s been a few years now since HCL has existed, and 99% of the community thinks it’s a very pleasant way to work with our tools. However, since it is our own configuration format, we include complete JSON compatibility with it in every case. If you want to write JSON, you can do that, although by community-accepted best practice, almost all the Terraform you’re going to see is written in HCL. The JSON nowadays is very important. It’s a first-class feature, not a second-class sort of thing. But it’s primarily used for machine-generated configurations. It lets machines like CIs and other things generate Terraform configuration, which is pretty useful.

Let’s go back to this example. This just shows you as text so it’s easy to see diffs. If someone committed a change to your infrastructure using something like Git, you could just easily see the change. You don’t need to dig in too deep to understand what happened.

Define your end state

Another important point here is that Terraform configuration only represents the end state of what you want to achieve. You don’t tell Terraform how to get there. You don’t give Terraform a set of steps. You just tell Terraform what you want your infrastructure to look like. Terraform’s job is to make it so. We’ll talk more about the nuance there later.

The syntax, the first part of it, in bold here, “resource,” is just a key word that says we’re defining a resource. A resource in Terraform is anything with lifecycle attached to it: so create, destroy, update. It’s something that’s managed.

In this case, the resource is an AWS instance. The second part is the type of thing you want to manage. You could also put an ELB here, Route 53 record, etc.

The third thing is a unique name. This name is just for Terraform. AWS doesn’t see this name. It doesn’t map to any specific parameters in AWS. This is just for you to reference and understand within Terraform itself. We name it in this case “web.” This has to be unique. If you created another AWS instance, you wanted to manage two, you couldn’t name that one web as well. You would have to name it web2 or anything else, pretty much.
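Concretely, managing two instances just means giving each its own name within the configuration (values here are placeholders):

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0d729a60"   # placeholder AMI ID
  instance_type = "t2.micro"
}

# The second managed instance needs a distinct Terraform-side name;
# "web2" is arbitrary and never seen by AWS.
resource "aws_instance" "web2" {
  ami           = "ami-0d729a60"
  instance_type = "t2.micro"
}
```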

Then, finally, these things on the inside are the attributes. Attributes are the configuration for the type. This will differ for each type. If I was configuring ELB here, it wouldn’t have an AMI option, for example. It’s different. In this case, we set some types that tell Terraform how you want the AWS instance created. Again, to reiterate, this only includes the end state, what you want it to be. Not what it is today.

Then, the last feature is what we call interpolations. They’re ways to reference attributes of other resources within Terraform. For example, if we’re managing an ELB, one of the attributes there is to specify the instances that the ELB is load balancing to. That’s a list of instance IDs. One way we could do that is, if we wanted to load balance to that instance that Terraform just created, is to just reference the ID of the instance it just created. We don’t know it until it’s created. By putting this here, Terraform at runtime will fill in that value and make the ELB manage that instance.
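The ELB case described above might be sketched like this, with the interpolation referencing the ID of the instance defined earlier; the listener settings and availability zone are illustrative.

```hcl
resource "aws_elb" "web" {
  name               = "web-elb"
  availability_zones = ["us-east-1a"]

  # Filled in by Terraform at runtime, once the instance exists.
  instances = ["${aws_instance.web.id}"]

  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }
}
```

The interpolation also gives Terraform the dependency information it needs to create the instance before the ELB.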

Terraform Plan

The first command that you’re introduced to after writing Terraform configuration is Plan. Plan is that feature I talked about earlier, which just shows you what’s going to happen. This is super important. It’s better than just a dry run. There are some tools out there that provide what they might call no-op modes or dry runs. Terraform Plan is that, but more, because what you can do with Terraform Plan is also save the plan. If you save the plan, you could run Terraform with that plan and tell Terraform, basically, to do only this. Usually, what a dry run is basically saying—not Terraform competitors or anything, just other classes of software—is, “If you ran this software right at this moment for this state of the world, this is what would happen.” But the state of the world changes. If you run it a second from now or a minute from now or a year from now, it’s probably going to do something different.

With Terraform, by default, a plan is a dry run, but you can also save a plan and that gives you the additional guarantee of telling Terraform to do only something and don’t do anything else even if the state of the world changed. This becomes very important as you manage production-critical infrastructure with Terraform.
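The saved-plan workflow looks roughly like this; flag syntax is from Terraform of this era, so check your version's documentation.

```shell
# Show what would happen and save the plan to a file
terraform plan -out=tfplan

# Apply exactly the saved plan -- nothing more, even if the state
# of the world has drifted since the plan was generated
terraform apply tfplan
```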

The problem this is solving is that, prior to Terraform, people had to just guess what was going to happen. Early on, when we released Terraform, we’d go to some big potential users that wanted to learn about it. They would have a team of 20 or 30 people managing their AWS infrastructure, but there were one or two gatekeepers to the whole thing. I like to call them the oracles of the infrastructure, because they had the knowledge where, if you showed them what you wanted to do, they would have to divine what the ultimate rollout effect of this change would be, like, Would changing the AMI of this instance trigger a change in the ELB that’s necessary to point to this instance? Does that trigger DNS changes? They would have to think of this full rollout.

What Terraform Plan does is alleviate this. That company was a real example, and that company adopted Terraform and was able to no longer have gatekeepers. Everybody is able to deploy to production as long as the plans look good. There are still people approving the plans, but a larger number of people can do it. This gives more junior people more power, but safely.

A plan, when you output it, is formatted similarly to a diff. There are colored symbols next to it to indicate what it means. A green plus means that a resource will be created. A red minus sign means the resource is going to be destroyed. A yellow or orange tilde means that a resource is going to be updated in place, so it’s not going to be destroyed to be updated. We could update it in place. That still might mean there’s downtime depending on the type of resource you’re updating, but at least we don’t need to destroy it. Then, a minus/plus, again in yellow-orange, indicates that a change requires a resource to be destroyed and then re-created. An example here might be if you’re updating an AMI on an AWS instance, you can’t do that in place. To update an AMI on an instance, you have to create a new instance. That would be a minus/plus operation.

On that configuration we just had, here’s what it might look like if you ran Terraform Plan. In this case, we don’t have any instance yet. What it’s showing you here with the green plus is that it’s going to create that instance. You can see all the attributes that it’s going to have. Some of them are known—the AMI and the instance type we specified—and it just tells you that’s what they’re going to be. But there are also a number of other things that Terraform knows it will know, but that it can’t know until the instance is created: things like block devices or IP addresses and so on. Those things Terraform knows will become available after the instance is created. The reason Terraform tracks this is so that other things, like an ELB or DNS, can reference those attributes. Your DNS resource can reference, for example, the public DNS of a not-yet-created instance, because Terraform can guarantee that it will be available and order things properly so that your DNS record is only created once it is.
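A minimal sketch of referencing a computed attribute: a DNS record can point at an instance’s public DNS name before the instance exists. The resource names, AMI ID, and hosted zone ID here are all placeholders.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-abc123"   # placeholder AMI ID
  instance_type = "t2.micro"
}

# public_dns is only known after the instance is created; because of
# this reference, Terraform creates the record after the instance.
resource "aws_route53_record" "web" {
  zone_id = "ZABC123EXAMPLE"     # placeholder hosted zone ID
  name    = "web.example.com"
  type    = "CNAME"
  ttl     = 300
  records = ["${aws_instance.web.public_dns}"]
}
```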

Terraform Apply

Once the plan looks good, the next command you hop into is Terraform Apply. What Terraform Apply does is execute those changes to reach your desired state. Everything you see in the plan, Terraform Apply actually performs. It’s important to reiterate that the plan does not impact your infrastructure. A plan never performs changes to your infrastructure. It’s purely read-only. It might query your infrastructure to see what its current state is, but it never performs a write or a change operation. Terraform Apply, on the other hand, executes all of those changes.

What Terraform Apply does is determine the ordering. In that previous example with the ELB that referenced the instance, that reference tells Terraform the ordering that’s necessary. In this case, for example, you’ll notice that we didn’t explicitly specify any ordering of “the instance must be created before the ELB,” but Terraform treats that interpolation as an explicit direction to do that. Because that interpolation is there, Terraform knows that the instance must be created before the ELB and does the ordering properly.
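A sketch of that implicit ordering (resource names and values are illustrative): the interpolation in the ELB’s instance list is the only thing that tells Terraform to create the instance first.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-abc123"   # placeholder AMI ID
  instance_type = "t2.micro"
}

resource "aws_elb" "web" {
  name               = "web-elb"
  availability_zones = ["us-east-1a"]   # placeholder

  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }

  # Referencing the instance ID creates the dependency edge;
  # no explicit ordering directive is needed.
  instances = ["${aws_instance.web.id}"]
}
```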

In this simple example, it’s easy to know manually that you must create the instance before the ELB. What gets really interesting with Terraform is that when you start managing more and more complex infrastructures and making more and more complex changes, understanding the ordering necessary to make things safe becomes more and more difficult. It’s nice to run a Terraform Plan, see what’s going to happen, and know that Terraform is going to handle the ordering for you properly—and again, you can verify that with the plan.

Also, it parallelizes things. With this example, if we were creating two AWS instances but were only putting “web” in the ELB, it would create both instances in parallel, and it might also create the ELB before the second instance was ready, because it doesn’t need it. Terraform will parallelize as much as possible.

This is super important for cloud resources because cloud resources are fast, but you don’t get them instantly. They take seconds, dozens of seconds, minutes. Some things like RDS can take multiple minutes because they’re pretty complicated. So it’s really important that a tool be able to do multiple things in parallel, to get as much done as possible while waiting on the cloud control plane to respond.

Then, the last thing that’s also critically important for the tool is to be able to handle and recover from transient errors. Clouds are complex things. If you’re building infrastructure that has dozens or hundreds of resources, it’s not unusual to get a transient error—a temporary error that might be on Amazon’s side, but a lot of the time might be on your side. If you’re creating infrastructure that takes ten minutes to come up, it’s possible that in those ten minutes your Internet cuts out or something else just happens.

Because of this, you need to be able to handle it safely, and Terraform does. One of our sales engineers, when demoing Terraform, loved to start a reasonably long, multi-minute Terraform Apply and then, in the middle of it, turn off their Wi-Fi. The really fun thing about this is that Terraform will error out and say, “Lost it, can no longer communicate with Amazon,” and then exit. But when you turn your Wi-Fi back on and rerun Terraform, it completes the operation. It doesn’t create duplicate resources. It knows what it already did, and it completes the rest. That’s a good example of being able to handle these transient errors.

I recommend when you’re just getting started with Terraform to just try things out like that in order to build confidence and understand how Terraform behaves. But in the face of error, it tries to complete as much as possible, exits, and then, when you run it again, it just completes what it needs to reach your end state.

Here’s an example of what Terraform Apply looks like when you run it. It just says what it’s creating. This is the example we just had. It shows you the instance it’s creating and the attributes that it got. It has some nice output letting you know things are still happening. Again, with cloud resources, it’s very common for things to take a while. Instead of staring at a completely non-interactive screen, Terraform lets you know every 10 seconds what’s still going on. Then, at the end, it tells you a summary of what it did: how many things it created, how many it changed, and how many it destroyed.

Terraform Apply gets you from your current state to your target state. If you remember, I said a few times that your configuration only represents the end state of what you want your infrastructure to look like. It says nothing about what your current state is. It’s Terraform’s job to figure out that diff—to figure out what’s going on currently and what changes it needs to make to get there. When it inspects this, it’ll just update existing resources when it can. When it can’t, or when they don’t exist, it will create resources.

If we take that example we just did—in blue here, we added some tags. We just made a change. We already applied it once, and now we’re just making a change to the tags. Tags don’t require creating a new resource; you can add tags after an instance is created. If you run Plan, you can see that. The plan shows you the tilde, which means that the resource is going to be updated in place. Then, the plan shows that the only things changing are these tags.

You can see that the number of tags is changing from one to three and that the two tags that are being added are foo and zip. You can see in the summary, nothing’s being created, nothing is being destroyed, but one thing is being changed. Then, when you apply it, you can see that instead of creating, it’s now a modification operation and does it in place.
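The tag change described here might look like this in configuration; the tag values are made up, and the syntax is a sketch in the interpolation-era HCL style the talk uses:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-abc123"   # placeholder AMI ID
  instance_type = "t2.micro"

  # Adding tags after the first apply produces an in-place
  # update (a tilde in the plan), not a destroy/re-create.
  tags {
    Name = "example"
    foo  = "bar"
    zip  = "zap"
  }
}
```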

In this case, it didn’t destroy your old instance. It didn’t incur any downtime or anything. It just did this all in place, and you can see that by the way it ran. In this small example, it’s really easy to say, “A person could do that. It’s not complicated.” But when you’re changing the end state of a complex infrastructure, it’s not uncommon for a plan to have dozens of changes, additions, and destructions. In those scenarios, it’s hard to reason about the ordering of things that need to happen, or the effect of it.

One of my favorite examples of this—and this wasn’t an AWS example—is someone who was using a cloud provider where they requested instances. One of the things their company would do was use Terraform programmatically to resize the memory allocation on those instances. This required destroying and re-creating the instances. They ran their business on this, among other things. What makes the story really cool is that at some point, that cloud provider added the ability for a memory change to be an in-place operation that no longer required downtime. You could dynamically change the RAM allocation on a machine.

Because this company was focused on other things, they didn’t keep up with the news. But one day they updated Terraform, and suddenly their applies were doing in-place updates instead of full resource destruction and re-creation. They just got that for free. That’s one of the really important properties of Terraform that I can’t state enough: with Terraform, you get the combined infrastructure knowledge of the entire community. You don’t need to keep up with the latest changes in how things work. Sometimes you just update Terraform and it does things better. That’s a pretty cool thing to get.

Terraform Destroy

Then, the last command I’m going to cover here is Terraform Destroy. It’s really simple: it just destroys everything that Terraform created. It only touches the infrastructure that it manages itself; it doesn’t touch anything else in your infrastructure. But it’s a nice way to clean things up. It makes it easy to use Terraform for things like staging or development environments and to just tear them down when you’re done. Terraform Destroy has all the same properties as Apply around handling partial errors and so on. If you’re running Destroy on something like CI to clean up, it’s nice because if it fails, you just rerun it until it succeeds. It’s not common to have failures like that, but it’s nice to know you don’t have to think about it. You just run it until exit code zero and you’re good to go.

Real-world uses

Okay, that was a whirlwind introduction to Terraform. Terraform, like many tools, is easy to get into—I think you know enough at this point to easily get started—but it has a lot of depth, too, and due to the time constraints of the webinar, I can’t go too deep. But something I wanted to talk about is: Okay, you have this basic knowledge. How do you start applying it to a realistic example?

This is going to be a very high-level workflow example of what you should do when you’re getting started on Terraform. The examples I show you here aren’t going to be runnable exactly as is or anything, but they should give you enough breadcrumbs to get to where you need to go.

Going back to this example that I showed earlier in the webinar, this is a standard web application deployment. In this example, the purple circles are resources, and the arrows between them are the interpolations. When you start thinking about your infrastructure—very quickly after you start using Terraform—this is how you’re going to start seeing things, and it’s a useful way to see them.

Let’s get started turning this into a Terraformed infrastructure.

What’s really important about adopting Terraform is that you don’t have to go all in. Terraform isn’t useful only if it’s completely managing your infrastructure. It can manage just a part of your infrastructure and still be very useful. That’s the best way to adopt it: especially when you’re getting started, identify something that’s easy to automate and low-risk, and just start there.

In this case, I put a purple box around what is probably a safe thing to automate in every web application, which is the web server tier. Web server tiers are usually stateless. They can come up and down, and load balancers route to them only when they’re healthy and drain them when they’re not. They’re usually a pretty safe place to get started.

If we’re getting started here, it’s pretty much just purple circles—there are no arrows fully contained in the box yet. We just have to create a number of resources. That might look like this: we create the resource; it’s an AWS instance type; it’s part of the web tier, so we’ll just name it “web.” Remember, that’s just an internal name for us. Then, we specify a number of attributes. The one thing that we didn’t talk about—that I will now—is this count thing.

Every resource in Terraform can specify a count. This is a meta parameter, so it’s a parameter to Terraform and not to AWS itself. What Terraform does with this is just duplicate this thing exactly that number of times.

In this case we’re saying, “Count equals four,” because in that diagram I just showed you, there were four instances. It might just look as simple as this. In the diagram it’s using an auto-scale group, but to get started with Terraform, you might start with this and work your way up to that. We’re just going to show this AWS instance without an auto-scale group as an example.
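That might be sketched like this; the AMI ID and instance type are placeholders, and `count` is the only part that matters here:

```hcl
resource "aws_instance" "web" {
  # count is a meta-parameter interpreted by Terraform itself,
  # not by AWS: it duplicates this resource four times.
  count         = 4
  ami           = "ami-abc123"   # placeholder AMI ID
  instance_type = "t2.micro"
}
```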

Once you have that in place, the green box shows us what’s done. We’ve now Terraformed those four things. Now, we can think what’s next, like what’s the next easiest thing. You can go in any direction, but in this case this load balancer looks like a good place to go.

If we want to do the load balancer, we’re now looking at this purple circle—the load balancer—but also the arrows pointing to those web servers. We need to be able to reference those instances somehow. Here’s how that might look. We add an ELB resource. Then, for the instances, we just reference all of them. This, again, is new syntax. I’ve shown you interpolations referencing one resource, but when you have multiple, you can use the star syntax shown here to say, “I want all of them.”

In this case, if we changed the count of the AWS instance from four to five, or four to three, or four to 40, Terraform would update your ELB with all of those instance IDs. That automatically creates those four arrows that we had.
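A sketch of the star syntax (names and values are illustrative): the `*` expands to every instance created by the count, so changing the count automatically changes the ELB’s instance list.

```hcl
resource "aws_elb" "web" {
  name               = "web-elb"
  availability_zones = ["us-east-1a"]   # placeholder

  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }

  # The star references all instances at once, however many
  # the count on aws_instance.web produces.
  instances = ["${aws_instance.web.*.id}"]
}
```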

Now, at this point, you would have a green box around all these things. You’d be automating all of this with Terraform. At this point, you just rinse and repeat. You choose what you want to automate and what you don’t want to automate. Even for people that use Terraform really heavily, it’s not uncommon for people to just feel comfortable with a few things still managed manually, for whatever reason. Maybe it’s just difficult to automate. Maybe they have a really good system in place that doesn’t require Terraform. You don’t need to apply it to 100% of your infrastructure. That’s one of the reasons it could be so great. So just rinse, repeat.

Next steps

Since I know this is a very surface-level talk, I’ll give you some next steps to guide you toward—if this was interesting to you—what you could look at next in order to learn more about Terraform and implement it successfully.

There are a few features that you should definitely look at. You know all the basics, so these features wouldn’t be scary to look at, but variables and outputs are a way to parameterize your Terraform configuration, which becomes important pretty quickly. So those would be important.
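As a hedged sketch of variables and outputs (the names and values here are made up): a variable parameterizes the configuration, and an output exposes a computed value.

```hcl
# Parameterize the configuration instead of hard-coding values.
variable "instance_type" {
  default = "t2.micro"
}

resource "aws_instance" "web" {
  ami           = "ami-abc123"   # placeholder AMI ID
  instance_type = "${var.instance_type}"
}

# Expose a computed value for humans or for other tooling.
output "web_public_ip" {
  value = "${aws_instance.web.public_ip}"
}
```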

Interpolation functions—you saw interpolations like referencing the ID and so on. Terraform ships with a rich library of probably 50 or so interpolation functions that you can call to transform that data. These range from basic things, like lowercasing strings, to more advanced things, like manipulating CIDR blocks and network addresses and parsing IP addresses, which becomes really important with infrastructure. I recommend taking a look at the interpolation functions. It’s like a standard library that comes with Terraform.
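For example, two of those functions in a sketch—`lower` and `cidrsubnet` are documented Terraform interpolation functions; the output names are made up:

```hcl
# lower() is one of the basic string functions.
output "lower_name" {
  value = "${lower("WebServer")}"               # "webserver"
}

# cidrsubnet() carves a /24 (16 + 8 new bits) out of the /16;
# subnet number 2 gives 10.0.2.0/24.
output "subnet_cidr" {
  value = "${cidrsubnet("10.0.0.0/16", 8, 2)}"  # "10.0.2.0/24"
}
```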

Modules are a way to encapsulate your configuration so that you don’t need to repeat yourself. And they’re sharable. For example, you could wrap your company-approved way to manage VPC layouts. You could wrap that in a module and tell everybody, “If you ever need a VPC, use this module.” Users of the module wouldn’t have to know the details of what kind of CIDR block you use, what size space, how many subnets, how do we configure NATs, how do we configure routes? You don’t need to know any of that. You could just say, “I want a VPC.” Then, the module would create it for you and give you the relevant information like security groups and the ID of the VPC and things like that. Modules, they pop up relatively quickly once you start using Terraform.
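Module usage might be sketched like this; the module source path, its input, and its output names are hypothetical, but the shape is the standard one:

```hcl
# Consumers ask for "a VPC" without knowing the internals:
# CIDR layout, subnets, NATs, and routes live inside the module.
module "vpc" {
  source     = "./modules/vpc"   # hypothetical local module
  cidr_block = "10.0.0.0/16"
}

# The module hands back the relevant IDs as outputs.
resource "aws_instance" "web" {
  ami           = "ami-abc123"               # placeholder AMI ID
  instance_type = "t2.micro"
  subnet_id     = "${module.vpc.subnet_id}"  # hypothetical module output
}
```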

Then, the last thing is remote states. Remote state is a way to basically collaborate on Terraform more efficiently. I’ll just leave it at that and you could take a closer look.

But these are the things that you probably want to look at next.

Then, the resources that are available to you: the project website has docs covering everything. The project website is going to be indispensable as reference material as you use Terraform.

There’s a third-party GitHub organization called Terraform-community-modules, which we’re gearing up to become more involved in. But this is a community thing that spun up. It’s a good way to look at sample Terraform modules. They’re pretty high-quality. Take a look at those.

Google’s your friend. Especially in the past six months, Terraform has been on a rocket ship level of adoption. There are a lot of blog posts out there that cover how to use Terraform. Don’t be afraid to Google things. You’ll probably find an answer.

As part of that, I should mention that Terraform also has official community avenues. There’s a Gitter channel for real-time communication, which has a couple hundred people in it on average. I hang out in there, and it’s pretty active. Then, there’s a mailing list. If you don’t need an answer right away, send your question to the mailing list. You’ll probably get more people viewing it there than in the real-time chat, and if you can wait a little bit, the mailing list usually gives you very high-quality responses, too. HashiCorp as a company also does various events like this one. We do user groups, and we also do trainings if your company wants that.

Then, the last thing I want to mention is books. Just in the past month, we’ve gone from zero Terraform books to two. Terraformbook.com is written by someone in the community and is available now. You can take a look at that. Then, just today—so I didn’t even update the slide; I didn’t even know until today—O’Reilly announced that they have an early pre-release version of a Terraform book as well. Both are pretty up to date with the latest version of Terraform, and you can check them out.

Speakers

Build your entire AWS infrastructure with one command.

Servers, networks, storage, DNS, CDNs, load balancers, and much more all have APIs. Terraform is a tool to model all of these resources in a single language across multiple cloud providers, then bring them up and connect them all in a single command. Terraform can then be used to model changes to your AWS infrastructure and safely effect those changes.

Terraform's safety comes from the ability to "plan" changes: Terraform shows you an execution plan of what it will do before it does it. You can then determine if a change is safe or not and whether to apply it.

Along with this safety, Terraform is highly resistant to errors: it retries failed operations and is idempotent, so it can be run again if an operation cannot complete the first time. In this talk we present the problems faced in automating infrastructure and how Terraform is being used to solve them in production.
