The Hitchhiker's Guide to Terraform Your Infrastructure
Jul 07, 2020
Pick up a few HashiCorp Terraform best practices and learn about pitfalls in this 'hitchhiker's' guide.
When you start using Terraform, it can be confusing. You might run into a few pitfalls and be stuck with an infrastructure that's error-prone or not fully automated. This talk will show you best practices and tricks learned while building a Kubernetes infrastructure in AWS using Terraform.
HashiCorp also recommends trying out the free tier of Terraform Cloud if you want an automatic best-practices path to Terraform deployments and don't want to deal with things like managing remote state.
Speaker: Fernanda Martins
Hi, welcome to HashiConf. My name is Fernanda Martins, and I'm here to give you the "Hitchhiker's Guide to Terraform Your Infrastructure," and to tell you about some tips that I've learned by using Terraform.
I am a DevOps minion. I really like infrastructure as code. I am a CI/CD enthusiast. I really like open source and advocate for it. I'm also a gamer. I try to play piano sometimes, and I am really struggling to learn Dutch.
Currently, I work as a DevOps engineer in L1NDA. We make software for hospitality. We want the interaction between workers and businesses to be as smooth as possible with no hassle, so restaurants can hire the best waiters and so the waiters can be satisfied with their job.
I will start by telling you a little bit about what Terraform is, but I won't be going into much detail on that. Then I will do a little disclaimer about Terraform 0.12, and then I will go through the tips I've learned for things like modules, states, naming, null resources and ... well, don't panic, and bring your towel.
If you don't know Terraform and you are here looking for more, you can go to the link shown here. But this talk is really not about what Terraform is, but about the tips inside it.
One disclaimer: All code in this presentation is 0.11, but the tricks and the best practices I've learned are all 0.12-compatible.
Using tfswitch to migrate to Terraform 0.12
So the first tip: if you're going to migrate to 0.12, use tfswitch. It's a very nice tool. You can have multiple versions of Terraform installed, and it helps you to migrate.
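Alongside switching binaries, it can help to pin the expected Terraform version in the configuration itself. This is a minimal sketch and not something shown in the talk; the file name and the constraint value are assumptions for illustration.

```hcl
# versions.tf (hypothetical file name)
terraform {
  # Terraform refuses to run if the installed binary does not satisfy this
  # constraint, so an accidental run with a 0.11 binary fails early.
  required_version = "~> 0.12.0"
}
```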
Use More Community Modules
Moving on to the real insights, starting with modules.
Modules are a good thing to have when you are coding, because you want to divide your code as best as you can and organize it best as you can. You should be using modules. I used to write my own modules for everything. People in my team and I would use these modules. It was nice to write my own modules, but then I started to build a lot of them.
I realized that it's a good thing to use modules from the community. The community has more tests. You can also help the community and give a little bit back. I found it to be more effective.
I still build modules on my own, but I use the modules from the community plus business rules from my company. I merge them to make my own module. You get the testing from the community, the support from the community, and your business rules, and you make your own module. It's the best of both worlds.
If you have secrets, you don't want them committed into your version control. You want to find a way to pass them in from the command line using TF_VAR_ environment variables. Also, you want to secure your outputs using the sensitive attribute, or you can use a secrets manager like Vault.
It's very important that you don't commit passwords into your repository, and Terraform provides you ways to do that.
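As a rough illustration of the TF_VAR approach, the sketch below declares a variable whose value comes from the environment and marks the matching output as sensitive. The names db_password and TF_VAR_db_password are hypothetical, and the syntax assumes Terraform 0.12 or later.

```hcl
# The value is supplied from the environment, for example by exporting
# TF_VAR_db_password in the shell or CI job, so it never lands in a .tf file.
variable "db_password" {
  type        = string
  description = "Database password, provided via TF_VAR_db_password"
}

# Hypothetical output; sensitive = true hides the value in CLI output.
output "db_password" {
  value     = var.db_password
  sensitive = true
}
```

Note that the value is still written to the state file, which is one more reason to keep the state itself secured.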
Also, something I really like is a tool called terraform-docs, where you can generate automatic documentation. As you can see on the screen, it will appear with very nice inputs and outputs, and people can refer to this documentation when they want to use your modules. Everything is nicely, automatically generated.
You don't have to worry, except, of course, about putting the description in your inputs and outputs. But other than that, everything's very nicely generated.
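The descriptions that terraform-docs picks up are the ordinary description arguments on variables and outputs. A small sketch, with a hypothetical cluster_name input:

```hcl
variable "cluster_name" {
  type        = string
  description = "Name of the Kubernetes cluster, used as a prefix for resources"
}

output "cluster_name" {
  description = "Name of the cluster this module was configured for"
  value       = var.cluster_name
}
```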
Module Order and Dependency
One other thing that I've struggled with a bit was module order and dependency.
As you know, Terraform executes everything in parallel where it can. You can establish the order of execution by using an output of 1 module as an input to the other module.
There is also depends_on, but it only works with resources, not with modules (Terraform 0.13 now supports this). But it's something you can use when you want to chain resources and establish an execution order.
On screen, you can see that we are establishing an order of execution by using 1 module as input to the other. In this example, first you are receiving the arn from the bucket. The role that takes this arn is created in a separate module. That's how you usually do it.
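The slide itself is not part of this transcript, so the following is only a sketch of the pattern being described, in 0.12 syntax. The module paths and the arn and bucket_arn names are assumptions.

```hcl
# Hypothetical module that creates an S3 bucket and exposes its ARN as an
# output named "arn".
module "bucket" {
  source = "./modules/bucket"
}

# Referencing module.bucket.arn creates an implicit dependency, so the
# bucket is created before anything in the role module that uses the ARN.
module "role" {
  source     = "./modules/role"
  bucket_arn = module.bucket.arn
}
```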
Challenges with Execution Order
Sometimes you will face execution order issues where the 2 modules don't need to receive anything from each other. I've run into issues where we had race conditions and, for example, a role runs, and before it's fully created, Terraform already gets this role and uses it somewhere else. It's a bit of trouble.
This is an issue in Terraform, a very well-known issue: you cannot use depends_on with modules, so you cannot explicitly tell it the order.
I have heard that this will be possible in 0.13. I have much hope that this will be possible in the future.
But even if this is fixed in 0.13, we're still going to have this issue with 0.12. So I use Terragrunt to manage my dependencies. Terragrunt ensures that things will run in the order that I want them to. Terragrunt is a wrapper on top of Terraform. In the back, Terragrunt will be running Terraform.
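For reference, a Terragrunt dependency declaration can look roughly like this. It assumes a Terragrunt version that uses terragrunt.hcl files (0.19 or later), and all paths are hypothetical.

```hcl
# terragrunt.hcl for a hypothetical "app" module
terraform {
  source = "../../modules/app"
}

# Modules that must be applied before this one when running apply-all.
dependencies {
  paths = ["../iam-role", "../bucket"]
}
```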
It has a lot of other features, some of which will be discussed later in this talk.
To quickly recap on modules: Use open source, make your modules based on the community modules, use the auto documentation for your other teams, and use a tool like Terragrunt for dependency control for now.
Terraform states map your infrastructure to the real world, and it's important that you understand them.
When I was starting with Terraform states, I wanted to keep my states secure. I wanted them to be accessible remotely. I wanted to have locking mechanisms, so 2 people can't change my state simultaneously. And I wanted a state per environment and no manual creation of the states.
As you can see on screen, the first 3 are supported by Terraform. I was looking at how to do this with environments and how to not manually create the states. It was very tricky for me as a beginner to understand it all.
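The first 3 requirements (secure, remote, locked) map onto a standard remote backend. Here is a sketch using the S3 backend, which fits the AWS context of the talk; the bucket, key, region, and table names are placeholders, and with plain Terraform the bucket and table have to exist already.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state" # hypothetical bucket name
    key            = "development/network/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true                      # server-side encryption
    dynamodb_table = "example-terraform-locks" # enables state locking
  }
}
```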
At first I thought Terraform Workspaces would solve my issue, but what I was seeing in the community and hearing at talks suggested a Terraform Workspace wasn't a very good solution. There was a lot of pushback from the community and the presentations I went to.
To manage environments in the states, I'm using Terragrunt. It organizes the state for me in a nice way. You can check in the Terragrunt GitHub for more information, but I really like the way the state is built.
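A common Terragrunt layout for this, sketched under the assumption of Terragrunt 0.19 or later and an S3 backend, keeps one remote_state block in a root terragrunt.hcl and derives the state key from each module's folder. The bucket and table names are placeholders.

```hcl
# Root terragrunt.hcl, shared by every environment and module folder.
# Each child terragrunt.hcl pulls it in with:
#   include { path = find_in_parent_folders() }
# and each Terraform module declares an empty backend "s3" {} block.
remote_state {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"
    # Resolves to the child's folder path, for example
    # "development/network", so every environment gets its own state.
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "example-terraform-locks"
  }
}
```

With the S3 backend, Terragrunt can also create the bucket and lock table when they are missing, which covers the "no manual creation" wish.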
Labels and Tagging
Labels and tagging are very important for your infrastructure. Labels in Kubernetes are really nice for labeling your resources. And tags, in the case of AWS, which is my context, are very nice if you want to do some cost management.
I use Terraform to apply my tags. In this example on screen, I'm applying labels to Kubernetes. You can see that I can have reusable tags with environment information and module information, which is really nice. I can look at the assets in an AWS Git, and I can trace back in the code to where this was created or how this was made.
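The slide shows Kubernetes labels; the sketch below applies the same idea to AWS tags instead. The variable, tag keys, and bucket name are assumptions for illustration.

```hcl
variable "environment" {
  type        = string
  description = "Environment name, for example development or production"
}

locals {
  # Reusable tags carrying environment and module information.
  common_tags = {
    Environment = var.environment
    Module      = "storage" # hypothetical module name
    ManagedBy   = "terraform"
  }
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-assets-${var.environment}"

  # Combine the shared tags with resource-specific ones.
  tags = merge(local.common_tags, { Name = "assets" })
}
```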
I like to call null resources the rebel in Terraform. It's the way that you run shell script commands, especially for things that are not supported by a provider in Terraform. I had to use it for Kubernetes.
One thing I've learned is it is very important to add triggers to your null resource, so your null resource knows when it has to be triggered. Also, sometimes I needed to run the command only after 2 other actions were done. I use depends_on here also. I want all this to be done before I run the command.
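A sketch of a null resource with both a trigger and depends_on, in 0.12 syntax. The two AWS resources, the manifest path, and the kubectl command are placeholders, not the ones from the slide.

```hcl
resource "aws_s3_bucket" "cluster_state" {
  bucket = "example-cluster-state"
}

resource "aws_route53_zone" "cluster" {
  name = "cluster.example.com"
}

resource "null_resource" "apply_manifest" {
  # Re-run the command only when the manifest content changes.
  triggers = {
    manifest_sha = sha256(file("${path.module}/manifest.yaml"))
  }

  # Wait until both resources exist before running the command.
  depends_on = [aws_s3_bucket.cluster_state, aws_route53_zone.cluster]

  provisioner "local-exec" {
    command = "kubectl apply -f ${path.module}/manifest.yaml"
  }
}
```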
We can also destroy with null resources, with the clause "when." You can see on the screen that you can program the create action, the apply action, and the Terraform destroy action. But you have to be very careful with this. Null resources can first run the destroy, and then the apply. That is simply how they behave. I learned this from an experience that quite shocked me.
I ran our cluster manager, Kops. I ran a delete and then a create, and while I was doing that, it almost destroyed my complete infrastructure, because I thought that the creation would always be triggered if there's a change.
But what happens is that every time this null resource is triggered, it runs a destroy and then a create. You have to use it with caution. Always keep in mind that the null resource will be first destroyed and then will be created.
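To make the ordering concrete, here is a harmless sketch in 0.12 syntax where the commands are just echo statements standing in for the real cluster commands. The cluster_version variable is hypothetical.

```hcl
variable "cluster_version" {
  type = string
}

resource "null_resource" "cluster" {
  # Changing this value replaces the resource.
  triggers = {
    cluster_version = var.cluster_version
  }

  # Runs when the resource is created.
  provisioner "local-exec" {
    command = "echo 'create cluster'"
  }

  # Runs when the resource is destroyed, including during a replacement.
  provisioner "local-exec" {
    when    = destroy
    command = "echo 'delete cluster'"
  }
}
```

Changing cluster_version here would run the delete command for the old resource and then the create command for the new one, which is the behavior described above.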
Local Files and Templating
I used Kubernetes a lot at first. Now say you have a situation where you have a file, and you generate a template with this file, save this file, and then use it as a feed to Kubernetes. In this example, I am creating a configmap with Terraform templating, and then I'm feeding it to a command that will run Kubernetes.
But in the end, I thought, this is not a good idea. Because what happens is, if you use a CI or something like that, every time that it runs Terraform, it will say, "I'm going to create this local file." And that really can pollute the output of the Terraform plan.
I was getting annoyed, because when you look at the Terraform plan, you want it to be as clean as possible and with no distractions. I don't want to know if a file hasn't been created in my machine; I wanted to know important things like what's going to be really changed, what is going to be run.
So I stopped saving this local file, and now I use the rendered template directly as a feed to the Kubernetes command.
As you can see on screen, now I create the configmap with the templating, and every time this template changes, it is rendered again. Then if you run the Kubernetes command using a heredoc (this EOF thing), everything is done.
I don't have the local file being saved in my machine anymore, and I don't have to see that all the time in the output.
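One way to sketch this without a local file is shown below. It uses the templatefile function from Terraform 0.12 rather than whatever the slide used, and the template path, variable, and kubectl command are assumptions.

```hcl
variable "environment" {
  type = string
}

locals {
  # Rendered in memory; nothing is written to disk.
  configmap = templatefile("${path.module}/configmap.yaml.tpl", {
    environment = var.environment
  })
}

resource "null_resource" "configmap" {
  # Re-run the command only when the rendered content changes.
  triggers = {
    rendered_sha = sha256(local.configmap)
  }

  provisioner "local-exec" {
    # The rendered YAML is piped to kubectl through a shell heredoc.
    # Quoting 'EOF' stops the shell from expanding anything in the YAML.
    command = <<EOT
kubectl apply -f - <<'EOF'
${local.configmap}
EOF
EOT
  }
}
```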
Using Kubernetes with Terraform
Why would I want to do this? Why not use only Kubernetes commands and not add them into Terraform?
I wanted 1 execution tool, and I have several environments, so I wanted to see the templating from Terraform instead of adding another templating technology into my pipeline.
I thought about it. If I execute everything with Terraform, Terraform has templating, which I need for my environment. Then why not? Also, I have situations where I need to do some stuff with Kubernetes, do a little bit more with Terraform, do a little bit more with Kubernetes.
So I have this alternating, which can really complicate things. One example for this is I want to create a load balancer with Terraform, and then I want to add my Kubernetes configuration to it. Or adding the DNS records to it. That was what motivated me to use Kubernetes on Terraform.
At first, I wanted to avoid null resources. I gave you some tips on how to use it, but I prefer to not use it because you need to control the flow, you need to control several things. It's really complicated to manage.
Here is an example for 1 deployment or 1 configmap. But imagine this replicated 10, 20 times. It's too much null resource, and you end up being a bit crazy.
I wanted to avoid using it. So I started to experiment with the Kubernetes provider. Terraform has a Kubernetes provider, and I started using it instead of null resources. It was quite nice for a while, but then I realized that I lost my YAML. If you use Kubernetes, that YAML is your baby in Kubernetes, and I lost it. I lost it when I started to use the Kubernetes provider. I was really upset by that. (There is a new Kubernetes provider arriving, and with Terraform's built-in function or tfk8s you can convert YAML to HCL.)
For low-maintenance Kubernetes resources, for example, if you want a monitoring deployment that you will trigger once and leave alone except to upgrade it, the Kubernetes provider is the way to go. But for more complex deployments, like your application itself, wherever you need YAML, you need flexibility, etc., I would advise using another tool, or even continuing to use null resources.
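For comparison, this is roughly what a small resource looks like with the Kubernetes provider, written in HCL instead of YAML. It assumes the provider is already configured against a cluster, and the names and data are made up.

```hcl
resource "kubernetes_config_map" "monitoring" {
  metadata {
    name      = "monitoring-config"
    namespace = "monitoring"
  }

  # Equivalent to the data section of a ConfigMap manifest.
  data = {
    scrape_interval = "30s"
  }
}
```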
I told you that I use Terragrunt with Terraform and how all of these things are combined. I use Terragrunt plan-all and apply-all. What happens is, Terragrunt apply-all will run all your modules in a specific order based on your dependencies. It's nice because I can enter a folder, like an AWS development environment, and then run the plan-all.
In that development environment, everything will be run; the plan for that whole infrastructure will be built and shown to you. That I really like.
And you can also run apply-all, which will apply everything. You ensure that your whole infrastructure for that whole environment will always be run by Terraform, and you're always getting to the states that you desire.
Terragrunt has way more features that are not covered here. But that's how I've been running my infrastructure.
You also have Terraform plan/apply, of course, for 1 module. If you don't want to run a bunch of stuff, like all of the modules, you can run only 1, like when you are testing: "Oh, I'm changing this, so I'm going to test this right now." For that, I still use Terragrunt, because once you have it, you have it. So you need to run Terragrunt plan, but it calls Terraform.
I also use Terraform-landscape to make it colorful and pretty. I also can see the diff of my changes, what it was before and what it is now, but in a more organized way.
And you can see, if you build a JSON or something, as part of the Terraform, you can really see the diff. It really helps. I really like it.
You can use landscape as well with normal Terraform without using Terragrunt. You do the Terraform plan and the landscape, and you can see everything there. It's also nice because sometimes the plan-all is really a lot of stuff. Imagine if you are running the entire infrastructure plan; it can be a lot of output. So it's nice that the colored output lets you zoom in on what matters.
There are more tips in this slideshare link.