Terraform Fireside Chat with Mitchell & Armon
Oct 28, 2020
As HashiCorp Terraform begins its march to 1.0, the next release includes new features such as dependency pinning. Watch the two creators of Terraform, Armon Dadgar and Mitchell Hashimoto, talk about the long road toward ultimate success for the product.
- Armon Dadgar, Co-founder and CTO, HashiCorp
- Mitchell Hashimoto, Co-Founder & CTO, HashiCorp
Welcome to the Terraform product keynote. To kick us off today, Mitchell and I wanted to spend a bit of time talking about the history of Terraform and how we came up with the approach that we ended up taking.
I relate it back all the way to when we first met at the University of Washington, where we were both on a research project figuring out how to enable the scientific compute cloud.
This was circa 2007, so early for a public cloud infrastructure. Amazon was your only option, and it was very primitive at the time, so a lot of our focus was, How do you build and manage infrastructure in a public cloud, given the dearth of tooling that existed?
One of the most painful parts of that was, How do you provision and manage the lifecycle of infrastructure? We've taken multiple stabs at this problem over the last few years, from our efforts on the research project to projects we hacked on in our spare time and presented at user groups to some of the efforts we did after college in a number of startups, and then ultimately to our efforts at HashiCorp.
The Birth of Terraform
Mitchell, maybe you want to share a bit of how you started to think about the infrastructure as code paradigm, and what was your first exposure to that way of thinking about it?
I think my first exposure was probably configuration management and setting up and installing software on servers. At the time, we viewed servers more as pets than cattle. You tend to set up a server and keep it running for a very long period of time.
The tooling circa 2008-2009 was pretty good. I was happy using tools like Chef and Puppet to do things like that, and I thought a big gap in setting that stuff up was creating the servers, creating the load balancers, and things like that. I felt like I was clicking a lot to do that, and it was not very automatable.
I think it was 2009, Amazon announced CloudFormation. I wrote a blog post I think a day after that release talking about how excited I was about this CloudFormation release, because it really felt like a pain point that I had been imagining was validated, and this was a solution to that.
I quickly played with CloudFormation. One thing I attached to right away is I wanted to improve it. My background and my passions really lay with open source, and here was this closed-source, proprietary, all-Amazon tool that I couldn't improve.
I wanted to add support for more features and more resources that weren't supported by CloudFormation, but available in AWS, and I just couldn't do it. At the same time, there were some other systems, like Linode and other VPS (virtual private server) providers, that I wanted a similar CloudFormation-like experience with, and I couldn't get it, again because it was a closed ecosystem.
So I wrote this blog post saying, "This is a really fantastic idea. I'm super excited about CloudFormation. I can't wait until somebody does this in an open-source fashion that can support any sort of infrastructure provider."
I published this blog post thinking any day now, someone was going to do it, whether it's a new company or an existing config management company or someone else. Anybody.
This was before HashiCorp existed, before we wrote any of our open-source projects. I was a second-year student in college, so I certainly didn't expect that we would do it. But fast forward 4 or 5 years, and now I'm graduated, I'm working in a job, and I still have this problem, and it's still not solved.
I thought, "I guess we will do it. We will give this a shot."
I was feeling a bit more confident then that we had the knowledge and expertise to be able to pull this off. And that was really the basis of Terraform and how Terraform came to be.
It was something I viewed as an open-source ecosystem to manage any infrastructure provider as code, and to foster this community where everybody helped make the support for more and more resources grow at a rate that no single company could do on their own.
To my happiness, that is how it turned out, and Terraform is this really vibrant, active community that makes Terraform work for pretty much anything in the world with an API.
That's really how Terraform came to be, and it was based in this notion of, Let's automate everything.
It Wasn't All Smooth Sailing
Mitchell, the way you described it makes it seem like there was this obvious jump for us, from how we thought about the problem and what we wanted the solution to look like to Terraform as we know and think about it today.
But the reality that you and I lived through was multiple fits and starts of us thinking about this problem space, trying different approaches to it, feeling like it wasn't quite right, and then retrying and iteratively getting there.
I feel like we both agreed that, from the user's configuration perspective, we wanted something that felt like a declarative statement. Very much similar to Puppet, similar in many ways to CloudFormation, though I think both of us had a bit of an allergic reaction to the JSON syntax of CloudFormation.
We knew we wanted something that was going to be a user-friendly language, and there was a lot of work for us to evolve HCL (HashiCorp Configuration Language) to where it is for Terraform.
But we wanted something user-friendly, we wanted something declarative, and we didn't have a good understanding necessarily of what the implementation would look like.
I think Terraform as we think of it today is maybe version 3 from our internal approaches. It might be helpful if you talked through a bit of those left turns that we took that ended up being a dead end, or the right turns that we took that we decided we had to change course just in terms of the design and the approach that we ultimately took.
That's a really good point. Maybe I was trying to forget those false starts. But Terraform really started out augmenting existing configuration management tools to try to manage infrastructure.
One of the first things that we thought was, "Puppet, Chef, those sorts of tools have a syntax that's close to what we want, and they manage software installations close to what we want, so maybe they can manage infrastructure as well."
We started out by implementing it that way, and one of the false starts we hit really quickly was that a lot of the tools then were agent-based. And the agent in this case didn't make sense because we were an outside actor operating on something from the outside versus something running on a machine, operating on it from the inside. Architecturally, it felt a little weird.
There were some missing features that we felt were really important for infrastructure that were less important for software.
The big one that really put us over the edge toward building our own solution was this concept of "create before destroy." One of the first features we hit was, "I want to stand up a new EC2 instance and add it to a load balancer before I delete the old one."
It's a basic concept, but I think that that desire permeates deeply into how the system works, as well as all the features around it. So those sorts of things pushed us away from, "I don't think the config management way of thinking and architecting applies perfectly to this."
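(Editor's note: The "create before destroy" idea described above can be illustrated with a minimal Python sketch. It is not Terraform's implementation; the function `replace_instance` and the instance names are hypothetical, and the load balancer is modeled as a plain list. It only shows why the ordering matters: creating first keeps at least one instance serving traffic.)

```python
def replace_instance(pool, old, new, create_before_destroy):
    """Replace `old` with `new` in `pool` (a list standing in for a load balancer).

    Returns (steps, lowest), where `lowest` is the smallest pool size
    observed while the replacement was in progress.
    """
    steps = []
    lowest = len(pool)

    def add_new():
        # Stand up the new instance and attach it to the pool.
        pool.append(new)
        steps.append(f"create+attach {new}")

    def remove_old():
        # Detach and destroy the old instance, recording the capacity dip.
        nonlocal lowest
        pool.remove(old)
        steps.append(f"detach+destroy {old}")
        lowest = min(lowest, len(pool))

    if create_before_destroy:
        add_new()
        remove_old()
    else:
        remove_old()
        add_new()
    return steps, lowest


# Hypothetical instance names, for illustration only.
safe_steps, safe_lowest = replace_instance(["web-old"], "web-old", "web-new", True)
risky_steps, risky_lowest = replace_instance(["web-old"], "web-old", "web-new", False)
```

With create-before-destroy the pool never drops below one instance; with the reverse order it is briefly empty.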
So we started moving into the Terraform world, and some of the surprises we had in Terraform were people want to do a lot more with infrastructure than we imagined was possible. So bringing things like modules to make reusable infrastructure blueprints, and things like looping constructs and conditionals.
In our original basic designs and basic needs for the size startup we were at the time, we didn't need that, and we were pretty happy with what we built. But as the realities of the world came into this tool, we realized that more and more of these things were super-valuable to support more complex use cases.
But all of this came back to needing a specialized runtime, a specialized core focused on infrastructure management.
I'm happy with where Terraform is at today, but along the way, I don't think any other solution really hit that for us.
Even internally, we felt like, not only were the existing approaches wrong, but we experimented with a few different ones as well. Initially it was more of an imperative, CRUD-based model, where we had to explicitly write in the CRUD operations, and then the system would help us order it.
Finding the Right Model
We felt that was heavy and difficult for users to deal with, so pretty quickly we transitioned off of that and said the core engine shouldn't be responsible for the ordering of the CRUD operations.
But then how do you make the bigger system make sense? We played around initially with a model asking, "Could each resource be a finite state machine (FSM)?" And we modeled it as a series of transformations between state, and then each FSM would be able to interact with other state machines so that you could create dependencies to say, "My load balancer depends upon the VM."
Quickly we started asking ourselves, "How do I do these really complex transformations across these different FSMs, and what happens when I have these undefined states?" We felt like, This model doesn't make sense.
Then we transitioned to an actor-based model where each resource was almost an actor, and there was a message-passing interface between them.
This allowed the system to be highly concurrent the way Terraform is today, but it was also confusing for users to deal with and very difficult to build a programming model around, because the ordering of execution was so random and everything was happening concurrently.
We tried the FSM model. That didn't feel right. We tried the imperative model. That didn't feel right. We tried the actor-based model. That didn't feel right.
Where we ultimately got to was this graph-based engine, where we model all the infrastructure as a graph. The engine has a very predictable execution order. It has a limited concurrency on purpose so it doesn't overwhelm the underlying cloud systems. Nor does it overwhelm the user with completely randomized output.
But this fourth approach had a bunch of nice benefits. It freed you from the imperative logic. It wasn't terribly complicated if you wanted to write integrations. The Terraform providers are basically just CRUD. You just tell the engine how to deal with the CRUD, and then it's responsible for the ordering on your behalf.
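(Editor's note: As a rough illustration of the graph-based approach described above, here is a minimal Python sketch. It is not Terraform's engine; `InMemoryProvider`, `apply`, and the resource names are hypothetical. The provider only implements simple CRUD-style calls, and the engine derives a predictable order from the dependency graph using the standard library's `graphlib`, available in Python 3.9 and later.)

```python
from graphlib import TopologicalSorter


class InMemoryProvider:
    """Hypothetical provider: it only knows how to create or delete one resource."""

    def __init__(self):
        self.resources = {}
        self.log = []

    def create(self, name, config):
        self.resources[name] = config
        self.log.append(("create", name))

    def delete(self, name):
        del self.resources[name]
        self.log.append(("delete", name))


def apply(provider, desired, depends_on):
    """Create each desired resource after the resources it depends on."""
    # Map every resource to the set of resources that must exist first.
    graph = {name: set(depends_on.get(name, ())) for name in desired}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        provider.create(name, desired[name])
    return order


# The user declares what should exist, in any order, plus the dependencies.
desired = {
    "load_balancer": {"targets": ["vm"]},
    "vm": {"size": "small"},
    "network": {"cidr": "10.0.0.0/16"},
}
depends_on = {"load_balancer": ["vm"], "vm": ["network"]}

provider = InMemoryProvider()
order = apply(provider, desired, depends_on)
```

The provider never decides ordering; it is computed from the graph, so the load balancer is always created after the VM it depends on.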
But there was this evolution of approach, and the other piece is that the way we write providers changed quite a bit.
With Terraform 0.1, there was a lot of work. There was a raw interface between Terraform core and the engine itself to what a provider had to implement, which was really all of the delta logic, all of the CRUD logic, really figuring out some of these edge cases that were pretty complex.
I think it was you who identified, "This is too difficult. This is too painful for somebody else to try and integrate with Terraform. We need to lower the bar."
Making It Easy to Extend the Product
I'm curious how the firsthand experience of writing a Terraform provider and the pain involved ultimately led to your thinking around the notion of helper/schema, the frameworks around Terraform providers?
One thing I'm really proud of with HashiCorp tools in general is we've always done a really excellent job of building approachable interfaces for people to extend our software. It's learning over multiple products.
We had Vagrant first, which had plugins, and then we had Packer, which had plugins. Terraform was the next tool and it learned from those that it's very important that writing a plugin be as easy as possible.
That's because Terraform is really a glue tool that orchestrates all these different resources and providers. Glue tools live or die by the community that grows around them, and so if it's very hard to extend them, then the core itself, no matter how good it is, just won't survive.
It really came from that first experience. I think we were a week out from releasing 0.1, and 3 of us were sweating in a New York summer in an apartment trying to build providers as quickly as possible.
I was just realizing that I couldn't imagine anybody other than us, who had built the core of the system, writing these providers.
It was too soon to release a fix for that, so we shipped 0.1, but right after 0.1, or it might've been just before, I wrote a framework that effectively survived until a few months ago (we're starting to see some major changes now).
But I wrote this framework that allowed people to easily write providers. I like to call it "the Ruby on Rails for infrastructure as code." It was just a quick way to write a provider, and I think it was a success, because there are over, I think, 200 providers today.
It's worked pretty well, and I'm really happy about that.
I think there are over 300 providers today. (Editor’s note: Actually, there were over 450 Terraform providers as of the date of this interview.)
Terraform Was No Overnight Success
But this brings up another interesting point. For a very long time, I don't think it was obvious to us that we'd got the problem right with Terraform. Today people often say Terraform is an overnight success, and they see the usage of it, but the reality is that 12 to 18 months after release, there was a flatline of Terraform adoption and usage. It wasn't a vibrant community the way it is today.
There's this interesting dilemma, the conviction we felt of this being the right approach versus helping bring the market and users with us.
I'm curious if there's anything you want to share, Mitchell, from that period in terms of how you thought about the conviction we had in terms of the approach and the architecture and infrastructure as code as the answer versus what the general sentiment was in terms of people thinking about infrastructure management.
We've had very few overnight successes, especially at that time, so I was pretty familiar with how that felt. Terraform to me felt really similar to how I felt about Vagrant when that came out. Terraform wasn't super popular early on.
Like you said, it was about 12 months of slow, slow growth. But every time I used Terraform, it just felt like the right solution. It was fun to use, it solved my problem. It was fast. It gave me a lot of confidence in what I was trying to do, and so it just felt right.
I didn't see anything else filling that gap. There was no other option for me to try. I felt really good about it, and I think that helped give us that conviction to keep working on it and keep growing the community into what it is today.
For a lot of different reasons, Terraform got a lot better. The core got a lot better. Early contributors came on board and built a lot of great providers. Early partners came on board and started pushing Terraform a bit more, and all of that came together to grow Terraform into the success it is today.
Commercializing Terraform and Other Products
Another funny part of that history is that, because it took so long to get that initial bit of success, there's some question about, Did we plan on monetizing this thing? How are we thinking about making it successful as an open-source project versus monetizing it? Armon, I think it'd be helpful if you talked about our views toward commercializing Terraform in particular.
That's a great point. I think this applies to not just Terraform, but really to the approach we took with all of our open source.
Our philosophy has been pretty consistent with all the products: Stay focused on the problem the end user is trying to solve with the open-source tool, and let's make that excellent. Let's make the experience excellent. Let's make the technical capabilities needed for that excellent.
It's really focused on making sure that, if you're a practitioner or an end user trying to solve provisioning problems with Terraform or security problems with Vault or networking problems with Consul, etc., make that an excellent experience.
And that's not the focus where we commercialize. Where we commercialize is really looking at moving from an individual solving this problem to a team within a larger company or a massive corporation trying to provide Terraform as a service across thousands of developers, or Vault as a service across dozens of application teams.
That's where we look in terms of those challenges that come with that type of scale. Some of those might be just the true scaling challenge of multi-datacenter and the need to replicate data and things like that. Some of those might be driven by compliance requirements. You're a big company, and you need FIPS certification or PCI or FedRAMP, etc.
Some of them are just collaboration challenges. If I have dozens, hundreds, thousands of people trying to collaborate, how do I do that safely? How do I do that where I'm managing access controls and audit capability and single sign-on, etc.? How do I put the guardrails in place to say, "I want to have hundreds of people work together, but I want to do it in a way that I'm managing the risk around that"?
That's been a consistent way that we've thought about monetizing all this stuff, certainly for the first few years. With Terraform, certainly the first few years that it existed, it was purely a focus on the open source and the experience around it.
More recently it's been around Terraform Cloud and Terraform Enterprise. And even with Terraform Cloud, we really look at it as great for small teams, but can we bring some of those collaboration capabilities and make it free?
If you're a team of 1, 2, 3, 5 people, using Terraform Cloud for free solves some of the challenges of small-team collaboration.
And it has those capabilities that you need as you get to scale.
Terraform Product Updates
At this point I would like to introduce Robbie Th'ng, who's the product manager for Terraform, to share a bit of the product updates and what's planned for Terraform and the ecosystem around it.
Next video: [Keynote - The State of Terraform (plus Vagrant and Packer)](https://www.hashicorp.com/resources/keynote-the-state-of-terraform-plus-vagrant-and-packer)