Adopting a Cloud Operating Model With the HashiCorp Stack: Where Do You Start?
Aug 28, 2019
HashiCorp’s stack makes life in the cloud much easier. But which product comes first?
Mitchell Hashimoto, Founder & Co-CTO, HashiCorp
Armon Dadgar, Founder & Co-CTO, HashiCorp
Mitchell Hashimoto: People often bring up that HashiCorp has a lot of products and they do quite a lot of different things: “What’s the best approach for adopting those products? Would you adopt them, why would you, and how does that fit into the operating model in the cloud?”
Armon Dadgar: It’s a super-common question because there are a lot of tools and a lot of places to start. And what we’ve seen is a pretty common pattern among organizations that are either operations-led or security-led, depending on which team tends to be on the front foot for new initiatives.
And so for organizations that are more operationally led, their first challenge is: “I want to go to cloud. How do I provision anything there?” My Day 1 provisioning challenge is the primary concern of the ops group, and so it tends to be a Terraform focus around, “Can we bring in Terraform?” And it gives us some way to define a workflow around Day 1 provisioning, Day 2 management, that’s going to work not only in our first cloud, but inevitably in our second and third, and probably private clouds as well.
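As a rough illustration of the "Day 1 provisioning, Day 2 management" workflow Armon describes, here is a minimal Python sketch of the declarative idea: describe the desired infrastructure, compare it with what exists, and derive the actions to take. This is a toy model, not Terraform's engine or API; the function `plan` and the resource names are hypothetical and chosen only for the example.

```python
# Toy model of a declarative provisioning workflow (NOT Terraform's real engine).
# The function name `plan` and all resource names below are hypothetical.

def plan(desired: dict, actual: dict) -> list:
    """Return the ordered actions needed to move `actual` toward `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))   # Day 1: resource does not exist yet
        elif actual[name] != spec:
            actions.append(("update", name))   # Day 2: resource no longer matches its spec
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))   # Day 2: resource is no longer wanted
    return actions


desired = {
    "network": {"cidr": "10.0.0.0/16"},
    "web-1": {"size": "small"},
    "web-2": {"size": "small"},
}

# Day 1: nothing exists yet, so every resource is created.
day1 = plan(desired, actual={})

# Day 2: one VM was resized by hand and a stray VM is still around.
day2 = plan(desired, actual={
    "network": {"cidr": "10.0.0.0/16"},
    "web-1": {"size": "large"},
    "web-2": {"size": "small"},
    "old-vm": {"size": "small"},
})

print(day1)
print(day2)
```

The same comparison applies on Day 1 and Day 2; only the starting state differs, which is one way to read the point about a single workflow covering both.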
So it first starts around the journey of Terraform: “Let’s get the first 5 VMs running and define our network.” But from there the challenge pretty quickly becomes, “We have 5 VMs running, but how does the web server get the database credential? And what about these encryption keys, and what about the API tokens we need to distribute?”
So then the challenge becomes, “If we want to automate the delivery of these apps, we have to do it in some way that’s secure.” And then the next logical tool becomes Vault. It’s like, “We’ve automated spinning up the 5 web servers. How do we now have somewhere that they can automatically, when they boot, go provision the credentials they need?”
Step two becomes: Deploy Vault as an initial identity broker and secrets management store so that the apps can come up, authenticate against Vault, and fetch the secrets they need to do useful things.
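The flow in this step (an app boots, authenticates, then fetches only the secrets it is allowed to read) can be sketched with a small in-memory stand-in. This is not Vault's API: the class `ToySecretsBroker`, its methods, and the secret paths are all hypothetical, and the login step simply trusts the identity name, whereas a real broker would verify it against a trusted platform.

```python
import secrets

# In-memory stand-in for an identity broker and secrets store.
# NOT Vault's API: every name here is hypothetical, for illustration only.


class ToySecretsBroker:
    def __init__(self):
        self._policies = {}  # identity -> set of secret paths it may read
        self._secrets = {}   # path -> secret value
        self._tokens = {}    # issued token -> identity

    def put(self, path, value):
        self._secrets[path] = value

    def allow(self, identity, path):
        self._policies.setdefault(identity, set()).add(path)

    def login(self, identity):
        # Assumption: the identity is taken on trust here; a real broker
        # would verify it before issuing a token.
        if identity not in self._policies:
            raise PermissionError(f"unknown identity: {identity}")
        token = secrets.token_hex(16)
        self._tokens[token] = identity
        return token

    def read(self, token, path):
        identity = self._tokens.get(token)
        if identity is None or path not in self._policies[identity]:
            raise PermissionError(f"access to {path} denied")
        return self._secrets[path]


broker = ToySecretsBroker()
broker.put("db/password", "s3cr3t")
broker.put("api/token", "t0k3n")
broker.allow("web-server", "db/password")

# At boot, the web server authenticates and fetches the credential it needs.
token = broker.login("web-server")
db_password = broker.read(token, "db/password")

# It cannot read secrets outside its policy.
try:
    broker.read(token, "api/token")
    denied = False
except PermissionError:
    denied = True
```

The point of the sketch is the ordering: no credential is baked into the web server; it receives one only after it has proven who it is.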
So I think the “Terraform first, then Vault” approach tends to happen for ops-led organizations. For security-led organizations, it tends to be the reverse.
Their first challenge is, “I have a bunch of Amazon keys and certificates and usernames and passwords that I need to securely distribute, but I’m not going to let any apps go to the cloud until I know what I’m going to do with these things.” So it tends to be that those organizations start with Vault and figure out, “I can run one copy on premises, one copy in the cloud, and replicate between them. So I have a consistent way of managing policy and secrets no matter where it’s running.”
So they usually start with that. And then it becomes a conversation of “I’ve run this thing; how do I now deploy 10 apps—one’s a web server, one’s an API, one’s a database—and allow them to programmatically get those secrets?” It becomes a Terraform conversation. Those tend to be the two starting points.
Mitchell: Yeah, definitely. And we built our products because we viewed each of these problems as fairly fundamental in terms of adopting cloud and just adopting a more dynamic way of looking at infrastructure. And whether it’s Terraform, Vault, Consul, Nomad, the idea is that you will reach those problems, and, as you said, depending on the type of organization or depending on the pain points that are in your organization, who’s most innovative, you’ll hit one of them first.
But the idea is that in a modern, dynamic environment, we don’t need to sell you on anything. You’re going to hit that problem. And we believe our solutions are the right way to solve them.
Armon: And it is funny: We didn’t build them in the order we see people adopt. Consul predates Vault and Terraform, but in practice tends to be the third tool that gets adopted. Our thinking at the time was, “I’m gonna deploy all these microservices, and they’re all gonna be independent teams. Team A and Team B and Team C aren’t gonna coordinate. They’re going to just deploy at different rates. And so if they need static IPs to coordinate between them, it will never work.”
Our thinking from the very beginning was you need Consul first. But in practice it turns out you need Terraform to deploy the apps to begin with, and you need Vault to secure them.
Then your third problem is, “I’ve deployed these apps that are secure, but they need some way to talk to each other now. I booted them all, but how does my web server find the API? How does the API find the database?” So my Day 3 problem becomes, “I need a dynamic networking framework for all of this.” So it’s funny that we started with Consul, but in practice it’s the third tool that gets deployed.
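The "how does my web server find the API?" question can be illustrated with a tiny name-based registry: services register under a name, and callers look the name up instead of hard-coding an IP. This is a sketch of the general idea only, not Consul's API; `ToyServiceRegistry`, its methods, and the addresses are hypothetical.

```python
# Toy service registry illustrating lookup by name instead of static IPs.
# NOT Consul's API: class, method names, and addresses are hypothetical.


class ToyServiceRegistry:
    def __init__(self):
        self._instances = {}  # service name -> list of (host, port)

    def register(self, name, host, port):
        self._instances.setdefault(name, []).append((host, port))

    def deregister(self, name, host, port):
        instances = self._instances.get(name, [])
        if (host, port) in instances:
            instances.remove((host, port))

    def lookup(self, name):
        instances = self._instances.get(name)
        if not instances:
            raise LookupError(f"no registered instance of {name!r}")
        return instances[0]


registry = ToyServiceRegistry()

# The API team deploys and registers its service.
registry.register("api", "10.0.1.5", 8080)
first = registry.lookup("api")

# Later the API team redeploys on its own schedule at a new address.
registry.deregister("api", "10.0.1.5", 8080)
registry.register("api", "10.0.2.9", 8080)
second = registry.lookup("api")

print(first, second)
```

The web server's code never changes between the two lookups; it keeps asking for "api" and gets whatever address is current, which is why independently deploying teams do not need to coordinate on IPs.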
Mitchell: Getting into the weeds a little bit: I think that the prioritization was, “What had adequate Band-Aids at any given moment?” And I think that for security, you would just encrypt everything in the database. It’s an adequate Band-Aid but doesn’t scale very well for what you need. But service discovery, the problem Consul addresses, was a huge issue that we didn’t have a solution for. So building that first was more a question of “What could we get away with?” than of the order in which the problems occurred.
Armon: It’s interesting that Nomad is last. Because in some sense, if you think about the necessary challenges of the cloud operating model, my first problem is, “How do I provision?” My second problem is, “How do I secure this?” The third problem is, “How do I connect it all?” The fourth and last problem is, “I have this common stack, but now I want to light up my developers to launch 100 different types of applications on top.”
So I want a common deployment and runtime platform. And it made sense, I think, that that was the last piece. You have to solve the others first.
That tends to map to what we see from an adoption standpoint. You’ve got to nail the first 3, because that affects every app, no matter whether it’s Spark or Kubernetes or Nomad-based. You have those problems, and once you get to that runtime layer, that’s when you have a bunch of choices. You might say, “I’m going to go pure cloud-native with Lambda. I might go Kubernetes and run it on and off prem. I might go Nomad and have a large-scale scheduler.” But the other 3 are more essential.