7 Years On: Remembering the Origins of HashiCorp Consul
Oct 21, 2020
Join HashiCorp founders Mitchell and Armon as they take us through the origins of HashiCorp Consul, our service discovery and service mesh product.
- Armon Dadgar, Co-founder and CTO, HashiCorp
- Mitchell Hashimoto, Co-founder and CTO, HashiCorp
Welcome to the Consul Product Updates. To kick us off, I wanted to spend a little bit of time talking about what brought us to the idea of Consul and the approach that it takes.
It's a relatively unique product. I think it goes all the way back to the Tao of HashiCorp, which is a document we published around some of the essential design philosophies we have for the whole portfolio.
This includes things like our "workflow over technology" orientation. What we mean by that is focusing on the core workflow problem while making sure that the tool is pluggable across different technologies.
In Consul's case, for example, we don't really care if you're running on premises or in the cloud, on bare metal, in a virtual machine, in a containerized environment, etc. The actual environment—the platform, the technologies, Mac, Windows, Linux, etc.—is designed to be pluggable against a common workflow. That's an approach we take in all of our products.
Another important part of the Tao for us is the idea of immutability. When we talk about immutable infrastructure, it's the notion that once I create a particular version, say my version 1 server, I don't modify it in production anymore. I boot it from a golden image that describes version 1, and then that's it.
If I wanted to deploy a version 2, I'd create a new golden image that describes version 2. I'd boot a server in that image, and I'd never touch it again. In this sense, it's immutable; we don't mutate it once it's been created.
This was always a core ethos of HashiCorp, from the very beginning.
Microservices in an immutable architecture
If you go back to circa 2013 and consider the tooling that existed, it was a world heavy on things like configuration management. A typical setup was: you deploy a server, let's say a web server, and you want that web server to know the IP address of your API service or your database so that it can connect to those upstream dependencies.
Probably the most common practice at the time was you'd run Chef or Puppet or Ansible to update the IP address of that upstream service. Great, I deploy my web service, then I run Ansible. I say, "The IP address of my database is 10.0.0.5," as an example.
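In spirit, that convergence run boils down to re-rendering a config file with the current upstream IP. Here is a minimal sketch of that workflow; the template syntax, file contents, and IP are illustrative, not taken from any real Chef, Puppet, or Ansible playbook:

```python
# A toy stand-in for a configuration-management convergence run: render
# the upstream database IP into the web server's config file. The
# placeholder name `database_ip` is hypothetical.

def render_config(template: str, upstream_ip: str) -> str:
    """Substitute the upstream address the way a CM template task would."""
    return template.replace("{{ database_ip }}", upstream_ip)

template = "database_host = {{ database_ip }}\n"
rendered = render_config(template, "10.0.0.5")
print(rendered)
```

The weakness described above falls out directly: every time the database moves, this run has to be repeated on every dependent node.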
Very early, we asked ourselves, "What would it take to get to an extreme immutable view?" That is, to get to a point where I can deploy an image and, with no additional configuration management and no additional convergence run, have it discover and connect to its upstream services: API services, databases, etc.
That was the original problem: How do you think about microservices in an immutable architecture, considering that these things are dynamic?
One option would be to bake the IP address into the machine image. I can hard-code the IP address of the database, let's say, in my web server's image, so when it boots, that IP is hard-coded.
We didn't want to do that because it defeats being able to have a dynamic, ephemeral infrastructure. If my database fails and it comes up at a different IP, or if I'm auto-scaling a service and IPs are coming and going, that would be a very brittle approach.
The gossip approach
We didn't want anything hard-coded. Our first approach was, "What if we could do something using a pure peer-to-peer, gossip-based mechanism?"
The first approach to this problem was building a tool that was completely decentralized. You ran an agent on every node, and we would gossip, in the true sense of a gossip protocol, the IP addresses of the different services to one another, and they could use that dynamic catalog to figure out the upstream and downstream IPs.
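The dissemination mechanic being described can be sketched in a few lines: each node holds its own copy of the service catalog and, each round, pushes that copy to a few randomly chosen peers. This is a toy illustration of push gossip, not the actual protocol that was built:

```python
import random

def gossip_round(views: dict, fanout: int = 2) -> None:
    """One push-gossip round: every node merges its catalog into the
    views of `fanout` randomly chosen peers."""
    nodes = list(views)
    for node in nodes:
        peers = random.sample([n for n in nodes if n != node],
                              min(fanout, len(nodes) - 1))
        for peer in peers:
            views[peer].update(views[node])

# Node "a" knows about the web service; the others start empty.
views = {"a": {"web": "10.0.0.1"}, "b": {}, "c": {}, "d": {}}
for _ in range(100):  # safety cap; in practice this converges in a few rounds
    if all("web" in view for view in views.values()):
        break
    gossip_round(views)
```

Note that convergence is eventual: at any instant, different nodes may hold different views of the catalog, which is exactly the coherence problem described next.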
That was our first approach, and what we quickly found was it was a bit too unstructured. It didn't have a strong domain model where you have nodes and datacenters and services and there are health checks associated with it.
There wasn't this domain model saying how these different pieces interact with one another. It put the burden on the end user to define all of it. By virtue of being purely decentralized, it was hard to have a coherent view of the datacenter.
Each node would find out about a service coming or going at a different point in time. That inconsistency is fundamental to the decentralized nature of a gossip protocol. By contrast, if we could run a centralized server, we'd have stronger consistency and a coherent view of what the datacenter looked like at any given time.
The birth of Consul
Ultimately, that led us into the development of Consul.
This was in early 2014, almost 7 years ago. The real focus was on the original service discovery use case: I want to launch a set of immutable images with a Consul agent baked in. Can that Consul agent then dynamically register that node and say, "This instance is now a web server running at this IP"?
The web server can then discover the IP of the API, of the database, of the cache. This would be totally dynamic, so as nodes would come and go or things would fail or they'd auto-scale, the system would be able to handle it in a fully dynamic, elastic way.
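The register-and-discover flow just described maps onto two real Consul interfaces: the agent's HTTP registration endpoint (`PUT /v1/agent/service/register`) and the `<name>.service.consul` DNS naming convention. The sketch below only builds the registration payload and the DNS name; it does not contact an agent, and the service name, address, and port are illustrative:

```python
import json

def registration_payload(name: str, address: str, port: int) -> str:
    """JSON body for Consul's agent service-registration endpoint."""
    return json.dumps({"Name": name, "Address": address, "Port": port})

def dns_name(service: str) -> str:
    """The DNS name other services resolve to discover this one."""
    return f"{service}.service.consul"

body = registration_payload("web", "10.0.0.1", 80)
# A local agent would receive this via:
#   PUT http://127.0.0.1:8500/v1/agent/service/register
print(dns_name("web"))
```

Because discovery goes through names rather than baked-in addresses, nodes can come and go and the answer to "where is the web service?" stays current.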
That was the real focus, thinking about this kind of service discovery, this kind of networking automation: How does it work with a microservice in an immutable architecture?
That's what we strove to create in the initial version. Since then, we've looked at Consul as being able to solve a broader set of networking automation challenges, but that was the original inception behind the product.
That outlines the initial vision of Consul really well. We wanted to solve this practical problem of an immutable image coming up and being able to join a cluster immediately to find its database, to talk to any other dependencies it has, to get its configuration, and things like that.
When we looked at this problem, we described it broadly as a service discovery or service networking challenge.
The KV store problem
One of the things we put into the 0.1 release was a key-value store. That really threw people off as to what we were trying to solve.
I feel it was a bit unfortunate that we put that key-value store in at that time, because another tool, etcd, had come out roughly 6 months prior. etcd was a distributed key-value store that could potentially be used to solve things like service discovery.
When we put a key-value store in Consul, it drew a lot of early criticism, or just confusion, around how Consul relates to etcd.
In reality, we never thought we were in the same space. I regret putting that key-value store in there because what we were trying to solve was this problem that Armon brought up, which was: Bring up a node, make it available via DNS or the HTTP API, something like that.
That's where you can start to see the differences. In 0.1 we had built a DNS interface to find services, and we had ways to bring configuration into your service via something like the KV store. Fast forward to today, and we support many more use cases, really focused on solving this broad service networking problem.
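The KV configuration path mentioned here has one detail worth showing: Consul's real KV API (`GET /v1/kv/<key>`) returns JSON entries whose `Value` field is base64-encoded. The sample response below is fabricated for illustration; only the response shape and encoding reflect the actual API:

```python
import base64
import json

def decode_kv(response_json: str) -> dict:
    """Map each KV entry returned by GET /v1/kv/<key> to its decoded value."""
    entries = json.loads(response_json)
    return {e["Key"]: base64.b64decode(e["Value"]).decode() for e in entries}

# A fabricated response for a single key, as the agent would return it.
sample = json.dumps([{"Key": "config/db_host",
                      "Value": base64.b64encode(b"10.0.0.5").decode()}])
print(decode_kv(sample))
```

A freshly booted immutable image could read its settings this way instead of having them baked into the golden image or pushed by a convergence run.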
We didn't want to build a toolkit where you could build those types of tools on top of Consul. We wanted Consul to be the complete solution that made your service-oriented or microservice architecture really tick and work together.
That's why Consul today has so many of these features out of the box. To talk about all these features in detail, I'd like to introduce Neena Pemmaraju, the director of product management for Consul.