Network Modernization in a Multi-Cloud World With Consul
Jul 11, 2019
Armon Dadgar opens the 2019 HashiConf EU keynote with a review of our multi-cloud, multi-platform reality and explains where Consul's service mesh networking capabilities fit in.
In this segment of the HashiConf EU 2019 Opening Keynote, HashiCorp co-founder and CTO Armon Dadgar shares the growth statistics of HashiCorp and explains how we see the evolution of IT operations. As new technologies evolve (such as containers and serverless) and encapsulate old technologies (such as VMs and mainframes), the network infrastructure is the key component that ties everything together and allows these dissonant technologies to interoperate.
Learn what HashiCorp's Consul service mesh does to bring these old networking practices into the modern IT era, where applications are service-based and infrastructure is dynamic and heterogeneous (cloud & on-prem).
Founder & Co-CTO, HashiCorp
Good morning, everybody. I know Mitchell beat me to it already, but I want to welcome you all officially to HashiConf 2019 here. It's super exciting for us to be back here. Amsterdam has become a bit of a second home for our conference. This is now the third year that we've done it in Amsterdam. Last year we did it literally I think just across the street from here. HashiConf 2016 was actually the first time we did an event in Amsterdam and it was literally at the transformer house.
It's been fun to watch the growth of the European user community. At the time, this was the first conference that we did in Europe, and we were thinking maybe we could get 100, 200 people to show up, and we were pleasantly surprised we had about 300 folks who made it to the first event. As you can see, we've grown a little bit since then; we wouldn't quite be able to fit into the old space. Today, there are a little over 790 people, about 800, joining us in total, and then another few thousand people are joining over the live stream.
What I'd encourage you to do is use this as an opportunity to make new friends, make new connections. For many of you, I think this is probably the first HashiConf. I'm curious, a quick show of hands, how many have ever made it to a HashiCorp event? Okay. Then, show of hands: first conference for any HashiCorp event? Okay. This is the first time for a lot of folks, so I'd encourage you to use this as a good opportunity to meet the new people, share ideas. I think many of you will find maybe you've used Terraform, but someone else here has used Vault and you can learn about that, or one of the other tools, so this is a good opportunity to get together and learn from folks in the community.
» HashiCorp Community
I think what's been fun is, when we first did our conference in 2016, there were no employees based in Europe. We were probably only 30 people at the time totally based out of the States, and today there are now 85 European employees out of about 600 in 6 different countries. At the same time, the user community has grown even bigger than that, so one thing we like to talk about and encourage you to join is the HashiCorp user groups. These are small Meetup groups that are local, they're in your communities, and they're really an opportunity to just get together on a regular basis monthly, quarterly, and just talk about how are you using the tools, how are other people using them, what are the use cases, and it's an opportunity to network in person and get some of that expertise. A few years ago was the first hub based out of San Francisco, and today there are a little over a few hundred chapters. There are 28 just in the Greater European Union, in 16 different countries, and about 25,000 members around the world.
» HashiCorp Diversity Scholarship
Another thing that we're really proud of, we launched this a few years ago, is the HashiCorp Diversity Scholarship. I think one thing that we're aware of is not everyone has the privilege to attend events like these, and get to have their travel paid and their tickets paid and get to come and learn from these types of venues, and so our goal was, "How do we bring a diverse community of people who don't necessarily get to attend?" So, we created a scholarship to bring in folks both regionally within Europe and from around the world, people who wouldn't otherwise be able to attend HashiCorp events. So you'll find people among us who are here because of the Diversity Scholarship. Again, just use this as an opportunity to meet new people, not just the folks you know from other events.
» HashiCorp's ethos: Everything is going "multi" – Support it all with one workflow
With that, I want to switch gears into talking about, "What's the HashiCorp mission? What's the ethos for us? What do we focus on and believe in?" and I think for us I can boil it down to one common view which is: everything is going multi. When I talk about everything going multi, I think obviously one that comes to mind is the world going multi-cloud, this is maybe the most obvious version of the world going multi, and I think this is going to not be a surprise for most of the people here.
I think over the last few years, we've seen the world go through a few iterations of this. I think cloud in its single form is not really new, I think Amazon EC2 launched in 2006. I think we've had this notion of public cloud being around for a while, but I think the long-time view was, "Hey, small companies and startups, they're going to use Amazon. Large enterprise businesses, they're going to stay on their traditional on-premises infrastructure, and Amazon is the only cloud," and I think we've seen that evolve over the last few years where it's not just Amazon. There are many very credible clouds, including Azure, GCP, AliCloud, etc., and I think the view from large enterprises has also changed. In 2006, it was sort of inconceivable that the enterprise would use the cloud. Now, when I think you talk to most large organizations, they all have a public cloud strategy and a multi-cloud strategy, so everyone is now adopting everything.
Beyond multi-cloud, our view is that things are going multi-platform. It used to be that our applications were largely homogeneous monolithic applications running on VMs on-premises, and that's changing, right? I think a few years ago we saw the rise of Docker, which quickly led into container management platforms like Kubernetes, and now we have applications that might span traditional VM-based infrastructure, as well as more modern Kubernetes container-based systems; but already, before we've even finished with that adoption process, we're seeing the frontier shift to serverless platforms. Whether we define serverless as function-as-a-service (FaaS), something like Lambda, or more of a container-as-a-service model like Fargate and Azure Container Instances, or a source model like Google Cloud Run, I think broadly you can think about this as a next generation of platform.
Our view is that no one of these things will ever be all of your application; you're going to mix and match. There's some set of application, traditional database, that's going to fit best on virtual machines; some set of application that works best on containers. Maybe your new integrations that are lightweight, those will live on serverless. So, our view is all of these are going to be a part of your infrastructure; it's never going to be one or the other.
The last piece of our world view is that things are going multi-service. Historically, we might have stuck with a monolithic development pattern, had a relatively large single application, and for most users that's still the right place to start. When you're starting a new project with a new company, start with a monolithic app. But over time, as the complexity grows, as the team size grows, that application is going to get split into many different services.
We might break up our frontend from our intermediate API layers and from our backend storage layers, and this lets us reuse and mix-and-match different functionality. As we do this, there's a whole world of operational challenge that we inherit. Now, we have to worry about, "How do these services connect to one another? How do we secure it? How do we deploy and manage this much more complex application where bits and pieces of this might run on different platforms, different clouds?"
In our world view, there are four major groups when we think about delivering an application: operations, security, networking, and development.
For each of them there's a set of assumptions that changes as the world goes multi.
Operational challenges in the "multi" world
I think for operations, we used to live in a world which was very consistent; it was dedicated servers, largely physical and virtual machine—that's going away. Now, we're going to have capacity on demand. We're not going to have 100 machines that we bought and racked and stacked. We're going to scale up and down as needed, and it's going to be a mix of different types of infrastructure: virtual, container, function, etc.
Security challenges in the "multi" world
For our security teams there's a total shift in their model. I think historically the security models are what I like to call "castle and moat." We have the front door that we brought all of our traffic over, and we put our firewalls and our web filters and our intrusion detection systems there, and we basically asserted that the outside was untrusted and insecure, but the inside was trusted and secure.
So, "The castle walls defended us and we trust everyone on the inside," but as we go through this shift to multi-cloud, multi-platform, multi-service, it gets much harder to secure that inside. It's not clear where the walls begin and end because I have now five different data centers and five different cloud environments. What's the front door? All these things are connected to one another; traffic is going back and forth.
As we transition, we start to lose this notion of a clear perimeter. It's not obvious anymore what the traffic flows are, and so we've changed our security model from saying, "Everything inside is trusted," to instead, "We're going to assign an identity. Every application, every service, has a unique identity, and we're going to use that to govern access." This is the notion behind zero trust networking, or identity-based networking.
Networking challenges in the "multi" world
Then, when we talk about connecting these different services together, traditionally we took a very host-oriented approach. We had a load balancer that pointed to the set of machines that were web servers, or pointed to the set of machines that were databases.
But now as we transition into the multi-platform, multi-cloud world, what does it mean if we don't even have a machine running behind it? How do I point to a Lambda? Maybe it's not even running. Until I invoke it, there is no IP. There is no static machine running there. So, the whole notion of how I think about routing and networking has to reorient not around host and machine, but around logical service. Maybe until I invoke that service, it doesn't even exist.
Development and deployment challenges in the "multi" world
Lastly, the way we think about managing and deploying our applications has changed. Historically, we thought about physical dedicated infrastructure (these 50 were web servers, these 10 were databases), and now we're saying, "You know what? I have a 100-node Kubernetes cluster. I don't really know what's running on any particular machine. It's dynamic. It's ephemeral. Things move around. We're going to schedule across a dynamic fleet rather than statically managing many different smaller fleets."
» One workflow for the "multi" world
I think if you zoom out and look at the cloud landscape, these problems have all been solved with a variety of different tools. Within each cloud, within each environment, there are different approaches to this.
If we're talking about provisioning, I might use CloudFormation in Amazon, I could use ARM in Azure, I could use Deployment Manager in Google; each of these provides a way of doing an as-code provisioning process. But if I'm operating in a multi-cloud, multi-platform world, I don't want to have to be exposed to the full complexity of this entire matrix. I don't want five different tools for the same workflow, and I think that's really the ethos behind the HashiCorp portfolio.
Our view is that these problems exist across all these environments, across all these platforms, but how do we have a common workflow, a common way of doing it, whether it's provisioning, whether it's identity-based security, whether it's networking, whether it's the way we deploy our application? That's how we think about the tools in relation to each other and the clouds themselves: To provide that common way to operate in this environment.
» The common thread: Networking
I think one of the interesting things when we talk about this mixed platform, this mixed mode, is that in some sense this is not new. If we go back as far as you want to go, to mainframes, and then scrub forward from there, you'll find organizations that still use all of these technologies. Go to any large bank, and at the center sits a mainframe, around that is the bare metal, around that is the VMs, around that is the Kubernetes environment, and around that will sit the serverless stuff; it's like rings in an old tree.
When you talk about what's made this possible, how we've been able to mix and match 40 years of different technologies, it's ultimately about the fact that there's a common network. The new serverless application can communicate with the mainframe because they both speak TCP; they both get an IP address, they both speak a common protocol, and so in this way the network has been the common denominator. It's been the piece that's allowed all these technologies to interoperate, and it doesn't really matter if one app is in a container and one app is in a VM, as long as each has a network IP and speaks the right protocol.
I think there's always been three key challenges that we've pushed into our network and thought about as a networking problem: one of those is naming, one is authorization, and one is routing, so we'll touch on each of these quickly.
When we talk about naming, it's really about, "What is the address, what is the logical name, that my downstream uses to talk to my upstream?" In this case, my downstream web server wants to talk to my upstream authentication service. What's the name that the web server uses to reach it? What is hard-coded in the URL string? I think historically the approach was a static IP. In this case, the .51 IP would be the name that the web server is using to reach the authentication service.
What's common is you end up using something like a load balancer or a virtual IP to do that sort of disintermediation, so .51 isn't actually an instance of the authentication service, but it's taking us to a load balancer. The load balancer is then acting as the name, and it's connecting back to one instance of the authentication service. This is a common problem.
Another common challenge is doing authorization. The way we typically see this solved is putting something like a firewall into the network, in the middle of the path. So, if I want to have authorization around who's allowed to talk to this service, I put a firewall in front of it and then manage a set of rules, so I can say, ".51 is allowed to reach .104," and the firewall now is acting as an authorization function.
The final piece is routing, and routing is a big world. There might be many different reasons you're doing routing. You could be doing it for load balancing; maybe I've got a lot of traffic, and I need to run 10 instances, and I want traffic to split across them. I could do it for high availability; maybe I have two instances of my database, my primary and secondary, and I don't want to overwhelm one of them, or I'm doing it as a failover mechanism.
The other use case might be more sophisticated things like canarying of traffic; in this case, maybe I have two versions of my app, version 1 and version 2, and I want to send some percent of traffic, maybe 1% to version 2, to understand, "Does version 2 work? Are there bugs? Is it performing the way I expect?" and then if that looks good, I can crank up the amount of traffic that goes to version 2. So, all of this I've loosely put into the same category of routing.
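The canary idea described here can be sketched as a weighted random choice between versions. This is only an illustration under stated assumptions: the version names, the 99/1 split, and the `pick_version` helper are hypothetical and are not taken from Consul or any load balancer.

```python
import random
from collections import Counter


def pick_version(weights, rng):
    """Pick a version name with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


# Hypothetical split: 99% of requests go to version 1, 1% to the canary.
weights = {"v1": 99, "v2": 1}

# Seeded generator so repeated runs draw the same sample.
rng = random.Random(0)

# Simulate 10,000 requests and count where each one was sent.
sample = Counter(pick_version(weights, rng) for _ in range(10_000))
```

Cranking up the canary is then just a change to the weights, for example `{"v1": 90, "v2": 10}`, with no change to the calling application.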
Here's where you would again see a pattern like putting a load balancer in the middle. What this allows us to do is not burden the downstream with the logic of this sort of routing. I don't want my web server to have to know about traffic shifting to its upstream, or dealing with multiple versions, etc. We shield it, let it talk to a load balancer, and move that logic out of the application. This is common.
If we look at a traditional data center, you'd see a very clean north-south path. Internet traffic comes in from the north, it flows through our perimeter firewall; this is how we make sure the untrusted bad things stay out, and the trusted good things are on the inside of our four walls. From there, we probably go to a frontend load balancer, which brings us out to our web servers, they do some processing, and ultimately are reading and writing data from a backend database. The backend database is probably shielded as well with a firewall that prevents services that shouldn't reach it from talking to it. This is the traditional data center, and here we probably have to manually manage things like our firewalls and load balancers.
Now, as we extend that and go to multi-data center, you might say, "Okay, now I have a cloud region that I'm going to add onto this," that needs to connect back to on-premises with some form of network technology—whether it's VPN, SD-WAN, Dark Fiber, etc., and there's some networking link back between these zones—and I might adopt a newer platform. This goes back to the multi-platform theme we see—one might be my more legacy environment, one is where I'm adopting Kubernetes for my new microservices, and so there's going to be some new databases that exist in the cloud. My new services might use a key-value store, I might use a graph store, etc., but very likely they need to communicate back to the existing on-premises environment. I need to reach back to old data, old services, and so I need to manage names in a way that is portable between environments, I need authorization in a way that's portable, routing in a way that's portable, and this problem only gets worse as we get bigger.
As we get bigger, we start adding more data centers, with an explosion of network links in between them; it's not just one VPN tunnel now. You end up with 8 different tunnels or 16 different tunnels as you start scaling, and even more platforms, so the complexity grows. It's in this environment that we really designed Consul. Our goal, when we talk about Consul, is really about solving the service networking challenge. These are the key challenges.
» Networking maturity model
Within that realm, there are two key ways to think about Consul: one is as a traditional service registry and discovery mechanism, and one is as a service mesh; and when we talk about this, we don't think about them as being this binary distinction; it's not one or the other. I think it's part of a logical maturity journey of the network.
I think if you look at most networks, where most users start is centralized hardware, running on-premises, with a manual process around it. I file a ticket, and someone logs in and updates my firewall or updates my load balancer manually. From there, I think the service discovery journey is really about automation of that process; so I can keep my centralized hardware, but now I move away from manually managing the load balancer, manually managing the firewall, to allowing it to be automatically updated based on logical rules. From there, we say, "Great, I want to adopt a public cloud environment. I'm not going to ship my hardware device up to Azure."
So, how do I change my approach? I move from a hardware-based approach to a central software approach. Think about running NGINX as your frontend load balancer instead of F5 BIG-IP. It's fundamentally the same sort of logical model; it's still a centralized appliance, but we're moving from hardware to software. How do we still do it in an automated way?
I think the final evolution of that is moving away from a central appliance and pushing that out to the endpoints of our network, and doing it with a distributed software approach, and that's what becomes a service mesh. In this way, we don't think about it as a binary one or the other; it's part of a journey. In fact, you're probably going to have a mix of all of these things in your network at the same time.
» Modern naming
When we talk about service discovery at its core, it's really about using a name as a name instead of an IP as a name. As a web server, instead of using the .52 address or the .104 address, we would use a logical name like auth.service.consul, and then that's getting resolved dynamically into a physical address. In this way, we can scale up and down, we can have different IPs; if that instance fails, we can resolve it to a different node, and so it's really about moving the naming function to something that's dynamic.
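As a rough illustration of resolving a logical name to whichever instance is currently healthy, here is a toy in-memory registry. It is a sketch only: the `Registry` class, its methods, and the 10.0.0.x addresses are hypothetical stand-ins and do not reflect Consul's actual API or DNS implementation.

```python
import random


class Registry:
    """Toy in-memory service registry (hypothetical, not Consul's API)."""

    def __init__(self):
        # service name -> {address: is_healthy}
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, {})[address] = True

    def mark_failed(self, service, address):
        self._instances[service][address] = False

    def resolve(self, name, rng=random):
        """Resolve '<service>.service.consul' to one healthy address."""
        service = name.split(".")[0]
        healthy = [a for a, ok in self._instances.get(service, {}).items() if ok]
        if not healthy:
            raise LookupError(f"no healthy instance for {name}")
        return rng.choice(healthy)


registry = Registry()
registry.register("auth", "10.0.0.104")
registry.register("auth", "10.0.0.105")

# One instance fails; the caller keeps using the same logical name.
registry.mark_failed("auth", "10.0.0.104")
address = registry.resolve("auth.service.consul")
```

The web server only ever holds the string `auth.service.consul`; which address it maps to can change between calls.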
» Modern authorization
When we talk about the network automation piece, instead of managing those static firewall rules that said .51 can talk to .104, we instead manage a logical rule. We say the web service is allowed to talk to the authentication service. The set of IPs, though, are dynamic, so we don't know which set of IPs are web services because maybe we're scaling those up and down, and same thing with our auth service.
What we do is allow Consul to act as that dynamic registry and know what set of IPs are web, what set of IPs are auth, multiply those together, and that becomes your firewall rule. If I boot a new web service, I scale it up and down, I have it in an autoscaling group, etc., now the firewall can be updated on demand. I don't need to file a ticket and have someone manually add the new IP rule; we automate the flow in the middle of the network.
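The "multiply those together" step can be read as a Cartesian product of the source and destination address sets. The sketch below assumes a plain dictionary as the registry and made-up 10.0.0.x addresses; `expand_rules` is a hypothetical helper, not a Consul function.

```python
from itertools import product


def expand_rules(logical_rules, registry):
    """Expand (source, destination) service pairs into (src_ip, dst_ip) rules."""
    rules = set()
    for source, destination in logical_rules:
        src_ips = registry.get(source, ())
        dst_ips = registry.get(destination, ())
        rules.update(product(src_ips, dst_ips))
    return rules


# Hypothetical registry contents: which IPs currently back each service.
registry = {
    "web": {"10.0.0.51", "10.0.0.52"},
    "auth": {"10.0.0.104"},
}

# The one logical rule we manage: web may talk to auth.
logical_rules = [("web", "auth")]

rules = expand_rules(logical_rules, registry)

# Scaling up web adds an IP; re-expanding yields the new firewall rule
# without anyone editing IP rules by hand.
registry["web"].add("10.0.0.53")
rules_after_scale_up = expand_rules(logical_rules, registry)
```

The logical rule stays the same across the scale-up; only the derived IP pairs change.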
When we get to the final version of this, what we're really doing is pushing that functionality to the edge of the network. What we might have is something like Envoy running within our pod, within our VM, as a sidecar to the application itself. So what happens now is the web server, when it wants to discover and connect with its upstream, is really talking to the local Envoy instance, so the name is always local. The local Envoy is doing the routing and it's determining, "Where's my upstream? How do I get to it? What percent of traffic should go where?" and then the upstream Envoy is responsible for authorization. When the connection comes in, it determines whether or not that client is allowed to talk to it. In the same way, that network authorization function still exists in the network; it's just been pushed to the edge.
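The division of labor between the two sidecars can be sketched with a toy model: the downstream proxy dials out on the application's behalf, and the upstream proxy decides whether to accept based on the caller's service identity. The `Sidecar` class and the service names are hypothetical; this is not Envoy's or Consul's API, and the sketch simply trusts the caller's claimed name rather than proving it.

```python
from dataclasses import dataclass, field


@dataclass
class Sidecar:
    """Toy stand-in for a sidecar proxy (hypothetical, for illustration)."""

    service: str
    allowed_sources: set = field(default_factory=set)

    def accept(self, source_service):
        """Upstream side: authorize by the caller's service identity."""
        return source_service in self.allowed_sources

    def connect(self, upstream):
        """Downstream side: the app talks to its local sidecar, which dials out."""
        if not upstream.accept(self.service):
            raise PermissionError(f"{self.service} -> {upstream.service} denied")
        return f"{self.service} -> {upstream.service} allowed"


web = Sidecar("web")
billing = Sidecar("billing")
auth = Sidecar("auth", allowed_sources={"web"})

# web is allowed to reach auth; billing.connect(auth) would raise PermissionError.
result = web.connect(auth)
```

Note that no IP address appears in the authorization decision; the rule is expressed entirely in terms of service names.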
» The goals of HashiCorp Consul
Summarizing at a very high level, I think the goals for Consul are threefold: we want this to be pluggable with your deployment process, your CI/CD process, the way your networks should be automated, but we also want to acknowledge that the world is going to be multi-platform and multi-cloud. It's very unlikely that you'll ever be able to standardize on one technology that's forever future-proof. The other side of this is really thinking about, "What's the orientation of the network?" I think if you ask some vendors, the network is the center of the universe and everything else revolves around that; our view is the opposite: the application is the center of the universe, and everything else revolves around the application.
How do we get the network really to be in some sense a detail? What we want to do is manage the high-level intentions of, "Great, the traffic should be allowed," or "We should do a failover between data centers," and then the network should just be automated to support that. As apps come and go, get scaled up and down, that becomes a detail that gets automated away; it's not the primary focus.
With that, I'd like to introduce Mitchell Hashimoto, Co-Founder, to the stage to talk more about the Consul updates we have for you.
Watch the full keynote for more announcements and case studies from HashiConf EU 2019.