The Cloud Operating Model: DevOps, Security, and Networking Challenges & Solutions
Jun 05, 2019
In this video, HashiCorp CEO Dave McJannet shows how creating a central set of shared services that provide automation around operations, security, networking, deployment, and policy governance enables companies of any size to compete against agile competitors.
We talk to organizations of all sizes about their infrastructure plans and specifically how they're adopting this cloud operating model for infrastructure as they navigate this transition to building new applications to differentiate their business.
I'm going to spend a couple of minutes here describing how our products address some of the core challenges around ops, security, and networking in particular—and hopefully it will leave you with an idea for how to address the challenge.
I think we can all agree we're going through this generational shift in the way that people run infrastructure. I often find it useful to simplify, at the highest level, what's happening. I would draw it like this: We're going from a world where we have a dedicated data center, where we all have servers that we run, to this new world which has the notion of cloud. It started off as: okay, let me create a pool of infrastructure on premises, then let me add Amazon to that, then that became Azure. And now we have this multi-cloud reality that we all appreciate to be real.
» Where we'll use cloud: Net new systems of engagement
Now the catalyst for that is pretty simple. Why are we all moving to cloud infrastructure? Well, it's about building net new systems of engagement. Geoffrey Moore uses the terms: systems of record and systems of engagement. Systems of record are those core systems you have in any company. Systems of engagement are the net new applications you're building that leverage the data in your core systems. By and large, your systems of record are going to be on premises.
The catalyst for the shift in infrastructure is largely around the creation of net new systems of engagement that are going to be on cloud. It is therefore inevitable that you're going to be multi-cloud because you're going to have some of your infrastructure on premises and also you're going to have some on Amazon, some on Azure. But even if you're just going to run on Amazon, you're also going to have some data on prem.
The implications of that are two-fold. Number one is on the tooling you're going to use to run that infrastructure—which has to contemplate an entirely new reality. The second one is on the process—you're going from a very traditional world of tooling based on this static data center, which we would generally refer to as being static infrastructure—to this new world of dynamic infrastructure where the properties are very different.
But as it relates to the people side of the process, you're going from a ticket-based model (I need infrastructure, open a ticket, you stand it up) to a self-service model. Those two things, both the changing nature of the infrastructure itself and the changing requirement for self service so I can build net new systems of engagement quickly, are causing an entire re-platforming of how people run infrastructure.
» The 4 layers of infrastructure
I would then decompose it into the layers of infrastructure.
There's the core infrastructure. Think of that as how I provision compute capacity. I'm going from a world where I have dedicated infrastructure—that’s my server, that's your server—to this world here, our dynamic world, which is on demand. I provision compute capacity when I need it.
At the security layer, pretty significantly, a different world. I had the ability to say everything within this IP range here I can trust. So, basically, it's high-trust—and I use IP as the basis of who can talk to whom. If I have an application here I can talk to a database in here. As long as it's in that data center I will let that happen.
I then move to a world of low-trust because I no longer control the network here. Therefore the big shift is to the use of identity as the basis of security—let me authenticate your identity before I give you access to something. I think this concept is relatively well understood in the cloud world. As a result we are all seeing the idea of identity and security being a pretty common thesis.
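To make the contrast concrete, here is a minimal Python sketch of the two checks. The network range, the identity name, and both function names are assumptions made up for illustration; in practice the identity would be verified by a trusted identity source before a check like this runs.

```python
import ipaddress
from typing import Optional

# Hypothetical values, for illustration only.
TRUSTED_NETWORK = ipaddress.ip_network("10.0.0.0/16")
ALLOWED_IDENTITIES = {"billing-app"}


def allow_by_ip(source_ip: str) -> bool:
    """High-trust model: anything inside the data center's IP range is allowed."""
    return ipaddress.ip_address(source_ip) in TRUSTED_NETWORK


def allow_by_identity(verified_identity: Optional[str]) -> bool:
    """Low-trust model: the caller needs an authenticated identity that is
    explicitly allowed; where the request came from does not matter."""
    return verified_identity is not None and verified_identity in ALLOWED_IDENTITIES
```

In the first function the network location is the credential. In the second, a request from any network is refused unless it carries an identity that has been authenticated.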
At the networking layer, I go from what we would refer to as host based networking—this machine has a physical location therefore I can assign an IP address to it because it has a physical host—to the world of service based networking. I may have an application artifact here, I may have a database over here. They are going to move around.
Let me not try and pretend that I can control the IP address of them, and instead move from the world of IP-based, host-based networking to service-based networking in this new world.
So, conceptually, a very different world. You're going from my developers dropping applications onto dedicated infrastructure here to dropping them onto a fleet of infrastructure here.
» Solutions for ops, security, networking, and development in a multi-cloud environment
So, these four changes are pretty profound as it relates to how people operate infrastructure—and it's causing the market at large to re-shift the way they think about infrastructure entirely.
Our whole thesis is predicated on this shift—let me decompose the problem in the way that I've done here. In this new world of the dynamic infrastructure model there are ops people, there are security people, there are networking people, and then, of course, there are developers—and they are the ones that are building these net new systems of engagement.
All four of these participants have to figure out how to run infrastructure in this new model. Let me break it down into those roles.
The ops people are having to figure out how to enable people to provision infrastructure on demand across a wide variety of infrastructure types. Terraform is our platform there. Terraform is ubiquitous, in that it is used as a common provisioning mechanism across many infrastructure types, whether that's on premises, Amazon, or Azure. I don't know exactly what percentage of the cloud infrastructure provisioned every day goes through Terraform, but it's a significant percentage. Very simply, the way Terraform works is like this: It has the idea of a Terraform core, which is just a command line tool. And then there's a provider for every environment you want to interface with. So, on Amazon there might be 500 different services that you can configure. On Azure, there may be 300 services you configure. On Google, there may be 200 because it's less mature. On vSphere there may be 200. You get the point.
Terraform allows you to have a common operational experience to configure all the unique services on top of any of these cloud platforms. There are two categories of providers: There are the core infrastructure providers and then there's a provider for every other ISV you want to configure. Whether I want to configure F5 as part of the provisioning experience, I want to provision Kubernetes, I want to provision Palo Alto Networks; there are no less than 200+ third party providers that you can use.
What I can now do is codify the provisioning of maybe the four Amazon services, plus the six dependent services in a single Terraform template. This allows the ops people to say here is your approved template of not just the Amazon components that I'm going to allow you to provision, but also the configuration of all the dependencies on top of it.
This is how people, by and large, are addressing the challenge of: how do I enable this provisioning on demand model across multiple infrastructure types. We're not creating a lowest common denominator of services, we’re creating a common workflow across all these different infrastructure types.
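The split between a core and its providers, and the idea of one approved template that spans a cloud and its dependent services, can be sketched roughly as follows. This is Python for illustration, not Terraform's configuration language or its plugin interface; `Provider`, `FakeProvider`, `Core`, and the template fields are all hypothetical names.

```python
from typing import Dict, List, Protocol


class Provider(Protocol):
    """Hypothetical provider interface: one implementation per environment."""

    def create(self, resource_type: str, name: str) -> str: ...


class FakeProvider:
    """Stand-in provider that just reports what it was asked to create."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def create(self, resource_type: str, name: str) -> str:
        return f"{self.prefix}:{resource_type}/{name}"


class Core:
    """The core owns the workflow; providers own environment-specific details."""

    def __init__(self) -> None:
        self.providers: Dict[str, Provider] = {}

    def register(self, name: str, provider: Provider) -> None:
        self.providers[name] = provider

    def apply(self, template: List[dict]) -> List[str]:
        # The same loop runs no matter which environment a resource targets.
        return [
            self.providers[r["provider"]].create(r["type"], r["name"])
            for r in template
        ]


core = Core()
core.register("aws", FakeProvider("aws"))
core.register("f5", FakeProvider("f5"))

# One template covering a cloud resource plus a dependent third-party service.
created = core.apply([
    {"provider": "aws", "type": "instance", "name": "web"},
    {"provider": "f5", "type": "pool", "name": "web-pool"},
])
```

The workflow in `apply` is identical for every provider, but each provider still exposes its own resource types, so nothing is reduced to a lowest common denominator.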
At the security layer we have a product called Vault. Vault allows you to leverage any source of identity to enforce access to systems or applications. Very simply: in the traditional world, I have a client. Maybe it's an application that needs to make a request of a backend system, maybe a database. Historically, if you give me the correct credentials, I give you the information.
In the high-trust world that we're accustomed to that's a valid assumption and a valid model. In the low-trust model of cloud that's probably not the recommended way. You want to be able to authenticate the identity against some trusted source of identity that you have before giving access to that application or that backend system. Vault inserts itself in the middle and says—let me authenticate against whichever identity model you care about.
And you're going to have multiple because it turns out all of the cloud providers use identity as the basis for how they do security. Amazon has something called the Amazon IAM model. Azure has something called Azure Active Directory. Google has the IAM model. You might use Active Directory or LDAP on premises. You may have an application running on Amazon that needs to connect to a database that's running on premises. You're now having to use identity as the basis for that connectivity but they have different identity models.
Vault lets you authenticate against any identity model that you may care about: Active Directory, the IAM models of the clouds, maybe Okta if that's what you use for some of your pieces, maybe OAuth from GitHub. It doesn't matter, Vault lets you authenticate against any of them. You change the routing so that the client makes its request to Vault, and Vault authenticates it against the identity model of the backend system. Once that's approved, the credential is created for you and what's given back to the end user is a token.
So you have a way to centralize secrets management across any application type, and the policy for that is set by the security team. Vault now allows you to use this notion of identity to enforce access to any system or application in your fleet regardless of the identity model being used. The idea of creating a central secrets management platform is extraordinarily compelling in this new world.
There's a secondary, derivative use case once people are using Vault. The first one is around centralizing secrets. The next one ends up being around encryption. Because every application has to come into Vault to get a token to authenticate before it's given access to a backend system, I can now enforce a policy that says: now encrypt everything in that flow.
There's no change to the application. It's just a setting that the security team is applying to enable encryption of anything in the cloud fleet. If you can centralize secrets and credentials management and you can encrypt all data at rest and in flight in this cloud world, you've gone 99% of the way to addressing the security challenge.
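As a rough sketch of that flow, assuming nothing about Vault's real API: a broker holds several pluggable identity backends, the client logs in through whichever one applies, and gets a token back. `SecretsBroker`, `fake_cloud_iam`, `fake_directory`, and the payload fields are hypothetical names for illustration.

```python
import secrets
from typing import Callable, Dict, Optional

# A backend takes a login payload and returns a verified identity, or None.
AuthBackend = Callable[[dict], Optional[str]]


def fake_cloud_iam(payload: dict) -> Optional[str]:
    # Hypothetical check; a real backend would verify a signed identity document.
    return payload.get("role") if payload.get("signature") == "valid" else None


def fake_directory(payload: dict) -> Optional[str]:
    # Hypothetical directory containing a single user.
    users = {"alice": "s3cret"}
    user = payload.get("username")
    if user in users and users[user] == payload.get("password"):
        return user
    return None


class SecretsBroker:
    """Sits between client and backend system; hands out tokens after auth."""

    def __init__(self) -> None:
        self.backends: Dict[str, AuthBackend] = {}
        self.tokens: Dict[str, str] = {}  # token -> verified identity

    def enable(self, method: str, backend: AuthBackend) -> None:
        self.backends[method] = backend

    def login(self, method: str, payload: dict) -> Optional[str]:
        backend = self.backends.get(method)
        identity = backend(payload) if backend else None
        if identity is None:
            return None  # unknown method or failed authentication
        token = secrets.token_urlsafe(16)
        self.tokens[token] = identity
        return token

    def whoami(self, token: str) -> Optional[str]:
        return self.tokens.get(token)


broker = SecretsBroker()
broker.enable("iam", fake_cloud_iam)
broker.enable("directory", fake_directory)
token = broker.login("directory", {"username": "alice", "password": "s3cret"})
```

The client never needs to know which identity model sits behind the broker; adding another one is a single `enable` call.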
The third element of our portfolio is something called Consul. Consul, believe it or not, is our most widely used product. Consul allows you to address the challenge of service based networking. So, in the traditional world, when I dropped my application artifact into my environment, I would've said—that’s a database, this is an app server. You're saying assign an IP address to it.
In order to establish a static IP address I would normally need to put a load balancer in front of every one of these things that are deployed, and then you end up with hundreds, if not thousands, of load balancers to create that mechanism. Then I assign 184.108.40.206 and 220.127.116.11.
What Consul does—rather than requiring you to do that—is create a common service registry. Then, when an application or an artifact or an element gets deployed, it has a little client library with it that gets discovered. Now Consul knows where everything is and it's doing it, not by IP address, but by name. So that's app server one. That's database one.
Consul has now created a common service registry as to where everything is. Now, I can interrogate Consul to be able to route traffic. Rather than having to have people install expensive load balancers—which are expensive financially but also more expensive in terms of application traffic requirements—I can route traffic directly between the services rather than going north-south for every application request.
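The registry idea is simple enough to sketch in a few lines of Python. This is an in-memory toy, not Consul's API; the class name, the service name, and the addresses are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]


class ServiceRegistry:
    """Maps service names to wherever their instances currently live."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Address]] = {}

    def register(self, service: str, host: str, port: int) -> None:
        self._instances.setdefault(service, []).append((host, port))

    def deregister(self, service: str, host: str, port: int) -> None:
        instances = self._instances.get(service, [])
        if (host, port) in instances:
            instances.remove((host, port))

    def lookup(self, service: str) -> List[Address]:
        # Return a copy so callers cannot mutate the registry's state.
        return list(self._instances.get(service, []))


registry = ServiceRegistry()
registry.register("database-one", "10.0.1.5", 5432)
before = registry.lookup("database-one")

# The database moves; callers keep asking for it by name.
registry.deregister("database-one", "10.0.1.5", 5432)
registry.register("database-one", "10.0.9.20", 5432)
after = registry.lookup("database-one")
```

The caller's code is the same before and after the move, which is what makes it possible to route directly between services without pinning a static address in front of each one.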
It doesn't eliminate the need for load balancers completely but it certainly reduces the number of load balancers that are required. Consul is—number one—a common service registry, but then the derivative use cases of that are now I can use that for routing. Number two, I can now use that for service segmentation.
This is where people start to talk about the idea of a service mesh. The service mesh lets me declare who can talk to whom and, when they talk, enforce encryption between the services. Very conceptually similar to the way that Vault thinks about changing security. It doesn't try and mimic what was done in the old world, but it says: what are my key problems? Let me try and solve those.
Consul, in the networking world, lets me think about the key problems I have, which is—where is everything? Then—number two—it lets me start routing traffic based on the location of everything and lets me now enforce mutual TLS across all the services that I'm going to have.
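Declaring who can talk to whom comes down to a set of allowed name pairs. The sketch below is Python for illustration and assumes a default-deny stance, which is my assumption and not something stated in the talk; `Segmentation` and the service names are hypothetical, and the mutual TLS part is not shown.

```python
from typing import Set, Tuple


class Segmentation:
    """Allow rules keyed by service name, not by IP address."""

    def __init__(self) -> None:
        self._allowed: Set[Tuple[str, str]] = set()

    def allow(self, source: str, destination: str) -> None:
        self._allowed.add((source, destination))

    def can_connect(self, source: str, destination: str) -> bool:
        # Assumed default: anything not explicitly allowed is denied.
        return (source, destination) in self._allowed


rules = Segmentation()
rules.allow("app-server-one", "database-one")
```

Because the rules refer to names, they stay valid when the services behind those names move to different hosts. Note that the rule is directional: allowing the app server to call the database does not allow the reverse.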
These end up being the three core elements that enable all the different kinds of applications you have in your environment. You have some Java apps, some C# apps, some Cloud Foundry apps, some Kubernetes apps, some container based applications. The problem here is one of scheduling.
Our fourth product is called Nomad. Nomad is an application scheduler that says: whether it's a C# application or a VM, or whether it is a Java application or a Cloud Foundry based application, let me schedule that application across the fleet. So that, if you're going to spin up all this infrastructure, at least the bin packing is being done in a way that best schedules the work across the fleet that you're now paying for.
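To give a feel for what packing work onto a fleet means, here is a small Python sketch using first-fit decreasing, a textbook bin-packing heuristic. It is not a description of Nomad's actual scheduling algorithm, and the job names, sizes, and single-number capacity are assumptions for illustration.

```python
from typing import Dict, List


def first_fit_decreasing(jobs: Dict[str, int], node_capacity: int) -> List[List[str]]:
    """Place jobs (name -> resource units) onto as few equal-sized nodes as
    the heuristic manages: biggest jobs first, each into the first node that fits."""
    nodes: List[List[str]] = []  # job names placed on each node
    free: List[int] = []         # remaining capacity of each node

    for name, size in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        if size > node_capacity:
            raise ValueError(f"{name} does not fit on any node")
        for i, remaining in enumerate(free):
            if size <= remaining:
                nodes[i].append(name)
                free[i] -= size
                break
        else:
            # No existing node had room, so start using another one.
            nodes.append([name])
            free.append(node_capacity - size)
    return nodes


placement = first_fit_decreasing(
    {"java-api": 6, "csharp-svc": 5, "batch": 4, "cache": 3, "cron": 2},
    node_capacity=10,
)
```

With these numbers the five jobs land on two fully used nodes, where one job per machine would have needed five.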
» Speed and compliance
These are—essentially—the core concepts. Number one is the shift from static to dynamic infrastructure, and a double click on what that means at each layer. Number two—this is how our portfolio works. These are individual products that meet the needs of each of these four elements I've described.
Now, the third thing—I think the most common thing—is how do I solve this challenge of self-service as I'm going cloud? And what most of our customers do, they start drawing the picture and say I have this application element, or this application artifact, and I need to figure out how to get it to here: That's cloud number one, that's the cloud number two, that's maybe cloud number three.
What they have to do is say: rather than deploying the application artifact itself, I need to satisfy the needs of the ops, security and networking teams so that when that application gets to the cloud environment it's safe, it's going to see traffic, and it's not super expensive. I have to solve the needs of the ops, security and networking people. To do that, most of our customers end up creating, essentially, a central shared service of Terraform Enterprise, Vault Enterprise, and Consul Enterprise.
Terraform allows them to create this set of templates that, as long as someone uses, they have essentially got my implicit authorization for that application to go to cloud. Now you can provision it a hundred times a day if you want to because I know that it's got all the application elements that I, as an ops person, require.
Number two, if every artifact comes to Vault to get its credentials then the security team is going to say great—now I'm not worried about your hard-coding credentials in your applications and you've also given me a control point for all encryption that I might want to do in the future. Then I'm gonna let you go.
Number three, if you include Consul client with the artifact that's deployed, that Consul server will discover it and I can start routing traffic immediately. So those three elements become the consistent pieces, whether it's a Java app, whether it's a C# app, or a Cloud Foundry, or whatever it might be.
To solve the needs of security and policy, the commercial versions add a common way to do policy. For example, in Terraform, nobody can provision after 5:00 PM on a Friday. If it's not after 5:00 PM on a Friday, then yes, let it go. If it is, don't let it go. They also address the idea of governance. These are highly regulated companies for the most part, hence the need for audit trails. Who did what? How did it go? Who was allowed access to be able to create things like ACLs?
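The Friday example can be written down as a small check with an audit record next to it. This is only the logic, in Python for illustration; it is not the policy language the commercial products use, and the function names and the shape of the audit record are assumptions.

```python
from datetime import datetime
from typing import List

audit_log: List[dict] = []


def provisioning_allowed(now: datetime) -> bool:
    """Deny on Fridays from 17:00 onward; allow at any other time."""
    is_friday = now.weekday() == 4  # Monday is 0, so Friday is 4
    return not (is_friday and now.hour >= 17)


def request_provision(user: str, template: str, now: datetime) -> bool:
    """Apply the policy and record who asked for what, and the outcome."""
    allowed = provisioning_allowed(now)
    audit_log.append({
        "user": user,
        "template": template,
        "time": now.isoformat(),
        "allowed": allowed,
    })
    return allowed


# 7 June 2019 was a Friday.
ok = request_provision("dev-1", "approved-web-stack", datetime(2019, 6, 7, 16, 59))
blocked = request_provision("dev-2", "approved-web-stack", datetime(2019, 6, 7, 17, 30))
```

Every request is logged whether or not it passes, which is what lets someone answer "who did what?" afterwards.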
If you overlay a central set of shared services with our commercial versions that also address the needs of policy and governance, you've created the industrial process for delivering new applications quickly on the cloud infrastructure—that both acknowledges how different the realities of this cloud world are and creates the ability for self service. By creating a central shared service within their organizations, Global 2000 entities can deliver applications the same way that the smaller startup community does.
» Learn more
I hope this gives you a sense for how we think about the evolution in the world of infrastructure, and, specifically, how the HashiCorp products enable this cloud operating model that frankly underpins many of the applications we use all day, every day.
We have a white paper that you can take a look at called [The Cloud Operating Model](https://www.hashicorp.com/cloud-operating-model) on our website. I encourage you to download that. Or, if we can help more directly, we're certainly available and eager to help whenever you're ready.
To start learning how to use HashiCorp tools, visit our HashiCorp Learn website.