Security in a world of multiple clouds
In this brief video, Armon Dadgar, a founder and co-CTO of HashiCorp, explains how to secure your infrastructure in a dynamic, multi-cloud world. The key: Tools that adopt the zero-trust network philosophy.
In the traditional data center, primary security came from securing the perimeter. We typically used security middleware, such as IP firewalls, web application firewalls (WAF), and intrusion detection systems (IDS). The idea was to create a zone of trust inside the data center.
A typical refinement to this model was to subdivide the high-trust zone, using virtual LANs or software-defined networking—often segmenting it by line-of-business.
But when we move some of our workloads to a public cloud, things become more challenging. This becomes even more challenging when you introduce multiple cloud services or cloud zones.
In modern environments, it's better to assume that there is no perimeter—or at least it's better to not depend on one. This is also a good way to counter insider threats.
In this new world of zero-trust networks, Armon suggests three important aspects for success:
- Good secrets management: don't hard-code service passwords
- Segmentation: don't segment by network, but do it by service
- Data protection: encrypt data at rest
In his brief whiteboard video, Armon describes how to achieve all three.
Armon Dadgar, Founder & Co-CTO, HashiCorp
Hi, my name is Armon Dadgar, and today I wanted to spend some time talking about multi-cloud security. I think what's helpful is first talking about what security looked like in the traditional private data center. And oftentimes, what we see is a perimeter-centric approach.
So we start by saying we have these four walls, and we're going to constrain all traffic over a moat. And over this moat we're going to deploy a bunch of different types of middleware to secure it.
So we're going to employ things like our firewalls. We're going to deploy things like WAF. We'll deploy our IDS. So we have a host of different security solutions that we're deploying over the central ingress and egress point. And what we're really trying to do is establish a difference between the outside, which is a low-trust zone, and the inside, which is a high-trust zone.
[01:00] And we're asserting that anything that makes it through all of these appliances, all of our policies and checkpoints that we have imposed here, is trusted and should be on the inside. And so we make this clean distinction, and now we can say anything that's happening in here is privileged and trusted.
Now, if we get to be an even larger organization, we might want to reduce our blast radius. And so what we'll often see is a segmentation at a relatively coarse grain, where we might use something like VLANs or even software-defined networks to split this one network into multiple pieces—each of which maybe has dozens or hundreds of services corresponding to, let's say, a line of business.
But the challenge comes when we start to go to a multi-cloud architecture, because we start transitioning from a world that looks like this to now bringing in multiple different landing pads.
So we might have, for example, a region in AWS and a region in Azure, and then we're connecting all of these things together with VPNs or direct peering relationships.
[02:00] Now, a few challenges start to come up right away:
The first one is we start really stress-testing our assumption of having a perimeter because these environments don't have four walls. We are one API away from any node in these environments being able to receive or send traffic out to the Internet. So all of a sudden our notion of this fixed network perimeter is really, really being challenged by these different cloud environments.
Additionally, when we really look at these environments and are honest with ourselves, very rarely was there only a single ingress/egress point. There were the shadow points (VPNs, connections back to the corporate fabric, things like that) that allow traffic to flow around the central ingress/egress point.
One of the things that immediately starts to become relatively challenging is this idea that we actually have a perimeter! Instead we have to start to look at what would it look like to not have a perimeter, or not necessarily depend on our perimeter as being our primary line of defense.
» Insider threats
The other thing that's really shifting is an acknowledgement that most of our breaches happen from the inside. There's two really interesting illustrative examples.
[03:00] The first one—when we talk about the lack of a perimeter—a really interesting attack was a very publicized one where the customer production set of credit-card data was breached through an HVAC system. And on the one hand, you might ask yourself, how could the air conditioning of a store lead to the breach of central corporate credit card information?
Well, in many ways a retail store looks kind of like this. It's another site that's connected over a VPN into our backend data center, and it turns out there's many doors to our physical store. And the way the attackers broke in was a weak Wi-Fi password that allowed them to hop on the network. That retail store network was connected back, but the assumption was that we have this high-trust, high-integrity network.
[04:00] So once the attacker was on, they were able to get access to this database and exfiltrate data back out. This is an example of a type of attack that becomes possible because we're depending on the strength of this large perimeter, which is in fact permeable.
The other perfect example is looking at something like the Snowden documents. When we look at Snowden, he was a contractor of the NSA, so he had VPN credentials. He had an ability to log in and access these privileged systems. So this is not an attack from the outside of the organization. This is an insider threat.
» A question of trust
As we start to look at these types of challenges, what we see is an emerging philosophy of what we call zero-trust or low-trust networking, right? And so what it means is instead of saying being on our network implies some level of access, some level of trust, what we're really saying is this implies little to no access to systems. By virtue of being on my network, you can't actually access anything more than someone who's on the outside of the network can.
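A minimal sketch of that difference, in Python. The names here (`Request`, `perimeter_allows`, `zero_trust_allows`, the address range, and the grant table) are hypothetical and not from the talk. The perimeter check grants access from network location alone, while the zero-trust check ignores location and requires an authenticated identity plus an explicit grant.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical internal address range, used only for illustration.
INTERNAL_NET = ip_network("10.0.0.0/8")

# Hypothetical explicit grants: (authenticated identity, resource).
GRANTS = {("web", "orders-db")}


@dataclass(frozen=True)
class Request:
    source_ip: str
    identity: Optional[str]  # None means the caller did not authenticate
    resource: str


def perimeter_allows(req: Request) -> bool:
    """Perimeter model: being on the internal network implies access."""
    return ip_address(req.source_ip) in INTERNAL_NET


def zero_trust_allows(req: Request) -> bool:
    """Zero-trust model: network location grants nothing; default deny."""
    return req.identity is not None and (req.identity, req.resource) in GRANTS


# An attacker who got onto the internal network but has no identity.
intruder = Request("10.1.2.3", None, "orders-db")
# A legitimate service calling from another cloud, outside the old perimeter.
web = Request("203.0.113.7", "web", "orders-db")

print(perimeter_allows(intruder), zero_trust_allows(intruder))
print(perimeter_allows(web), zero_trust_allows(web))
```

In this sketch the two models disagree on both requests: the intruder passes the perimeter check but fails the zero-trust check, and the legitimate service does the opposite.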
[05:00] Once we start to take this approach—this philosophy to security—what starts to change? Well, I think there's a few really critical things.
» 1. Secrets management
I think the first one that becomes essential is secrets management. And so when we talk about secrets management, what we're really talking about is managing all of the credentials that give us access to privileged systems. So an example of this is maybe I'm running a web server over here, and my web server has an ability to talk to a database. That database has a username and password that is governing access to it. And historically what we would see is relatively sloppy handling of these credentials. They might be hard-coded in plain text in the application. They're in plain text in configuration files. They're in plain text in our version control systems. And the reason this was okay is because we said the database is inside this high-trust perimeter, so it's okay if these credentials are easily accessible, because you can't get to our database: you're not on our network.
[06:00] The challenge is once we take this assumption of saying, you know what, maybe you can get in our network. Maybe there is a way on our network, and we don't trust the perimeter as the ultimate line of defense, and maybe you're someone who is already on the network, right, maybe you're an insider. Should it be this easy to get access to our database password?
And what becomes clear is what we really should be doing is protecting these secrets much more carefully. We should encrypt them. We should have access controls around them. We should be auditing who has access to credentials, and it really should only be given out on a need to know basis.
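The access-control and auditing ideas above can be sketched in a few lines of Python. This is an illustrative toy, not Vault's API: `SecretStore`, `AccessDenied`, the secret path, and the identities are all hypothetical, and the encryption of stored values that a real secrets manager would perform is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


class AccessDenied(Exception):
    """Raised when an identity asks for a secret it has not been granted."""


@dataclass
class SecretStore:
    # secret path -> secret value (a real store would encrypt these at rest)
    _secrets: Dict[str, str] = field(default_factory=dict)
    # secret path -> identities allowed to read it (need-to-know)
    _acl: Dict[str, Set[str]] = field(default_factory=dict)
    # record of (identity, path, allowed) for every read attempt
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def put(self, path: str, value: str, readers: Set[str]) -> None:
        self._secrets[path] = value
        self._acl[path] = set(readers)

    def read(self, identity: str, path: str) -> str:
        allowed = identity in self._acl.get(path, set())
        # Log the attempt before deciding, so denied reads are audited too.
        self.audit_log.append((identity, path, allowed))
        if not allowed:
            raise AccessDenied(f"{identity} may not read {path}")
        return self._secrets[path]


store = SecretStore()
store.put("database/orders/password", "s3cr3t-example", readers={"web"})

# The web service fetches the credential at runtime instead of hard-coding it.
password = store.read("web", "database/orders/password")
```

A read by any identity outside the `readers` set raises `AccessDenied`, and both the successful and the denied attempt end up in `audit_log`.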
» 2. Segmentation
The other thing that becomes essential is the ability to segment traffic, or service segmentation. So when we talk about traditional segmentation, we're really going back to this picture of having relatively coarse-grained buckets, right?
We're using VLANs or SDNs to have these large line-of-business buckets that are dozens or hundreds of services. But now our challenge is—as we go to say AWS and Azure—we have different segmentation capabilities. We have Security Groups up here in Amazon. We have Virtual Networks down here in Azure.
[07:00] Our challenge is our VLAN is not interoperable with our virtual network, our virtual network is not interoperable with our security group, and our security group doesn't know anything about our VLAN. So now, how do we start to think about the segmentation of our network across these different environments? Because now I'm going to have an application over here that's calling into a database over here, and how am I enforcing that access of who is allowed to talk to who?
So what we'd like to be able to do is push this up a level and describe our rules at a more logical level. We can say our web server is allowed to talk to our database. Our API server is allowed to talk to our web server. And what we really want to focus on here is thinking not in terms of IPs.
We're not saying IP1 can talk to IP2 in a host mode. Instead, we're talking about identity. The service that's identified as a web server is allowed to talk to the service that's identified as a database. There's a few advantages of this.
[08:00] One is, this unit is scale-independent. It doesn't matter if we have one, ten, or a thousand web servers; it's the same rule. That's unlike a firewall rule, which is IP-based, where it does matter. Those rules are sort of exponential in the size of my infrastructure. The flip side is: not using IP as our unit of management frees us from thinking about how we segment out at an IP level.
It doesn't matter now if we flow over a VPN or a NAT that's rewriting the IP address. Whereas if we were doing an IP-based approach, any rewriting or translation of the IP breaks our security controls.
So how do we think in terms of service-level segmentation and not network-level segmentation? The other challenge with the network-level segmentation is we don't control our network anymore. We can't change the Amazon or Azure networks to support our notion of VLANs, so we have to work at a higher level, and bring in this notion of a service segmentation.
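To make the contrast concrete, here is a small Python sketch under assumed names (`ALLOWED`, `service_allowed`, `ip_rules_needed`, and the example addresses are all hypothetical). It shows two things from the discussion above: the number of IP-pair rules grows with the product of the two fleet sizes while the service-level rule stays a single entry, and an address rewrite breaks the IP-based check but not the identity-based one.

```python
from itertools import product

# Hypothetical service-level rules: (source service, destination service).
ALLOWED = {("web", "database"), ("api", "web")}


def service_allowed(source: str, destination: str) -> bool:
    """Identity-based check: one rule regardless of instance count."""
    return (source, destination) in ALLOWED


def ip_rules_needed(source_ips, destination_ips):
    """IP-based firewalling needs one rule per (source IP, destination IP) pair."""
    return list(product(source_ips, destination_ips))


web_ips = [f"10.0.1.{i}" for i in range(1, 11)]  # ten web servers
db_ips = [f"10.0.2.{i}" for i in range(1, 4)]    # three database nodes

ip_rules = ip_rules_needed(web_ips, db_ips)
ip_rule_set = set(ip_rules)


def ip_allowed(source_ip: str, destination_ip: str) -> bool:
    """IP-based check: only exact address pairs that were listed pass."""
    return (source_ip, destination_ip) in ip_rule_set


print(len(ip_rules))  # one rule per web/database address pair

# A web server whose traffic is rewritten to a different source address
# no longer matches any IP rule, but its service identity is unchanged.
print(ip_allowed("192.0.2.50", "10.0.2.1"))
print(service_allowed("web", "database"))
```

Adding an eleventh web server would add three more IP rules here, while `ALLOWED` would not change at all.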
» 3. Data protection
[09:00] The final challenge is how do we think about data protection? So previously, if we look at this example, our web server was very likely to be writing its data to our database unencrypted, again going back to the security assumption that our perimeter is keeping the bad guys out, and we're assuming everyone who works for us is trusted. So it was okay that we wrote unencrypted data to the database because you couldn't get to it to read it.
Once we change these assumptions, you know what? Maybe I don't trust my database operator. Maybe they represent an insider threat, and they could go read it out of the database. Or an attacker might find their way onto my network.
All of a sudden I care a lot more about protecting the integrity of this data. So what becomes essential is data that's at rest—whether in database, object store, network file systems—all of that data should be encrypted.
» How to implement multilayered security
The advantage this now has is: you now need a multifactor compromise. It's not enough just to get access to my database and be able to read the data. You must also be able to break my encryption algorithm or get access to my underlying encryption keys to be able to decrypt that data.
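The "multifactor compromise" idea can be illustrated with a short Python sketch. Everything named here (`database`, `key_store`, `store_encrypted`, `load_decrypted`) is hypothetical. The cipher is a one-time-pad XOR chosen only because it fits in the standard library; it is a stand-in for a vetted authenticated cipher or an encryption service and is not meant for production use. The point is the separation: the database holds only ciphertext, and the key lives somewhere else.

```python
import secrets
from typing import Dict


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (illustration-only stand-in cipher)."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


# What a database operator, or an intruder on the network, could read.
database: Dict[str, bytes] = {}
# Held separately, for example by a secrets manager.
key_store: Dict[str, bytes] = {}


def store_encrypted(record_id: str, plaintext: bytes) -> None:
    key = secrets.token_bytes(len(plaintext))  # fresh random key per record
    key_store[record_id] = key
    database[record_id] = xor_bytes(plaintext, key)


def load_decrypted(record_id: str) -> bytes:
    # Needs BOTH the stored ciphertext and the separately held key.
    return xor_bytes(database[record_id], key_store[record_id])


store_encrypted("card-1", b"4111 1111 1111 1111")
```

Reading `database["card-1"]` alone yields random-looking bytes; only a caller that can also reach `key_store` recovers the original value through `load_decrypted`.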
[10:00] So how do we add more barriers to make it more difficult to exfiltrate data or leak our data out? So these become essential.
The first two are, in many ways, simpler to implement because we don't necessarily have to modify our applications. We can wrap existing applications and just change the source of their secrets configuration. We can then do the segmentation transparently to the application. Whereas this [data protection], in some sense, is the final step.
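One way to picture "wrapping" an existing application is a launcher that fetches secrets and hands them to the unmodified program through its environment. The Python sketch below assumes hypothetical names throughout: `fetch_secrets` stands in for a call to whatever secrets manager is in use, and the "legacy app" is a one-line child process that only knows how to read environment variables.

```python
import os
import subprocess
import sys
from typing import Dict, List


def fetch_secrets() -> Dict[str, str]:
    """Hypothetical stand-in for a call to a secrets manager."""
    return {"DB_USERNAME": "web", "DB_PASSWORD": "example-only"}


def run_wrapped(command: List[str]) -> subprocess.CompletedProcess:
    """Run an unmodified program with secrets injected into its environment."""
    env = dict(os.environ)
    env.update(fetch_secrets())
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# The "existing application": it reads its configuration from the environment
# and has no idea where the values came from.
legacy_app = [sys.executable, "-c", "import os; print(os.environ['DB_USERNAME'])"]

result = run_wrapped(legacy_app)
print(result.stdout.strip())
```

The application code is untouched; only the source of its configuration changed, which is why this step tends to be cheaper than adding encryption inside the application.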
If we talk in terms of the maturity curve, there is this linearity here where these are sort of easier, and data protection requires more integration and awareness at the application level. In many ways these are not new challenges. So why do we need kind of a new way of thinking about it? Why is there a new approach?
It really comes from when we talk about traditional tooling. The traditional tools looked at a static infrastructure. They were designed to operate in a world where we had relatively little churn, and we operated in a private data center. What you'll notice is most of these tools are IP based.
[11:00] Whether we talk about traditional privileged access management, or we're talking about technologies like firewalls, their unit of management is this IP, which is a difficult unit to manage if you have dynamic scale, or a lot of variability, or operate in a multi-cloud setting. What we're seeing is a transition into more modern tooling, focused on a more dynamic world.
As we move to a dynamic world, where we expect infrastructure to come and go, we're using containers and much smaller units of management. We need a different approach because IP becomes too painful. Instead, it becomes a focus on service identity: Mapping back to the identity of the service and saying the web service is allowed to talk to the database, not this particular IP address.
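A rough Python sketch of why churn favors service identity. `Registry`, `connection_allowed`, and the addresses are hypothetical, and looking identity up by address is a simplification used here only to show instances coming and going; real systems usually prove identity cryptographically, for example with certificates.

```python
from typing import Dict, Optional

# Hypothetical rule, written once in terms of services.
ALLOWED = {("web", "database")}


class Registry:
    """Tracks which service currently runs at which address."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, str] = {}

    def register(self, ip: str, service: str) -> None:
        self._by_ip[ip] = service

    def deregister(self, ip: str) -> None:
        self._by_ip.pop(ip, None)

    def identify(self, ip: str) -> Optional[str]:
        return self._by_ip.get(ip)


def connection_allowed(registry: Registry, source_ip: str, destination_ip: str) -> bool:
    source = registry.identify(source_ip)
    destination = registry.identify(destination_ip)
    if source is None or destination is None:
        return False  # unknown endpoints get no access
    return (source, destination) in ALLOWED


registry = Registry()
registry.register("10.0.2.5", "database")
registry.register("10.0.1.10", "web")
before = connection_allowed(registry, "10.0.1.10", "10.0.2.5")

# The web container is rescheduled and comes back at a new address.
registry.deregister("10.0.1.10")
registry.register("172.16.9.4", "web")
after = connection_allowed(registry, "172.16.9.4", "10.0.2.5")
stale = connection_allowed(registry, "10.0.1.10", "10.0.2.5")
```

The rule in `ALLOWED` never changes across the reschedule: the new instance is allowed, and the old address stops being honored.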
As we talk about going through this transition, some of the tooling that we think is interesting, and things that we provide: one of them is our Vault tool. Vault is a tool for doing secrets management. It provides a central location to encrypt, access-control, and manage our secrets, and it also provides an encryption-as-a-service capability for developers.
[12:00] As we give developers the mandate to encrypt their data in transit [and] at rest, how do we give them a facility to do this? The other tool is Consul, our service-mesh tool. It really looks at how do we do the service segmentation? How do we allow a networking or firewall team to define these high-level rules and then decouple that from application and operations teams, who are deploying applications but don't want to be bound to a rigid process or rigid IP-based controls that take days, weeks, or months to change?
So these are two key areas. I hope this is a useful high-level introduction into the challenges of multi-cloud security. I would encourage you to check out our online resources to learn more about Vault and Consul for any of those challenges.
Thanks so much.