Introduction to Sentinel - HashiCorp's Policy as Code Framework
Jun 05, 2018
Learn about Sentinel, a powerful policy as code language and framework built into HashiCorp Enterprise tooling to allow guardrails, business requirements, legal compliance, and more to be actively enforced by running systems in real time.
Sentinel limits risk exposure by codifying business and regulatory policies to ensure infrastructure changes are safe. Together infrastructure as code and policy as code empower users to safely automate infrastructure management. Watch HashiCorp co-founder and CTO Armon Dadgar illustrate the benefits of Sentinel in this whiteboard video.
- Armon Dadgar, Co-founder and CTO, HashiCorp
Hi. My name is Armon Dadgar, and today, I want to talk about using Sentinel or policy as code, as a way of making our application delivery smoother. One of the projects that we announced in the middle of 2017 was this project we call Sentinel. The idea behind Sentinel is a more general one. It's this idea of applying policy as code. So, if you're familiar at all with HashiCorp's tools or our approaches to things, we're big proponents of this idea of infrastructure as code.
When we talk about infrastructure as code, the goal is how do we capture the way we're doing different parts of our infrastructure automation pipeline whether that's packaging an application, provisioning, describing how we want an application deployed and scheduled and capturing that as code, right? The idea is if we can capture these processes in a way that's codified, then there are many benefits we get from it. One, we can check it in and use version control systems. So, we get versioning for free.
As we define our application, we check it in. Then, as we make incremental changes to it, we continue to check it in much like we would do for an application. So, this gives us a change history of who changed what, along with commit messages explaining why they changed it, and gives us a better idea of why our infrastructure has evolved in a certain way. The other advantage we get is the ability to automate this. We no longer have to treat our infrastructure as a set of documentation and then manually go in and point and click or run a series of commands.
Instead, by capturing this as code, we can simply execute it one time, ten times, a thousand times and automate the process of building images or provisioning infrastructure. The other major benefit of it is documentation. By capturing it as code, it's much more obvious how we're building our packages or what the structure of our infrastructure is once we've provisioned it because we can always go and look at the code to see exactly how this process works and how it's changed over time. These are all of the advantages we get when we think about infrastructure as a codified object and using that approach.
Now, how does this relate to policy as code? One of the challenges we frequently see for customers is, as they start adopting this infrastructure as code approach, how do we take this to scale? When we have a single operator who's using a tool, let's just say Terraform, and they're writing a series of configurations, this individual is probably more trusted and more sophisticated. They're early users. They're just trying to figure out how their organization can adopt cloud. Over time, it goes from one person to maybe a team of people, right? A few additional people come in and these are still knowledgeable, and they're helping expand our usage of that environment.
What happens as we start to try and scale this out to a much broader organizational usage? Well in general, there are three different kinds of concerns that start to come into the picture. One is security. So, what happens as we add more people who can provision infrastructure, who are maybe less familiar with the details of infrastructure? How do we make sure someone doesn't define a firewall rule that allows all traffic from the public internet into our network? They might do this accidentally just because they're less familiar with infrastructure management.
Then we have things like compliance. So, as a large organization, we might be bound to things like PCI compliance, FIPS compliance, GDPR compliance, different regulatory regimes that change the way we have to handle data, provision infrastructure, or even run our change management process. Lastly, we have things like operational excellence or operational best practice. These might be things that we're not legally required to do, in the sense of compliance, but they're things that we learn over time or things that we should be doing.
These can be a whole range of things. For example, if I'm deploying a service, instead of deploying only one instance of my service, I might want to deploy at least two at any given time, so if one of them fails, I have another instance that's alive and serving traffic. So in this way, it's an operational best practice to run at least two of any service, right? Other examples might be, we should use a medium-sized instance on the cloud as opposed to a very large, costly, compute-optimized instance. That might be many dollars per hour.
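The two operational best practices just described (run at least two instances, avoid oversized instance types) can be written down as checks. What follows is only a minimal Python sketch of the idea, not Sentinel syntax; the service fields (`name`, `instance_count`, `instance_size`), the size names, and the thresholds are all hypothetical and chosen for illustration.

```python
# Illustrative only: the field names, size names, and limits below are
# assumptions, not part of Sentinel or of any Terraform plan format.
MIN_INSTANCES = 2
ALLOWED_SIZES = {"small", "medium"}


def check_service(service):
    """Return a list of human-readable policy violations for one service."""
    problems = []
    if service["instance_count"] < MIN_INSTANCES:
        problems.append(
            f"{service['name']}: run at least {MIN_INSTANCES} instances"
        )
    if service["instance_size"] not in ALLOWED_SIZES:
        problems.append(
            f"{service['name']}: instance size '{service['instance_size']}' is not allowed"
        )
    return problems


# A service that breaks both rules, and one that follows them.
risky = {"name": "web", "instance_count": 1, "instance_size": "xlarge"}
sound = {"name": "api", "instance_count": 3, "instance_size": "medium"}

for svc in (risky, sound):
    print(svc["name"], check_service(svc))
```

An empty list means the service passes; each string in a non-empty list names one rule that was broken.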
These are three totally different types of policies we see in the real world. Now, when we look at how these are traditionally implemented in most organizations, you usually have different groups of people, whether it's security or compliance or your operations team, that codify these things as a Word document, right? So this might live as a Word doc. It might be a document that sits inside of a Wiki. But ultimately, what we're doing is defining this in sort of English-language prose, right?
We're saying, "Here's our security policy that you should abide by any time you're provisioning something in cloud," or, "Here is a checklist of our operational best practices." What we often see is that these get implemented through sort of a waterfall process. You might have a series of developers who are writing Terraform code itself, who are then submitting this to a set of reviewers, who are then checking off that we're following these processes.
They will then go through the checklist and say, "Does this thing follow our security best practices and our compliance practices and our operational best practices? If so, then great. We will allow them to actually run this Terraform code. If not, we'll go back and tell them, 'No, sorry. You're in violation. Try this again.'" The challenge with this workflow is two things. One, it disempowers the end user, right? So as a developer, I'm not empowered to just write my Terraform and go out there, make a change and solve my problem, right?
I can write this Terraform, and now, I have to file a ticket and wait for some group to review it and give me feedback on it. This has the problem of disempowering me as a developer. I'm not able to make a change directly. But it also causes these long queues, right? I have to file a ticket and wait days or weeks to get a response back. And then, as a set of reviewers, our job ends up being very tedious and rote. We have to just review endless amounts of Terraform and continuously re-evaluate against a checklist.
So the idea of policy as code is really looking at this challenge and asking, "Can we apply this same infrastructure as code best practice? Can we go from a world where what we're doing is defining basically a Word document, and instead, translate that into a policy language that's actual code?" If we do this, then we can get all of these same benefits, right? We can take this policy as code document. We can version it and evolve it over time. So as we add new operational best practices, we can incrementally refine what the policy is.
However, we also get clear documentation of what these are in a way that's not necessarily English-language prose, but the key becomes, we now can automate it. So what does this look like in practice? Let's walk through an example in one of our applications. The example I've been using here is Terraform. So how we might apply this to Terraform would look something like this. We have a series of, let's say, policy authors, which might be security, as an example, and they're going to commit to writing out their policies in the Sentinel language.
So Sentinel is meant to be a very high-level, easy to learn programming language. It was specifically built in conjunction with customers who had this problem, and they wanted non-developers to be authoring these policies. So it's designed for security and compliance people, with some familiarity but who aren't necessarily programming experts, to author. Then we have our developers or operators who are writing Terraform configuration. These are then both being uploaded into a central system, like Terraform Enterprise.
So Terraform Enterprise allows different Sentinel policies to be registered and enforced across all of the different Terraform executions in the system. As a security group or as an operations group or a compliance team, I can introduce my policies and have high confidence that every Terraform run is being evaluated against them. Now, as an operator, I still have the self-service I want. I can write a Terraform configuration for a new service I want to deploy. This can be submitted to Terraform Enterprise. As long as my checks pass and I'm not in violation of any of the policies, I'm allowed to push, apply and let Terraform go out and build this infrastructure.
Now, if I'm trying to do something that's out of compliance or out of security, for example, I'm setting a firewall rule that opens up all the traffic, Terraform Enterprise might tell me that I've violated a policy. It might say, "The firewall policy has been violated. Here's the rule that you've broken." Now, I get an immediate feedback loop, right? I don't have to file a ticket and wait days or weeks to get a response back. The moment I submit my change and fail the policy check, I'll get that feedback.
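To make that feedback loop concrete, here is a minimal Python sketch of a central place where policies are registered and every submitted configuration is evaluated against all of them. It is not Sentinel syntax and does not mirror Terraform Enterprise's API; the `POLICIES` registry, the `evaluate_run` function, and the configuration fields are hypothetical names used only for illustration.

```python
# Illustrative only: this registry and the config layout are assumptions,
# not how Terraform Enterprise or Sentinel are actually implemented.
def no_open_ingress(config):
    """Fail if any firewall rule admits traffic from every address."""
    for rule in config.get("firewall_rules", []):
        if rule["cidr"] == "0.0.0.0/0":
            return f"firewall rule '{rule['name']}' allows all traffic"
    return None


def min_two_instances(config):
    """Fail if fewer than two instances of the service are requested."""
    if config.get("instance_count", 0) < 2:
        return "at least two instances are required"
    return None


# Policies registered once, then applied to every run.
POLICIES = {"firewall": no_open_ingress, "availability": min_two_instances}


def evaluate_run(config):
    """Evaluate all registered policies; return (allowed, messages)."""
    messages = []
    for name, policy in POLICIES.items():
        result = policy(config)
        if result is not None:
            messages.append(f"The {name} policy has been violated: {result}")
    return (not messages, messages)


first_try = {
    "instance_count": 2,
    "firewall_rules": [{"name": "allow-all", "cidr": "0.0.0.0/0"}],
}
second_try = {
    "instance_count": 2,
    "firewall_rules": [{"name": "office", "cidr": "10.0.0.0/8"}],
}

print(evaluate_run(first_try))   # rejected, with the broken rule named
print(evaluate_run(second_try))  # corrected configuration passes
```

The point of the sketch is that the rejection and its reason come back immediately from the same call that submits the change, rather than from a reviewer working through a checklist.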
Now, I can come in, update my configuration, call it Terraform prime, and submit it again. This time, ideally, if I'm in compliance, I'll be allowed to go through and apply it. So this way, we break that challenge we had where we were disempowering our developers and introducing this multi-day or multi-week delay, and we solve both of those things by applying a policy-as-code approach. Sentinel is a more general framework for doing this, and it's a tool that we've integrated across all of our tools.
So this is an example within Terraform Enterprise, but Sentinel is also available in Consul, Nomad and Vault Enterprise as well, looking at really solving similar challenges of policy governance. If you are interested in learning more about Sentinel, I recommend checking out our website and our other online resources. Thank you so much.