Presentation

What is HashiCorp Sentinel: Policy as code and automated policy enforcement

Published 7:00 AM UTC Sep 19, 2017

Mitchell Hashimoto on Sentinel, HashiCorp’s new automated policy enforcement framework.

Automation is incredibly important to the continued growth of cloud infrastructure and services. But there remains an unsolved problem: Automated Policy Enforcement.

For example:

What if you accidentally requested 1000 instances instead of 100?
Do you really want to permit changes to critical service configurations outside of normal working hours?
Are you sure all your services have associated monitoring and health checks?
Do all TLS certificates conform to security policies, such as key length?
Do all services have a billing-entity tag?

These are examples of issues that would normally be solved by human intervention, but that’s impractical in an era of automation. Enter Sentinel—an enterprise capability that enables automated policy enforcement.

Sentinel helps you implement fine-grained policies that are repeatable, versionable and testable. You can think of them as “safety guardrails.”

The Sentinel framework is embedded into the whole HashiCorp stack. In the same way that the stack implements infrastructure as code, Sentinel implements policy as code.

Speaker

Mitchell Hashimoto, Co-founder, HashiCorp

Transcript

Thank you very much, Armon. Those were a lot of exciting updates and a lot of great things: Consul going 1.0, a UI available for everyone in Nomad. Really, really exciting stuff. I hope what you see is that every product is growing and maturing faster than we've ever been able to manage before. We're really excited to have been able to share some of these new features with you.

What we're seeing is that the growth of infrastructure and applications has been enabled in large part by the ability to automate more effectively. Going back along the timeline, configuration as code and the rise of virtual machines enabled fast, automatic machine setup that we didn't have before. Then cloud providers, infrastructure as code, and tools such as Terraform enabled fast, automatic infrastructure creation that was difficult to achieve before. Trends like containers and microservices let us package our application runtimes and ship them much more easily. Then schedulers enabled automatic application placement and deployment, as well as operational tasks such as rolling deploys, which let us deploy a much larger number of applications.

Automation has brought all these really amazing capabilities that were simply not really possible before. However, it's also brought a number of challenges. For example, in the age of VMs and manual procurement, if you place an order for 5,000 new machines, a human would probably call you and say, "Are you sure this is the correct order?" Or if you tried to deploy an application that was requesting all the resources of your biggest machine, again, someone would probably stop you and say, "Is this what you truly meant to do?" Today with automation, these things could just happen and they happen really, really fast.

Today, we want a way to enforce policies such as forbidding configuration changes outside of business hours. Changing configuration after hours is a pretty easy way to introduce instability, page people, and have people working at times you don't really want them to be working. It's something you want to check. Or ensuring all services have associated health checks: we want to make sure that every service we deploy can be monitored, at least at a minimal level, before it's actually running and serving user traffic. Or verifying that all the TLS certificates we issue use keys of at least 2,048 bits, just to ensure some minimum expectation of security. Or perhaps, in scenarios where we have multiple lines of business, ensuring that every instance has a billing ID tag so we know which organization is using it. These are all problems that could previously have been solved easily with human verification, but today they're very, very difficult to handle in the age of automation.

Today, what I'm happy to announce is the availability of something we call Sentinel. It's a system and framework for policy as code. Sentinel is an enterprise-only feature that is being integrated into all our products. To see how Sentinel solves some of these problems, let's take a look at those examples one more time. Each of these examples corresponds to a HashiCorp tool that, in combination with Sentinel, is capable of automatically solving that problem.

Going through each one, the first is Consul Enterprise, where we'd like to forbid changing configuration outside of working hours. This is the associated Sentinel policy that would enforce that. We're going to cover language syntax and other things in more detail a little later, but I hope you can see that it's pretty easy to understand what it's doing: verifying that all keys that start with the service namespace can only be modified within standard business hours, between 8:00 AM and 5:00 PM.
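As a rough illustration of what that check computes, here is the same rule expressed in Python rather than in Sentinel's own language. The `service/` key prefix is an assumption standing in for "the service namespace", and the plain hour comparison is a simplification that ignores time zones and weekends.

```python
from datetime import datetime

# Assumed prefix for "the service namespace"; the real key layout is not shown here.
SERVICE_PREFIX = "service/"
OPEN_HOUR = 8    # 8:00 AM, inclusive
CLOSE_HOUR = 17  # 5:00 PM, exclusive


def kv_write_allowed(key: str, now: datetime) -> bool:
    """Allow writes outside the service namespace at any time.

    Keys inside the namespace may only be modified between 8:00 AM and 5:00 PM.
    """
    if not key.startswith(SERVICE_PREFIX):
        return True
    return OPEN_HOUR <= now.hour < CLOSE_HOUR
```

A write to `service/web/config` at 9:30 AM passes, the same write at 6:00 PM is rejected, and keys outside the prefix are never restricted.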

Next, with Nomad, we want to ensure all services have associated health checks. Just as easily, we look at the job structure and verify that every task within it has at least one health check associated with it.
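A Python sketch of that check over a simplified job structure. The nested `task_groups` / `tasks` / `services` / `checks` dictionaries are an assumption loosely modeled on a Nomad job; they are not the data Sentinel actually exposes to Nomad policies.

```python
def task_has_health_check(task: dict) -> bool:
    """A task passes if any of its services declares at least one check."""
    return any(len(svc.get("checks", [])) > 0 for svc in task.get("services", []))


def job_has_health_checks(job: dict) -> bool:
    """Every task in every task group must have at least one health check."""
    tasks = [t for group in job.get("task_groups", []) for t in group.get("tasks", [])]
    # Note: a job with no tasks passes vacuously, since all([]) is True.
    return all(task_has_health_check(t) for t in tasks)
```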

Moving on to Vault, we'd like to maintain baseline security by ensuring all TLS certificates use keys of at least 2,048 bits. We can do this by checking, whenever a PKI role is created (roles are how you define who can create TLS certificates within Vault), that the role's minimum required key size is at least 2,048 bits.
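Sketched in Python, the check reduces to a comparison on the role's `key_bits` parameter, which is a real Vault PKI role parameter. Treating a missing `key_bits` as a failure is a deliberately strict assumption of this sketch, and it only considers RSA-style bit lengths.

```python
MIN_KEY_BITS = 2048


def pki_role_allowed(role_params: dict) -> bool:
    """Reject PKI role definitions whose key size is below 2,048 bits.

    Assumption: a role that does not state key_bits is rejected outright.
    """
    return int(role_params.get("key_bits", 0)) >= MIN_KEY_BITS
```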

Then finally, with Terraform, we want to ensure all instances carry a tag we expect. For this one, we can go over a plan that's been made and verify that every AWS instance in the plan has a billing ID tag.
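Here is that plan check sketched in Python. The flat list of resources, each with `type`, `address`, and `tags`, is a hypothetical simplification of a Terraform plan, and `billing-id` is an assumed tag name.

```python
def instances_missing_tag(resources: list, tag: str = "billing-id") -> list:
    """Return the addresses of AWS instances in the plan that lack the tag.

    The policy passes when the returned list is empty.
    """
    return [
        r["address"]
        for r in resources
        if r["type"] == "aws_instance" and tag not in r.get("tags", {})
    ]
```

Returning the offending addresses rather than a bare boolean makes a failure easier to explain to whoever submitted the plan.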

These are four common problems that real customers have brought up to us over the years, and they now all share the same easy solution: what we call Sentinel. Those are the problems Sentinel can solve, but what is it? How does it work? Let's take a look at that next. Sentinel embraces a concept we call policy as code. Policy as code takes all the same benefits you know and love from infrastructure as code and brings them to policy. By representing policy as code, we're able to use proven software development practices with policies. This is in contrast to other policy systems that are updated through graphical user interfaces or application-specific endpoints. The problem with those is that they're difficult to work with and not easily repeatable, versionable, or, most importantly, testable. Sentinel solves all of these cases.

To do that, Sentinel is really four separate things. It's a language, it's an embedded framework, it's a workflow, and it's an ecosystem of imports. Starting with the language, Sentinel uses its own policy language that was designed specifically to make writing policies easy. We designed it using hundreds of customer examples, translating those from English into policy and trying to make that as easy and straightforward as possible. One of the things we found was that a lot of the people responsible for enforcing policy within businesses do not have a strong programming background. You can't throw a programming language at them because they're not comfortable with that, but with something like Sentinel, where the policy reads almost like English and maps almost one for one, they were able to effectively write and enforce policy across multiple systems. It was also built to be safe in highly security-sensitive environments such as Vault. Sentinel is integrated with Vault, and we can't compromise the security of Vault because of it. It was designed with that in mind.

Then, to run this language, Sentinel is an embedded framework. Embedded is a really important word, because it means that Sentinel itself is built into the applications. You don't run another application next to it. You don't have to deploy something new. Sentinel sits right in the data path of what's happening. What this lets you do is what we call active enforcement. Sentinel sees a behavior coming in before it takes effect, can perform policy checks on that behavior at that time, and can reject it if the checks fail. You don't detect a violation after a key's been written to Consul. You don't even allow that key to be written to Consul in the first place. The policy runs right there. While we do active enforcement, Sentinel is also capable of passive enforcement. Sentinel can run continuously in the background and check the safety of your systems to ensure that they haven't been pushed into a policy-violating state by some external force.

For example, with Terraform, we can continuously run Sentinel against the Terraform state, and if someone goes into something like the AWS Console and deletes a tag, we will detect that your system was pushed into a policy-violating state outside of the Terraform workflow.
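To make the active/passive distinction concrete, here is a toy Python sketch, not Sentinel's implementation: a key/value store that runs a policy inside its write path (active), plus a scan that re-checks existing state (passive). The "values must be non-empty" policy is purely hypothetical.

```python
from typing import Callable, Dict, List

# A policy takes (key, value) and returns True if the pair is allowed.
Policy = Callable[[str, str], bool]


class GuardedKV:
    """Toy key/value store with a policy check in its write path."""

    def __init__(self, policy: Policy):
        self.policy = policy
        self.data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> bool:
        # Active enforcement: check before the write takes effect.
        if not self.policy(key, value):
            return False  # rejected; nothing is written
        self.data[key] = value
        return True


def passive_scan(data: Dict[str, str], policy: Policy) -> List[str]:
    """Passive enforcement: re-check existing state and report violating keys."""
    return [key for key, value in data.items() if not policy(key, value)]


non_empty: Policy = lambda key, value: value != ""  # hypothetical policy
```

A rejected `put` never reaches the store, while assigning to `kv.data` directly stands in for an out-of-band change (like editing a resource in the AWS Console) that only the passive scan can catch.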

Sentinel also natively supports the concept of what we call enforcement levels. The enforcement levels it supports are advisory, soft mandatory, and hard mandatory. Enforcement levels are broad types of policies that we found existed across all of our customer advisors while building Sentinel. Advisory policies allow a policy to fail but actively show the user a warning of what failed and why. These are useful for scenarios where you'd like to educate a user that they're perhaps doing something not quite right, or where you're deprecating something and want to warn them that they shouldn't be using this version anymore.

Soft mandatory policies require that the policy pass, but they can be overridden with the proper privileges. Those privileges may belong to the same person, or the override may require another person. These policies are super useful in cases such as "don't allow deploys on Friday." Deploying on a Friday is an easy way to get people working late or over the weekend. Maybe that's not a good idea, but of course you sometimes have to deploy on a Friday. Things happen. Soft mandatory policies are a way to require an explicit override, an explicit acknowledgement that you're doing something that could be dangerous. Overrides always appear in logs, so you know when an override happened, who initiated the override, and who initiated the original request.

Hard mandatory policies cannot be overridden. They must pass under any circumstance short of removing the policy itself, which is a fairly privileged operation. Hard mandatory policies are really great for business requirements, as well as for assisting with legal compliance and making it a little more robust.
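One way to model the three enforcement levels, sketched in Python. The level names match the ones described above; the function, its return values, and the audit-log format are hypothetical.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Level(Enum):
    ADVISORY = "advisory"
    SOFT_MANDATORY = "soft-mandatory"
    HARD_MANDATORY = "hard-mandatory"


def decide(level: Level, passed: bool, overridden_by: Optional[str] = None,
           audit_log: Optional[List[str]] = None) -> Tuple[bool, str]:
    """Return (allowed, outcome) for a single policy result."""
    if passed:
        return True, "pass"
    if level is Level.ADVISORY:
        # The failure is allowed through, but the user sees a warning.
        return True, "warn"
    if level is Level.SOFT_MANDATORY and overridden_by:
        # Overrides are always recorded so they can be audited later.
        if audit_log is not None:
            audit_log.append(f"soft-mandatory override by {overridden_by}")
        return True, "override"
    # Hard-mandatory failures, and soft-mandatory ones without an override, are denied.
    return False, "deny"
```

The key design point is that an override argument is simply ignored for hard-mandatory policies, so the only way past one is to remove the policy itself.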

We have the language, and we have the embedded framework to run that language and enforce policy. We think Sentinel is a really great technology, but having a great technology just isn't enough and doesn't solve the core problem we're trying to solve. You also need to think about the development, test, and deployment workflows of these policies. We think those are just as important as building a powerful policy system.

What we've done is build a powerful workflow into Sentinel. Sentinel ships with a local CLI that we call the Sentinel Simulator. The Sentinel Simulator is a single binary with zero dependencies, and it lets you write and test policies against any Sentinel-enabled system without access to that system, all locally on your own machine. Because we are pushing policy as code, you write your policies in text files. They can be run using a command called sentinel apply. This command can mock any data that a system exposes to your policy, so you can develop that policy without that system. When a policy fails, you can see not only the failure but also the logic that led to it: which rules passed or failed to produce that result.

Beyond simply developing policies, evaluating them locally, and seeing whether they pass or fail, we expect all our customers and users of Sentinel to eventually grow to hundreds or thousands of policies, and reasoning about them and ensuring they behave correctly is extremely important and a challenge today. Sentinel also includes a full test runner and test framework within the Sentinel Simulator that you can use to test your policies. With it, you define an environment in which the policy runs. Again, you mock all the information the system would give you. You don't have to run Consul. You don't have to run Vault. It all just runs in memory.

You then assert that the policy behaves as you expect. You don't just assert that the policy passes or fails, because perhaps it's passing or failing for some reason completely unrelated to what you're testing. You can also assert the individual logical rules that led to that pass or fail, so that you know it's passing or failing for the reason you expect. This CLI was designed to run in continuous integration environments. As you can see, with no parameters, as long as you follow the right folder structure, it will run thousands of policy tests automatically, all in memory. We encourage everybody to run this in their CI environments for continuous testing of their policies. This also works extremely well with version control, pull requests for policies, et cetera. You get all those benefits directly with Sentinel, which you can't get when you're manually editing policies through a system. That is the powerful workflow Sentinel provides.
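The idea of asserting individual rules, not just the final result, can be sketched in Python as follows. The rule names, mock fields, and test-case layout are hypothetical and only loosely modeled on the workflow described above; they are not the Sentinel Simulator's file format.

```python
def evaluate(mock: dict) -> dict:
    """Evaluate each named rule against mocked input, plus the overall result."""
    rules = {
        "has_billing_tag": "billing-id" in mock.get("tags", {}),
        "small_enough": mock.get("count", 0) <= 100,
    }
    # The overall result is computed from the individual rules.
    rules["main"] = all(rules.values())
    return rules


def run_case(policy, case: dict) -> bool:
    """A case passes only if every expected rule value matches, not just 'main'."""
    result = policy(case["mock"])
    return all(result[name] == want for name, want in case["expect"].items())


# Hypothetical case: the policy should fail because the count is too large,
# not because the billing tag is missing.
test_case = {
    "mock": {"tags": {"billing-id": "42"}, "count": 1000},
    "expect": {"main": False, "has_billing_tag": True, "small_enough": False},
}
```

Pinning `has_billing_tag` to true in the expectations means the case would catch a regression where the policy still fails, but for the wrong reason.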

The last thing I want to talk about is the ecosystem of imports. Sentinel can use plugins, which we call imports, to source external information for use in policy decisions. This is an extremely powerful concept, especially once you think through its ramifications. Your policies aren't limited to the information the system gives you to make a decision. You can query any external system you want for information to make your policy decisions.

As an example, we have a customer that uses both Terraform and ServiceNow and wants to ensure that their Terraform changes can't go into infrastructure until there's a change management request in ServiceNow that matches the change. They can do this with Sentinel by writing a plugin. These plugins are exposed to Sentinel as imports. Imports expose functions and data you can use within Sentinel to make these decisions. The Sentinel Simulator can also mock all of this data, so you don't even need access to the plugins. You don't need access to a ServiceNow cluster if you're writing policies against that plugin. You can mock all of it locally and still develop policies against it right on your own machine, offline.
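A Python sketch of the same pattern: the policy depends only on an interface that an import might provide, so an in-memory mock can stand in for ServiceNow during development. The `has_approved_change` method and the `change_request` field are hypothetical, not a real ServiceNow or Sentinel API.

```python
from typing import Protocol, Set


class ChangeManagement(Protocol):
    """Hypothetical interface an import might expose to policies."""

    def has_approved_change(self, change_id: str) -> bool: ...


class MockChangeManagement:
    """In-memory stand-in, analogous to mocking an import in the simulator."""

    def __init__(self, approved: Set[str]):
        self.approved = approved

    def has_approved_change(self, change_id: str) -> bool:
        return change_id in self.approved


def apply_allowed(run: dict, changes: ChangeManagement) -> bool:
    """Allow an apply only if it references an approved change request."""
    change_id = run.get("change_request")
    return change_id is not None and changes.has_approved_change(change_id)
```

Because `apply_allowed` never talks to ServiceNow directly, swapping the mock for a real client would not require touching the policy logic.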

Anybody can build a Sentinel plugin using the SDK, which is publicly available. The SDK provides a test framework to verify your plugins, along with examples of how to build them, so that you can work with any external information source. These plugins then work with any Sentinel-enabled application. For example, if I wrote a plugin that sourced data from S3, the moment I built that plugin I could source data from S3 in Consul, Nomad, Vault, and Terraform policies, all without modifying those binaries. The plugins use our production-ready plugin system, which we've developed and built on over the past five years. Additionally, the plugins use the new gRPC mechanism in our plugin framework, so you can write plugins in any language.

Those are the four aspects of Sentinel that make it a really incredible system for policy as code. As I said earlier, Sentinel is integrated natively into all our enterprise tools. It's available in the next version of each and it's ready to go. I'd like to show you what it looks like in each of those.

With Terraform Enterprise, you can define policies that run between a Terraform plan and a Terraform apply. These policies have access to the entire Terraform configuration, the Terraform plan, the Terraform state, and a simulated version of the state after the apply. You can use all of this information to determine whether an apply is allowed to happen. This is the same example as before, but you can see we're going over the plan to verify it has a tag. Equally, you could go over the plan to verify, for example, that no tags are being removed and that tags can only be added. You can do just about anything with this.
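The "no tags removed" variant might look like this in Python. Each change is assumed to carry `before` and `after` snapshots with a `tags` map, a hypothetical simplification of plan data; resources being destroyed are skipped, since that is a different policy question.

```python
def removed_tags(changes: list) -> dict:
    """Map each resource address to the tag keys present before but absent after."""
    out = {}
    for change in changes:
        if change.get("after") is None:
            continue  # resource is being destroyed; not a tag-removal question
        before = set((change.get("before") or {}).get("tags", {}))  # None for creates
        after = set(change["after"].get("tags", {}))
        gone = before - after
        if gone:
            out[change["address"]] = sorted(gone)
    return out
```

The policy passes when the result is empty; additions never appear, because only `before - after` is computed.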

With Consul Enterprise, you can define policies that run during KV modify events as well as service registrations and updates. In these scenarios, you can enforce whether a KV modify or a service registration is allowed to happen. In the example I have here, if the service name is prefixed with a particular string followed by a hyphen, we enforce that the service must be deployed within a CIDR block that perhaps has higher levels of security or monitoring. We simply require those services to live in that CIDR block, and we can enforce that right here.
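A Python sketch of that registration check using the standard `ipaddress` module. Both the `secure-` prefix and the `10.10.0.0/16` block are hypothetical placeholders, since the slide's actual values aren't given here.

```python
import ipaddress

PREFIX = "secure-"  # hypothetical service-name prefix
REQUIRED_NET = ipaddress.ip_network("10.10.0.0/16")  # hypothetical CIDR block


def registration_allowed(name: str, address: str) -> bool:
    """Services with the prefix must register an address inside REQUIRED_NET."""
    if not name.startswith(PREFIX):
        return True  # the rule only governs prefixed services
    return ipaddress.ip_address(address) in REQUIRED_NET
```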

With Vault Enterprise, policies are split into two types: role-governing policies and endpoint-governing policies. Role-governing policies are attached, as you might expect, to roles. Whenever you make an API request using a token that has that role, we execute that policy. Endpoint-governing policies are attached to specific API endpoints, whether authenticated or unauthenticated, and whenever you hit that endpoint, we execute that policy. Both types have access to the same data, which is a lot: the full identity of the requester, the full request, and much more. In this example, we're ensuring that people who log in with LDAP are only allowed to request read-only database credentials. You must log in with another mechanism to get write database credentials.
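Sketched in Python, the LDAP rule only needs the authentication method and the request path. The `database/creds/<role>` path follows the Vault database secrets engine's convention; the `readonly` role name and the function shape are assumptions for illustration.

```python
DB_CREDS_PREFIX = "database/creds/"
READ_ONLY_ROLES = {"readonly"}  # hypothetical read-only role name


def db_creds_allowed(auth_method: str, path: str) -> bool:
    """LDAP-authenticated requests may only fetch read-only database credentials."""
    if not path.startswith(DB_CREDS_PREFIX):
        return True  # this policy only governs database credential requests
    role = path[len(DB_CREDS_PREFIX):]
    if auth_method == "ldap":
        return role in READ_ONLY_ROLES
    return True  # other auth methods are not restricted by this rule
```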

Then, with Nomad Enterprise, you can define policies that run before a job is created or updated. These have access to the full job definition being submitted. In this example, we ensure that all jobs deployed into the Nomad cluster source their artifacts from an internal Artifactory instance. They cannot source artifacts from an external source. You can see where you could go with this and how much power you have.
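A Python sketch of the artifact rule over a simplified job structure. The internal host name is hypothetical, and the nested dictionaries only loosely mirror a Nomad job. Any source whose URL host differs is rejected, including getter-style forms such as `git::https://...` that don't parse to a plain host, which errs on the strict side.

```python
from urllib.parse import urlparse

INTERNAL_HOST = "artifactory.internal.example.com"  # hypothetical internal host


def artifacts_internal(job: dict) -> bool:
    """Every artifact in every task must come from the internal host."""
    for group in job.get("task_groups", []):
        for task in group.get("tasks", []):
            for artifact in task.get("artifacts", []):
                # urlparse(...).hostname is None when no plain host can be parsed.
                if urlparse(artifact["source"]).hostname != INTERNAL_HOST:
                    return False
    return True
```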

One of the customers we worked with on Sentinel's design from the very beginning was Barclays, one of the world's largest financial firms. Barclays is planning to use Sentinel to provide safety guardrails for provisioning cloud resources. This will allow their expert operators to define reusable Terraform modules within the registry, encoding Terraform best practices, and hand them out to over 15,000 users and operators of Terraform while knowing that Terraform is being used safely across the entire organization. It's really great to see how Barclays is going to put Sentinel to use in their cloud. We have to thank them for being great customer advisors during the design of Sentinel.

We're really excited about Sentinel. You can learn about it today on HashiCorp.com.
