Policy as Code: IT Governance With HashiCorp Sentinel
Jul 30, 2019
Get an updated 2019 introduction to Sentinel, HashiCorp's policy as code framework. See demos of Sentinel policies inside Terraform, Consul, Nomad, and Vault, and learn about upcoming features.
Sentinel is HashiCorp's policy as code language and framework. Sentinel provides several advantages to all of the HashiCorp Enterprise products by providing an automated way to govern costs, security, compliance, and more. It increases the productivity of security and compliance teams, empowers systems engineers, and modernizes offline workflows that may be missed by traditional IaC tools.
In this talk, HashiCorp engineer Chris Marchesi will demo example Sentinel policies for HashiCorp Vault, Nomad, Consul, and Terraform to show how it's advantageous for each domain. Most of the talk will focus on Sentinel with Terraform: how a practitioner can manage policies with the TFE provider, group policies into sets, and apply those policies to workspaces to trigger policy checks during a run that determine whether a specific Terraform plan can proceed.
The talk also discusses several upcoming features for Sentinel in Terraform Enterprise, including:
- Workspace metadata import
- Cost estimation
- Automatic policy set association with workspaces
Even non-enterprise customers can get started with Sentinel via the offline CLI tool, the Sentinel Simulator.
- Chris Marchesi, Terraform Enterprise & Sentinel Software Engineer, HashiCorp
Today, I'm happy to discuss HashiCorp's Policy as Code framework, Sentinel. Whether you're new to Sentinel or not, I hope this presentation can get you more familiar with the framework and its role in our enterprise products. I'm Chris Marchesi, Engineering Lead for Sentinel at HashiCorp and the primary developer for the Sentinel runtime and its integrations—namely Terraform Cloud and Enterprise.
Introduction to IaC, PaC, and Sentinel
Sentinel is HashiCorp's Policy as Code framework. We introduced it at HashiConf 2017. It enables IT governance in HashiCorp's Enterprise products. You'll see integrations in Terraform Cloud and Enterprise, Consul Enterprise, Nomad Enterprise, and Vault Enterprise.
I'd like to give you a tour of the framework to help you, the practitioner, get an idea of the problem Sentinel was designed to solve—how it works with our products, and how you can get started using it today.
What do we mean when we say Sentinel is a Policy as Code product? To better explain what Policy as Code is, let's discuss the well-established concept of Infrastructure as Code. This will give us a foundation on which we can build and make it obvious why Policy as Code is necessary to work with IaC at scale.
Infrastructure as Code
At HashiCorp, we're firm believers in the concept of Infrastructure as Code. We'd like to think that our products embody the practice. We believe that investing in IaC is necessary for an organization to scale. It can be rewarding in its own right to practitioners from a wide variety of backgrounds.
Let's discuss how IaC helps by versioning your work, providing automation for your workflows, and providing documentation for your processes. While not necessarily the whole of IaC, we believe that these disciplines are at the heart of the concept.
Versioning is the foundation of the Infrastructure as Code concept, giving you configuration storage with history. It's the practice of using a version control system (VCS), such as Git, to store configuration data so that every change is tracked.
The VCS history will tell you who made any change, exactly what they changed, and when they changed it. This produces an audit trail and allows you to observe how infrastructure has changed over time. A VCS, while generally designed for developers, turns out to be a great way to ensure systems engineers can keep track of the modifications they're making, keep them on hand for backups or re-deployments, and roll them back if need be.
The object of automation in IaC is to support a simple deployment workflow for complex infrastructure. Automation is important as it ensures IaC provides value over basic documentation and manual procedures. We transfer manual workflows to automated ones, and we aim to execute deployments in a single operation or at least to reduce the number of steps required.
This process vastly reduces the amount of manual work necessary and reduces the possibility for human error. Ultimately, automation enables us to deploy at a scale not normally possible with human work. Extremely complex operations that would normally take hours can take minutes or even seconds. An entire workflow can now be repeated hundreds or thousands of times, trivially.
Then there's documentation. I'd be hard-pressed to think of any organization that's perfect with theirs. All too often, workflows adopted by individuals may go undocumented. Writing good documentation takes time, and it's highly susceptible to drift as situations change. IaC solves this problem by making the source the documentation.
While usually said tongue-in-cheek as a criticism in software development, it works well when applied to infrastructure. It reduces or removes the need for documentation that may be prone to frequent change. It also reduces the risk of divergence of documentation from reality and can largely replace documentation when it concerns automation.
Every HashiCorp product implements the concepts we've discussed, to some degree. As IaC tools go, Terraform, Packer, and Vagrant are industry leaders when it comes to provisioning multi-cloud infrastructure, images, or development environments. Consul, Nomad, and Vault are not specifically IaC tools themselves. Their designs still make it easy to version configuration, automate management, and self-document their particular problem domains.
Infrastructure as Code at scale: Ensuring security
Now that we have some of the background of Infrastructure as Code, let's take a look at some of the challenges that may arise when applying these principles to large organizations. Infrastructure as Code is great because it allows you to scale. It reduces the barrier to entry through automation, in addition to allowing people to do more.
As an organization scales, IaC presents its own challenges. The increase in productivity inherent in automation makes risk management harder. It can put pressure on the teams who are responsible for managing risk within the organization. This can make it hard to effectively manage the security of the infrastructure, ensure that the organization is in compliance, and ensure that workflows follow best practices. Let's look at why these things are important.
Security is a fundamental and important part of every organization. As an organization scales, there's always a risk that security policies and procedures may get lost in the growth.
Security awareness is subjective from employee to employee. Through no fault of their own, an individual may unintentionally perform an action that could expose the organization's infrastructure to vulnerabilities. Even security-awareness training may not account for every scenario—most notably, good old-fashioned human error.
Security lapses at scale can have extremely disastrous results. Data breaches at large organizations are usually newsworthy events with possibly millions of clients being put at risk—causing damage to a company's reputation that can take years to repair.
Hand-in-hand with security is compliance. Compliance usually takes the form of industry-mandated security and privacy procedures. Examples include payment processing standards such as the PCI DSS, various government standards such as the FIPS standards in the United States, and data privacy standards like the GDPR.
Violations of these standards can affect your business. Violate PCI, and you could lose your ability to process payments. Violate the government's standards, and you could lose access to entire business sectors. Violate GDPR, and you could face fines and financial penalties, sometimes in the tens of millions of euros.
All practitioners should be aware and receive the appropriate training, but as with security, not everyone's going to have the same level of awareness regarding any sensitivities. As an organization grows, it becomes increasingly hard to manage this at the human level.
Finally, there's the concept of best practices. These aren't necessarily hard-and-fast security policies or privacy standards like the GDPR. These are generally correct procedures that aren't covered by a specific standard. Examples would be to avoid single points of failure, optimize for cost efficiency, or build for a responsive user experience.
Best practice can be a problem. I have a problem with the term because best practice and common sense are used interchangeably way too often. This can lead to assumptions about an individual's awareness.
You could easily replace, "it's best practice," with, "it's common sense," when talking about these three things. After the fact, hindsight's very much 20/20 here. These problems are aggravated at scale. As an organization takes on more individuals and employees, you're going to have a lot more opinions as to what constitutes best practice.
Policy as Code
How do we solve these policy issues? How do we eliminate the assumption that everyone knows how to configure a redundant or performant infrastructure that adheres to industry standards and government regulations and doesn't make the organization vulnerable?
Let's talk about the traditional process first: using documentation for manual review. In the manual process, security, compliance, and best practices are usually stored within documentation as policies to be followed.
These are applied by practitioners during code review and dry runs, such as reviewing the output of a terraform plan. When we refer to manual application, we're saying that the reviewer follows the policy when making judgments on whether or not to allow a change.
The problems here are threefold. First off, code review is laborious and requires context. This can be tough if the experts for the code review are on one team and the domain experts for security and compliance are on another. Reviews take enough time already, and this can exacerbate things.
Second, errors can be missed or subjectively accepted as correct. No two reviewers are the same, and sometimes there can be a number of factors in play in addition to the aforementioned context issue.
Finally, a review can take days and lead to frustration, mainly due to these factors—but possibly others. This can be a disempowering process for the practitioner and risk negating the gains that adopting Infrastructure as Code gets you.
Policy as Code is important. Moving your policies out of documentation and into code allows you to automate these checks and balances, remove arbitrary interpretation, and supply immediate results for engineers. Everyone can get back to the work that they enjoy doing.
Further to that, our policies now enjoy the same benefits as our infrastructure code. We store them in VCS with a change history. You automate your policy checks as part of your regular workflow. You document your policies with code—replacing a good part of your paper policy.
Let's talk about the things we've been trying to do with Sentinel at HashiCorp to help make Policy as Code a good experience for everyone. Our objective when designing Sentinel was to bring Infrastructure as Code principles to the governance, risk, and compliance field.
This meant bringing over a lot of people that weren't programmers. As such, we aim to make a programming language that's friendly to non-programmers and programmers alike. We aim for Sentinel to be embeddable, so that it's easy to integrate into all our products; safe, so that it can be executed in a sensitive context; and auditable, so you know which rules passed or failed and why.
Let's elaborate on these fundamentals. First off, we aim to design Sentinel in a way that programming experience isn't required for you to work with the product. As mentioned, we're working with an audience that's composed significantly of non-programmers.
They may be operations people or compliance experts or maybe systems engineers. As practitioners of Sentinel, we want these audiences to be able to understand the policies written within the language and be able to write them with a level of proficiency.
We aim to keep the language easy and understandable via a couple of principles. The first is an emphasis on simple policies. The language and its integrations are extended with the objective of keeping policies short and readable.
We like to restrict policies to single-file code. If a policy gets too large and too unwieldy, that tells us that we need to start working on our integrations better to get it back to that principle.
Secondly, we use an English-based grammar in which statements, keywords, and operators are English-language words, structured in a way that reads as something between pseudocode and an actual English sentence.
At the same time, we still aim to make Sentinel friendly to programmers. Stakeholders with programming, or even at least scripting experience, will be able to write more complex policies than those that don't have a similar background. Sentinel has constructs that any programmer would reasonably expect—such as conditionals, loops, and the ability to create functions to help with more complex workflows.
The Sentinel Simulator has advanced testing and mocking features that we'll cover later in this presentation. These features allow practitioners to mimic the environments they'll encounter in production to ensure policies function correctly. To help with processing the wide variety of data you encounter in Sentinel, we have a growing standard library that we're building on all the time.
Some of the imports available in the standard library include the time import for accessing the time of day, the json import for parsing JSON data, and the strings import for working with the many kinds of strings you'll get. These aren't all of the imports, of course. You can find these and more on the Sentinel website.
How Sentinel works in each HashiCorp product
Let's talk a bit more about how we designed Sentinel to be embeddable and safe in the products it's integrated into. Internally, we designed Sentinel around the principle of straightforward integration into the HashiCorp products. It's embedded directly in the enterprise binaries you get with Consul, Nomad, and Vault Enterprise. Because Sentinel is invoked in time-critical places in these products, we designed it to be efficient and to minimize impact on the request path.
This level of embedding means we need Sentinel to be resilient and safe. We've designed a broadly read-only language and runtime model, as the purpose of Sentinel is to analyze data and not change it. We also give restricted system resources to the runtime. This prevents Sentinel from crashing the system that it's embedded into. There's no ability in the runtime to execute arbitrary commands.
Sentinel is designed to be auditable, so you know what rules passed or failed and why. Trace data is retained for rules, including any sub-rules and conditions that triggered a specific result. The data is built naturally as those rules are evaluated in a policy. This promotes a more granular pattern for policy authoring. We encourage authors to write policies with many small rules to take advantage of auditing functionality.
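To make the "many small rules" idea concrete, here is a minimal sketch in Python rather than Sentinel. The rule names, the data shape, and the evaluate_policy helper are all hypothetical. It only illustrates why keeping a per-rule result makes a failure easier to audit than a single pass/fail value.

```python
def evaluate_policy(rules, data):
    """Evaluate every named rule and keep each result for auditing."""
    trace = {name: bool(check(data)) for name, check in rules.items()}
    passed = all(trace.values())
    return passed, trace


# Hypothetical rules: each one checks a single, small condition.
rules = {
    "has_owner_tag": lambda d: "owner" in d["tags"],
    "is_small_instance": lambda d: d["instance_type"] in {"t2.micro", "t2.small"},
}

passed, trace = evaluate_policy(
    rules, {"tags": {"owner": "ops"}, "instance_type": "m5.large"}
)

# The trace shows which rule caused the failure, not just that one did.
for name, result in trace.items():
    print(f"{name}: {'pass' if result else 'fail'}")
```

With one large rule, the only available answer would be "the policy failed". With small rules, the trace points at the specific condition.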
Sentinel in Terraform
Sentinel has been implemented in every HashiCorp product at varying levels of scale. But it's probably seen its most extensive deployment to date in Terraform Cloud and Enterprise. Let's look at each of these implementations, including what a policy may look like in each implementation and some of the features offered with Sentinel in each product.
Here's an example of what a policy looks like within Terraform Cloud and Enterprise. In this example, we use data from a Terraform plan, provided by an import that is aptly named tfplan. This example would traverse all aws_instance resources defined in the root module and ensure that all instances have tags defined. If a resource is detected that doesn't have tags, the policy fails and the run is blocked.
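The slide's Sentinel code isn't reproduced in this transcript. As a rough illustration of the logic described, here is a Python sketch, not Sentinel syntax. The shape of mock_plan is an assumption made for the example and does not match the real tfplan import's structure.

```python
def untagged_instances(plan):
    """Return addresses of aws_instance resources in the root module with no tags."""
    missing = []
    for address, resource in plan["root_module"].items():
        if resource["type"] == "aws_instance" and not resource.get("tags"):
            missing.append(address)
    return missing


def main_rule(plan):
    """The policy passes only when every instance has tags defined."""
    return len(untagged_instances(plan)) == 0


# Hypothetical plan data: one tagged instance, one untagged, one unrelated resource.
mock_plan = {
    "root_module": {
        "aws_instance.web": {"type": "aws_instance", "tags": {"Name": "web"}},
        "aws_instance.db": {"type": "aws_instance", "tags": {}},
        "aws_s3_bucket.logs": {"type": "aws_s3_bucket"},
    }
}

print(untagged_instances(mock_plan))
print(main_rule(mock_plan))
```

Resources of other types are skipped, so only the untagged instance causes the main rule to fail.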
Sentinel policy checks run in Terraform Cloud and Enterprise immediately after the plan has been made. This phase, called the policy check, is sent the plan along with the configuration and any existing previous state. You write policies based on this data, with the results ultimately determining whether the plan can proceed. You can use the soft mandatory enforcement level to require that a failing plan proceeds only with approval from an organization administrator.
Policies are organized in Terraform Cloud and Enterprise by way of policy sets, which we recently updated to adopt a completely VCS-hosted pattern.
Policy sets now function very similarly to workspaces in that policies are ingressed as they're added or modified in the source VCS repository. You choose to apply these policies to specific workspaces or to the organization as a whole. When a run executes, policies from all applicable sets are collected and sent for execution in the policy check for that run. Which policies are executed, along with each of their enforcement levels, is configured within an HCL-based configuration file in the policy set repository.
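As a sketch of how enforcement levels could gate a run, consider the Python below. It is not Terraform Cloud's implementation. The level names follow Sentinel's three enforcement levels (advisory, soft-mandatory, hard-mandatory); the function name, the tuple layout, and the policy names are hypothetical.

```python
def run_can_proceed(results, admin_override=False):
    """Decide whether a run may continue, given (policy, level, passed) tuples."""
    for _policy, level, passed in results:
        if passed:
            continue
        if level == "hard-mandatory":
            return False  # a failure here can never be overridden
        if level == "soft-mandatory" and not admin_override:
            return False  # a failure here needs an administrator's approval
        # advisory failures are reported but never block the run
    return True


# Hypothetical results collected from all policy sets that apply to a workspace.
results = [
    ("require-tags", "soft-mandatory", False),
    ("prefer-small-instances", "advisory", False),
    ("restrict-regions", "hard-mandatory", True),
]

print(run_can_proceed(results))
print(run_can_proceed(results, admin_override=True))
```

The failed soft-mandatory policy blocks the first call and is overridden in the second. The advisory failure never blocks.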
Being able to test your policies before they get deployed to a policy set is an important part of the development process. Testing policies live can be tedious. We provide the tools to allow you to test policies offline before deployment. Again, we'll discuss testing in detail a little later.
An important part of offline testing is the ability to work with mock data. The data that comes from Terraform Cloud and Enterprise can be complex, and mocking each particular scenario can be hard without access to actual data. To help ease this, we added the capability to generate mock data from any existing Terraform plan. Whether you need an additional data set to work with or data to account for a certain scenario, mock generation gives you the confidence that you'll be working with the data you need.
New Sentinel features for Terraform Cloud and Enterprise
The first is the workspace metadata import. This'll complement the current Terraform imports by giving you access to data beyond the config, plan, or state. It includes data specific to the workspace, run, or organization. For example, you can change policy behavior based on the name of the organization or workspace that the run was a part of.
The second is the integration of cost estimation data into Sentinel. Cost estimation is something that we're currently working on and is actually in beta right now. But when it gets into Sentinel, you'll be able to make policy decisions based on the cost of executing a plan.
Lastly, we'll be working to reduce the burden on workspace management by allowing automatic policy set association with workspaces. Through rules defined on a workspace, the policy sets will be automatically associated with workspaces as they're added—or even as the rules change on a policy set itself.
Sentinel in Vault
Sentinel support in Vault Enterprise complements the existing ACL system with the addition of two new types of policies. Role governing policies—or RGPs—are tied to particular tokens, identity entities, or identity groups. Endpoint governing policies—or EGPs—are tied to paths instead of tokens. During a specific request in Vault Enterprise, all three policy types are evaluated, starting with conventional ACLs and then moving on into the Sentinel RGPs and EGPs in that order.
The first two policy types are only evaluated if the request is authenticated. This means Sentinel EGPs are the only way to perform policy checks independent of authentication data.
Here, we have an example of an EGP demonstrating the instant invalidation of all tokens that have been created before a specific time. This can be used as a kill switch in the event of a large-scale compromise to ensure that no more tokens are allowed—buying time for forensic analysis and targeted remediation.
Similar measures in the conventional ACL system would require a policy that is attached to every token and can be modified, and it still wouldn't apply to otherwise unauthenticated paths.
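The kill-switch check described above comes down to one time comparison. Here is a Python sketch of that logic, not the Sentinel EGP from the slide. The cutoff date and the function name are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical cutoff: tokens created before this instant are rejected.
CUTOFF = datetime(2019, 7, 1, tzinfo=timezone.utc)


def token_allowed(token_created_at, cutoff=CUTOFF):
    """Allow a request only if its token was created at or after the cutoff."""
    return token_created_at >= cutoff


old_token = datetime(2019, 6, 30, 23, 59, tzinfo=timezone.utc)
new_token = datetime(2019, 7, 2, 8, 0, tzinfo=timezone.utc)

print(token_allowed(old_token))
print(token_allowed(new_token))
```

Because the rule is attached to a path rather than to individual tokens, changing the single cutoff value would invalidate every older token at once.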
Sentinel in Nomad
Sentinel policies in Nomad Enterprise allow you to hook into the job management process during operations such as job creation or modification. Access to the job structure is given, allowing deep introspection into any job configuration file. In addition to this, Nomad Enterprise fully supports soft mandatory policies.
When the appropriate capability is assigned to a user, job submissions that would normally fail can be forced to proceed by supplying the policy override flag in the Nomad CLI, or via the equivalent parameter in the specific API request.
Here, we have a policy to ensure that all task groups in the job use the Docker driver, blocking any other type of job. This would mean that all the jobs that run with this policy have to be Docker containers.
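A Python sketch of the driver check, not the Sentinel policy shown in the talk, could look like the following. The job dictionary's field names are assumptions for illustration and may not match Nomad's actual job structure.

```python
def all_tasks_use_docker(job):
    """Return True only if every task in every task group uses the docker driver."""
    return all(
        task["driver"] == "docker"
        for group in job["task_groups"]
        for task in group["tasks"]
    )


# Hypothetical job structures.
docker_job = {
    "task_groups": [
        {"tasks": [{"name": "web", "driver": "docker"}]},
        {"tasks": [{"name": "cache", "driver": "docker"}]},
    ]
}
mixed_job = {
    "task_groups": [
        {"tasks": [{"name": "web", "driver": "docker"}]},
        {"tasks": [{"name": "batch", "driver": "exec"}]},
    ]
}

print(all_tasks_use_docker(docker_job))
print(all_tasks_use_docker(mixed_job))
```

A single task using another driver is enough to fail the whole job.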
Sentinel in Consul
Sentinel support in Consul Enterprise extends the standard ACL system for the key-value store past the standard read, write, and deny policies. This allows full conditional logic and access to the data that's being written.
Here, we have a policy that checks the key that's being written to, and it validates the input. If we're storing the port, it must be an integer. If we're storing the name, it must be a word: a contiguous series of Unicode characters, digits, and connectors.
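The validation described above can be sketched in Python as follows. This is not the Sentinel policy from the slide. The key suffixes /port and /name are assumptions, and Python's \w is only an approximation of the "word" definition given in the talk.

```python
import re

INTEGER = re.compile(r"[0-9]+")
WORD = re.compile(r"\w+")  # letters, digits, and underscore in Python


def write_allowed(key, value):
    """Validate a value based on the key it is being written to."""
    if key.endswith("/port"):
        return INTEGER.fullmatch(value) is not None
    if key.endswith("/name"):
        return WORD.fullmatch(value) is not None
    return True  # other keys are not restricted in this sketch


print(write_allowed("service/web/port", "8080"))
print(write_allowed("service/web/port", "eighty"))
print(write_allowed("service/web/name", "web server"))
```

Using fullmatch rather than search means the whole value must satisfy the pattern, so a name containing a space is rejected.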
The Sentinel Simulator packages the entire core runtime into a CLI tool that assists in developing and testing Sentinel policies. Whether you're an existing Enterprise customer or a practitioner looking to give Sentinel a try, the Simulator is an important part of your workflow. Using the Sentinel Simulator, you can try out sample policies with the apply subcommand to see if they pass or fail.
Here, we try a simple policy that tests the time standard import. As the Simulator comes with the standard library, it's great for these kinds of things. You can use it to play with the standard library all you want.
A simulator wouldn't be entirely effective if it could only use the standard library. As such, the Simulator has capabilities that allow you to simulate the environment seen in HashiCorp's products. Here, we see a basic example that verifies a simulated import.
The data is supplied as a static object within the JSON configuration file. In addition to being able to supply mocks of static data, the Sentinel Simulator provides the ability to supply mocks of Sentinel Code. This allows you to mock data that JSON can't represent or other features—such as functions.
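The difference between the two kinds of mock can be sketched in Python, with the caveat that this is not how the Simulator is configured. The policy function, the time stand-in, and the is_weekend helper are all hypothetical.

```python
from types import SimpleNamespace


def policy(imports):
    """Hypothetical rule: only allow activity during business hours."""
    now = imports["time"].now
    return 9 <= now.hour < 17


# Static mock: plain data, the kind of thing a JSON file can express.
static_time = SimpleNamespace(now=SimpleNamespace(hour=10))

# Code mock: also exposes a function, which JSON cannot represent.
code_time = SimpleNamespace(
    now=SimpleNamespace(hour=20),
    is_weekend=lambda day: day in {"saturday", "sunday"},
)

print(policy({"time": static_time}))
print(policy({"time": code_time}))
print(code_time.is_weekend("sunday"))
```

The policy itself does not change between the two cases. Only the object standing in for the import does.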
Here's an example utilizing the previously mentioned mock data supplied from Terraform Cloud and Enterprise. We don't show you the contents of the mock file because they're pretty large. Instead of the object, though, you'll notice that the path of the mock data file is supplied. Utilizing this more powerful mocking feature is crucial to the accurate mocking of Terraform data and the entirety of the import.
The Sentinel Simulator can be used to conduct sophisticated testing comprising multiple policies and tests across the repository. Here, we've laid out an example repository with two policies. Within the repository, you can see the test directory, with two subdirectories under it where we place the tests. Here, we have a passing and a failing test for each of the policies shown.
Each configuration file can not only assert different results for specific rules but can also contain different configurations for mock data. This ensures you can test your policies against as many scenarios as needed to be assured that your policies will behave as expected. This is a passing test, with the import mocked as the policy expects and an assertion that the main rule will return a true result. For a more sophisticated policy, you'd want to add all the rules and their assertions. As mentioned, you can mock the data as you see fit for each test case.
This is the failing test, where we've adjusted the mock data so that the policy will fail. We expect this, which you can see is asserted by a false result for the main rule. To run the entire test suite, you just run sentinel test. This'll run the tests for all the policies that are found in the working directory.
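The pass/fail test pattern can be illustrated with a small Python sketch. It is not the Simulator's test format. The policy, the case names, and the run_suite helper are hypothetical. The point is that a "failing" case is still a successful test when the false result is the one asserted.

```python
def policy(mock):
    """Return the result of each rule, keyed by rule name."""
    return {"main": bool(mock["resource"].get("tags"))}


# Each case pairs its own mock data with the rule results it expects.
test_cases = {
    "pass": {
        "mock": {"resource": {"tags": {"owner": "ops"}}},
        "expect": {"main": True},
    },
    "fail": {
        "mock": {"resource": {"tags": {}}},
        "expect": {"main": False},
    },
}


def run_suite(policy_fn, cases):
    """A case succeeds when the policy's rule results match its expectations."""
    return {
        name: policy_fn(case["mock"]) == case["expect"]
        for name, case in cases.items()
    }


suite_results = run_suite(policy, test_cases)
print(suite_results)
```

Both cases succeed here: the first because the main rule is true as expected, the second because it is false as expected.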
Testing not only against multiple cases but against multiple policies at once in a single repository is especially useful when working with things such as policy sets in Terraform Cloud and Enterprise. The Sentinel Simulator is free to use and available on the Sentinel website. It's available on all major platforms, including Windows, macOS, and Linux.
If you'd like to learn more about Sentinel, you can read all about it on our documentation page. If you'd like to know more at the conference, feel free to come by one of our booths tomorrow. If you're interested in getting a more in-depth demo with Sentinel on any HashiCorp product, you can contact sales.
I'm Chris Marchesi, Engineering Lead for Sentinel, and it's been my pleasure to present this to you today. Thank you.