Hearing a lot about "Zero Trust"? Learn how Boundary was designed from the ground up for differentiated security, simplicity, and extensibility. Get deep dive details on how Boundary works and what its architecture looks like.
Speakers: Jim Lambert, Jeff Mitchell
Hi, I'm Jeff Mitchell. I'll be giving this talk with Jim Lambert. We're both engineers at HashiCorp working on Boundary.
We're going to talk today about the underlying principles that drove Boundary's design and what makes Boundary different. What did we think about when designing it?
Then Jim's going to talk about how we translated those into Boundary's core architecture.
There are a lot of entrants in this market, especially startups, but we thought we could bring our experience with the types of customers we serve to bear, and that it would speak to a lot of them.
We put a lot of careful design into Boundary, and hopefully you'll see what we mean during this talk.
A lot of core HashiCorp and especially Vault design principles carry over. This was a good time for us to take a step back and think, "5 years after Vault, what lessons have we learned? What kind of principles can we apply and pull in not just from Vault but the rest of the company as well?"
I want to talk about 3 of the principles that we had in mind:
Vault clusters run everywhere. Home hackers run them on Raspberry Pis, and there are single clusters with many nodes that are processing trillions of transactions per year.
Boundary needs to scale up and down, similarly. We want to make this accessible for the home user that is just wanting to get into their home network, but also we want it capable enough for the largest businesses.
The easy way to scale these days, and a very modern way, is to pick a cloud vendor that has lots of services exposed. You plumb together the services with a custom frontend, and you're done.
This works well for a lot of products, but you have to hope that pricing doesn't change and the services aren't discontinued.
It can help a lot of smaller projects get off the ground, but at the scale of the bread and butter HashiCorp customers and our open source users, there are real-world complexities.
Many of our larger customers in particular will be multi-cloud for a long time, and even if they're moving to managed services, they're still likely to run on-premises infrastructure for a long time.
Those managed services may be on various clouds. And that matters because there are things like geopolitical and contractual obligations that require the use of some clouds but not others.
So where your data transits, where it's stored, and where the services are located end up mattering a lot.
We saw with Vault that there were a lot of AWS-centric solutions that didn't flourish, whereas Vault did.
We think part of the reason why is its ability to run anywhere.
On the open source side, if you're a home user, maybe you don't want to sign up with GCP or AWS to run something. You can't necessarily rely on those cloud services.
So scaling must be a core part of the system design, and it applies both to requests-per-second-style scaling and to organizational scaling.
A core Vault focus is accessible, best practices security. We want to do what's secure, not what's easy.
At the same time, we try to remove a lot of the complexity of being secure from the end user.
Boundary follows suit. We build on our experience of Vault to go further.
Here are some examples of the things we've done.
Connections in Boundary between the various components that Jim's going to talk about—controller, worker, and client—are protected via single-use, mutually authenticated TLS 1.3 stacks.
Doing so significantly reduces both ease of compromise and the risk of exposure if compromise occurs.
You can't say, "I'm going to reuse this TLS stack across other sessions." We don't do that.
And we use key-management systems—which can be Vault, or various cloud systems like AWS KMS, GCP KMS, or Azure Key Vault—to provide the root of trust for secure introduction needs.
We protect these TLS credentials as we pass them around so that we can establish that mutual connection.
But, to make it easy, we are doing all that for you. We're doing this modern and best practice security, but you just have to provide us with a KMS, and that's it.
Boundary has role-based access control (RBAC) that applies to every resource within it. We also have useful aggregates like type and wildcards, and anonymous users are controlled the same way.
You can apply RBAC to the anonymous user that hasn't yet authenticated to Boundary.
How do we make this easier for the end user?
One of the principles we adopted is to compose access explicitly from zero: real zero trust.
There's no "deny" capability. You can't grant a bunch of permissions and then remove permissions later on.
We found this reduces a lot of cognitive load. It makes it much easier for people to figure out, "What are the final set of permissions that are applying?"
You don't have to think about, "What did I grant access to?" and then remove specific access.
It prevents accidental granting of excess privilege and reduces cognitive load.
Again, we're taking something that's really complex and making it accessible and trying to reduce the burden on your end users.
As a final example for security, we even looked at things down to the workflows. One way that this panned out in Boundary is that one of the KMSs that you can define can be shared by both an operator and the system itself.
If the operator needs to put passwords or API keys into config files for Boundary, they can encrypt those values in those configuration files.
That allows the config to be checked into Git, put in your CI system, or baked into Amazon Machine Images (AMIs), containers, Lambdas, or what have you, safely, as long as the end system can access that same KMS for decryption at runtime.
We tried to make it easier to be secure, but still simple if you are an operator and looking at the workflows.
HashiCorp was founded on the open source ethos. We believe very deeply in it.
There will be an enterprise version of Boundary at some point, just like our other products, but like our other products, the core capabilities will always be open source.
Why does this matter? Partly, it's philosophy.
We build not just products, but a lot of libraries. We put them out there, lots of people use them and find them valuable for building their own software packages.
And we use a lot of other people's libraries as well.
And we deeply believe that we all benefit when more of this is open source.
It's also very practical. Infrastructure is not homogeneous. We know that very deeply at HashiCorp.
As examples, we're building dynamic host catalogs, which are similar to the static host catalog, where you put in the hosts that Boundary can connect to.
These will pull in hosts directly from cloud providers or whatever, but there are tons of places that can host the machines for you.
We're working on the interfaces, and we'll build initial plugins, but we know that we can't provide every plugin for every user.
Similar for credential stores. We're working on built-in Vault credential sourcing right now but could support plugins in the future.
So having open APIs and open source is something that we see as very important, because we want you to be able to build on what we are putting out there in order to support the things that you need, because we know that things are not homogeneous.
Now Jim's going to talk about core architecture and how he put some of those principles that I talked about into practice when building the product.
Take it away, Jim.
Like Jeff said, I'm going to talk about how we applied our principles to the core architecture.
Boundary's architecture basically is a control plane and a data plane, and the control plane has controllers, and it has KMSs.
As you see in this slide, there are multiple KMSs for different purposes, but generally Boundary has a KMS component, and there's a database. That makes up the control plane.
On the data plane, there are just workers and a KMS. That's the KMS that's shared between the controller, the control plane, and the data plane to do authentication.
The KMS component in the Boundary architecture allows the customer to choose a root of trust. We use that root of trust in a variety of ways in the Boundary architecture.
One way is there's a recovery key, and that recovery key allows you to do rescue and recovery operations, of course, but you can also use that recovery key to authenticate for just about every operation within Boundary.
We also have a root KEK, a key encryption key. From that root KEK, we derive other KEKs for scopes within Boundary.
A scope within Boundary allows you to organize Boundary by projects or organizations.
Each scope will have a KEK and multiple DEKs, data encryption keys. Those DEKs are single-purpose within the scope.
That should give you a sense of how we use that root KEK.
Then we have a worker-auth key, which is the KMS key that's shared between a worker and a controller and allows for the single-use TLS stack authentication that Jeff talked about.
Jeff also talked about the config key, which allows you to encrypt different attributes within the configuration.
The controller component in our architecture, which is part of the control plane, is really where the domain model for the most part is implemented.
You'll see the domain model a little bit in the worker, but primarily it's in the controller. In the controller you'll see things like users and sessions and targets and credentials.
Our controllers are leaderless. That's a little bit different from some other HashiCorp products that use Raft. We don't have Raft running in our protocol. At least not yet.
We're using the database for persistence. Controllers are leaderless, and that allows us to have predictable horizontal scaling.
The controller also does authentication and authorization. It does authentication for users and workers, and authorization for all the users via roles, grants, and principals.
The controller also serves all API requests and our admin UI, and it assigns tasks to the data plane and workers.
The primary focus of the data plane and the worker is to proxy sessions. It will do other jobs that the controller assigns, but primarily it's going to be the proxy sessions.
We're using a relational database under Boundary. That gives us that predictable horizontal scaling of the controller. It also gives us predictable horizontal scaling of the database.
It's where all the domain model is persisted. Currently, we've implemented Postgres as our supported database dialect.
We use domain-driven design within Boundary's architecture, but we also have architectural boundaries, and those architectural boundaries are respected throughout the controller.
You'll never see the services talking directly to the database, or the database going all the way up to the service.
There are layers, and we obey those boundaries as the requests go through the system.
You'll also find, because it's a domain-driven design, ubiquitous language of the domain throughout all these layers.
You'll also notice that there are Protobufs, not only at the service layer, but at the data storage layer.
Once you start down this domain-driven design in Boundary, you start to see the domain everywhere. It becomes part of your language, this ubiquitous language of the domain.
What's ubiquitous language? Ubiquitous language is where you strive to have clear concepts and terms without ambiguity across a limited domain.
This domain we're talking about is Boundary. You use those terms and concepts, whether you're talking to engineers or designers or product managers, whether you're writing RFC or PRD or code or test.
Wherever you're using these terms, you need to have them consistent and unambiguous.
Of course, we're always searching for new patterns and terms within the Boundary domain, as we develop it.
We use Protobufs for more than just API. We use them for the API, of course, but we also use them to generate our CLI. We use them to generate our Go SDK. We use them for open API integration, we also define all our database types in Protobufs.
Those database types also have some extra tags that we put in those Protobufs to define primary keys and the column names, if they need to be overridden, perhaps defaults for the columns.
We use Protobufs quite a bit within Boundary. Once we have those Protobufs, especially at the storage layer, we can use them as we go down the architecture into lower layers, like the infrastructure layer.
The infrastructure layer of Boundary is composed mainly of 2 parts. There are a few more things at the infrastructure layer down at that lower level of Boundary before you get to the relational database.
But primarily we have a database API, which provides retriable transactions and all the CRUD operations you would need on the database, like query and lookup.
It also supports something called an "operational log," or an "oplog."
In Boundary, an oplog is an ordered history of every change that goes into the database.
That's part of our infrastructure as well. If you ask the API to do a database operation through our database API, it'll generate an oplog for you and store it in Boundary's database.
Each entry in the oplog is encrypted, and it contains the serialized Protobufs that were necessary to make that change.
The entries are ordered, which means we had to serialize writes somehow. We used optimistic locking, which is a well-known pattern for providing that in a relational database.
Boundary has lots of related data. And lots of related data screams that you want to use an RDBMS, because it makes it easy to query and bring all that related data together quickly when you need to.
An example of that is when you authorize or create a session in Boundary. When you do that, you have to bring data together from targets, hosts, host sets, host catalogs, auth methods, users, user tokens, servers, grants, roles, and principals.
All this related data has to come together very quickly, basically instantaneously, so that you can create or authorize a session.
This is just one of the many use cases in the Boundary domain that maps really well to a relational database.
How else do we use a relational database?
We also rely on a relational database to ensure the accuracy and consistency of the domain data.
We define primary key and foreign key relationships, with "no orphan" rules, and with normalization and constraints and a few triggers and of course transactions.
We use all these things because one of our primary database tenets is "the database should not rely on the application layer to maintain its accuracy and consistency."
We use this defense-in-depth kind of strategy across the infrastructure and the layered architecture, where every layer will perform validation and integrity checks.
The database schema at the lowest level is that last line of defense to ensure the accuracy and consistency of the data.
When you get down to the relational database in Boundary, of course, there's a data model, there's a schema there.
This slide is just a small sliver of that data model. It's basically the OIDC auth method that we just released with some of the other related data, like the KMSs, and the scopes that are related to that auth method.
If you look really closely at this small picture, you'll see on the far right, there's a base type of auth method, and just a bit left of center you'll find a subtype of an OIDC auth method.
This is just one example where you'll see subtypes and base types modeled in the Boundary domain and the relational model.
Of course, we already have a password auth method, so that's another subtype in this model.
Boundary currently has about 100 tables, maybe a few more, and has roughly 150 foreign key relationships defined and lots of constraints.
When I spout off that statistic, people get nervous. They're like, "That's a lot of complexity in the database."
Well, the complexity really isn't in the database. The complexity is in the Boundary domain. The complexities are there already. So all we're doing is using the constraints or the features of the relational database to define very small individual constraints.
For example, when we define a foreign key relationship between the user and the accounts of that user, and we define a "no orphans" rule, that allows us to know for sure that if a user is deleted, all the accounts for that user are deleted as well, and the database will ensure that integrity.
That's one example of how you can limit that complexity and see that it's really just a bunch of small little things that have one single purpose that build up to protect the database.
Boundary's best kept secret is the data warehouse. It's not really a secret, because if you look at our source code, if you look at our data model, you'll see there's a data warehouse there.
In our data warehouse, you'll find facts for sessions and facts for all the connections that make up a session. You'll find dimensions for users and hosts and some dimensions for date and time stamps as well.
And using a BI tool that you are familiar with, let's say Tableau, you could point Tableau at Boundary's data warehouse, and you could start to explore patterns within access to different hosts in Boundary.
You could start to understand whether somebody is consistently making connections in off-hours, which aren't the normal business pattern.
This data warehouse is a feature we provide right out of the box with Boundary.
We have plans to enhance it, and we'll document it.
As we enhance it and add bytes up and bytes down for every connection, you'll be able to look even for data exfiltration patterns, possibly, by using this data warehouse that's part of Boundary.
You won't have to ETL the data, design your own warehouse, or understand all of Boundary's related tables to build one. It's already in Boundary.
That's all we have time for today. Now I'll give it back to Jeff.
Thank you very much, Jim. And thanks, everyone, for attending our talk.