Securely share secrets in your pipeline, at scale
Jun 26, 2018
Learn how Groupe Renault moved from an ad hoc way of managing secrets to a more comprehensive, automated, scalable system to support its DevOps workflow.
Groupe Renault uses a hybrid-cloud infrastructure, combining Amazon Web Services with on-premises OpenStack and vSphere.
At the start, Renault was keeping secrets in unencrypted text files, or even in Excel files. Obviously, this wasn't secure and didn't scale.
The company moved to using:
- HashiCorp Vault to centrally manage all secrets, globally
- Consul for storage
- Terraform for policy provisioning
- GitLab for version control
- RADIUS for strong authentication
In this video, from HashiDays 2018 in Amsterdam, Mehdi and Julien explain how they achieved scalable security at Renault, using the HashiCorp stack.
Mehdi Laruelle, Automation and Cloud Consultant, D2SI
Julien Arsonneau, DevOps Engineer, Groupe Renault
Julien: So, I'm Julien Arsonneau. I'm a DevOps engineer at Renault, and I work with Alliance IT Services, which partners with Nissan and Mitsubishi. I think everybody knows Renault. We work on the electrification of all vehicles, we also work on connectivity, and we have started some work on autonomous driving. So I'll let Mehdi...
Mehdi: Thank you very much, Julien. So hi everybody, my name is Mehdi Laruelle. I'm from D2SI, a consulting agency. As you can hear, and sorry for my accent, I'm French. D2SI works with large companies. We focus on cloud and automation, and we run training courses for our clients; we are an AWS training partner. We like the community side too, so we are proud to be a co-founder of the HashiCorp User Group in Paris, and that's cool. Most important for us is the experimental part: we work a lot in the field, we gather feedback and experience, and we use the technology. Our main partners are HashiCorp, Docker, and AWS. So it's cool, and we help our clients use this technology in the field. Let's set some context; I'll let Julien do that.
» Renault's tooling and infrastructure
Julien: Today we will explain how we use Vault in the lifecycle of our projects. The main point is that we want a global solution for sharing every secret. Before, funnily enough, the ops shared Excel files containing all our secrets, for example for the databases, or we had plain text files with all our secrets.
That's just not acceptable, so we use Vault as a global solution to manage every secret, both during the pipeline for the projects and for ops during deployment. For the moment we use three kinds of authentication: AppRole, which we will explain later; RADIUS, which is the authentication for ops; and LDAP for the project teams.
We work on multiple environments: public cloud with AWS, private cloud with OpenStack, and Swarm for the internal part. So we really want secret management on each part. For the pipeline, we use two orchestrators: GitLab CI and Jenkins. And at the end, for the running application, we use ECS on AWS, and on-premises we use Swarm as a Docker orchestrator.
As I said before, we really want no secrets in the SCM. For example, in the user data, if you use Terraform, you may have a token to register a GitLab runner or something like that. During the presentation you will see how we onboard a project and how we manage the policies for that project. Now I'll let Mehdi explain the architecture that we have implemented.
» Renault's architecture
Mehdi: Thank you very much. For the architecture, we follow HashiCorp's best practices; you can find this diagram in the official Vault documentation. We use three Vault servers for high availability, with Consul as the storage backend. This Consul cluster is used only for storage. We have another cluster for extra features like DNS, discovery, etc.
For the Consul part, we use five nodes, and each of the three Vault servers has a Consul agent to communicate with the Consul cluster. We apply all the best practices for security: TLS encryption, the gossip encryption key, access control lists, etc.
» The ops lifecycle with Vault
Mehdi: Today we will show you a project lifecycle illustrating how ops can use Vault on their side. We will start with the provisioning part, as an ops. The second part is updates from tools: storage admins, for example, have tools that generate the storage part, and we provide scripts to help them update the secrets. The third part is the human part: sometimes the ops need to update or create secrets, so we give them access. And the last part, but the most important, is the pipeline part.
Mehdi: Let's start with the provisioning part, where we have three actors. The first one is the ops, the operator. The operator needs to authenticate to Vault with RADIUS. Why do we use RADIUS? For strong authentication. Ops have special policies to create or update secrets, because in the Renault environment some ops need to provision environments and create secrets. On top of that, ops have a policy with a single purpose: creating other policies on the project side.
The second one is the project. The project uses a specific authentication method called AppRole. AppRole needs two pieces of information to obtain a token: the first is the role ID, the second is the secret ID. We will see later what AppRole is in more detail. For the first one, the operator sends the role ID to the project.
The second one, the secret ID, is provided by the last actor, the orchestrator, which is Jenkins or GitLab inside the pipeline. Its authentication is very simple, just a token, and it has a simple policy that only allows creating the secret ID. So let's look back at the project.
The role ID is provided by ops, and the secret ID is provided by the pipeline, the orchestrator's job. This is segregation of duty, for security purposes: only the project can have both pieces of information. That is very important. With both pieces of information the project can create the token, and that token has one policy.
We have two types of token: one for production environments and one for non-production environments. Let's see how we, as ops, provision this environment for a project.
First, the project sends a request to us. We receive it by email or through the catalog. Second, we look at the specific usage and needs of the project. Then we write a Terraform file, because we use Terraform for policy provisioning. It's the Vault provider, if you are interested.
So we write a Terraform file and adjust the policy to the needs of the project. After that, we push the Terraform file to GitLab, and a specific GitLab pipeline runs Terraform plan and apply with this policy, so everything is automated. After that, the ops authorize the pipeline, the orchestrator, to give the secret ID to the project.
For a specific project, only GitLab CI is authorized to provide the secret ID. It's a security measure: if the project wants a pipeline on Jenkins, it cannot retrieve the secret ID there. After that, we send the project some information on how to retrieve the role ID. We use a specific wrap.
What is a wrap for? Some people don't know that in Vault, a wrap is a single-use envelope. Once you have received and opened it, the information cannot be retrieved a second time: you receive a wrap, you unwrap it, and that's all. You cannot unwrap it a second time. This is very useful because if someone else tries to retrieve or use this information, they cannot reuse it.
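To make the single-use behaviour concrete, here is a minimal Python sketch. It is a toy in-memory model written for this transcript, not Vault's implementation and not code from the talk; the class name `WrapStore` and the sample role ID are hypothetical.

```python
import secrets
import time


class WrapStore:
    """Toy in-memory model of a single-use envelope (not Vault's implementation)."""

    def __init__(self):
        self._wrapped = {}

    def wrap(self, payload, ttl_seconds):
        # Hand back an opaque token instead of the payload itself.
        token = secrets.token_urlsafe(16)
        self._wrapped[token] = (payload, time.monotonic() + ttl_seconds)
        return token

    def unwrap(self, token):
        # pop() removes the entry, so a second unwrap of the same token fails.
        entry = self._wrapped.pop(token, None)
        if entry is None:
            raise KeyError("wrapping token is invalid or was already used")
        payload, expires_at = entry
        if time.monotonic() > expires_at:
            raise KeyError("wrapping token has expired")
        return payload


store = WrapStore()
token = store.wrap({"role_id": "example-role-id"}, ttl_seconds=60)
first = store.unwrap(token)  # the first unwrap succeeds
try:
    store.unwrap(token)      # the second attempt is rejected
    reused = True
except KeyError:
    reused = False
```

One consequence of this design is that a failed unwrap on the recipient's side is itself a signal: it suggests the envelope was already opened by someone else or has expired.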
After that, we send a specific graph to the project. It is a policy graph, so you can see the paths you have access to for read or write. The path shown here is a development path. You see only one path here; from the project's perspective there are two other paths, but they belong to another token. We have two tokens, one for production and one for non-production, and this is an example path for the non-production token.
Let's see a little demonstration; it's not really a demonstration, more of an example. Let's start with the first step. The project makes a request with a lot of information. Here it is a test project. The test project wants an environment in infra, on-premises, for the dev, non-production environment, and only for GitLab CI usage. It wants all the paths listed on the left, and it wants to add one extra path just for the project's needs.
So we create a Terraform file with 'make new'. Here you have an example GIF. We have four commands, just make, cd, and then vim on the policy. It's pretty simple, as you can see. This is a policy example for the dev environment. The project has one specific need, a specific path, so as you can see we add just one extra path, for update and create. That's good for the project, and it's very easy.
After that we have the new path. We have some granularity on the project policy, and it's very easy: there are variables with default values, and a Terraform tfvars file dedicated to changing these default values for the project's needs. If you want the duration of a token to be two hours, or maybe more, you just add this value on the role and set the value you want.
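The policy file itself is not reproduced in the transcript. As an illustration of the kind of document being edited, here is a minimal Python sketch that renders Vault policy text from a mapping of paths to capabilities. The team provisions policies with Terraform; Python is used here only to show the shape of the result. The project name, the paths, and the helper `render_policy` are hypothetical.

```python
def render_policy(rules):
    """Render {path: [capabilities]} as Vault policy text (HCL)."""
    blocks = []
    for path, capabilities in sorted(rules.items()):
        caps = ", ".join('"{}"'.format(c) for c in capabilities)
        blocks.append('path "{}" {{\n  capabilities = [{}]\n}}'.format(path, caps))
    return "\n\n".join(blocks) + "\n"


# Hypothetical paths for a non-production token of a test project.
rules = {
    "secret/test-project/dev/*": ["read", "list"],
    # The one extra path requested by the project, writable by the project.
    "secret/test-project/dev/extra/*": ["create", "update"],
}

policy_text = render_policy(rules)
print(policy_text)
```

Keeping the rules as data makes the per-project adjustment a small, reviewable change, which fits the push-to-GitLab workflow described in the talk.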
After that, we just git commit and git push to GitLab. Very simple. And this is the pipeline view. We have a test stage here, just for testing the script and the environment to deploy to. The verify stage is the most important, because it checks each policy. For example, we have some paths that must be accessible and some that must not be, and this stage verifies exactly that against the policy. We have a status stage; for people who know Vault status, it just checks that the environment is up. And plan and apply are at the bottom.
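A verify stage of this kind can be thought of as a list of expectations checked against the policy. The sketch below is a simplified model, not the team's script: it assumes an exact path wins, otherwise the longest prefix ending in `*` applies, and it ignores other Vault policy features. A real check would query Vault itself. All paths and the function `is_allowed` are hypothetical.

```python
def is_allowed(rules, path, capability):
    """Simplified check: an exact path wins, otherwise the longest '*' prefix."""
    if path in rules:
        capabilities = rules[path]
    else:
        prefixes = [p for p in rules if p.endswith("*") and path.startswith(p[:-1])]
        if not prefixes:
            return False  # no rule matches, so access is denied by default
        capabilities = rules[max(prefixes, key=len)]
    return "deny" not in capabilities and capability in capabilities


RULES = {
    "secret/test-project/dev/*": ["read", "list"],
    "secret/test-project/dev/extra/*": ["create", "update"],
}

# (path, capability, expected result): both access and non-access cases.
CHECKS = [
    ("secret/test-project/dev/db", "read", True),
    ("secret/test-project/dev/db", "update", False),
    ("secret/test-project/dev/extra/key", "create", True),
    ("secret/test-project/prod/db", "read", False),
]

failures = [c for c in CHECKS if is_allowed(RULES, c[0], c[1]) != c[2]]
print("policy check failures:", failures)
```

Listing the non-access cases explicitly matters as much as the access cases, because a policy that is too broad would otherwise pass unnoticed.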
Then comes one of the most important stages, the last job: the token revoke, of course. Here I put in my ops token, with a lifetime of 40 minutes, and after this job there are a number of stages dedicated to revoking this token. After that, we authorize the orchestrator. Here I run the new command, and again we can see a dedicated policy that only allows printing secret IDs for GitLab CI in this project. At the end, we send an email with the graph and some information.
Here, the token TTL is two hours, and again we explain how to retrieve the role ID, because it is up to ops to send the role ID. So that's the provisioning part.
Mehdi: Let's talk about how to update or create secrets from other roles. Some people on the project side, like project owners, DBAs, or storage admins, use tools to generate environments. So it's very important to integrate all these tools with Vault, so that they can create or update secrets from there. We send a script to each team to integrate with their tools.
It's a script dedicated to Vault, just to create a secret. The first part of the script uses AppRole to authenticate, like a project. Then there are specific policies to create or update secrets on a specific path; for example, DBAs only have access to the DB path. And the last part is, of course, updating the secret inside Vault. I'll let Julien do the rest of the presentation.
Julien: Now you have seen how we onboard a project with its project paths, using a tool like Terraform to construct the various paths. We put all secrets inside Vault, but there are some secrets that we can't manage as ops. So we let the product owner, DBA, storage admin, or anyone else who has access to these secrets use the UI when they have no way to call a script against Vault. With the UI they can manage every secret that may be used in dev, acceptance, and production.
We can have the same paths everywhere. For the connection we use LDAP. We also provide the development team with templates, for example for generating an SSH key that is created and stored in Vault. So we have templates for this, and we share them with the project.
It's a really quick demonstration of the UI: the user connects, then creates the specific path that is allowed by the policy we created during onboarding. After that, they can put in the key-value pairs, because Vault works with key-value secrets. So on each path, you can have specific key-values.
So here the new path is set. And next, we have an example script that we provide. Never mind the names inside; the point is how to launch the script and store all the secrets inside Vault. After that, it's just the log of the script.
The next step for the project: there are jobs dedicated to regenerating the visuals of all the paths we have in Vault. Here we see that a human has added specific keys on a path, so we know the visual has to be regenerated. This is done with a GitLab CI or Jenkins job. The team receives, by email for the moment, a visual schema of all the paths. We can then give this to the dev team so they can see, "Ah, okay, my credential is stored here."
And since we have the same architecture in each environment, we can manage that. It's really useful for communication between the ops and the dev team.
Julien: First, a quick introduction. The code on the slide links to the HashiCorp website page about AppRole. Just to explain how AppRole works: the admin provides a role ID and a role name. With this we can generate a new secret ID. After that, we deliver the role ID and secret ID to the application, and with these two components the application can authenticate to Vault, through the API or the binary, to generate a new token.
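As a rough sketch of the API route mentioned above, the Python below builds (without sending) an AppRole login request against Vault's HTTP API and extracts the client token from a response body. It assumes the AppRole method is mounted at the default `approle` path. The Vault address, the sample IDs, and the sample response values are hypothetical.

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.internal:8200"  # hypothetical address


def build_approle_login_request(role_id, secret_id):
    """Build (without sending) the AppRole login call; both IDs are required."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode("utf-8")
    return urllib.request.Request(
        url=VAULT_ADDR + "/v1/auth/approle/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_client_token(response_body):
    """Pull the client token out of a login response body."""
    return json.loads(response_body)["auth"]["client_token"]


login_request = build_approle_login_request("example-role-id", "example-secret-id")

# Hypothetical response, shaped like a Vault login response.
sample_response = json.dumps(
    {"auth": {"client_token": "example-token", "lease_duration": 7200}}
)
client_token = extract_client_token(sample_response)
```

Sending the request with `urllib.request.urlopen(login_request)` would return the real response; it is left out here so the sketch runs without a Vault server.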
With this token, the application can retrieve all its secrets from Vault for the specific environment. Now, to come back to the project pipeline: here we assume that the operator has sent a wrap with the role ID and role name to the project. When the project receives these values, the team defines them in the CI, since we can manage secrets inside GitLab CI.
Here is the screen for secret variables in GitLab. The team has put the role ID in a secret variable. After that, the project team launches a job, which triggers the build of the image. The specific script here is getSecretID. getSecretID is a script that we build in-house and share on all GitLab runners and Jenkins slaves.
Here is an example gitlab-ci.yml that uses this script. We call the script with a role name, and getSecretID contains an orchestrator token, as Mehdi explained before. This orchestrator token has a policy that only allows getting secret IDs for this role name. This is really mandatory in order to have segregation between Jenkins and GitLab. So if the team calls getSecretID with the role name of this project, getSecretID authenticates with the orchestrator token, Vault delivers a wrap containing a secret ID to getSecretID, and getSecretID exports it in variables.
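The getSecretID script is not shown in the transcript. The Python below is only a sketch of the two Vault HTTP API calls such a tool could rely on: requesting a secret ID with a wrap TTL header, then unwrapping it with the returned wrapping token. It assumes AppRole is mounted at the default `approle` path; the address, role name, tokens, and TTL are hypothetical, and the requests are built but not sent.

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.internal:8200"  # hypothetical address


def build_secret_id_request(orchestrator_token, role_name, wrap_ttl="60s"):
    """Ask for a new secret ID, delivered inside a response wrap."""
    return urllib.request.Request(
        url="{}/v1/auth/approle/role/{}/secret-id".format(VAULT_ADDR, role_name),
        headers={
            "X-Vault-Token": orchestrator_token,
            "X-Vault-Wrap-TTL": wrap_ttl,  # makes Vault return a wrapping token
        },
        method="POST",
    )


def extract_wrapping_token(response_body):
    """A wrapped response carries the single-use token under wrap_info."""
    return json.loads(response_body)["wrap_info"]["token"]


def build_unwrap_request(wrapping_token):
    """The wrapping token itself authenticates the single unwrap call."""
    return urllib.request.Request(
        url=VAULT_ADDR + "/v1/sys/wrapping/unwrap",
        headers={"X-Vault-Token": wrapping_token},
        method="POST",
    )


request = build_secret_id_request("example-orchestrator-token", "test-project-dev")
sample_response = json.dumps({"wrap_info": {"token": "example-wrapping-token", "ttl": 60}})
wrapping_token = extract_wrapping_token(sample_response)
unwrap_request = build_unwrap_request(wrapping_token)
```

Because the orchestrator token's policy covers only this one endpoint for this one role, a leaked getSecretID can mint secret IDs but cannot read secrets, which still require the role ID held by the project.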
After that you call the orchestrator; in this case, it's Swarm. As I explained for AppRole, we define the role ID and secret ID inside the deployment stack. Then, when the container starts, at the Docker entry point, we provide some scripts to help the team use the role ID and secret ID to authenticate to Vault and get all their secrets.
Inside, it's a basic command with the Vault binary: we log in with the role ID and secret ID, and we get back the token. With this token we can retrieve all the secrets inside the container, and the application can start. This is the way we manage secrets with GitLab CI and Jenkins.
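The entry point scripts are not included in the transcript. As one possible shape for the last step, the Python sketch below turns the body of a key-value read into environment variables for the application. It assumes a version 1 key-value response, where the pairs sit directly under `data`; the `APP_` prefix, the key names, and the sample values are hypothetical.

```python
import json


def extract_key_values(read_response_body):
    """Return the key-value pairs from a KV version 1 read response."""
    return json.loads(read_response_body)["data"]


def to_environment(values, prefix="APP_"):
    """Map secret keys to upper-case environment variable names."""
    return {prefix + key.upper(): str(value) for key, value in values.items()}


# Hypothetical response body for a read on the project's dev path.
sample_response = json.dumps(
    {"data": {"db_user": "app", "db_password": "example-only"}}
)
environment = to_environment(extract_key_values(sample_response))
print(sorted(environment))
```

An entry point could merge this mapping into the process environment before starting the application, so the secrets exist only inside the running container.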
Now a quick introduction to how getSecretID is built. We have two methods. It's mandatory to rotate this script, because it contains an orchestrator token. So we have a cron job for that, and every day we regenerate a new getSecretID for each environment. In addition, specific ops can regenerate one manually, because maybe it's corrupted or, I don't know, for some other reason.
We build these binaries with either an ops token or AppRole. AppRole is really helpful for cron jobs, because AppRole is for machine-to-machine: when it's a cron job, we use AppRole; when it's ops, we use ops tokens. So we ask Vault, "Please give me a new orchestrator token for GitLab." Vault generates a new orchestrator token, we build the binaries, and we store them in an S3 bucket. Then we share the scripts in each environment, on the GitLab runners or Jenkins. This is how we implemented the sharing. Each runner and Jenkins slave runs on Docker, so getSecretID is on the host and mounted inside. Now all jobs can access this script directly from GitLab CI and call it.
Mehdi: It's very important for us to separate the role ID, on the ops side, from the secret ID, on the pipeline side, like GitLab or Jenkins, for security purposes. As you see here, only a pipeline, a specific pipeline, can provide a secret ID, and only ops is authorized to send the role ID. It's segregation of duty: an ops cannot generate a secret ID. So only the project side can use the token, which is why it is very important for us to provide the environment this way.
Julien: For the moment, we use this kind of authentication with AppRole because, with Swarm as our orchestrator, we can't do what you can with Kubernetes. Kubernetes is directly connected to Vault, so you can allow a namespace to get all its secrets, but for specific usages you can still use AppRole. Here it's a really specific usage, but for ECS and Swarm, we use this kind of authentication to get all secrets. And with this kind of secret, for example if you work with AWS or GCP, you can work directly with an IAM role, for the RDS part for example. You can generate this and put the secret directly in Vault. Then the application can retrieve all the secrets inside.
Mehdi: That's all from us. Thank you very much, everyone, and see you next time. Thank you.