Orchestration to Delivery: Integrating GitLab with HashiCorp

This session will discuss how GitLab integrates with HashiCorp Terraform, Vault, and Consul.

GitLab enables the efficient implementation of GitOps and infrastructure-as-code strategies with tools like HashiCorp Terraform, allowing organizations to efficiently and continuously roll out updates to infrastructure with shorter time to recovery (TTR). It does this by enabling teams to leverage several other integrations:

- HashiCorp Vault to securely manage secrets during CI job execution

- HashiCorp Packer to build images

- HashiCorp Waypoint for application delivery

The joint solutions developed by HashiCorp and GitLab are helping organizations find a better way to keep application development, delivery, and infrastructure management workflows in lockstep.

Speaker: Abubakar Siddiq Ango

_Unedited Transcript_

Hi, I'm Abubakar, and in this session, I'll be sharing with you how GitLab, the product, integrates with HashiCorp technologies like Terraform, Vault, and Consul to enable enterprises to maintain secure and highly available declarative systems. But the main focus of this talk will be how GitLab, the company, uses these same technologies, that is Terraform and Consul, to deliver great service to users of our SaaS offering of the GitLab product. But first, introductions: I am Abubakar Siddiq Ango, a developer evangelism program manager at GitLab, speaking to you from The Hague. I'm also a HashiCorp ambassador. GitLab is an all-remote company with over 1,000 employees in 67 countries. We pride ourselves as the first... well, probably not the first, but an all-remote company even before the pandemic started. And we create an all-in-one solution for the entire DevOps life cycle, taking enterprises from idea to production and back, all the way, in a single product.

But we don't do this alone. We work with other partners through integrations with best-in-class technologies, like the HashiCorp suite of technologies. We ensure excellence for our enterprise customers and the users of our SaaS offering by making sure they can leverage awesome technologies like Consul, Terraform, Vault, Packer, and Waypoint to deliver on their products. Enterprise customers who deploy GitLab in-house are able to ensure high availability with Consul, especially when their user counts exceed 5,000. Terraform is the main component of GitLab's infrastructure-as-code and GitOps strategy, enabling users to take advantage of the power it brings to the enterprise. Terraform is a hugely powerful platform that has been instrumental to almost all stories of infrastructure as code. Ensuring secure and efficient secrets management is extremely important to the enterprise, allowing companies to prevent privilege collisions and leakage of secrets.

Securing your CI secrets with Vault, so they can't be lifted from within CI jobs, is also instrumental. GitLab with Vault allows you to ensure that your secrets are safe. But I'm not here to talk only about how all these technologies integrate with GitLab, but about how we at GitLab use Terraform and Consul as a company. First, a little background: with over 8 million users, there are lots of activities happening every second that need to happen efficiently and reliably fast, and that wouldn't be possible without a reliable infrastructure.
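As a rough sketch of what the Vault integration can look like in practice, a GitLab CI job can request a secret through the `secrets` keyword. The server URL, auth role, audience, and secret path below are placeholders for illustration, not GitLab's actual configuration:

```yaml
# Hypothetical .gitlab-ci.yml fragment: fetching a secret from Vault
# for the duration of a single CI job. All names and URLs are examples.
deploy:
  stage: deploy
  variables:
    VAULT_SERVER_URL: https://vault.example.com:8200   # placeholder
    VAULT_AUTH_ROLE: ci-deploy                         # placeholder role
  id_tokens:
    VAULT_ID_TOKEN:
      aud: https://vault.example.com                   # JWT audience for Vault auth
  secrets:
    DATABASE_PASSWORD:
      vault: production/db/password@secret   # path/to/secret/field@engine
      file: false                            # expose as a variable, not a file
  script:
    - ./deploy.sh   # the secret is available only inside this job
```

Because the secret is injected at job runtime rather than stored as a CI variable, it never lives in the project's settings and can be scoped and audited on the Vault side.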

GitLab is hosted on a pool of virtual machines for each of the components that make up the application: the registry, the GitLab API, Pages, Git, and others. Each group has a load balancer in front of it. One of the largest challenges when growing fast is keeping up with demand, and we've been seeing quite a lot of demand. When the infrastructure is static, which can be the case when your infrastructure is based on virtual machines, it is not trivial to dynamically grow and reduce the size of the infrastructure as a reaction to an external factor, which can be anything: load due to an announcement, or an issue within the community.

This is a two-fold problem: serving our customers is not fast enough, and infrastructure costs can be high because the infrastructure size stays the same regardless of demand; it doesn't scale up or down efficiently. At GitLab we use Terraform to manage the VM life cycle in Google Cloud Platform, and [inaudible 00:05:07] for ensuring consistent configuration across all of our infrastructure. Auto-scaling VMs with these tools is of course possible. However, in order to accomplish that, we would need to write additional, often unnecessary, tooling to allow us to scale VMs on demand. Even with VM auto-scaling automation in place, the [inaudible 00:05:31] times of creating new machines, installing and configuring them, and everything else necessary to run our application mean that GitLab would be deployed or installed in a matter of minutes, which can cause a lot of issues or delays.

But with Kubernetes, both of these problems are addressed out of the box. Kubernetes allows us to scale our infrastructure based on demand at the moment, rather than on best-guess projections, as when using auto-scaling with VMs. When the demand is low, Kubernetes allows us to automatically scale down the infrastructure, reducing costs. With VMs, by contrast, we are using [inaudible 00:06:22], and when we notice load is going down, we remove the surplus. Kubernetes is also made to be an [inaudible 00:06:29] backed by a vast community. This allows us to leverage an industry standard instead of having to write and maintain our own custom solution in-house. One of the initial reasons for moving to Google Cloud Platform as our provider was to take advantage of Google Kubernetes Engine, which was chosen for its maturity over other offerings available within the community.

After all, Kubernetes came out as a project from Google; I think it was initially called Borg and has evolved within Google. We also provide official Helm Charts so our customers are able to deploy GitLab at scale on Kubernetes, and in true GitLab style, we believe dogfooding our own Helm Charts allows us to proactively improve them. Our Kubernetes migration isn't complete yet. All of the [inaudible 00:07:32] components of the application have been moved to Kubernetes, while the remaining components that handle our persistent storage are being gradually moved. You will notice in the image on the slide that we still maintain some legacy VM infrastructure. The best part of dogfooding our own charts is that we get to learn new things and experiment even before our customers hit those problems, and we share these improvements with our customers, so they get best-in-class solutions.

Now, our CI infrastructure is another component of our SaaS service, where users get free or paid minutes to run their CI jobs. We maintain a fleet of VMs we call [inaudible 00:08:25] managers, as you can see in the slide. These [inaudible 00:08:28] services run all of our CI jobs, and those of our users, by creating virtual machines on GCP and tearing them down afterwards.

Everything I've described, our entire infrastructure, is maintained in a single monorepo and organized in directories that we call environments. This allows our infrastructure team to collaborate using Git workflows. All Terraform workflows run within CI, with terraform apply manually triggered; this ensures that no misconfiguration introduced by any team member gets through without review. This also allows the whole team to see how and when changes are made, to perform code reviews, and to review all the logs, and those logs persist over time so that we can always know what changes were made, at what time, and for what reason. With the monorepo and a single CI configuration file, changes are only applied to the specific environment in which they are made.

That is, when the CI script runs, it checks which environment's configuration has changed with the new merge, and runs the job for that specific environment only. The Terraform files for our infrastructure are public, and a link is provided on this slide for you; you can review them and even recommend changes. It's a mirror of our internal project that maintains the files, so there's no harm we expect can be done there; it is just a bunch of Terraform files that anyone can review and make recommendations on. Now, an environment in the context of our infrastructure can be provider-specific; that is, it can be a bunch of Cloudflare rules, AWS configuration, or any other configuration specific to a particular provider. Then we have infrastructure environments, which can be our Production, Staging, and so on, and also what we call ENV projects.
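A per-environment pipeline like the one described could be sketched roughly as follows. The directory layout, job names, and stages here are illustrative assumptions, not GitLab's actual CI configuration:

```yaml
# Hypothetical .gitlab-ci.yml fragment: plan runs automatically when an
# environment's directory changes; apply is gated behind a manual step.
stages: [plan, apply]

plan:production:
  stage: plan
  script:
    - cd environments/production
    - terraform init -input=false
    - terraform plan -out=plan.tfplan -input=false
  rules:
    - changes:
        - environments/production/**/*   # only run when this environment changed
  artifacts:
    paths: [environments/production/plan.tfplan]

apply:production:
  stage: apply
  script:
    - cd environments/production
    - terraform init -input=false
    - terraform apply plan.tfplan        # applies exactly the reviewed plan
  rules:
    - changes:
        - environments/production/**/*
      when: manual                       # a human reviews the plan first
```

The `rules:changes` clause is what scopes each job to its own environment directory, and saving the plan as an artifact means the manual apply executes the same plan the reviewer saw.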

These hold Terraform configuration files to manage our GCP projects, along with the IAM permissions, service accounts, and other settings needed to maintain those projects, aside from the resources created in them. All this makes it easier for our engineers to quickly create ephemeral projects, an environment to test out different scenarios or different issues that might have occurred and need an environment to be fully tested in. Now, with a monorepo, drifts are bound to happen for a number of reasons; it can probably be a [inaudible 00:11:36] before Terraform apply is executed. So what we do is run terraform plan with the -detailed-exitcode flag enabled, which checks whether there is any difference between production and the new merge, and sends a Slack message any time a change has been made and needs to be reviewed. Now, like I said earlier, our entire infrastructure is managed in a single monorepo, including our database. However, for our database, we leverage Consul to ensure high availability. Service level objectives for our Postgres database include a maximum 45-minute downtime budget per month for maintenance.
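The drift check just described could look something like the following scheduled CI job. The job name, environment path, and `SLACK_WEBHOOK_URL` variable are placeholders for illustration:

```yaml
# Hypothetical .gitlab-ci.yml fragment: scheduled drift detection using
# terraform plan's -detailed-exitcode flag. Names are examples only.
detect-drift:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'   # run from a pipeline schedule
  script:
    - cd environments/production
    - terraform init -input=false
    # -detailed-exitcode: 0 = no changes, 1 = error, 2 = state differs from config
    - terraform plan -detailed-exitcode -input=false && rc=0 || rc=$?
    - |
      if [ "$rc" -eq 2 ]; then
        curl -X POST -H 'Content-Type: application/json' \
          --data '{"text":"Terraform drift detected in production"}' \
          "$SLACK_WEBHOOK_URL"
      elif [ "$rc" -eq 1 ]; then
        exit 1   # a real plan error should fail the job
      fi
```

Exit code 2 distinguishes "the plan is non-empty" from an actual failure, which is what makes the flag suitable for alerting rather than failing the pipeline outright.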

That is, if in a month we have used up all of those 45 minutes, we move our full focus to availability instead of making new changes or pushing new updates to our database infrastructure. Also, a maximum of 200 milliseconds is our threshold for any query that needs to run against that database. Now, all this is possible with Consul ensuring that our database is in sync with the secondaries. Lastly, for our enterprise customers who have grown beyond a single instance of GitLab and have more than 3,000 users, some of them up to 10,000 or 25,000 and so on, especially in huge corporations, it can be a challenge to keep up with the resource needs and the issues they have to work through.

We did an experiment with our support team: we reviewed the different issues our customers were facing, and we came up with Reference Architectures. Okay, if you have 5,000 users, this is what we think is the recommended way to deploy your infrastructure; if you have 25,000 users, this is your recommended way. And a core part of the Reference Architectures is Consul, with a recommended minimum of three nodes, ensuring availability of the infrastructure, because it definitely needs to handle a lot of requests from Git pulls, Git pushes, the registry, and a whole lot of other features that GitLab ships. This is the end of my presentation. I hope you've learned or picked up quite a number of things about how GitLab uses Terraform and Consul. Like I mentioned earlier, you can definitely go through our infrastructure project and see all the ways we've been using...

... we've designed our architecture. And you can even create issues or merge requests, or make recommendations; perhaps there are things you think we need to improve on or do differently. We cherish our value of transparency, and that is one of the reasons we decided to make that project public and to accept contributions from the public. Thank you very much for your time. And if you have any questions, you can always reach me on my website, or if you want to learn more about GitLab, you can check us on
