MediaMarktSaturn uses Terraform Enterprise to drive compliant infrastructure consumption across 70 teams. This is their journey to proactive policy as code.
Speaker: Bjoern Jessen-Noak
Hello, everyone, and welcome to my presentation about MediaMarktSaturn's journey to compliance with Terraform.
My name is Bjoern Jessen-Noak. I used to be a developer in ancient times, but I realized very fast that it's not really my way forward. I oriented myself around quality engineering for some time. Finally, I somehow got into system engineering.
Over the last 5 years, I explored a lot of private and public clouds, and I'm passionate about change and new technology. Currently I'm working as a cloud engineer at MediaMarktSaturn. MediaMarktSaturn is Europe's leading retailer of consumer electronics. We have over 1,000 stores in Europe.
We have also a very big tech department. We currently have about 300 developers working in over 70 teams to deliver the best possible IT we can to you on our online shop and also in our stores. A lot of our applications are running in over 100 Kubernetes clusters.
We had very big growth in the last year due to the coronavirus pandemic. What really drives us is, of course, you, the customer, and it's the quality we deliver.
For quality, we want to be reliable and usable for everyone, but we also want all of this to be maintainable and functional on our side. This is one of the biggest drivers we have currently.
We also realized during the last year that rapid change in the retail sector is really driving us, and that there's a need for omnichannel, so that consumers expect the online and offline experiences to be the same.
This is driving us forward, especially our product teams. We embrace sharing and learning together, and we want every team to be self-responsible and to deliver the best quality they can.
What we also realized in the last year is that market demand is changing a lot. Electronics retailing is known to be highly seasonal, with peaks like Black Friday or Christmas sales, but the last year blew the lid off the online business.
We had to scale to all the changing demands. Another part we embrace is being secure and legally compliant, having governance on board, and offering the best experience. Trust is one of the most important things we have to maintain.
Today I want to focus on state-of-the-art technology. When we started our journey, we realized we wanted to go to the cloud and we wanted to deliver the best possible application we can to our customers.
But we also wanted to fit to the market and the market demands. We realized that there was a lack of knowledge, for cloud especially, but also for topics like infrastructure as code.
We also had the premise that whatever we do should align with and adapt to our processes. We want to have cloud for everyone. This is something we started early and still keep up. It should be easy for everyone within the MediaMarktSaturn technology group to embrace and use cloud.
To tackle all this, we built our first cloud-centric team. We tried to start over with one team that built up knowledge and created the first solutions for our online business.
We moved forward to create infrastructure as code, as well as we could at that time. We created some scripts, we started with Terraform, and we did a lot of trial and error. This worked well for some time. It served its purpose in the first year. But we realized fast that it didn't cover the whole story for us.
What we were especially lacking were speed and reliability. If you have a central cloud team, it's obvious that it's a bottleneck. If you have 70-plus teams, like we have, then all of them have to wait until we provide them with infrastructure that they can use or knowledge that they need.
This all took a long time. We were lacking speed. And the scripting we did was not very reliable in some cases, so we lost some infrastructure due to scripting errors or unreliable scripts.
What we also realized during this time is that we have new use cases all coming around the corner; all the time it's changing. It's not only the tech that's evolving. Even the cloud is fast evolving. When we started cloud, it was already old.
But it's also about the use cases. We started early with virtual machines and Kubernetes, but we realized there's a bunch more use cases we had around the corner: big data, a lot of challenges with network, with security, machine learning, all the stuff we didn't take into account with the first step.
We still had our mantra, that we want to use everything easily. Everyone in MediaMarktSaturn should have easy access to the cloud.
What we did to address those technical issues was to move to a central Terraform setup. We were already a centralized cloud platform team, but we realized it's not about the platform team, it's about the knowledge and sharing it with all teams.
So we created a central Terraform API on our side to provision all the Terraform code. This had the advantage and the value of giving us one place, which was a homogeneous Terraform environment.
Everyone was using the same Terraform version. We could put the state on a central system. We had security best practices in place. We already started to scale this out. We started with smaller compliance checks.
We also shifted our internal way of working to provide modules in Terraform. We provided infrastructure blueprints, over 30 internally, to give our engineers an easy way to provision their infrastructure.
Even as we switched to Terraform and to infrastructure as code, we encountered flaws in security and in legal governance.
We realized if the configuration is done wrong, or if you have too much freedom, then it can go wrong, especially if you don't take any measures on it. We also realized that we were not as far with enabling our developers as we would have loved. They still struggled with being fast and having reliable feedback on our infrastructure changes.
We were not fully integrated in the development process at that time, and we didn't get too much feedback on successful runs during the CI/CD pipelines we had in place. We also realized that everything we did was mostly reactive. If something happened, we reacted.
This was true for the security flaws as well as for the legal and governance issues we encountered, and also for the development part. Developers had to react to something they did unintentionally.
We also realized that Terraform modules are really lost in space. If you create a lot of Terraform modules, especially if you have a very big code base in GitHub, this leads to a lot of lost modules, and it's hard to find them again.
To move forward on this, we thought about how we could improve this whole process.
First of all, we had to include all the departments we had left out. We had to include security and governance in our process for providing and modifying existing infrastructure. This has to be done in an early stage.
We also realized that we not only have to include them, but we also have to enable them. So not only the development teams that relied on fast feedback and CI pipelines, but also governance and security have to be part of this whole workflow.
We wanted to be proactive. The most important point on this is that if you want to shift left your whole development, you need to be proactive. Integrating all parties and proactively enabling everyone to provide policies or checks makes the whole process proactive, so that we don't have to react to incidents or misconfigurations. We find them in the first stage.
Roughly a year ago, we made the next step with Terraform Enterprise.
What was surprising was when we asked the HashiCorp sales guy "How does Terraform Enterprise work? What do we need to do? Where can we use it?" And the answer was, "It's quite simple. We deliver you Terraform code." And we were a bit surprised, like, "Terraform code? OK."
We were positively surprised that Terraform Enterprise was set up as Terraform. We have now Terraform to set up Terraform Enterprise to provision Terraform. This was quite interesting and quite strange, but it worked out very well.
And it's all one big piece and fully automated. We are following our guidelines to have an easy setup, a reproducible setup, and it's nearly fully automated, our whole process for the whole Terraform and Terraform Enterprise setup.
We also realized that we are now much closer to our goal of a compliant infrastructure setup. As I mentioned before, our whole stack was already centralized with Terraform, but our APIs were lacking some important points. With Terraform Enterprise, we now had the possibility to really have RBAC (role-based access control) in place. We have complete user management. We have workspace management.
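To make the RBAC idea concrete, here is a minimal sketch of a role-based access check per workspace. It is only an illustration: the four access levels mirror the read/plan/write/admin levels Terraform Enterprise offers for workspaces, but the action names, team names, workspace names, and the mapping between actions and levels are simplified assumptions, not the actual MediaMarktSaturn configuration.

```python
from enum import IntEnum


class Access(IntEnum):
    """Workspace access levels, ordered from least to most privileged."""
    READ = 1
    PLAN = 2
    WRITE = 3
    ADMIN = 4


# Assumed mapping from an action to the minimum level it needs (simplified).
REQUIRED = {
    "view_state": Access.READ,
    "queue_plan": Access.PLAN,
    "apply": Access.WRITE,
    "manage_workspace": Access.ADMIN,
}

# Hypothetical grants: (team, workspace) -> access level.
GRANTS = {
    ("team-checkout", "checkout-prod"): Access.WRITE,
    ("security", "checkout-prod"): Access.READ,
}


def is_allowed(team: str, workspace: str, action: str) -> bool:
    """Return True if the team's level on the workspace covers the action."""
    granted = GRANTS.get((team, workspace))
    return granted is not None and granted >= REQUIRED[action]
```

In this model a product team can apply changes to its own workspace, while the security team can inspect the state of that workspace without being able to change it.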
One of our biggest issues was displaying users' current states. We were lacking a good overview of this. We have now a high-availability setup. Our API was already prepared for high availability, but we didn't have it fully managed and fully available. Now we have a fully HA setup, and it's reliable. We have a reliable process.
One big point: We love the transparent and standardized processes we have. All parties and all teams and all people can see what's going on in their environment. They see their runs, they see the state, and they see their current status. Especially development had a big boost here.
We embrace GitHub. And Terraform brought us forward with GitHub for infrastructure provisioning. We have also a "single point of ..." It’s really the single point of everything for Terraform.
If you want to audit something or to find out what's going on, we have one central place. We don't need to bother anything. We can find the current state, and it's fully automated, which is the way we want to move on everything.
Everything should be declarative, and everything should be automated.
Our new proactive workflow is really an improvement on all sides for us. We already had GitHub Actions. They checked style and formatting, and did some content checks on Terraform files. But what is totally new is the seamless integration of Terraform Enterprise.
On every pull request, a plan is automatically triggered. We will immediately see if the plan succeeds or if the plan fails. We have now the proactive part for our policies. All our departments, like governance and security and our central platform team, can now write policies as code that define what is allowed and what isn’t.
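As a rough illustration of what such a policy does, the sketch below checks a plan against two rules. In Terraform Enterprise, policies are written in Sentinel; this Python version only mirrors the idea. It assumes the plan is available in the JSON form produced by `terraform show -json`, and the allowed regions and the forbidden resource type are hypothetical examples, not actual MediaMarktSaturn policies.

```python
from typing import Any

# Hypothetical rules, chosen for illustration only.
ALLOWED_REGIONS = {"europe-west1", "europe-west3", "europe-west4"}
FORBIDDEN_TYPES = {"google_project_iam_binding"}


def check_plan(plan: dict[str, Any]) -> list[str]:
    """Return policy violations for a plan in `terraform show -json` form."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # Only resources that are created or updated are checked.
        if not ({"create", "update"} & set(actions)):
            continue
        after = rc["change"].get("after") or {}
        if rc["type"] in FORBIDDEN_TYPES:
            violations.append(f"{rc['address']}: type {rc['type']} is not allowed")
        region = after.get("region")
        if region is not None and region not in ALLOWED_REGIONS:
            violations.append(f"{rc['address']}: region {region} is not allowed")
    return violations


# A small hand-written plan with one compliant and one non-compliant change.
example_plan = {
    "resource_changes": [
        {
            "address": "google_compute_subnetwork.eu",
            "type": "google_compute_subnetwork",
            "change": {"actions": ["create"], "after": {"region": "europe-west3"}},
        },
        {
            "address": "google_compute_subnetwork.us",
            "type": "google_compute_subnetwork",
            "change": {"actions": ["create"], "after": {"region": "us-central1"}},
        },
        {
            "address": "google_compute_subnetwork.old",
            "type": "google_compute_subnetwork",
            "change": {"actions": ["delete"], "after": None},
        },
    ]
}

violations = check_plan(example_plan)
```

An empty list means the plan is compliant. Any entry is feedback the author sees on the pull request before anything is provisioned.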
All users immediately see if their code is compliant or not. Also included is cost estimation. It's not yet where we want to have it. Especially in GCP, it's not showing a lot of resources. But I think it's a good way forward to display and be transparent about costs to all the users and teams.
After this Terraform Enterprise check, we go down the normal road for the code review. What's also a big boost for us is that Terraform Enterprise has enabled us to follow the GitOps approach. It enables us to run a fully automated apply after a merge.
If this feature is enabled, a team can now run a fully automated apply after a successful pull request review and a merge to the branch.
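The order of the gates in this workflow (plan, policy check, code review, merge, apply) can be summarized in a short sketch. The field names and the returned step descriptions are made up for illustration; they are not Terraform Enterprise or GitHub API objects.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical snapshot of a pull request's status in the workflow."""
    plan_succeeded: bool
    policies_passed: bool
    approved: bool
    merged: bool
    auto_apply: bool  # whether the team enabled automatic apply


def next_step(pr: PullRequest) -> str:
    """Return the next action, following the gates in order."""
    if not pr.plan_succeeded:
        return "fix plan errors"
    if not pr.policies_passed:
        return "fix policy violations"
    if not pr.approved:
        return "wait for code review"
    if not pr.merged:
        return "merge to branch"
    return "apply automatically" if pr.auto_apply else "confirm apply manually"
```

Each gate has to pass before the next one is considered, which is what makes the feedback proactive: a failing plan or a policy violation shows up before review and long before an apply.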
We now have GitOps in infrastructure, too. This is not everything. We are still on the road, and the story is not fully finished yet.
I'm really eager for what's coming next. What we are currently facing is that we want to put this to the next level within our organization. We are currently onboarding all teams, so everyone should use it for infrastructure changes on our side.
We are also going to extend the policy as code we have, so that everyone knows what is allowed and what is not.
We also need to train and evolve our teams in using and embracing the Terraform setup, especially Terraform Enterprise. We don't want to lose our momentum. It's really important to keep going and to know what's coming next.
I hope you enjoyed my talk today. I'm really happy to talk to you about our challenges and how we solved them.