When AVL wanted to refactor their automotive engine testbed desktop application to run as a cloud app, Terraform helped make it happen.
Hi, I'm Patrick. Thanks for being here. It's an honor to be here and to give this presentation, and thanks for already introducing the topic. It's a cloud migration adventure, including all the obstacles (or at least most of them) and some problem-solving approaches.
I would also like to explain how we used Terraform. I'm a DevOps engineer at AVL; I'll explain the company later on. I try to make meaningful contributions to the community by giving workshops and organizing meetups, and I like to be at conferences. As you may notice, I'm from Austria. It's a country between Italy and Germany, and we speak German, almost Bavarian, so I think it fits well for me to be here.
I would like to start with a short introduction to my company and to the legacy software application that is the main actor in this cloud migration journey. In chapter two, I would like to describe the initial situation: what did we do before we tried to move anything to the cloud?
Chapter three covers the cloud migration journey in detail, including the obstacles I mentioned. In chapter four, we have proved that it generally works, but what are the next steps? Finally, in chapter five, I would like to explain how we use Terraform and which benefits we gained during the cloud migration adventure. The session ends with a conclusion.
Let's jump into the topic. You've probably never heard of AVL. Our business is in the automotive industry, and we are based in Graz. Who has heard of Graz before? Oh, quite a lot. Who has heard of the Terminator? This guy was born in a little village northwest of Graz, so he's probably our most famous living person, and this description always helps.
As you are aware, the automotive industry is in transition, so we are also focusing on new areas, including e-fuels, the electrification of cars, autonomous driving, and driver assistance systems. Probably our best-known product is the engine testbed; you will see later on what this is.
Finally, to conclude the introduction, may I introduce you to AVL CONCERTO. It was created almost three decades ago. It's a Windows desktop application, and it's also a legacy application.
In 2014 we introduced managed code, added a REST interface, and added Python support. Then we got the message: it should also run in the cloud. To be honest, we didn't expect that. The key message here is that it's quite an old software application, with a huge technology stack and important functions that customers rely on, so it's difficult to adapt. Still, we got the message that it should run in the cloud, and we thought: why not?
To give you a better context: what did we do before considering the cloud at all? This is an engine testbed. You put a combustion engine on it and run several tests to get data on emissions, speed, and so on. In the end, you get a bunch of data, and once you have it, CONCERTO is responsible for post-processing it.
As you can see, you can load files and put the data in. By running specific functions, CONCERTO creates a report that includes diagrams, tables, and so on. That is the final result delivered to all the big car companies, and it looks like this. You're probably wondering: it works, so why should we do anything in the cloud?
Why did it happen? Again, our traditional use case: an engine testbed with a combustion engine as the unit under test. A guy comes into the company and runs the tests; let's name him Herman. Herman runs a lot of tests on the engine testbeds. After eight hours the day is done, and you get the results. But times are changing, and now we have the situation on the right side.
What you see here are called battery testing chambers. One battery testing chamber contains several batteries, each battery contains several modules, and each module contains a lot of test cells. If we compare the amount of data we get from one combustion engine test run, it's almost the same as from one single test cell.
This means a person who runs tests on the combustion engine wouldn't be happy to do the same work 1,000 times, or however many times it takes, once for every test cell. This was the key, the main driver, that forced us to introduce automation and to scale CONCERTO: we needed several thousand CONCERTO instances right away. For the scaling part, Kubernetes would do the job, but why should we go to the cloud?
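To make the scaling argument concrete, here is a back-of-the-envelope sketch. All counts are hypothetical placeholders, not AVL figures; the only assumption taken from the talk is that one test cell produces roughly as much data as one combustion-engine test run, so each cell needs its own CONCERTO post-processing run.

```python
from dataclasses import dataclass


@dataclass
class BatteryTestSetup:
    """Hypothetical lab layout: chambers -> batteries -> modules -> cells."""
    chambers: int
    batteries_per_chamber: int
    modules_per_battery: int
    cells_per_module: int

    def total_cells(self) -> int:
        return (self.chambers * self.batteries_per_chamber
                * self.modules_per_battery * self.cells_per_module)


def concerto_runs_needed(setup: BatteryTestSetup, runs_per_cell: int = 1) -> int:
    # Assumption from the talk: one cell yields about as much data as one
    # combustion-engine test run, so each cell needs its own CONCERTO run.
    return setup.total_cells() * runs_per_cell


def manual_working_days(runs: int, runs_per_day: int) -> float:
    # Days of manual post-processing if one person handles runs_per_day runs.
    return runs / runs_per_day


# Hypothetical counts for illustration only; these are not AVL figures.
lab = BatteryTestSetup(chambers=4, batteries_per_chamber=2,
                       modules_per_battery=10, cells_per_module=12)
runs = concerto_runs_needed(lab)
print(runs, manual_working_days(runs, runs_per_day=1))
```

Even with these modest made-up numbers, a single test campaign turns into hundreds of runs, which is why one Herman per testbed no longer scales.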
We got requests from customers asking whether it would run in the cloud, because they also wanted to migrate to the cloud. So it was quite simple for us: let's provision everything we need to scale CONCERTO, and whatever the solution turns out to be, let's use the cloud for it. That's where the pain, but also the adventure, started.
When a few colleagues heard that we should migrate to the cloud, their first reaction was: no chance. Let's rebuild the whole thing. It's 30 years old, so let's throw it away and start from scratch. The problem was that, at that time, we didn't know which components we should rewrite, because we have several million lines of code written over 30 years.
So first, we had to agree on a cloud migration strategy. Should we refactor, rebuild, or try lift and shift? We had great discussions, and we chose lift and shift because, to be honest, we didn't know where to start. With a legacy application like this, would a rewrite ever end? We also wanted to see the big picture: what would it look like, and would it work at all?
Luckily, at that time we already had a containerized version of it. It's a Windows container, because CONCERTO is a Windows desktop application, but it gave us a starting point. So we decided to lift and shift: put the application as it is into a Windows container and hope for the best.
We didn't want to invent anything. We wanted to follow best practices, and Microsoft provides a great solution idea. The goal was to implement something like that: any kind of resource capable of scaling CONCERTO (the orange icons there) to generate the reports. The input data comes from a database, which could be a managed cloud service or could stay outside the cloud because of sensitive data. Not all our customers are happy to have specific data inside the cloud.
We focused on that architecture: triggering the calculations from a message broker, scaling CONCERTO, and generating the very same reports we used to create manually. According to Microsoft, the solution idea looked like this. We have the existing application, which is CONCERTO. We push the corresponding container image to Azure Container Registry, and by using the dedicated CLI commands, we apply the workloads to Azure Kubernetes Service (AKS), which runs CONCERTO, which generates the reports. The input comes from the SQL database, and that fits.
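The broker-driven pattern can be sketched in a few lines: each message on a queue represents one calculation, and a pool of workers processes them in parallel. This is a minimal local illustration, assuming an in-process `queue.Queue` in place of a real message broker and a hypothetical `run_concerto_calculation` function in place of a containerized CONCERTO run.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_concerto_calculation(job_id: str) -> str:
    # Hypothetical stand-in for one containerized CONCERTO run. A real worker
    # would read its input from the SQL database and write a PDF report.
    return f"reports/{job_id}/report.pdf"


def drain_broker(broker: queue.Queue, workers: int) -> list:
    """Take every pending message off the broker and fan the calculations
    out to a pool of workers, one calculation per message."""
    job_ids = []
    while True:
        try:
            job_ids.append(broker.get_nowait())
        except queue.Empty:
            break
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so reports match job IDs.
        return list(pool.map(run_concerto_calculation, job_ids))


broker = queue.Queue()
for cell in ("cell-001", "cell-002", "cell-003"):  # hypothetical job IDs
    broker.put(cell)
reports = drain_broker(broker, workers=2)
print(reports)
```

In the cloud version, the thread pool's role is played by Kubernetes pods, and the number of workers is what the autoscaler adjusts.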
We thought it didn't look that bad, so let's log into the Azure portal and get it done. We provisioned Azure Container Registry, Azure Kubernetes Service, and a virtual machine. We used the VM as a license server, because CONCERTO only starts up when it can obtain the corresponding license feature from a license server. We also provisioned an Azure file share to store the reports on. The container should run and trigger the calculation inside it.
It's one thing to provision the resources, but they also have to work together. At which point do you think we ran into obstacles and got into trouble? You guessed it: almost everywhere. We had problems between AKS and the container registry. We didn't know that we had to attach the registry to the cluster properly; we were quite new to it.
Once that was working, we still couldn't get a container running on the Kubernetes cluster, because it's a Windows container workload. So we thought: let's add a Windows node to AKS, then it should work. But it didn't, because if you don't tell the workload that it has to be scheduled on a Windows node, it isn't. So the Windows node was there, but our workload didn't get scheduled on it, and we wondered why nothing happened. What's wrong with you, AKS?
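One common way to tell Kubernetes that a workload must land on a Windows node is a `nodeSelector` on the well-known `kubernetes.io/os` label. The sketch below builds such a Job manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts just like YAML. The job name, image, and registry are hypothetical placeholders, not the actual CONCERTO setup.

```python
import json


def concerto_job_manifest(name: str, image: str) -> dict:
    """Build a batch/v1 Job for a Windows container workload.

    Without the nodeSelector, a mixed-OS cluster may try to place the pod
    on a Linux node, where a Windows image cannot start."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Pin the pod to Windows nodes via the well-known OS label.
                    "nodeSelector": {"kubernetes.io/os": "windows"},
                    # Jobs require restartPolicy Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [{"name": "concerto", "image": image}],
                }
            },
        },
    }


# Hypothetical registry and image names.
manifest = concerto_job_manifest("concerto-calc-1",
                                 "myregistry.azurecr.io/concerto:latest")
print(json.dumps(manifest, indent=2))
```

Saved to a file, the printed JSON can be passed to `kubectl apply -f`.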
At point three, we set up a license server for the first time, and there we ran into a classic IT and sysadmin problem: we forgot to set up the proper firewall rules. We, as developers, provisioned the virtual machine. An IT colleague would have told us: you need firewall rules, please set them up. We simply didn't think about it, to be honest.
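A quick way to tell a missing firewall rule apart from an application problem is to test plain TCP reachability of the license server from where the container runs. This is a minimal sketch using only the standard library; the license server's real address and port are not given in the talk, so the demonstration uses a local listener as a stand-in.

```python
import socket


def license_server_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    A blocked firewall rule usually shows up as a timeout or a refused
    connection; both raise OSError (socket.timeout is a subclass)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demonstration against a local listener standing in for the license server.
with socket.socket() as listener:
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    open_result = license_server_reachable("127.0.0.1", port)
# The listener is closed now, so the same port should refuse connections.
closed_result = license_server_reachable("127.0.0.1", port, timeout=1.0)
print(open_result, closed_result)
```

Run from inside the container against the license VM, a `False` points at networking (firewall, network security group rules) rather than at CONCERTO itself.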
We didn't even reach point four because we were stuck at the points before, so we don't know whether there would have been a problem there. In my humble opinion, the real problem was that, with almost no prior experience in the Azure portal, we tried quite a complex solution, including AKS. A better approach would have been to think of the simplest way to get the container running in the cloud. We couldn't figure it out, so we stopped that approach and looked for a different one, and we discovered serverless services.
There's Azure Container Instances. The great thing is that if you already have the container registry with your image in it, you can choose that image when you deploy a container instance. You don't have to take care of points one and two to get the container scheduled on the right infrastructure, a Windows node, because Azure Container Instances manages that for you.
So we could see that CONCERTO tried to start up. We still had the firewall problem from point three, but we could see CONCERTO starting, and for the first time we had hope. We figured out that the firewall rules were the issue, and CONCERTO ran in a container in the cloud for the very first time.
This proved that the remaining problems were about the infrastructure. Before, we had thought something was wrong with the container itself, so we could rule that out. Just two problems were left, and after a while we figured those out too. In this short video, I would like to show you what the automated use case looks like on AKS in the cloud.
Here we have the share; it's empty for now. That's the container registry. Now we move back to the Kubernetes cluster and apply two different workloads almost at the same time, using kubectl apply with the corresponding YAML files. Then we can see that two different workloads are running. Inside the workloads, the pods are now running, and inside the pods runs the container that triggers a calculation in CONCERTO.
Two different calculations are now triggered. Here on the share, you can see that two different directories have been created. Inside those directories, we get the same report we would create manually. This worked fine. After a while, we can open the PDF. Let's also download the second one and close it. Let's also check that the workloads have completed: when CONCERTO finishes the calculation, the workload is completed, and as you can see, we get the same report. This proved that it generally works, and we were glad about that. But we were quite sure that running on a Windows container was not ideal.
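Checking that the workloads have completed can be automated by polling each Job until it reaches a terminal state. Kubernetes Jobs report the condition types `Complete` and `Failed`; the `get_status` hook below is hypothetical and could wrap `kubectl get job` or a Kubernetes client call. The fake status source only exists so the sketch runs locally.

```python
import time
from typing import Callable, Iterable


def wait_for_jobs(job_names: Iterable[str],
                  get_status: Callable[[str], str],
                  poll_seconds: float = 5.0,
                  timeout_seconds: float = 3600.0) -> dict:
    """Poll until every job reports 'Complete' or 'Failed', or time out."""
    pending = set(job_names)
    results = {}
    deadline = time.monotonic() + timeout_seconds
    while True:
        for name in sorted(pending):
            status = get_status(name)
            if status in ("Complete", "Failed"):
                results[name] = status
        pending.difference_update(results)  # drop finished jobs
        if not pending:
            return results
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still running: {sorted(pending)}")
        time.sleep(poll_seconds)


# Fake status source for illustration: each job completes on its 2nd poll.
polls = {}


def fake_status(name: str) -> str:
    polls[name] = polls.get(name, 0) + 1
    return "Complete" if polls[name] >= 2 else "Running"


result = wait_for_jobs(["concerto-calc-1", "concerto-calc-2"],
                       fake_status, poll_seconds=0.01)
print(result)
```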
This brings us to the next steps: what we are going to do in the future. We saw the two options earlier: refactor or rebuild versus lift and shift. We did lift and shift, and we thought it would help us figure out which components CONCERTO really needs to run performantly in a container in the cloud. We were also quite sure that we should get it ready for Linux.
So the next step is to refactor and rebuild. We pick what we need for the automated use case on AKS and create a new kind of product. But before that, we have to throw away a few things.
This is the structure of CONCERTO on Windows: a managed user interface, a native user interface, a native data management subsystem, Python, COM, and REST. And this is what it looks like on Linux. We throw away a lot, because the new use case doesn't need that many components.
It will end up as a new product. Amazingly, instead of one single desktop application, we will now have a Linux container that runs the refactored components of CONCERTO.
In addition, inside Azure, we have AKS, Prefect for workflow orchestration, KEDA for autoscaling, and ClickHouse. We will also have a MySQL database, Postgres, and Elasticsearch, and of course, we provision the resources with Terraform. It's an amazing transition we get to experience. It's still a great adventure, that's the plan for the new product, and we are looking forward to it.
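Queue-driven autoscaling of the kind KEDA provides boils down to a simple rule: desired replicas equal the queue length divided by a target number of messages per replica, rounded up and clamped between a minimum and a maximum (KEDA's `ScaledObject` exposes these bounds as `minReplicaCount` and `maxReplicaCount`). The function below is a simplified approximation under that assumption; it ignores details such as the HPA tolerance, stabilization windows, and KEDA's activation threshold.

```python
import math


def desired_replicas(queue_length: int, messages_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Simplified queue-based scaling: ceil(queue / target), clamped to
    [min_replicas, max_replicas], scaling to min_replicas when idle."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    if queue_length <= 0:
        return min_replicas  # nothing to do: scale down (possibly to zero)
    wanted = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(12, 5))                    # a few CONCERTO pods
print(desired_replicas(1000, 1, max_replicas=50))  # capped by the maximum
```

Scaling to zero when the queue is empty is what keeps a large CONCERTO fleet affordable between test campaigns.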
This brings me to some of the benefits we found in using an infrastructure as code approach with Terraform. Personally, I found it great to be able to provision the resources I need on my own. Before that, I had to create a ticket for IT, "Please create a Kubernetes cluster," and then wait for it. Some settings probably don't fit, so you have to create another ticket, and nobody wants to create tickets. I don't like it, and it should be banned. So it was great to have more power and not have to wait for an IT person: you can do it on your own. It empowers developers.
I probably don't need to explain the next benefit, versioning, but it was especially valuable for our proof of concept. We made a lot of adaptations to the Kubernetes cluster to scale up the nodes, speed up the calculations, and run as many calculations in parallel as possible, so it was great to have the Terraform configurations under version control.
We use Azure DevOps for CI/CD, and it has great support for Terraform, both in the classic editor and when you define the pipeline in YAML. The cool thing is that anyone can trigger a pipeline, so there's no excuse for anyone if something needs to be destroyed.
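A pipeline step often needs to decide whether an apply stage is worth running. With `-detailed-exitcode`, `terraform plan` exits with 0 when there are no changes, 1 on error, and 2 when changes are pending. The sketch below wraps that in Python; it assumes `terraform` is on the PATH and the working directory is already initialized, and how the result is handed to later Azure DevOps stages is left out.

```python
import subprocess


def describe_plan_exit_code(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` exit codes to a decision."""
    if code == 0:
        return "no-changes"       # infrastructure matches the configuration
    if code == 2:
        return "changes-pending"  # plan succeeded and contains a diff
    return "error"                # 1 (or anything else): plan failed


def run_plan(workdir: str) -> str:
    # Requires terraform on PATH and `terraform init` already run in workdir.
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false",
         "-out=tfplan"],
        cwd=workdir,
    )
    return describe_plan_exit_code(proc.returncode)
```

A script step could call `run_plan(".")` and run `terraform apply tfplan` only when it returns `"changes-pending"`.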
Also, some colleagues still don't feel comfortable touching the cloud resources directly, so we tried to move most of the work into Azure DevOps release pipelines or build pipelines to increase our automation.
The next benefit probably doesn't need explaining either: reducing costs. There was a time when we did everything manually. We were glad we had created a Kubernetes cluster that was working, and nobody wanted to touch it or destroy it, but we needed to make adaptations.
Unfortunately, the costs increased, and some line manager may have yelled at us about them. Terraform helped there too, because a cluster defined in code can be destroyed when it isn't needed and recreated later. There are also some great free tools for estimating costs from Terraform configurations, which helped us plan better.
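The savings from destroying idle infrastructure are simple arithmetic. This sketch compares an always-on node pool with one that only exists during working hours; the node count and hourly rate are hypothetical placeholders, not real Azure prices, and 730 hours per month is the common approximation used in cloud pricing.

```python
HOURS_PER_MONTH = 730  # common approximation of hours in a month


def monthly_cost(node_count: int, hourly_rate: float,
                 hours_running: float = HOURS_PER_MONTH) -> float:
    """Compute cost for node_count nodes running hours_running per month."""
    return node_count * hourly_rate * hours_running


# Hypothetical: 10 nodes at 0.50 per hour (made-up rate).
always_on = monthly_cost(10, 0.5)
working_hours = monthly_cost(10, 0.5, hours_running=8 * 22)  # 8 h x 22 days
print(always_on, working_hours, always_on - working_hours)
```

Because Terraform can recreate the cluster reliably, running only during working hours becomes a realistic option instead of a risk.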
We continuously get customer requests to migrate our products to different cloud platforms. We always tell them: let's do that for you, but with Terraform. We also ship the configuration files to them, because most customers have their own subscription. We implement it in Terraform, provision it, and send the configuration to them so they can rebuild the whole environment they need. This helps a lot.
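One way to ship the same Terraform configuration to many customers is to keep the `.tf` files shared and generate a per-customer variables file. Terraform reads JSON variable files passed with `-var-file=<name>.tfvars.json`. The variable names below (`customer_name`, `location`, `aks_node_count`) are hypothetical and would have to match `variable` blocks in the actual configuration.

```python
import json
import tempfile
from pathlib import Path


def write_customer_tfvars(out_dir: Path, customer: str, location: str,
                          node_count: int) -> Path:
    """Write <customer>.tfvars.json for use with terraform -var-file.

    Variable names are hypothetical placeholders for this sketch."""
    values = {
        "customer_name": customer,
        "location": location,
        "aks_node_count": node_count,
    }
    path = out_dir / f"{customer}.tfvars.json"
    path.write_text(json.dumps(values, indent=2))
    return path


# Example: write the file into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_customer_tfvars(Path(tmp), "customer-a", "westeurope", 3)
    written = json.loads(path.read_text())
print(path.name, written)
```

The customer would then run something like `terraform apply -var-file=customer-a.tfvars.json` in their own subscription. Using an explicit `-var-file` rather than an auto-loaded name keeps several customers' files from being picked up at once.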
This brings me to an outlook, or rather a roadmap. It all started in 2019, when we first containerized CONCERTO in a Windows container. In 2020 we invested a lot of effort in CI/CD: building the container image and deploying it automatically to an on-premises Kubernetes cluster. In 2021 we started the cloud migration journey with the proof of concept and proved that it generally works.
In 2022 we focused on getting CONCERTO ready for Linux, concentrating on the parts we really need for automation and the cloud version. Finally, this year we will release a new product, a real cloud-native solution. We are looking forward to that, and we are curious.
It took a lot of discussion to find a proper cloud migration strategy. Probably one of our biggest obstacles was that we didn't find the easiest way to start. If we had started more lightweight, we would probably have been faster.
So maybe don't start with a Kubernetes cluster when you don't know anything about it; try a serverless service first. Of course, using an infrastructure as code approach with Terraform helped us a lot, and to be honest, I can't imagine working any other way anymore. And try to increase automation by integrating as much as possible into pipelines.
Maybe some of this is quite familiar to you, and maybe you're more of an expert than I am, but I hope this was a meaningful talk for you. I'm looking forward to more talks with you. Thanks a lot for being here, and I hope you enjoyed it. Thank you, thank you so much. Danke.