Case Study

Elanco's Journey to Hybrid Multi-Cloud with Terraform Cloud

Hear how Elanco approached requirements gathering and execution when building a self-service multi-cloud platform for its developer teams.

Elanco completed a corporate separation, achieved 95% cloud adoption, and embarked on a hybrid multi-cloud journey with HashiCorp Terraform Cloud. Hear about the blueprint they created for building a centralized deployment automation platform, spanning multiple clouds and on-premises infrastructure, that met all of their developer experience requirements.

Transcript 

Matthew Bull:

Hi, I'm Matthew Bull, chief technology and information security officer here at Elanco. Elanco is a purpose-driven company focused on animal health. We provide those who raise and care for animals with products, services, and insights across more than 90 countries around the world. Today, Calum and I would like to share our journey to hybrid multi-cloud, supported by the team at HashiCorp.

Before diving into the details, let me provide some framing. In 2018, Elanco completed a corporate separation via an IPO, providing us the unique opportunity to rebuild IT from the ground up. We seized this opportunity to modernize our IT ecosystem, achieving 95% cloud adoption whilst embracing modern concepts and techniques such as zero trust.

IT Ecosystem

This architecture slide depicts our IT ecosystem, which supports more than 15,000 users across 65 countries, including in-house research and development, and manufacturing. As highlighted in the top right, we've placed a heavy emphasis on the developer experience, leveraging capabilities from HashiCorp to automate the end-to-end development process.

This approach provides autonomy to the product teams whilst ensuring security, quality, and privacy standards are proactively and programmatically maintained. Our decision to target a hybrid multi-cloud was driven by our desire to unlock innovation and deliver a cost-competitive landscape, as well as a need to support a wide range of workloads.

Hosting Architecture 

For example, at the point of separation, Elanco maintained over 1,000 applications, covering internal and external use cases. These applications include custom-developed digital products used by pet owners, vets, and farmers — as well as commercial off-the-shelf solutions that support specialized instruments for research and development and manufacturing.

Initially, although our hybrid multi-cloud architecture provided a lot of flexibility, it also created confusion and inconsistency when delivering solutions. Therefore, we initiated a project to standardize our developer experience. This is where I'll hand over to Calum Bell, our lead solution architect at Elanco and the thought leader behind this initiative.

Calum Bell:

Thanks, Matt. As Matt mentioned, we'd spent a couple of years on our hybrid and multi-cloud environment, and we were very proud of the outcomes we'd reached. But what we'd missed in some scenarios was our applications and application teams. We wanted to move them to leverage the cloud-native capabilities and the true power of the platform we had stood up. I'll share part of our journey with you today, and our hope is that you can take some of our learnings and use them as a blueprint within your own organization.

Journey to Automation

As we think about the project we undertook, labeled Horizon, there were three key phases we want to talk to you about: first analyze, next design, and then finally, as you'd expect, automation. To start this process, the first thing we needed to understand was: what do we have today? What are our customers using? What services do they need? And ultimately, what volumes are we talking about?

We started off by asking how many services we had. We have 1,000 virtual machines, thousands of app services and containers, and hundreds of data services. But that didn't give us much detail about what people were using and why.

It wasn't until we went a level lower, using data mining to extract the metadata from our cloud platforms and looking at the relationships between infrastructure components, that we got that detail. It wasn't just about how many virtual machines we had, but how many virtual machines were using databases or Identity, and how many application services were using Blob Storage.
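To make the idea of counting relationships rather than raw resources concrete, here is a minimal Python sketch, assuming a hypothetical metadata export in which each resource lists the resources it uses. The record fields (`id`, `type`, `uses`), the resource type names, and the sample data are all illustrative; they are not a real cloud provider API or Elanco's data.

```python
from collections import Counter

# Hypothetical metadata export: each resource has an id, a type, and the ids of
# the resources it uses. Field names and data are illustrative only.
RESOURCES = [
    {"id": "vm-1", "type": "virtual_machine", "uses": ["db-1", "idp"]},
    {"id": "vm-2", "type": "virtual_machine", "uses": []},
    {"id": "app-1", "type": "app_service", "uses": ["blob-1", "idp"]},
    {"id": "app-2", "type": "app_service", "uses": ["blob-1"]},
    {"id": "db-1", "type": "database", "uses": []},
    {"id": "blob-1", "type": "blob_storage", "uses": []},
    {"id": "idp", "type": "identity", "uses": []},
]


def count_relationships(resources):
    """Count (consumer type, dependency type) pairs across the estate."""
    type_of = {r["id"]: r["type"] for r in resources}
    pairs = Counter()
    for r in resources:
        for dep in r["uses"]:
            if dep in type_of:  # skip references to resources not in the export
                pairs[(r["type"], type_of[dep])] += 1
    return pairs


if __name__ == "__main__":
    for (src, dst), n in sorted(count_relationships(RESOURCES).items()):
        print(f"{src} -> {dst}: {n}")
```

Pairs such as `("app_service", "blob_storage")` are what start to reveal the recurring shapes that later become archetypes.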

By getting to that level of detail, we were able to start mapping out what these patterns looked like, as you can see in the top right-hand corner, along with the common attributes and details they shared.

Understanding Our Applications 

Coming out of this phase of analysis, we had identified four common archetypes. We had simple web, which was more or less a simple container plugged into some common services. We had complex sites, which were more detailed and had different traffic requirements and different inputs. Then we also had virtual machines and data, plus some items that we couldn't map to an archetype. But we were able to map 75% of what we had, which was a number we were pleased with.
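The archetype mapping and the coverage figure can be sketched as a simple rule-based classifier. This is a hypothetical illustration: the resource type names, the rules in `classify`, and the toy inventory are assumptions, not the criteria Elanco used, and the toy inventory gives 80% coverage rather than the 75% quoted.

```python
from typing import Optional

# Hypothetical resource type groupings used by the illustrative rules below.
DATA_SERVICES = {"sql_database", "nosql_database", "data_lake"}
TRAFFIC_SERVICES = {"cdn", "waf", "api_gateway"}


def classify(resource_types: set) -> Optional[str]:
    """Map an application's resource types to an archetype, or None."""
    if "virtual_machine" in resource_types:
        return "virtual machine"
    if "container" in resource_types:
        # Extra traffic-management services mark the more involved sites.
        if resource_types & TRAFFIC_SERVICES:
            return "complex web"
        return "simple web"
    if resource_types and resource_types <= DATA_SERVICES:
        return "data"
    return None  # unmapped: needs a bespoke solution


def coverage(apps: dict) -> float:
    """Fraction of applications that map to some archetype."""
    mapped = sum(1 for types in apps.values() if classify(types) is not None)
    return mapped / len(apps)


# Toy inventory, invented for illustration.
APPS = {
    "pet-portal": {"container", "sql_database"},
    "vet-site": {"container", "cdn", "waf", "sql_database"},
    "lab-instrument": {"virtual_machine"},
    "reporting-db": {"sql_database", "data_lake"},
    "legacy-bridge": {"ftp_gateway"},
}

if __name__ == "__main__":
    for name, types in APPS.items():
        print(name, "->", classify(types))
    print(f"coverage: {coverage(APPS):.0%}")
```

Rule order matters here: an application with both a VM and a container is treated as a VM workload, which is one of several defensible choices.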

Understanding Our Users

Once we'd gathered the data from our hosting platforms and mapped that to our archetypes, we wanted to speak to our users. After all, we didn't just want to build what they were building today. We wanted to understand the pain points, the opportunities, and ultimately make sure that they were invested in what the project would give them towards the end.

Application Architects 

The first group we spoke to was our application architects. We asked them to prioritize the top three things they wanted to see improved in their application development process. The first thing they shared with us was that there are too many tools. I'm sure we can all relate to this as IT professionals: if you had all of the cloud platforms under the sun, what would you choose for storage? I'm sure in this room alone there'd be 10 or 15 answers.

Next, we heard from them that their dependency on other teams often caused issues with their agility. They also expressed an interest in learning how to do some of these things right, to unlock that agility, which was important to hear from that group. Third, we heard: "We're building the same thing over and over again." They were always building SSO and always building database integrations, each and every time they built an application.

Application Developers

Next, we went to the application developers themselves and said, "What about from your perspective? What are the challenges? What are the issues you're facing?" The first thing that came out was the developer experience. They were having to cut across GitHub, Azure DevOps, and the cloud platforms themselves to find all of the information they needed to debug, troubleshoot, and build their applications, and that was slowing them down considerably.

The next was changes. As a regulated company, Elanco needs to make sure we always understand the who, what, and how of any change made to a production system. Documenting this and adding it to records within our CMDB was taking up developers' time that could potentially be saved. Finally, they came back to us and said, "We're building the same capabilities time and time again." They had a different lens, but it was the same intent.

Enterprise Teams

Finally, we spoke to our enterprise teams, who support the application groups themselves. We asked them, "What are your challenges?" First, they told us that the teams don't seem to know what they're asking for, so they take ages working through requests with them: "Do you mean this? Or do you mean this?"

That was costing them a lot of time and adding days to their SLAs, time they wanted to save. They also shared, "We want more automated controls. We want to be able to give people more access, but we need to make sure that they're not doing certain things." Finally, the theme that resonated across all of the groups again was that we're building the same things time and time again.

Having gathered all of that feedback from these groups, we needed to validate it. We went through our ServiceNow data, about two and a half thousand tickets related to the applications, and started to measure how big these problems were in terms of time taken and volume of requests.

By doing that, we arrived at clear priorities: teams were building the same things time and time again, and we were losing too much time to the developer experience. On average, we found that application development had about 25 days of delay and usually involved about five different teams to get it over the finish line.
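The validation step, turning ticket data into average delay and team counts, can be sketched as below. It assumes a hypothetical, simplified ticket record (`app`, `opened`, `closed`, `assignment_groups`) rather than ServiceNow's real export schema, and measures delay as the span from the first ticket opened to the last ticket closed for each application. The sample figures are invented.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Ticket:
    # Hypothetical, simplified view of a ServiceNow request record.
    app: str
    opened: date
    closed: date
    assignment_groups: tuple


def delivery_stats(tickets):
    """Return (mean elapsed days, mean distinct teams) across applications."""
    by_app = {}
    for t in tickets:
        rec = by_app.setdefault(
            t.app, {"start": t.opened, "end": t.closed, "teams": set()}
        )
        rec["start"] = min(rec["start"], t.opened)
        rec["end"] = max(rec["end"], t.closed)
        rec["teams"].update(t.assignment_groups)
    days = [(r["end"] - r["start"]).days for r in by_app.values()]
    teams = [len(r["teams"]) for r in by_app.values()]
    return mean(days), mean(teams)


# Invented sample data.
TICKETS = [
    Ticket("app-a", date(2021, 3, 1), date(2021, 3, 10), ("network", "security")),
    Ticket("app-a", date(2021, 3, 5), date(2021, 3, 21), ("database", "identity")),
    Ticket("app-b", date(2021, 4, 1), date(2021, 4, 17), ("network", "hosting")),
]

if __name__ == "__main__":
    avg_days, avg_teams = delivery_stats(TICKETS)
    print(f"average delay: {avg_days} days, average teams: {avg_teams}")
```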

What Did We Need To Deliver? 

Coming out of analysis, we knew we needed to deliver two things. The first was the patterns themselves: something repeatable that didn't just cover the infrastructure or the applications but took into account the end-to-end process.

We knew our pattern needed to have infrastructure modules, application codebase that hit some of the common criteria and services — and ultimately some systems that connected it all. And we wanted to make sure we didn't move away from any of our principles around open source as default, zero trust, and automation everywhere. 

The second was the automation platform itself. It needed to support as many of these patterns as we could throw at it, and it needed to be flexible across all of our hosting platforms. It was crucial to us that we didn't lock ourselves in to platform X or Y; we wanted to make sure this expanded as we did with our multi-cloud strategy.

Our Automation Platform 

Coming out of our design phase, we had five distinct patterns. We knew they worked for our customers, and we knew they hit all of our requirements. All that was left was to automate. As we went into this phase, we were very conscious of three key things.

The first was our principles. With anything we build at Elanco, we want to make sure it stays true to open source, zero trust, and cloud native. The second was that people were involved in the process: we wanted each of the personas you can see on the left to have an embedded part in this.

Finally, the processes: we get a lot of feedback about opportunities to improve the developer experience, and we felt very strongly that we wanted to keep developers in one place and make sure the processes fit around their workflow.

Our application architects, who you can see in the top left-hand corner of the screen, can now go into ServiceNow, which is our source of truth for all application data, and request one of our patterns. They select the parameters they're looking for: whether they need a large database, SQL or NoSQL, or just private networking. The request then gets passed to our orchestrator, which really just calls a series of APIs and tracks our jobs end-to-end.
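A minimal sketch of the orchestrator's role as described here: take the parameters from the request, call a series of steps in order, and track each job's status end to end. All names (`PatternRequest`, `run_pipeline`, the step functions, and the pattern names) are hypothetical, and the step bodies are stand-ins for the real calls to Ansible Tower, Terraform Cloud, and ServiceNow.

```python
from dataclasses import dataclass


@dataclass
class PatternRequest:
    # Parameters an architect might pick in the request form (hypothetical).
    app_name: str
    pattern: str
    database: str = "none"  # e.g. "sql", "nosql", "none"
    private_networking: bool = False


@dataclass
class Job:
    step: str
    status: str = "pending"
    detail: str = ""


def run_pipeline(request, steps):
    """Call each step in order, recording status; stop at the first failure."""
    jobs = [Job(name) for name, _ in steps]
    for job, (_, action) in zip(jobs, steps):
        try:
            job.detail = action(request)
            job.status = "succeeded"
        except Exception as exc:  # record the error and halt later steps
            job.status = "failed"
            job.detail = str(exc)
            break
    return jobs


def prepare_dependencies(req):
    # Stand-in for launching an Ansible job (security groups, etc.).
    return f"dependencies ready for {req.app_name}"


def run_factory_workspace(req):
    # Stand-in for queuing a Terraform Cloud run on the factory workspace.
    if req.pattern not in {"simple-web", "complex-web"}:
        raise ValueError(f"unknown pattern {req.pattern!r}")
    return f"queued factory run for {req.pattern}"


def update_record(req):
    # Stand-in for writing the outcome back to the ServiceNow record.
    return "record updated"


STEPS = [
    ("dependencies", prepare_dependencies),
    ("factory", run_factory_workspace),
    ("record", update_record),
]

if __name__ == "__main__":
    for job in run_pipeline(PatternRequest("pet-portal", "simple-web", "sql"), STEPS):
        print(job.step, job.status, job.detail)
```

Stopping at the first failure keeps later steps from running against missing dependencies; a production orchestrator would also need retries and idempotent steps.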

There are really two systems at play in our automation stack. The first is Ansible Tower, which we use to set up a series of dependencies, such as security groups, because it already has plugins for some of those services.

The next is Terraform Cloud, which for us is the machine that builds the machine. The master workspace, shown in purple, is a great design pattern because it provisions all of the other environments: the GitHub repository; the dev, test, and prod workspaces; and all of the access and secrets required for the application to be built with SSO out of the box, database integrations, and so on.
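As a hedged sketch of what a factory workspace provisions for one application, the Python below builds the requests for a GitHub repository (the GitHub REST endpoint for creating an organization repository) and for dev, test, and prod workspaces (the Terraform Cloud workspaces API). It only builds the requests and sends nothing. The organization names, naming convention, and the choice to auto-apply only in dev are assumptions, and access and secrets are omitted. In practice a master workspace would more likely declare these resources with the `github` and `tfe` Terraform providers (for example `github_repository` and `tfe_workspace`).

```python
import json

GITHUB_API = "https://api.github.com"
TFC_API = "https://app.terraform.io/api/v2"
ENVIRONMENTS = ("dev", "test", "prod")


def repo_request(gh_org, app_name):
    """Request to create a private repository in a GitHub organization."""
    return ("POST", f"{GITHUB_API}/orgs/{gh_org}/repos",
            {"name": app_name, "private": True})


def workspace_request(tfc_org, name, auto_apply):
    """JSON:API request to create a Terraform Cloud workspace."""
    body = {
        "data": {
            "type": "workspaces",
            "attributes": {"name": name, "auto-apply": auto_apply},
        }
    }
    return ("POST", f"{TFC_API}/organizations/{tfc_org}/workspaces", body)


def factory_plan(gh_org, tfc_org, app_name):
    """All requests the (hypothetical) factory would issue for one app."""
    calls = [repo_request(gh_org, app_name)]
    for env in ENVIRONMENTS:
        # Assumption for illustration: only dev applies without manual approval.
        calls.append(workspace_request(tfc_org, f"{app_name}-{env}", env == "dev"))
    return calls


if __name__ == "__main__":
    for method, url, body in factory_plan("example-gh", "example-tfc", "pet-portal"):
        print(method, url, json.dumps(body))
```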

That means that once this process finishes, application engineers and infrastructure engineers have one place to go to write code. Whether it is application code, infrastructure code, or access code, it is all provisioned once and then follows a consistent set of CI/CD pipelines through to the target.

We then secure those pipelines using Sentinel, HashiCorp's policy-as-code framework, which ultimately prevents teams from overstepping or opening certain ports and fundamentally underpins our security profile.
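Sentinel policies are written in the Sentinel language against Terraform plan data, so the following is only a Python analogue of the kind of rule described here, not a Sentinel policy. It scans a plan in the JSON format produced by `terraform show -json <planfile>` for Azure network security rules that allow inbound traffic from anywhere to a blocked port. The blocked ports, the address prefixes treated as "anywhere", and the choice of resource type are assumptions for illustration, and port ranges such as `"20-25"` are not handled.

```python
# Hypothetical policy inputs: ports never to be exposed inbound to the internet.
BLOCKED_PORTS = {"22", "3389"}
OPEN_SOURCES = {"*", "0.0.0.0/0", "Internet"}


def violations(plan):
    """Return addresses of NSG rules that open a blocked port to the internet."""
    found = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "azurerm_network_security_rule":
            continue
        # "after" is null for deletions, so fall back to an empty dict.
        after = (rc.get("change") or {}).get("after") or {}
        if (
            after.get("direction") == "Inbound"
            and after.get("access") == "Allow"
            and after.get("source_address_prefix") in OPEN_SOURCES
            and str(after.get("destination_port_range")) in BLOCKED_PORTS
        ):
            found.append(rc["address"])
    return found


def rule(name, port):
    # Helper to build a minimal resource_changes entry for the sample plan.
    return {
        "address": f"azurerm_network_security_rule.{name}",
        "type": "azurerm_network_security_rule",
        "change": {"actions": ["create"], "after": {
            "direction": "Inbound", "access": "Allow",
            "source_address_prefix": "*", "destination_port_range": port}},
    }


SAMPLE_PLAN = {"resource_changes": [rule("ssh", "22"), rule("https", "443")]}

if __name__ == "__main__":
    print(violations(SAMPLE_PLAN))
```

For a real plan, the dictionary would come from loading the `terraform show -json` output with `json.load`.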

In Summary

This is the blueprint of the journey we've been on. 

Analysis

In the analysis phase, we didn't just look at the services in use but also at their relationships and the underlying metadata.

We then also defined our principles. What did we want to go for? Was it partial automation, or full? Was it zero trust? Was it open source? The outcomes of this phase were some potential targets, the four archetypes of VMs, simple web, complex web, and data, along with all of the feedback and opportunities from the users, which ultimately drove our prioritization throughout the project. Finally, what helped us validate this was using data from our ServiceNow platform to measure how many days and hours were spent on each of these tasks, so we could prioritize accordingly.

Design

Next was our design phase, where we took the outputs of our analysis, and we tried to map that to a series of artifacts. We did that through design sprints, where we brought our customers and engineers on that journey. 

The real value statement of that exercise was the manual build that we did. Sitting down with our customers, showing them what the outcome would look like — and ultimately getting sign-off that this hit the requirements and did what it needed to do. 

The outcomes of this were the five patterns: we ended up with the designs, codebases, and infrastructure modules required to stand them up. That gave us a really clear focus for what we needed to automate and which tools to choose to do so. Finally, it validated our developer experience, which our engineers had repeatedly raised with us and which we wanted to keep front and center.

Automation

Next came automation. A big part of automation for us was identifying and testing the factory design pattern. This allowed us to provision nearly all of the environments and dependencies for an application in one place. It also kept our architecture very simple and made engineers' lives much easier by clearly defining the role of each tool we had.

This ultimately made sure our developer experience was consistent, the access given to teams automatically followed our zero trust profile, and fundamentally, the processes that we'd stood up were based on the personas that were going to use this.

Where Next? 

Elanco has been on a journey for the last six months, moving from a heavily manually provisioned environment to one that is fully automated. What originally took us around 25 days now takes just 25 minutes. And a number of processes within the developer experience have now been standardized, such as changes, GitOps, and the workflows we use.
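Taking both figures as elapsed time (an assumption; the talk does not say whether the 25 days are calendar or working days), the improvement works out to roughly a 1,440-fold reduction:

```latex
\frac{25\ \text{days} \times 1440\ \text{min/day}}{25\ \text{min}}
  = \frac{36\,000\ \text{min}}{25\ \text{min}}
  = 1440
```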

But we think this journey needs to continue. We've identified huge opportunities, whether it's with new patterns, new deployment targets, or even just looking at onboarding our existing applications into this model, using technologies like Azure Terrafy. 

We would like to thank our partners at HashiCorp, the team here at Elanco, and our partners at Brodeur for all of the work that's gone into these last six months. We hope this presentation has been useful.

Thank you very much.
