Learn the three keys to the National Australia Bank's success in cloud enablement at enterprise scale using Terraform Enterprise.
Hi, everybody, and welcome to HashiConf 2020. My name is Andrew Brydon. I'm here to talk to you about building the NAB Engineering Foundation with Terraform. I'm a distinguished engineer for National Australia Bank. I work extensively with our IT and engineering teams, helping them build standardized solutions for our software delivery environments.
I wanted to start with a little bit of background around NAB. We're a 160-year-old company. We have an extensive workforce here in Australia. We are the number one business bank, and we service nine million customers. In our technology fleet, we have over 2,100 applications. These are supported on 11,000 servers — so an extensive technology footprint — which requires automation and orchestration to manage it in a streamlined and effective way.
Part of the journey we've been going on in the last few years has been looking at how we transform this technology estate and move effectively and efficiently into the cloud.
That has been underpinned by our cloud-first strategy. We have been moving our applications into AWS and Azure and looking at how we can do this in the most efficient, automated, and streamlined way.
We'd been facing a number of challenges with our on-premises and existing deployment approaches. The organization was looking at how we become more productive, how we support our business teams, and how we streamline our delivery of features to our customers.
Our move to the cloud was a conscious one to have the cloud providers do that heavy lifting around our infrastructure and support us looking at building our applications in a more cloud-native and easy-to-deliver way. We wanted to look at how to take advantage of new and innovative techniques such as AI and machine learning to support this application delivery process.
Part of the work that we started doing around a year ago was looking at — and building on top of — the application migration work we'd already done in cloud. NAB has been a cloud user for many years, and it was a first mover into AWS around 2014.
We had seen this explosive use of cloud around 2018 when it had become a key part of our strategy. We've seen lots of teams go into the cloud and start to make the most of it — really embrace it. Over the following 12-18 months, we had also seen that the teams had done an amazing job of building and automating things within their teams. But there was not enough reuse of that innovation across the organization.
We at NAB decided to look at how do we standardize some of these approaches, make them effective across our hundreds of squads in the organization — and use that to improve the velocity around delivery.
We came up with this model called the NAB Engineering Foundation. It's based on three pillars. We have created a standard technology capability which is underpinned using Terraform Enterprise — so TFE to standardize the modules that support our infrastructure deployment.
On top of that, we created a standard pattern for software delivery around containers and serverless in our organization. We use this all wrapped up in a standard pipeline where we could integrate all our compliance and security elements and do the heavy lifting for our engineering teams, and support our deep security and compliance needs as a heavily-regulated financial institution in Australia.
To support this, we have two other key pillars. We created an educational component. When an engineer comes into our organization, as soon as they sign up to GitHub, for example, we'll invite them into a bootcamp. We'll take them through these standard software delivery techniques and educate them on the components that make up the software delivery platform. They can hit the ground running, and we can dramatically bring forward their ability to be productive in the organization.
The third component is how we take this capability we've built within a couple of teams and ensure that any team can create innovative capabilities and have them adopted centrally. We use a method that, in the organization, we call innersource.
This is an open source model within our structure. We use GitHub, and the components that make up our NAB Engineering Foundation capability are available via this GitHub organization. Teams can contribute updates and fixes into that capability as they require, for instance when they need features to support their own delivery. That means we have a standard central team that's building capabilities all the time, but that is massively enhanced, expanded, and distributed by having any team across the organization able to contribute to it.
We also looked at how we improve onboarding experiences and automation for every single component along that journey around our SDLC, and made it as streamlined and easy as possible. We were thinking about how we support the outcome of having an engineer able to promote code from ideation to production in a day. That's the key outcome and objective that we're striving for here.
I mentioned Terraform; this is HashiConf, after all. I've been working with the HashiCorp toolset for many years. It's industry-standard, and we have a multi-cloud strategy ourselves. It helps us standardize on a way of describing the infrastructure modules that we use to underpin all of our deployments into cloud. It's a first-class citizen in how we manage and implement the NAB Engineering Foundation.
We use Terraform in our pipelines. The infrastructure modules are deployed via Jenkins with a Jenkins Templating Engine to manage all delivery and updates into non-production and production environments. Then we use that innersource approach, first of all, to create any new modules if teams require them, and then to handle any upgrade that's required. This happened to us recently around Azure. We needed to go to version two around the Azure platform, and we had 40 modules that we needed to update quickly. We had that distributed across many teams, and we were able to do that update in days. That really supports a cloud, distributed, and agile delivery model. I cannot emphasize enough how important that is.
TFE also gives us some other things apart from that standard approach to infrastructure delivery. It helps us in our compliance journey. As a heavily regulated organization, we have to face into compliance. We use Sentinel policy as part of our management capability in TFE to identify and ensure that specific elements are always on or off in terms of our infrastructure deployments.
For example, that could be that we can only ever deploy into the Sydney region. We are based in Australia, so that's something that we might want to do from a data sovereignty perspective. That's why we are using TFE in our platform to support the infrastructure capabilities.
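To make the region rule concrete, here is a minimal sketch of that kind of check. It is written in Python rather than Sentinel and is not NAB's actual policy. It assumes the plan has been exported in Terraform's JSON plan format (the `resource_changes` list produced by `terraform show -json`), that resources expose an Azure-style `location` attribute, and that Azure's `australiaeast` stands in for the Sydney region. The function name, the allow-list, and the sample plan are hypothetical.

```python
from typing import List

# Assumption: Azure's "australiaeast" region stands in for "the Sydney region".
ALLOWED_LOCATIONS = {"australiaeast"}


def find_region_violations(plan: dict) -> List[str]:
    """Return addresses of planned resources whose location is not allowed."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        # Only resources being created or updated matter for this rule.
        if not set(change.get("actions", [])) & {"create", "update"}:
            continue
        after = change.get("after") or {}
        location = after.get("location")
        if location is not None and location.lower() not in ALLOWED_LOCATIONS:
            violations.append(rc["address"])
    return violations


# Hypothetical, hand-written plan fragment for illustration only.
sample_plan = {
    "resource_changes": [
        {
            "address": "azurerm_resource_group.app",
            "change": {"actions": ["create"], "after": {"location": "australiaeast"}},
        },
        {
            "address": "azurerm_storage_account.logs",
            "change": {"actions": ["create"], "after": {"location": "westeurope"}},
        },
        {
            "address": "azurerm_resource_group.old",
            "change": {"actions": ["delete"], "after": None},
        },
    ]
}

violations = find_region_violations(sample_plan)
print(violations)
```

In a real TFE setup a rule like this would more likely be expressed as a Sentinel policy evaluated between plan and apply; the sketch only shows the shape of the decision.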
We are also in the process of implementing some other capabilities from HashiCorp around HashiCorp Vault to help us standardize some of our service account interactions. That's the machine-to-machine type of secrets enablement that occurs. It needs to happen frequently and has to be automated when you start thinking about using CI/CD as part of your everyday standard deployment approach.
Also, when you ramp up your delivery capabilities — so you're deploying many times per day — it is key that all of that toolchain is automated end-to-end, and you have a supply chain mechanism that's able to support it.
We use TFE for state management. We have a tie-in to GitHub — as I previously talked about. That allows us to support that innersource capability and allows us to scale and distribute out to hundreds of squads.
Policy as code is key for supporting our audit and compliance obligations. TFE allows us to have that central reporting capability around what is going on at the infrastructure layer with the cloud, and to provide evidence points back to our audit team. It also allows us to hook into ServiceNow, and provides that clear chain of evidence between the change being raised in our organization and an infrastructure component being deployed, so we can trace all the way through every element. That's another thing that we have enabled as a standard option for our developers with the NAB Engineering Foundation.
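As a rough illustration of what a chain of evidence between a change and a deployment can look like, the sketch below links each infrastructure run to a change record and flags any run that cannot be traced back to an approved change. Everything here is hypothetical: the `ChangeRecord` and `InfrastructureRun` types, their fields, and the sample identifiers are invented for the example and do not reflect the TFE or change-management APIs, or how NAB wires them together.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change ticket raised in a change-management system."""
    change_id: str
    approved: bool


@dataclass(frozen=True)
class InfrastructureRun:
    """Hypothetical record of one infrastructure deployment run."""
    run_id: str
    workspace: str
    commit_sha: str
    change_id: Optional[str]  # None when no change was linked to the run


def untraceable_runs(runs: List[InfrastructureRun],
                     changes: List[ChangeRecord]) -> List[str]:
    """Return IDs of runs that do not trace back to an approved change."""
    approved = {c.change_id for c in changes if c.approved}
    return [r.run_id for r in runs if r.change_id not in approved]


changes = [ChangeRecord("CHG001", approved=True),
           ChangeRecord("CHG002", approved=False)]
runs = [
    InfrastructureRun("run-a", "payments-api-prod", "3f2a9c1", "CHG001"),
    InfrastructureRun("run-b", "payments-api-prod", "8d41e07", "CHG002"),
    InfrastructureRun("run-c", "cards-api-prod", "b7c5d22", None),
]

flagged = untraceable_runs(runs, changes)
print(flagged)
```

The point of the design is that the audit question ("which change authorized this deployment?") becomes a lookup rather than an investigation.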
The last thing here is around workspaces. Every application has a workspace associated with it. It allows us to have a standard way of supporting non-production and production environments, and allows things to be easily cloned and copied as required, especially across those non-production environments.
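One way to picture the workspace model is as a small data structure: a workspace identified by application and environment, which can be copied into another environment with a few values overridden. The sketch below is only a model of that idea under assumed conventions. The `Workspace` type, the `app-environment` naming scheme, and the variable names are hypothetical and are not the TFE API.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass
class Workspace:
    """Hypothetical model of a workspace tied to one app and environment."""
    app: str
    environment: str
    variables: Dict[str, str] = field(default_factory=dict)

    @property
    def name(self) -> str:
        # Assumed naming convention: <application>-<environment>.
        return f"{self.app}-{self.environment}"


def clone_workspace(source: Workspace, environment: str,
                    overrides: Optional[Dict[str, str]] = None) -> Workspace:
    """Copy a workspace into another environment, applying any overrides."""
    variables = {**source.variables, **(overrides or {})}
    return replace(source, environment=environment, variables=variables)


dev = Workspace("payments-api", "dev",
                {"instance_count": "1", "location": "australiaeast"})
test = clone_workspace(dev, "test", {"instance_count": "2"})
print(test.name, test.variables)
```

Because the clone builds a new variables dictionary, changing the copy leaves the source workspace untouched.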
We have enabled teams to go faster. That's one of the key things that is an outcome for us in terms of NEF — as we call it. Productivity improvements help every bit of the organization so we can get outcomes to our customers faster.
We make the lifecycle management of the development platform and capability easier for all teams. This is important because lifecycle management can be hard and can take up a lot of time. We deliver NEF to our customers internally as a software product. The teams get point releases and the upgrades every couple of weeks. They can simply put those through their CI/CD pipeline and perform the upgrades and they get the benefit from all that innovation — that I've talked about — from across our organization.
The collaboration element is phenomenal as we scale out this approach. We see with every release of our product that we're now up to about 50% of that release being from contributions from external teams. As we distribute more, we get more benefit from this within NAB.
Thinking about the journey, the approach we took with going to cloud was absolutely a great way for us to learn. We had been in a scenario where our technology had to fundamentally change to support the NAB business. Educating and training people in our organization, and making them cloud champions, bootstrapped us. It made us leap forward in terms of where we are as a technology organization. We needed to do that to get to where we are today and build on top of that; that's why we created NEF.
I've talked about compliance, but that compliance element is a key part of our delivery stream. By automating that for the teams, we take that undifferentiated heavy lifting out of their hands. We build it in as a standard capability so that — although they have to think about it — they know it's there. They know if they use the platform they're compliant against the various standards we have in our organization.
In nine months, we've shifted 300 teams onto this platform, deployed around the same number of services — and we're building on that every day. We work with the teams, we understand the features and things coming next for them that they require, and we create a roadmap with our customers in the organization. We have constant feedback loops that are ensuring that we're building and supporting what they need.
We've grown our bootcamps to have 1,100 people go through them in this financial year. We're looking to grow and expand on that and move it forward. Innersource has been a key enabler for this capability. Going forward, I see it as a key enabler for building other services within the organization, and taking those further, as an absolutely different way to work for developing any software that any of the teams require in the organization. That's a key building block for us as we go into the next financial year and think about how we can deliver and develop faster.
I'm going to finish off here. Personally, this has been a fantastic journey. It's been supported by our leadership team in our organization. They've been great supporters of cloud. Moving to cloud, I see this as a key enabler of working in the cloud. And HashiCorp, and Terraform, and the other tools that they provide are absolutely the key enabler — and underpin what we need to do as we move forward with our next generation technology capabilities.
I wanted to say thank you very much for watching — and thank you, and goodbye.