Case Study

Vault in BBVA, Secrets in a Hybrid Architecture

Hear how Vault is used within Banco Bilbao Vizcaya Argentaria's (BBVA) hybrid architecture to deploy banking services in both Spain and Mexico.

Hear about Vault open source and Enterprise usage within BBVA's new hybrid architecture (on-prem & AWS for core services).

What You'll Learn

Learn how and why BBVA transitioned from Vault OSS to Vault Enterprise to support its hybrid architecture in both Spain and Mexico, and see why Vault has become one of the most widely adopted services at BBVA.

Transcript

Hello everybody. Thank you for coming. My name is Ignacio Fuentes-Villegas, and I come from Spain, from Madrid. I work for BBVA, and I am the leader of the cryptographic team.

This is the agenda. First, I'm going to introduce BBVA and what we do in the cryptography team in the holding area. Second, I'm going to explain the different Vault installations we have at BBVA, because we started many years ago with an open source Vault installation, and how we had to expand it when critical applications arrived that needed to read and write secrets constantly.

I am going to explain what a hybrid architecture means for us at BBVA, and end my presentation with a very simple use case: the main BBVA mobile banking application, which uses Vault constantly, not just for storing passwords or for authentication, but for executing critical transactions with very high availability. Finally, I'm going to speak a little about the future: introducing HSMs, KMS encryption, and other things.

About BBVA 

Who is BBVA? BBVA is a large group of banks with a presence in more than 25 countries, with 6,000 branches, more than 100,000 employees, and more than 80 million customers around the world.

We are the largest bank in Mexico and the third-largest bank in Spain. We are also present in many countries in South America and in Turkey. I'm not going to go through the numbers, because I'd rather walk you through some charts about the real installations we have built with the open source and Enterprise versions.

BBVA is very strong in digital services and innovation within the banking industry. In fact, we have had the best mobile banking application in Europe for five years in a row. Because of that, our environment is very demanding in terms of availability and performance, and this is the environment in which we use Vault for all the applications around the globe, for all the banks in the BBVA group.

Cryptography at BBVA

What do we do in cryptography in general? To give a very basic example, like many other companies, we use keys and certificates to identify, authorize, and secure our communications. We use cryptography in many use cases: ATMs, payment cards, points of sale, the mobile banking application, and even smartwatches. Internally, we use it for employee devices, servers, branches, and connections with all our providers.

Crypto is very important because we're building trust across BBVA: we use mutual TLS for all internal connections. In particular, in the holding department, where I lead a global team in Mexico and Spain, we deploy the main cryptographic components to all the countries as part of the security architecture area.

We divide our work into three main topics. The first is data encryption and data masking, covering encryption, obfuscation, and tokenization. The second is issuing certificates for identities, servers, or even devices: thousands of certificates, both external and internal.

The third topic is secrets. We provide tools so that applications can read and write the secrets they need, and this is where Vault is used. BBVA is such a large bank that we have many departments; even my colleagues in security architecture use Terraform, Nomad, and Vault open source for the microservices installations. But in this presentation, I'm going to talk about the core banking services, which execute thousands of transactions.

Secrets

We have two big installations of Vault in BBVA, which are deployed globally. The first one was installed many years ago. It is based on the open source version, and its main users are Jenkins and Ansible. 

The second installation is based on Vault Enterprise. We call it Vault Runtime because it's used by runtime applications. Its main users are the mobile application, the desktop application, and batch applications that execute batches with millions of transactions each second. Other types of applications, such as data science applications or any BBVA application running critical transactions, can also be integrated with Vault Enterprise.

Hybrid Architecture

Before explaining how we installed Vault open source and Enterprise, I would like to say that at BBVA we are always trying to compute hybridly. For us, that means running our applications at both sites: in our primary on-premises datacenter and in the cloud.

So we are able, for example, to run 50% of the traffic on Amazon Web Services and 50% in the on-prem datacenter, with the same applications running at the same time.

Because of that, our network department built this network architecture. We have four regions: the first is our on-prem datacenter in Spain, the second is Amazon in Ireland, the third is in Mexico, and the fourth is Amazon in Virginia. Of course, we also use other clouds, like Google and Azure, but Amazon Web Services is the cloud used most for core banking services.

Each region is connected. We have several rings — work, live network, and central — so we have visibility from each cluster. In fact, in each region, we have a very big installation of Vault Enterprise. 

Spain and Ireland are synchronized, as I'll explain later when I talk about Vault Runtime, and Mexico and Virginia are also synchronized. We decided not to have just one big cluster connecting all four regions. We wanted some isolation between Europe and America because we didn't want data to cross between geographies.

Also, in terms of support, the teams supporting these installations are based in Spain and Mexico. They don't have the same change management procedures, and the time zones are different. So we decided to have two big Vault installations, one for America and one for Europe, but they are all identical, as I'm going to explain.

Vault Infra 

This picture shows the first Vault installation at BBVA, which we set up many years ago: one of the first real production Vault installations in Spain. We only have one cluster in the on-prem datacenter, consisting of three nodes with a Consul storage backend, and only one cluster in Amazon.

We upload our backups to the cloud, so we are able to switch on the recovery system if the main datacenter goes down. Why are we doing this? We don't have the possibility of synchronizing data because we have the open source version. 

This installation is only intended for building the infrastructure, which is why we call it Vault Infra: when you are deploying machines or products, you need the secret at that moment, and once the secret is pushed to the machine, you don't need it anymore. So it's a very basic installation, and right now we don't need anything more.

As you can see in the picture, its users are Ansible and Jenkins: just regular, not very complicated utilities that deploy or build machines in this core banking services environment.

We use DNS to direct the traffic, so when we have to send all the requests to the cloud, we have to change the DNS entry manually to redirect everything there.

We also use F5 devices for load balancing. This installation is enough for deploying the infrastructure, but it is not enough for a critical application that needs very high availability and performance, as I'll explain in the next slides.

 

Vault Evolution to Enterprise 

A quick recap: The first Vault installation is the open source version and is only for building new machines. We don't have the possibility of data synchronization because we have the open source solution — and the performance also is not enough. 

In our tests, we got only 500 transactions per second. With that level of performance, we were always worried that new products would overload the installation if they used Vault on every request. If you need Vault on every request, that is very frustrating.

We also wanted isolation between the underlying infrastructure tooling, which is only needed to deploy new machines or new products, or to run operations such as restarting machines, and the applications running on top of it. That way, even if the tools needed to deploy the infrastructure are down, the applications keep running, because the machines themselves are running without any problem.

Vault Runtime

That wasn't enough for us, so we had to expand the solution. This installation was designed and validated by HashiCorp with the help of customer success support. We implemented it last year, and within several months we had it in production in Europe and America, more or less at the same time.

We have four clusters in Europe and four clusters in America. This slide depicts Europe, but America has the same structure. One of our biggest advantages is that at BBVA we were able to keep everything the same: network, machines, operating systems, versions. Everything is identical between America and Europe, which makes things easy for us; otherwise, it would be very, very difficult to deploy this architecture.

This is one of the clusters. We have two main clusters of five nodes each, using the integrated storage backend. We use F5 load balancers, so we can distribute the traffic across all five nodes. You can't do that with the open source version, because you only have one active node and cannot redirect traffic to the other nodes; they are just sleeping.
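To make the load-balancing point concrete, here is a minimal Python sketch of a round-robin pool that spreads requests over five healthy nodes and skips any node that fails its health check. It does not reproduce how F5 devices are configured, and the node names (`vault-1` to `vault-5`) are hypothetical. It also leaves out a Vault detail: in Vault Enterprise, performance standby nodes can answer read requests themselves, while writes are still forwarded to the active node.

```python
from collections import Counter
from itertools import cycle

# Hypothetical node names; the talk does not give real hostnames.
NODES = [f"vault-{i}" for i in range(1, 6)]


class RoundRobinPool:
    """Spreads requests evenly over the healthy members of a node pool."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._ring = cycle(self.nodes)
        self.healthy = set(self.nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        self.healthy.add(node)

    def next_node(self):
        # Walk the ring at most once, skipping nodes that failed health checks.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy Vault node in the pool")


pool = RoundRobinPool(NODES)
print(Counter(pool.next_node() for _ in range(10)))  # each node receives 2 of the 10 requests
```

With an active/standby-only setup, the equivalent pool would effectively have one member, which is the limitation described for the open source installation.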

With this solution, we use all five nodes fully, and we have the same setup in Amazon. We also have disaster recovery, so we keep the same transactions-per-second capacity even if the main datacenter or the cloud goes down.

We synchronize and replicate everything, so when an application writes a secret, it immediately goes to the cloud as well. We use geographic distribution of traffic with our DNS solution, so we can detect where the requests are coming from. If a request comes from the cloud, it goes to the cluster in the cloud, and the same happens in the datacenter.

We even use the same DNS name for the two clusters, which is very convenient. With client applications or other solutions that don't offer this, you end up with two endpoints and have to implement logic on the client side to redirect the traffic yourself. With this DNS functionality, everybody uses the same name.
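The single-name, site-aware resolution described above can be sketched as a lookup that answers with the cluster in the client's own site. This is a minimal illustration, not BBVA's DNS product or configuration: the service name `vault.example.internal`, the address ranges, and the endpoint IPs are all hypothetical.

```python
import ipaddress

# Hypothetical values: one shared service name, per-site client networks,
# and per-site Vault endpoints. None of these come from the talk.
SERVICE_NAME = "vault.example.internal"
SITE_NETWORKS = {
    "on-prem": [ipaddress.ip_network("10.10.0.0/16")],
    "aws": [ipaddress.ip_network("10.20.0.0/16")],
}
SITE_ENDPOINTS = {
    "on-prem": "10.10.0.100",
    "aws": "10.20.0.100",
}


def resolve(name, client_ip, default_site="on-prem"):
    """Answer a lookup for the shared name with the cluster in the client's own site."""
    if name != SERVICE_NAME:
        raise KeyError(f"unknown name: {name}")
    addr = ipaddress.ip_address(client_ip)
    for site, networks in SITE_NETWORKS.items():
        if any(addr in net for net in networks):
            return SITE_ENDPOINTS[site]
    # Clients outside every known network fall back to a default site.
    return SITE_ENDPOINTS[default_site]


print(resolve(SERVICE_NAME, "10.20.7.15"))  # client in AWS -> 10.20.0.100
```

Because every client asks for the same name, no endpoint-selection logic lives in the applications; the trade-off is that the DNS layer must know which networks belong to which site.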

We do this because in BBVA's hybrid architecture, we always try to avoid crossing traffic between our on-prem datacenter and the cloud. Why? Because the bandwidth between them is limited.

Everybody is already sending traffic across to upload backups or to connect databases, so whenever the architecture allows it, we avoid crossing traffic. It also keeps the installation simple for us: if traffic had to cross between sites in every layer, it would become very complicated.

You also have to be sure the cloud environment is working properly, so that you have a real backup or alternative to your on-prem datacenter. The main users of Vault Runtime are BBVA's core applications: the front applications and the backend applications, as I'm going to explain in the use case.

Another recap: we use the Enterprise version for critical applications, and it's already deployed in Europe and America, working in production for all the countries in an Active/Active configuration. Importantly, in our tests we logged 10,000 transactions per second on just one cluster, which is more than enough for running applications right now.

If we need more, we can add more clusters to scale this performance. Last but not least: automatic failover. If the datacenter or the cloud goes down, the service is restored automatically within 0 to 10 seconds.
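The automatic failover can be illustrated with a health tracker that marks a site down after a few consecutive failed checks, plus a selector that prefers the local site. This is a hedged sketch, not the actual DNS or F5 logic in use: the threshold of three checks and the three-second interval are hypothetical values, chosen so that the worst-case detection time falls inside the 0 to 10 seconds quoted in the talk.

```python
from dataclasses import dataclass


@dataclass
class ClusterHealth:
    name: str
    threshold: int = 3   # hypothetical: consecutive failed checks before "down"
    failures: int = 0    # current run of consecutive failed checks

    @property
    def healthy(self):
        return self.failures < self.threshold

    def record_check(self, ok):
        # A single successful check resets the failure streak.
        self.failures = 0 if ok else self.failures + 1


def pick_cluster(local, remote):
    """Prefer the cluster in the caller's own site; fail over only when it is down."""
    if local.healthy:
        return local
    if remote.healthy:
        return remote
    raise RuntimeError("both sites are unavailable")


def worst_case_detection_s(check_interval_s, threshold):
    # Roughly interval * threshold; ignores probe timeouts and DNS TTL caching.
    return check_interval_s * threshold


onprem, aws = ClusterHealth("on-prem"), ClusterHealth("aws")
for _ in range(3):
    onprem.record_check(False)
print(pick_cluster(onprem, aws).name)  # -> aws
print(worst_case_detection_s(3, 3))    # -> 9
```

In a DNS-based setup, client-side caching of the old answer adds up to one TTL on top of the detection time, so short TTLs matter as much as the check interval.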

Vault Runtime: Deployment

Let me explain how we installed Vault Runtime at BBVA. One advantage we have at BBVA is that we are not using native cloud services for this core banking service; because BBVA is so big, we use products from many providers. This kind of installation is very simple for us, but it involves a very large number of machines: around 300 machines for the front applications and around 100 for the backend applications.

In this kind of installation, we have exactly the same setup in the datacenter and in the cloud, with the same Red Hat operating system. That way, we avoid any potential differences between the datacenter and the cloud, and we have the same in America, which is very important.

We use Jenkins and Ansible jobs to execute the installation of everything: creating machines, installing Vault, configuring alerts and logging. Everything was, and still is, performed with jobs that are executed very easily, and you can rebuild all the clusters if you need to. For example, every six months or so, when we need to rebuild the operating system to update its version, we use Jenkins and Ansible for that. That was key to having the product in place in two to three months.
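When the operating system is rebuilt, the nodes of a five-node integrated storage (Raft) cluster have to be replaced without losing quorum. The sketch below computes the quorum and produces a rebuild order, one node at a time, with the active node last. Node names are hypothetical, and this is an illustration rather than BBVA's actual Jenkins or Ansible jobs; in Ansible, the play-level `serial` keyword is the usual way to roll through hosts in small batches like this.

```python
def quorum(cluster_size):
    """Votes needed for a Raft majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    return cluster_size - quorum(cluster_size)


def rolling_rebuild_plan(nodes, active, batch_size=1):
    """Rebuild standby nodes first and the active node last, batch_size at a time.

    Raises if taking one batch offline would cost the cluster its quorum.
    """
    if active not in nodes:
        raise ValueError(f"{active} is not a cluster member")
    if len(nodes) - batch_size < quorum(len(nodes)):
        raise ValueError("batch too large: the cluster would lose quorum")
    order = [n for n in nodes if n != active] + [active]
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


nodes = [f"vault-{i}" for i in range(1, 6)]  # hypothetical five-node cluster
print(quorum(5), tolerated_failures(5))       # -> 3 2
print(rolling_rebuild_plan(nodes, active="vault-2"))
# -> [['vault-1'], ['vault-3'], ['vault-4'], ['vault-5'], ['vault-2']]
```

Rebuilding the active node last means leadership changes only once per rollout, instead of potentially after every step.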

The main problem we had in this installation was opening firewalls. Our firewalls aren't opened automatically, so every opening had to be requested and coordinated. That took the longest in our installation, but once the firewalls were in place, it became easy to make any change to the infrastructure, because everything is automated.

The funny thing is that we even installed Vault Runtime with the help of Vault open source, because Jenkins and Ansible use Vault open source to store the secrets they need. When you start a product in our architecture, you use Vault open source, so we did, in fact, install Vault Runtime with Vault open source.

 

Vault Use Case 

Let me explain a simple use case at BBVA. This is not just the Spanish mobile banking application; it is the mobile application of each country I mentioned before: many countries in South America, plus Mexico, Turkey, and Spain.

In this example, we see different layers. The first layer is Akamai, the second is the frontend, the third is the backend, and the fourth, which is not depicted, would probably be the IBM mainframe.

Akamai decides which traffic goes to the cloud and which goes to the datacenter. For example, suppose 20% of the traffic goes to the cloud and 80% to the datacenter. The requests sent to the cloud are served entirely in the cloud: the frontend, the backend, and also Vault.
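A percentage split like the 20/80 example can be sketched by hashing a session identifier, which keeps each session pinned to one site so that all of its calls stay there. This is a generic technique shown for illustration; it is not Akamai's routing mechanism, and the `session-N` identifiers are hypothetical.

```python
import hashlib


def route(session_id, cloud_percent):
    """Send a session to 'cloud' or 'datacenter' according to a percentage weight.

    Hashing the session ID keeps each session on the same site, so its calls
    (frontend, backend, Vault) stay local instead of crossing between sites.
    """
    if not 0 <= cloud_percent <= 100:
        raise ValueError("cloud_percent must be between 0 and 100")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # roughly uniform 0..99
    return "cloud" if bucket < cloud_percent else "datacenter"


sessions = [f"session-{i}" for i in range(10_000)]
share = sum(route(s, 20) == "cloud" for s in sessions) / len(sessions)
print(f"{share:.1%} of sessions routed to the cloud")  # close to 20%
```

Changing the weight (for example, to 50 for the even split mentioned earlier) moves only the sessions whose bucket falls between the old and new thresholds.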

When those requests need Vault, they go to the Vault cluster in the cloud, so we avoid crossing traffic. This is one of the few installations that runs perfectly on our hybrid architecture, and one of the best.

When you look at other solutions, you run into many problems trying to synchronize data, whether databases or just buckets; it is not as easy as it is with Vault, which implements this data synchronization for you. And these applications read and write secrets constantly, at many transactions per second, so it's critical to have everything in place with zero downtime.

What About The Future?

The first thing we are going to implement or expand in Vault is the use of HSMs. Why? Because right now, if something happens and somebody restarts the machines, the Vault node comes back sealed, and we have to unseal it manually.

With HSMs, Vault will be unsealed automatically. We are installing cloud HSMs in Amazon and Thales or nCipher HSMs in the on-prem datacenter and integrating them with Vault, so that it unseals automatically. We are also planning to use HSMs for the certification authority.
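Manual unsealing is needed because Vault's unseal key is split into key shares with Shamir's secret sharing (`vault operator init` defaults to five shares with a threshold of three), and enough operators must each supply a share after every restart. The Python sketch below shows the underlying split-and-recombine idea. It uses a prime field for readability, whereas Vault's own implementation works byte by byte over GF(2^8), and names such as `unseal_key` are illustrative only.

```python
import secrets

# Field for this sketch: the Mersenne prime 2**127 - 1. Vault itself splits
# the key byte by byte over GF(2^8); a prime field keeps the arithmetic readable.
PRIME = 2**127 - 1


def split(secret, shares, threshold):
    """Return `shares` points on a random polynomial of degree threshold - 1."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, shares + 1)]


def combine(points):
    """Recover the secret (the polynomial at x = 0) by Lagrange interpolation."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


unseal_key = secrets.randbelow(PRIME)  # illustrative stand-in for the real key
key_shares = split(unseal_key, shares=5, threshold=3)
print("unsealed:", combine(key_shares[2:]) == unseal_key)  # -> unsealed: True
```

With HSM-backed auto-unseal, the HSM decrypts the key Vault needs at startup, so restarted nodes no longer wait for operators to bring their shares.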

As you know, at BBVA we issue certificates for everybody, and we have many products, like Microsoft CA and others. But when you have an installation at BBVA that is this well designed and this available, why not start using Vault as a certification authority?

We haven't used it yet because we couldn't integrate HSMs with the certification authority, which I think is coming in the next release. Then we are going to configure Vault as our subordinate CA, so we have another CA available through an API to issue certificates automatically.

We are already doing that with Vault open source. But this is another benefit of a very big installation: you can integrate everything in one place. I can count up to five different Vault installations in other departments.

We are the cryptography team, and we deploy our infrastructure globally for everybody, so there is no need for every department to have its own Vault installation. Another good thing Vault is implementing is key management in the cloud: managing keys with Google, Azure, and Amazon Web Services. This is another feature we are exploring, but right now we are missing a console, because key custodians are sometimes not technically savvy.

They need to be able to right-click; otherwise, we have to educate them. I don't know whether that is coming in the future. We are exploring this possibility, because bringing your own key to the cloud is very important: regulations are starting to require that you create the key in your main datacenter, because of the random number generators, and then push the key to the cloud.

Last but not least is Vault's encryption and tokenization feature. In cryptography, we already have a big installation that provides cryptographic services. The problem with using this functionality in Vault is that a bank requires many algorithms, as most organizations do.

Encryption in Vault is implemented with several algorithms. Sometimes that is not enough, but why not use it? If someone wants to use an API and needs a very stable, highly available installation, why shouldn't they start using Vault's encryption and tokenization? Those are the four scenarios we are exploring for the future.

Conclusion 

To conclude: not only for storing secrets, but also for pushing keys to the cloud, issuing certificates, or encrypting, Vault is going to be a key component of BBVA's architecture.

That is very well aligned with our cryptography strategy because, if you remember, what we do is encrypt, issue certificates, and provide tools for secrets management, so Vault does more or less what we do.

It is also very well aligned with BBVA's strategy of computing hybridly between the two clouds without any effort. I would say the relationship between BBVA and HashiCorp is going to last many years, with great success.

That is what I wanted to share. Thank you for coming. If you have any questions, I'll be happy to answer them here at HashiConf.
