A moving target
Online shopping has completely transformed consumer behavior. Today's shoppers have more choices than ever among brands and places to buy them, which is why companies around the world turn to Criteo to help them generate more awareness, increase online traffic, and drive more sales with optimized, highly targeted advertising.
The company's proprietary advertising platform aggregates transaction data from nearly three-quarters of the world's online shoppers and operationalizes the intelligence to deliver high-performance ads for marketers and brands. But ingesting and mobilizing that much data across platforms and global markets requires coordinating services across a massive global infrastructure that conventional service delivery practices couldn't handle.
"We pride ourselves on delivering crucial insights and targeted messaging for brands around the world, but trying to do so while manually connecting physical pieces of infrastructure together makes the task exponentially more challenging," says Pierre Souchay, Discovery and Security Authorization Lead at Criteo. "Even after transitioning to containerized workflows we still needed a more efficient, simpler way to deploy new services faster and turned our attention to optimizing our orchestration to do it."
Breaking the bottleneck
Criteo's global business features a range of services and applications like real-time ad bidding, analytics, and campaign management tools. For years the company relied on an expansive physical footprint and thousands of bare metal servers in multiple data centers to accommodate its latency-sensitive operations.
While the single-tenant boxes helped shorten data trip times, they also required considerable time and manual effort to maintain, and spinning up enough resources to launch a new service could take as long as three weeks. In response, Criteo transitioned to containerized development to accelerate this process, reduce its physical infrastructure, and minimize operating costs.
But figuring out how to run multiple services and systems on the same box to shrink the company's physical footprint came with its own unique challenges. The company already had several hundred services operating daily and wanted to add more.
Yet every time the team added a new service, members had to manually create a new DNS entry and scour the DNS repository to figure out where to connect it. Each new service had to be recorded in three separate databases, and while the team had created some processes to do this more efficiently, occasional data misalignments still required additional investigation or caused outages that delayed the release of new builds.
"Even after significantly consolidating some of our 20,000 boxes and cutting our server spin-up time from a few hours to just seconds, we realized that service discovery was going to be a big bottleneck," Souchay says. "The only thing holding us back from consistently shipping new services, features, and upgrades faster was the fact we had to manage them all manually, which was completely unsustainable."
- Consolidating physical infrastructure
- Reducing operating costs
- Accelerating service discovery and deployment
Faster, smarter, better service delivery
Criteo adopted HashiCorp Consul to automate and streamline its service discovery operations. Specifically, the company chose Consul for the tool's ability to connect and secure services across any runtime platform and any public or private cloud.
The lightweight service-based networking product provides a real-time, multi-platform directory of all running services to improve application inventory management. More importantly, it dynamically locates applications and infrastructure services to accelerate discovery, and automates network configuration to simplify connectivity without human intervention.
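As a rough illustration of what registering and locating a service can look like, the sketch below builds a request body for Consul's agent registration endpoint (`PUT /v1/agent/service/register`) and a query URL for its health endpoint (`GET /v1/health/service/<name>`). The service name, addresses, health path, and sample response are hypothetical and are not taken from Criteo's setup; no network call is made.

```python
import json
from urllib.parse import urlencode


def build_service_registration(name, address, port, health_path="/health", interval="10s"):
    """Build a body for Consul's PUT /v1/agent/service/register endpoint."""
    return {
        "Name": name,
        "ID": f"{name}-{address}-{port}",  # hypothetical ID scheme, unique per instance
        "Address": address,
        "Port": port,
        # HTTP health check that the local Consul agent would poll
        "Check": {
            "HTTP": f"http://{address}:{port}{health_path}",
            "Interval": interval,
        },
    }


def health_query_url(base_url, service, near="_agent"):
    """URL that asks for passing instances only, sorted by estimated distance."""
    query = urlencode({"passing": "true", "near": near})
    return f"{base_url}/v1/health/service/{service}?{query}"


def endpoints(entries):
    """Extract (host, port) pairs from a health endpoint response."""
    result = []
    for entry in entries:
        # Fall back to the node address when the service address is empty
        host = entry["Service"].get("Address") or entry["Node"]["Address"]
        result.append((host, entry["Service"]["Port"]))
    return result


payload = build_service_registration("ad-bidder", "10.0.0.12", 8080)
body = json.dumps(payload).encode("utf-8")  # bytes an HTTP client would send

url = health_query_url("http://127.0.0.1:8500", "ad-bidder")

# Hand-written sample shaped like a health endpoint response (not real data)
sample = [
    {"Node": {"Address": "10.0.0.12"}, "Service": {"Address": "10.0.0.12", "Port": 8080}},
    {"Node": {"Address": "10.0.0.7"}, "Service": {"Address": "", "Port": 8080}},
]
print(url)
print(endpoints(sample))
```

The `near=_agent` parameter asks Consul to sort results by estimated round-trip time from the local agent, so a client can simply take the first entry.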
Along with HashiCorp Vault for database secrets management and HashiCorp Terraform for deploying AWS and Microsoft Azure resources for newly acquired cloud services, Consul forms the backbone of Criteo's entire operation. Souchay says that without Consul the rest of Criteo's operations would not work. In particular, he cites how the tool has freed his team to pursue higher-value activities that support the company's long-term business objectives.
"One of the biggest benefits of Consul is that it tells us where our services are, whether they're healthy or not, and shows us the shortest network path to reach them regardless of whether they're running on virtual machines or bare metal," Souchay says. "Consul has fully replaced our manual service discovery activities with automated workflows and we've repurposed most of our Consul operations staff to other projects because the tool is so reliable, efficient, and intelligent. We don't even work on discovery anymore unless there's a specific reason."
Resolving a potential mess with service mesh
Beyond automating discovery and connectivity among various microservices, Souchay says that Consul has also played a pivotal role in helping the organization enhance its service monitoring and overall performance across the enterprise.
In particular, the team uses Consul alongside its open-source monitoring tool, Prometheus. Custom-built tooling exports Consul metrics in Prometheus format, centralizing all infrastructure health metrics in a single tool for greater visibility across the environment. The solution enables Criteo engineers to observe Consul itself and make sure it is functioning at peak performance and can scale horizontally to manage the services spread across more than 45,000 bare metal servers and approximately 1,000 virtual machines.
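To make the idea of exporting metrics in Prometheus format concrete, here is a minimal sketch that renders a dictionary of gauge values in the Prometheus text exposition format. It is not Criteo's exporter; the metric names, the `dc` label, and the values are made up for illustration.

```python
import re


def to_prometheus_name(name):
    """Replace characters that Prometheus metric names do not allow."""
    return re.sub(r"[^a-zA-Z0-9_:]", "_", name)


def render_gauges(gauges, labels=None):
    """Render {name: value} as Prometheus text exposition lines.

    Simplified: label values are not escaped and names are assumed
    not to start with a digit.
    """
    label_text = ""
    if labels:
        pairs = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
        label_text = "{" + pairs + "}"
    lines = []
    for name, value in sorted(gauges.items()):
        metric = to_prometheus_name(name)
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric}{label_text} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric names and values, for illustration only
text = render_gauges(
    {"services.registered": 4000, "nodes.healthy": 3},
    labels={"dc": "dc1"},
)
print(text)
```

Consul agents can also serve their own telemetry in this format through `/v1/agent/metrics?format=prometheus`, which may be enough when no custom aggregation is needed.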
A properly configured and fully functional Consul is key to Criteo's application of Inversion of Control (IoC), in which the service discovery solution interconnects all services with their probes and links services with their owners to automate discovery. Instead of engineers assigning each service to a monitoring tool by hand, which is a much more complicated process, services declare themselves and the monitoring follows.
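One way to picture this inversion is a small generator that turns registered services into scrape targets, so that monitoring is derived from the registry rather than configured per service. The sketch below emits the JSON structure Prometheus accepts for file-based service discovery; the service entries and the `owner` metadata key are assumptions for illustration, not Criteo's actual schema.

```python
import json


def to_file_sd(services):
    """Convert registered services into Prometheus file_sd target groups."""
    groups = []
    for svc in services:
        meta = svc.get("Meta") or {}
        groups.append({
            "targets": [f"{svc['Address']}:{svc['Port']}"],
            "labels": {
                "service": svc["Name"],
                # "owner" is a hypothetical metadata key set at registration time
                "owner": meta.get("owner", "unknown"),
            },
        })
    return groups


# Hand-written entries shaped like service registrations (not real data)
registered = [
    {"Name": "ad-bidder", "Address": "10.0.0.12", "Port": 8080,
     "Meta": {"owner": "bidding-team"}},
    {"Name": "campaign-ui", "Address": "10.0.0.30", "Port": 9000},
]

targets = to_file_sd(registered)
print(json.dumps(targets, indent=2))
```

Prometheus also ships a native Consul discovery mechanism (`consul_sd_configs`), which removes the need for an intermediate file when the defaults fit.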
"Using Consul to align all our services and Prometheus to monitor Consul's performance creates a widespread service mesh that automates basic monitoring of production services and the availability of Service-Level Objectives," Souchay explains. "We're able to register and de-register services instantly, regularly check service health and availability, and automatically back up any upgrades or changes we've made."
Criteo is already planning its future with service mesh using Consul and HAProxy. The Criteo team has developed an in-house tool that enables it to use HAProxy as a sidecar proxy for Consul. This would allow the team to bring intentions, the Consul service mesh feature that controls which services may communicate with each other, into its datacenters alongside automated TLS-encrypted connections between services. The work is in the early stages of development, but it highlights Criteo's continued focus on innovation.
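For readers unfamiliar with intentions, they are typically declared as a `service-intentions` configuration entry that names a destination service and lists the sources allowed or denied to call it. The sketch below builds such an entry as a plain dictionary; the service names are hypothetical, and the `is_allowed` helper is only a local illustration, not Consul's actual precedence logic.

```python
def build_intentions(destination, allowed_sources):
    """Build a service-intentions config entry for one destination service."""
    sources = [{"Name": name, "Action": "allow"} for name in allowed_sources]
    # Wildcard rule: deny every source that is not explicitly allowed
    sources.append({"Name": "*", "Action": "deny"})
    return {"Kind": "service-intentions", "Name": destination, "Sources": sources}


def is_allowed(entry, source):
    """Illustrative check: exact source match first, then the wildcard rule."""
    rules = {rule["Name"]: rule["Action"] for rule in entry["Sources"]}
    action = rules.get(source, rules.get("*", "deny"))
    return action == "allow"


# Hypothetical service names, for illustration only
entry = build_intentions("campaign-db", ["campaign-ui", "ad-bidder"])
print(entry)
print(is_allowed(entry, "ad-bidder"))
print(is_allowed(entry, "analytics"))
```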
Pushing the envelope of efficiency, transparency, and performance
Automating infrastructure deployment and service discovery has paid huge dividends for Criteo. The increased business agility helped the company cut the time needed to deliver new services from weeks to minutes, and Criteo now runs more than 4,000 service types on its servers to meet customer needs around the globe.
At the same time, the company has also used HashiCorp tools to replace a wide range of other infrastructure components, radically reducing its infrastructure footprint and helping save millions of dollars per year in infrastructure maintenance, licensing, and upgrade costs.
"HashiCorp tools have been instrumental in helping us gain the flexibility, speed, and scale our business needs to keep up with the changing demands of our customers," Souchay says. "We're eager to add more features to our Consul instance and push the envelope of what's possible with this level of automation and efficiency."
- Transitioned to a containerized environment
- Reduced quantity of physical servers needed for service delivery
- Accelerated existing service discovery from 4 hours to minutes
- Cut new service delivery timelines from 3 weeks to 1 minute
- Reduced number of manual operations required to spawn new services down to zero
Criteo is using HashiCorp Consul, Terraform, and Vault to automate service discovery, infrastructure setup, and secrets management for thousands of advertising production services that serve a global customer base.
Pierre Souchay, Discovery and Security Authorization Lead, Criteo
Pierre Souchay is the Discovery and Security Authorization Lead for Criteo, responsible for creating software development kits (SDKs) for all of Criteo's applications and infrastructure. Souchay boasts more than 15 years of progressive software development and infrastructure management experience and holds a Master's degree in Computer Software Engineering from Université Pierre et Marie Curie in Paris.
- AWS, Microsoft Azure
- Linux, Windows, C#, Scala, Java, Python, Ruby (infrastructure only), Chef, Mesos, Kubernetes, Hadoop, Kafka
- Load balancers: HAProxy, F5
- In-house tool, starting to use Vault