Presentation

From the First Photocopy to Modern Failure Detection in Distributed Systems

Learn about the Gossip & SWIM protocols for managing group membership and failure detection in a distributed system, and learn how HashiCorp Consul & Nomad build on Gossip with "Lifeguard" extensions from HashiCorp Research.

Speakers

  • Nic Jackson
    Nic JacksonDeveloper Advocate, HashiCorp

The challenge of distributed systems has been around ever since companies have had n+1 computers in a network. The scenario of managing membership for a group of nodes and detecting any failures has challenged engineers and been optimized for decades. However, many of the core algorithms still used today date back over 30 years.

One of the more well-known distributed systems challenges came in the 1980s when Xerox was experimenting with distributed email servers. From their experiments came the research paper: Epidemic Algorithms for Replicated Database Maintenance. From that initial algorithm, SWIM and the Gossip protocol were born, and at 5000+ machines, Gossip has still not been beaten to date in terms of performance.

In this session from Istanbul Tech Talks 2019, HashiCorp developer advocate Nic Jackson will look at how Xerox's epidemic algorithm became the Gossip protocol used in SWIM to manage group membership and failure detection in modern distributed systems.

Gossip is the protocol used in HashiCorp Consul for service discovery and configuration. HashiCorp Nomad, a cluster scheduler, also uses Gossip.

Both of these tools received novel improvements in the last year by implementing extensions from a HashiCorp Research project called Lifeguard which makes the Gossip protocol more robust. Jackson will explain how these extensions work and where their inspiration came from.

More resources like this one

  • 3/15/2023
  • Case Study

Using Consul Dataplane on Kubernetes to implement service mesh at an Adfinis client

  • 1/20/2023
  • FAQ

Introduction to Zero Trust Security

  • 1/19/2023
  • Presentation

10 Things I Learned Building Nomad-Packs

  • 1/4/2023
  • Presentation

A New Architecture for Simplified Service Mesh Deployments in Consul