Technology Deep Dive

2:00 PM - 2:40 PM PST, October 29

Scaling Kafka Replication at Uber's Monumental Scale

Uber operates one of the largest Apache Kafka clusters in the world, acting as the pivotal hub connecting the entire Uber ecosystem. We aggregate system metrics, application logs, database changelogs, and event data from the rider, driver, and Eats apps, and make this critical data available to downstream consumers through the Kafka platform.
In 2016, we pioneered our own Kafka replication system, uReplicator, designed to address MirrorMaker 1.0's limitations and meet our rapidly growing, large-scale Kafka replication needs across multiple clusters.
As our product ecosystem has grown, we have evolved the system to sustain Kafka replication across more than 20 clusters, handling trillions of messages daily. Throughout this journey, we have maintained high availability, scalability, and reliability while keeping replication delays minimal.
Topics include:
1. The architectural design of Uber's Kafka replication system and the use cases it supports.
2. The scale we operate at, the challenges we have faced, and how we worked through them.
3. The architecture and functionality of Auto-Rebalance, a core component for autonomous cluster management and auto-healing in catastrophic scenarios.
4. A detailed look at our auto-scaler, which scales clusters based on fluctuations in topic traffic.
5. Our observability improvements, which make operations and troubleshooting developer-friendly tasks.
6. Reflections on the challenges encountered and lessons learned, with guidance for others on similar journeys.

Speakers

Hao Sun

Senior Software Engineer, Uber

Si Lao

Staff Software Engineer, Uber