Technology Deep Dive

3:50 PM - 4:30 PM PST, October 29, 2024

Pinterest Tiered Storage for Apache Kafka

The release of Apache Kafka® 3.6.0 introduced a native Tiered Storage offering that is tightly integrated within the broker process. This design gives it deep access to internal Kafka protocols and metadata, making for a highly coordinated storage solution. However, the tight coupling also imposes significant limitations that keep Tiered Storage from reaching its full potential. The broker remains in the active serving path even during data consumption, so the design misses the opportunity to use remote storage as an independent serving path.

To overcome these limitations, we explored and applied the MemQ design pattern to Tiered Storage for Apache Kafka, decoupling it from the broker process. This architectural shift enables direct consumption from remote storage, moving the active serving path away from the Kafka broker. The result is a substantial reduction in resource utilization on the Kafka cluster, with benefits that extend beyond storage efficiency to significantly lower serving costs. In this talk, we will cover the details of our design, how it changes the operational landscape of Apache Kafka, and how it compares with the native implementation.

Speaker

Vahid Hashemian

Staff Software Engineer, Pinterest