Use Case

11:40 AM - 12:30 PM PST, October 29

Harmonious Integration of Pulsar with ClickHouse & StarRocks

When building traditional real-time data analytics systems, we primarily integrated Kafka and Flink through tightly coupled, predefined static aggregations. This approach had several limitations, making it difficult to respond flexibly to surges in social events or recovery scenarios, and scaling was highly constrained.
To address these issues, we adopted Pulsar, which provided messaging and lightweight streaming through FunctionMesh and internal geo-replication features. Despite its advantages, this approach still had limitations in handling various analytical needs, prompting us to rethink our system architecture.
First, we combined Pulsar with ClickHouse to build our analytics system. Using Pulsar's FunctionMesh, we efficiently segmented data into topics, allowing for data reuse and aggregation. ClickHouse excelled at handling large volumes of data and real-time analysis, enabling a more streamlined and flexible streaming system.
However, as data volume grew, we faced limitations even with Pulsar and ClickHouse. We decided to incorporate StarRocks, a high-performance database with strong JOIN support. StarRocks simplified complex mesh aggregation tasks, reducing constraints related to predefined aggregations and easing the burden on Pulsar.
In this session, we will share our challenges and experiences with Pulsar, FunctionMesh, and StarRocks over the past five years, demonstrating how this combination maximizes strengths and mitigates weaknesses. We continue to seek and develop systems to enhance the efficiency of Pulsar and our data analytics.

Speakers

Youngjin Kim

Leader/Principal Data Engineer, NAVER

Moweon Lee

Principal Data Engineer, NAVER