Ecosystem
3:10 PM - 3:40 PM CEST, May 23
Building a Real-time Analytics Application with Apache Pulsar and Apache Pinot
Apache Pulsar is a distributed, open source pub-sub messaging and streaming platform for real-time workloads, managing hundreds of billions of events per day. It runs in production, processing millions of messages per second across millions of topics, and has been adopted by companies such as Yahoo!, Verizon Media, Splunk, and more. In this talk we'll learn how to run analytical queries on top of Pulsar's event data with Apache Pinot, a real-time distributed OLAP datastore used to deliver scalable, low-latency real-time analytics. We'll explore the integration between Pulsar and Pinot, explaining the features it supports and the challenges faced while building it. After that, we'll demonstrate how to build a real-time analytics dashboard with these technologies. We'll stream data into Pulsar using its Python client, ingest that data into a Pinot real-time table, and write some basic queries using Pinot's Python SDK. Once we've done that, we'll bring everything together with an auto-refreshing dashboard using Plotly Dash, so that we can see changes to the data as they happen.
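As a rough illustration of the pipeline the abstract describes, here is a minimal Python sketch of the first and last steps: publishing JSON events with the `pulsar-client` package and querying Pinot with the `pinotdb` package. It is not the speakers' demo code. The event shape (`userId`, `page`, `ts`), the helper names, and any topic, table, and host values are hypothetical, and the Pinot real-time table that consumes the topic is assumed to exist already.

```python
import json
import random
import time


def make_event(user_id: int, now_ms: int) -> bytes:
    """Encode one hypothetical page-view event as UTF-8 JSON bytes."""
    event = {"userId": user_id, "page": f"/page/{user_id % 3}", "ts": now_ms}
    return json.dumps(event).encode("utf-8")


def publish(service_url: str, topic: str, count: int) -> None:
    """Send `count` random events to a Pulsar topic."""
    # Imported lazily so make_event() is usable without pulsar-client installed.
    import pulsar

    client = pulsar.Client(service_url)
    try:
        producer = client.create_producer(topic)
        for _ in range(count):
            now_ms = int(time.time() * 1000)
            producer.send(make_event(random.randint(1, 100), now_ms))
    finally:
        client.close()  # also closes producers created from this client


def query(broker_host: str, sql: str) -> list:
    """Run a SQL query against a Pinot broker and return all rows."""
    # Imported lazily for the same reason as above.
    from pinotdb import connect

    conn = connect(host=broker_host, port=8099, path="/query/sql", scheme="http")
    cursor = conn.cursor()
    cursor.execute(sql)
    return list(cursor)
```

With a local Pulsar broker and Pinot cluster running, usage might look like `publish("pulsar://localhost:6650", "events", 100)` followed by `query("localhost", "SELECT page, count(*) FROM events GROUP BY page")`, where the topic and table name `events` are placeholders. A Dash app could call `query` from a periodic callback to refresh its charts.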
Speakers

Mary Grygleski
Streaming Developer Advocate at DataStax, Java Champion, President of Chicago-JUG

Mark Needham
Cloud Native Engineer, StarTree