Tech Talk: Apache Samza, a distributed stream processing framework.
Join our community: http://www.tech-meetup.com/wechat and http://www.tech-meetup.com/signup
The world is going real-time. MapReduce, SQL-on-Hadoop and similar batch processing tools are fine for analyzing and processing data after the fact — but sometimes you need to process data continuously as it comes in, and react to it within a few seconds or less. How do you do that at Hadoop scale?
Apache Samza is an open source stream processing framework designed for continuous data processing. Unlike batch processing systems such as Hadoop, which typically have high-latency responses (sometimes hours), Samza continuously computes results as data arrives, which makes sub-second response times possible. Samza has some unique features that make it powerful. It provides high performance for stateful processing jobs, including aggregation and joins between many input streams. It is designed to support an ecosystem of many different jobs written by different teams, and it isolates them from each other, so that one badly behaved job can’t affect the others.
At LinkedIn, we have been using Samza in production both for internal analytics and for data products that are served on the live site. In this talk, we will focus on the detailed architecture of Samza and compare it with other major open source stream processing frameworks.
1:30pm - 1:50pm reception and social time
1:50pm - 3:00pm talk and Q&A
3:00pm - 4:00pm offline networking