Showing posts from January, 2016

Mindmap for data stream analysis frameworks

In this post, I'll briefly discuss the evolution of data stream processing frameworks and their current state. A well-known paper by Stonebraker et al. a few years ago established some guidelines for truly streaming, fault-tolerant computational frameworks. The mind map drawn in this post can be used as a rough checklist for selecting or discarding a streaming framework based on your application's requirements.

Similarities

While streaming approaches differ in how they process events, they share some commonalities:

Most of them achieve cluster-mode execution via YARN or Mesos, and use ZooKeeper for cluster state management (for master-slave high availability).

Many of them build a directed acyclic graph (DAG) of the computation, which is submitted to the master node of the cluster. The master then decides how to execute the computation as efficiently and with as much parallelism as possible.

Apache Kafka is a commonly employed, fault-tolerant event source (though these frameworks provi
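To make the DAG-based planning step a little more concrete, here is a minimal sketch, assuming a job is described only as a list of (upstream, downstream) operator edges. It groups operators into "stages" with Kahn's topological sort: operators in the same stage depend only on earlier stages, so a master could in principle schedule them in parallel. The operator names (`kafka_source`, `parse`, `count_per_user`, etc.) and the `plan_stages` helper are hypothetical illustrations; real frameworks such as Storm, Spark or Flink additionally handle per-operator parallelism, operator chaining and data partitioning, which this sketch ignores.

```python
from collections import defaultdict


def plan_stages(edges):
    """Group DAG nodes into stages using Kahn's topological sort.

    Every node in a stage depends only on nodes in earlier stages,
    so the nodes within one stage could be executed in parallel.
    Raises ValueError if the graph contains a cycle (i.e. is not a DAG).
    """
    successors = defaultdict(list)
    indegree = {}
    for upstream, downstream in edges:
        successors[upstream].append(downstream)
        indegree.setdefault(upstream, 0)
        indegree[downstream] = indegree.get(downstream, 0) + 1

    # The first stage holds operators with no inputs (e.g. event sources).
    ready = sorted(node for node, deg in indegree.items() if deg == 0)
    stages = []
    visited = 0
    while ready:
        stages.append(ready)
        visited += len(ready)
        next_ready = []
        for node in ready:
            # "Complete" this node: its successors lose one pending input.
            for succ in successors[node]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    next_ready.append(succ)
        ready = sorted(next_ready)

    # If some nodes were never reached, they sit on a cycle.
    if visited != len(indegree):
        raise ValueError("cycle detected: the job graph is not a DAG")
    return stages


# Hypothetical streaming job: read events, parse them, then fan out
# into an error-alerting branch and a per-user counting branch.
pipeline = [
    ("kafka_source", "parse"),
    ("parse", "filter_errors"),
    ("parse", "count_per_user"),
    ("filter_errors", "alert_sink"),
    ("count_per_user", "dashboard_sink"),
]

for i, stage in enumerate(plan_stages(pipeline), start=1):
    print(f"stage {i}: {stage}")
```

Here `filter_errors` and `count_per_user` land in the same stage because neither consumes the other's output, which is exactly the kind of independence a scheduler can exploit. Rejecting cycles up front mirrors why these frameworks insist on an acyclic graph: without it, no valid execution order exists.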