Blendo Data Monthly: The Kafka effect

Giorgos Psistakis

In last month’s Blendo Data Monthly, we saw that Apache Kafka is one of the hottest topics in the Big Data world today.

Apache Kafka is an open-source technology that acts as a real-time, fault-tolerant, highly scalable messaging system. It was created inside LinkedIn to handle the company’s huge data messaging needs. It later became an Apache project and led to the founding of a separate company, Confluent.

Kafka Summit

On April 26th, Confluent hosted the first-ever Kafka Summit in San Francisco.

I have tried to gather most of the presentations available out there.

+ All the Kafka Summit videos are available here.

+ More details from Confluent’s Log Compaction – Kafka Summit Edition

+ Confluent announced the results from their recent survey of the Kafka community that illuminates Kafka’s growing impact on the way organizations collect and process their data. “We see more and more organizations embracing real-time data and stream processing, and Kafka is at the heart of that shift,” said Jay Kreps, one of Kafka’s co-creators and the CEO and co-founder of Confluent. Read more here.

+ How to easily get started with Kafka Connect and build connectors to your favorite apps, such as Mixpanel (source code included).

More Kafka

Kafka was first created inside LinkedIn, so there is no better place to read about the problems it was built to solve and how they were solved. Here is the Kafka Ecosystem at LinkedIn.

+ …and the Salesforce operations team’s view of Apache Kafka. What would you do if you had terabytes of operational data being generated in production each day, and hundreds of engineering teams wanting to use that data to improve their services … but no way to connect the two? Read more here.

+ Robert Metzger provides a great overview of Apache Flink internals and stream processing in this article from InfoQ.

Apache Apex

The Apache Software Foundation has announced Apache Apex as a Top-Level Project. Apache Apex is a large-scale, high-throughput, low-latency, fault-tolerant, unified Big Data stream and batch processing platform for the Apache Hadoop® ecosystem. Apex was originally created at DataTorrent Inc. in 2012 (coinciding with the first alpha release of YARN) and entered the Apache Incubator in August 2015. Source: The Apache Software Foundation Announces Apache Apex as a Top-Level Project.


Today, when high-powered machines can predict market trends better than any finance analyst, play chess and Go better than any grandmaster and build a car faster, cheaper and more efficiently than any autoworker, it comes as no surprise to me that an estimated 47% of jobs are at risk of automation in the next five years. What does surprise me, however, is the fact that some businesses see automation as a one-size-fits-all solution, removing humans from the very tasks that need them most. Some things, often the most important aspects of management and leadership, still need a human touch…

Great article by Dov Seidman at Forbes.