In last month’s Blendo Data Monthly we saw that one of the hottest topics in the big data world today is Apache Kafka.
Apache Kafka is an open source technology that acts as a real-time, fault-tolerant, highly scalable messaging system. It was created inside LinkedIn to manage the company's huge data messaging needs. The technology then became an Apache project and gave rise to a separate company, Confluent.
Kafka Summit
On April 26th, Confluent hosted the first-ever Kafka Summit in San Francisco.
I have tried to gather most of the presentations that are available.
- Neha Narkhede, CTO of Confluent, discussed the Kafka project, the release of Apache Kafka 0.10.0, and Kafka Streams. Here is the video of her presentation.
- Gwen (Chen) Shapira of Confluent and Jeff Holoman of Cloudera talked about all the ways Apache Kafka may lose messages and how to fix them. Here are the slides and video
- Event Time vs. Processing Time, using Star Wars as an analogy, by Stephan Ewen, committer of Apache Flink and CTO of Data Artisans. Video and slides here
- 101 Ways to Configure Kafka – Badly, by Henning Spjelkavik & Audun Strand from Finn.no, who shared all the mistakes they made as new Kafka users and how they corrected them. Video here
- Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign Management for Globally Distributed Data Flows by Helena Edelson. Video here
- More Datacenters, More Problems by Todd Palino, one of the most actionable talks on deployment architecture at scale. Video here
- Real-time, Streaming Recommendations with NiFi, Kafka, and Spark by Chris Fregly, Principal Data Solutions Engineer at IBM Spark Technology Center. Video here
- Kafkaesque days at LinkedIn in 2015 by Joel Koshy, Staff Software Engineer at LinkedIn. Video here
- Streaming SQL by Julian Hyde of Hortonworks. Video
- The summit also included a hackathon, won by participants from Cloudera, Silicon Valley Data Science and Equifax.
+ All the Kafka Summit Videos here
+ More details from Confluent’s Log Compaction – Kafka Summit Edition
+ Confluent announced the results from their recent survey of the Kafka community that illuminates Kafka’s growing impact on the way organizations collect and process their data. “We see more and more organizations embracing real-time data and stream processing, and Kafka is at the heart of that shift,” said Jay Kreps, one of Kafka’s co-creators and the CEO and co-founder of Confluent. Read more here.
+ How to easily get started with Kafka Connect and build connectors to your favorite apps like Mixpanel (along with the source code).
More Kafka
Kafka was first created inside LinkedIn, so there is no better place to read about the problems they faced and how they solved them. Here is the Kafka Ecosystem at LinkedIn.
+ …and the Salesforce operations team's view of Apache Kafka. What would you do if you had terabytes of operational data being generated in production each day, and hundreds of engineering teams wanting to use that data to improve their services … but no way to connect the two? Read more here.
Apache Flink
Robert Metzger provides a great overview of Apache Flink internals and stream processing. Article from InfoQ.
Apache Apex
The Apache Software Foundation has announced Apache Apex as a Top-Level Project. Apache Apex is a large-scale, high-throughput, low-latency, fault-tolerant, unified big data stream and batch processing platform for the Apache Hadoop® ecosystem. Apex was originally created at DataTorrent Inc. in 2012 (coinciding with the first alpha release of YARN), and entered the Apache Incubator in August 2015. Source: The Apache Software Foundation Announces Apache Apex as a Top-Level Project
Data
“Today, when high-powered machines can predict market trends better than any finance analyst, play chess and Go better than any grandmaster and build a car faster, cheaper and more efficiently than any autoworker, it comes as no surprise to me that an estimated 47% of jobs are at risk of automation in the next five years. What does surprise me, however, is the fact that some businesses see automation as a one-size-fits-all solution, removing humans from the very tasks that need them most. Some things—often the most important aspects of management and leadership—still need a human touch…” Great article by Dov Seidman at Forbes.