Blendo Data Monthly: Captain America, Data Science & how to optimize email campaigns

Giorgos Psistakis

Last month’s Blendo Data Monthly was all about the first-ever Kafka Summit in San Francisco and our usual data sources.

This month we begin with a question: Who would win in Captain America: Civil War? Pure awesomeness! The great people at FiveThirtyEight used data science to give Captain America 4-to-1 odds against winning the Civil War.

Who would win in Captain America: Civil War? Source: MARVEL / AP / FiveThirtyEight

#Data Science

A list of online courses on one of the most time-consuming processes in the data pipeline: data cleaning.


21+ Online Data Science courses about Data Cleaning

I see dead people: Making simulations of deceased people



+ PyCon 2016 Videos. All of them!

+ 39 studies about human perception in 30 minutes.

+ Infographic: 16 Genius Minds Whose Inventions Made Data Science Easier For Us

#Marketing & Customer Data

A post about preparing and modeling email marketing data to create new features that help optimize marketing campaigns.

+ Intercom Data Model for Databases and custom analytics.

+ 5 Ways in Which Big Data Can Help Leverage Customer Data

#Apache Spark

The preview release of Apache Spark 2.0 is available now! A lot has changed since Spark 1.0 came out, and Spark 2.0 builds on what was learned from the community. Databricks has a great blog post about it, and they also provide a Technical Preview of Apache Spark 2.0.

Technical Preview of Apache Spark 2.0 with Databricks. Source: Databricks

+ How Spark and Hadoop Are Advancing Cancer Research.

#Twitter Heron

Twitter processes billions of events every day, and analyzing these events in real time is a huge challenge. In the beginning Twitter used Storm, which was open sourced in 2011. In 2015 they introduced Heron, a real-time distributed stream computation system. This month Twitter open sourced Heron under the Apache License 2.0!
Heron has been powering all of Twitter’s real-time analytics for over two years as a heavyweight real-time stream processing engine, and it is backward compatible with Storm.


Apache Kafka has released Kafka 0.10.0, a major release. As you may recall from our previous post about Kafka Summit, it includes Kafka Streams, message timestamps, and many more features.

+ Read how to use Kafka and Flink to migrate batch jobs to streaming.

+ LivePerson just open sourced kafka-java-bridge, so if you are using Kafka and Node.js you should try it out.

+ Martin Kleppmann explores using event streams and Kafka for keeping data in sync across heterogeneous systems. Video of Staying in Sync: from Transactions to Streams at InfoQ.

#Machine Learning / AI

This post is about Reinforcement Learning and its challenges. It covers everything from computers learning to win at games like Go or Pong to robots learning to perform complex tasks, and what Reinforcement Learning has to do with it. Great post from Andrej Karpathy, a PhD student at Stanford working on Deep Learning.
Reinforcement Learning

+ A curated list of awesome TensorFlow experiments, libraries, and projects.

+ My path to OpenAI


Data is at its most powerful when it is interconnected. A major challenge for modern data is interconnection of different data types to obtain a fuller picture of the data subject. The idea of Data Trusts explained.

Enjoyed this month’s Blendo Data Digest? Want more? Read more Blendo Data Monthly posts.