Blendo Data Monthly: All Hail our Machine Overlords, Data & Predicting the future

Giorgos Psistakis

In last month’s Blendo Data Monthly we read how data science gave Captain America 4-to-1 odds against winning the Civil War, and we shared a list of 21+ online courses on one of the most time-consuming processes in the data pipeline: data cleaning.

#All Hail our Machine Overlords

It certainly pays off to try and learn new things, even if they are outside our comfort zone. For engineers, this might be a dive into the machine learning and data science field. Per, a developer at Xeneta, decided to use some machine learning to boost the sales of the company he works at, and even if the result is not perfect, it’s definitely rewarding.

+ Berkeley has an awesome list of AI-related material from what they teach at the university, including video recordings of the lectures. But if you’d prefer to concentrate on Deep Learning, Ofir has compiled a great list of material that you can follow to kick-start your career as a Deep Learning expert.

+ Data science and Machine Learning need data, right? Have you ever thought of building an AI sous-chef? Well, to do that you will certainly need food-related data, and we have you covered. Using this great BBC food recipe scraper written in Go, you can easily get a lot of food-related data.

+ What would have happened if HAL 9000 had met Picasso? Find out by watching this amazing video of 2001: A Space Odyssey rendered in the style of Picasso using deep neural nets.

+ Google’s artsy AI composes its first song

+ Free Machine Learning Books

#Predicting the future

We humans are really bad at predicting the future. Especially when we try something new, we almost always fail to predict accurately how long it will take to achieve our goals and what resources will be needed. But it always helps to learn from others, from their experiences and struggles. @nathanbarry shares with us his experience going from losing money to a 51% profit margin in 5 months.

+ You probably have a clue, even before reading the article, about one of the decisive factors in success: resilience. But do we perceive resilience correctly? Most of us just think of resilience as endurance, but researchers think differently. Here’s a great article about resilience and what it is all about, to complement Nathan’s journey.

#Data Science

Data Science is becoming important for almost every job that involves data. Marketing is one of them. How can a marketing professional better understand this brand new world of data science?

+ Customer Support is another. You can go as wild as you want with your data: from maintaining real-time dashboards, to joining your Zendesk data with other sources from sales, product, and marketing, to building predictive models with your data science team.

+ R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results

+ Using Python for email marketing data preparation and cleaning.

+ How to load MailChimp’s data into Tables and DataFrames

+ System that replaces human intuition with algorithms outperforms human teams

#Dashboards and BI

There are many tools out there. And just like the apps in the App Store, there are a few that stand out from the rest. We’ve taken it upon ourselves to crowdsource, curate, and brainstorm all of the best dashboards and BI tools on the web.

#Data infrastructure

Working with data requires first acquiring it from different sources. Thumbtack has an interesting series of posts on how they designed and implemented their data infrastructure, and more specifically on the operational details of the systems that make it up.

+ Running data infrastructure at scale almost certainly involves distributed systems. One of the least discussed aspects of distributed systems is how you test them. Most of the time we take it for granted that Spark, Kafka, or our own custom-built distributed key-value store will work, but how do we test such systems under realistic conditions? Here’s a very interesting list of resources on testing distributed systems.

+ Finally, although streaming data is all the hype lately, we most certainly have to include some static data in the mix. Martin has made an awesome presentation about streaming static data.

+ Is Spark Overhyped?

#On the side

This month we had two major events.

The first was that Microsoft is to acquire LinkedIn for $26.2 billion, but you know that, right?

The second was this #got