How to load MailChimp’s data into Tables and DataFrames

Blendo Team

Email marketing campaigns using a SaaS platform like Mailchimp generate a large number of data that can be used to:

  1. Better understand our customers
  2. Monitor and optimize our email marketing campaigns

But to do that, we first need to understand the data we have to deal with. In this post we will go through the data of Mailchimp, what we can pull out of it using its API, understand and properly model the data. Usually, when data are exposed through a web API, it is optimized for this particular job, so when data is pulled, it first has to be properly re-modeled for the database or analytics system that we plan to use. To do that, we need to both understand the data we are working with but also the requirements of the system we are going to use. For this post we’ll consider a relational database for storing the data and thus the data will be modeled accordingly. Also, if you work with a language like R you can consider each table as a DataFrame, the structure and the process for preparing the data will be the same.

Mailchimp resources & main entities

The resources that Mailchimp exposes and that you should replicate on your database are the following:

Resource Description
Campaign The core concept of Mailchimp is a Campaign, whenever you implement an email campaign, a Campaign is created. From this resource you get all its related data.
Conversations Conversations contains all the responses you got from recipients of your email campaigns.
Messages (Conversation sub-resource) The messages that were exchanged as part of the conversations.
Lists In order to execute an email campaign in Mailchimp you need a pair of a Campaign and a List of recipients where the mails will be delivered.
Members (lists sub-resource) The members of the lists, here you will find the actual information of your recipients like emails and demographic information.
Reports Campaign performance related reports.
Campaign Abuse (Reports sub-resource) Information about campaign abuse complaints.
Campaign Advice (Reports sub-resource) Feedback based on campaign’s statistics.
Click Reports (Reports sub-resource) Reports related to the URLs you have included in your email campaigns.
Domain Performance (Reports sub-resource) Performance of your campaigns, useful in cases that you are sending mails from different domains.
Email activity (Reports sub-resource) Probably the most important report, it contains all the interactions of your users with your emails.
Unsubscribes (Reports sub-resource) Reports related to the recipients you have unsubscribed from your lists.

Ideally, each one of the above resources will end up as a table in your database. But before we load the data into our tables we need to consider a few things as we will see below.

Data format & serialization

Mailchimp delivers its data in JSON and for performance reasons some of the responses contain nested objects, something that makes it difficult to directly store the data in a tabular form. As in the case of the Intercom data we have to consider the following cases:

  • data type conversions
  • Nested objects
  • Arrays

Data types are usually straight forward to convert, special care should be taken for dates and timestamps but the rest are quite easy to implement. Nested objects, if they do not contain any other objects or arrays are also easy to flatten out into a number of columns, where each sub_field is merged with the name of the parent object to form a column in your table or data frame.

Arrays are a bit more difficult to handle and they usually fall into one of the following categories:

  • Arrays of primitive types
  • Arrays of objects

For the first case a possible solution is to flatten out the array into one value by concatenating the array members. The best strategy here is to create one string with a known delimiter and generate only one value. Now in the case of an Array of objects things are a bit more complicated, you will actually have to extract the array and create a new table where each array member will be mapped to one table row. In this last case, make sure that you include information from the parent object as a foreign key, for example an ID of the parent. Now let’s see for each of the Resource mentioned earlier how to handle the delivered data.

Mapping Mailchimp resources to relational tables

The Campaign table

The Mailchimp Campaign entity has the following properties that can be mapped into table columns.

Name JSON Type RDBMS Column Type
id String Text
type String Text
create_time String Text
archive_url String Text
long_archive_url String Text
status String Text
emails_sent Integer Numeric
send_time String Text
content_type String Text
recipients Object Can be flattened out if some fields are removed.
settings Object It can be flattened out.
variate_settings Object Can be flattened out if some fields are removed.
tracking Object Can be flattened out.
rss_opts Object Can be flattened out.
ab_split_opts Object Can be flattened out.
social_card Object Can be flattened out.
report_summary Object Can be flattened out.
delivery_status Object Can be flattened out.

Recipients: field contains a total of 5 fields from which 4 are of primitive types and can be directly converted in an appropriate data type and one field named segment_opts which is of type Object needs to be flattened out appropriately. This last field contains a sub-field named conditions which is an Array and which makes things a harder in representing the resource in a table. There are two options, the first one is to remove the conditions field completely, this will result in a much simpler representation of the resource in our database and it can be done when it is known that the information it carries will not be used. The second option, is to extract the field and create a new table called condition_types where each element of the array will be presented as a row. In this case which adds some additional complexity to our data model, we guarantee that we maintain all the information we can get from Mailchimp. As the embedded array does not include any fields that associate its elements with the parent resource, during the extraction and preparation of the data you need to make sure that you will add information from the parent record to each one of the array elements, a good candidate for this is the list_id field.

Settings: can be easily mapped into a relational table, it contains 16 fields where 15 of them are of primitive types and can be easily converted while one of them the auto_fb_post when present, is an array that can be directly flattened out by concatenating its values.

Variate_settings: is more similar to the case of Recipients. It included 11 fields from which 6 of them are Arrays, 5 of these arrays can be flattened out by concatenating their values but the field combinations needs to either be removed or represented as an additional table as it contains objects. Again if you choose to create an additional table, make sure that you have included an id from the parent resource to use as a foreign key for joining the data.

Tracking: contains a total of 10 fields, where 7 of them are of primitive types while the rest named salesforce, capsule & highrise when present are objects containing only additional fields of primitive types, so it is easy to flatten out these fields and store them as additional columns in the table.

Rss_opts: contains 5 fields, from which only one is not a primitive type, named schedule and which contains another object inside with only fields of primitive types. Although a bit deep as a hierarchical resource, the fact that every object is not contained in an array and it contains only primitive types, makes it easy to flatten out all the fields and store them in one table.

Ab_split_opts: contains 14 fields, all of them of primitive types so mapping into table rows is a trivial task.

Report_summary: contains 7 fields, where one of them is an object containing only primitive types, so it will require to be flattened out and embedded in the original table.

The conversations table

The conversations entity has the following structure:

Name JSON Type RDBMS Column Type
id String Text
message_count Integer Numeric
campaign_id String Text
list_id String Text
unread_messages Integer Numeric
from_label String Text
from_email String Text
subject String Text
last_message Object Can be flattened out

The last_message object contains only sub-fields of primitive types so everything is quite straight forward on how to convert conversations into a table.

The Messages tables

Messages are a part of conversations, if you check the API documentation of Mailchimp you will see that messages are exposed as a sub-resource of conversations. The structure of this resource is the following:

Name JSON Type RDBMS Column Type
conversation_messages Array Array, see below
id String Text
conversation_id String Text
list_id Integer Numeric
from_label String Text
from_email String Text
subject String Text
message String Text
read Boolean Boolean
timestamp String Text

Apart from the conversation_id field, everything else is included in objects inside the conversation_messages array. By having an array it means that each response contains multiple messages that we need to extract and save them as separate rows in our table. So each member of the conversation_messages array will become one row in the Messages table. The rest of the fields have a one to one mapping to columns.

The List table

Lists are the second most important resource of Mailchimp, it actually contains information related to your recipients (customers). It has the following structure.

Name JSON Type RDBMS Column Type
id String Text
name String Text
contact Object Can be flattened out
permission_reminder String Text
use_archive_bar Boolean Boolean
campaign_defaults Object Can be flattened out
notify_on_subscribe String Text
notify_on_unsubscribe String Text
date_created String Text
list_rating Integer Numeric
email_type_option Boolean Boolean
subscribe_url_short String Text
subscribe_url_long String Text
beamer_address String Text
visibility String Text
modules Array Can be flattened out.
stats Object Can be flattened out.

List is a bit more complex as an entity and it contains a number of nested objects that we need to take care of.

Contact, campaign_defaults and stats can be flattened out as they do not contain any nested objects nor any arrays, all their fields are primitive types.

Modules: As an array it can be flattened out by concatenating its value and maintaining its name as a column name with data type Text.

Members table

List members are the way of grouping together recipients, or your customers, for fine tuning your email campaigns. Members are actually a sub-resource of Lists.

Name JSON Type RDBMS Column Type
id String Text
email_address String Text
unique_email_id String Text
merge_fields Object Can be flattened out
interests Object Can be flattened out
stats Object Can be flattened out
ip_signup String Text
timestamp_signup String Text
ip_opt String Text
timestamp_opt String Text
member_rating Integer Numeric
last_changed String Text
language String Text
vip Boolean Text
email_client String Text
location Object Can be flattened out
last_note Object Can be flattened out
list_id String Text
list_id String Text

All the fields of type Object do not contain any arrays or nested objects and can be easily flattened out.

The reports table

The Reports resource contains information about the performance of your campaign and it has the following structure.

Name JSON Type RDBMS Column Type
id String Text
campaign_title String Text
type String Text
emails_sent Integer Numerical
abuse_reports Integer Numerical
unsubscribed Integer Numerical
send_time String Text
bounces Object Can be flattened out
forwards Object Can be flattened out
opens Object Can be flattened out
clicks Object Can be flattened out
facebook_likes Object Can be flattened out
industry_stats Object Can be flattened out
list_stats Object Can be flattened out
ab_split Object Can be flattened out
timewarp Array Array
timeseries Array Array
share_report Object Can be flattened out
ecommerce Object Can be flattened out
delivery_status Object Can be flattened out

All the fields of type object contain simple sub-fields and thus it is straightforward on how to flatten them out and include them in the initial table. The only exception is ab_split which has two sub-objects that contain simple sub-fields, again it is easy to flatten out if you consider one additional level for your objects.

Timewarp & timeseries: are Arrays of objects and thus you have two options, one to remove them if you do not intend to use these fields or to create separate tables for each one. Their objects are simple so nothing special on how to handle them but you will have to include an Id from the reports table to make the reference between the different tables.

The Campaign Abuse report table

This report contains information about abuse mentions for your campaign. Its structure is the following.

Name JSON Type RDBMS Column Type
id Integer Numerical
campaign_id String Text
list_id String Text
email_id String Text
email_address String Text
date String Text

This report does not contain any nested objects or arrays and thus there’s a one-to-one mapping to a database table fields and the object fields.

The Campaign advice table

This resource contains advices given from Mailchimp regarding your campaigns and how you can improve their performance, the structure of the table will look like this:

Name JSON Type RDBMS Column Type
advice Array Numerical
type String Text
message String Text
campaign_id String Text

The API response always contains an array of advices so you will have to parse it and create one row on the table for each response found. The rest of the fields do not need anything special to map them directly into columns.

The Click Reports table

This resource contains information about the interaction between your recipients and any links that you might have included inside your emails. It has the following structure.

Name JSON Type RDBMS Column Type
urls_clicked Array Array
id String Text
url String Text
total_clicks Integer Number
click_percentage Number Number
unique_clicks Integer Number
unique_click_percentage Number Number
last_click String Text
ab_split Object Can be flattened out
campaign_id String Text

Each response contains an array of URLs that you will have to extract and store as a new row in the table. All the fields have a straight forward one-to-one mapping to a table column except ab_split which can be flattened out just as it was described in the case of the Campaign Report earlier.

The Domain Performance table

This resource contains information about the performance of your domains in the case where you send mails from more than one different domains. It has the following structure.

Name JSON Type RDBMS Column Type
domains Array Array
domain String Text
emails_sent Integer Number
bounces Integer Number
opens Integer Number
clicks Integer Number
unsubs Integer Number
delivered Integer Number
emails_pct Number Number
bounces_pct Number Number
opens_pct Number Number
clicks_pct Number Number
unsubs_pct Number Number

Each response of the API delivers an array of objects, you need to parse the array and create one table row per object. The object has a simple structure that is easily mapped to table columns.

The Email activity table

Probably one of the most interesting resources of Mailchimp as it contains an “event log” of all the different interactions of your recipient with your emails, so it contains information about their behavior. Useful data for understanding how your customers behave against your marketing campaigns. It has the following structure.

Name JSON Type RDBMS Column Type
campaign_id String Text
list_id String Text
email_id String Text
email_address String Text
activity Array Array

The really valuable information inside this report is inside the activity array, which contains one entry for each interaction of your user. For this reason the best approach is to create one row on your table for each entry of the activity array and enrich its objects with the data coming from the report like the list_id and the email_address. All the fields of the activity array have a simple one-to-one mapping to table columns.

The Unsubscribes Report table

This report contains information related to the requests of your recipients to be excluded from any future campaigns you might run. Its structure is the following:

Name JSON Type RDBMS Column Type
unsubscribes Array Array
email_id String Text
email_address String Text
timestamp String Text
reason String Text
campaign_id String Text
list_id String Text

You should create one table row for each entry of the “unsubscribes” array which contains the fields that are presented in the rest of the above table. Their mapping to table columns is quite straight forward and they do not require any kind of preprocessing.

Final thoughts

Mailchimp offers a rich dataset that we can use to perform our own analysis to better understand our customers and to optimize our marketing campaigns. Richness comes with a price though, the data model is quite complex and thus it requires to invest some time in better understand the data and how we can represent them more efficiently. Hopefully the above post will help you to understand the data faster and offer you a possible representation, especially if you plan to work with databases or Dataframes (tabular data).

If you would like to automate the process of pulling your data consistently from Mailchimp and having it always at your disposal for analysis, do not forget to check Blendo.

Would you like a demo on how to use Blendo? Reach out to us and we’ll show you around!