Email marketing campaigns using a SaaS platform like MailChimp generate a large number of data that can be used to:
- Better understand our customers
- Monitor and optimize our email marketing campaigns
But to do that, we first need to understand the data we have to deal with. In this post we will go through the data of Mailchimp, what we can pull out of it using its API, understand and properly model the data. Usually, when data are exposed through a web API, it is optimized for this particular job, so when data is pulled, it first has to be properly re-modeled for the database or analytics system that we plan to use. To do that, we need to both understand the data we are working with but also the requirements of the system we are going to use. For this post, we’ll consider a relational database for storing the data and thus the data will be modeled accordingly. Also, if you work with a language like R you can consider each table as a DataFrame, the structure and the process for preparing the data will be the same.
MailChimp resources & main entities
The resources that Mailchimp exposes and that you should replicate on your database are the following:
Resource | Description |
Campaign | The core concept of Mailchimp is a Campaign, whenever you implement an email campaign, a Campaign is created. From this resource you get all its related data. |
Conversations | Conversations contains all the responses you got from recipients of your email campaigns. |
Messages (Conversation sub-resource) | The messages that were exchanged as part of the conversations. |
Lists | In order to execute an email campaign in Mailchimp you need a pair of a Campaign and a List of recipients where the mails will be delivered. |
Members (lists sub-resource) | The members of the lists, here you will find the actual information of your recipients like emails and demographic information. |
Reports | Campaign performance related reports. |
Campaign Abuse (Reports sub-resource) | Information about campaign abuse complaints. |
Campaign Advice (Reports sub-resource) | Feedback based on the campaign’s statistics. |
Click Reports (Reports sub-resource) | Reports related to the URLs you have included in your email campaigns. |
Domain Performance (Reports sub-resource) | Performance of your campaigns, useful in cases that you are sending emails from different domains. |
Email activity (Reports sub-resource) | Probably the most important report, it contains all the interactions of your users with your emails. |
Unsubscribes (Reports sub-resource) | Reports related to the recipients you have unsubscribed from your lists. |
Ideally, each one of the above resources will end up as a table in your database. But before we load the data into our tables we need to consider a few things as we will see below.
![]() |
![]() |
Sync your email marketing data from MailChimp to any data warehouse. Analytics-ready data with no hassle. | Integrate Now |
See more data integrations to sync with your data warehouse powered with ❤ by Blendo
Data format & serialization
MailChimp delivers its data in JSON and for performance reasons some of the responses contain nested objects, something that makes it difficult to directly store the data in a tabular form. As in the case of the Intercom data we have to consider the following cases:
- data type conversions
- Nested objects
- Arrays
Data types are usually straightforward to convert, special care should be taken for dates and timestamps but the rest is quite easy to implement. Nested objects, if they do not contain any other objects or arrays are also easy to flatten out into a number of columns, where each sub_field is merged with the name of the parent object to form a column in your table or data frame.
Arrays are a bit more difficult to handle and they usually fall into one of the following categories:
- Arrays of primitive types
- Arrays of objects
For the first case, a possible solution is to flatten out the array into one value by concatenating the array members. The best strategy here is to create one string with a known delimiter and generate only one value. Now in the case of an Array of objects things are a bit more complicated, you will actually have to extract the array and create a new table where each array member will be mapped to one table row. In this last case, make sure that you include information from the parent object as a foreign key, for example, an ID of the parent. Now let’s see for each of the Resource mentioned earlier how to handle the delivered data.
Mapping Mailchimp resources to relational tables
The Campaign table
The MailChimp Campaign entity has the following properties that can be mapped into table columns.
Name | JSON Type | RDBMS Column Type |
id | String | Text |
type | String | Text |
create_time | String | Text |
archive_url | String | Text |
long_archive_url | String | Text |
status | String | Text |
emails_sent | Integer | Numeric |
send_time | String | Text |
content_type | String | Text |
recipients | Object | Can be flattened out if some fields are removed. |
settings | Object | It can be flattened out. |
variate_settings | Object | Can be flattened out if some fields are removed. |
tracking | Object | Can be flattened out. |
rss_opts | Object | Can be flattened out. |
ab_split_opts | Object | Can be flattened out. |
social_card | Object | Can be flattened out. |
report_summary | Object | Can be flattened out. |
delivery_status | Object | Can be flattened out. |
Recipients: field contains a total of 5 fields from which 4 are of primitive types and can be directly converted in an appropriate data type and one field named segment_opts which is of type Object needs to be flattened out appropriately. This last field contains a sub-field named conditions which is an Array and which makes things a harder in representing the resource in a table. There are two options, the first one is to remove the conditions field completely, this will result in a much simpler representation of the resource in our database and it can be done when it is known that the information it carries will not be used. The second option is to extract the field and create a new table called condition_types where each element of the array will be presented as a row. In this case which adds some additional complexity to our data model, we guarantee that we maintain all the information we can get from Mailchimp. As the embedded array does not include any fields that associate its elements with the parent resource, during the extraction and preparation of the data you need to make sure that you will add information from the parent record to each one of the array elements, a good candidate for this is the list_id field.
Settings: can be easily mapped into a relational table, it contains 16 fields where 15 of them are of primitive types and can be easily converted while one of them the auto_fb_post when present, is an array that can be directly flattened out by concatenating its values.
Variate_settings: is more similar to the case of Recipients. It included 11 fields from which 6 of them are Arrays, 5 of these arrays can be flattened out by concatenating their values but the field combinations needs to either be removed or represented as an additional table as it contains objects. Again if you choose to create an additional table, make sure that you have included an id from the parent resource to use as a foreign key for joining the data.
Tracking: contains a total of 10 fields, where 7 of them are of primitive types while the rest named salesforce, capsule & highrise when present are objects containing only additional fields of primitive types, so it is easy to flatten out these fields and store them as additional columns in the table.
Rss_opts: contains 5 fields, from which only one is not a primitive type, named schedule and which contains another object inside with only fields of primitive types. Although a bit deep as a hierarchical resource, the fact that every object is not contained in an array and it contains only primitive types, makes it easy to flatten out all the fields and store them in one table.
Ab_split_opts: contains 14 fields, all of them of primitive types so mapping into table rows is a trivial task.
Report_summary: contains 7 fields, where one of them is an object containing only primitive types, so it will require to be flattened out and embedded in the original table.
The conversations table
The conversations entity has the following structure:
Name | JSON Type | RDBMS Column Type |
id | String | Text |
message_count | Integer | Numeric |
campaign_id | String | Text |
list_id | String | Text |
unread_messages | Integer | Numeric |
from_label | String | Text |
from_email | String | Text |
subject | String | Text |
last_message | Object | Can be flattened out |
The last_message object contains only sub-fields of primitive types so everything is quite straight forward on how to convert conversations into a table.
The Messages tables
Messages are a part of conversations, if you check the API documentation of Mailchimp you will see that messages are exposed as a sub-resource of conversations. The structure of this resource is the following:
Name | JSON Type | RDBMS Column Type |
conversation_messages | Array | Array, see below |
id | String | Text |
conversation_id | String | Text |
list_id | Integer | Numeric |
from_label | String | Text |
from_email | String | Text |
subject | String | Text |
message | String | Text |
read | Boolean | Boolean |
timestamp | String | Text |
Apart from the conversation_id field, everything else is included in objects inside the conversation_messages array. By having an array it means that each response contains multiple messages that we need to extract and save them as separate rows in our table. So each member of the conversation_messages array will become one row in the Messages table. The rest of the fields have a one to one mapping to columns.
![]() |
![]() |
Sync your email marketing data from MailChimp to any data warehouse. Analytics-ready data with no hassle. | Integrate Now |
See more data integrations to sync with your data warehouse powered with ❤ by Blendo
The List table
Lists are the second most important resource of Mailchimp, it actually contains information related to your recipients (customers). It has the following structure.
Name | JSON Type | RDBMS Column Type |
id | String | Text |
name | String | Text |
contact | Object | Can be flattened out |
permission_reminder | String | Text |
use_archive_bar | Boolean | Boolean |
campaign_defaults | Object | Can be flattened out |
notify_on_subscribe | String | Text |
notify_on_unsubscribe | String | Text |
date_created | String | Text |
list_rating | Integer | Numeric |
email_type_option | Boolean | Boolean |
subscribe_url_short | String | Text |
subscribe_url_long | String | Text |
beamer_address | String | Text |
visibility | String | Text |
modules | Array | Can be flattened out. |
stats | Object | Can be flattened out. |
List is a bit more complex as an entity and it contains a number of nested objects that we need to take care of.
Contact, campaign_defaults, and stats can be flattened out as they do not contain any nested objects nor any arrays, all their fields are primitive types.
Modules: As an array, it can be flattened out by concatenating its value and maintaining its name as a column name with data type Text.
Members table
List members are the way of grouping together recipients or your customers, for fine tuning your email campaigns. Members are actually a sub-resource of Lists.
Name | JSON Type | RDBMS Column Type |
id | String | Text |
email_address | String | Text |
unique_email_id | String | Text |
merge_fields | Object | Can be flattened out |
interests | Object | Can be flattened out |
stats | Object | Can be flattened out |
ip_signup | String | Text |
timestamp_signup | String | Text |
ip_opt | String | Text |
timestamp_opt | String | Text |
member_rating | Integer | Numeric |
last_changed | String | Text |
language | String | Text |
vip | Boolean | Text |
email_client | String | Text |
location | Object | Can be flattened out |
last_note | Object | Can be flattened out |
list_id | String | Text |
list_id | String | Text |
All the fields of type Object do not contain any arrays or nested objects and can be easily flattened out.
The reports table
The Reports resource contains information about the performance of your campaign and it has the following structure.
Name | JSON Type | RDBMS Column Type |
id | String | Text |
campaign_title | String | Text |
type | String | Text |
emails_sent | Integer | Numerical |
abuse_reports | Integer | Numerical |
unsubscribed | Integer | Numerical |
send_time | String | Text |
bounces | Object | Can be flattened out |
forwards | Object | Can be flattened out |
opens | Object | Can be flattened out |
clicks | Object | Can be flattened out |
facebook_likes | Object | Can be flattened out |
industry_stats | Object | Can be flattened out |
list_stats | Object | Can be flattened out |
ab_split | Object | Can be flattened out |
timewarp | Array | Array |
timeseries | Array | Array |
share_report | Object | Can be flattened out |
ecommerce | Object | Can be flattened out |
delivery_status | Object | Can be flattened out |
All the fields of type object contain simple sub-fields and thus it is straightforward on how to flatten them out and include them in the initial table. The only exception is ab_split which has two sub-objects that contain simple sub-fields, again it is easy to flatten out if you consider one additional level for your objects.
Timewarp & timeseries: are Arrays of objects and thus you have two options, one to remove them if you do not intend to use these fields or to create separate tables for each one. Their objects are simple so nothing special on how to handle them but you will have to include an Id from the reports table to make the reference between the different tables.
![]() |
![]() |
Sync your email marketing data from MailChimp to any data warehouse. Analytics-ready data with no hassle. | Integrate Now |
See more data integrations to sync with your data warehouse powered with ❤ by Blendo
The Campaign Abuse report table
This report contains information about abuse mentions for your campaign. Its structure is the following.
Name | JSON Type | RDBMS Column Type |
id | Integer | Numerical |
campaign_id | String | Text |
list_id | String | Text |
email_id | String | Text |
email_address | String | Text |
date | String | Text |
This report does not contain any nested objects or arrays and thus there’s a one-to-one mapping to a database table fields and the object fields.
The Campaign advice table
This resource contains advices given from Mailchimp regarding your campaigns and how you can improve their performance, the structure of the table will look like this:
Name | JSON Type | RDBMS Column Type |
advice | Array | Numerical |
type | String | Text |
message | String | Text |
campaign_id | String | Text |
The API response always contains an array of advices so you will have to parse it and create one row on the table for each response found. The rest of the fields do not need anything special to map them directly into columns.
The Click Reports table
This resource contains information about the interaction between your recipients and any links that you might have included inside your emails. It has the following structure.
Name | JSON Type | RDBMS Column Type |
urls_clicked | Array | Array |
id | String | Text |
url | String | Text |
total_clicks | Integer | Number |
click_percentage | Number | Number |
unique_clicks | Integer | Number |
unique_click_percentage | Number | Number |
last_click | String | Text |
ab_split | Object | Can be flattened out |
campaign_id | String | Text |
Each response contains an array of URLs that you will have to extract and store as a new row in the table. All the fields have a straight forward one-to-one mapping to a table column except ab_split which can be flattened out just as it was described in the case of the Campaign Report earlier.
![]() |
![]() |
Sync your email marketing data from MailChimp to any data warehouse. Analytics-ready data with no hassle. | Integrate Now |
See more data integrations to sync with your data warehouse powered with ❤ by Blendo
The Domain Performance table
This resource contains information about the performance of your domains in the case where you send emails from more than one different domains. It has the following structure.
Name | JSON Type | RDBMS Column Type |
domains | Array | Array |
domain | String | Text |
emails_sent | Integer | Number |
bounces | Integer | Number |
opens | Integer | Number |
clicks | Integer | Number |
unsubs | Integer | Number |
delivered | Integer | Number |
emails_pct | Number | Number |
bounces_pct | Number | Number |
opens_pct | Number | Number |
clicks_pct | Number | Number |
unsubs_pct | Number | Number |
Each response of the API delivers an array of objects, you need to parse the array and create one table row per object. The object has a simple structure that is easily mapped to table columns.
The Email activity table
Probably one of the most interesting resources of MailChimp as it contains an “event log” of all the different interactions of your recipient with your emails, so it contains information about their behavior. Useful data for understanding how your customers behave against your marketing campaigns. It has the following structure.
Name | JSON Type | RDBMS Column Type |
campaign_id | String | Text |
list_id | String | Text |
email_id | String | Text |
email_address | String | Text |
activity | Array | Array |
The really valuable information inside this report is inside the activity array, which contains one entry for each interaction of your user. For this reason the best approach is to create one row on your table for each entry of the activity array and enrich its objects with the data coming from the report like the list_id and the email_address. All the fields of the activity array have a simple one-to-one mapping to table columns.
![]() |
![]() |
Sync your email marketing data from MailChimp to any data warehouse. Analytics-ready data with no hassle. | Integrate Now |
See more data integrations to sync with your data warehouse powered with ❤ by Blendo
The Unsubscribes Report table
This report contains information related to the requests of your recipients to be excluded from any future campaigns you might run. Its structure is the following:
Name | JSON Type | RDBMS Column Type |
unsubscribes | Array | Array |
email_id | String | Text |
email_address | String | Text |
timestamp | String | Text |
reason | String | Text |
campaign_id | String | Text |
list_id | String | Text |
You should create one table row for each entry of the “unsubscribes” array which contains the fields that are presented in the rest of the above table. Their mapping to table columns is quite straightforward and they do not require any kind of preprocessing.
Final thoughts
MailChimp offers a rich dataset that we can use to perform our own analysis to better understand our customers and to optimize our marketing campaigns. Richness comes with a price though, the data model is quite complex and thus it requires to invest some time in better understand the data and how we can represent them more efficiently. Hopefully, the above post will help you to understand the data faster and offer you a possible representation, especially if you plan to work with databases or Dataframes (tabular data).
If you would like to automate the process of pulling your data consistently from Mailchimp and having it always at your disposal for analysis, do not forget to check Blendo.
Would you like a demo on how to use Blendo? Reach out to us and we’ll show you around!