In this post, we will go through the data modeling aspect of pulling data from Intercom for custom analytics. We will focus mainly on how Intercom models and delivers its data, and on how we should remodel that data to maximize the value we can get from it.
Intercom offers very detailed documentation of the API it exposes. The Intercom API is a web-based RESTful API that uses JSON as its serialization format.
The first thing to keep in mind about any API is that its data are modeled to ease the delivery process and to reduce the number of calls a system has to make in order to fetch the data.
For this reason, data are usually denormalized and grouped together: when we request data for our users, the API responds not only with the user data itself but also with related data about tags or companies bundled into the same response. In JSON this is usually implemented as arrays of objects, each typically holding data that lives in a different table of the original database. The problem with storing this data directly in a database is that the nested objects become difficult to query. Although databases like PostgreSQL have excellent support for querying JSON stored in a table, it is almost always a better idea to normalize your data.
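To make this concrete, here is a minimal sketch of what such a denormalized response object looks like. The values are made up and the field names only loosely follow Intercom's user resource:

```python
# A truncated, illustrative user object as an API like Intercom's might
# return it: the top-level fields belong in a "users" table, while the
# nested objects carry data that belongs in separate tables.
user = {
    "id": "530370b477ad7120001d",
    "email": "jane@example.com",          # hypothetical value
    "created_at": 1392731160,             # Unix timestamp; JSON has no date type
    "avatar": {                           # nested object (1-1 relationship)
        "type": "avatar",
        "image_url": "https://example.com/avatar.png",
    },
    "tags": {                             # array of related objects (1-n)
        "type": "tag.list",
        "tags": [{"type": "tag", "id": "42"}],
    },
}

# Reaching into the nested parts is easy in application code, but querying
# them directly inside a database table is awkward; normalizing them into
# their own tables makes the same lookup a trivial SQL join.
print(user["avatar"]["image_url"])
```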
Intercom API Resources
The main resources that Intercom exposes through its API and that you would like to replicate on your database, are the following:
| Resource | Description | Available to pull |
| --- | --- | --- |
| Users | The Users resource is the primary way of interacting with Intercom. | Yes |
| Leads | Leads are useful for representing logged-out users of your application. | Yes |
| Companies | Companies allow you to represent the commercial organizations using your product. | Yes |
| Admins | Admins are how you can view your teams and team members. | Yes |
| Tags | A tag allows you to label your users and companies and list them using that tag. | Yes |
| Segments | A segment is a group of your users defined by rules that you set. | Yes |
| Notes | Notes allow you to annotate and comment on your users. | Yes |
| Events | Events are how you can submit user activity to Intercom. | Yes |
| Counts | You can use the API to get counts of users and companies filtered by certain criteria. | Yes |
| Conversations | Conversations are how you can communicate with users in Intercom. | Yes |

For more information on each resource, check out the Intercom API documentation.
JSON Data + Intercom
When dealing with JSON data that we wish to store in a database we have to consider how to handle the following:
- Data type conversions.
- Nested objects.
- Arrays of objects.
Data type conversion is usually straightforward when you go from JSON to a relational database, as the latter usually supports a much richer set of data types. The types that usually cause problems are dates and timestamps: JSON has no native type for either, so you have to decide early how to handle them, depending on how they are stored in your JSON document. Intercom stores dates as Unix timestamps, so in most cases you can load them into a "timestamp" column in your relational database.
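For example, converting one of these Unix timestamps with Python's standard library might look like this (the timestamp value is illustrative):

```python
from datetime import datetime, timezone

# Intercom stores dates as Unix timestamps (seconds since the epoch).
# Before loading into a relational "timestamp" column, convert explicitly
# rather than relying on the database to guess the format.
created_at = 1392731160  # e.g. the "created_at" field of a user object
dt = datetime.fromtimestamp(created_at, tz=timezone.utc)

# An ISO 8601 string that most databases accept directly for a
# "timestamp with time zone" column.
print(dt.isoformat())
```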
Normalization / Denormalization
With nested objects we start entering the normalization / denormalization space that we mentioned earlier. There are two strategies that we can follow with these objects:
- Flatten out the object and include it to the parent table.
- Create a new table and add a foreign key with the parent table.
As an example, consider the "avatar" field inside the "user" resource in Intercom. The "avatar" is an object with the following structure:
| Field | Description |
| --- | --- |
| type | The type of the object. In this case it is always "avatar". |
| image_url | A URL pointing to the avatar image of the user. |
Notice that the object does not contain an "id" field, which complicates creating a separate table, since we would have to generate an id ourselves. If we want a separate table nonetheless, we can create one named "avatar", store the avatar objects there, and make sure we always add a foreign key to the users table, representing a 1-1 relationship.
The other solution is to flatten out these fields and create the following columns in the Users table:
- avatar_type, which will hold the type; this column can be omitted entirely, since the value is always "avatar".
- avatar_image_url, which will hold the image URL, the most important piece of information this object carries.
You can choose either of the two strategies, depending on your needs and how verbose you want your database model to be. In most cases a good rule of thumb is: the simpler the embedded object, the more sense it makes to flatten it out. Note that we have not yet considered objects that contain further nested objects or arrays of objects; that complicates things, and deeply nested structures are usually a poor design approach anyway. If you do have to deal with them, the same strategies apply recursively.
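As a sketch, the flattening strategy for the avatar object amounts to a simple dictionary rewrite. The helper below is hypothetical, uses the column names suggested above, and skips the error handling a real pipeline would need:

```python
def flatten_avatar(user: dict) -> dict:
    """Flatten the nested "avatar" object into prefixed columns of the
    parent user row. A minimal sketch: missing fields become None, and
    the nested object is removed from the row."""
    avatar = user.pop("avatar", None) or {}
    user["avatar_type"] = avatar.get("type")
    user["avatar_image_url"] = avatar.get("image_url")
    return user

# Illustrative input; in practice this would be a record from the API.
row = flatten_avatar({
    "id": "530370b477ad7120001d",
    "avatar": {"type": "avatar", "image_url": "https://example.com/a.png"},
})
print(row)
```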
Arrays of objects are a bit harder to deal with, as they denote 1-to-n relationships between the resource and the embedded objects, and thus you cannot avoid having a separate table for them.
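A minimal sketch of that normalization step, assuming each parent resource carries an "id" field; the helper and the sample field names (modeled on conversations and their conversation parts) are illustrative:

```python
def extract_children(parent: dict, array_field: str, fk_name: str):
    """Split an embedded array of objects out of its parent, returning
    (parent_row, child_rows). Each child row gets a foreign-key column
    pointing back to the parent's id."""
    children = parent.pop(array_field, None) or []
    child_rows = [dict(child, **{fk_name: parent["id"]}) for child in children]
    return parent, child_rows

# Illustrative conversation with an embedded 1-to-n array.
conversation = {
    "id": "147",
    "conversation_parts": [
        {"id": "p1", "body": "Hello"},
        {"id": "p2", "body": "Hi there"},
    ],
}
conv_row, part_rows = extract_children(
    conversation, "conversation_parts", "conversation_id"
)
print(part_rows)  # rows destined for a separate "conversation_parts" table
```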
A very important assumption here is that the objects inside the array share the same schema; otherwise we are in trouble. Although JSON allows you to store arbitrary objects in an array, we should always treat them as sharing one schema if we want to load the data into a database table. In the case of Intercom this holds true. An example of a 1-to-n relationship in Intercom is between the "conversations" resource and its "conversation parts".
There is also a case where you can avoid creating a new table even for an array of objects: when the objects inside have a very simple schema. For example, consider an array of tags with the following structure:
| Field | Description |
| --- | --- |
| type | The type of the object. In this case it is always "tags". |
| tags | An array containing the ids of the tags. |
Here we can create a single column in the users table, called tags, and store the values of the tags array in it as one concatenated, delimited string.
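In Python, that concatenation is a one-liner. The tag ids below are made up; also note that the delimiter must be a character that cannot appear inside the values:

```python
# Collapse the simple tag-id array into one delimited string column,
# avoiding a separate table for such a trivial schema.
tag_object = {"type": "tags", "tags": ["42", "7", "19"]}
tags_column = ",".join(tag_object["tags"])
print(tags_column)  # → "42,7,19"
```

The trade-off is that querying individual tags now requires string splitting in SQL, so this shortcut only makes sense when you rarely filter on the individual values.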
Modeling the data correctly is just one part of defining a data pipeline for custom analytics. There are other issues to consider, such as how to access the API, how to respect its rate limits, and how to link data coming from different resources. For more information on these, check the Blendo Blog and the Blendo homepage.