How to load data from Hubspot to SQL Data Warehouse

Blendo Team

How may I load data from Hubspot to SQL Data Warehouse for further analysis? The purpose of this post is to help you define a process or pipeline, for getting your subscription related data from Hubspot and load it into SQL Data Warehouse for further analysis. We will see how to access and extract your data from Hubspot through its API and how to load it into SQL Data Warehouse. This process requires from you to write the code to get the data and make sure that this process will run every time new data are generated. Alternatively you can use products like Blendo that can handle this kind of problems automatically for you.

About Hubspot

Import your Hubspot data into your data warehouse - Blendo.co

Load data from Hubspot

Hubspot is a suite of software offering two different customer related products:

  • Hubspot CRM. For Customer Relations Management.

  • Hubspot Marketing. Marketing platform, focusing mainly on inbound marketing.

Hubspot CRM offers a CRM platform that is connected to all the different channels and touchpoints where the modern sales process takes place. Channels like:

  • Landing pages

  • email

  • phone calls

  • social media

By using Hubspot CRM it is possible to control your content, channels and marketing performance from one single platform, making it much easier to have a birds eye view of your sales process. You can create customised views, it offers deal and task dashboards and it connects directly with Hubspot Marketing so you can effortlessly capture, score and hand off leads to your sales team.

Hubspot Marketing which was also the first product of Hubspot, offers the complete marketing funnel to its user, from attracting users to closing customers with less hassle and more control of the whole process. It focuses mainly on inbound marketing techniques related to the following:

  • Blogging

  • SEO

  • Social Media

  • Landing pages

  • Email campaigns

It allows you to easily perform lead management, it supports marketing automation like triggering mails when specific events happen on your landing page. It offers rich analytics and of course it integrates seamlessly with Hubspot CRM.

It is clear that the Hubspot products generate a large number of valuable data related to your business, product and customers. While it offers analytic services, you might want to run some more engaged analysis with your Hubspot data, or merge these data with other sources like your ticketing system, your transaction database and logs or with user generated events from your product. Hubspot exposes a rich ecosystem of tools and APIs for interacting with its platform and it is possible by using these APIs to pull the data that are valuable to you from it.

About Microsoft Azure SQL Data Warehouse

SQL Data Warehouse is the data warehousing solution that you can use if you are a user of Microsoft Azure. It’s an elastic data warehousing as a service solution, emphasising it’s enterprise focus. It also speaks SQL like the previous two solutions and it supports querying both relational and non-relational data.  It offers a number of enterprise-class features like support for hybrid cloud installations and strong security. It’s probably the less mature solution compared to the two others though, it’s still in “Preview” mode although accessible to existing Azure subscribers.

Extract your data from Hubspot

The Hubspot APIs are following the REST architecture, that can be accessed through HTTP. As a RESTful API, interacting with it an be achieved by using tools like CURL or Postman or Apirise or by using http clients for your favourite language or framework.

A few suggestions:

Responses of the API are all in JSON, including errors although HTTP codes are also returned to indicate errors. Hubspot maintains some official SDKs for their APIs, like the hapipy client for python, also un official clients can be found.

Hubspot API Authentication

The Hubspot API allows two types of authentication. OAuth and API keys. Hubspot encourages the use of oAuth for any serious integration and suggests to use basic authentication with API keys only for testing and rapid prototyping purposes.

Hubspot API rate limiting

The HubSpot public endpoints are powered by the same underlying technology that powers the core HubSpot application. As a result, HubSpot engineering closely monitors usage of the public APIs to ensure a quality experience for users of the HubSpot application.

Below, you’ll find the limits by which a single integration (as identified by an access token) can consume the HubSpot public APIs.

  1. Integrations cannot consume the HubSpot API at a rate greater than 10 requests/second.
  2. Polling of HubSpot APIs should occur at intervals of 5 minutes or more.
  3. Total requests to the HubSpot APIs should not exceed 10,000 in a 24 hour period.
  4. Failed requests to the HubSpot APIs may not exceed 5% of total syncs.
  5. All data passed to HubSpot must be properly encoded, and use application/json formatting.
  6. Integrations should use HubSpot’s OAuth protocol.
  7. Integrators must store time-to-live (TTL) data for OAuth access tokens as well as refresh tokens. Unauthorized (401) requests are not a valid indicator that a new access token must be retrieved.
  8. Integrators should use their own public and documented APIs when working with the HubSpot APIs.
  9. We reserve the right to change or deprecate the APIs over time – we will provide developers ample notification in those cases.

Endpoints and available Resources

The Hubspot API is actually a collection of a large number of APIs. Something that make sense if we consider that Hubspot is actually more than one products and each one of these products are quite complex on its own. The APIs are the following:

  • Calendar API. Anything that has to do with calendars in Hubspot and their possible operations.
  • Companies APIs. When we talk about business and customers we talk about companies, so here is your API for interacting with companies.
  • Companies properties APIs. Companies are important for Hubspot and there’s a lot of functionality around them so there’s an auxiliary API just for working with properties of companies.
  • Contacts APIs. Contacts are the fundamental building block to HubSpot – they store lead-specific data that makes it possible to leverage much of the functionality in HubSpot, from marketing automation, to lead scoring to smart content.
  • Contact Lists APIs. API for managing the lists of your contacts.
  • Contact properties APIs. Similar to companies, this API allows you to interact with the properties of your contacts.
  • COS Blog API. Interact with Blogs throughs the Hubspot platform.
  • COS Blog Authors API. API for interacting with the authors of your blogs.
  • COS Blog Comments API. Exposes functionality related to the comments of your APIs.
  • COS Blog Posts API. Anything related to the posts of your Blog.
  • COS Blog Topics API. Manage the topics of your Blog.
  • COS Domains API. Manage your Domains through this API.
  • COS Files API. Operations related to file management inside Hubspot.
  • COS Layouts API. Manage the layouts of your pages through this API.
  • COS Page Publishing API. Operations related to publishing content through the Hubspot platform.
  • COS Sitemaps API. Operations related to the management of sitemaps for the sites you create through the Hubspot platform.
  • COS Templates API. API for managing the templates of your sites.
  • COS URL Mappings API. Operations related to URL mappings for the sites inside Hubspot.
  • Deals API. Anything that has to do with deals inside your CRM.
  • Deal Pipelines API. Manage the sales pipelines through this API.
  • Deals Properties API. Again, manage the properties of your deals.
  • Email API. Anything related to emailing from within the hubspot platform.
  • Email Events API. Track and interact with events that happen inside emails.
  • Engagements API. Anything related to customer engagement inside the hubspot platform.
  • Events API. Event handling for Hubspot.
  • Forms API. Manage custom forms that you create inside the platform.
  • Keywords API. Operations related to keywords for SEO.
  • Owners API. Anything related to Owner.
  • Social Media API. API for interacting with Social media through the Hubspot platform.
  • Transactional Email API. The transactional email functionality of Hubspot.
  • Workflows API. Define and manage sales and marketing workflows.

From all the above endpoints we can pull data out of the platform, so it is easy to understand the richness of the data we can get from a platform like Hubspot. Let’s assume, as an example, that we want to get all the Deals data. By executing a GET request like this GET /deals/v1/deal/recent/modified we can get all the recently modified deals. The parameters that we can pass to the call are the following:

  • count: for specifying the number of results per page of the response.
  • offset: for paginating through the all available results.
  • since: a timestamp for defining from which exact time you would like to fetch data from.

As we said earlier, results from the Hubspot API are always in JSON, so if we successfully execute the above query we’ll get the following results back:

{
    "results": [
        {
            "portalId": 62515,
            "dealId": 1030663,
            "isDeleted": false,
            "associations": {
                "associatedVids": [
                    27316
                ],
                "associatedCompanyIds": [

                ],
                "associatedDealIds": [

                ]
            },
            "properties": {
                "dealstage": {
                    "value": "closedwon",
                    "timestamp": 1417686612442,
                    "source": "API",
                    "sourceId": null,
………

The API offers you the opportunity to get very granular data about your accounting activities and use it for analytic and reporting purposes.

Load Data from Hubspot to SQL Data Warehouse

SQL Data Warehouse support numerous options for loading data, such us:

  • PolyBase
  • Azure Data Factory
  • BCP command-line utility
  • SQL Server integration services

As we are interested in loading data from online services by using their exposed HTTP APIs, we are not going to consider the usage of BCP command-line utility or SQL server integration in this guide. We’ll consider the case of loading our data as Azure storage Blobs and then use PolyBase to load the data into SQL Data Wareho use.

Accessing these services happens through HTTP APIs, as we see again APIs play an important role in both the extraction but also the loading of data into our data warehouse. You can access these APIs by using a tool like CURL or Postman. Or use the libraries provided by Microsoft for your favourite language. Before you actually upload any data you have to create a container which is something similar as a concept to the Amazon AWS Bucket, creating a container is a straight forward operation and you can do it by following the instructions found on the Blog storage documentation from Microsoft. As an example, the following code can create a container in Node.js.

blobSvc.createContainerIfNotExists('mycontainer', function(error, result, response){
  if(!error){
    // Container exists and allows
    // anonymous read access to blob
    // content and metadata within this container
  }
});

After the creation of the container you can start uploading data to it by using again the given SDK of your choice in a similar fashion:

blobSvc.createBlockBlobFromLocalFile('mycontainer', 'myblob', 'test.txt', function(error, result, response){
  if(!error){
    // file uploaded
  }
});

When you are done putting your data into Azure Blobs you are ready to load it into SQL Data Warehouse using PolyBase. To do that you should follow the directions in the Load with PolyBase documentation. In a summary the required steps to do it, are the following:

  • create a database master key
  • create a database scoped credentials
  • create an external file format
  • create an external data source

PolyBase’s ability to transparently parallelize loads from Azure Blob Storage will make it the fastest tool for loading data. After configuring PolyBase, you can load data directly into your SQL Data Warehouse by simply creating an external table that points to your data in storage and then mapping that data to a new table within SQL Data Warehouse.

Of course you will need to establish a recurrent process that will extract any newly created data from your service, load them in the form of Azure Blobs and initiate the PolyBase process for importing the data again into SQL Data Warehouse. One way of doing this is by using the Azure Data Factory service. In case you would like to follow this path you can read some good documentation on how to move data to and from Azure SQL Warehouse using Azure Data Factory.

 

The best way to load data from Hubspot to SQL Data Warehouse and possible alternatives

So far we just scraped the surface of what can be done with Microsoft Azure SQL Data Warehouse and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. A possible alternative, instead of writing, hosting and maintaining a flexible data infrastructure, is to use a product like Blendo that can handle this kind of problems automatically for you.

 Blendo integrates with multiple sources or services like databases, CRM, email campaigns, analytics and more.  Quickly and safely move all your data from Recurly into SQL Data Warehouse and start generating insights from your data.