How to load data from Hubspot to Google BigQuery

Blendo Team

This post helps you with loading your data from Hubspot to BigQuery. If you are looking to get analytics-ready data without the manual hassle you can integrate Hubspot to BigQuery with Blendo, so you can focus on what matters, getting value out of your customers’ data.

Extract your data from Hubspot

The Hubspot APIs are following the REST architecture, that can be accessed through HTTP. As a RESTful API, interacting with it can be achieved by using tools like CURL or Postman or by using http clients for your favourite language or framework.

A few suggestions:

You will need more time to read this post than integrating Hubspot to BigQuery.

Effortlessly Sync All Your Hubspot Data to BigQuery

Responses of the API are all in JSON, including errors although HTTP codes are also returned to indicate errors. Hubspot maintains some official SDKs for their APIs, like the hapipy client for python, also un official clients can be found.

Hubspot API Authentication

Hubspot API allows two types of authentication. OAuth and API keys. Hubspot encourages the use of oAuth for any serious integration and suggests to use basic authentication with API keys only for testing and rapid prototyping purposes.

Hubspot API rate limiting

HubSpot public endpoints are powered by the same underlying technology that powers the core HubSpot application. As a result, HubSpot engineering closely monitors usage of the public APIs to ensure a quality experience for users of a HubSpot application.

Below, you’ll find the limits by which a single integration (as identified by an access token) can consume HubSpot public APIs.

  1. Integrations cannot consume the HubSpot API at a rate greater than 10 requests/second.
  2. Polling of HubSpot APIs should occur at intervals of 5 minutes or more.
  3. Total requests to the HubSpot APIs should not exceed 10,000 in a 24 hour period.
  4. Failed requests to the HubSpot APIs may not exceed 5% of total syncs.
  5. All data passed to HubSpot must be properly encoded, and use application/json formatting.
  6. Integrations should use HubSpot’s OAuth protocol.
  7. Integrators must store time-to-live (TTL) data for OAuth access tokens as well as refresh tokens. Unauthorized (401) requests are not a valid indicator that a new access token must be retrieved.
  8. Integrators should use their own public and documented APIs when working with HubSpot APIs.
  9. We reserve the right to change or deprecate the APIs over time – we will provide developers ample notification in those cases.

Endpoints and available Resources

Hubspot API is actually a collection of a large number of APIs. Something that make sense if we consider that Hubspot is actually more than one products and each one of these products are quite complex on its own. The APIs are the following:

  • Calendar API. Anything that has to do with calendars in Hubspot and their possible operations.
  • Companies APIs. When we talk about business and customers we talk about companies, so here is your API for interacting with companies.
  • Companies properties APIs. Companies are important for Hubspot and there’s a lot of functionality around them so there’s an auxiliary API just for working with properties of companies.
  • Contacts APIs. Contacts are the fundamental building block to HubSpot – they store lead-specific data that makes it possible to leverage much of the functionality in HubSpot, from marketing automation, to lead scoring to smart content.
  • Contact Lists APIs. API for managing the lists of your contacts.
  • Contact properties APIs. Similar to companies, this API allows you to interact with the properties of your contacts.
  • COS Blog API. Interact with Blogs throughs the Hubspot platform.
  • COS Blog Authors API. API for interacting with the authors of your blogs.
  • COS Blog Comments API. Exposes functionality about the comments of your APIs.
  • COS Blog Posts API. Anything related to the posts of your Blog.
  • COS Blog Topics API. Manage the topics of your Blog.
  • COS Domains API. Manage your Domains through this API.
  • COS Files API. Operations concerning file management inside Hubspot.
  • COS Layouts API. Manage the layouts of your pages through this API.
  • COS Page Publishing API. Operations linked with publishing content through Hubspot’s platform.
  • COS Sitemaps API. Operations related to the management of sitemaps for the sites you create through the Hubspot platform.
  • COS Templates API. API for managing the templates of your sites.
  • COS URL Mappings API. Operations for URL mappings for the sites inside Hubspot.
  • Deals API. Anything that has to do with deals inside your CRM.
  • Deal Pipelines API. Manage the sales pipelines through this API.
  • Deals Properties API. Again, manage the properties of your deals.
  • Email API. Anything about emailing from within the hubspot platform.
  • Email Events API. Track and interact with events that happen inside emails.
  • Engagements API. Anything related to customer engagement inside the hubspot platform.
  • Events API. Event handling for Hubspot.
  • Forms API. Manage custom forms that you create inside the platform.
  • Keywords API. Operations about keywords for SEO.
  • Owners API. Anything related to Owner.
  • Social Media API. API for interacting with Social media through Hubspot’s platform.
  • Transactional Email API. The transactional email functionality of Hubspot.
  • Workflows API. Define and manage sales and marketing workflows.

From all the above endpoints we can pull data out of the platform, so it is easy to understand the richness of the data we can get from a platform like Hubspot. Let’s assume, as an example, that we want to get all the Deals data. By executing a GET request like this GET /deals/v1/deal/recent/modified we can get all the recently modified deals. The parameters that we can pass to the call are the following:

  • count: for specifying the number of results per page of the response.
  • offset: for paginating through the all available results.
  • since: a timestamp for defining from which exact time you would like to fetch data from.

As we said earlier, results from a Hubspot API are always in JSON, so if we successfully execute the above query we’ll get the following results back:

{
    "results": [
        {
            "portalId": 62515,
            "dealId": 1030663,
            "isDeleted": false,
            "associations": {
                "associatedVids": [
                    27316
                ],
                "associatedCompanyIds": [

                ],
                "associatedDealIds": [

                ]
            },
            "properties": {
                "dealstage": {
                    "value": "closedwon",
                    "timestamp": 1417686612442,
                    "source": "API",
                    "sourceId": null,
………

The API offers you the opportunity to get very granular data about your accounting activities and use it for analytic and reporting purposes.

About Hubspot
Import your Hubspot data into your data warehouse - Blendo.co

Load from Hubspot to BigQuery

Hubspot is a suite of software offering two different customer related products:

  • Hubspot CRM. For Customer Relations Management.
  • Hubspot Marketing. Marketing platform, focusing mainly on inbound marketing.

Hubspot CRM offers a CRM platform that is connected to all the different channels and touchpoints where the modern sales process takes place. Channels like:

  • Landing pages
  • email
  • phone calls
  • social media

By using Hubspot CRM it is possible to control your content, channels and marketing performance from one single platform, making it much easier to have a birds eye view of your sales process. You can create customised views, it offers deal and task dashboards and it connects directly with Hubspot Marketing so you can effortlessly capture, score and hand off leads to your sales team.

Hubspot Marketing which was also the first product of Hubspot, offers the complete marketing funnel to its user, from attracting users to closing customers with less hassle and more control of the whole process. It focuses mainly on inbound marketing techniques related to the following:

  • Blogging
  • SEO
  • Social Media
  • Landing pages
  • Email campaigns

It allows you to easily perform lead management, it supports marketing automation like triggering mails when specific events happen on your landing page. It offers rich analytics and of course it integrates seamlessly with Hubspot CRM.

It is clear that any Hubspot products generate a large number of valuable data related to your business, product and customers. While it offers analytic services, you might want to run some more engaged analysis with your Hubspot data, or merge these data with other sources like your ticketing system, your transaction database and logs or with user generated events from your product. Hubspot exposes a rich ecosystem of tools and APIs for interacting with its platform and it is possible by using these APIs to pull the data that are valuable to you from it.

Prepare your data to be sent from Hubspot to Google BigQuery

Before you load any data into BigQuery, you should make sure that it is presented in a format supported by it, so for example if the API you pull data from returns XML you have to first transform it into a serialisation that BigQuery understands. Currently two data formats are supported:

You also need to make sure that the data types you are using are the ones supported by BigQuery, which are the following:

  • STRING
  • INTEGER
  • FLOAT
  • BOOLEAN
  • RECORD
  • TIMESTAMP

for more information please check the Preparing Data for BigQuery page on the documentation.

About Google BigQuery

 

BigQuery is the data warehousing solution of Google. It’s part of the Google Cloud Platform and it also speaks SQL like Redshift does. Queries are executed against append-only tables using the processing power of Google’s infrastructure. It is also fully managed and is offered as a service over the cloud. You can interact with BigQuery through its web UI, using a command line tool while a variety of client libraries exist so you can interact with through your application.

Load Data using Hubspot to Google BigQuery

If you want to load any data using Hubspot to BigQuery, you have to use one of the following supported data sources.

  1. Google Cloud Storage
  2. Sent data directly to BigQuery with a POST request
  3. Google Cloud Datastore Backup
  4. Streaming insert
  5. App Engine log files
  6. Cloud Storage logs

From the above list of sources, 5 and 6 are not applicable in our case.

For Google Cloud Storage, you first have to load some data into it, there are a few options on how to do this, for example you can use the console directly as it is described here and do not forget to follow the best practices. Another option is to post your data through the JSON API, as we see again APIs play an important role in both the extraction but also the loading of data into our data warehouse.. In it’s simplest case it’s just a matter of one HTTP POST request using a tool like CURL or Postman. It should look like the following example.

POST /upload/storage/v1/b/myBucket/o?uploadType=media&name=myObject HTTP/1.1
Host: www.googleapis.com
Content-Type: application/text
Content-Length:
number_of_bytes_in_file
Authorization: Bearer
your_auth_token your Hubspot data

and if everything went ok, you should get something like the following as a response from the server:

HTTP/1.1 200
Content-Type: application/json
{
"name": "myObject"
}

Working with Curl or Postman, is good only for testing, if you would like to automate the process of loading data into Bigquery, you should write some code to send your data to Google Storage in the Cloud. In case you are developing on the Google App Engine you can use the library that is available for the languages that are supported by it:

  1. Python
  2. Java
  3. PHP
  4. Go

If you are using one of the above languages and you are not coding for the Google App Engine, you can use it to access the Cloud Storage from your environment. Interacting such a feature rich product like Cloud Storage can become quite complicated depending on your use case, for more details on the different options that exist you can check Cloud Storage documentation. If you are looking for a less engaged and more neutral way of using Cloud Storage, you can consider a solution like Blendo.

After you have loaded your data into the Storage in Google Cloud, you have to create a Load Job for BigQuery to actually load the data into it, this Job should point to the source data in Cloud Storage that have to be imported, this happens by providing source URIs that point to the appropriate objects.

The previous method described, used a POST request to the Google Cloud for Storage API for storing the data there and then load it into BigQuery. Another way to go is to do a direct HTTP POST request to BigQuery with the data you would like to query. This approach is similar to how we loaded the data to Storage in the Google Cloud through the JSON API, but it uses the appropriate end-points of BigQuery to load the data there directly. The way to interact with it is quite similar, for more information can be found on the Google BigQuery API Reference and on the page that describes how you can load data into BigQuery using POST. You can interact with, by using the HTTP client library of the language or framework of your choice, a few options are:

The best way to load data from Hubspot to BigQuery

So far we just scraped the surface of what you can do with BigQuery and how you can load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting and maintaining a flexible data infrastructure use Blendo that can handle everything automatically for you.

Blendo with one click integrates with sources or services, creates analytics-ready data and syncs your Hubspot to BigQuery right away.

Help your marketing and executive team take ownership of the insights that live in your Hubspot marketing platform to transform the performance and ROI of your campaigns.

Blendo is the easiest way to automate powerful data integrations.

Try Blendo free for 14 days. No credit card required.