How to load data from Stripe to Google BigQuery

Blendo Team

This post helps you with loading your data from Stripe to BigQuery. If you are looking to get analytics-ready data without the manual hassle you can integrate Stripe to BigQuery with Blendo, so you can focus on what matters, getting value out of your transaction data.

What we will see:

Extract data from Stripe

Stripe is an API first product, it’s a unified set of APIs and tools that instantly enables businesses to accept and manage online payments. It is a web API following the RESTful principles, they try to use as many as possible HTTP build in features to make it accessible to off-the-shelf HTTP clients and the serialization they support for their responses is JSON. They also have two different types of keys used for authentication, one for testing mode and one for live mode, using the testing mode key it becomes easy to test every aspect of the API without messing with your real data. Also, keep in mind that the calls you make to the Stripe API have to be over HTTPS only for security reasons, plain HTTP calls will fail, same happens for non-authenticated calls, so do not forget to use your testing mode key in case you want to experiment with the API.

Currently, the Stripe API is built around the following ten core resources:

  • Balance – an object that represents your stripe balance.
  • Charges – to charge a credit or debit card you create a charge
  • Customers – Customer objects allow you to perform recurring charges and track multiple charges that are associated with the same customer.
  • Dispute – A dispute occurs when a customer questions your charge with their bank or credit card company.
  • Events – Events are our way of letting you know when something interesting happens in your account.
  • File uploads – There are various times when you’ll want to upload files to Stripe (for example, when uploading dispute evidence).
  • Refunds – Refund objects allow you to refund a charge that has previously been created but not yet refunded.
  • Tokens – Tokens can be created with your publishable API key.
  • Transfers – When Stripe sends you money or you initiate a transfer to a bank account
  • Transfer reversals – A previously created transfer can be reversed if it has not yet been paid out.

All of the above resources support CRUD operations by using HTTP verbs on their associated endpoints. As a web API, you can access it using by using tools like CURL or Postman or your favorite http client for the language or framework of your choice. Some options are the following:

There’s also a large number of libraries that wrap around the Stripe API and offer an easier way to interact with it, both community developed and from Stripe. For more information, you can check the libraries section in the API documentation.

You will need more time to read this post than integrating Stripe to BigQuery.

Effortlessly Sync All Your Stripe Data to BigQuery

Stripe and any other service that you might be using, has figured out (hopefully) the optimal model for its operations, but when we fetch data out of them we usually want to answer questions or do things that are not part of the context that these services operate, something that makes these models sub-optimal for your analytic needs. For this reason, we should always keep in mind that when we work with data coming from external services we need to re-model it and bring it to the right form for our needs.

So let’s assume that we want to perform some churn analysis for our company and to do that we need customer data that indicate when they have canceled their subscriptions. To do that we’ll have to request the customer objects that Stripe holds for our company. We  can do that with the following command:

curl https://api.stripe.com/v1/charges?limit=3 -u sk_test_BQokikJOvBiI2HlWgH4olfQ2:

and a typical response will look like the following:

{ "object": "list", "url": "/v1/charges", "has_more": false, "data": [ { "id": "ch_17SY5f2eZvKYlo2CiPfbfz4a", "object": "charge", "amount": 500, "amount_refunded": 0, "application_fee": null, "balance_transaction": "txn_17KGyT2eZvKYlo2CoIQ1KPB1", "captured": true, "created": 1452627963, "currency": "usd", "customer": null, "description": "thedude@grepinnovation.com Account Credit", "destination": null, "dispute": null, "failure_code": null, "failure_message": null, "fraud_details": { }, …….

Inside the customer object there’s a list of subscription objects that look like the following JSON document:

{ "id": "sub_7hy2fgATDfYnJS", "object": "subscription", "application_fee_percent": null, "cancel_at_period_end": false, "canceled_at": null, "current_period_end": 1455306419, "current_period_start": 1452628019, "customer": "cus_7hy0yQ55razJrh", "discount": null, "ended_at": null, "metadata": { }, "plan": { "id": "gold2132", "object": "plan", "amount": 2000, "created": 1386249594, "currency": "usd", "interval": "month", "interval_count": 1, "livemode": false, "metadata": { }, "name": "Gold ", "statement_descriptor": null, "trial_period_days": null }, "quantity": 1, "start": 1452628019, "status": "active", "tax_percent": null, "trial_end": null, "trial_start": null }

These objects together with part of the customer object, contain the information we need to perform churn analysis. Of course, we’ll have to extract all the information we need, map it to the schema of our data warehouse repository and then load the data to it following the instructions of this post.

About Stripe

Stripe is the best way to accept payments online. Stripe aims to expand internet commerce by making it easy to process transactions and manage an online business. They want to increase the GDP of the internet. Enabling more transactions is a problem rooted in code and design, not finance. Stripe is built for developers, makers, and creators. In almost every front, it was becoming easier to build and launch an online business. Payments, however, remained dominated by clunky legacy players. It seemed clear that there should be a developer-focused, instant-setup payment platform that would scale to any size. Stripe launched in September 2011.

Stripe now processes billions of dollars a year for thousands of businesses, from newly-launched start-ups to Fortune 500 companies. Since Stripe powers so many new businesses, it’s a snapshot of how the internet is changing; many users are in categories that barely existed five years ago.

Stream data using the Stripe API to your data warehouse

It is also possible to setup a streaming data infrastructure that will collect data out of Stripe and push them into data warehouse in a streaming fashion. This can be achieved by using the webhooks functionality that Stripe supports, you register some events to it and everytime something happens, Stripe will push a message to your webhook. For more information about that, check the API documentation on webhooks.

Prepare your Stripe Data for Google BigQuery

Before you load any data into BigQuery, you should make sure that it is presented in a format supported by it, so for example if the API you pull data out returns XML you have to first transform it into a serialization that BigQuery understands. Currently two data formats are supported:

You also need to make sure that the data types you are using are the ones supported by BigQuery, which are the following:

  • STRING
  • INTEGER
  • FLOAT
  • BOOLEAN
  • RECORD
  • TIMESTAMP

for more information please check the Preparing Data for BigQuery page on the documentation.

Load from Stripe to Google BigQuery

If you want to load data by using Stripe to BigQuery, you have to use one of the following supported data sources.

  1. Google Cloud Storage
  2. Sent data directly to BigQuery with a POST request
  3. Google Cloud Datastore Backup
  4. Streaming insert
  5. App Engine log files
  6. Cloud Storage logs

From the above list of sources, 5 and 6 are not applicable in our case.

For Google Cloud Storage, you first have to load data into it, there are a few options on how to do this, for example, you can use the console directly as it is described here and do not forget to follow the best practices. Another option is to post data through the JSON API, as we see again APIs play an important role in both the extraction but also the loading of data in our data warehouse.. In it’s simplest case it’s just a matter of one HTTP POST request using a tool like CURL or Postman. It should look like the following example.

POST /upload/storage/v1/b/myBucket/o?uploadType=media&name=myObject HTTP/1.1 Host: www.googleapis.com Content-Type: application/text Content-Length: 
number_of_bytes_in_file
 Authorization: Bearer 
your_auth_token your Stripe data

and if everything went ok, you should get something like the following as a response from the server:

HTTP/1.1 200 Content-Type: application/json { "name": "myObject" }

Working with Curl or Postman, is good only for testing, if you would like to automate the process of loading every data into Bigquery, you should write some code to send data you own to Google Cloud Storage. In case you are developing on the Google App Engine you can use the library that is available for the languages that are supported by it:

  1. Python
  2. Java
  3. PHP
  4. Go

If you are using one of the above languages and you are not coding for the Google App Engine, you can use it to access the Cloud Storage from your environment. Interacting such a feature rich product like Google Cloud Storage can become quite complicated depending on your use case, for more details on the different options that exist you can check Cloud Storage documentation. If you are looking for a less engaged and more neutral way of using Cloud Storage, you can consider a solution like Blendo.

After you have loaded data to Cloud Storage, you have to create a Load Job for BigQuery to actually load the data into it, this Job should point to the source data in Storage on the Cloud that have to be imported, this happens by providing source URIs that point to the appropriate objects.

The previous method described, used a POST request to the Google Storage API on the Cloud for storing the data there and then load it into BigQuery. Another way to go is to do a direct HTTP POST request to BigQuery with the data you would like to query. This approach is similar to how we loaded the data to the Cloud in Google Storage through the JSON API, but it uses the appropriate end-points of BigQuery and load the data there directly. The way to interact with it is quite similar, for more information can be found on the BigQuery API Reference and on the page that describes how you can load data by BigQuery using POST. You can interact with it using the HTTP client library of the language or framework of your choice, a few options are:

The best way to load data from Stripe to BigQuery

So far we just scraped the surface of what you can do with BigQuery and how you can load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting and maintaining a flexible data infrastructure use Blendo that can handle everything automatically for you.

Blendo with one click integrates with sources or services, creates analytics-ready data and syncs your Stripe to BigQuery right away.

Help your sales and executive team take ownership of the insights of your payments' data that live in Stripe.

Blendo is the easiest way to automate powerful data integrations.

Try Blendo free for 14 days. No credit card required.