How to load data from Stripe to SQL Data Warehouse

Blendo Team

How can I load data from Stripe to SQL Data Warehouse? This guide helps you define a process, or pipeline, for extracting your data from Stripe and loading it into SQL Data Warehouse for further analysis. It explains how to access and extract your data from Stripe through its API and how to load it into SQL Data Warehouse. This process requires you to write the code that fetches the data and to make sure that it runs every time new data is generated. Alternatively, you can use a product like Blendo that can handle this kind of problem automatically for you.


About Stripe

Stripe is a platform for accepting payments online. It aims to expand internet commerce by making it easy to process transactions and manage an online business; in the company’s words, it wants to increase the GDP of the internet. Stripe’s view is that enabling more transactions is a problem rooted in code and design, not finance, so the product is built for developers, makers, and creators. When Stripe started, it was becoming easier on almost every front to build and launch an online business, yet payments remained dominated by clunky legacy players. It seemed clear that there should be a developer-focused, instant-setup payment platform that would scale to any size. Stripe launched in September 2011.

Stripe now processes billions of dollars a year for thousands of businesses, from newly-launched start-ups to Fortune 500 companies. Since Stripe powers so many new businesses, it’s a snapshot of how the internet is changing; many users are in categories that barely existed five years ago.

Extract your data from Stripe

Stripe is an API-first product: a unified set of APIs and tools that enables businesses to accept and manage online payments. It is a web API that follows RESTful principles and relies on built-in HTTP features as much as possible, which makes it accessible to off-the-shelf HTTP clients. Responses are serialized as JSON. Stripe provides two types of keys for authentication, one for test mode and one for live mode; with the test mode key you can exercise every aspect of the API without touching your real data. Keep in mind that calls to the Stripe API must be made over HTTPS for security reasons. Plain HTTP calls will fail, and so will unauthenticated calls, so do not forget to use your test mode key when you want to experiment with the API.
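As a rough illustration of the authentication scheme described above, the sketch below builds the HTTP Basic header that an HTTP client would send: the secret key goes in as the username and the password is left empty. The helper names and the `sk_test_example` key are placeholders invented for this example; they are not part of any Stripe library.

```javascript
// Hypothetical helpers for illustration, not part of any Stripe library.

// Stripe uses HTTP Basic auth: the secret key is the username, the password is empty.
function stripeAuthHeader(secretKey) {
  const token = Buffer.from(secretKey + ':').toString('base64');
  return { Authorization: 'Basic ' + token };
}

// Test mode secret keys start with "sk_test_", live mode keys with "sk_live_".
function isTestKey(secretKey) {
  return secretKey.startsWith('sk_test_');
}

// Guard so that experiments never run with a live key by accident.
function headersForExperiment(secretKey) {
  if (!isTestKey(secretKey)) {
    throw new Error('Refusing to experiment with a non-test key');
  }
  return stripeAuthHeader(secretKey);
}

// Example: headers to pass to the HTTP client of your choice.
const headers = headersForExperiment('sk_test_example');
console.log(Object.keys(headers));
```

The guard is a design choice for this sketch only: it keeps exploratory scripts on test data, which is the behaviour the paragraph above recommends.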

Currently the Stripe API is built around the following ten core resources:

  • Balance – An object that represents your Stripe balance.
  • Charges – To charge a credit or debit card, you create a charge.
  • Customers – Customer objects allow you to perform recurring charges and track multiple charges that are associated with the same customer.
  • Disputes – A dispute occurs when a customer questions your charge with their bank or credit card company.
  • Events – Events are Stripe’s way of letting you know when something interesting happens in your account.
  • File uploads – There are various times when you’ll want to upload files to Stripe (for example, when uploading dispute evidence).
  • Refunds – Refund objects allow you to refund a charge that has previously been created but not yet refunded.
  • Tokens – Tokens can be created with your publishable API key.
  • Transfers – A transfer is created when Stripe sends you money or you initiate a transfer to a bank account.
  • Transfer reversals – A previously created transfer can be reversed if it has not yet been paid out.

These resources are accessed through HTTP verbs on their associated endpoints, and most of them support the usual CRUD operations. Because it is a web API, you can access it with tools like cURL, Postman or Apirise, or with your favorite HTTP client for the language or framework of your choice.

There’s also a large number of libraries that wrap around the Stripe API and offer an easier way to interact with it, both community developed and from Stripe. For more information you can check the libraries section in the API documentation.

Stripe, like any other service you might be using, has (hopefully) figured out the optimal data model for its own operations. When we fetch data from such a service, however, we usually want to answer questions or do things that fall outside the context in which it operates, which makes its model sub-optimal for our analytic needs. For this reason, we should always keep in mind that data coming from external services has to be re-modeled and brought into the right form for our needs.

So let’s assume that we want to perform some churn analysis for our company, and to do that we need customer data that indicates when customers have cancelled their subscriptions. That means requesting the customer objects that Stripe holds for our company from the /v1/customers endpoint. All list endpoints are called in the same way; for example, the following command lists up to three charges:

curl "https://api.stripe.com/v1/charges?limit=3" \
   -u sk_test_BQokikJOvBiI2HlWgH4olfQ2:

 

and a typical response will look like the following:

{
  "object": "list",
  "url": "/v1/charges",
  "has_more": false,
  "data": [
    {
      "id": "ch_17SY5f2eZvKYlo2CiPfbfz4a",
      "object": "charge",
      "amount": 500,
      "amount_refunded": 0,
      "application_fee": null,
      "balance_transaction": "txn_17KGyT2eZvKYlo2CoIQ1KPB1",
      "captured": true,
      "created": 1452627963,
      "currency": "usd",
      "customer": null,
      "description": "thedude@grepinnovation.com Account Credit",
      "destination": null,
      "dispute": null,
      "failure_code": null,
      "failure_message": null,
      "fraud_details": {
      }, …….

Customer objects are returned in the same list format. Inside each customer object there’s a list of subscription objects that look like the following JSON document:

{
  "id": "sub_7hy2fgATDfYnJS",
  "object": "subscription",
  "application_fee_percent": null,
  "cancel_at_period_end": false,
  "canceled_at": null,
  "current_period_end": 1455306419,
  "current_period_start": 1452628019,
  "customer": "cus_7hy0yQ55razJrh",
  "discount": null,
  "ended_at": null,
  "metadata": {
  },
  "plan": {
    "id": "gold2132",
    "object": "plan",
    "amount": 2000,
    "created": 1386249594,
    "currency": "usd",
    "interval": "month",
    "interval_count": 1,
    "livemode": false,
    "metadata": {
    },
    "name": "Gold ",
    "statement_descriptor": null,
    "trial_period_days": null
  },
  "quantity": 1,
  "start": 1452628019,
  "status": "active",
  "tax_percent": null,
  "trial_end": null,
  "trial_start": null
}

These objects, together with part of the customer object, contain the information we need to perform churn analysis. Of course, we’ll have to extract the fields we need, map them to the schema of our data warehouse and then load the data into it, following the instructions in this post.
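The mapping step could look roughly like the sketch below, which flattens one subscription object into a single row. The column names and the rule used to flag a subscription as churned are assumptions made for this example; your own warehouse schema will likely differ.

```javascript
// Hypothetical mapping; the column names are assumptions for illustration.

// Stripe timestamps are Unix seconds; convert them to ISO 8601 strings (or null).
function toIso(seconds) {
  return seconds == null ? null : new Date(seconds * 1000).toISOString();
}

// Flatten one subscription object into a single warehouse row.
function subscriptionToRow(sub) {
  return {
    subscription_id: sub.id,
    customer_id: sub.customer,
    plan_id: sub.plan ? sub.plan.id : null,
    plan_amount: sub.plan ? sub.plan.amount : null,
    plan_interval: sub.plan ? sub.plan.interval : null,
    status: sub.status,
    started_at: toIso(sub.start),
    canceled_at: toIso(sub.canceled_at),
    ended_at: toIso(sub.ended_at),
    // Treat a subscription as churned once it has a cancellation or end time.
    churned: sub.canceled_at != null || sub.ended_at != null
  };
}

// Example with a subset of the fields from the subscription shown above.
const row = subscriptionToRow({
  id: 'sub_7hy2fgATDfYnJS',
  customer: 'cus_7hy0yQ55razJrh',
  canceled_at: null,
  ended_at: null,
  plan: { id: 'gold2132', amount: 2000, interval: 'month' },
  start: 1452628019,
  status: 'active'
});
console.log(row);
```

Applied to the subscription shown above, this yields a row with `status` set to `active` and `churned` set to `false`.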

Stream data from the Stripe API to your data warehouse

It is also possible to set up a streaming data infrastructure that collects data from Stripe and pushes it into your data warehouse as it is generated. This can be achieved with the webhooks functionality that Stripe supports: you register a webhook for the events you care about, and every time one of them occurs, Stripe pushes a message to your webhook. For more information, check the API documentation on webhooks.
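To make the webhook flow more concrete, here is a minimal sketch of the part that runs after your endpoint has received and parsed an event. It keeps only subscription events relevant to churn and appends a flat record to a buffer. The function name, the `sink` array and the record layout are assumptions for this example; a real endpoint would also need an HTTP server and should check that the request really came from Stripe.

```javascript
// Hypothetical handler; names and record layout are assumptions for illustration.

// Event types that signal a change in a customer's subscription.
const CHURN_EVENT_TYPES = new Set([
  'customer.subscription.updated',
  'customer.subscription.deleted'
]);

// Takes an already parsed Stripe event and appends a flat record to `sink`.
// Returns true when the event was kept, false when it was ignored.
function handleStripeEvent(event, sink) {
  if (!CHURN_EVENT_TYPES.has(event.type)) {
    return false;
  }
  const sub = event.data.object; // the subscription the event refers to
  sink.push({
    event_id: event.id,
    event_type: event.type,
    event_created: event.created,
    subscription_id: sub.id,
    customer_id: sub.customer,
    status: sub.status,
    canceled_at: sub.canceled_at
  });
  return true;
}

// Example: buffer records in memory before writing them out in batches.
const buffer = [];
handleStripeEvent({
  id: 'evt_example',
  type: 'customer.subscription.deleted',
  created: 1452628019,
  data: { object: { id: 'sub_example', customer: 'cus_example', status: 'canceled', canceled_at: 1452628000 } }
}, buffer);
console.log(buffer.length);
```

Buffering records and flushing them in batches fits the blob-based loading approach better than writing one file per event.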

Load Data from Stripe to SQL Data Warehouse

SQL Data Warehouse supports numerous options for loading data, such as:

  • PolyBase
  • Azure Data Factory
  • BCP command-line utility
  • SQL Server Integration Services (SSIS)

As we are interested in loading data from online services through their exposed HTTP APIs, we are not going to consider the BCP command-line utility or SQL Server Integration Services in this guide. Instead, we’ll consider the case of storing our data as Azure Storage blobs and then using PolyBase to load it into SQL Data Warehouse.

These services are accessed through HTTP APIs; once again, APIs play an important role in both the extraction and the loading of data into our data warehouse. You can access these APIs with a tool like cURL, Postman or Apirise, or use the libraries provided by Microsoft for your favourite language. Before you upload any data you have to create a container, which is conceptually similar to an Amazon S3 bucket. Creating a container is a straightforward operation, and you can do it by following the instructions in the Blob storage documentation from Microsoft. As an example, the following code creates a container in Node.js.

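// Uses the azure-storage package; createBlobService() with no arguments
// reads the storage account credentials from environment variables.
var azure = require('azure-storage');
var blobSvc = azure.createBlobService();

blobSvc.createContainerIfNotExists('mycontainer', function(error, result, response){
  if(!error){
    // Container exists and is private
  }
});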

 

After creating the container you can start uploading data to it, again using the SDK of your choice in a similar fashion:

blobSvc.createBlockBlobFromLocalFile('mycontainer', 'myblob', 'test.txt', function(error, result, response){
  if(!error){
    // file uploaded
  }
});
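The file you upload has to be in a format that PolyBase can read, such as delimited text. The sketch below serializes rows into pipe-delimited lines. The function name and the choice of `|` as the delimiter are assumptions for this example, and it takes the cautious route of replacing delimiters and line breaks inside values with spaces instead of relying on quoting, on the assumption that the delimited-text reader is less forgiving than a general CSV parser.

```javascript
// Hypothetical serializer; the delimiter and cleaning rules are assumptions.

// Turn an array of row objects into delimited text, one line per row.
// `columns` fixes the column order so every line has the same layout.
function toDelimitedText(rows, columns, delimiter = '|') {
  const clean = (value) => {
    if (value == null) return ''; // null and undefined become empty fields
    return String(value)
      .split(delimiter).join(' ')   // drop delimiters inside values
      .replace(/[\r\n]+/g, ' ');    // drop line breaks inside values
  };
  return rows
    .map((row) => columns.map((name) => clean(row[name])).join(delimiter) + '\n')
    .join('');
}

// Example: two rows with a null field and a value containing the delimiter.
const text = toDelimitedText(
  [
    { subscription_id: 'sub_1', status: 'active', note: null },
    { subscription_id: 'sub_2', status: 'canceled', note: 'moved|away' }
  ],
  ['subscription_id', 'status', 'note']
);
console.log(text);
```

Writing `text` to a local file gives you something that can be uploaded with the same call shown above.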

When you are done putting your data into Azure blobs, you are ready to load it into SQL Data Warehouse using PolyBase. To do that, follow the directions in the Load with PolyBase documentation. In summary, the required steps are the following:

  • create a database master key
  • create a database scoped credential
  • create an external file format
  • create an external data source

PolyBase’s ability to transparently parallelize loads from Azure Blob Storage typically makes it the fastest option for loading data. After configuring PolyBase, you can load data directly into your SQL Data Warehouse by simply creating an external table that points to your data in storage and then mapping that data to a new table within SQL Data Warehouse.
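The steps above can be sketched as a small helper that produces the T-SQL statements in the order they would be run. Everything named here is an assumption for illustration: the object names (`AzureStorageCredential`, `AzureStorage`, `PipeDelimited`), the pipe delimiter, the round-robin distribution and the `<storage account key>` placeholder, which you would replace by hand. Treat the statements as a starting point to check against the Load with PolyBase documentation, not as a definitive script.

```javascript
// Hypothetical helper: returns T-SQL statements for a PolyBase load, in order.
// All object names and options are assumptions made for this sketch.
function polybaseStatements({ account, container, folder, table, columns }) {
  const columnList = columns.map((c) => `${c.name} ${c.type}`).join(', ');
  return [
    // 1. Master key that protects the credential secret.
    'CREATE MASTER KEY;',
    // 2. Credential holding the storage account key (placeholder left as-is).
    "CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential " +
      "WITH IDENTITY = 'user', SECRET = '<storage account key>';",
    // 3. External data source pointing at the blob container.
    'CREATE EXTERNAL DATA SOURCE AzureStorage WITH (TYPE = HADOOP, ' +
      `LOCATION = 'wasbs://${container}@${account}.blob.core.windows.net', ` +
      'CREDENTIAL = AzureStorageCredential);',
    // 4. File format describing the delimited text files.
    'CREATE EXTERNAL FILE FORMAT PipeDelimited WITH (FORMAT_TYPE = DELIMITEDTEXT, ' +
      "FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));",
    // 5. External table over the files in the given folder.
    `CREATE EXTERNAL TABLE dbo.${table}_ext (${columnList}) WITH (` +
      `LOCATION = '${folder}', DATA_SOURCE = AzureStorage, FILE_FORMAT = PipeDelimited);`,
    // 6. Copy the external data into a regular warehouse table.
    `CREATE TABLE dbo.${table} WITH (DISTRIBUTION = ROUND_ROBIN) ` +
      `AS SELECT * FROM dbo.${table}_ext;`
  ];
}

// Example with made-up account, container and column definitions.
const statements = polybaseStatements({
  account: 'myaccount',
  container: 'mycontainer',
  folder: '/subscriptions/',
  table: 'subscriptions',
  columns: [
    { name: 'subscription_id', type: 'NVARCHAR(64)' },
    { name: 'status', type: 'NVARCHAR(32)' }
  ]
});
console.log(statements.join('\n'));
```

The helper only interpolates trusted configuration values into the statements; it is not meant to handle user-supplied input.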

Of course, you will need to establish a recurring process that extracts any newly created data from your service, stores it as Azure blobs and initiates the PolyBase process to import it into SQL Data Warehouse. One way of doing this is with the Azure Data Factory service. If you would like to follow this path, you can read the documentation on how to move data to and from Azure SQL Data Warehouse using Azure Data Factory.
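On the extraction side, a recurring job mostly needs two things: a way to ask Stripe only for objects created since the last run, and a way to walk through the result pages. The sketch below builds list URLs with the `created[gte]` filter and the `starting_after` cursor, and reads the next cursor from a list response. The function names and option names are assumptions for this example, and the HTTP call itself is left to the client of your choice.

```javascript
// Hypothetical helpers for incremental extraction; names are assumptions.

// Build the URL for one page of a Stripe list endpoint.
function listUrl(resource, { createdSince, startingAfter, limit = 100 } = {}) {
  const url = new URL('/v1/' + resource, 'https://api.stripe.com');
  url.searchParams.set('limit', String(limit));
  if (createdSince != null) {
    // Only objects created at or after this Unix timestamp (seconds).
    url.searchParams.set('created[gte]', String(createdSince));
  }
  if (startingAfter) {
    // Cursor: the id of the last object on the previous page.
    url.searchParams.set('starting_after', startingAfter);
  }
  return url.toString();
}

// Given a parsed list response, return the cursor for the next page,
// or null when there is nothing more to fetch.
function nextCursor(page) {
  if (!page.has_more || page.data.length === 0) {
    return null;
  }
  return page.data[page.data.length - 1].id;
}

// Example: first page of customers created since the previous run.
const firstPage = listUrl('customers', { createdSince: 1452628019 });
console.log(firstPage);
```

Storing the timestamp of each successful run and passing it as `createdSince` on the next one keeps every run limited to new data.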

The best way to load data from Stripe to SQL Data Warehouse and possible alternatives

So far we have just scratched the surface of what can be done with Microsoft Azure SQL Data Warehouse and how to load data into it. The way to proceed relies heavily on the data you want to load, the service it comes from and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. A possible alternative, instead of writing, hosting and maintaining a flexible data infrastructure, is to use a product like Blendo that can handle this kind of problem automatically for you.

Blendo integrates with multiple sources or services like databases, CRM, email campaigns, analytics and more. Quickly and safely move all your data from Stripe into SQL Data Warehouse and start generating insights from your data.