How to load data from Xero to Google BigQuery

Blendo Team

This post helps you with loading your data from Xero to BigQuery. If you are looking to get analytics-ready data without the manual hassle you can integrate Xero to BigQuery with Blendo, so you can focus on what matters, getting value out of your accounting data.

What we will see:

Extract data from Xero

Xero has an excellent API, or to be more precise a number of APIs, and encourages developers to build applications that can be sold on their add-on marketplace. The APIs that they expose are the following:

  • Xero Core (Accounting) API – exposes accounting and related functions of the main Xero application and can be used for a variety of purposes such as creating transactions like invoices and credit notes, right through to extracting accounting data via our reports endpoint.
  • Xero Payroll API – exposes payroll related functions of Payroll in Xero and can be used for a variety of purposes such as syncing employee details, importing timesheets etc.
  • Files API – provides access to the files, folders, and the association of files within a Xero organization.
  • Fixed Assets API – which is under review, this feature is not yet available but users can vote for it to become publicly available.
  • Xero Practice Manager API – a recently released product built on the WorkflowMax product. Which is an API for managing workflows.

In this post we’ll focus on the Xero Core (Accounting) API. Which exposes the core accounting functionalities of the Xero product. The API of Xero is a RESTful web service and uses the OAuth (v1.0a) protocol to authenticate 3rd party applications. As a RESTful API, interacting with it an be achieved by using tools like CURL or Postman or by using http clients for your favourite language or framework. A few suggestions:

As an a product and consequently an API that has to deal with sensitive data, Xero API takes really good care of security. For this reason there are a number of different applications that can be developed and integrate with it, where the main difference is how the application authenticates, how often the tokens expire and in general security related aspects. For more about the different applications types, you can consult the application types guides on their documentation.

You will need more time to read this post than integrating Xero to BigQuery.

Effortlessly Sync All Your Xero Data to BigQuery

Xero API requests limits

The API of Xero has three different types of limits that enforces on the usage of their API. It’s extremely important to keep those in mind when developing against its API and a reason for many headaches when someone attempts to build an infrastructure for extracting data out of it.

  • Daily limit – of 1000 API calls per organization.
  • Requests per minute – each OAuth access token can be used up to 60 times in any 60 second period. This rate limit is based on a rolling 60 second window.
  • Request Size Limit – A single POST to the Accounting or Payroll APIs has a size limit of 5MB.

For more information about the API limitations, please consult the documentation for API limits.

Xero API Resources

The Xero API has a very rich data model of 31 resources. It is important to know that by default the response type of the API calls is of type text/xml but you can override this option and request JSON responses if preferred.

Requesting data by a Xero API

Let’s assume that you would like to retrieve all the invoices that you have issued through Xero and put the information in a data warehouse you own to perform analytics and reporting. To do that you should perform a GET request to the https://api.xero.com/api.xro/2.0/Invoices endpoint. A typical result, in XML,  from performing such an action is like the following:

<Invoices> <Invoice> <Type>ACCREC</Type> <Contact> <ContactID>025867f1-d741-4d6b-b1af-9ac774b59ba7</ContactID> <ContactStatus>ACTIVE</ContactStatus> <Name>City Agency</Name> <Addresses> <Address> <AddressType>STREET</AddressType> </Address> <Address> <AddressType>POBOX</AddressType> <AddressLine1>L4, CA House</AddressLine1> <AddressLine2>14 Boulevard Quay</AddressLine2> <City>Wellington</City> <PostalCode>6012</PostalCode> </Address> </Addresses> <Phones> <Phone> <PhoneType>DEFAULT</PhoneType> </Phone> <Phone> <PhoneType>DDI</PhoneType> </Phone> <Phone> <PhoneType>MOBILE</PhoneType> </Phone> <Phone> <PhoneType>FAX</PhoneType> </Phone> </Phones> <UpdatedDateUTC>2009-08-15T00:18:43.473</UpdatedDateUTC> <IsSupplier>false</IsSupplier> <IsCustomer>true</IsCustomer> </Contact> <Date>2009-05-27T00:00:00</Date> <DueDate>2009-06-06T00:00:00</DueDate> <Status>AUTHORISED</Status> <LineAmountTypes>Exclusive</LineAmountTypes> <LineItems> <LineItem> <Description>Onsite project management </Description> <Quantity>1.0000</Quantity> <UnitAmount>1800.00</UnitAmount> <TaxType>OUTPUT</TaxType> <TaxAmount>225.00</TaxAmount> <LineAmount>1800.00</LineAmount> <AccountCode>200</AccountCode> <Tracking> <TrackingCategory> <TrackingCategoryID>e2f2f732-e92a-4f3a9c4d-ee4da0182a13</TrackingCategoryID> <Name>Activity/Workstream</Name> <Option>Onsite consultancy</Option> </TrackingCategory> </Tracking> <LineItemID>52208ff9-528a-4985-a9ad-b2b1d4210e38</LineItemID> </LineItem> </LineItems> <SubTotal>1800.00</SubTotal> <TotalTax>225.00</TotalTax> <Total>2025.00</Total> <UpdatedDateUTC>2009-08-15T00:18:43.457</UpdatedDateUTC> <CurrencyCode>NZD</CurrencyCode> <InvoiceID>243216c5-369e-4056-ac67-05388f86dc81</InvoiceID> <InvoiceNumber>OIT00546</InvoiceNumber> <Payments> <Payment> <Date>2009-09-01T00:00:00</Date> <Amount>1000.00</Amount> <PaymentID>0d666415-cf77-43fa-80c7-56775591d426</PaymentID> </Payment> </Payments> <AmountDue>1025.00</AmountDue> <AmountPaid>1000.00</AmountPaid> <AmountCredited>0.00</AmountCredited> </Invoice> </Invoices>

It is possible to paginate your results by using the paging support of the Xero API. Which is very useful when you have to work with a large number of invoices. Also, it is possible to request from the API only the latest invoices. This is done by providing the “Modified After” parameter on the GET request to the API. The ModifiedAfter filter is actually an HTTP header: ‘If-Modified-Since‘.

A UTC timestamp (yyyy-mm-ddThh:mm:ss) . Only invoices created or modified since this timestamp will be returned e.g. 2009-11-12T00:00:00.

Xero, exposes a very rich API which offers you the opportunity to get very granular data about your accounting activities and use it for analytic and reporting purposes. This richness comes with a price though, a large number of resources that have to be handled where some of them allow to fetch updates and some other not.

About Xero

Load data from Xero

Load from Xero to BigQuery

Xero provides online accounting software and services for small and medium businesses. It includes a full accrual accounting system with a cashbook, automated daily bank feeds, invoicing, debtors, creditors, sales tax and reporting. Xero’s Software as a Service business model where software is hosted on the Internet as an innovative means of supplying solutions to large fragmented markets such as small businesses. Xero releases new features approximately every month. Additions and extensions to functionality are developed in response to the needs of new customers and new industries. Xero is a Microsoft Gold Certified Partner. The Xero software is developed using Microsoft’s .NET development environment, over a multi-tenanted SQL Server 2008 database, on a Windows 2008 production server infrastructure. Their production servers are hosted by Rackspace, a tier-one NASDAQ listed hosting provider, which monitors their servers and firewalls 24/7/365 at guarded facilities. Xero has also implemented a global Content Delivery Network with Akamai for faster application delivery to customers around the world.

Prepare your Xero Data for Google BigQuery

Before you load any data into BigQuery, you should make sure that it is presented in a format supported by it, so for example if the API you pull data, returns XML you have to first transform it into a serialisation that BigQuery understands. Currently two data formats are supported:

You also need to make sure that the data types you are using are the ones supported by BigQuery, which are the following:

  • STRING
  • INTEGER
  • FLOAT
  • BOOLEAN
  • RECORD
  • TIMESTAMP

for more information please check the Preparing Data for BigQuery page on the documentation.

About Google BigQuery

BigQuery is the data warehousing solution of Google. It’s part of the Google Cloud Platform and it also speaks SQL like Redshift does. Queries are executed against append-only tables using the processing power of Google’s infrastructure. It is also fully managed and is offered as a service over the cloud. You can interact with it through its web UI, using a command line tool while a variety of client libraries exist so you can interact with it through your application.

Load Data from Xero to Google BigQuery

If you want to load Xero data to Google BigQuery, you have to use one of the following supported data sources.

  1. Google Cloud Storage
  2. Sent data directly to BigQuery with a POST request
  3. Google Cloud Datastore Backup
  4. Streaming insert
  5. App Engine log files
  6. Cloud Storage logs

From the above list of sources, 5 and 6 are not applicable in our case.

For Google Cloud Storage, you first must load your own data into it, there are a few options on how to do this, for example you can use the console directly as it is described here and do not forget to follow the best practices. Another option is to post data through the JSON API, as we see again APIs play an important role in both the extraction but also the loading of data into our data warehouse.. In it’s simplest case it’s just a matter of one HTTP POST request using a tool like CURL or Postman. It should look like the following example.

POST /upload/storage/v1/b/myBucket/o?uploadType=media&name=myObject 
HTTP/1.1 Host: www.googleapis.com 
Content-Type: application/text Content-Length: number_of_bytes_in_file
Authorization: Bearer your_auth_token your Xero data

and if everything went ok, you should get something like the following as a response from the server:

HTTP/1.1 200 Content-Type: application/json { "name": "myObject" }

Working with Curl or Postman, is good only for testing, if you would like to automate the process of loading data into Google Bigquery, you should write some code to send your data to Google Storage in the Cloud. In case you are developing on the Google App Engine you can use the library that is available for the languages that are supported by it:

  1. Python
  2. Java
  3. PHP
  4. Go

If you are using one of the above languages and you are not coding for the Google App Engine, you can use it to access the Cloud Storage from your environment. Interacting such a feature rich product like Cloud Storage can become quite complicated depending on your use case, for more details on the different options that exist you can check Google Cloud Storage documentation. If you are looking for a less engaged and more neutral way of using Storage in the Cloud, you can consider a solution like Blendo.

After you have loaded your data into Cloud Storage, you have to create a Load Job for BigQuery to actually load data into it, this Job should point to the source data in Cloud Storage that have to be imported, this happens by providing source URIs that point to the appropriate objects.

The previous method described, used a POST request to the Google Storage API for storing the data there and then load it into BigQuery. Another way to go is to do a direct HTTP POST request to BigQuery with every data you would like to query. This approach is similar to how we loaded data to Google Storage in the Cloud through the JSON API, but it uses the appropriate end-points of BigQuery to load the data there directly. The way to interact with it is quite similar, for more information can be found on the Google BigQuery API Reference and on the page that describes how you can load any data into BigQuery using POST. You can interact with it using the HTTP client library of the language or framework of your choice, a few options are:

The best way to load data from Xero to BigQuery

So far we just scraped the surface of what you can do with BigQuery and how you can load any data you own into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting and maintaining a flexible data infrastructure use Blendo that can handle everything automatically for you.

Blendo with one click integrates with sources or services, creates analytics-ready data and syncs your Xero to Bigquery right away.

Help your accounting and executive team take ownership of the data that live inside Xero to better consolidate your accounting data.

Blendo is the easiest way to automate powerful data integrations.

Try Blendo free for 14 days. No credit card required.