How to load data from Magento to MS SQL Server

Blendo Team

Magento is one of the most popular e-commerce platforms. How can you combine your Magento data with other sources to gain new insights. Let’s see how to get the data we have on Magento to a MS SQL Server Analytics data warehouse.

This post will help you define a process or pipeline, for getting your e-commerce related data from Magento and load it into SQL Server for further analysis. We will see how to access and extract your data from Magento through its API and how to load it into a SQL Server database.

This process requires from you to write the code to get the data and make sure that this process will run every time new data are generated. Alternatively, you may checkout Blendo to load your data from Magento to MS SQL Server in minutes for you.

Extract your data from Magento

Magento exposes its platform through both a REST and a SOAP interface. Both can be used to pull data from it, which is also the scope of this article, but also to interact with the platform. By using these interfaces, developers create rich applications and plugins for Magento. In this post we will use the REST version of the Magento platform.

As a Web API following the RESTful architecture principles, it can be accessed through HTTP. As a RESTful API, interacting with it an be achieved by using tools like CURL or Postman or by using http clients for your favourite language or framework. A few suggestions:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • Python http-client

Magento does not publish official SDKs but by using the SOAP interface it is possible to automatically generate clients that can act as SDKs for your favourite language or platform. For example in Java, you can create a client in Eclipse by providing the WSDL file that Magento exposes after you setup the platform.

Magento API Authentication

Magento is a self hosted platform, unless you are using the Enterprise cloud edition, so you have much more control over its access than other solutions, but if you want to access its data through the REST API that it has, then you will have to use oAuth for authentication which is supported by Magento.

Magento rate limiting

As a platform hosted on your own premises it doesn’t really imposes any rate limiting. In any case as you would like to avoid stressing your e-commerce platform that is facing your customers you should make sure that your pipeline process does not over stress your Magento installation. But this is completely at your discretion.

Endpoints and available resources

Magento exposes the following resources:

  • Products. Retrieve the list of products, create, update, delete a product.
  • Product categories. Retrieve the list of categories assigned to a product, assign and unassign the category from a product.
  • Product images. Retrieve the list of websites assigned to a product, assign, unassign a website to/from a product.
  • Customers. Retrieve the list of customers, create, delete a customer, and update the customer information.
  • Customer Addresses. Retrieve the list of customer addresses, create, update, and delete the customer address.
  • Inventory. Retrieve the list of stock items, update required stock items.
  • Sales Orders. Retrieve the list of sales orders with detailed information on order addresses, items, and comments.
  • Order Addresses. Retrieve information on the specified order comments.
  • Order Items. Retrieve information on specified order items.

The API is possible to return either JSON or XML responses, this is something that you can control by providing the appropriate Accept header content type.

For all the above resource we can request from the Magento platform to pull out a list of results with all the associated data, so ideally we would like to pull all the data an make sure that we keep them up to date on our analytics platform of choice for further analysis. For this post we will just see how we can pull data for one resource, the Sales Orders, the process is the same for all other resources.

To pull data for the Sales Orders, we need to execute a get request to the following endpoint:

http://magentohost/api/rest/orders

As a platform hosted by you, you need to replace the “magentohost” part of the URL with the actual URL of the host that has Magento running. The rest of the URL is the same as the above. The default response is in XML and looks like the following:

<?xml version="1.0"?>
<magento_api>
  <data_item_1>
    <customer_id>3</customer_id>
    <base_discount_amount>0.0000</base_discount_amount>
    <base_shipping_amount>455.0000</base_shipping_amount>
    <base_shipping_tax_amount>0.0000</base_shipping_tax_amount>
    <base_subtotal>13650.0000</base_subtotal>
    <base_tax_amount>0.0000</base_tax_amount>
    <base_total_paid></base_total_paid>
    <base_total_refunded></base_total_refunded>
    <tax_amount>0.0000</tax_amount>
    <total_paid></total_paid>
    <total_refunded></total_refunded>
    <base_shipping_discount_amount>0.0000</base_shipping_discount_amount>
    <base_subtotal_incl_tax>13650.0000</base_subtotal_incl_tax>
    <base_total_due>14105.0000</base_total_due>
    <total_due>14105.0000</total_due>
    <base_currency_code>USD</base_currency_code>
    <tax_name></tax_name>
    <tax_rate></tax_rate>
    <addresses>
      <data_item>
        <region>Palau</region>
        <postcode>19103</postcode>
        <lastname>Doe</lastname>
        <street>2356 Jody Road Philadelphia, PA 19103</street>
        <city>PA</city>
        <telephone>610-634-1181</telephone>
        <country_id>US</country_id>
        <firstname>John</firstname>
        <address_type>billing</address_type>
        <prefix></prefix>
        <middlename></middlename>
        <suffix></suffix>
        <company></company>
      </data_item>
      <data_item>
        <region>Palau</region>
        <postcode>19103</postcode>
        <lastname>Doe</lastname>
        <street>2356 Jody Road Philadelphia, PA 19103</street>
        <city>PA</city>
        <telephone>610-634-1181</telephone>
        <country_id>US</country_id>
        <firstname>John</firstname>
        <address_type>shipping</address_type>
        <prefix></prefix>
        <middlename></middlename>
        <suffix></suffix>
        <company></company>
      </data_item>
    </addresses>
    <order_items>
      <data_item>
        <sku>Sunglasses_1</sku>
        <price>150.0000</price>
        <base_price>150.0000</base_price>
        <base_original_price>150.0000</base_original_price>
        <tax_percent>0.0000</tax_percent>
        <tax_amount>0.0000</tax_amount>
        <base_tax_amount>0.0000</base_tax_amount>
        <base_discount_amount>0.0000</base_discount_amount>
        <base_row_total>13650.0000</base_row_total>
        <base_price_incl_tax>150.0000</base_price_incl_tax>
        <base_row_total_incl_tax>13650.0000</base_row_total_incl_tax>
      </data_item>
    </order_items>
  </data_item_1>
  <data_item_2>
    <customer_id>3</customer_id>
    <base_discount_amount>0.0000</base_discount_amount>
    <base_shipping_amount>95.0000</base_shipping_amount>
    <base_shipping_tax_amount>0.0000</base_shipping_tax_amount>
    <base_subtotal>3350.0000</base_subtotal>
    <base_tax_amount>0.0000</base_tax_amount>
    <base_total_paid>2445.0000</base_total_paid>
    <base_total_refunded>1845.0000</base_total_refunded>
    <tax_amount>0.0000</tax_amount>
    <total_paid>2445.0000</total_paid>
    <total_refunded>1845.0000</total_refunded>
    <base_shipping_discount_amount>0.0000</base_shipping_discount_amount>
    <base_subtotal_incl_tax>3350.0000</base_subtotal_incl_tax>
    <base_total_due>1000.0000</base_total_due>
    <total_due>1000.0000</total_due>
    <base_currency_code>USD</base_currency_code>
    <tax_name></tax_name>
    <tax_rate></tax_rate>
    <addresses>
      <data_item>
        <region>Palau</region>
        <postcode>19103</postcode>
        <lastname>Doe</lastname>
        <street>2356 Jody Road Philadelphia, PA 19103</street>
        <city>PA</city>
        <telephone>610-634-1181</telephone>
        <country_id>US</country_id>
        <firstname>John</firstname>
        <address_type>billing</address_type>
        <prefix></prefix>
        <middlename></middlename>
        <suffix></suffix>
        <company></company>
      </data_item>
      <data_item>
        <region>Palau</region>
        <postcode>19103</postcode>
        <lastname>Doe</lastname>
        <street>2356 Jody Road Philadelphia, PA 19103</street>
        <city>PA</city>
        <telephone>610-634-1181</telephone>
        <country_id>US</country_id>
        <firstname>John</firstname>
        <address_type>shipping</address_type>
        <prefix></prefix>
        <middlename></middlename>
        <suffix></suffix>
        <company></company>
      </data_item>
    </addresses>
    <order_items>
      <data_item>
        <sku>Sunglasses_1</sku>
        <price>150.0000</price>
        <base_price>150.0000</base_price>
        <base_original_price>150.0000</base_original_price>
        <tax_percent>0.0000</tax_percent>
        <tax_amount>0.0000</tax_amount>
        <base_tax_amount>0.0000</base_tax_amount>
        <base_discount_amount>0.0000</base_discount_amount>
        <base_row_total>1350.0000</base_row_total>
        <base_price_incl_tax>150.0000</base_price_incl_tax>
        <base_row_total_incl_tax>1350.0000</base_row_total_incl_tax>
      </data_item>
      <data_item>
        <sku>Sun_glasses</sku>
        <price>200.0000</price>
        <base_price>200.0000</base_price>
        <base_original_price>200.0000</base_original_price>
        <tax_percent>0.0000</tax_percent>
        <tax_amount>0.0000</tax_amount>
        <base_tax_amount>0.0000</base_tax_amount>
        <base_discount_amount>0.0000</base_discount_amount>
        <base_row_total>2000.0000</base_row_total>
        <base_price_incl_tax>200.0000</base_price_incl_tax>
        <base_row_total_incl_tax>2000.0000</base_row_total_incl_tax>
      </data_item>
    </order_items>
  </data_item_2>
</magento_api>

As we can see, we get back a list of items with each one representing an order, that contains all the information that we would like to use for further analysis. information like the discount that we might have applied, the taxes paid, the base price of the order etc. As we might have many order objects to retrieve, we should paginate through the results. To do that, we need to provide the “page” and “limit” parameters to our GET request.

Now that we have the results from our Magento shop, we can further process it before we are able to load it into the BI platform of our choice.

Magento Data Preparation for Microsoft SQL Server

As in every relational database, SQL Server requires a well defined database schema before we start populating with data. Data is organized in schemas, which are distinct namespaces where database objects belong to.

The most common database objects are of course tables which have a number of columns with each one having a declared data type. MS SQL Server supports a large number of different data types. Which give us great flexibility in expressing the data that we have and at the same time optimizing our data warehouse.

When working with data coming from web services, where data is usually serialized in JSON, it is important to correctly map the data to the right data types. As changing the data types in the future is a process that might cost in downtime of your database, it is important to spend enough time thinking about the proper data type assignments.

For example, dates in JSON are just strings, but when storing date objects in a database, we can enhance analytics with great capabilities by transforming the raw string data into an appropriate date type. A typical strategy for loading data from Magento to an SQL Server database, is to create a schema where you will map each API endpoint to a table. Each key inside the Magento API endpoint response should be mapped to a column of that table and you should ensure the right conversion to an SQL Server compatible data type.

Of course you will need to ensure that as the data types from the Magento API might change, you will adapt your database tables accordingly, there’s no such thing as automatic data type casting. After you have a complete and well defined data model or schema for Microsoft SQL Server, you can move forward and start loading your data into the database.

Load data from Magento to MS SQL Server

As a feature rich and mature product, MS SQL Server offers a large and diverse set of methods for loading data into a database. One way of importing data into your database is by using the SQL Server Import and export Wizard. With it and through a visual interface you will be able to bulk load data from a number of data sources that are supported.

You can import data from another SQL Server, from an Oracle database, from Flat Files, from an Access Data Source, PostgreSQL, MySQL and finally Azure Blob Storage. Especially if you are using a managed version of MS SQL Server on Azure, you should definitely consider utilizing the Azure Blob Storage connection.

In this way, you will be loading data as Blobs on Azure and your MS SQL  Server database will sync with it through the Import and Export Wizard.

Another way for importing bulk data into an SQL Server, both on Azure and on premises, is by using the bcp utility. This is a command line tool that is built specifically for bulk loading and unloading of data from an MS SQL database.

Finally and for compatibility reasons, especially if you are managing databases from different vendors, you can you BULK INSERT SQL statements.

In a similar way and as it happens with the rest of the databases, you can also use the standard INSERT statements, where you will be adding data row-by-row directly to a table. It is the most basic and straight forward way of adding data into a table but it doesn’t scale very well with larger data sets.

So for bulk datasets, you better consider one of the previous methods.

Updating your Magento data on MS SQL Server

As you will be generating more data on Magento, you will need to update your older data on an MS SQL Server database. This includes new records together with updates to older records that for any reason have been updated on Magento.

You will need to periodically check Magento for new data and repeat the process that has been described previously, while updating your currently available data if needed. Updating an already existing row on a SQL Server table, is achieved by creating UPDATE statements.
Another issue that you need to take care of is the identification and removal of any duplicate records on your database. Either because Magento does not have a mechanism to identify new and updated records or because of errors on your data pipelines, duplicate records might be introduced to your database.

In general, ensuring the quality of the data that is inserted in your database is a big and difficult issue and MS SQL Server features like TRANSACTIONS can help tremendously, although they do not solve the problem in the general case.

The best way to load data from Magento to MS SQL Server and possible alternatives

So far we just scraped the surface of what can be done with MS SQL Server and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from and the requirements of your use case.

Things can get even more complicated if you want to integrate data coming from different sources. A possible alternative, instead of writing, hosting and maintaining a flexible data infrastructure, is to use a product like Blendo that can handle this kind of problems automatically for you.

Blendo integrates with multiple sources or services like databases, CRM, email campaigns, analytics and more. Quickly and safely move all your data from Magento to Microsoft SQL Server and start generating insights from your data.

About Magento

Magento is an e-commerce platform built on open source technology which provides online merchants with a flexible shopping cart system, as well as control over the look, content and functionality of their online store. Magento offers powerful marketing, search engine optimisation, and catalog-management tools. Some of its main characteristics are the following:

  • Feature richMagento is very rich in functionality and offers an in-depth and powerful platform.
  • Powerful SEO. Magento is well known for its SEO capabilities, it offers one of the, out of the box, SEO optimisation for the stores hosted on it.
  • Magento, ensures that your store can seamlessly grow along with your business.
  • Flexibility. Its template based architecture allows you to pretty customise everything.
  • Magento community edition is open source and free to use.
  • SecurityMagento is build with security in its core.
  • User friendly. The administration area exposes a simple back end with intuitive navigation and well organised store management features.
  • Community. Being open source, guarantees a healthy ecosystem to support and further develop the platform.

Magento offers three different versions of its platform:

  • Community edition. It’s the open source and free version of Magento.
  • Enterprise edition. Magento Enterprise Edition is designed to empower merchants to rapidly innovate and deliver engaging experiences to customers across all channels and devices.
  • Enterprise cloud edition. The Enterprise Edition of the platform as a service delivered over the cloud.

About Microsoft SQL Server

Microsoft SQL Server, is one of the oldest and most mature database systems. Its first version was introduced about 28 years ago, in 1989 and Microsoft has been consistently supporting and extending the product until today.
So, it’s no surprise that Microsoft SQL Server has one of the richest feature sets among the currently available database systems.

SQL Server is delivered in different editions or flavors. With the most notable being the Enterprise edition which can manage databases as large as as 524 petabytes utilizing up to 12 terabytes of memory and 640 CPU processors. A free and scaled down version which is called Express.

A Business Intelligence version focusing on use cases where BI is performed on premises. This version is actually a bundle of different products, including the core database system, together with other Microsoft related products than can be used for BI purposes like visualization and data management.

In addition, there are also plenty of specialized versions of the database like the Compact edition that can be used on small devices and of course the Azure version, which is the cloud based edition of SQL Server. Microsoft SQL Server, incorporates a modular architecture that can extend the database with additional services. Replication services can extend the database to a cluster version and thus help with scaling and fault tolerance.

The SQL Server Analysis Services, augment the database with with OLAP and data mining capabilities, making the database ideal for the workloads that we care about in this guide.