This integration will allow you to connect with Postgres and start collecting your data.
Setting up the Postgres Integration
Setting up the Postgres pipeline is really simple. Firstly you need to log into your account on Blendo and select the Postgres icon for adding a pipeline.
On the first step you should add your credentials to connect to your Postgres.
- Postgres Host is host of your Postgres instance (e.g db.xxxxx.eu-west-1.rds.amazonaws.com) or its public IP (e.g 123.45.67.89). If you connect using an SSH Tunnel, you need to use your instance’s private IP, accessible by your SSH bastion.
- Port is the port of your Postgres instance. The default Postgres port (5432) is preselected. Change this if your setup is different.
- Database is the database inside your instance, which contains the tables that need to be synced.
- Username and Password are the credentials of a Postgres user that has read access to the schemas and tables that need to be synced.
Additionally, you can select if you want to connect using SSL or using SSH tunnel. After you have filled in all the information you should click on “Next”.
On the second step of the setup you should select the database schema that the new tables will end up to. After selecting that, you should click on “Next”.
On the third step you should select your pipeline resources, each resource corresponding to a database table. If it is the first time that you setup a Postgres pipeline there will not be any available resources to select, so you have to create the resources that you want, by clicking on the “+ Create Resources” button.
In the window that opens you are asked to configure the resource. The resource is by default “table”. You should type the name of the resource and then select the database schema that includes the table that you want as a resource and the specific table that exists in the schema.
Afterwards you should select if you want the resource to have a “table id” or an “update field”. A table id is a table column with unique values, with each value identifying an item in your data. It is necessary in order to have incremental updates correctly replace old values in your destination warehouse, each time a new update is detected. An update field is a table column, with timestamp type, that is used by Blendo to detect when a table row has been updated. Incremental updates will check this value and load only rows with time values after the last time your pipeline synced.
The selection of table id and update field depends on how your database is structured, and will affect the way that Blendo will sync your data. The following examples will help make the right setup:
- Table id no / update field no: If neither a table id nor an update field is defined, your pipeline will always load the complete table, and replace any existing data in your destination table.
- Table id yes / update field no: If only a table id is selected, your pipeline will load the complete table, and update any row with a matching id. Rows in your destination table with a matching value from the source data will not be deleted.
- Table id no / update field yes: If only the update field is selected, your pipeline will detect which rows have been updated and append them to your destination table. This will effectively store in your destination table all versions of your source data as they change.
- Table id yes / update field yes: If both fields are selected, your pipeline will detect which rows have been updated, and update the respective rows in your destination table. Deleted rows in your source table cannot be detected and will not cause any rows in your destination table to be deleted.
After you have created the resource, you are back on the “Resource Selection” step. There you can:
- create a new resource by clicking on the “Create Resource” button
- select/deselect any of the existing ones by clicking on the check box on the left on the resource name
- configure a created resource by clicking on the Cog icon on the right of the resource name (Please keep in mind that in case you delete a resource, the same resource is deleted from all the pipelines that include it)
- delete a created resource by clicking on the Bin icon on the right of the resource name
- create the pipeline by clicking on the “Create” button
That’s it! Your pipeline is ready to receive data!
Your pipeline is just created. If you want to initiate the sync, you should click on the “Sync now” button on the right or wait for the sync to start at the time indicated.
Configuration of the pipeline
When selecting the pipeline, you can see some information about it. On the top right you can see the pipeline ID, the data warehouse that the data is synced to and the schema that it is using.
Below you can find three tabs:
The first tab that you can see while selecting your pipeline is the Status tab. There you can see if the pipeline is syncing at the moment and for how long or if it has stopped. Additionally, you can manually stop the pipeline. Moreover, you can see the details of the current or previous sync, specifically the status of each of the tables that are syncing, and how many records have synced.
The second tab is the Resources tab. There you can edit the resource selection for your pipeline by selecting or deselecting any of the resources, or you can create a new resource as described above,
The third tab is the Settings tab. There you can see the the account information of your Postgres pipeline and you can either pick another account or re-authenticate the pipeline.
Under that, you can see the Destination configuration and you can edit the database schema that your tables end up to.
Blendo collects the data as soon as they arrive and will sync them to your data warehouse according to your pipeline’s schedule. The default behaviour is to do this once per hour, but you can configure it to run as often as once per 15 minutes or once per day, by clicking on “Reschedule” button. Additionally, next to the “Reschedule” button you can find the “Pause” button that pauses the pipeline and keeps it paused until you manually resume it.
On the bottom of the page you can see the button “Remove”. By clicking it you delete your pipeline. This action is irreversible.
Keep in mind that any changes that may be done to the pipeline will be valid starting from the following sync and on.
Expected Postgres Data
Postgres Integration does not have a Schema, since the tables that will be created are custom and depend on the selected resources of the pipeline. The tables that will be created in your database will be named after the Name that is filled in during the resource configuration of the third step of the creation of the Postgres pipeline.