This integration will allow you to connect with Amazon S3 and start collecting your data.
Setting up the Amazon S3 Integration
How to connect
To connect Blendo to Amazon S3, you need to know the name of an existing S3 bucket in advance, and your AWS account must have permissions in AWS Identity and Access Management (IAM) that allow you to create and modify IAM policies and roles.
How to grant access to your S3 bucket using AWS IAM
- In your AWS account, select Services at the top of the page and then navigate to the IAM service home page.
- Select “Policies” in the menu on the left side of the page and then click on “Create Policy”.
- On the Create Policy page, click the JSON tab. Delete everything in the text field and paste the IAM policy [1]. Make sure that you replace both occurrences of the YOUR_S3_BUCKET_NAME placeholder with your actual S3 bucket name. Then click on “Review Policy”.
- On the Review Policy page, give the policy a name (example: blendo_s3). The description is optional. Then click on “Create policy”.
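The policy text from the Annex [1] can also be generated rather than edited by hand, which avoids mistyping the bucket name in one of the two resource ARNs. The following is a minimal sketch in Python, not part of Blendo or AWS; the function name build_s3_read_policy and the bucket name my-example-bucket are hypothetical.

```python
import json


def build_s3_read_policy(bucket_name: str) -> str:
    """Return the IAM policy from the Annex [1] as a JSON string."""
    # A bare bucket name is expected, not a URL or a path inside the bucket.
    if not bucket_name or "/" in bucket_name:
        raise ValueError("expected a bare bucket name, without slashes")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:List*", "s3:Get*"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",      # the bucket itself
                    f"arn:aws:s3:::{bucket_name}/*",    # the objects in it
                ],
            },
            {"Effect": "Allow", "Action": ["s3:HeadBucket"], "Resource": "*"},
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(build_s3_read_policy("my-example-bucket"))
```

The printed JSON is what you would paste into the JSON tab of the Create Policy page.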
Then you should create an IAM role for Blendo and apply the IAM policy from the previous step. To create the role, you’ll need the Account ID and External ID values provided by Blendo at the Authentication step of creating the Amazon S3 pipeline.
- On the IAM home page, click on “Roles” in the menu on the left side of the page and then click on “Create Role”.
- On the Create Role page:
  - In the “Select type of trusted entity” section, click the “Another AWS account” option.
  - In the Account ID field, paste the Account ID from Blendo. Note: This isn’t your AWS Account ID; it is the Account ID that is displayed in the Blendo app at the Authentication step of creating the S3 pipeline. The Blendo account ID that should always be used is 762227381876.
  - In the Options section, check the Require external ID box.
  - In the External ID field, paste the External ID. Note: the External ID is displayed in the Blendo app at the Authentication step of creating the S3 pipeline.
  - Click on “Next: Permissions”.
- On the Attach permissions page:
  - Search for the policy you created in the previous step (e.g. blendo_s3).
  - Once located, check the box next to it in the table.
  - Click on “Next: Tags”. If you want to enter any tags, do so on the Add tags page. Then click on “Next: Review”.
- On the Review page, in the Role name field, give the role a name (example: blendo_s3_role) and then click on “Create Role”.
- Then you should find the Role you just created and click on it to get to the role Summary page. On this page:
  - Click on the “Trust relationships” tab.
  - Click on “Edit trust relationship”.
  - Select everything currently in the text field and delete it.
  - In the text field, paste the trust relationship [2] and replace YOUR_EXTERNAL_ID with your actual External ID. Note: the External ID is displayed in the Blendo app at the Authentication step of creating the S3 pipeline.
  - Click on “Update Trust Policy”.
- Copy the Role ARN of the new role you created (this should look like: arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/blendo_s3_role)
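The trust relationship from the Annex [2] can be produced the same way, so that the External ID is filled in without editing the JSON by hand. This is a minimal sketch in Python under stated assumptions: the function name build_trust_policy and the sample External ID are hypothetical, and the principal ARN is copied from the Annex.

```python
import json

# Principal taken from the trust policy in the Annex [2].
BLENDO_PRINCIPAL_ARN = "arn:aws:iam::762227381876:user/blendo_user_etl"


def build_trust_policy(external_id: str) -> str:
    """Return the trust policy from the Annex [2] with the External ID filled in."""
    external_id = external_id.strip()
    # Refuse an empty value or the untouched placeholder.
    if not external_id or external_id == "YOUR_EXTERNAL_ID":
        raise ValueError("pass the External ID shown in the Blendo app")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": BLENDO_PRINCIPAL_ARN},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Hypothetical External ID, for illustration only.
    print(build_trust_policy("example-external-id"))
```

The printed JSON is what you would paste into the text field when editing the trust relationship.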
How to create your pipeline
Then you should log in to your Blendo account. In order to create a new pipeline, click on the Amazon S3 icon.
On the first step of the setup, Blendo provides you with the External ID and the Account ID that are needed on the “Create Role” page of your AWS account. You are also asked to fill in the Role ARN and the name of your Bucket in order to authenticate your pipeline. After you have filled them in you should click on “Next”.
NOTE: A new External ID is generated every time you add an account to your Amazon S3 pipeline, so please make sure that the role you create in AWS uses the same External ID that is displayed in the Blendo app.
NOTE: Blendo’s account ID that should always be used is 762227381876.
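Because two different account IDs are involved (Blendo's in the trust relationship, your own in the Role ARN), it can help to check the Role ARN before pasting it into Blendo. The sketch below is only an illustration in Python: check_role_arn is a hypothetical helper, and the pattern assumes the standard "aws" partition and the 12-digit account ID format.

```python
import re

# Blendo's account ID, as stated in this guide.
BLENDO_ACCOUNT_ID = "762227381876"

# arn:aws:iam::<12-digit account ID>:role/<role name, optionally with a path>
ROLE_ARN_RE = re.compile(r"arn:aws:iam::(\d{12}):role/([\w+=,.@/-]+)")


def check_role_arn(role_arn: str) -> str:
    """Return the account ID embedded in a role ARN, or raise ValueError."""
    match = ROLE_ARN_RE.fullmatch(role_arn.strip())
    if match is None:
        raise ValueError("not a role ARN of the form arn:aws:iam::<account>:role/<name>")
    account_id = match.group(1)
    # The role lives in your own AWS account, so Blendo's ID here is a mix-up.
    if account_id == BLENDO_ACCOUNT_ID:
        raise ValueError("the Role ARN must contain your own AWS account ID, not Blendo's")
    return account_id


if __name__ == "__main__":
    # Hypothetical account ID, for illustration only.
    print(check_role_arn("arn:aws:iam::123456789012:role/blendo_s3_role"))
```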
On the second step you should configure the resource of your Amazon S3 pipeline. Every Amazon S3 pipeline has only one resource, so if you want to sync additional resources with a different configuration, you should create a separate Amazon S3 pipeline for each one.
In order to configure the resource of your pipeline, you have to fill in the following fields:
- Resource ID: as every pipeline run syncs only updates, the resource ID is needed for the deduplication of data in your database. In case it is left empty, no deduplication will be performed.
- Resolve Type: you should select either “path” or “pattern”.
- Path/Pattern: the Path is the folder whose files Blendo syncs. Blendo will not access the subfolders of the Path. When you select Pattern, the Path is optional and indicates the subfolders that Blendo will access in order to sync data. The Pattern describes the files that you want to be synced (e.g. if you want to select all files that end with .csv, you should type: (.*).csv ).
- Parser: the type of file that exists in S3, which determines how the data will be parsed into the destination table.
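The difference between the two Resolve Types can be illustrated with a small simulation. This is only a sketch in Python of the behaviour described above, not Blendo's implementation: it assumes that the Pattern is a regular expression matched against the whole file key (relative to the optional Path), and the function names and file keys are hypothetical.

```python
import re


def _prefix(path: str) -> str:
    """Normalise a folder path to a key prefix ('' means the bucket root)."""
    path = path.strip("/")
    return path + "/" if path else ""


def select_by_path(keys, path):
    """Keys directly inside `path`; anything in a subfolder is skipped."""
    prefix = _prefix(path)
    return [
        k for k in keys
        if k.startswith(prefix) and k != prefix and "/" not in k[len(prefix):]
    ]


def select_by_pattern(keys, pattern, path=""):
    """Keys under the optional `path`, subfolders included, that match `pattern`."""
    prefix = _prefix(path)
    regex = re.compile(pattern)
    return [
        k for k in keys
        if k.startswith(prefix) and regex.fullmatch(k[len(prefix):])
    ]


if __name__ == "__main__":
    keys = ["exports/a.csv", "exports/b.json", "exports/2020/c.csv", "other/d.csv"]
    print(select_by_path(keys, "exports"))
    print(select_by_pattern(keys, r"(.*).csv", "exports"))
    print(select_by_pattern(keys, r"(.*).csv"))
```

Under these assumptions, the "path" selection returns only the files directly in exports, while the "pattern" selection also reaches exports/2020/c.csv. If the Pattern is indeed treated as a regular expression, note that an unescaped dot matches any character, so (.*)\.csv would be the stricter way to require a literal ".csv" ending.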
On the next step you will be asked whether you want to add a prefix to the name of the table that will be created in your database. If you leave it blank, the table will use the default name (“document”). You are also asked to select the database schema in which the table will be created. After you have filled them in you should click on “Next”.
Your pipeline has now been created. If you want to initiate the sync, you should click on the “Sync now” button on the right or wait for the sync to start at the time indicated.
Annex
[1] IAM policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:List*", "s3:Get*"],
      "Resource": [
        "arn:aws:s3:::YOUR_S3_BUCKET_NAME",
        "arn:aws:s3:::YOUR_S3_BUCKET_NAME/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:HeadBucket"],
      "Resource": "*"
    }
  ]
}
[2] The trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::762227381876:user/blendo_user_etl"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "YOUR_EXTERNAL_ID"
        }
      }
    }
  ]
}