Workflows · From Tasks · from_aws_s3

Purpose

Imports data from Amazon S3. If Amazon is configured as your client_cloud, you should probably use the do task instead.

Method of use

Before you can download data, you have to give Workflows access to your Amazon S3 buckets. Follow the steps below:

  1. Have the Amazon Cloud administrator create an IAM account.
  2. Give the administrator the list of buckets and folders you want to access.
  3. Make sure one of the following policies is added:
    1. `AmazonS3ReadOnlyAccess` - Minimum required access. You can read data, but you cannot set `delete_after: yes` to delete source files after downloading them.
    2. `AmazonS3FullAccess` - Required if you want to delete source files after reading them, or if you want to upload files to AWS S3 with the to_aws_s3 task.
  4. Make sure `Programmatic access` is activated.
  5. Send the Access Key ID and Secret Access Key to Onesecondbefore staff. They will add them to Workflows.
  6. You should now be able to download objects from Amazon S3.

Configuration

Example usage

extract:
    conn_id: aws_s3_readonly
    bucket: onesecondbefore-demo
    # Download files in folder `my/folder` with file prefix `part-`
    prefix: my/folder/part-

Properties

| property | type | required | description |
| --- | --- | --- | --- |
| conn_id | string | no | Connection string as handed to you by the Onesecondbefore team. Default is aws_s3. |
| bucket | string | yes | Name of the Amazon S3 bucket. |
| prefix | string | no | Prefix of the objects to download. Default is no prefix (all files in the bucket). For example, `prefix: my/folder/part-` means that only blobs in folder my/folder with a filename that starts with part- are downloaded. |
| delete_after | yesno (boolean) | no | Default is no. Set to yes if you want each blob to be deleted after it has been imported. This action cannot be undone. Refer to the access settings under Method of use to make sure your account has the correct policy for this. |
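
As a sketch of how the optional properties combine, the configuration below extends the example usage with delete_after. It assumes that the connection behind conn_id belongs to an account with the `AmazonS3FullAccess` policy, since read-only access does not allow deleting source files. The bucket and prefix are the same illustrative values as in the example usage.

```yaml
extract:
    # Default connection name; the account behind it must be allowed to delete objects
    conn_id: aws_s3
    bucket: onesecondbefore-demo
    # Only files in folder `my/folder` whose name starts with `part-`
    prefix: my/folder/part-
    # Delete each source file after it has been imported; this cannot be undone
    delete_after: yes
```

Leaving delete_after out, or setting it to no, keeps the source files in the bucket.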

Details

| item | description |
| --- | --- |
| API | Amazon S3 REST API |
| Pre-formatted schema | No. Does not come with a pre-formatted schema. |