Transfer · To Tasks

To tasks handle all data exports from your data lake to a destination. You can configure the items below; click on an item's name for more info.
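
For orientation, a to task pairs an extract section with a load section. A minimal sketch, assuming a `to_aws_s3`-style task; the load keys are illustrative assumptions, since the available load properties depend on the task type:

extract:
    source: database
    query: |
        SELECT email, first_name
        FROM exports.audience
load:
    conn_id: s3                   # hypothetical connection name
    bucket: my-export-bucket      # hypothetical bucket
    prefix: exports/audience/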

Extract

The `extract` part of the configuration defines which data is selected from your data lake to be uploaded to a destination.

Extract types

Below are the extract types. Click on an extract type for an explanation.

extract: database (BigQuery)

Extract data from a BigQuery table.

Example usage

Example of extracting the data for the `to_aws_sqs` task:

extract:
    source: database
    query: |
        SELECT
            'https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue' AS QueueUrl,
            '{"run_id": "{{ task.run_id }}", "task_type": "{{ task.type }}"}' AS MessageBody,
            123 AS DelaySeconds
| property | type | optional | description |
| --- | --- | --- | --- |
| conn_id | string | yes | Name of the connection. If not declared, client_cloud.db_conn_id is used. |
| source | enumerator (database, storage) | yes | Default is database. |
| query | string | no | Use either query or template (below). Query to be executed, whose results will be uploaded to the destination. |
| template | string | no | Use either query (above) or template. Contains a link to a file in the `includes` folder in your repository that contains the SQL statement. This query will be executed and the results will be uploaded to the destination. |
| project_id | string | yes | Project ID of the destination table. If not declared, client_cloud.project_id is used. |
| dataset_id | string | yes | Dataset ID of the destination table. If not declared, client_cloud.dataset_id is used. |
| table_id | string | yes | Table ID of the destination table. If not declared, task.id is used. |
| use_legacy_sql | yes/no (boolean) | yes | Default is `no`. Whether legacy SQL should be used. |
| params | object | yes | Parameters that can be set. Useful for templating. |
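
To illustrate `template` and `params` together, a minimal sketch; the file path, parameter names, and the `{{ params.queue_url }}` reference syntax are assumptions for illustration, modelled on the `{{ task.run_id }}` templating shown above:

extract:
    source: database
    template: includes/sqs_messages.sql   # hypothetical file in the includes folder
    params:
        queue_url: https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue
        delay_seconds: 123

Here the file would contain the SELECT statement, referencing the parameters via templating, e.g. `{{ params.queue_url }}`.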

extract: database (Snowflake)

Extract data from a Snowflake table.

Example usage

Example of extracting the data for the `to_aws_sqs` task:

extract:
    source: database
    query: |
        SELECT
            'https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue' AS QueueUrl,
            '{"run_id": "{{ task.run_id }}", "task_type": "{{ task.type }}"}' AS MessageBody,
            123 AS DelaySeconds
| property | type | optional | description |
| --- | --- | --- | --- |
| source | enumerator (database, storage) | yes | Default is database. |
| conn_id | string | yes | Name of the connection. If not declared, client_cloud.db_conn_id is used. |
| query | string | no | Use either query or template (below). Query to be executed, whose results will be uploaded to the destination. |
| template | string | no | Use either query (above) or template. Contains a link to a file in the `includes` folder in your repository that contains the SQL statement. This query will be executed and the results will be uploaded to the destination. |
| database | string | yes | Database of the destination table. If not declared, client_cloud.database is used. |
| schema | string | yes | Schema of the destination table. If not declared, client_cloud.schema is used. |
| table | string | yes | Table of the destination table. If not declared, task.id.upper() is used. |
| params | object | yes | Parameters that can be set. Useful for templating. |
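
A minimal sketch that overrides the connection defaults; the connection, database, and schema names are hypothetical:

extract:
    source: database
    conn_id: snowflake_marketing   # hypothetical connection name
    database: ANALYTICS            # hypothetical database
    schema: EXPORTS                # hypothetical schema
    query: |
        SELECT EMAIL FROM AUDIENCE_MEMBERS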

extract: storage

Extracts data files from Amazon S3, Google Cloud Storage or Azure Blob Storage (in beta).

Example usage

Example of extracting the data files with prefix `s3://my-test-bucket/some/folder/part-`:

extract:
    source: storage
    conn_id: s3
    bucket: my-test-bucket
    prefix: some/folder/part-

| property | type | optional | description |
| --- | --- | --- | --- |
| source | enumerator (database, storage) | no | Set to storage. |
| conn_id | string | yes | Name of the connection. If not declared, client_cloud.storage_conn_id is used. |
| bucket | string | yes | Name of the bucket. If not declared, client_cloud.bucket is used. |
| prefix | string | yes | Prefix of the file(s). |
| project_id | string | yes | Google Cloud only. Project ID of the bucket. If not declared, client_cloud.project_id is used. |
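
For a Google Cloud Storage bucket, the same extract shape would additionally set project_id; the connection and project names here are hypothetical:

extract:
    source: storage
    conn_id: gcs                  # hypothetical connection name
    bucket: my-test-bucket
    project_id: my-gcp-project    # hypothetical project
    prefix: some/folder/part-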

Load

The `load` part of the configuration defines how the data is loaded into external destinations. The vocabulary of the destination is leading in this part: where Transfer normally talks about fields and records, it will use columns and rows here if the data destination uses those terms.

We identify two types of data destinations: management and data uploads. Management tasks sync a table with a destination, for example syncing a table with all DoubleClick display ads, including text and bidding amounts, to DV360. Data upload tasks upload data to an endpoint, for example uploading a file to an external FTP server or uploading audiences to Facebook.

Transfer gets new tasks every month. Currently, Transfer supports the following to tasks:

  1. to_aws_s3
  2. to_aws_sqs
  3. to_dcm
  4. to_doubleclick_offline_conversions_v2
  5. to_facebook
  6. to_facebook_custom_audience
  7. to_facebook_offline_conversions_v2
  8. to_google_analytics_data_import
  9. to_google_analytics_management
  10. to_google_measurement_protocol_v3
  11. to_storage
  12. to_ftp
  13. to_xandr
  14. to_xandr_server_side_segmentation