Tasks · Overview

Tasks are the individual data tasks that make up a job. Onesecondbefore divides the world into three parts: from, do and to.

  • From tasks
    `From` tasks handle all data imports. Onesecondbefore can connect to any system as long as it supports data export. If your system is not in our current list, contact our support team to request the data import. All `from` tasks perform an extract (from the data source), transform (manipulate the incoming data if needed), load (to your data lake) and validate (the result). `From` tasks come with pre-defined and described schemas (field comments in the table) where possible. `From` tasks also handle the deduplication of the target table, to make sure that you don't import the same data twice.
  • Do tasks
    `Do` tasks handle all intra-data-lake tasks. Typical use cases are loading files from storage into a table or running a query and saving the results in a table. `Do` tasks also include process flow tasks, such as do_zilch or do_continue.
  • To tasks
    `To` tasks handle all data uploads to data destinations. We identify two categories: management and data uploads. Management tasks sync an online spreadsheet (e.g. with audience or campaign information like max click cost) with a marketing platform. This is especially useful for mass updates: you can use a single source (your data lake) and manage all your marketing platforms at the press of a button. Data upload tasks upload data to external systems. Examples are uploading a file to an external FTP server (to_ftp) or uploading audiences to Facebook or DoubleClick.
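
For example, a `to` task that uploads a file to an external FTP server could be configured as follows. This is a hedged sketch: to_ftp is a real task type, but the id value is hypothetical and destination-specific settings (such as FTP connection details) are not shown here.

task:
    type: to_ftp
    id: export_audiences
    start_date: yesterday
    end_date: today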

Task

The `task` part of the configuration contains task-related settings. It can be configured for all task types.

Example usage

task:
    type: from_google_analytics_v3
    start_date: yesterday -3 days
    end_date: today
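
The start_date and end_date also accept absolute dates. A hedged variant of the example above with fixed dates (the specific dates are made up for illustration):

task:
    type: from_google_analytics_v3
    start_date: 2023-01-01
    end_date: 2023-01-31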

Properties

Each property is listed as: name (type, required/optional/read-only), followed by its description.

  • type (enumerator, required)
    Contains the type of task. Must be one of:
  1. from_appfigures
  2. from_aws_s3
  3. from_bigquery
  4. from_bing
  5. from_dcm
  6. from_dcbm
  7. from_facebook
  8. from_google_ads
  9. from_google_analytics_v3
  10. from_google_analytics_management
  11. from_google_drive
  12. from_google_search_console
  13. from_imap
  14. from_imap_message_counter
  15. from_looker
  16. from_url
  17. from_salesforce
  18. from_snowflake
  19. from_ftp
  20. from_xandr

The items below are discussed in more detail in the Do section.

  1. do
  2. do_continue
  3. do_google_cloud
  4. do_capture_profiles
  5. do_capture_sessionization_v5
  6. do_zilch

The items below are discussed in more detail in the To section.

  1. to_aws_s3
  2. to_aws_sqs
  3. to_dcm
  4. to_doubleclick_offline_conversions_v2
  5. to_facebook
  6. to_facebook_custom_audience
  7. to_facebook_offline_conversions_v2
  8. to_google_analytics_data_import
  9. to_google_analytics_management
  10. to_google_measurement_protocol_v3
  11. to_storage
  12. to_ftp
  13. to_xandr
  14. to_xandr_server_side_segmentation
  • id (string, optional)
    Unique name for the task. Default value is the filename without extension.
  • trigger_date (string, read-only)
    Timestamp when the job (not the task) was triggered, in the local timezone. Useful when deduplicating tables and in SQL templates.
  • run_id (string, read-only)
    Unique id per run. Every time a task runs, it receives a unique 8-character alphanumeric string.
  • tmp_dir (string, read-only)
    Temporary folder on the worker machine where data is stored during its lifetime. Once the task is done, the worker and all data on it are irreversibly deleted.
  • start_date (string, date or date & time, optional)
    Start date of the period that will be selected in the data source. Can be filled with an absolute or relative date. Read more about relative date and time here.
  • end_date (string, date or date & time, optional)
    End date of the period that will be selected in the data source. Can be filled with an absolute or relative date. Read more about relative date and time here.
  • loop_by (enumerator: year, month, week, day, hour, minute, second, file, list; optional)
    Loops the task depending on the enumerator value. With year, month, week, day, hour, minute or second, the loop adds an equal time frame to the start_date until the end_date is reached. With list, the loop cycles through the values in loop_list. With file, the loop cycles through each file on a data source. This is especially useful when downloading large data files in many different chunks.
  • loop_list (array, optional)
    Contains a list of values to loop through.
  • loop_value (string or int, read-only)
    Contains the actual value of the loop when loop_by is used. Automatically set by Transfer.
  • loop_index (int, read-only)
    Contains the number of the current loop, starting at 0. Use in combination with loop_by.
  • loop_start_date (date or datetime, read-only)
    Set to task.start_date. Only available when loop_by=hour or loop_by=day. When used, task.start_date is overwritten with the time frame of the current loop.
  • loop_end_date (date or datetime, read-only)
    Set to task.end_date. Only available when loop_by=hour or loop_by=day. When used, task.end_date is overwritten with the time frame of the current loop.
  • resource_size (enumerator: 0, 1, 2, 4, 8, 16, 32, 64, 128, 256; optional)
    Default is 0. Resource size to use for the task. The number corresponds to the number of CPUs (0 meaning 0.25). The memory of the instance is 8 × resource_size GiB. E.g. a resource_size of 16 means 16 CPUs with 8 × 16 = 128 GiB of memory.
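
To illustrate the loop and sizing properties together, a hedged example that runs a task once per day over the selected period on a slightly larger instance (the task type is real; the period and sizing choice are arbitrary illustrations):

task:
    type: from_google_analytics_v3
    start_date: yesterday -7 days
    end_date: today
    loop_by: day
    resource_size: 4

With resource_size: 4 the task gets 4 CPUs and 8 × 4 = 32 GiB of memory. With loop_by: day, each iteration's loop_start_date and loop_end_date overwrite task.start_date and task.end_date with the time frame of that day's loop.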