Tasks · Overview

Tasks are the individual data tasks that make up a job. Onesecondbefore divides the world into three parts: from, do and to.

  • From tasks
    `From` tasks handle all data imports. Onesecondbefore can connect to any system as long as it supports data export. If your system is not in our current list, contact our support team to request the data import. All `from` tasks perform an extract (from the data source), transform (manipulate the incoming data if needed), load (to your data lake) and validate (the result). `From` tasks come with pre-defined and described schemas (field comments in the table) where possible. `From` tasks also handle the deduplication of the target table, to make sure that you don't import the same data twice.
  • Do tasks
    `Do` tasks handle all intra-data-lake tasks. Typical use cases are loading files from storage into a table or running a query and saving the results in a table. `Do` tasks also include process flow tasks, such as do_zilch or do_continue.
  • To tasks
    `To` tasks handle all data uploads to data destinations. We identify two categories: management and data uploads. Management tasks sync an online spreadsheet (e.g. with audience or campaign information like max click cost) with a marketing platform. This is especially useful for mass updates: you can use a single source (your data lake) and manage all your marketing platforms at the press of a button. Data upload tasks upload data to external systems. Examples are uploading a file to an external FTP server (to_ftp) or uploading audiences to Facebook or DoubleClick.
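
For example, a `to` task that uploads a file to an external FTP server could be configured as follows. This is a hedged sketch: to_ftp is a real task type, but the id value is hypothetical and destination-specific settings (such as FTP connection details) are not shown here.

task:
    type: to_ftp
    id: export_audiences
    start_date: yesterday
    end_date: today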

Task

The `task` part of the configuration contains task-related settings. It can be configured for all task types.

Example usage

task:
    type: from_google_analytics_v3
    start_date: yesterday -3 days
    end_date: today
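
The start_date and end_date also accept absolute dates. A hedged variant of the example above with fixed dates (the specific dates are made up for illustration):

task:
    type: from_google_analytics_v3
    start_date: 2023-01-01
    end_date: 2023-01-31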

Properties

Each property is listed as: name (type, required/optional/read-only), followed by its description.

  • type (enumerator, required)
    Contains the type of task. Must be one of:
  1. from_appfigures
  2. from_aws_s3
  3. from_bigquery
  4. from_bing
  5. from_dcm
  6. from_dcbm
  7. from_facebook
  8. from_google_ads
  9. from_google_analytics_v3
  10. from_google_analytics_management
  11. from_google_drive
  12. from_google_search_console
  13. from_imap
  14. from_imap_message_counter
  15. from_looker
  16. from_url
  17. from_salesforce
  18. from_snowflake
  19. from_ftp
  20. from_xandr

The items below are discussed in more detail in the Do section.

  1. do
  2. do_continue
  3. do_google_cloud
  4. do_capture_profiles
  5. do_capture_sessionization_v5
  6. do_zilch

The items below are discussed in more detail in the To section.

  1. to_aws_s3
  2. to_aws_sqs
  3. to_dcm
  4. to_doubleclick_offline_conversions_v2
  5. to_facebook
  6. to_facebook_custom_audience
  7. to_facebook_offline_conversions_v2
  8. to_google_analytics_data_import
  9. to_google_analytics_management
  10. to_google_measurement_protocol_v3
  11. to_storage
  12. to_ftp
  13. to_xandr
  14. to_xandr_server_side_segmentation
  • id (string, optional)
    Unique name for the task. Default value is the filename without extension.
  • trigger_date (string, read-only)
    Timestamp when the job (not the task) was triggered, in the local timezone. Useful when deduplicating tables and in SQL templates.
  • run_id (string, read-only)
    Unique id per run. Every time a task runs, it receives a unique 8-character alphanumeric string.
  • tmp_dir (string, read-only)
    Temporary folder on the worker machine where data is stored during its lifetime. Once the task is done, the worker and all data on it are irreversibly deleted.
  • start_date (string, date or date & time, optional)
    Start date of the period that will be selected in the data source. Can be filled with an absolute or relative date. Read more about relative date and time here.
  • end_date (string, date or date & time, optional)
    End date of the period that will be selected in the data source. Can be filled with an absolute or relative date. Read more about relative date and time here.
  • loop_by (enumerator: year, month, week, day, hour, minute, second, file, list; optional)
    Loops the task depending on the enumerator value. With year, month, week, day, hour, minute or second, the loop adds an equal time frame to the start_date until the end_date is reached. With list, the loop cycles through the values in loop_list. With file, the loop cycles through each file on a data source. This is especially useful when downloading large data files in many different chunks.
  • loop_list (array, optional)
    Contains a list of values to loop through.
  • loop_value (string or int, read-only)
    Contains the actual value of the loop when loop_by is used. Automatically set by Transfer.
  • loop_index (int, read-only)
    Contains the number of the current loop, starting at 0. Use in combination with loop_by.
  • loop_start_date (date or datetime, read-only)
    Set to task.start_date. Only available when loop_by=hour or loop_by=day. When used, task.start_date is overwritten with the time frame of the current loop.
  • loop_end_date (date or datetime, read-only)
    Set to task.end_date. Only available when loop_by=hour or loop_by=day. When used, task.end_date is overwritten with the time frame of the current loop.
  • resource_size (enumerator: 0, 1, 2, 4, 8, 16, 32, 64, 128, 256; optional)
    Default is 0. Resource size to use for the task. The number corresponds to the number of CPUs (0 meaning 0.25). The memory of the instance is 8 × resource_size GiB. E.g. a resource_size of 16 means 16 CPUs with 8 × 16 = 128 GiB of memory.
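
To illustrate the loop and sizing properties together, a hedged example that runs a task once per day over the selected period on a slightly larger instance (the task type is real; the period and sizing choice are arbitrary illustrations):

task:
    type: from_google_analytics_v3
    start_date: yesterday -7 days
    end_date: today
    loop_by: day
    resource_size: 4

With resource_size: 4 the task gets 4 CPUs and 8 × 4 = 32 GiB of memory. With loop_by: day, each iteration's loop_start_date and loop_end_date overwrite task.start_date and task.end_date with the time frame of that day's loop.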