Workflows · Do Tasks · do_views

The task with type do_views, is the task that handles view management. It will look for a directory in your repository with the name: views/. Please save all view files in the following way:

  • The file name must be DATABASE OR PROJECT_ID/SCHEMA OR DATASET_ID/VIEWNAME.sql
  • Only add the select statement of the view (everything after CREATE VIEW AS...)
  • Use extra view parameters to control behaviour like materialized views with cluster and partitions
  • You can make use of the include statement. The root of the statement will be the templates directory in your repository.

View creation process

The views are created with the following procedure:

  1. Workflows creates a list of all views that have to be created, updated or deleted. Why a view is appended to the list depends on the task configuration.
  2. The mapping parameter is used to replace the database folder name with the destination database/project id. This is needed meant to be able to create views in the production and sandbox databases.
  3. If the view should be deleted, it is deleted immediately.
  4. If the view should be created or updated, it is first created under a temporary name (and looks like _tmp_f7f87eh) in the destination database/project and schema/dataset, to check if it contains any errors. If there was an error creating the view, this will be shown in the logs and the view will be retried in a next iteration, since it might depend on a change in any of the other views in the same update batch.
  5. If there was no error in the previous step, the view will be created or replace the existing.
  6. When done with all views, Workflows will try to create the views that generated errors in the previous steps, since the error might be due to the fact that the view depends on other views that were not yet created.
  7. The task is finished when all views have been processed or the amount of views with errors equals the amount of views that were in that iteration

View configuration variables

You can set certain variables in your view that impact view creation behavior. These are used to create e.g. materialized views and set the clustering. Please refer to the Jinja2 documentation on how to set variables. The standard set of variables from the configuration YAML that you can also use in your templates can be used in the views. These are: transfer, job, task, load and client_cloud.

variabletypedatabasedescription
osb_materializedbooleanSnowflake and BigQueryDon't set or set to false for a standard view. Set to true if the view must be materialized.
osb_partition_bystringBigQueryOnly use if osb_materialized=true. Set to what you would normally set after PARTITION BY.
osb_cluster_bystringSnowflake and BigQueryOnly use if osb_materialized=true. Set to what you would normally set after CLUSTER BY.
osb_schedule_labelstringSnowflake and BigQueryLabel should match the deduplication.label in the configuration. This way Workflows knows which views to replace. Handy if you need to have your views updated for example at different moments during the day.

Example usage

{%- set osb_schedule_label = 'daily' -%}
{%- set osb_materialized = true -%}
{%- set osb_partition_by = 'DATE(dstamp)' -%}
{%- set osb_cluster_by = 'site_id' -%}

SELECT TIMESTAMP_TRUNC(tstamp, DAY) AS dstamp, site_id, hit_type, COUNT(1) AS hits
FROM `my-production-project.osb_analytics.hits`
WHERE DATE(tstamp) >= DATE('{{ task.trigger_date.strftime('%Y-%m-%d') }}')-14
    AND DATE(tstamp) < DATE('{{ task.trigger_date.strftime('%Y-%m-%d') }}')
GROUP BY 1,2,3

Root properties

The properties below can be used at root level.

Properties: task

The following properties can be set under the task parameter.

propertytyperequireddescription
typeenumerator(do_views)yesContains the task type. Must be `do_views`.

Example usage

task:
  type: do_views

Properties: deduplicate

The following properties can be set under the deduplicate parameter.

propertytyperequireddescription
typeenumerator(replace, replace_always, replace_scheduled)yesreplace = Replace view when a new version is checked in the repository. This type of deduplication is usually used after a deploy task.
always_replace = Always replaces all views. This type of deduplication is usually used in Sandbox mode.
schedule_replace = Use this when you want to have views replaced on e.g. a daily schedule. To make it work, you should add the osb_schedule_label view configuration variable to every view you want to have Workflows replaced on a schedule. You should also make sure that the value of the view configuration variable osb_schedule_label is the same as deduplication.label. This way you can have different tasks at different schedules updating different views. This type of deduplication is usually used for views that contain date filters that have to be updated on a daily basis.
labelstringnoOnly needed when deduplication.type=schedule_replace. Contains the label that the view configuration property have to be set to, in the view template.

Example usage
deduplicate:
    type: schedule_replace
    label: daily

Properties: load

The following properties can be set under the load parameter.

propertytyperequireddescription
mappingobjectyesContains the mappings from the folder in the views directory to the database (in Snowflake) or project id (in BigQuery). We recommend you to keep the local folder name the same as the destination database/project in production.

Example usage

load:
  mapping:
    onesecondbefore-gcloud: onesecondbefore-gcloud-sandbox

Details

Link to create view documentation
BigQuery documentation
Snowflake documentation