The template language is based on a well-known template engine to flatten your learning curve. The most important difference is that the Workflows template engine is extending the base template engine to make it more suitable for SQL templating. The most important items of the template engine are on this page, to make sure you cover the most important topics. If you want to read more about a specific item, you can click on the link underneath the particular section for external reading.
There are a few kinds of delimiters. The delimiters are as follows:
{% ... %}
for Statements{{ ... }}
for Expressions to print to the template output{# ... #}
for Comments not included in the template outputTemplate variables are defined by the context dictionary passed to the template. You have the following config file properties available in your template:
The example below shows how you can use configuration variables in a SQL template.
SELECT {{ deduplicate.field }}, COUNT(1) AS records
FROM `{{ load.table }}`
WHERE date_field BETWEEN DATE('{{ task.start_date.format('%Y-%m-%d') }}')
AND DATE('{{ task.end_date.format('%Y-%m-%d') }}')
GROUP BY 1
Variables can be modified by filters. Filters are separated from the variable by a pipe symbol (|) and may have optional arguments in parentheses. Multiple filters can be chained. The output of one filter is applied to the next.
For example, {{ name|striptags|title }}
will remove all HTML Tags from variable name and title-case the output (title(striptags(name)))
.
Filters that accept arguments have parentheses around the arguments, just like a function call. For example: {{ listx|join(', ') }}
will join a list with commas (str.join(', ', listx)
).
Here is a list of all built-in filters.
Beside filters, there are also so-called “tests” available. Tests can be used to test a variable against a common expression. To test a variable or expression, you add is plus the name of the test after the variable. For example, to find out if a variable is defined, you can do name is defined, which will then return true or false depending on whether name is defined in the current template context.
Tests can accept arguments, too. If the test only takes one argument, you can leave out the parentheses. For example, the following two expressions do the same thing:
{% if loop.index is divisibleby 3 %}
{% if loop.index is divisibleby(3) %}
To comment-out part of a line in a template, use the comment syntax which is by default set to {# ... #}. This is useful to comment out parts of the template for debugging or to add information for other template designers or yourself:
{# note: commented-out template because we no longer use this
{% for field in deduplicate.keys %}
...
{% endfor %}
#}
This template, which we’ll call base-unload.sql. A common use-case in Redshift. You can download data from Redshift with the from_redshift task. The UNLOAD statement is always the same, regardless of the query you put in. You base template would look like this:
UNLOAD ('{{ self.query()|replace("'", "\\'") }}')
TO '{{ extract.params.location }}'
CREDENTIALS 'aws_iam_role=arn:aws:iam::999999999999:role/RedshiftCopyRole'
GZIP
ALLOWOVERWRITE
PARALLEL OFF
DELIMITER ','
;
A child template might look like this:
{% extends 'redshift/unload.sql' %}
{% block query %}
SELECT
DATE(CONVERT_TIMEZONE('UTC', 'Europe/Amsterdam', GETDATE())) AS import_date,
user_id,
COUNT(DISTINCT id) AS hits
FROM osb_analytics.hits
GROUP BY 1
{% endblock %}
The {% extends %}
tag is the key here. It tells the template engine that this template “extends” another template. When the template system evaluates this template, it first locates the parent. The extends tag should be the first tag in the template. Everything before it is printed out normally and may cause confusion. For details about this behavior and how to take advantage of it, see Null-Default Fallback. Also a block will always be filled in regardless of whether the surrounding condition is evaluated to be true or false.
The filename of the template uses the include directory from you repository as root. You can access templates in subdirectories with a slash:
{% extends 'redshift/unload.sql' %}
You can’t define multiple {% block %} tags with the same name in the same template. This limitation exists because a block tag works in “both” directions. That is, a block tag doesn’t just provide a placeholder to fill - it also defines the content that fills the placeholder in the parent. If there were two similarly-named {% block %} tags in a template, that template’s parent wouldn’t know which one of the blocks’ content to use.
{% block query %}
SELECT
DATE(CONVERT_TIMEZONE('UTC', 'Europe/Amsterdam', GETDATE())) AS import_date,
user_id,
COUNT(DISTINCT id) AS hits
FROM osb_analytics.hits
GROUP BY 1
{% endblock %}
A control structure refers to all those things that control the flow of a program - conditionals (i.e. if/elif/else), for-loops, as well as things like macros and blocks. With the default syntax, control structures appear inside {% ... %} blocks.
Loop over each item in a sequence. For example, to display a list of users provided in a variable called users:
SELECT
{% for key in deduplicate.keys %}
{{ key }},
{% endfor %}
FROM some_table
GROUP BY {% for key in deduplicate.keys %}
{{ key }},
{% endfor %}
With deduplicate.keys=[id, segment]
would result in this:
SELECT
id,
segment
FROM some_table
GROUP BY id,
segment
Read here for more information about the for loop.
The if statement is comparable with the Python if statement. In the simplest form, you can use it to test if a variable is defined, not empty and not false:
Well-know use-case of using it in a task with loop_by with 2 brands. When the loop_value is brand1, all data from the first Google Ads account is imported, when the loop_value is brand2, all data from the second Google Ads account is imported
task:
type: from_google_ads
start_date: today
end_date: today
loop_by: list
loop_list: [brand1, brand2]
extract:
conn_id: google_cloud_default
google_ads_conn_id: google_ads
type: query
login_customer_id: 1234567890 # MCC Customer id
customer_id: >
{%- if task.loop_value == 'brand1' -%}
8274829493
{%- elif task.loop_value == 'brand2' -%}
2938474635
{%- endif -%}
The task file for loop_value=brand1 would result in the task file below.
task:
type: from_google_ads
start_date: today
end_date: today
loop_by: list
loop_list: [brand1, brand2]
loop_value: brand1
loop_index: 0
extract:
conn_id: google_cloud_default
google_ads_conn_id: google_ads
type: query
login_customer_id: 1234567890 # MCC Customer id
customer_id: >
{%- if task.loop_value == 'brand1' -%}
8274829493
{%- elif task.loop_value == 'brand2' -%}
2938474635
{%- endif -%}
Read here for more information about the if statement.
Inside code blocks, you can also assign values to variables. Assignments at top level (outside of blocks, macros or loops) are exported from the template like top level macros and can be imported by other templates.
Assignments use the set tag and can have multiple targets. Below is the content
of an example data definitions file data_definitions/transactions.sql.tmpl
.
It resides in the includes
directory of your Github repository.
{% set is_bruto = CASE WHEN transaction = 1 THEN True ELSE False END %}
{% set is_netto = CASE WHEN transaction = 1 AND delivered = 1 THEN True ELSE False END %}
Then import the variables in your SQL templates and use them like in the example below.
{% import 'data_definitions/transactions.sql.tmpl' as c %}
SELECT DATE(order_date) AS order_date
, COUNT(DISTINCT CASE WHEN ({{ c.is_bruto }}) = 1 THEN order_id END) AS orders_bruto
, COUNT(DISTINCT CASE WHEN ({{ c.is_netto }}) = 1 THEN order_id END) AS orders_netto
FROM transactions
GROUP BY 1
Read here for more information about Assignments..