Advanced templates¶
This section covers advanced usage of our datafile system when using the Tinybird CLI. Before reading this page, you should be familiar with query parameters.
Reusing templates¶
When developing multiple use cases, it's very common to want to reuse certain parts or steps of an analysis, such as data filters or similar table operations. We're going to use the following repository for this purpose:
Clone demo
git clone https://github.com/tinybirdco/ecommerce_data_project_advanced.git cd ecommerce_data_project_advanced
File structure
ecommerce_data_project/ datasources/ events.datasource mv_top_per_day.datasource products.datasource fixtures/ events.csv products.csv endpoints/ sales.pipe top_products_between_dates.pipe top_products_last_week.pipe includes/ only_buy_events.incl top_products.incl pipes/ top_product_per_day.pipe
First, let's take a look at the sales.pipe
API Endpoint and the top_product_per_day.pipe
Pipe that materializes to a mv_top_per_day
Data Source. They both make use of the same Node: only_buy_events
:
includes/only_buy_events.incl
NODE only_buy_events SQL > SELECT toDate(timestamp) date, product, joinGet('products_join_by_id', 'color', product) as color, JSONExtractFloat(json, 'price') as price FROM events where action = 'buy'
endpoints/sales.pipes
INCLUDE "../includes/only_buy_events.incl" NODE endpoint DESCRIPTION > return sales for a product with color filter SQL > % select date, sum(price) total_sales from only_buy_events where color in {{Array(colors, 'black')}} group by date
pipes/top_product_per_day.pipe
INCLUDE "../includes/only_buy_events.incl" NODE top_per_day SQL > SELECT date, topKState(10)(product) top_10, sumState(price) total_sales from only_buy_events group by date TYPE materialized DATASOURCE mv_top_per_day ENGINE AggregatingMergeTree ENGINE_SORTING_KEY date
When using INCLUDE files to reuse logic in .datasource files, the extension of the file must be .datasource.incl
. This is used by CLI commands as tb fmt
to identify the type of file and apply the correct formatting.
Include variables¶
Using variables¶
It is possible to include variables in a Node template. The main reason to do that is to have a very similar Node or Nodes that can be reused with slight differences. For instance, in our example, we want to have two API Endpoints to display the 10 top products, each filtered by different date intervals:
includes/top_products.incl
NODE endpoint DESCRIPTION > returns top 10 products for the last week SQL > select date, topKMerge(10)(top_10) as top_10 from top_product_per_day {% if '$DATE_FILTER' = 'last_week' %} where date > today() - interval 7 day {% else %} where date between {{Date(start)}} and {{Date(end)}} {% end %} group by date
endpoints/top_products_last_week.pipe
INCLUDE "../includes/top_products.incl" "DATE_FILTER=last_week"
endpoints/top_products_between_dates.pipe
INCLUDE "../includes/top_products.incl" "DATE_FILTER=between_dates"
As you can see, the variable DATE_FILTER
is sent to the top_products
include, where the variable content is retrieved using the $
prefix with the DATE_FILTER
reference.
It is also possible to assign an array of values to an include variable. To do this, the variable needs to be parsed properly using function templates, as explained in the following section.
Variables vs parameters¶
Note the difference between variables and parameters. Parameters are indeed variables whose value can be changed by the user through the API Endpoint request parameters. Variables only live in the template and can be set when declaring the INCLUDE
or with the set
template syntax:
Using 'set' to declare a variable
{% set my_var = 'default' %}
By default, variables will be interpreted as parameters. In order to prevent variables or private parameters from appearing in the auto-generated API Endpoint documentation, they need to start with _
. Example:
Define private variables
% SELECT date FROM my_table WHERE a > 10 {% if defined(_private_param) %} and b = {{Int32(_private_param)}} {% end %}
This is also needed when using variables in template functions.
Template functions¶
This is the list of the available functions that can be used in a template:
defined¶
defined(param)
: check if a variable is defined
defined function
% SELECT date FROM my_table {% if defined(param) %} WHERE ... {% end %}
column¶
column(name)
: get the column by its name from a variable
column function
% {% set var_1 = 'name' %} SELECT {{column(var_1)}} FROM my_table
columns¶
columns(names)
: get columns by their name from a variable
columns function
% {% set var_1 = 'name,age,address' %} SELECT {{columns(var_1)}} FROM my_table
date_diff_in_seconds¶
date_diff_in_seconds(date_1, date_2, [date_format], [backup_date_format], [none_if_error])
: gets the abs value of the difference in seconds between two datetimes.
date_format
is optional and defaults to '%Y-%m-%d %H:%M:%S'
, so you can pass DateTimes as YYYY-MM-DD hh:mm:ss
when calling the function:
date_diff_in_seconds('2022-12-19 18:42:22', '2022-12-19 19:42:34')
Other formats are supported and need to be explicitly passed, like
date_diff_in_seconds('2022-12-19T18:42:23.521Z', '2022-12-19T18:42:23.531Z', date_format='%Y-%m-%dT%H:%M:%S.%fz')
For questions regarding the format, check python strftime-and-strptime-format-codes.
For a deep dive into timestamps, time zones and Tinybird, see the docs on working with time.
date_diff_in_seconds function
% SELECT date, events {% if date_diff_in_seconds(date_end, date_start, date_format="%Y-%m-%dT%H:%M:%Sz") < 3600 %} FROM my_table_raw {% else %} FROM my_table_hourly_agg {% end %} WHERE date BETWEEN parseDateTimeBestEffort({{String(date_start,'2023-01-11T12:24:04Z')}}) AND parseDateTimeBestEffort({{String(date_end,'2023-01-11T12:24:05Z')}})
backup_date_format
is optional and it allows to specify a secondary format as a backup when the provided date does not match the primary format. This is useful when your default input format is a datetime (2022-12-19 18:42:22
) but you receive a date (2022-12-19
).
date_diff_in_seconds('2022-12-19 18:42:22', '2022-12-19', backup_date_format='%Y-%m-%d')
none_if_error
is optional and defaults to False
. If set to True
, the function will return None
if the provided date does not match any of the provided formats. This is useful to provide an alternate logic in case any of the dates are specified in a different format.
date_diff_in_seconds function using none_if_error
% SELECT * FROM employees {% if date_diff_in_seconds(date_start, date_end, none_if_error=True) is None %} WHERE starting_date BETWEEN now() - interval 4 year AND now() {% else %} WHERE starting_date BETWEEN parseDateTimeBestEffort({{String(date_start, '2023-12-01')}}) AND parseDateTimeBestEffort({{String(date_end, '2023-12-02')}}) {% end %}
date_diff_in_minutes¶
Same behavior as date_diff_in_seconds
with returning the difference in minutes.
date_diff_in_hours¶
Same behavior as date_diff_in_seconds
with returning the difference in hours.
date_diff_in_days¶
date_diff_in_days(date_1, date_2, [date_format])
: gets the absolute value of the difference in seconds between two dates or datetimes.
date_diff_in_days function
% SELECT date FROM my_table {% if date_diff_in_days(date_end, date_start) < 7 %} WHERE ... {% end %}
date_format
is optional and defaults to '%Y-%m-%d
so you can pass DateTimes as YYYY-MM-DD
when calling the function.
As with date_diff_in_seconds
, date_diff_in_minutes
, and date_diff_in_hours
, other date_formats are supported.
split_to_array¶
split_to_array(arr, default, separator=',')
: splits comma separated values into an array
split_to_array function
% SELECT arrayJoin(arrayMap(x -> toInt32(x), {{split_to_array(code, '')}})) as codes FROM my_table
split_to_array with a custom separator function
% SELECT {{split_to_array(String(param, 'hi, how are you|fine thanks'), separator='|')}}
enumerate_with_last¶
enumerate_with_last(arr, default)
: creates an iterable array, returning a boolean value that allows to check if the current element is the last element in the array. It can be used along with the split_to_array
function.
symbol¶
symbol(x, quote)
: get the value of a variable
enumerate_with_last function
% SELECT {% for _last, _x in enumerate_with_last(split_to_array(attr, 'amount')) %} sum({{symbol(_x)}}) as {{symbol(_x)}} {% if not _last %}, {% end %} {% end %} FROM my_table
sql_and¶
sql_and(<column>__<op>=<transform_type_function> [, ...] )
: creates a list of "WHERE" clauses, along with "AND" separated filters, that checks if a field (<column>) is or isn't (<op>) in a list/tuple (<transform_type_function>).
- The parameter <column> is any column in the table.
- <op> is one of:
in
,not_in
,gt
(>),lt
(<),gte
(>=),lte
(<=) - <transform_type_function> is any of the transform type functions (
Array(param, 'Int8')
,String(param)
, etc.). If one parameter is not specified, then the filter is ignored.
sql_and function
% SELECT * FROM my_table WHERE 1 {% if defined(param) or defined(param2_not_in) %} AND {{sql_and( param__in=Array(param, 'Int32', defined=False), param2__not_in=Array(param2_not_in, 'String', defined=False))}} {% end %}
If this is queried with param=1,2
and param2_not_in=ab,bc,cd
, then it translates to:
sql_and function - generated sql
SELECT * FROM my_table WHERE 1 AND param IN [1,2] AND param2 NOT IN ['ab','bc','cd']
If this is queried just with param=1,2
, but param2_not_in
is not specified, then it translates to:
sql_and function - generated sql param missing
SELECT * FROM my_table WHERE 1 AND param IN [1,2]
Transform types functions¶
Boolean(x)
DateTime64(x)
DateTime(x)
Date(x)
Float32(x)
Float64(x)
Int8(x)
Int16(x)
Int32(x)
Int64(x)
Int128(x)
Int256(x)
UInt8(x)
UInt16(x)
UInt32(x)
UInt64(x)
UInt128(x)
UInt256(x)
String(x)
Array(x)