Filters
Introduction to filters
A filter is a type of calculation that can be used as part of filtering expression.
Filters are pushed down automatically by the engine.
Metadata
Filters may include metadata such as their owner, business description, labels. See metadata section for more details.
Building Filters
Filters follow the same structure and rules as calculated attributes.
For example, this filter in orders
entity identifies shipped orders:
orders.status = 'O'
Filters SQL expressions may combine entities of same granularity or lower - see combining entities in calculated attributes.
Using Filters
Filter Dynamic Datasets
Dynamic Datasets can include filters as part of their components.
Every filter in a dynamic dataset is applied (AND
).
Filtering is pushed down - every filter is pushed down to apply as early as possible, resulting in higher performance.
The engine pushes down the filter to the entity where it can be fully computed.
This has performance implications - for example, a filter like
orders.status = 'O' AND customers.nationkey = 23
will be executed on orders
after joining customers
.
In some cases, it is better to separate filters by entity and combine them only in a dynamic dataset or a SQL query. If using two filters as:
orders.status = 'O'
And separately
customers.nationkey = 23
The latter filter will be pushed down to customers
before joining it to orders
, resulting in a better performance.
As calculated attributes
Filters can be used in calculated attributes or metrics expressions. If a filter is used in an expression context, it is automatically converted to a boolean calculated attribute.
You can group by a metric by a filter: use a filter as an ad-hoc attribute in a dynamic dataset
With filtered metrics
Filters can also be used as part of filtered metrics expression, with a FILTER (WHERE ...)
expression.
Filtering Joins
Data can be filtered as a result of joins when applying filters.
For example, if looking for all customers customers.custkey
and apply this filter of shipped orders:
orders.status = 'O'
Then will only get customers which have a shipped order, because orders
filters customers
after the JOIN between them.
Whether a filter on an entity applies to a different entity depends on the relationship:
Join Type between entities A and B | Does a filter on A filters B ? | Does a filter on B filters A ? |
---|---|---|
INNER | Yes | Yes |
LEFT (A LEFT JOIN B) | No | Yes |
RIGHT (A RIGHT JOIN B) | Yes | No |
OUTER | No | No |
The reason is that if the relationship between A
and B
is LEFT - i.e., take all rows in A
regardless of whether they match a row in B
or not, then any filter on B
will not affect the results.
YAML Schema
Every filter is backed by a text file in git that defines it, and keeps history of every change.
The schema for a filter is:
type: filter
entity: [entity name]
name: [filter name]
display_name: [display name]
owner: [owner]
labels: [...]
folder: [folder]
hidden: [True/False/Yes/No]
desciption: |-
[description]
sql: |-
[sql statement]
Fields:
entity
: Name of entity the attribute belongs toname
: Name of attributedisplay_name
,owner
,description
,labels
,folder
,hidden
: Metadatasql
: SQL statement of the filter calculation