Introduction to filters

A filter is a type of calculation that can be used as part of filtering expression.

Filters are pushed down automatically by the engine.

Metadata

Filters may include metadata such as their owner, business description, labels. See metadata section for more details.

Building Filters

Filters follow the same structure and rules as calculated attributes.

For example, this filter in orders entity identifies shipped orders:

orders.status = 'O'

Filters SQL expressions may combine entities of same granularity or lower - see combining entities in calculated attributes.

Using Filters

Filter Dynamic Datasets

Dynamic Datasets can include filters as part of their components.

Every filter in a dynamic dataset is applied (AND).

Filtering is pushed down - every filter is pushed down to apply as early as possible, resulting in higher performance.

The engine pushes down the filter to the entity where it can be fully computed.

This has performance implications - for example, a filter like

orders.status = 'O' AND customers.nationkey = 23

will be executed on orders after joining customers.

In some cases, it is better to separate filters by entity and combine them only in a dynamic dataset or a SQL query. If using two filters as:

orders.status = 'O'

And separately

customers.nationkey = 23

The latter filter will be pushed down to customers before joining it to orders, resulting in a better performance.

As calculated attributes

Filters can be used in calculated attributes or metrics expressions. If a filter is used in an expression context, it is automatically converted to a boolean calculated attribute.

You can group by a metric by a filter: use a filter as an ad-hoc attribute in a dynamic dataset

With filtered metrics

Filters can also be used as part of filtered metrics expression, with a FILTER (WHERE ...) expression.

Filtering Joins

Data can be filtered as a result of joins when applying filters.

For example, if looking for all customers customers.custkey and apply this filter of shipped orders:

orders.status = 'O'

Then will only get customers which have a shipped order, because orders filters customers after the JOIN between them.

Whether a filter on an entity applies to a different entity depends on the relationship:

Join Type between entities A and BDoes a filter on A filters B?Does a filter on B filters A ?
INNERYesYes
LEFT (A LEFT JOIN B)NoYes
RIGHT (A RIGHT JOIN B)YesNo
OUTERNoNo

The reason is that if the relationship between A and B is LEFT - i.e., take all rows in A regardless of whether they match a row in B or not, then any filter on B will not affect the results.

YAML Schema

Every filter is backed by a text file in git that defines it, and keeps history of every change.

The schema for a filter is:

type: filter
entity: [entity name]
name: [filter name]
display_name: [display name]
owner: [owner]
labels: [...]
folder: [folder]
hidden: [True/False/Yes/No]
desciption: |-
  [description]
sql: |-
  [sql statement]

Fields:

  • entity: Name of entity the attribute belongs to
  • name: Name of attribute
  • display_name, owner, description, labels, folder, hidden: Metadata
  • sql: SQL statement of the filter calculation