Filtering and Sampling
Honeydew pushes filters down to data, in order to reduce data scans.Use domain filters to reduce scanned data in ad-hoc exploration
Domain filters limit the data that is accessed by a user query, which improves performance. For example, can add a filter to a domain, that limits data in a fact table to the last week only. Any user query, regardless of the choice of attributes and metrics, will then be limited to the last week’s data. Honeydew engine will push down the filter to reduce data access and improve performance. In some cases of logical data partitioning (for example, having a column indicating a tenant in a multi-tenant environment, or a column indicating data sampling level) a domain source filter can be further applied to improve performance.Enforce performance filters with conditional filtering
Domain filters can be conditional, applying unless a user overrides them. For example, may by default filter to last week only, unless the user asked for more data. To create --
Create an entity
dim_data_cutoff
using Custom SQL with possible cutoff values: - Add a source filter to your domain filtering the data:
When source filters are used, calculations will only see reduced data. If there is a calculation that uses data that is
out of the filtering boundaries (i.e
revenue_ytd
calculating year-to-date revenue) it will return wrong values.Use domain filters to sample data
Domain source filters can be used to implement data sampling:- In a large fact table to be sampled, create a column with sampling level in the fact table. Filtering on that field will allow to select a data sample - for example, 0.1% of events will have the value “0.1”. Related events are commonly sampled together (i.e. events from the same user/session).
- Add the sampling level column to Snowflake clustering settings of the fact table.
- Create a “sampled data domain” with a source filter on the sampling level column.
Use parameters to reduced scanned data in dynamic datasets deployed as VIEWs
Dynamic Datasets can be deployed as a Snowflake VIEW. When they are deployed as a VIEW, all metrics and attributes are part of VIEW query, so improvements such as reducing JOINs or pushing down filters are not possible. However, performance of VIEWs can be increased by combining parameters and filters. If a dynamic dataset is defined with a parameterized filter, any access to the view can be controlled with the view parameter. For example, if a view with the following filterorders.o_orderdate = $date_to_explore
was defined, then can do: