Performance Optimization
Overview
Performance is a concern in a few scenarios:
- Complex multi-stage computations
- Large data volumes for dashboard aggregations
- Ad-hoc exploration over large datasets
Honeydew automatically optimizes any query it processes (see below). In addition, depending on the use case, users can leverage:
- Filtering and sampling to reduce data scanning in ad-hoc workloads
- Aggregate aware caching to reduce computation with dynamic dashboards
- Dataset materialization when the dataset can be fully pre-computed
- Entity Caching to reduce computation with complex calculated attributes
Different approaches benefit different use cases.
Workload | Queries | Performance approach |
---|---|---|
Ad-hoc data exploration | Unpredictable | Filtering and Sampling |
Dynamic dashboards and BI | Predictable | Aggregate Aware Caching |
Datasets for data science or 3rd party integration | Fully known | Materialization |
Massive data transformation (ELT in Honeydew) | Batch | Entity Caching |
Snowflake Source Tables
Honeydew does not control the Snowflake configuration of source data. It is advised to follow Snowflake’s best practices for clustering and table structure, as they will benefit any query generated by Honeydew.
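For example, clustering a large fact table by the column most often used in filters helps any query Honeydew generates against it. A minimal sketch, using Snowflake's `CLUSTER BY` clause (the table and column names here are illustrative, not from Honeydew):

```sql
-- Cluster a large source table by the column most commonly filtered on
-- (here, a date column), so that filter pushdown prunes micro-partitions.
ALTER TABLE analytics.public.orders CLUSTER BY (order_date);
```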
Automatic Query Optimizations
All Honeydew generated queries are optimized for high performance and low cost.
Push down filters
If there are filters in the query, they will be pushed down to the data, limiting the amount of data scanned. See order of filtering for more details.
Filter pushdown works best when it matches the table clustering in Snowflake.
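A sketch of what pushdown looks like in generated SQL (table and column names are illustrative):

```sql
-- Instead of filtering after the join, the filter is applied directly
-- on the scan of the source table, so Snowflake can prune
-- micro-partitions before joining:
SELECT c.region, SUM(o.amount) AS revenue
FROM (
    SELECT *
    FROM orders
    WHERE order_date >= '2024-01-01'   -- pushed down to the source scan
) o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.region;
```

When `orders` is clustered by `order_date`, this pushed-down predicate skips most of the table's micro-partitions.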
Join pruning
After pushing down filters and using caches, the Honeydew engine will join only the tables that are actually needed for a specific user query.
Using a normalized schema coupled with join pruning is often faster than using pre-joined denormalized wide tables.
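An illustrative sketch of the effect (hypothetical tables, not Honeydew output): even if the semantic model relates `customers` to `orders`, a query that touches only customer attributes produces SQL with no join at all.

```sql
-- The query uses only customer attributes, so the related orders table
-- is pruned: no metric or filter in this query requires it.
SELECT region, COUNT(*) AS customer_count
FROM customers
GROUP BY region;
```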
Aggregate aware caching
If there is a valid pre-aggregate cache that matches the specific groups, filters and aggregations used in a query, the Honeydew engine will leverage it instead of computing the aggregation.
Honeydew can detect partial aggregations that leverage roll-up automatically. For example, if there is a pre-aggregated cache for revenue by day, Honeydew can use it to calculate revenue by year.
Pre-aggregate caches help most with workloads that operate on a repeating set of groups and metrics, such as BI dashboards or pivot tables.
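The roll-up case above can be sketched as follows, assuming a hypothetical daily cache table `revenue_by_day(day, revenue)`:

```sql
-- A yearly query is answered from the daily pre-aggregate cache:
-- SUM is additive, so daily totals roll up to yearly totals without
-- touching the raw fact table.
SELECT DATE_TRUNC('year', day) AS year,
       SUM(revenue) AS revenue
FROM revenue_by_day
GROUP BY 1;
```

Non-additive aggregations (such as distinct counts) cannot always be rolled up this way, which is why the cache must match the query's aggregations.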
Entity caching
If an entity calculated attribute is used in a query, and a valid cache exists for the entity, then the cache will be used instead of recomputing the calculated attribute.
Filters, if applicable, are pushed down to the entity cache as well.
Entity calculated attributes that use window functions or aggregations benefit the most from an entity cache.
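As an illustrative sketch (the cache table and column names here are hypothetical): a window-function attribute such as `RANK() OVER (PARTITION BY region ORDER BY lifetime_value DESC)` is expensive to re-evaluate on every query, so queries read the precomputed column from the cache instead.

```sql
-- The ranked attribute was computed once into the entity cache;
-- queries read it directly, and filters push down to the cache too.
SELECT customer_id, region, lifetime_value_rank
FROM customers_cache
WHERE region = 'EMEA';
```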