Performance Optimization

Performance is a concern in few scenarios:

  1. Complex multi-stage computations
  2. Large data amounts for dashboard aggregations
  3. Ad-hoc exploration over large datasets

Honeydew automatically optimizes any query it processes (see below). In addition, depending on the use case, users can leverage:

  1. Filtering and sampling to reduce data scanning in ad-hoc workloads
  2. Aggregate aware caching to reduce computation with dynamic dashboards
  3. Dataset materialization when the dataset can be fully pre-computed
  4. Entity Caching to reduce computation with complex calculated attributes

Different approaches benefit different use cases.

WorkloadQueriesPerformance approach
Ad-hoc data explorationUnpredictableFiltering and Sampling
Dynamic dashboards and BIPredictableAggregate Aware Caching
Datasets for data science or 3rd party integrationFully knownMaterialization
Massive data transformation (ELT in Honeydew)BatchEntity Caching

Snowflake Source Tables

Honeydew does not control the Snowflake configuration of source data. It is advised to follow Snowflake’s best practices for clustering and table structure, as they will benefit any query generated by Honeydew.

Automatic Query Optimizations

All Honeydew generated queries are optimized for high performance and low cost.

Push down filters

If there are filters in the query, they will be pushed down to the data, limiting the amount of data scanned. See order of filtering for more details.

Filter pushdown works best when it matches the table clustering in Snowflake.

Join pruning

After pushing down filters and using caches, Honeydew engine will only join the tables that are actually needed for a specific user query.

Using a normalized schema coupled with join pruning is often faster than using pre-joined denormalized wide tables.

Aggregate aware caching

If a there is a valid pre-aggregate cache that matches the specific groups, filters and aggregations used in a query - Honeydew engine will leverage it instead of computing the aggregation.

Honeydew can detect partial aggregations that leverage roll-up automatically. For example, if there is a pre-aggregated cache for revenue by day, Honeydew can use it to calculate revenue by year.

Pre-aggregate caches help best with workloads than operate on a repeating set of groups and metrics, such as BI dashboards or pivot tables.

Entity caching

If an entity calculated attribute is used in a query, and a valid cache exists for the entity, then the cache will be used instead of recomputing the calculated attribute.

Filters, if applicable, are pushed down to the entity cache as well.

Entity calculated attributes that are using window functions or aggregations benefit the most from an entity cache.