Performance is a concern in a few scenarios:
- Complex multi-stage computations (that might require many JOINs)
- Large data amounts (such as event tables)
- Ad-hoc exploration
Honeydew has best practices to handle all of the above.
Accelerating Data Exploration
Ad-hoc exploration means that the question is not known in advance, so there is no pre-computed answer.
The best practice for faster exploration is to limit data for exploration with filtering or sampling.
Exploration with dynamic datasets deployed with a SQL interface
SQL interface is the most flexible way to do fast data exploration.
Based on user query, Honeydew will:
- Push down any filters so they run close to the source table
- Build only the JOINs required by the user query
- Optimize the query structure for performance
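For example, a query against a deployed dynamic dataset might look as follows (the dataset and column names here are illustrative, not from the source):

```sql
-- The dataset may expose hundreds of attributes and metrics, but this
-- query touches only one attribute and one metric, so Honeydew only
-- computes what the query requires.
SELECT
    customer_region,   -- one attribute (hypothetical name)
    total_order_count  -- one metric (hypothetical name)
FROM honeydew_dataset; -- deployed dynamic dataset (assumed name)
```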
For example, you can build for exploration a dynamic dataset that includes 1,000 attributes and 300 metrics. If a user queries just one metric and one dimension, Honeydew will compute nothing for that query beyond the user's request.
To improve performance further, apply a filter on the dynamic dataset that limits the amount of data.
For example, you can add to the dynamic dataset above a filter that limits data to the last week. Any user query, regardless of the choice of attributes and metrics, will then be limited to the last week. The Honeydew engine will push down the filter to reduce data access and improve performance.
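Conceptually, the generated query might scan the source with the filter applied directly (a sketch with hypothetical table and column names, borrowing the `orders.o_orderdate` naming used later in this page):

```sql
-- The last-week filter defined on the dataset is pushed down to the
-- scan of the source table, so only recent rows are ever read.
SELECT o_custkey, o_totalprice
FROM orders
WHERE o_orderdate >= DATEADD(week, -1, CURRENT_DATE);
```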
Exploration with dynamic datasets deployed as VIEWs
Dynamic Datasets can be deployed as a Snowflake VIEW.
When they are deployed as a VIEW, all metrics and attributes are part of the VIEW query, so improvements such as reducing JOINs or pushing down filters are not possible.
However, you can increase the performance of VIEWs by combining parameters and filters.
If a dynamic dataset is defined with a parameterized filter, any access to the view can be controlled with the view parameter.
For example, if you defined a view that has the following filter:

orders.o_orderdate = $date_to_explore

then you can do:

-- Snowflake query
SET date_to_explore = '1998-01-01'::DATE;
-- Will only process the single chosen date
SELECT * FROM honeydew_view;
Accelerating Entities
Entities can have calculated attributes that are expensive to compute - such as aggregations and JOINs.
By default, every access to a calculated attribute recomputes it, which can be slow.
For example, an attribute in the customer entity that counts each customer's orders requires a JOIN to a large order table and an aggregation over all of it.
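The computation behind such an attribute might look like this (a sketch using hypothetical TPC-H-style table and column names, not taken from the source):

```sql
-- Every access to the calculated attribute implies a JOIN to the large
-- orders table plus an aggregation over it, which is slow to recompute.
SELECT
    c.c_custkey,
    COUNT(o.o_orderkey) AS order_count
FROM customer c
LEFT JOIN orders o
    ON o.o_custkey = c.c_custkey
GROUP BY c.c_custkey;
```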
To improve the performance of such attributes, Honeydew can cache the computed entities in Snowflake and leverage the cached values when applicable.
Entity Materialization is in Preview. Contact us to enable.
Accelerating Dynamic Datasets
Dynamic Datasets that are deployed to Snowflake can be materialized as a TABLE or a DYNAMIC TABLE.
This is important when the dataset processes a large amount of data or involves complex multi-stage computations.
See materialization for more details.
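As a rough illustration, materializing as a Snowflake DYNAMIC TABLE could look like the following (object names, warehouse, and lag are assumptions; the Honeydew-managed flow is described in the materialization docs):

```sql
-- Materialize the dataset's query as a DYNAMIC TABLE that Snowflake
-- refreshes automatically within the target lag.
CREATE OR REPLACE DYNAMIC TABLE honeydew_dataset_mat
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_wh
AS
  SELECT * FROM honeydew_view;
```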