Performance is a concern in a few scenarios:
- Complex multi-stage computations (that might require many JOINs)
- Large data amounts (such as event tables)
- Ad-hoc exploration
Honeydew has best practices to handle all of the above.
Accelerating Data Exploration
Ad-hoc exploration means that the question is not known in advance, so there is no pre-computed answer.
The best practice for faster exploration is to limit data for exploration with filtering or sampling.
Exploration with dynamic datasets deployed with a SQL interface
SQL interface is the most flexible way to do fast data exploration.
Based on user query, Honeydew will:
- Push down any filters so they run close to the source table
- Build only the JOINs required by the user query
- Optimize the query structure for performance
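For example, a query against a deployed dynamic dataset might look as follows (the dataset and column names here are illustrative, not from the source):

```sql
-- The dataset may expose hundreds of attributes and metrics, but this
-- query touches only one attribute and one metric, so Honeydew only
-- computes what the query requires.
SELECT
    customer_region,   -- one attribute (hypothetical name)
    total_order_count  -- one metric (hypothetical name)
FROM honeydew_dataset; -- deployed dynamic dataset (assumed name)
```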
For example, you can build for exploration a dynamic dataset that includes 1,000 attributes and 300 metrics. If a user queries just one metric and one dimension, Honeydew will compute nothing for that query beyond the user's request.
To improve performance further, apply a filter on the dynamic dataset that limits the amount of data.
For example, you can add to the dynamic dataset above a filter that limits data to the last week. Any user query, regardless of the choice of attributes and metrics, will then be limited to the last week. The Honeydew engine will push down the filter to reduce data access and improve performance.
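Conceptually, the generated query might scan the source with the filter applied directly (a sketch with hypothetical table and column names, borrowing the `orders.o_orderdate` naming used later in this page):

```sql
-- The last-week filter defined on the dataset is pushed down to the
-- scan of the source table, so only recent rows are ever read.
SELECT o_custkey, o_totalprice
FROM orders
WHERE o_orderdate >= DATEADD(week, -1, CURRENT_DATE);
```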
Exploration with dynamic datasets deployed as VIEWs
Dynamic Datasets can be deployed as a Snowflake VIEW.
When they are deployed as a VIEW, all metrics and attributes are part of the VIEW query, so improvements such as reducing JOINs or pushing down filters are not possible.
However, you can increase the performance of VIEWs by combining parameters and filters.
If a dynamic dataset is defined with a parameterized filter, any access to the view can be controlled with the view parameter.
For example, if you defined a view that has the following filter:

orders.o_orderdate = $date_to_explore

then you can do:

-- Snowflake query
SET date_to_explore = '1998-01-01'::DATE;
-- Will only process the single chosen date
SELECT * FROM honeydew_view;
Accelerating Entities
Entities can have calculated attributes that are expensive to compute - such as aggregations and JOINs.
By default, every access to a calculated attribute recomputes it, which can be slow.
For example, an attribute in the customer entity that counts each customer's orders requires a JOIN to a large order table and an aggregation over all of it.
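The computation behind such an attribute might look like this (a sketch using hypothetical TPC-H-style table and column names, not taken from the source):

```sql
-- Every access to the calculated attribute implies a JOIN to the large
-- orders table plus an aggregation over it, which is slow to recompute.
SELECT
    c.c_custkey,
    COUNT(o.o_orderkey) AS order_count
FROM customer c
LEFT JOIN orders o
    ON o.o_custkey = c.c_custkey
GROUP BY c.c_custkey;
```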
To improve the performance of such attributes, Honeydew can cache the computed entities in Snowflake and leverage the cached values when applicable.
Entity Materialization is in Preview. Contact us to enable.
Accelerating Dynamic Datasets
Dynamic Datasets that are deployed to Snowflake can be materialized as a TABLE or a DYNAMIC TABLE.
This is important when the dataset processes a large amount of data or involves complex multi-stage computations.
See materialization for more details.
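As a rough illustration, materializing as a Snowflake DYNAMIC TABLE could look like the following (object names, warehouse, and lag are assumptions; the Honeydew-managed flow is described in the materialization docs):

```sql
-- Materialize the dataset's query as a DYNAMIC TABLE that Snowflake
-- refreshes automatically within the target lag.
CREATE OR REPLACE DYNAMIC TABLE honeydew_dataset_mat
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_wh
AS
  SELECT * FROM honeydew_view;
```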