Entity Caching

Entities may have calculated attributes that are expensive to compute, such as aggregations, JOINs, or large table scans.
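When queried, such an attribute can expand to SQL along these lines (an illustrative sketch; the orders table and its columns are hypothetical):

-- Lifetime revenue per customer: an aggregation over a large fact table,
-- recomputed on every access unless the entity is cached
select o.customer_id,
       sum(o.order_total) as lifetime_revenue
from orders as o
group by o.customer_id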

By default, every access to a calculated attribute recomputes it. However, for most calculated attributes recomputation is not necessary.

The entity cache materializes entities to avoid recomputation. The cache is updated by triggers or on a set schedule.

By employing calculated attributes for multi-step calculations, you can build complex ELT data transformations directly in Honeydew.

The cache is stored as tables in your Snowflake account and is recomputed based on its configuration.

In addition to Snowflake tables, entities can also be cached in supported BI tools that have memory capacity for cached data.

See Power BI data caching for entity caching in Power BI.

When the entity cache is enabled, Honeydew automatically uses the cached data when applicable.

Entity caches are singular and shared across domains and branches.

Configuration

Set up the cache delivery settings for Snowflake in the entity YAML schema:

type: entity
# ... entity configuration
delivery:
  # enable Snowflake as cache
  use_for_cache: snowflake
  snowflake:
    enabled: true
    # snowflake delivery settings (where the entity cache resides)
    name: <name_of_table>
    schema: <name_of_schema>
    target: table/view
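For example, a filled-in configuration might look like this (the table and schema names below are hypothetical):

type: entity
# ... entity configuration
delivery:
  use_for_cache: snowflake
  snowflake:
    enabled: true
    # cache materialized as a table in the HONEYDEW_CACHE schema
    name: CUSTOMERS_CACHE
    schema: HONEYDEW_CACHE
    target: table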

Configuration when using dbt as orchestrator

Set up the cache delivery settings for dbt in the entity YAML schema:

type: entity
# ... entity configuration
delivery:
  # enable dbt materialization as cache
  use_for_cache: dbt
  dbt:
    enabled: true
    # dbt settings (name of dbt model that creates the table in Snowflake)
    dbt_model: name_of_model
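For example, if the cache is built by a dbt model named customers (as in the dbt example later on this page), the delivery settings would be:

type: entity
# ... entity configuration
delivery:
  use_for_cache: dbt
  dbt:
    enabled: true
    # name of the dbt model that materializes this entity's cache
    dbt_model: customers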

Orchestration

Entity Cache refresh relies on external orchestration (with dbt or otherwise).

Set up orchestration in an external tool

Use the Snowflake Native Application API to get the SQL for an entity cache:

select SEMANTIC_LAYER_ENTERPRISE_EDITION.API.GET_SQL_FOR_ENTITY(
        -- workspace & branch
        'workspace_name', 'branch_name',
        -- entity name
        'entity_name'
    );

Create the table using that SQL in Snowflake. Honeydew uses the table update time to detect cache validity.
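For example, the returned SQL can be wrapped in a table creation statement and run on a schedule (a minimal sketch; the fully qualified table name is hypothetical and should match the entity's Snowflake delivery settings):

-- Recreate the cache table; the table update time signals cache freshness to Honeydew
create or replace table ANALYTICS.HONEYDEW_CACHE.CUSTOMERS_CACHE as
    <SQL returned by GET_SQL_FOR_ENTITY>;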

Set up with dbt

To set up dbt as a cache orchestrator:

  1. In dbt, create an entity cache model by using the Honeydew dbt cache macro
  2. In dbt, use the config macro to set up materialization settings such as clustering or dynamic tables
  3. In Honeydew, set the entity’s dbt delivery settings to the chosen dbt model name

For example, this can be the customers model in dbt:

-- Set up materialization parameters for cache
{{ config(materialized='table') }}

-- Set up any additional dependencies in dbt with
-- depends_on: {{ ref('upstream_parent_model') }}

-- Cache for customers entity
{{ honeydew.get_entity_sql('customers') }}

The is_incremental() dbt function may be used in combination with the Honeydew SQL macro for incremental caches, as sketched below. However, make sure to check whether the computation itself has changed between runs, to avoid mixing different versions of the logic in the same table.
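A minimal sketch of an incremental cache model, assuming the macro expands to a single SELECT and that the entity exposes customer_id and updated_at attributes (both hypothetical):

-- Incremental materialization for the customers entity cache
{{ config(materialized='incremental', unique_key='customer_id') }}

with entity as (
    {{ honeydew.get_entity_sql('customers') }}
)

select * from entity
{% if is_incremental() %}
-- only process rows changed since the last run
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}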

See more in the Honeydew dbt documentation.