Skip to main content

Documentation Index

Fetch the complete documentation index at: https://honeydew.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

Pre-aggregated caches are typically rebuilt in full on each refresh. For large aggregates, a full rebuild can be slow or expensive. Incremental updates let you compute only the changed rows and merge them into the existing table.
Read Aggregate Aware Caching first - this page assumes you already have a working aggregate dynamic dataset.

How it works

The incremental pattern uses two dynamic datasets:
DatasetPurpose
Base aggregateThe permanent pre-aggregated table, deployed once in full
Delta aggregateA copy of the base with an extra date filter - deployed on each incremental run
After each delta deployment, you merge or append the delta rows into the base table using a MERGE or INSERT INTO statement.

Setup

1. Create the base aggregate

Create a dynamic dataset with the dimension keys and metrics you want to cache. Enable it as a pre-aggregate cache:
type: perspective
name: preagg_orders
owner: owner@example.com
attributes:
  - date.date
  - location.location_id
  - menu.menu_item_id
metrics:
  - order_detail.count
  - order_detail.order_cost
  - order_detail.revenue
filters: []
delivery:
  use_for_cache: snowflake
  snowflake:
    enabled: true
    name: <name of table>
    database: <name of database>
    schema: <name of schema>
    target: table

2. Create the delta aggregate

Duplicate the base aggregate and add a date filter for the incremental window (e.g. last day or last 7 days). The delta is a staging table - do not set use_for_cache on it.
type: perspective
name: preagg_orders_delta
owner: owner@example.com
attributes:
  - date.date
  - location.location_id
  - menu.menu_item_id
metrics:
  - order_detail.count
  - order_detail.order_cost
  - order_detail.revenue
filters:
  - date.date >= CURRENT_DATE - 1
delivery:
  snowflake:
    enabled: true
    name: <name of table>
    database: <name of database>
    schema: <name of schema>
    target: table
Both tables share the same schema. The delta is a filtered subset of the base: Base aggregate (preagg_orders) - full history:
date.datelocation.location_idmenu.menu_item_idorder_detail.countorder_detail.order_costorder_detail.revenue
2024-01-01101540300.00500.00
2024-01-01102720150.00200.00
2024-01-02101535280.00450.00
2024-01-02102715120.00180.00
Delta (preagg_orders_delta) - yesterday only (date.date >= CURRENT_DATE - 1):
date.datelocation.location_idmenu.menu_item_idorder_detail.countorder_detail.order_costorder_detail.revenue
2024-01-03101550400.00600.00
2024-01-03102725200.00250.00
Base aggregate (preagg_orders) - after incremental run:
date.datelocation.location_idmenu.menu_item_idorder_detail.countorder_detail.order_costorder_detail.revenue
2024-01-01101540300.00500.00
2024-01-01102720150.00200.00
2024-01-02101535280.00450.00
2024-01-02102715120.00180.00
2024-01-03101550400.00600.00
2024-01-03102725200.00250.00
Do not set use_for_cache on the delta aggregate. If the delta is registered as a cache, queries may be served from partial data.

3. Initial deployment

Deploy the base aggregate once to populate the full historical table, using the Honeydew UI or deployment API.

Incremental update strategies

Both phases run entirely inside your orchestrator (Airflow, dbt, or similar):
  1. Deploy the delta - your orchestrator calls the Honeydew API to compute and write the delta dataset to a staging table. See Set up with ETL tools.
  2. Merge or append - your orchestrator then runs the SQL below to move the delta rows into the base aggregate table.
StrategyWhen to use
AppendSource records are immutable after writing - no late-arriving updates
MergeSource records can change after the fact - use a window wide enough to cover all possible late changes

Append - new records only

Use this when source records are immutable after they are written (e.g. event logs where past dates never change). 1. Call the Honeydew API to compute and write the delta dataset for yesterday. 2. Your orchestration tool runs:
INSERT INTO preagg_orders
SELECT * FROM preagg_orders_delta;

Merge - backfill and append

Use this when source records can change after the fact (e.g. orders that are updated for up to 7 days). Widen the delta filter to cover the backfill window, then use MERGE to update existing rows and insert new ones. 1. Update the filter in preagg_orders_delta to date.date >= CURRENT_DATE - 7. 2. Call the Honeydew API to compute and write the delta dataset. 3. Your orchestration tool executes:
-- List all columns explicitly - wildcards are not supported in MERGE statements
MERGE INTO preagg_orders target
USING preagg_orders_delta delta
ON (
    -- Match condition: identify existing rows by their dimension keys
    -- Use IS NOT DISTINCT FROM so NULL keys match correctly
    target."date.date" IS NOT DISTINCT FROM delta."date.date"
    AND target."location.location_id" IS NOT DISTINCT FROM delta."location.location_id"
    AND target."menu.menu_item_id" IS NOT DISTINCT FROM delta."menu.menu_item_id"
)
-- Update metrics only - dimensions are assumed stable and are used as the match key
WHEN MATCHED THEN UPDATE SET
    target."order_detail.count"      = delta."order_detail.count",
    target."order_detail.order_cost" = delta."order_detail.order_cost",
    target."order_detail.revenue"    = delta."order_detail.revenue"
-- Insert the full row (dimensions + metrics) for new dimension combinations
WHEN NOT MATCHED THEN INSERT (
    "date.date",
    "location.location_id",
    "menu.menu_item_id",
    "order_detail.count",
    "order_detail.order_cost",
    "order_detail.revenue"
) VALUES (
    delta."date.date",
    delta."location.location_id",
    delta."menu.menu_item_id",
    delta."order_detail.count",
    delta."order_detail.order_cost",
    delta."order_detail.revenue"
);
Honeydew column names follow the format entity.attribute. Use double quotes in Snowflake ("date.date") and backticks in Databricks (`date.date`). List all columns explicitly in MERGE statements - wildcards are not supported.
Use IS NOT DISTINCT FROM instead of = in the ON clause so that NULL dimension values match correctly across rows.