Intro

A Dynamic Dataset is a way to look at data using Honeydew. A typical Honeydew deployment has many thousands of pieces of logic. An analysis might need just a subset, for example, the orders of a particular customer.

The simplest dataset is “all the data of an entity”.

Dynamic datasets are built:

  1. Attributes
  2. Metrics
  3. Filters

The components can come from fields defined in the semantic layer (attributes and metrics) or defined at ad-hoc field SQL expressions.

Unlike attributes or metrics defined in Entities, ad-hoc calculations defined in a dataset is not reusable by other calculations.

Metadata

Dynamic datasets may include metadata such as their owner, business description, labels. See metadata section for more details.

Git Schema

Every dynamic dataset is backed by a text file in git that defines it, and keeps history of every change. See schema for more details on the underlying representation.

Example

Build a new dynamic dataset, and select from lineitem two fields: l_linestatus (an attribute) and sum_qty (a metric). The sums up item quantity per their status delivery status.

The schema for it is -

attributes:
- lineitem.l_linestatus
metrics:
- lineitem.sum_qty

Interface

Accessing the data

Dynamic datasets can power different delivery modes:

  • Snowflake - with a view, a table or a dynamic table
  • SQL interface - using a JDBC connector for live data access
  • BI connection - combining SQL interface with metadata specific to a BI tool such as Looker or Tableau
  • Python - with the honeydew hdapi python module
  • 3rd party such as dbt model - using honeydew APIs

Filtering Order

Filters defined as part of a dynamic dataset definition will apply on every query.

Other filters (such as those defined in a filtered metric) apply as well with the following order of evaluation:

  1. Source table filters - defined as part of entity source table
  2. Filters of filtered metrics used in calculated attributes
  3. Dynamic Datasets filters
  4. Filters that come from a live SQL interface
  5. Filters of filtered metrics referenced in the dynamic dataset

Parameterized Datasets

Dynamic datasets can be parameterized with Parameters.

Parameters can be specific to a dynamic dataset, or can overrule global parameters.

Materialization in Snowflake

When a dynamic dataset is delivered to Snowflake, all fields are created in the chosen Snowflake objects (a view or a table):

  • Attributes as columns.
  • Metrics as columns. Metrics are aggregated and grouped by default by all the attributes (unless specified otherwise with a GROUP BY in the metric definition).
  • All filters of the dataset apply.

Dynamically changing the level of granularity is not possible after delivery - if need to explore the same dataset at different levels of granularity use the SQL interface.

Views

Datasets can be deployed as a Snowflake VIEW.

Honeydew will maintain the underlying query and will detect if its logic different from the shared source of truth.

Use views for:

  • Build live datasets for sharing
  • Integration with tools that can’t use live JDBC connection
  • Real-time data

Parameterized Datasets can be deployed as VIEWs. Parameters can be controlled with the Snowflake SET SQL command.

Tables

Datasets can be deployed as a Snowflake TABLE.

Honeydew will NOT automatically detect if the logic that built a TABLE has changed.

Use tables for:

  • “Freeze” a dataset for an ad-hoc analysis on it
  • Use cases where a dynamic table is impossible to use

Dynamic Tables

Datasets can be deployed as a Snowflake Dynamic Table.

Honeydew will maintain the underlying query and will detect if its logic different from the shared source of truth.

Following configuration is possible:

  • Target database and schema
  • Virtual data warehouse to use
  • Dynamic table lag units (seconds / minutes / hours / days)
  • Dynamic table lag amount
  • Should it refresh downstream dynamic tables

Use dynamic tables for:

  • Large data or complex queries that require performance
  • Continuous materialization with automatic refresh
Dynamic Tables support is currently in Preview. Contact us to enable.

Live access with a SQL interface

A dynamic dataset is accessible through a JDBC connection for interactive exploration.

  • Attributes in the dataset are accessible as SQL columns.
  • Metrics in the dataset are accessible with the AGG command.
  • All filters of the dataset apply.

In addition a live SQL query may add filters of its own and ad-hoc calculations on top of the dynamic dataset.

When a dataset is accessed through the SQL interface it behaves according to the user query:

  • Group by given attributes selected in GROUP BY, as long as they are using fields defined in the dataset.
  • Aggregate by given metrics (with AGG) or aggregations functions (SUM, COUNT, etc.)
  • Filter by given WHERE and HAVING conditions

As long as the expressions are using fields defined in the dataset.

Dynamic datasets used in live SQL interface can only include named fields (metrics and attributes) coming from the semantic layer.

Ad-hoc attributes or metrics defined in the dynamic dataset are not possible with a SQL interface.

If need ad-hoc attributes or metrics, can add them in the live SQL query.

Example

A dataset called example that include:

  • The attributes lineitem.l_linestatus and orders.o_orderdate
  • The metrics lineitem.sum_qty and lineitem.revenue
  • A filter such as lineitem.l_commitdate < date('1997-01-01')

It allows exploration on two attributes and two metrics, over a filtered set of data.

The following live SQL query will:

SELECT
    "lineitem.l_linestatus",
    AGG("lineitem.revenue") as revenue,
    MIN("orders.o_orderdate") as min_order_date
WHERE
    "orders.o_orderdate" > date('1997-01-01')
FROM dynamic_datasets.example
GROUP BY 1

return a table with 3 columns:

  • lineitem.l_linestatus unique values
  • revenue aggregated named metric grouped by lineitem.l_linestatus
  • min_order_date ad-hoc metric grouped by lineitem.l_linestatus

and will filter the data with two filters:

  • "orders.o_orderdate" > date('1997-01-01') given in the query
  • lineitem.l_commitdate < date('1997-01-01') defined in the dynamic dataset

Was this page helpful?