Intro

A Dynamic Dataset is a way to look at data using Honeydew. A typical Honeydew deployment has many thousands of pieces of logic. An analysis might need just a subset, for example, the orders of a particular customer.

The simplest dataset is “all the data of an entity”.

Dynamic datasets are built:

  1. Attributes
  2. Metrics
  3. Filters

The components can come from fields defined in the semantic layer (attributes and metrics) or defined at ad-hoc field SQL expressions.

Unlike attributes or metrics defined in Entities, ad-hoc calculations defined in a dataset is not reusable by other calculations.

Metadata

Dynamic datasets may include metadata such as their owner, business description, labels. See metadata section for more details.

Example

Build a new dynamic dataset, and select from lineitem two fields: l_linestatus (an attribute) and sum_qty (a metric). The sums up item quantity per their status delivery status.

The schema for it is -

attributes:
- lineitem.l_linestatus
metrics:
- lineitem.sum_qty

Interface

Accessing the data

Dynamic datasets can power different delivery modes:

  • Snowflake - with a view, a table or a dynamic table
  • SQL interface - using a JDBC connector for live data access
  • BI connection - combining SQL interface with metadata specific to a BI tool such as Looker or Tableau
  • Python - with the honeydew hdapi python module
  • 3rd party such as dbt model - using honeydew APIs

Filtering Order

Filters defined as part of a dynamic dataset definition will apply to its query.

For more details, see order of computation

Parameterized Datasets

Dynamic datasets can be parameterized with Parameters.

Parameters can be specific to a dynamic dataset, or can overrule global parameters.

Domains

Dynamic datasets operate with the context of a domain if set and default domain otherwise.

Filters, roles and any other domain configuration will apply to the dynamic dataset.

Materialization in Snowflake

When a dynamic dataset is delivered to Snowflake, all fields are created in the chosen Snowflake objects (a view or a table):

  • Attributes as columns.
  • Metrics as columns. Metrics are aggregated and grouped by default by all the attributes (unless specified otherwise with a GROUP BY in the metric definition).
  • All filters of the dataset apply.

Dynamically changing the level of granularity is not possible after delivery - if need to explore the same dataset at different levels of granularity use the SQL interface.

Views

Datasets can be deployed as a Snowflake VIEW.

Honeydew will maintain the underlying query and will detect if its logic different from the shared source of truth.

Use views for:

  • Build live datasets for sharing
  • Integration with tools that can’t use live JDBC connection
  • Real-time data

Parameterized Datasets can be deployed as VIEWs. Parameters can be controlled with the Snowflake SET SQL command.

Tables

Datasets can be deployed as a Snowflake TABLE.

Honeydew will NOT automatically detect if the logic that built a TABLE has changed.

Use tables for:

  • “Freeze” a dataset for an ad-hoc analysis on it
  • Use cases where a dynamic table is impossible to use

Dynamic Tables

Datasets can be deployed as a Snowflake Dynamic Table.

Honeydew will maintain the underlying query and will detect if its logic different from the shared source of truth.

Following configuration is possible:

  • Target database and schema
  • Virtual data warehouse to use
  • Dynamic table lag units (seconds / minutes / hours / days)
  • Dynamic table lag amount
  • Should it refresh downstream dynamic tables

Use dynamic tables for:

  • Large data or complex queries that require performance
  • Continuous materialization with automatic refresh

Example

A dataset called example that include:

  • The attributes lineitem.l_linestatus and orders.o_orderdate
  • The metrics lineitem.sum_qty and lineitem.revenue
  • A filter such as lineitem.l_commitdate < date('1997-01-01')

It allows exploration on two attributes and two metrics, over a filtered set of data.

The following live SQL query will:

SELECT
    "lineitem.l_linestatus",
    AGG("lineitem.revenue") as revenue,
    MIN("orders.o_orderdate") as min_order_date
WHERE
    "orders.o_orderdate" > date('1997-01-01')
FROM dynamic_datasets.example
GROUP BY 1

return a table with 3 columns:

  • lineitem.l_linestatus unique values
  • revenue aggregated named metric grouped by lineitem.l_linestatus
  • min_order_date ad-hoc metric grouped by lineitem.l_linestatus

and will filter the data with two filters:

  • "orders.o_orderdate" > date('1997-01-01') given in the query
  • lineitem.l_commitdate < date('1997-01-01') defined in the dynamic dataset

YAML Schema

The schema for a dynamic dataset is:

type: perspective
name: [dataset name]
owner: [owner]
domain: [domain name]
description: [description]
labels: [label list]
folder: [folder]
display_name: [display name]
hidden: [yes/no/true/false]
attributes:
  - [attribute expression]
  - ...
metrics:
  - [metric expression]
  - ...
filters:
  - [filter expression]
  - ...
parameters:
  - name: [parameter name]
    value: [parameter value]
  - ...

Fields:

  • name: Name of the dataset
  • owner, description, labels, folder, display_name, hidden (all optional): Metadata
  • domain (optional): Domain in which context to run
  • attributes (optional): List of attribute expressions (can be named attributes or ad-hoc expressions)
  • metrics (optional): List of metric expressions (can be named metrics or ad-hoc expressions)
  • filters (optional): List of filter expressions (can be named filters or ad-hoc expressions)
  • parameters (optional): List of parameters and values (override workspace values if exist)