Honeydew Managed Pipelines

Honeydew is a repository of shared business logic such as standard metrics and entity definitions.

Data Pipeline Architecture

Honeydew integrates with dbt to enable an end-to-end analytics pipeline.

Honeydew enables any user tool to consume standard metrics and entities. When a tool like Tableau sends a query, Honeydew transforms it on the fly into the right set of JOINs and aggregations, based on a shared data model definition.
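
For illustration, here is a hedged sketch of that transformation: the tool's query runs against what looks like a single flat dataset, and Honeydew expands it using the JOIN paths and metric definitions in the model. All names below (orders_analytics, fact_orders, dim_customers, amount) are illustrative assumptions, not actual Honeydew syntax or generated SQL:

```sql
-- What a tool like Tableau might send: a simple aggregate over
-- what appears to be one flat dataset (illustrative names):
SELECT region, SUM(revenue) AS revenue
FROM orders_analytics
GROUP BY region;

-- The kind of SQL Honeydew could produce from the shared model:
-- the JOIN path and the metric's aggregation are derived from the
-- model definition, so every consuming tool gets the same logic.
SELECT c.region,
       SUM(o.amount) AS revenue
FROM fact_orders AS o
JOIN dim_customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.region;
```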

By integrating Honeydew and dbt, you can create a single source of truth, from ingestion through Snowflake tables to BI, based on the same standard definitions.

Pipeline Stages

A typical data pipeline of Honeydew with Snowflake and dbt consists of the following stages:

  1. Ingestion: Bringing raw data into Snowflake, usually with a tool such as Fivetran.

  2. Staging: An incremental, periodic process that transforms raw data into analytics-ready granular data. The following transformations are typically performed at this stage:

    1. Data cleanups, normalization, deduplication
    2. Data merging, change data capture deconstruction
    3. Dimensional modeling, incremental updates

    Typically implemented using dbt or other transformation tools, and executed by an orchestrator (dbt Cloud, Airflow, etc.) on a schedule (e.g., hourly) or on change triggers. A minimal staging model is sketched after this list.

  3. Entity Layer (Data Mart): A periodic process that transforms granular data into a domain data mart with business entities, metrics and aggregated datasets.

    The following transformations are typically performed at this stage:

    1. Business Entities: Extend granular data with business logic properties, such as:
      1. Calculated columns (build an “Age Group” category out of “date of birth”)
      2. Aggregated properties (first order date, total revenue per customer, etc.)
    2. Metric Datasets: Build datasets such as:
      1. Aggregated tables (“Monthly KPIs”)
      2. Denormalized fact tables

    Both are implemented using Honeydew metrics and can be materialized in Snowflake by Honeydew.

    For materialization, Honeydew can leverage your dbt pipeline by creating Honeydew-managed dbt models in the same dbt repository as your staging models.

  4. Metric Layer: On-the-fly transformation of user queries into JOINs and aggregations over entity tables, using standard metrics like “Revenue” or “Active Customer Count”.
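
As a concrete sketch of the staging stage, here is a minimal incremental dbt model. It assumes a raw.orders source with event_id, customer_id, amount, event_time, and loaded_at columns; all names are illustrative, not a prescribed layout:

```sql
-- models/staging/stg_orders.sql: a minimal incremental staging model
-- that normalizes and deduplicates raw order events (illustrative names).
{{ config(materialized='incremental', unique_key='event_id') }}

SELECT
    event_id,
    customer_id,
    CAST(amount AS NUMBER(38, 2))   AS amount,      -- normalize amounts
    TO_TIMESTAMP_NTZ(event_time)    AS order_ts,    -- timestamp conversion
    loaded_at
FROM {{ source('raw', 'orders') }}
{% if is_incremental() %}
-- only process rows newer than what is already in the target table
WHERE loaded_at > (SELECT MAX(loaded_at) FROM {{ this }})
{% endif %}
-- deduplication: keep the latest row per unique order event
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY loaded_at DESC) = 1
```

An orchestrator such as dbt Cloud or Airflow would run this model on its schedule, alongside the rest of the staging layer.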

Example

  1. Ingestion: raw customer tables and raw order events are loaded into Snowflake.
  2. Staging: dbt models that transform those into two tables:
    1. dim_customers: A customer dimension table, with a row per known customer. Preparation in dbt includes clean-up and merging from raw data.
    2. fact_orders: An incremental model, with a row per unique order event. Preparation in dbt includes backfill handling and timestamp conversion.
  3. Data Mart: Honeydew-managed models:
    1. Entity models, such as customers: A customer entity table. Includes additional properties (a SQL sketch of such a model follows this list):
      1. “Age Group” - a CASE WHEN that maps ages to categories
      2. “First Order Date” - an aggregation of order facts that finds the first order date
      3. “Lifetime Revenue” - the revenue standard metric by customer
    2. Dataset models, such as monthly_kpis: An aggregated table (also sketched after this list):
      1. “Order Month” - based on order timestamp
      2. “Monthly Revenue” - the revenue standard metric by order month
      3. “Monthly # Customers” - a count of customers active in a month
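
To make the data mart models concrete, here is a hedged sketch of the SQL that the customers entity model might compile to. Honeydew generates such models from the entity definition; everything below, including table and column names, is an illustrative approximation rather than actual generated SQL:

```sql
-- A sketch of a customers entity model (illustrative names):
SELECT
    c.customer_id,
    c.date_of_birth,
    -- "Age Group": a CASE WHEN that maps ages to categories
    CASE
        WHEN DATEDIFF(year, c.date_of_birth, CURRENT_DATE()) < 25 THEN '18-24'
        WHEN DATEDIFF(year, c.date_of_birth, CURRENT_DATE()) < 45 THEN '25-44'
        ELSE '45+'
    END                    AS age_group,
    -- "First Order Date": an aggregation over order facts
    MIN(o.order_ts)        AS first_order_date,
    -- "Lifetime Revenue": the revenue standard metric by customer
    SUM(o.amount)          AS lifetime_revenue
FROM dim_customers AS c
LEFT JOIN fact_orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.date_of_birth;
```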
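
Similarly, a sketch of what the monthly_kpis dataset model might compile to, reusing the same revenue logic at a different grain (again, names are illustrative):

```sql
-- A sketch of a monthly_kpis dataset model (illustrative names):
SELECT
    DATE_TRUNC('month', o.order_ts)  AS order_month,       -- "Order Month"
    SUM(o.amount)                    AS monthly_revenue,   -- "Monthly Revenue"
    COUNT(DISTINCT o.customer_id)    AS monthly_customers  -- "Monthly # Customers"
FROM fact_orders AS o
GROUP BY DATE_TRUNC('month', o.order_ts);
```

Because both models derive revenue from the same standard metric, the per-customer and per-month numbers stay consistent by construction.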
