This guide helps people who are familiar with building SQL queries or dbt models to understand the concepts of Honeydew.

If you are familiar with semantic layers in BI or are not familiar with SQL, head over to the introduction instead.

Introduction

Honeydew is a standalone semantic layer.

It has many similarities to semantic layers that are embedded in BI tools such as Tableau, Looker or Power BI. However, unlike semantic layers embedded in a particular BI tool, Honeydew can be used by any data tool or SQL client.

Semantics include the following concepts:

  1. How business entities (such as “customer”) are mapped to data sources (which tables to read from)
  2. How data tables are related (which joins to use)
  3. How metrics are calculated (which aggregations to perform)
  4. Metadata (such as descriptions, display names, formatting) associated with presenting those objects to business users.

Those metrics are then used in by BI tools, reports, notebooks, and AI agents for data queries.

Most SQL queries or dbt models for a specific use case would include (1) - (3): join and aggregate the data to a level a BI tool can use.

BI tools with dashboard for that use case would include metadata (4), and some aggregation logic (3) - when aggregating data further for a specific plot or a widget.

For example, consider a Tableau report on sales activity:

  • Most commonly it would load data from a SQL view or a SQL table built for it with aggregated data relevant for the report: daily sales metrics, broken down by many dimensions such as customer segments or geographies.
  • The Tableau workbooks on top of that table would present widgets like daily growth. Those widgets would aggregate data from the table for the widget - i.e. sum the total sales metrics across all customer segments to present total growth.

The query that builds the sales activity data for the dashboard might share logic with queries that build other reports like a sales pipeline report.

The core job of a semantic compiler such as Honeydew is to act is an intermediate layer between data and user tools, separating the semantics into their own layer.

Building a semantic layer follows the DRY principle: “do not repeat yourself”. Do not define multiple times how a customer table is connected to a sales table, or how unique users are counted for different grains.

VIEWs vs Semantic Layer

A VIEW or a dbt model built with SQL has a static structure: specific metrics, grouped by specific dimensions, with specific filters applied.

A semantic layer on the other hand, is dynamic: every query can change filters or groupings of a metric.

SQL Queries for BI

For example, consider a unique user count metric built in SQL:

-- daily user count
SELECT
    sales.order_date,
    COUNT(DISTINCT sales.customer_id) AS user_count
FROM sales
GROUP BY 1

This query prepares the user count on a daily level. However, if would need distinct user count per month this query would not suffice. Instead, would need to create a different query:

-- monthly user count
SELECT
    DATE_TRUNC(month, sales.order_date) as order_month,
    COUNT(DISTINCT sales.customer_id) AS user_count
FROM sales
GROUP BY 1

More complexity would create more queries:

-- a basic user activity report for Tableau

CREATE VIEW user_report AS
-- user counts by geography and day
SELECT
    customer.geography,
    sales.order_date,
    COUNT(DISTINCT sales.customer_id) AS user_count
FROM sales
JOIN customer
ON sales.customer_id = customer.customer_id
GROUP BY customer.geography, sales.order_date
UNION ALL
-- total user counts by day across all geographies
SELECT
    NULL as geography,
    sales.order_date,
    COUNT(DISTINCT sales.customer_id) AS user_count
FROM sales
GROUP BY sales.order_date

Semantic Layer for BI

The Honeydew semantic layer operates differently:

  1. Set the relationship between sales and customer, connected on customer_id.
  2. Create a user_count metric defined as COUNT(DISTINCT sales.customer_id). Note the definition of a metric is a standalone function that can aggregate data.
  3. Connect Tableau via a virtual table

Instead of operating on the user_report view, Tableau would now operate on a virtual table called world.

Honeydew presents itself to any user tool as a virtual “flat” table.

A widget showing users by day and geography will run

-- customer count by day and geography
SELECT
    "customer.geography",
    "sales.order_date",
    SUM("sales.user_count")
-- world is a virtual table from Honeydew
FROM world
GROUP BY 1,2

It would actually run a query that joins sales and customer and builds the user_count aggregation accordingly.

Note that the query uses SUM as an aggregation. The user_count metric is actually a distinct count, but the tool that sends the SUM does not know that.

That means that user_count is a virtual aggregation - Honeydew translates the SUM to the right implementation automatically.

Another widget that only wants monthly totals would call

-- customer count by day and geography
SELECT
    DATE_TRUNC(month, "sales.order_date") as order_month,
    SUM("sales.user_count")
-- world is a virtual table from Honeydew
FROM world
GROUP BY 1

With a semantic layer, build the components of a view (the aggregations, the joins) instead of the view itself.

That means that those components can be reused for infinite number of ad-hoc questions, instead of the specific one that a view query was built for.

Differences in Modeling

When modeling a SQL query, it is typical to think of modeling as a pipeline:

  1. Transform the staging data to do X
  2. Transform that result to do Y
  3. Transform that result to do what the user asked for

When modeling a semantic layer, is it typical to think the other way around:

  1. What metrics & dimensions that users wants?
  2. What components can be used to build them?
  3. What data is missing to build those?

From SQL to Honeydew Entities

SELECT

A select statement in SQL creates a single query. In a semantic layer however you build the components of a SELECT (its joins, sources, calculations) separately, and then re-use them.

To create a specific SELECT query from components, use the Playground in Honeydew UI or a user entity in Honeydew must have like a SQL client connected to Honeydew.

It is very common to build components (like the calculation of user count) and then test them (by day, by month, by geography) in the playground.

Entities: FROM …

A granular table used in a SQL query corresponds to an Entity in Honeydew.

Entities can be based on a table or a view in the data warehouse, or on a custom SQL query.

An entity in Honeydew must have a primary key defined.

Users typically map into Honeydew only tables with a unique key, such as facts and dimensions.

If there isn’t a primary or a unique key present in a table, must create a key as part of the data preparation.

UNION

Honeydew does not support unions as an entity source.

However, it is possible to create an entity with a custom SQL query that performs a UNION:

SELECT * FROM usa_inventory
UNION ALL
SELECT * FROM ind_inventory

Relationships: JOIN ON …

SQL queries join multiple tables together.

Joins in Honeydew are defined once as Relationships between entities.

Relationships have similar properties to join such as:

  • Join fields: which fields connect the tables. Note there might be multiple fields (called a composite connection), or a calculation used for the join. Honeydew allows to use multiple fields to join and/or build and use calculated attributes to join on calculations.

    Honeydew does not support range joins, only equality joins.

  • Join Type: part of the relationship definition in Honeydew is the type of join - INNER, LEFT, RIGHT or FULL OUTER.

    Unlike queries, once a join type is set in the semantic layer all joins will use the chosen join type.

    Customizing join type in a specific metric calculation is a planned future extension.

There are also properties that have no equivalence to a query JOIN condition.

Those properties affect how the calculation is done after the JOIN is made:

  • Direction: whether its a one-to-many or a many-to-one join.

    • The direction of the join lets Honeydew know which rows will get duplicated after the join was done. Using that knowledge, Honeydew can correctly account for duplicated rows when aggregating the data.
    • Many-to-many relations require a connecting table.
  • Cross-filtering:

    • The cross filtering) of a join lets Honeydew know whether filters travel through that join.

    • Typically, in SQL, filters (WHERE) are performed after joins, so a field in one table can filter other tables. However, in a semantic layer might want to control whether filters on table X affect table Y or not.

      One particular application is when joining facts and dimensions: should a filter on one dimension affect another dimension when joined through the fact?

      The default is “both” which behaves like a SQL query where all filters affect all tables.

From SQL to Honeydew Attributes

Attributes: … AS field

SQL queries include calculations, for example:

SELECT
    DATE_TRUNC(month, sales.sales_date) as sales_month
FROM sales

Those calculation create new columns within a query. The equivalent in a semantic layer is a calculated attribute in a Honeydew entity.

Calculated attributes as SQL snippets that create “virtual columns” in entities:

sales.sales_month_num = MONTH(sales.sales_date)

Those fields can be reused for other fields, for example:

sales.is_q4 = sales.sales_month_num BETWEEN 10 AND 12

Window Functions

Calculated attributes can include window functions. Those window functions treat their entity as the table:

sales.cumulative_customer_sales =
SUM(sales.sales_amount) OVER (PARTITION BY sales.customer_id ORDER BY sales.order_date)

Window functions serve specific use cases that requires calculations between rows in the same entity, such as building sessions or cumulative sums.

Most aggregation use cases are served better by building metrics - see below.

Multi-table attributes

Calculated attributes in Honeydew can operate over more than one table.

For example, if there is a product entity that is related (directly or indirectly) to a sales entity, then can create:

sales.discount = product.unit_price - sales.net_price

Relationships and joins in Honeydew are automatically resolved.

Every expression can assume that all entities are accessible to it, and Honeydew will join as needed.

Expression that are impossible to compute (because there is no possible join to perform) would result in an error

From SQL to Honeydew Metrics

Metrics is where SQL and Honeydew differ the most.

Basic Aggregations

SQL does not have the concept of a standalone aggregation function. Aggregations in SQL always operate in a specific context (of a GROUP BY or a WHERE)

In a semantic layer like Honeydew a metric is a first-class citizen - a reusable custom aggregation function.

A basic metric is built like this:

sales.total_sales = SUM(sales.sales_amount)

Multi-table Metrics

Metrics in Honeydew can use more than one table (entity).

For example this metrics uses both the sales entity and the product entity:

sales.total_cost = SUM(sales.quantity * product.unit_price)

Honeydew resolves relationships automatically, performing joins as needed.

Derived Metrics

If a basic metric in Honeydew is an aggregation, a derived metric is one that is built from other metrics.

For example, can create this set of metrics:

-- basic metrics
sales.total_cost = SUM(sales.quantity * product.unit_price)

sales.total_amount = SUM(sales.sales_amount)

-- a profit metric is composed from two other metrics
sales.profit = sales.total_amount - sales.total_cost

There is no limit on metric composition: stack and reuse metrics in any way

User Context

Metrics are sensitive to the user query context.

That means that the computed result of a metric would depend on the query - the filters and groups in use.

User context includes:

  • Filtering - filters that come from a user query
  • Grouping - groups that come from a user query

Metrics in Honeydew can change the user context for their calculation, applying additional filters or changing group by.

It’s easy to think of each metric calculated in a Honeydew generated query as its own CTE.

By default, the CTE to calculate a metric would use the GROUP BY the user has asked for in the query, and would operate on the data as it was filtered by the user.

But what if want a particular metric to be calculated with a different GROUP BY? Or have another WHERE condition?

That is changing the user context for the metric.

SUM(CASE WHEN …) - Filtered Metrics

A common pattern in SQL is to use CASE WHEN to create a filtered aggregation:

sales.red_sales_amount =
SUM(CASE WHEN product.color = 'Red' THEN sales.sales_amount ELSE 0 END)

A filtered aggregation adds a filter on top of the user context.

Honeydew offers a different way called a filter qualifier to apply filters to metrics that are already defined:

-- a basic metric
sales.total_amount = SUM(sales.sales_amount)

-- a derived filtered metric
sales.red_sales_amount = sales.sales_amount FILTER (WHERE product.color = 'Red')

Filter qualifiers enable to create filtered metrics over any aggregation.

The default value for SUM and COUNT aggregations where there is no data available is automatically 0.

Other aggregations will have NULL as the value for an empty dataset.

Default values can also be controlled explicitly using the DEFAULT qualifier -

sales.red_sales_amount =
sales.sales_amount FILTER (WHERE product.color = 'Red') DEFAULT 0

Filter qualifiers can also apply directly to aggregations:

-- a filtered metric on an aggregation
sales.red_sales_amount =
SUM(sales.sales_amount) FILTER (WHERE product.color = 'Red')

It’s always recommended to use FILTER (WHERE ...) rather than a CASE WHEN in an aggregation - as that allows Honeydew to sometimes construct a more efficient query

Build Ratios by Combining Filtered Metrics

A common pattern in SQL is using the same aggregation with different filters in order to build a combined metric, like a ratio.

Consider the following SQL statement:

-- SQL query for sales of red products out of all product sales
SELECT
    sales.order_date,
    DIV0(
        -- Aggregate over red rows
        SUM(CASE WHEN product.color = 'Red' THEN sales.sales_amount END),
        -- Aggregate over all rows
        SUM(sales.sales_amount)
    ) as red_sales_out_of_all_sales
FROM sales
LEFT JOIN product
ON sales.product_id = product.product_id
GROUP BY order_date

The equivalent in Honeydew would be a derived metric:

-- a ratio metric for red sales
sales.red_sales_ratio =
DIV0(
    sales.total_amount FILTER (WHERE product.color = 'Red'),
    sales.total_amount
)

Can also use the sales.red_sales_amount defined above for the same result:

sales.red_sales_ratio =
-- a ratio metric for red sales
DIV0(
    sales.red_sales_amount,
    sales.total_amount
)

DIV0 is usually preferred to a division operator as it can gracefully handle division by zero.

This derived metric perform two actions:

  1. The numerator is aggregating the sales amount only on red product rows (note that happens after applying any user filters supplied in the query)
  2. The denominator is aggregating the sales amount over all products

The actual aggregation (how sales.total_amount is calculated) is defined once, and reused in the derived metric twice.

GROUP BY - Control Grouping

Metrics in Honeydew can also change the user grouping context.

Consider the following SQL statement that compares a total aggregation to an aggregation over a partial segment:

-- SQL query for product % contribution to total sales

WITH
total_sales AS (
    SELECT
        SUM(sales.sales_amount) as total_sales
    FROM sales
),
product_sales AS (
    SELECT
        product.product_id,
        SUM(sales.sales_amount) as product_sales
    FROM sales
    LEFT JOIN product
    ON sales.product_id = product.product_id
    GROUP BY product_id
)
SELECT product_id,
       DIV0(product_sales, total_sales) as product_contribution_to_total
FROM total_sales, product_sales

How would you build a similar metric in Honeydew?

Let’s build a metric in Honeydew that is even more dynamic: the % contribution to total sales of any user group. It can contribution of a product, or it can be any other group.

That metric would be:

-- Honeydew metric
sales.contribution_to_total =
DIV0(
    sales.total_amount,
    sales.total_amount GROUP BY ()
)

This derived metric perform two actions:

  1. The numerator is grouped by the user’s grouping (can be by product or otherwise)
  2. The denominator is grouped by nothing, removing any user groups

The GROUP BY (...) qualifier sets the grouping for a metric. It can be nothing (as in the example above), or it can be specific:

-- Honeydew metric
sales.contribution_to_yearly_sales =
DIV0(
    sales.total_amount,
    sales.total_amount GROUP BY (date.year)
)

Nested (Multi-Level) Aggregation

Another way to change the user grouping context is nesting aggregations.

Consider the following SQL statement:

-- SQL query for average daily sales for each product
WITH
daily_product_sales AS (
    SELECT
        product.product_id,
        sales.order_date,
        SUM(sales.sales_amount) as daily_product_sales
    FROM sales
    LEFT JOIN product
    ON sales.product_id = product.product_id
    GROUP BY product_id, order_date
)
SELECT
    product_id,
    AVG(daily_product_sales) as daily_avg_product_sales
FROM daily_product_sales

How would you build a similar metric in Honeydew?

But one that is more dynamic: one that can adjust to any user group

That metric would be:

-- Honeydew metric
sales.avg_daily_sales = AVG( sales.total_sales GROUP BY (*, sales.order_date) )

This derived metric performs two actions:

  1. The internal aggregation is grouped by the user’s grouping + date
  2. The external aggregation is grouped by the user’s grouping only

If that metric is grouped by product, the result would be average daily product sales.

If that metric is grouped by customer segment, the result would be average daily segment sales.

If that metric is grouped by product and customer segment, the result would be average daily sales per product, per segment.

HAVING - Using Metrics as Filters

Metric in Honeydew can also be used as filters for further calculations, similar to how HAVING operates in SQL.

Consider the following SQL statement:

-- SQL query for average sale amount, ignoring products with less than $100 in daily sales

-- list products that had more than $100 in sales per day
WITH daily_active_products AS (
    SELECT
        sales.order_date,
        sales.product_id,
    FROM sales
    GROUP BY ALL
    HAVING SUM(sales.sales_amount) > 100
)
-- average sale amount by state, only for products that had >$100 daily sales volume
SELECT
    location.state,
    AVG(sales.sales_amount) as avg_sales_of_active_products
FROM sales
INNER JOIN daily_active_products ON sales.product_id = daily_active_products.product_id AND sales.order_date = daily_active_products.order_date
LEFT JOIN location ON sales.location_id = location.location_id

That calculates average sales of active products per state.

The equivalent Honeydew metric would be dynamic to any user choice of a grouping, allowing to slice average sales of active products by location or any other user grouping choice.

That expression would be a filtered metric based on a metric:

sales.avg_sales_of_active_products =
-- Honeydew metric for average sale amount for active products
AVG(sales.sales_amount)
FILTER (WHERE
    sales.total_amount
    GROUP BY (sales.order_date, sales.product_id) > 100
)
  1. The sales.total_amount metric is used in the filter with a specific grouping (daily products) and a filter (over $100 in sales)
  2. The external aggregation is an AVG that operates on any data that passed the internal filter.

If the metric above is used in a query that groups by location.state then it would return it grouped by state.

It can also be used with any other grouping!

WITH … (CTEs)

CTEs in SQL are commonly used to create aggregated data as part of a larger calculation, for example:

-- sales per customer
WITH customer_sales AS (
    SELECT
        customer_id,
        SUM(sales.sales_amount) as customer_sales
    FROM sales
    JOIN customer
    ON sales.customer_id = sales.customer_id
    GROUP BY customer_id
)

-- average customer sales by customer type
-- customer type - VIP for customers with >$1000 of total sales
SELECT
    CASE WHEN cs.customer_sales > 1000 THEN "VIP" ELSE "NOT VIP" END AS customer_type
    AVG(cs.customer_sales) as avg_customer_sales
FROM customer
JOIN customer_sales
ON customer.customer_id = customer.customer_id
GROUP BY 1

Calculated attributes in Honeydew can use metrics, which is a very common translation for CTEs.

The equivalent for that expression would be a calculated attribute in customer:

-- calculated attribute in Honeydew customer entity
customer.customer_sales = SUM(sales.sales_amount)

If a calculated attribute in entity uses an aggregation, then that aggregation is grouped to the entity primary key (customer_id in the example above).

And a derived attribute

-- calculated attribute in Honeydew customer entity
customer.customer_type = CASE customer.customer_sales  > 1000 THEN "VIP" ELSE "NOT VIP" END

A better deconstruction for that CTE would build a metric for sales:

--- metric in Honeydew sales entity
sales.total_sales = SUM(sales.sales_amount)

And reuse the metric in the customer entity

--- calculated attribute in Honeydew customer entity
customer.customer_sales = sales.total_sales

That would behave the same way as using the aggregation.