From SQL to Honeydew
This guide helps people who are familiar with building SQL queries or dbt models to understand the concepts of Honeydew.
If you are familiar with semantic layers in BI or are not familiar with SQL, head over to the introduction instead.
Introduction
Honeydew is a standalone semantic layer.
It has many similarities to semantic layers that are embedded in BI tools such as Tableau, Looker or Power BI. However, unlike semantic layers embedded in a particular BI tool, Honeydew can be used by any data tool or SQL client.
Semantics include the following concepts:
- How business entities (such as “customer”) are mapped to data sources (which tables to read from)
- How data tables are related (which joins to use)
- How metrics are calculated (which aggregations to perform)
- Metadata (such as descriptions, display names, formatting) associated with presenting those objects to business users.
Those metrics are then used in by BI tools, reports, notebooks, and AI agents for data queries.
Most SQL queries or dbt models for a specific use case would include (1) - (3): join and aggregate the data to a level a BI tool can use.
BI tools with dashboard for that use case would include metadata (4), and some aggregation logic (3) - when aggregating data further for a specific plot or a widget.
For example, consider a Tableau report on sales activity:
- Most commonly it would load data from a SQL view or a SQL table built for it with aggregated data relevant for the report: daily sales metrics, broken down by many dimensions such as customer segments or geographies.
- The Tableau workbooks on top of that table would present widgets like daily growth. Those widgets would aggregate data from the table for the widget - i.e. sum the total sales metrics across all customer segments to present total growth.
The query that builds the sales activity data for the dashboard might share logic with queries that build other reports like a sales pipeline report.
The core job of a semantic compiler such as Honeydew is to act is an intermediate layer between data and user tools, separating the semantics into their own layer.
Building a semantic layer follows the DRY principle: “do not repeat yourself”. Do not define multiple times how a customer table is connected to a sales table, or how unique users are counted for different grains.
VIEWs vs Semantic Layer
A VIEW or a dbt model built with SQL has a static structure: specific metrics, grouped by specific dimensions, with specific filters applied.
A semantic layer on the other hand, is dynamic: every query can change filters or groupings of a metric.
SQL Queries for BI
For example, consider a unique user count metric built in SQL:
This query prepares the user count on a daily level. However, if would need distinct user count per month this query would not suffice. Instead, would need to create a different query:
More complexity would create more queries:
Semantic Layer for BI
The Honeydew semantic layer operates differently:
- Set the relationship between
sales
andcustomer
, connected oncustomer_id
. - Create a
user_count
metric defined asCOUNT(DISTINCT sales.customer_id)
. Note the definition of a metric is a standalone function that can aggregate data. - Connect Tableau via a virtual table
Instead of operating on the user_report
view, Tableau would now operate on a virtual table called world
.
Honeydew presents itself to any user tool as a virtual “flat” table.
A widget showing users by day and geography will run
It would actually run a query that joins sales
and customer
and builds the user_count
aggregation accordingly.
Note that the query uses SUM
as an aggregation. The user_count
metric is actually a distinct count, but the tool that sends the SUM
does not know that.
That means that user_count
is a virtual aggregation - Honeydew translates the SUM
to the right implementation automatically.
Another widget that only wants monthly totals would call
With a semantic layer, build the components of a view (the aggregations, the joins) instead of the view itself.
That means that those components can be reused for infinite number of ad-hoc questions, instead of the specific one that a view query was built for.
Differences in Modeling
When modeling a SQL query, it is typical to think of modeling as a pipeline:
- Transform the staging data to do X
- Transform that result to do Y
- Transform that result to do what the user asked for
When modeling a semantic layer, is it typical to think the other way around:
- What metrics & dimensions that users wants?
- What components can be used to build them?
- What data is missing to build those?
From SQL to Honeydew Entities
SELECT
A select statement in SQL creates a single query. In a semantic layer however you build the components of a SELECT (its joins, sources, calculations) separately, and then re-use them.
To create a specific SELECT query from components, use the Playground in Honeydew UI or a user entity in Honeydew must have like a SQL client connected to Honeydew.
It is very common to build components (like the calculation of user count) and then test them (by day, by month, by geography) in the playground.
Entities: FROM …
A granular table used in a SQL query corresponds to an Entity in Honeydew.
Entities can be based on a table or a view in the data warehouse, or on a custom SQL query.
An entity in Honeydew must have a primary key defined.
Users typically map into Honeydew only tables with a unique key, such as facts and dimensions.
If there isn’t a primary or a unique key present in a table, must create a key as part of the data preparation.
UNION
Honeydew does not support unions as an entity source.
However, it is possible to create an entity with a custom SQL query that performs a UNION:
Relationships: JOIN ON …
SQL queries join multiple tables together.
Joins in Honeydew are defined once as Relationships between entities.
Relationships have similar properties to join such as:
-
Join fields: which fields connect the tables. Note there might be multiple fields (called a composite connection), or a calculation used for the join. Honeydew allows to use multiple fields to join and/or build and use calculated attributes to join on calculations.
Honeydew does not support range joins, only equality joins.
-
Join Type: part of the relationship definition in Honeydew is the type of join -
INNER
,LEFT
,RIGHT
orFULL OUTER
.Unlike queries, once a join type is set in the semantic layer all joins will use the chosen join type.
Customizing join type in a specific metric calculation is a planned future extension.
There are also properties that have no equivalence to a query JOIN condition.
Those properties affect how the calculation is done after the JOIN is made:
-
Direction: whether its a one-to-many or a many-to-one join.
- The direction of the join lets Honeydew know which rows will get duplicated after the join was done. Using that knowledge, Honeydew can correctly account for duplicated rows when aggregating the data.
- Many-to-many relations require a connecting table.
-
Cross-filtering:
-
The cross filtering) of a join lets Honeydew know whether filters travel through that join.
-
Typically, in SQL, filters (
WHERE
) are performed after joins, so a field in one table can filter other tables. However, in a semantic layer might want to control whether filters on table X affect table Y or not.One particular application is when joining facts and dimensions: should a filter on one dimension affect another dimension when joined through the fact?
The default is “both” which behaves like a SQL query where all filters affect all tables.
-
From SQL to Honeydew Attributes
Attributes: … AS field
SQL queries include calculations, for example:
Those calculation create new columns within a query. The equivalent in a semantic layer is a calculated attribute in a Honeydew entity.
Calculated attributes as SQL snippets that create “virtual columns” in entities:
Those fields can be reused for other fields, for example:
Window Functions
Calculated attributes can include window functions. Those window functions treat their entity as the table:
Window functions serve specific use cases that requires calculations between rows in the same entity, such as building sessions or cumulative sums.
Most aggregation use cases are served better by building metrics - see below.
Multi-table attributes
Calculated attributes in Honeydew can operate over more than one table.
For example, if there is a product
entity that is related (directly or indirectly) to a sales
entity, then can create:
Relationships and joins in Honeydew are automatically resolved.
Every expression can assume that all entities are accessible to it, and Honeydew will join as needed.
Expression that are impossible to compute (because there is no possible join to perform) would result in an error
From SQL to Honeydew Metrics
Metrics is where SQL and Honeydew differ the most.
Basic Aggregations
SQL does not have the concept of a standalone aggregation function. Aggregations in SQL always operate in a specific context (of a GROUP BY or a WHERE)
In a semantic layer like Honeydew a metric is a first-class citizen - a reusable custom aggregation function.
A basic metric is built like this:
Multi-table Metrics
Metrics in Honeydew can use more than one table (entity).
For example this metrics uses both the sales
entity and the product
entity:
Honeydew resolves relationships automatically, performing joins as needed.
Derived Metrics
If a basic metric in Honeydew is an aggregation, a derived metric is one that is built from other metrics.
For example, can create this set of metrics:
There is no limit on metric composition: stack and reuse metrics in any way
User Context
Metrics are sensitive to the user query context.
That means that the computed result of a metric would depend on the query - the filters and groups in use.
User context includes:
- Filtering - filters that come from a user query
- Grouping - groups that come from a user query
Metrics in Honeydew can change the user context for their calculation, applying additional filters or changing group by.
It’s easy to think of each metric calculated in a Honeydew generated query as its own CTE.
By default, the CTE to calculate a metric would use the GROUP BY
the user has asked for in the query, and would operate
on the data as it was filtered by the user.
But what if want a particular metric to be calculated with a different GROUP BY
? Or have another WHERE
condition?
That is changing the user context for the metric.
SUM(CASE WHEN …) - Filtered Metrics
A common pattern in SQL is to use CASE WHEN
to create a filtered aggregation:
A filtered aggregation adds a filter on top of the user context.
Honeydew offers a different way called a filter qualifier to apply filters to metrics that are already defined:
Filter qualifiers enable to create filtered metrics over any aggregation.
The default value for SUM
and COUNT
aggregations where there is no data available is automatically 0.
Other aggregations will have NULL
as the value for an empty dataset.
Default values can also be controlled explicitly using the DEFAULT
qualifier -
Filter qualifiers can also apply directly to aggregations:
It’s always recommended to use FILTER (WHERE ...)
rather than a CASE WHEN
in an aggregation - as that allows Honeydew
to sometimes construct a more efficient query
Build Ratios by Combining Filtered Metrics
A common pattern in SQL is using the same aggregation with different filters in order to build a combined metric, like a ratio.
Consider the following SQL statement:
The equivalent in Honeydew would be a derived metric:
Can also use the sales.red_sales_amount
defined above for the same result:
DIV0
is usually preferred to a division operator as it can gracefully handle division by zero.
This derived metric perform two actions:
- The numerator is aggregating the sales amount only on red product rows (note that happens after applying any user filters supplied in the query)
- The denominator is aggregating the sales amount over all products
The actual aggregation (how sales.total_amount
is calculated) is defined once, and reused in the derived metric twice.
GROUP BY - Control Grouping
Metrics in Honeydew can also change the user grouping context.
Consider the following SQL statement that compares a total aggregation to an aggregation over a partial segment:
How would you build a similar metric in Honeydew?
Let’s build a metric in Honeydew that is even more dynamic: the % contribution to total sales of any user group. It can contribution of a product, or it can be any other group.
That metric would be:
This derived metric perform two actions:
- The numerator is grouped by the user’s grouping (can be by product or otherwise)
- The denominator is grouped by nothing, removing any user groups
The GROUP BY (...)
qualifier sets the grouping for a metric. It can be nothing (as in the example above),
or it can be specific:
Nested (Multi-Level) Aggregation
Another way to change the user grouping context is nesting aggregations.
Consider the following SQL statement:
How would you build a similar metric in Honeydew?
But one that is more dynamic: one that can adjust to any user group
That metric would be:
This derived metric performs two actions:
- The internal aggregation is grouped by the user’s grouping + date
- The external aggregation is grouped by the user’s grouping only
If that metric is grouped by product, the result would be average daily product sales.
If that metric is grouped by customer segment, the result would be average daily segment sales.
If that metric is grouped by product and customer segment, the result would be average daily sales per product, per segment.
HAVING - Using Metrics as Filters
Metric in Honeydew can also be used as filters for further calculations, similar to how HAVING
operates in SQL.
Consider the following SQL statement:
That calculates average sales of active products per state.
The equivalent Honeydew metric would be dynamic to any user choice of a grouping, allowing to slice average sales of active products by location or any other user grouping choice.
That expression would be a filtered metric based on a metric:
- The
sales.total_amount
metric is used in the filter with a specific grouping (daily products) and a filter (over $100 in sales) - The external aggregation is an
AVG
that operates on any data that passed the internal filter.
If the metric above is used in a query that groups by location.state
then it would return it grouped by state.
It can also be used with any other grouping!
WITH … (CTEs)
CTEs in SQL are commonly used to create aggregated data as part of a larger calculation, for example:
Calculated attributes in Honeydew can use metrics, which is a very common translation for CTEs.
The equivalent for that expression would be a calculated attribute in customer:
If a calculated attribute in entity uses an aggregation, then that aggregation is grouped to the
entity primary key (customer_id
in the example above).
And a derived attribute
A better deconstruction for that CTE would build a metric for sales:
And reuse the metric in the customer entity
That would behave the same way as using the aggregation.
Was this page helpful?