This guide helps people who are familiar with building SQL queries or dbt models to understand the concepts of Honeydew.If you are familiar with semantic layers in BI or are not familiar with SQL, head over to the introduction instead.
Introduction
Honeydew is a standalone semantic layer. It has many similarities to semantic layers that are embedded in BI tools such as Tableau, Looker or Power BI. However, unlike semantic layers embedded in a particular BI tool, Honeydew can be used by any data tool or SQL client. Semantics include the following concepts:- How business entities (such as “customer”) are mapped to data sources (which tables to read from)
- How data tables are related (which joins to use)
- How metrics are calculated (which aggregations to perform)
- Metadata (such as descriptions, display names, formatting) associated with presenting those objects to business users.
For example, consider a Tableau report on sales activity:
- Most commonly it would load data from a SQL view or a SQL table built for it with aggregated data relevant for the report: daily sales metrics, broken down by many dimensions such as customer segments or geographies.
- The Tableau workbooks on top of that table would present widgets like daily growth. Those widgets would aggregate data from the table for the widget - i.e. sum the total sales metrics across all customer segments to present total growth.
VIEWs vs Semantic Layer
A VIEW or a dbt model built with SQL has a static structure: specific metrics, grouped by specific dimensions, with specific filters applied. A semantic layer on the other hand, is dynamic: every query can change filters or groupings of a metric.SQL Queries for BI
For example, consider a unique user count metric built in SQL:Semantic Layer for BI
The Honeydew semantic layer operates differently:- Set the relationship between
sales
andcustomer
, connected oncustomer_id
. - Create a
user_count
metric defined asCOUNT(DISTINCT sales.customer_id)
. Note the definition of a metric is a standalone function that can aggregate data. - Connect Tableau via a virtual table
user_report
view, Tableau would now operate on a virtual table called world
.
Honeydew presents itself to any user tool as a virtual “flat” table.
sales
and customer
and builds the user_count
aggregation accordingly.
Note that the query uses
SUM
as an aggregation. The user_count
metric is actually a distinct count, but the tool that sends the SUM
does not know that.That means that user_count
is a virtual aggregation - Honeydew translates the SUM
to the right implementation automatically.With a semantic layer, build the components of a view (the aggregations, the joins) instead of the view itself.That means that those components can be reused for infinite number of ad-hoc questions, instead of the specific one that a view query was built for.
Differences in Modeling
When modeling a SQL query, it is typical to think of modeling as a pipeline:- Transform the staging data to do X
- Transform that result to do Y
- Transform that result to do what the user asked for
- What metrics & dimensions that users wants?
- What components can be used to build them?
- What data is missing to build those?
From SQL to Honeydew Entities
SELECT
A select statement in SQL creates a single query. In a semantic layer however you build the components of a SELECT (its joins, sources, calculations) separately, and then re-use them.To create a specific SELECT query from components, use the Playground in Honeydew UI or a user entity in Honeydew must have like a SQL client connected to Honeydew.It is very common to build components (like the calculation of user count) and then test them (by day, by month, by geography) in the playground.
Entities: FROM …
A granular table used in a SQL query corresponds to an Entity in Honeydew. Entities can be based on a table or a view in the data warehouse, or on a custom SQL query.An entity in Honeydew must have a primary key defined.Users typically map into Honeydew only tables with a unique key, such as facts and dimensions.If there isn’t a primary or a unique key present in a table, must create a key as part of the data preparation.
UNION
Honeydew does not support unions as an entity source. However, it is possible to create an entity with a custom SQL query that performs a UNION:Relationships: JOIN ON …
SQL queries join multiple tables together. Joins in Honeydew are defined once as Relationships between entities. Relationships have similar properties to join such as:-
Join fields: which fields connect the tables. Note there might be multiple fields (called a composite connection), or a calculation used for the join.
Honeydew allows to use multiple fields to join and/or build and use calculated attributes to join on calculations.
Honeydew does not support range joins, only equality joins.
-
Join Type: part of the relationship definition in Honeydew is the type of join -
INNER
,LEFT
,RIGHT
orFULL OUTER
.Unlike queries, once a join type is set in the semantic layer all joins will use the chosen join type.Customizing join type in a specific metric calculation is a planned future extension.
-
Direction: whether its a one-to-many or a many-to-one join.
- The direction of the join lets Honeydew know which rows will get duplicated after the join was done. Using that knowledge, Honeydew can correctly account for duplicated rows when aggregating the data.
- Many-to-many relations require a connecting table.
-
Cross-filtering:
- The cross filtering) of a join lets Honeydew know whether filters travel through that join.
-
Typically, in SQL, filters (
WHERE
) are performed after joins, so a field in one table can filter other tables. However, in a semantic layer might want to control whether filters on table X affect table Y or not. One particular application is when joining facts and dimensions: should a filter on one dimension affect another dimension when joined through the fact? The default is “both” which behaves like a SQL query where all filters affect all tables.
From SQL to Honeydew Attributes
Attributes: … AS field
SQL queries include calculations, for example:Window Functions
Calculated attributes can include window functions. Those window functions treat their entity as the table:Window functions serve specific use cases that requires calculations between rows in the same entity, such as building sessions
or cumulative sums.Most aggregation use cases are served better by building metrics - see below.
Multi-table attributes
Calculated attributes in Honeydew can operate over more than one table. For example, if there is aproduct
entity that is related (directly or indirectly) to a sales
entity, then can create:
Relationships and joins in Honeydew are automatically resolved.Every expression can assume that all entities are accessible to it, and Honeydew will join as needed.Expression that are impossible to compute (because there is no possible join to perform) would result in an error
From SQL to Honeydew Metrics
Metrics is where SQL and Honeydew differ the most.
Basic Aggregations
SQL does not have the concept of a standalone aggregation function. Aggregations in SQL always operate in a specific context (of a GROUP BY or a WHERE) In a semantic layer like Honeydew a metric is a first-class citizen - a reusable custom aggregation function. A basic metric is built like this:Multi-table Metrics
Metrics in Honeydew can use more than one table (entity). For example this metrics uses both thesales
entity and the product
entity:
Honeydew resolves relationships automatically, performing joins as needed.
Derived Metrics
If a basic metric in Honeydew is an aggregation, a derived metric is one that is built from other metrics. For example, can create this set of metrics:There is no limit on metric composition: stack and reuse metrics in any way
User Context
Metrics are sensitive to the user query context. That means that the computed result of a metric would depend on the query - the filters and groups in use. User context includes:- Filtering - filters that come from a user query
- Grouping - groups that come from a user query
It’s easy to think of each metric calculated in a Honeydew generated query as its own CTE.By default, the CTE to calculate a metric would use the
GROUP BY
the user has asked for in the query, and would operate
on the data as it was filtered by the user.But what if want a particular metric to be calculated with a different GROUP BY
? Or have another WHERE
condition?That is changing the user context for the metric.SUM(CASE WHEN …) - Filtered Metrics
A common pattern in SQL is to useCASE WHEN
to create a filtered aggregation:
The default value for
SUM
and COUNT
aggregations where there is no data available is automatically 0.Other aggregations will have NULL
as the value for an empty dataset.Default values can also be controlled explicitly using the DEFAULT
qualifier -It’s always recommended to use
FILTER (WHERE ...)
rather than a CASE WHEN
in an aggregation - as that allows Honeydew
to sometimes construct a more efficient queryBuild Ratios by Combining Filtered Metrics
A common pattern in SQL is using the same aggregation with different filters in order to build a combined metric, like a ratio. Consider the following SQL statement:sales.red_sales_amount
defined above for the same result:
DIV0
is usually preferred to a division operator as it can gracefully handle division by zero.- The numerator is aggregating the sales amount only on red product rows (note that happens after applying any user filters supplied in the query)
- The denominator is aggregating the sales amount over all products
The actual aggregation (how
sales.total_amount
is calculated) is defined once, and reused in the derived metric twice.GROUP BY - Control Grouping
Metrics in Honeydew can also change the user grouping context. Consider the following SQL statement that compares a total aggregation to an aggregation over a partial segment:- The numerator is grouped by the user’s grouping (can be by product or otherwise)
- The denominator is grouped by nothing, removing any user groups
The
GROUP BY (...)
qualifier sets the grouping for a metric. It can be nothing (as in the example above),
or it can be specific:Nested (Multi-Level) Aggregation
Another way to change the user grouping context is nesting aggregations. Consider the following SQL statement:- The internal aggregation is grouped by the user’s grouping + date
- The external aggregation is grouped by the user’s grouping only
If that metric is grouped by product, the result would be average daily product sales.If that metric is grouped by customer segment, the result would be average daily segment sales.If that metric is grouped by product and customer segment, the result would be average daily sales per product, per segment.
HAVING - Using Metrics as Filters
Metric in Honeydew can also be used as filters for further calculations, similar to howHAVING
operates in SQL.
Consider the following SQL statement:
- The
sales.total_amount
metric is used in the filter with a specific grouping (daily products) and a filter (over $100 in sales) - The external aggregation is an
AVG
that operates on any data that passed the internal filter.
If the metric above is used in a query that groups by
location.state
then it would return it grouped by state.It can also be used with any other grouping!WITH … (CTEs)
CTEs in SQL are commonly used to create aggregated data as part of a larger calculation, for example:If a calculated attribute in entity uses an aggregation, then that aggregation is grouped to the
entity primary key (
customer_id
in the example above).A better deconstruction for that CTE would build a metric for sales:And reuse the metric in the customer entityThat would behave the same way as using the aggregation.