Metrics
Introduction to Metrics
A metric is an aggregation that is counted on many rows (unique instances) of an entity.
Unlike attributes that are like a “virtual column” a metric has no direct equivalent in SQL: it is a standalone aggregation, without a GROUP BY
.
Metrics are the most important building block of a semantic layer, as most business analytic use cases are centered around KPIs.
A metric is a highly reusable building block:
- It can be sliced and diced in any way a user wants with a Dynamic Dataset or a BI tool using JDBC
- It can be used to build new metrics that combine other metrics
- It can be used to build new attributes that aggregate to a specific entity
- It can used to build derived filtered metrics by applying a filter qualifier
- It can used to build derived metrics that control grouping to create nested aggregations or dynamic ratio metrics
See examples below.
How Metrics and Entities relate?
An entity defines a granularity (like a customer or an order). A metric on an entity is aggregated on that granularity (like counting customers, or summing up orders).
The entity a metric is defined on is very important. Every time a metric is calculated Honeydew will make sure it is calculated at the right level.
In Honeydew, metrics must be associated with a granularity they are counted on.
This is different than a grouping (“Level of Detail”)! A count of orders
can be monthly or weekly, by customer or by order type. But regardless to how it is grouped, it always counts orders.
Metadata
Metrics may include metadata such as their owner, business description, labels. See metadata section for more details.
Sensitivity to User Context
Metrics are functions that are sensitive to the user query context.
That means that the computed result of a metric would depend on the query - the filters and groups in use.
User context includes:
- Filtering - filters can come from a user query or from a domain.
- Grouping - groups can come from the user query.
For example, when using the SQL interface to query the semantic layer, the following query:
Applies the following context to the lineitem.revenue
metric:
- Filter only customers in the machinery market segment
- Group them by order date
Revenue is thus calculated per date, only for machinery customers.
Metrics by default operate by the full user query context.
However, some metrics can change how they interact with the user context - for example, disregard a chosen grouping or add a filtering.
That is achieved by using metric qualifiers such as FILTER (WHERE ...)
and GROUP BY
. See more below.
Basic metric
A basic metric only depends on attributes in its entity.
For example, we can define a revenue
metric on that counts the total revenue:
This code is standard SQL. Anything in SQL that is allowed to be used in an aggregation (like functions, nested data unpacking, type conversions, etc..) just works.
It is a reusable aggregation, that can be grouped by, filtered, and reused by the user in different ways.
But unlike a calculated attribute, a metric is dynamic - it only has value with a grouping.
Things like are not possible in an SQL aggregation (like window functions) are not possible as-is in metric definition as well. However, any logic that is encapsuled in an attribute is possible - so it is possible to use a window function as part of a calculated attribute definition, and then aggregate on that.
Count or distinct count?
Metrics are automatically calculated of the level of their entity granularity key. When different metrics are combined, Honeydew will make sure to calculate each on its right granularity.
For example, consider a count
metric in customers
that is defined as COUNT(customers.custkey)
. Since this is defined on customers, it will always count distinct customers. There is no need to ask for COUNT(**DISTINCT** customers.custkey)
,
Furthermore Every aggregation (like SUM
, MEDIAN
, AVG
, …) on customers
will always be calculated on distinct customers automatically.
If you build COUNT(DISTINCT ...)
aggregations on an ID field that has no corresponding entity, consider making a virtual entity for
it and corresponding attributes. Then use its count
metric instead.
This practice of creating new dimension tables from a fact table is sometimes called normalizing a schema. Separating denormalized fields to a separate virtual entity would allow to add more properties and metrics deduplicated to the grain of that ID.
Which attributes a metric can use?
A metric can use in its calculation every attribute from the same or lower granularity, similar to a calculated attribute. See the combining entities section of calculated attributes for details.
Derived Metrics
Like calculated attributes, metrics can also be reused as building blocks.
To reuse a metric, refer to it by name within an expression.
For example, can build a profit metric using the revenue metric:
The metric above takes the revenue
metric as a function, and uses it as part of a new metric definition.
Break your derived metrics into named components for maximal reusability.
For example, can write the metric as two metrics:
- A
cost_of_goods
metric, defined as:
- A
profit
metric, defined as:
Default Value
What is the value of a metric when there is no data?
For example, the SUM
for no data is typically expected to be 0
, not NULL
.
When a metric is aggregated within a group, there are a few cases which can be considered as having no data:
- Outer joins that create empty (
NULL
) data in rows where the JOIN has a valid key only on one side of the join. - Filtered metrics that create groups that have no data due to the applied filter
- Missing data - when the values of all aggregated rows from the source data are
NULL
Metrics as such would typically receive the value NULL
. To support other default values, can use the DEFAULT
qualifier:
The default value for a SUM
or a COUNT
aggregation is automatically 0. There is no need to explicitly set it up.
See more below on automatic default values.
This qualifier can also be applied directly to a metric when creating a derived metric:
Can use any literal for default value and not only numbers. For example:
Automatic Default Value
Honeydew automatically applies the default value of 0 for SUM
and COUNT
aggregations.
To disable the default value, use DEFAULT NULL
:
No Data vs. Missing Data
Honeydew treats a NULL
result from a metric calculation the same as an empty group when determining a default value.
That means that missing data (all NULL
rows) and no data (empty group) are handled the same way.
To handle only missing data, you can use COALESCE
within the metric definition.
For example, this expression sets the average of all-NULL data to be 0:
If all data in a group is NULL
, the expression above will return a value of 0.
However, if there is no data in a group by, the value of the expression above would be NULL
.
The reason: if there are no rows to aggregate then the expression will not be executed at all.
To deal with empty groups AND missing data use DEFAULT
:
Changing User Context
By default, metrics are filtered and grouped according to the user query context.
However there are qualifiers that change the context:
- A
FILTER
qualifier will change which rows are used for aggregation - A
GROUP BY
qualifier will change how the metric is grouped
Those qualifiers can be added to any aggregation, for example:
When a metric is reused with in a derived metric, can apply qualifiers to change how it behaves.
The equivalent derived metric (if orders.min_order_date
is already defined):
Derived metrics can behave differently from their parents.
Filtered metrics
In the TPC-H sample dataset, the attribute part.type
that described the type of a part sold in the store will include the word “PROMO” is that part is a promotion.
Consider the following metric that calculates revenue without promotions called revenue_from_promotions
:
This is a filtered metric, that uses the revenue metric as a building block.
Note that this metric on lineitems
used an attribute from parts
. Metrics
can span multiple entities! Any attribute of same or lower granularity can be
used. See more in combining entities for calculated
attributes
Filtered metrics with attribute lookup
What if the potential values for a filter come from an attribute, instead of being defined in the filtered metric expression?
For example, assume there is a table promotion_part_types
that has a list of every promotion type.
A promotion revenue is one that has a part type matching the
list of values on that table.
This is called a lookup filter:
The expression above would generate a sub-query to get the values from promotion_part_types
and
then filter on the results of that sub-query.
Filters over the promotion_part_types
entity can further reduce
the possible values if they run before the metric computation (see
filtering order for more details)
Filtered metrics in ratios
Metrics can be combined together, for example to create a ratio metric.
For example, the promo_revenue
metric is defined as the ratio of revenue that is coming from promotions. Its code is:
Or can use the revenue_from_promotions
defined above:
Time-Based Metrics
Most metrics operate on time and can build a time series or time related calculations (i.e. “year-to-date revenue”).
When building metrics, can define how a metric is related to a time spine. See time metrics for more details.
Controlling Grouping
A metric is normally grouped by the user definition (as set in a Dynamic Dataset or via a SQL query on JDBC).
But there are many cases where would want to reuse the same metric while controlling the grouping, such as:
- Nested aggregations (monthly average daily customer count)
- Ratio metrics (daily customer count as a ratio of monthly customers count)
And many others.
Grouping of a metric is sometimes called its “Level of Detail”.
Given a user context grouping (the GROUP BY as asked for by a user in a query), there are different ways to control grouping:
- From the user query context (the default)
- Fixed (use a preset group, ignoring what the user chose)
- Include (add a group to the user context grouping)
- Exclude (remove a group from the user context grouping)
See examples and use cases for each below:
Fixed Grouping
This metric is always grouped by the date
This is useful for ratio metrics such as “the revenue as a ratio of daily revenue”
Fixed Empty Grouping
A specific case of fixed grouping is to group by nothing with GROUP BY ()
.
For example this metric given the percentage of revenue out of all total revenue
The query above operates after user context filters, so that metric will give the percentage out of revenue after filtering.
For example, if used in a query that is grouped by customer segment and filtered for the year 2000 only, then the result would be the percentage of revenue per segment out of all segments, in 2000 only.
Calculated attributes using metrics with fixed grouping
Fixed grouping can be combined with calculated attributes (see above).
By default, a metric in a calculated attribute is grouped by the granularity of its entity.
However, can change this with an explicit GROUP BY
.
An attribute in customer
, is computed at the customer granularity:
It is the same as defining it in customer
grouped by the customer key
But can remember per customer the metric aggregated by other groups:
This will add to a customer a calculated attribute of the total revenue in the customer segment.
When creating calculated attributes with a grouping that is different than the entity key, note that the values will repeat with a different grain that the enclosing entity.
For example, when grouping by customer market segment in the customer entity, then all customers in the machinery market segment will have the same revenue.
This can be dangerous if used for a nested aggregation - for example if need the average revenue by market segment calculating it over a field in the customer entity would result in a biased result.
See below how to build nested aggregations correctly.
Dynamic Grouping: Removing Groups
By default, a metric is grouped by the user query context.
With the GROUP BY (NOT ...)
qualifier can remove a group from the context.
For example,
Removes grouping per customer key.
Private preview functionality: remove all the groupings on an entity.
Dynamic Grouping: Adding Groups and Nested Aggregation
It is possible to add to the user context groups additional groups, like this lineitems.daily_revenue
metric:
This would be grouped by user choice + the date of the order.
The main use case for inclusive grouping is nested aggregation.
For example, given daily_revenue
, this metric calculates the average daily revenue, grouped to the user context:
The way it works:
- Given a user context grouping (for example customer segment)
- The inner aggregation is grouped by user choice AND date. Here that would be revenue by date and customer segment.
- The outer aggregation (
AVG
) would be grouped by user choice only. The AVG will average over dates, while keeping the customer segment grouping.
The expression above can be built as a single expression:
Note that aggregations can be nested more than once, changing groups at every step.
Calculated attributes using metrics
When a metric or an aggregation is used in a calculated attribute it is computed at the granularity of the entity.
For example, you may want to look for each customer its total lifetime revenue. There is a revenue
metric in lineitems
that sums up the revenue.
To project it to a specific customer, make a calculated attribute in customers
with the following SQL:
When it is a calculated attribute it means “For each row of customer that has a unique customer id, calculate the total lineitems.revenue
”.
Once a metric or aggregation is referenced in a calculated attribute it loses its dynamic behavior. The total_lifetime_revenue
of a customer is the same regardless of the user query.
Most use cases that involve multiple computation steps will be calculated attributes on metrics.
Another example would be to have the first order date of a customer as one its properties.
For that, build as a calculated attribute in customer
entity.
Can reference a metric by name (such as lineitems.revenue
) or just write the aggregation SQL. Both work!
Usually would make a named metric if it is reusable in other scenarios, and an ad-hoc aggregation expression otherwise.
Multi-level Metrics using Calculated Attributes
Metrics and calculated attributes can be used in sequence. For example. a metric to count the average orders of a customer.
Build as a metric in customer
entity over a calculated attribute in same entity.
-
Build a per-customer order count calculated attribute as above, call it
customer.customer_order_count
-
Build a simple metric in
customer
on it to take the average
Calculated attribute are not dynamic, and ignore user context such as chosen filters and groups.
Dynamic multi-level metrics can be built using dynamic grouping to be context sensitive at every level of aggregation.
I have an aggregation, is it a Calculated Attribute or a Metric?
Depends. Should it be dynamic to the user or a static attribute of an entity?
If the aggregation logic is reused for different GROUP BYs and filters, then make it a metric.And if the aggregation is reused to derive other metrics (like ratios), then make it a metric.
If the aggregation result per entity row is reused for computations, then its a calculated attribute.
When in doubt, make it a metric. It easy to convert a metric to a calculated attribute when needed.
YAML Schema
Every metric is backed by a text file in git that defines it, and keeps history of every change.
The schema for a metric is:
Fields:
entity
: Name of entity the attribute belongs toname
: Name of attributedisplay_name
,owner
,description
,labels
,folder
,hidden
: Metadatarollup
: Automatic rollup behavior (see aggregate awareness)datatype
: Data type of metric, as exposed to JDBC (number by default)sql
: SQL statement of the metric (must be aggregative)metadata
: Additional metadata for the metric (see examples for Tableau and Power BI)
For example, a metric counting done orders
Was this page helpful?