Introduction to Metrics
A metric is an aggregation that is counted on many rows (unique instances) of an entity.
Unlike attributes that are like a “virtual column” a metric has no direct equivalent in SQL: it is a standalone aggregation, without a
Metrics are the most important building block of a semantic layer, as most business analytic use cases are centered around KPIs.
A metric is a highly reusable building block:
- It can be sliced and diced in any way a user wants with a Dynamic Dataset or a BI tool using JDBC
- It can be used to build new metrics that combine other metrics
- It can be used to build new attributes that aggregate to a specific entity
- It can used to build derived filtered metrics by applying a filter qualifier
- It can used to build derived metrics that change a level of detail by applying a partial grouping qualifier
See examples below.
How Metrics and Entities relate?
An entity defines a granularity (like a customer or an order). A metric on an entity is aggregated on that granularity (like counting customers, or summing up orders).
The entity a metric is defined on is very important. Every time a metric is calculated Honeydew will make sure it is calculated at the right level.
In Honeydew, metrics must be associated with a granularity they are counted on.
This is different than a grouping (“Level of Detail”)! A count of
orders can be monthly or weekly, by customer or by order type. But regardless to how it is grouped, it always counts orders.
Metrics may include metadata such as their owner, business description, labels. See metadata section for more details.
Every metric is backed by a text file in git that defines it, and keeps history of every change. See schema for more details on the underlying representation.
A basic metric only depends on attributes in its entity.
For example, we can define a
revenue metric on that counts the total revenue:
This code is standard SQL. Anything in SQL that is allowed to be used in an aggregation (like functions, nested data unpacking, type conversions, etc..) just works.
It is a reusable aggregation, that can be grouped by, filtered, and reused by the user in different ways.
But unlike a calculated attribute, a metric is dynamic - it only has value with a grouping.
Things like are not possible in an SQL aggregation (like window functions) are not possible as-is in metric definition as well. However, any logic that is encapsuled in an attribute is possible - so it is possible to use a window function as part of a calculated attribute definition, and then aggregate on that.
Count or distinct count?
Metrics are automatically calculated of the level of their entity granularity key. When different metrics are combined, Honeydew will make sure to calculate each on its right granularity.
For example, consider a
count metric in
customers that is defined as
COUNT(customers.custkey). Since this is defined on customers, it will always count distinct customers. There is no need to ask for
Furthermore Every aggregation (like
AVG, …) on
customers will always be calculated on distinct customers automatically.
Which attributes a metric can use?
A metric can use in its calculation every attribute from the same or lower granularity, similar to a calculated attribute. See the combining entities section of calculated attributes for details.
Calculated attributes using metrics
When a metric or an aggregation is used in a calculated attribute it is computed at the granularity of the entity.
For example, you may want to look for each customer its total lifetime revenue. There is a
revenue metric in
lineitems that sums up the revenue.
To project it to a specific customer, make a calculated attribute in
customers with the following SQL:
When it is a calculated attribute it means “For each row of customer that has a unique customer id, calculate the total
Once a metric or aggregation is referenced in a calculated attribute it loses its dynamic behavior. The
total_lifetime_revenue of a customer is the same regardless of the user query.
Most use cases that involve multiple computation steps will be calculated attributes on metrics.
Another example would be to have the first order date of a customer as one its properties.
For that, build as a calculated attribute in
Can reference a metric by name (such as
lineitems.revenue) or just write the aggregation SQL. Both work!
Usually would make a named metric if it is reusable in other scenarios, and an ad-hoc aggregation expression otherwise.
Multi-level metrics change granularities. For example. a metric to count the average orders of a customer.
Build as a metric in
customer entity over a calculated attribute in same entity.
Build a per-customer order count calculated attribute as above, call it
Build a simple metric in
customeron it to take the average
Like calculated attributes, metrics can also be reused as building blocks.
To reuse a metric, refer to it by name within an expression. When a metric is reused, can apply qualifiers to change how it behaves:
FILTERqualifier will change which rows are used for aggregation
GROUP BYqualifier will change how the metric is grouped
Using qualifiers allows to easily build very complex business calculations.
In the TPC-H sample dataset, the attribute
part.type that described the type of a part sold in the store will include the word “PROMO” is that part is a promotion.
Consider the following metric that calculates revenue without promotions called
lineitems.revenue FILTER (WHERE parts.type like 'PROMO%')
This is a filtered metric, that uses the revenue metric as a building block.
Note that this metric on
lineitems used an attribute from
can span multiple entities! Any attribute of same or lower granularity can be
used. See more in combining entities for calculated
Metrics can be combined together, for example to create a ratio metric.
For example, the
promo_revenue metric is defined as the ratio of revenue that is coming from promotions. Its code is:
100 * lineitems.revenue FILTER (WHERE parts.type like 'PROMO%') / lineitems.revenue
Or can use the
revenue_from_promotions defined above:
100 * lineitems.revenue_from_promotions / lineitems.revenue
Controlling Level of Detail
A metric is normally grouped by the user definition (as set in a Dynamic Dataset or via a SQL query on JDBC).
But there are many cases where would want to reuse the same metric while controlling the level of detail, such as:
- Nested aggregations (monthly average daily customer count)
- Ratio metrics (daily customer count as a ratio of monthly customers count)
And many others.
Some BI systems call “Metric Dimensionality” or “Multi-Level metrics”. In Honeydew, a metric has “Granularity” (the entity it is counted on) and “Level of Detail” (the grouping).
There are different ways to control level of detail:
- User set (the default of metric)
- Fixed (use a preset group, ignoring what the user chose)
- Include (add user grouping to a preset group)
- Exclude (remove user grouping from a preset group)
See examples and use cases for each below:
Fixed Level of Detail
This metric is always grouped by the date
lineitems.revenue GROUP BY (orders.orderdate)
This is useful for ratio metrics such as “the revenue as a ratio of daily revenue”
lineitems.revenue / lineitems.revenue GROUP BY (orders.date)
Calculated attributes using metrics with fixed level of detail
Fixed level of detail can be combined with calculated attributes (see above).
By default, a metric in a calculated attribute is grouped by the granularity of its entity.
However, can change this with an explicit
This, as an attribute in customer, is computed at the customer granularity:
That is, it is the same as defining it grouped by the customer key
lineitem.revenue GROUP BY (customer.c_custkey)
But can remember per customer the metric aggregated by other groups:
lineitem.revenue GROUP BY (customer.c_mktsegment)
This will add to a customer a calculated attribute of the total revenue in the customer segment.
It can be further used, for example
-- customer revenue as ratio of revenue in segment 100.0*lineitem.revenue GROUP BY (customer.c_custkey) / lineitem.revenue GROUP BY (customer.c_mktsegment)
Dynamic Level of Detail
By default, a metric is grouped by user choice.
But it is possible to mix user-set groups with fixed groups, like this
lineitems.revenue GROUP BY (*, orders.orderdate)
This would be grouped by user choice + the date of the order
This is useful for multi-level aggregative metric such as “average daily revenue”
This metric can then be grouped by the user query (for example to get average daily revenue by customer segment)
The way it works is that the outer aggregation (
AVG) would be grouped by user choice (customer segment), and the inner aggregation by user choice AND date. The AVG will average our over dates, while keeping the grouping selection the user has provided (customer segment).
I have an aggregation, is it a Calculated Attribute or a Metric?
Depends. Should it be dynamic to the user or a static attribute of an entity?
If the aggregation logic is reused for different GROUP BYs and filters, then make it a metric.And if the aggregation is reused to derive other metrics (like ratios), then make it a metric.
If the aggregation result per entity row is reused for computations, then its a calculated attribute.
When in doubt, make it a metric. It easy to convert a metric to a calculated attribute when needed.