Metrics
Introduction to Metrics
A metric is an aggregation that is counted on many rows (unique instances) of an entity.
Unlike attributes that are like a “virtual column” a metric has no direct equivalent in SQL: it is a standalone aggregation, without a GROUP BY
.
Metrics are the most important building block of a semantic layer, as most business analytic use cases are centered around KPIs.
A metric is a highly reusable building block:
- It can be sliced and diced in any way a user wants with a Dynamic Dataset or a BI tool using JDBC
- It can be used to build new metrics that combine other metrics
- It can be used to build new attributes that aggregate to a specific entity
- It can used to build derived filtered metrics by applying a filter qualifier
- It can used to build derived metrics that change a level of detail by applying a partial grouping qualifier
See examples below.
How Metrics and Entities relate?
An entity defines a granularity (like a customer or an order). A metric on an entity is aggregated on that granularity (like counting customers, or summing up orders).
The entity a metric is defined on is very important. Every time a metric is calculated Honeydew will make sure it is calculated at the right level.
In Honeydew, metrics must be associated with a granularity they are counted on.
This is different than a grouping (“Level of Detail”)! A count of orders
can be monthly or weekly, by customer or by order type. But regardless to how it is grouped, it always counts orders.
Metadata
Metrics may include metadata such as their owner, business description, labels. See metadata section for more details.
Basic metric
A basic metric only depends on attributes in its entity.
For example, we can define a revenue
metric on that counts the total revenue:
This code is standard SQL. Anything in SQL that is allowed to be used in an aggregation (like functions, nested data unpacking, type conversions, etc..) just works.
It is a reusable aggregation, that can be grouped by, filtered, and reused by the user in different ways.
But unlike a calculated attribute, a metric is dynamic - it only has value with a grouping.
Things like are not possible in an SQL aggregation (like window functions) are not possible as-is in metric definition as well. However, any logic that is encapsuled in an attribute is possible - so it is possible to use a window function as part of a calculated attribute definition, and then aggregate on that.
Count or distinct count?
Metrics are automatically calculated of the level of their entity granularity key. When different metrics are combined, Honeydew will make sure to calculate each on its right granularity.
For example, consider a count
metric in customers
that is defined as COUNT(customers.custkey)
. Since this is defined on customers, it will always count distinct customers. There is no need to ask for COUNT(**DISTINCT** customers.custkey)
,
Furthermore Every aggregation (like SUM
, MEDIAN
, AVG
, …) on customers
will always be calculated on distinct customers automatically.
Which attributes a metric can use?
A metric can use in its calculation every attribute from the same or lower granularity, similar to a calculated attribute. See the combining entities section of calculated attributes for details.
Calculated attributes using metrics
When a metric or an aggregation is used in a calculated attribute it is computed at the granularity of the entity.
For example, you may want to look for each customer its total lifetime revenue. There is a revenue
metric in lineitems
that sums up the revenue.
To project it to a specific customer, make a calculated attribute in customers
with the following SQL:
When it is a calculated attribute it means “For each row of customer that has a unique customer id, calculate the total lineitems.revenue
”.
Once a metric or aggregation is referenced in a calculated attribute it loses its dynamic behavior. The total_lifetime_revenue
of a customer is the same regardless of the user query.
Most use cases that involve multiple computation steps will be calculated attributes on metrics.
Another example would be to have the first order date of a customer as one its properties.
For that, build as a calculated attribute in customer
entity.
Can reference a metric by name (such as lineitems.revenue
) or just write the aggregation SQL. Both work!
Usually would make a named metric if it is reusable in other scenarios, and an ad-hoc aggregation expression otherwise.
Multi-level Metrics
Multi-level metrics change granularities. For example. a metric to count the average orders of a customer.
Build as a metric in customer
entity over a calculated attribute in same entity.
-
Build a per-customer order count calculated attribute as above, call it
customer.customer_order_count
-
Build a simple metric in
customer
on it to take the average
Derived Metrics
Like calculated attributes, metrics can also be reused as building blocks.
To reuse a metric, refer to it by name within an expression. When a metric is reused, can apply qualifiers to change how it behaves:
- A
FILTER
qualifier will change which rows are used for aggregation - A
GROUP BY
qualifier will change how the metric is grouped
Using qualifiers allows to easily build very complex business calculations.
Filtered metrics
In the TPC-H sample dataset, the attribute part.type
that described the type of a part sold in the store will include the word “PROMO” is that part is a promotion.
Consider the following metric that calculates revenue without promotions called revenue_from_promotions
:
This is a filtered metric, that uses the revenue metric as a building block.
Note that this metric on lineitems
used an attribute from parts
. Metrics
can span multiple entities! Any attribute of same or lower granularity can be
used. See more in combining entities for calculated
attributes
Filtered metrics with attribute lookup
What if the potential values for a filter come from an attribute, instead of being defined in the filtered metric expression?
For example, assume there is a table promotion_part_types
that has a list of every promotion type.
A promotion revenue is one that has a part type matching the
list of values on that table.
This is called a lookup filter:
The expression above would generate a sub-query to get the values from promotion_part_types
and
then filter on the results of that sub-query.
Filters over the promotion_part_types
entity can further reduce
the possible values if they run before the metric computation (see
filtering order for more details)
Combining metrics
Metrics can be combined together, for example to create a ratio metric.
For example, the promo_revenue
metric is defined as the ratio of revenue that is coming from promotions. Its code is:
Or can use the revenue_from_promotions
defined above:
Time-Based Metrics
Most metrics operate on time and can build a time series or time related calculations (i.e. “year-to-date revenue”).
When building metrics, can define how a metric is related to a time spine. See time metrics for more details.
Controlling Level of Detail
A metric is normally grouped by the user definition (as set in a Dynamic Dataset or via a SQL query on JDBC).
But there are many cases where would want to reuse the same metric while controlling the level of detail, such as:
- Nested aggregations (monthly average daily customer count)
- Ratio metrics (daily customer count as a ratio of monthly customers count)
And many others.
Some BI systems call “Metric Dimensionality” or “Multi-Level metrics”. In Honeydew, a metric has “Granularity” (the entity it is counted on) and “Level of Detail” (the grouping).
There are different ways to control level of detail:
- User set (the default of metric)
- Fixed (use a preset group, ignoring what the user chose)
- Include (add user grouping to a preset group)
- Exclude (remove user grouping from a preset group)
See examples and use cases for each below:
Fixed Level of Detail
This metric is always grouped by the date
This is useful for ratio metrics such as “the revenue as a ratio of daily revenue”
Calculated attributes using metrics with fixed level of detail
Fixed level of detail can be combined with calculated attributes (see above).
By default, a metric in a calculated attribute is grouped by the granularity of its entity.
However, can change this with an explicit GROUP BY
.
This, as an attribute in customer, is computed at the customer granularity:
That is, it is the same as defining it grouped by the customer key
But can remember per customer the metric aggregated by other groups:
This will add to a customer a calculated attribute of the total revenue in the customer segment.
It can be further used, for example
Dynamic Level of Detail
By default, a metric is grouped by user choice.
But it is possible to mix user-set groups with fixed groups, like this lineitems.daily_revenue
metric:
This would be grouped by user choice + the date of the order
This is useful for multi-level aggregative metric such as “average daily revenue”
This metric can then be grouped by the user query (for example to get average daily revenue by customer segment)
The way it works is that the outer aggregation (AVG
) would be grouped by user choice (customer segment), and the inner aggregation by user choice AND date. The AVG will average our over dates, while keeping the grouping selection the user has provided (customer segment).
I have an aggregation, is it a Calculated Attribute or a Metric?
Depends. Should it be dynamic to the user or a static attribute of an entity?
If the aggregation logic is reused for different GROUP BYs and filters, then make it a metric.And if the aggregation is reused to derive other metrics (like ratios), then make it a metric.
If the aggregation result per entity row is reused for computations, then its a calculated attribute.
When in doubt, make it a metric. It easy to convert a metric to a calculated attribute when needed.
YAML Schema
Every metric is backed by a text file in git that defines it, and keeps history of every change.
The schema for a metric is:
Fields:
entity
: Name of entity the attribute belongs toname
: Name of attributedisplay_name
,owner
,description
,labels
,folder
,hidden
: Metadatarollup
: Automatic rollup behavior (see aggregate awareness)datatype
: Data type of metric, as exposed to JDBC (number by default)sql
: SQL statement of the metric (must be aggregative)metadata
: Additional metadata for the metric (see examples for Tableau and Power BI)
For example, a metric counting done orders
Was this page helpful?