Introduction to Metrics

A metric is an aggregation computed over many rows (unique instances) of an entity.

Unlike attributes, which behave like a “virtual column”, a metric has no direct equivalent in SQL: it is a standalone aggregation, without a GROUP BY.

Metrics are the most important building block of a semantic layer, as most business analytic use cases are centered around KPIs.

A metric is a highly reusable building block:

  • It can be sliced and diced in any way a user wants with a Dynamic Dataset or a BI tool using JDBC
  • It can be used to build new metrics that combine other metrics
  • It can be used to build new attributes that aggregate to a specific entity
  • It can be used to build derived filtered metrics by applying a filter qualifier
  • It can be used to build derived metrics that control grouping to create nested aggregations or dynamic ratio metrics

See examples below.

How do Metrics and Entities relate?

An entity defines a granularity (like a customer or an order). A metric on an entity is aggregated on that granularity (like counting customers, or summing up orders).

The entity a metric is defined on is very important. Every time a metric is calculated, Honeydew will make sure it is calculated at the right level.

In Honeydew, metrics must be associated with a granularity they are counted on.

This is different from a grouping (“Level of Detail”)! A count of orders can be monthly or weekly, by customer or by order type. But regardless of how it is grouped, it always counts orders.
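For example, a minimal sketch of such a metric, defined on the orders entity (the o_orderkey key column is an assumption based on the TPC-H schema):

-- counts orders, regardless of how the user groups or filters them
COUNT(orders.o_orderkey)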

Metadata

Metrics may include metadata such as their owner, business description, and labels. See the metadata section for more details.

Sensitivity to User Context

Metrics are functions that are sensitive to the user query context.

That means that the computed result of a metric depends on the query - the filters and groups in use.

User context includes:

  • Filtering - filters can come from a user query or from a domain.
  • Grouping - groups can come from the user query.

For example, when using the SQL interface to query the semantic layer, the following query:

SELECT
    "orders.o_orderdate",
    SUM("lineitem.revenue")
FROM world.world
WHERE "customer.c_mktsegment" = 'MACHINERY'
GROUP BY 1

Applies the following context to the lineitem.revenue metric:

  1. Filter only customers in the machinery market segment
  2. Group them by order date

Revenue is thus calculated per date, only for machinery customers.

Metrics by default operate on the full user query context.

However, some metrics can change how they interact with the user context - for example, disregard a chosen grouping or add a filter.

That is achieved by using metric qualifiers such as FILTER (WHERE ...) and GROUP BY. See more below.

Basic metric

A basic metric only depends on attributes in its entity.

For example, we can define a revenue metric on lineitems that sums up the total revenue:

SUM(lineitems.price*(1-lineitems.discount))

This code is standard SQL. Anything in SQL that is allowed to be used in an aggregation (like functions, nested data unpacking, type conversions, etc.) just works.

It is a reusable aggregation, that can be grouped by, filtered, and reused by the user in different ways.

But unlike a calculated attribute, a metric is dynamic - it only has a value in the context of a grouping.

Things that are not possible in a SQL aggregation (like window functions) are also not possible as-is in a metric definition. However, any logic that is encapsulated in an attribute is possible - so it is possible to use a window function as part of a calculated attribute definition, and then aggregate on that.
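For example, a minimal sketch, assuming a hypothetical calculated attribute lineitems.price_rank_in_order (the l_orderkey column is an assumption based on the TPC-H schema):

-- calculated attribute lineitems.price_rank_in_order - window functions are allowed here
ROW_NUMBER() OVER (PARTITION BY lineitems.l_orderkey ORDER BY lineitems.price DESC)

-- a metric can then aggregate over that attribute,
-- e.g. revenue coming from the most expensive line of each order
SUM(lineitems.price) FILTER (WHERE lineitems.price_rank_in_order = 1)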

Count or distinct count?

Metrics are automatically calculated at the level of their entity granularity key. When different metrics are combined, Honeydew will make sure to calculate each at its right granularity.

For example, consider a count metric in customers that is defined as COUNT(customers.custkey). Since this is defined on customers, it will always count distinct customers. There is no need to ask for COUNT(DISTINCT customers.custkey).

Furthermore, every aggregation (like SUM, MEDIAN, AVG, …) on customers will always be calculated on distinct customers automatically.

If you build COUNT(DISTINCT ...) aggregations on an ID field that has no corresponding entity, consider making a virtual entity for it and corresponding attributes. Then use its count metric instead.

This practice of creating new dimension tables from a fact table is sometimes called normalizing a schema. Separating denormalized fields into a separate virtual entity allows you to add more properties and metrics, deduplicated to the grain of that ID.
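For example, a sketch of the difference (the o_custkey field on orders is an assumption based on the TPC-H schema):

-- instead of a distinct count over a raw ID field on a fact entity:
COUNT(DISTINCT orders.o_custkey)

-- define (or reuse) a customers entity keyed by that ID and count it directly:
COUNT(customers.custkey)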

Which attributes a metric can use?

A metric can use in its calculation any attribute from the same or a lower granularity, similar to a calculated attribute. See the combining entities section of calculated attributes for details.
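For example, a sketch of a metric on lineitems that also uses an attribute from the customer entity (a lower granularity), reusing names from the examples above:

-- revenue coming only from customers in the machinery market segment
SUM(lineitems.price * (1 - lineitems.discount)) FILTER (WHERE customer.c_mktsegment = 'MACHINERY')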

Derived Metrics

Like calculated attributes, metrics can also be reused as building blocks.

To reuse a metric, refer to it by name within an expression.

For example, you can build a profit metric using the revenue metric:

-- profit metric
lineitem.revenue - sum(partsupp.ps_supplycost * lineitem.l_quantity)

The metric above takes the revenue metric as a function, and uses it as part of a new metric definition.

Break your derived metrics into named components for maximal reusability.

For example, you can write the metric above as two metrics:

  1. A cost_of_goods metric, defined as:
sum(partsupp.ps_supplycost * lineitem.l_quantity)
  2. A profit metric, defined as:
lineitem.revenue - lineitem.cost_of_goods

Default Value

What is the value of a metric when there is no data?

For example, the SUM for no data is typically expected to be 0, not NULL.

When a metric is aggregated within a group, there are a few cases which can be considered as having no data:

  1. Outer joins that create empty (NULL) data in rows where the JOIN has a valid key only on one side of the join.
  2. Filtered metrics that create groups that have no data due to the applied filter
  3. Missing data - when the values of all aggregated rows from the source data are NULL

In these cases, a metric would typically receive the value NULL. To support other default values, use the DEFAULT qualifier:

-- set 0 as the default value for empty groups or missing data for revenue
SUM(lineitems.price*(1-lineitems.discount)) DEFAULT 0

The default value for a SUM or a COUNT aggregation is automatically 0. There is no need to explicitly set it up.

See more below on automatic default values.

This qualifier can also be applied directly to a metric when creating a derived metric:

-- set 0 as the default value for empty groups or missing data for revenue
order_lines.revenue DEFAULT 0

You can use any literal for the default value, not only numbers. For example:

-- Comma-separated list of customer segments, concatenated
LISTAGG(customer.c_mktsegment, ', ') DEFAULT 'Unknown'

Automatic Default Value

Honeydew automatically applies the default value of 0 for SUM and COUNT aggregations.

To disable the default value, use DEFAULT NULL:

-- revenue with no default value set
order_lines.revenue DEFAULT NULL

No Data vs. Missing Data

Honeydew treats a NULL result from a metric calculation the same as an empty group when determining a default value.

That means that missing data (all NULL rows) and no data (empty group) are handled the same way.

To handle only missing data, you can use COALESCE within the metric definition.

For example, this expression sets the average of all-NULL data to be 0:

COALESCE(AVG(entity.value), 0)

If all data in a group is NULL, the expression above will return a value of 0.

However, if there is no data in a group, the value of the expression above would be NULL.

The reason: if there are no rows to aggregate then the expression will not be executed at all.

To deal with empty groups AND missing data use DEFAULT:

AVG(entity.value) DEFAULT 0

Changing User Context

By default, metrics are filtered and grouped according to the user query context.

However there are qualifiers that change the context:

  • A FILTER qualifier will change which rows are used for aggregation
  • A GROUP BY qualifier will change how the metric is grouped

Those qualifiers can be added to any aggregation, for example:

-- first promotion date as a filtered metric
MIN(orders.o_orderdate) FILTER (WHERE parts.type like 'PROMO%')

When a metric is reused within a derived metric, you can apply qualifiers to change how it behaves.

The equivalent derived metric (if orders.min_order_date is already defined):

-- first promotion date as a filtered metric
-- based on the min_order_date metric
orders.min_order_date FILTER (WHERE parts.type like 'PROMO%')

Derived metrics can behave differently from their parents.

Filtered metrics

In the TPC-H sample dataset, the attribute parts.type, which describes the type of a part sold in the store, will include the word “PROMO” if that part is a promotion.

Consider the following metric, called revenue_from_promotions, that calculates revenue from promotions only:

lineitems.revenue FILTER (WHERE parts.type like 'PROMO%')

This is a filtered metric, that uses the revenue metric as a building block.

Note that this metric on lineitems uses an attribute from parts. Metrics can span multiple entities! Any attribute of the same or a lower granularity can be used. See more in combining entities for calculated attributes.

Filtered metrics with attribute lookup

What if the potential values for a filter come from an attribute, instead of being defined in the filtered metric expression?

For example, assume there is a table promotion_part_types that has a list of every promotion type. Promotion revenue is revenue whose part type matches the list of values in that table.

This is called a lookup filter:

lineitems.revenue FILTER (WHERE parts.type IN promotion_part_types.value)

The expression above would generate a sub-query to get the values from promotion_part_types and then filter on the results of that sub-query.

Filters over the promotion_part_types entity can further reduce the possible values if they run before the metric computation (see filtering order for more details).

Filtered metrics in ratios

Metrics can be combined with one another, for example to create a ratio metric.

For example, the promo_revenue metric is defined as the ratio of revenue that is coming from promotions. Its code is:

100 * lineitems.revenue FILTER (WHERE parts.type like 'PROMO%') / lineitems.revenue

Or, you can use the revenue_from_promotions metric defined above:

100 * lineitems.revenue_from_promotions / lineitems.revenue

Time-Based Metrics

Most metrics operate on time and can build a time series or time-related calculations (e.g. “year-to-date revenue”).

When building metrics, you can define how a metric relates to a time spine. See time metrics for more details.

Controlling Grouping

A metric is normally grouped by the user definition (as set in a Dynamic Dataset or via a SQL query on JDBC).

But there are many cases where you would want to reuse the same metric while controlling the grouping, such as:

  • Nested aggregations (monthly average daily customer count)
  • Ratio metrics (daily customer count as a ratio of monthly customers count)

And many others.

Grouping of a metric is sometimes called its “Level of Detail”.

Given a user context grouping (the GROUP BY as asked for by a user in a query), there are different ways to control grouping:

  1. From the user query context (the default)
  2. Fixed (use a preset group, ignoring what the user chose)
  3. Include (add a group to the user context grouping)
  4. Exclude (remove a group from the user context grouping)

See examples and use cases for each below:

Fixed Grouping

This metric is always grouped by the order date:

lineitems.revenue GROUP BY (orders.orderdate)

This is useful for ratio metrics such as “the revenue as a ratio of daily revenue”:

lineitems.revenue / lineitems.revenue GROUP BY (orders.orderdate)

Fixed Empty Grouping

A specific case of fixed grouping is to group by nothing with GROUP BY ().

For example, this metric gives the percentage of revenue out of the total revenue:

lineitems.revenue / lineitems.revenue GROUP BY ()

The expression above operates after user context filters, so the metric will give the percentage out of the revenue after filtering.

For example, if used in a query that is grouped by customer segment and filtered for the year 2000 only, then the result would be the percentage of revenue per segment out of all segments, in 2000 only.
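As a sketch, assuming the expression above is saved as a metric named lineitems.revenue_share (the metric name and the date filter are illustrative), such a query over the SQL interface could look like:

SELECT
    "customer.c_mktsegment",
    "lineitems.revenue_share"
FROM world.world
WHERE YEAR("orders.o_orderdate") = 2000
GROUP BY 1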

Calculated attributes using metrics with fixed grouping

Fixed grouping can be combined with calculated attributes (see above).

By default, a metric used in a calculated attribute is grouped by the granularity of the entity the attribute is defined in.

However, you can change this with an explicit GROUP BY.

An attribute in customer is computed at the customer granularity:

lineitem.revenue

It is the same as defining it in customer grouped by the customer key:

lineitem.revenue GROUP BY (customer.c_custkey)

But you can store, per customer, the metric aggregated by other groups:

lineitem.revenue GROUP BY (customer.c_mktsegment)

This will add to each customer a calculated attribute with the total revenue of its market segment.

When creating calculated attributes with a grouping that is different from the entity key, note that the values will repeat at a different grain than the enclosing entity.

For example, when grouping by customer market segment in the customer entity, then all customers in the machinery market segment will have the same revenue.

This can be dangerous if used for a nested aggregation - for example, if you need the average revenue by market segment, calculating it over a field in the customer entity would produce a biased result.

See below how to build nested aggregations correctly.

Dynamic Grouping: Removing Groups

By default, a metric is grouped by the user query context.

With the GROUP BY (NOT ...) qualifier you can remove a group from the context.

For example,

lineitem.revenue GROUP BY (NOT customer.c_custkey)

Removes grouping per customer key.
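One illustrative use, as a sketch following the ratio pattern shown earlier: each customer's share of revenue out of all customers, while keeping any other groups the user chose (such as order date) intact:

-- per-customer revenue divided by the revenue over the same groups, minus the customer key
lineitem.revenue / lineitem.revenue GROUP BY (NOT customer.c_custkey)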

Private preview functionality: remove all the groupings on an entity.

lineitem.revenue GROUP BY (NOT customer.*)

Dynamic Grouping: Adding Groups and Nested Aggregation

It is possible to add additional groups to the user context grouping, like this lineitems.daily_revenue metric:

lineitems.revenue GROUP BY (*, orders.orderdate)

This would be grouped by user choice + the date of the order.

The main use case for inclusive grouping is nested aggregation.

For example, given daily_revenue, this metric calculates the average daily revenue, grouped to the user context:

AVG(lineitems.daily_revenue)

The way it works:

  1. Given a user context grouping (for example customer segment)
  2. The inner aggregation is grouped by user choice AND date. Here that would be revenue by date and customer segment.
  3. The outer aggregation (AVG) would be grouped by user choice only. The AVG will average over dates, while keeping the customer segment grouping.

The same calculation can be built as a single expression:

AVG(lineitems.revenue GROUP BY (*, orders.orderdate))

Note that aggregations can be nested more than once, changing groups at every step.

Calculated attributes using metrics

When a metric or an aggregation is used in a calculated attribute it is computed at the granularity of the entity.

For example, you may want to compute, for each customer, its total lifetime revenue. There is a revenue metric in lineitems that sums up the revenue. To project it to a specific customer, make a calculated attribute in customers with the following SQL:

lineitems.revenue

When it is a calculated attribute, it means “For each row of customer that has a unique customer id, calculate the total lineitems.revenue”.

Once a metric or aggregation is referenced in a calculated attribute it loses its dynamic behavior. The total_lifetime_revenue of a customer is the same regardless of the user query.

Most use cases that involve multiple computation steps will be calculated attributes on metrics.

Another example would be to have the first order date of a customer as one of its properties.

For that, build it as a calculated attribute in the customer entity:

MIN(orders.o_orderdate)

You can reference a metric by name (such as lineitems.revenue) or just write the aggregation SQL. Both work!

Usually, you would make a named metric if it is reusable in other scenarios, and an ad-hoc aggregation expression otherwise.

Multi-level Metrics using Calculated Attributes

Metrics and calculated attributes can be used in sequence. For example, a metric that computes the average order count of a customer.

Build it as a metric in the customer entity over a calculated attribute in the same entity:

  1. Build a per-customer order count calculated attribute as above, call it customer.customer_order_count

    orders.count
    
  2. Build a simple metric in customer on it to take the average

    AVG(customer.customer_order_count)
    

Calculated attributes are not dynamic, and ignore user context such as chosen filters and groups.

Dynamic multi-level metrics can be built using dynamic grouping to be context sensitive at every level of aggregation.
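As a sketch of a dynamic version of the same calculation, following the inclusive grouping pattern shown above:

-- average order count per customer, recomputed under the user's filters and remaining groups
AVG(orders.count GROUP BY (*, customer.c_custkey))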

I have an aggregation, is it a Calculated Attribute or a Metric?

It depends. Should it be dynamic to the user query or a static attribute of an entity?

If the aggregation logic is reused for different GROUP BYs and filters, then make it a metric. And if the aggregation is reused to derive other metrics (like ratios), then make it a metric.

If the aggregation result per entity row is reused for computations, then it's a calculated attribute.

When in doubt, make it a metric. It is easy to convert a metric to a calculated attribute when needed.

YAML Schema

Every metric is backed by a text file in git that defines it, and keeps a history of every change.

The schema for a metric is:

type: metric
entity: [entity name]
name: [metric name]
display_name: [display name]
owner: [owner]
labels: [...]
folder: [folder]
hidden: [True/False/Yes/No]
rollup: [sum/min/max/.../no_rollup]
description: |-
  [description]
datatype: [datatype - number/string/bool/...]
sql: |-
  [sql statement]

Fields:

  • entity: Name of the entity the metric belongs to
  • name: Name of the metric
  • display_name, owner, description, labels, folder, hidden: Metadata
  • rollup: Automatic rollup behavior (see aggregate awareness)
  • datatype: Data type of metric, as exposed to JDBC (number by default)
  • sql: SQL statement of the metric (must be aggregative)
  • metadata: Additional metadata for the metric (see examples for Tableau and Power BI)

For example, a metric counting done orders:

type: metric
entity: orders
name: count_done_orders
datatype: number
sql: |-
  COUNT(*) FILTER (WHERE orders.o_orderstatus = 'O')
