Introduction to Metrics

A metric is an aggregation that is counted on many rows (unique instances) of an entity.

Unlike attributes that are like a “virtual column” a metric has no direct equivalent in SQL: it is a standalone aggregation, without a GROUP BY.

Metrics are the most important building block of a semantic layer, as most business analytic use cases are centered around KPIs.

A metric is a highly reusable building block:

  • It can be sliced and diced in any way a user wants with a Dynamic Dataset or a BI tool using JDBC
  • It can be used to build new metrics that combine other metrics
  • It can be used to build new attributes that aggregate to a specific entity
  • It can used to build derived filtered metrics by applying a filter qualifier
  • It can used to build derived metrics that change a level of detail by applying a partial grouping qualifier

See examples below.

How Metrics and Entities relate?

An entity defines a granularity (like a customer or an order). A metric on an entity is aggregated on that granularity (like counting customers, or summing up orders).

The entity a metric is defined on is very important. Every time a metric is calculated Honeydew will make sure it is calculated at the right level.

In Honeydew, metrics must be associated with a granularity they are counted on.

This is different than a grouping (“Level of Detail”)! A count of orders can be monthly or weekly, by customer or by order type. But regardless to how it is grouped, it always counts orders.

Metadata

Metrics may include metadata such as their owner, business description, labels. See metadata section for more details.

Basic metric

A basic metric only depends on attributes in its entity.

For example, we can define a revenue metric on that counts the total revenue:

SUM(lineitems.price*(1-lineitems.discount))

This code is standard SQL. Anything in SQL that is allowed to be used in an aggregation (like functions, nested data unpacking, type conversions, etc..) just works.

It is a reusable aggregation, that can be grouped by, filtered, and reused by the user in different ways.

But unlike a calculated attribute, a metric is dynamic - it only has value with a grouping.

Things like are not possible in an SQL aggregation (like window functions) are not possible as-is in metric definition as well. However, any logic that is encapsuled in an attribute is possible - so it is possible to use a window function as part of a calculated attribute definition, and then aggregate on that.

Count or distinct count?

Metrics are automatically calculated of the level of their entity granularity key. When different metrics are combined, Honeydew will make sure to calculate each on its right granularity.

For example, consider a count metric in customers that is defined as COUNT(customers.custkey). Since this is defined on customers, it will always count distinct customers. There is no need to ask for COUNT(**DISTINCT** customers.custkey),

Furthermore Every aggregation (like SUM, MEDIAN, AVG, …) on customers will always be calculated on distinct customers automatically.

Which attributes a metric can use?

A metric can use in its calculation every attribute from the same or lower granularity, similar to a calculated attribute. See the combining entities section of calculated attributes for details.

Calculated attributes using metrics

When a metric or an aggregation is used in a calculated attribute it is computed at the granularity of the entity.

For example, you may want to look for each customer its total lifetime revenue. There is a revenue metric in lineitems that sums up the revenue. To project it to a specific customer, make a calculated attribute in customers with the following SQL:

lineitems.revenue

When it is a calculated attribute it means “For each row of customer that has a unique customer id, calculate the total lineitems.revenue“.

Once a metric or aggregation is referenced in a calculated attribute it loses its dynamic behavior. The total_lifetime_revenue of a customer is the same regardless of the user query.

Most use cases that involve multiple computation steps will be calculated attributes on metrics.

Another example would be to have the first order date of a customer as one its properties.

For that, build as a calculated attribute in customer entity.

MIN(orders.o_orderdate)

Can reference a metric by name (such as lineitems.revenue) or just write the aggregation SQL. Both work!

Usually would make a named metric if it is reusable in other scenarios, and an ad-hoc aggregation expression otherwise.

Multi-level Metrics

Multi-level metrics change granularities. For example. a metric to count the average orders of a customer.

Build as a metric in customer entity over a calculated attribute in same entity.

  1. Build a per-customer order count calculated attribute as above, call it customer.customer_order_count

    orders.count
    
  2. Build a simple metric in customer on it to take the average

    AVG(customer.customer_order_count)
    

Derived Metrics

Like calculated attributes, metrics can also be reused as building blocks.

To reuse a metric, refer to it by name within an expression. When a metric is reused, can apply qualifiers to change how it behaves:

  • A FILTER qualifier will change which rows are used for aggregation
  • A GROUP BY qualifier will change how the metric is grouped

Using qualifiers allows to easily build very complex business calculations.

Filtered metrics

In the TPC-H sample dataset, the attribute part.type that described the type of a part sold in the store will include the word “PROMO” is that part is a promotion.

Consider the following metric that calculates revenue without promotions called revenue_from_promotions:

lineitems.revenue FILTER (WHERE parts.type like 'PROMO%')

This is a filtered metric, that uses the revenue metric as a building block.

Note that this metric on lineitems used an attribute from parts. Metrics can span multiple entities! Any attribute of same or lower granularity can be used. See more in combining entities for calculated attributes

Combining metrics

Metrics can be combined together, for example to create a ratio metric.

For example, the promo_revenue metric is defined as the ratio of revenue that is coming from promotions. Its code is:

100 * lineitems.revenue FILTER (WHERE parts.type like 'PROMO%') / lineitems.revenue

Or can use the revenue_from_promotions defined above:

100 * lineitems.revenue_from_promotions / lineitems.revenue

Time-Based Metrics

Most metrics operate on time and can build a time series or time related calculations (i.e. “year-to-date revenue”).

When building metrics, can define how a metric is related to a time spine. See time metrics for more details.

Controlling Level of Detail

A metric is normally grouped by the user definition (as set in a Dynamic Dataset or via a SQL query on JDBC).

But there are many cases where would want to reuse the same metric while controlling the level of detail, such as:

  • Nested aggregations (monthly average daily customer count)
  • Ratio metrics (daily customer count as a ratio of monthly customers count)

And many others.

Some BI systems call “Metric Dimensionality” or “Multi-Level metrics”. In Honeydew, a metric has “Granularity” (the entity it is counted on) and “Level of Detail” (the grouping).

There are different ways to control level of detail:

  1. User set (the default of metric)
  2. Fixed (use a preset group, ignoring what the user chose)
  3. Include (add user grouping to a preset group)
  4. Exclude (remove user grouping from a preset group)

See examples and use cases for each below:

Fixed Level of Detail

This metric is always grouped by the date

lineitems.revenue GROUP BY (orders.orderdate)

This is useful for ratio metrics such as “the revenue as a ratio of daily revenue”

lineitems.revenue / lineitems.revenue GROUP BY (orders.date)

Calculated attributes using metrics with fixed level of detail

Fixed level of detail can be combined with calculated attributes (see above).

By default, a metric in a calculated attribute is grouped by the granularity of its entity.

However, can change this with an explicit GROUP BY.

This, as an attribute in customer, is computed at the customer granularity:

lineitem.revenue

That is, it is the same as defining it grouped by the customer key

lineitem.revenue GROUP BY (customer.c_custkey)

But can remember per customer the metric aggregated by other groups:

lineitem.revenue GROUP BY (customer.c_mktsegment)

This will add to a customer a calculated attribute of the total revenue in the customer segment.

It can be further used, for example

-- customer revenue as ratio of revenue in segment
100.0*lineitem.revenue GROUP BY (customer.c_custkey) / lineitem.revenue GROUP BY (customer.c_mktsegment)

Dynamic Level of Detail

By default, a metric is grouped by user choice.

But it is possible to mix user-set groups with fixed groups, like this lineitems.daily_revenue metric:

lineitems.revenue GROUP BY (*, orders.orderdate)

This would be grouped by user choice + the date of the order

This is useful for multi-level aggregative metric such as “average daily revenue”

AVG(lineitems.daily_revenue)

This metric can then be grouped by the user query (for example to get average daily revenue by customer segment)

The way it works is that the outer aggregation (AVG) would be grouped by user choice (customer segment), and the inner aggregation by user choice AND date. The AVG will average our over dates, while keeping the grouping selection the user has provided (customer segment).

I have an aggregation, is it a Calculated Attribute or a Metric?

Depends. Should it be dynamic to the user or a static attribute of an entity?

If the aggregation logic is reused for different GROUP BYs and filters, then make it a metric.And if the aggregation is reused to derive other metrics (like ratios), then make it a metric.

If the aggregation result per entity row is reused for computations, then its a calculated attribute.

When in doubt, make it a metric. It easy to convert a metric to a calculated attribute when needed.

YAML Schema

Every metric is backed by a text file in git that defines it, and keeps history of every change.

The schema for a metric is:

type: metric
entity: [entity name]
name: [metric name]
display_name: [display name]
owner: [owner]
labels: [...]
folder: [folder]
hidden: [True/False/Yes/No]
desciption: |-
  [description]
datatype: [datatype - number/string/bool/...]
sql: |-
  [sql statement]

Fields:

  • entity: Name of entity the attribute belongs to
  • name: Name of attribute
  • display_name, owner, description, labels, folder, hidden: Metadata
  • datatype: Data type of metric, as exposed to JDBC (number by default)
  • sql: SQL statement of the metric (must be aggregative)

For example, a metric counting done orders

type: metric
entity: orders
name: count_done_orders
datatype: number
sql: |-
  COUNT(*) FILTER (WHERE orders.o_orderstatus = 'O')