Introduction

A shared (also called conformed) dimension is a dimension that has the same meaning to every fact with which it relates.

For example, a user entity might appear in different facts:

  • logins fact table describing login events
  • visits fact table describing visits events

A possible query would be to get the count of visit and login events per user, counting separately on two facts and then joining on the user id. Or get a list of all user events (login and visit) together, requiring to filter the facts and then unioning the result of two facts.

Honeydew automatically detects shared dimensions and supports building the correct queries automatically.

You can see them in the entity graph as entities that have more than one arrow of a higher granularity pointing to them.

The dynamic dataset for counting visits and logins per user will look like:

  • Attributes: user.id, user.name
  • Metrics: logins.count, visits.count

Honeydew engine will detect that user is a shared dimension and build the query accordingly.

Not every combination of fields is possible with shared dimensions.

For example, a user can have both associated visits and associated logins from two fact tables that conform to user. However, without including a user, a visit by itself does not have related login events (a many to many relationship). Thus a query that asks for count of logins per visit is impossible to compute. Including it in a dynamic dataset will lead to failure.

Metrics that combine multiple facts

A common use case with shared dimensions is querying one fact based on information from another fact with a connected shared dimension.

For example, would want to build a metric that counts the visits of all users that had a login last month.

Since there is no direct connection between visits and logins to join on, need to build an intermediate field per user in the users entity that takes data from the logins fact, so it can be used later on when joining users with visits.

To implement the question above:

  1. Build a calculated attribute in the users shared entity named logins_last_month defined as

    logins.count FILTER(WHERE logins.time >= CURRENT_DATE() - 30)
    

    This counts per user the logins (assuming logins has a count metric) in the last month (assuming a time attribute), at the granularity of a user (since the calculated attribute is part of the user entity). See also how metrics behave in calculated attributes.

  2. Can build a dynamic dataset that uses it as a filter to get list of visits to test:

    • Attributes of visits such as visits.visit_id.
    • The filter users.logins_last_month > 0.

Since that dataset only combines users with visits, it is possible to compute.

  1. Build a reusable metric in visits called count_visits_of_recently_logged_in_users defined as

    visits.count FILTER(WHERE users.logins_last_month > 0)
    

    This is a reusable metric based on data from two facts, counting first at the login fact granularity (logins last month) and then at the visit fact granularity (visits of users that logged in last month).