Relations
Introduction to relations
Usually, there is more than one entity in data. For example, TPC-H example has few “natural” entities in the data: customers, orders, line items of an order and parts in a line item.
Relations have directions - a customer may have many orders. Each order may have many line items.
A Relation in Honeydew connects entities. It can be a 1:1 or a 1:many relationship.
The “1” side of a relationship always goes to a Granularity Key of an entity.
For example, orders
and customers
are connected: custkey
of orders
to custkey
of customers
. custkey
is the granularity key of customers
.
Once there is a relationship defined, you can look at data together. For example, you may want to see all the orders of a customer. Honeydew will know how to connect the two.
Relationships control:
- How JOINs are performed between entities
- How filters on entities affect other entities
- How aggregations are performed across entities
Honeydew engine assumes that data fits the relations configuration:
- 1:1 relations are unique on both sides
- 1:many relations are unique on one side
It is a good practice to test the relationship 1 side column(s) for uniqueness during data preparation.
Join Properties
Join Type
Following types are supported:
- Inner
- Left
- Right
- Full Outer
By default, if not configured otherwise, Honeydew connects entities via LEFT join between many side (for example a fact table) and the 1 side (for example a dimension table)
Direction
Relationships typically convert to JOINs when are used to compute semantics.
The direction (1:many) affects how calculations are performed after the join, by telling the engine which rows are duplicated.
See mixing granularities for more information on how that is handled.
Honeydew does not support 1:1 relationships. When connecting entities with 1:1 relation choose a many:1 direction between them, and connect any other shared dimensions between both to the many side.
If a 1:1 semantics are a must then create a joint entity (can be done using custom SQL) and map that instead.
Connecting Columns
Relationship must include the choice for source and destination columns. Can define more than one for a composite join connection.
Honeydew validates that the columns match the join direction - i.e. the 1 side of a relationship must be a key.
Union
Honeydew does not support directly a UNION connection between entities.
If need to connect tables A and B, create a custom SQL query source data with the following sql:
Cross-Filtering
What happens when you query attributes from one entity (customers.name
), while filtering on another
entity (parts.brand = 'Brand#55'
) ? Would you get the list of customers that had orders of parts of that brand,
or would you get all customers?
The answer is based on whether a filter from parts
“flows” the path of parts
→ order_lines
→ orders
→ customers
or not. Each relationship can control in what directions can filters pass through (“cross-filter”).
Following settings are supported (given a 1:many relationship):
- Both directions (default)
- From one to many
- From many to one
- Not at all
Filters from many to one have a higher performance impact, as they can require more JOINs to execute.
Default Cross Filtering
Cross filtering options depend on join type:
Join Type (Between many side to 1 side) | Cross filtering options | Default |
---|---|---|
INNER, RIGHT | Both, One-to-many | Both |
LEFT, OUTER | Both, One-to-many, Many-to-one, None | Both |
Example Use Cases
Any supported combination can be used. Common use cases include:
-
Users look at data from dimensions and facts using filters on both: use
both
cross-filtering andLEFT
join type.In that case every filter used by the user will propagate to any entity regardless on where it comes from.
-
Same, but with referential integrity assumption (i.e. it is never desired to count orders of a non existing customer): use
both
cross-filtering andINNER
join type.In that case every filter used by the user will propagate to any entity regardless on where it comes from. Additionally, any values that do not exist in a dimension will be removed from the fact when joined.
-
Users look at metrics based on a fact table, and use a dimension entity only as filter or a group by: use
one-to-many
cross- filtering andRIGHT
join direction.In that all the values of that dimension always present in a group by, and can filter the fact.
-
Dimension entity that is only as part of a filtered metric (typically with attribute lookup): use
none
andOUTER
so no actions on either fact or dimension impact values directly, and only filter the metric.
Modeling many:many relations
Honeydew does not support a many:many relationship . However a many:many relationship is possible by building a connecting table (sometimes called a “junction table”):
For example authors and books is a many:many relation (authors may write many books, and a book may have multiple authors).
To model it, create a connection table that has as key book id and author id, and connects to both
Metrics and calculations that refer to both can be added on the connection table.
YAML Schema
Relations are defined in the entity YAML.
The schema for relation subsection is:
Fields:
target_entity
: entity to connect toconnection
: a list of fields for a join condition. Must fully cover the primary key of the entity on the “one” side of the relation.rel_type
: logical direction of the relationshiprel_join_type
(optional): the type of default join to applycross_filtering
(optional): the cross-filtering configuration to apply
Was this page helpful?