Introduction to relations
Usually, there is more than one entity in data. For example, TPC-H example has few “natural” entities in the data: customers, orders, line items of an order and parts in a line item.
Relations have directions - a customer may have many orders. Each order may have many line items.
A Relation in Honeydew connects entities. It can be a 1:1 or a 1:many relationship.
The “1” side of a relationship always goes to a Granularity Key of an entity.
customers are connected:
custkey is the granularity key of
Once there is a relationship defined, you can look at data together. For example, you may want to see all the orders of a customer. Honeydew will know how to connect the two.
Honeydew engine assumes data corresponds to the relations configuration:
- 1:1 relations are unique on both sides
- 1:many relations are unique on one side
Unexpected results may be otherwise.
It is good practice to test the relationship 1 side column(s) for uniqueness (automatic testing within Honeydew coming soon)
Relationship typically convert to JOINs when are used to compute semantics.
Join direction between relations can be configured per relation, or per dynamic dataset.
Default Join Type
Following types are supported:
- Full Outer
Unless other specified, the chosen join type will be used in queries.
By default, if not configured otherwise, Honeydew connects entities via INNER joins.
Overriding in Dynamic Datasets
Sometimes, in a specific query, you would want to change the join direction.
In Honeydew, this is possible in a dynamic dataset configuration.
Automatic Join Type selection
Coming Soon: Automatic join selection, and complex joins (exclusive, inclusive, semi) - based on entity inclusivity expression