Introduction to relations

Usually, there is more than one entity in data. For example, TPC-H example has few “natural” entities in the data: customers, orders, line items of an order and parts in a line item.

Relations have directions - a customer may have many orders. Each order may have many line items.

A Relation in Honeydew connects entities. It can be a 1:1 or a 1:many relationship.

The “1” side of a relationship always goes to a Granularity Key of an entity.

For example, orders and customers are connected: custkey of orders to custkey of customers. custkey is the granularity key of customers.

Relations in TPC-H example

Once there is a relationship defined, you can look at data together. For example, you may want to see all the orders of a customer. Honeydew will know how to connect the two.

Honeydew engine assumes data corresponds to the relations configuration:

  • 1:1 relations are unique on both sides
  • 1:many relations are unique on one side

Unexpected results may be otherwise.

It is good practice to test the relationship 1 side column(s) for uniqueness (automatic testing within Honeydew coming soon)

Join Direction

Relationship typically convert to JOINs when are used to compute semantics.

Join direction between relations can be configured per relation, or per dynamic dataset.

Default Join Type

Following types are supported:

  • Inner
  • Left
  • Right
  • Full Outer

Unless other specified, the chosen join type will be used in queries.

By default, if not configured otherwise, Honeydew connects entities via INNER joins.

Overriding in Dynamic Datasets

Sometimes, in a specific query, you would want to change the join direction.

In Honeydew, this is possible in a dynamic dataset configuration.

Automatic Join Type selection

Coming Soon: Automatic join selection, and complex joins (exclusive, inclusive, semi) - based on entity inclusivity expression