Entities

Entity

An Entity in Honeydew is defined as “a collection of stuff that shares the same granularity”. For example, in TPC-H we can make an entity called orders that holds all the data we have per order. The orders entity is based on a table in Snowflake that has a unique key column (orderkey) and one row per order. All properties of a single order are in its columns.

When modeling a SQL data warehouse, entities like customers may be stored in tables called “Dimensions”, and events on them, like payments, in tables called “Facts”. If you start from a well-architected data warehouse, import each existing table in it (dimension or fact) as an entity in Honeydew. But don’t worry if you don’t! We are here to make modeling business entities easy. For a more in-depth look at modeling facts and dimensions, see here.

Granularity Key

The most important part of an Entity is its “Granularity Key”. It defines what counts as a unique instance of the Entity. For example, in the orders entity the key is orderkey, which identifies a unique order.

A Granularity Key can be a combination of several attributes that are unique together. This is called a composite or compound key. For example, in the TPC-H lineitems entity, the key is the combination of orderkey and linenumber.

An entity can be key-less, but key-less entities have significant limitations. The most important one is that a key-less entity can only be the higher-granularity side of every entity related to it.

The Honeydew engine assumes entity keys are unique and non-null. Results may be unexpected otherwise.

It is good practice to test the key column(s) for uniqueness (automatic testing within Honeydew coming soon).

If the key (or any of its attributes) contains NULL values that you intend to use as key values and in join operations, it is recommended to create a new calculated attribute that replaces the NULL values with placeholders (e.g., 'NULL' or 'N/A') using the COALESCE function.
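The key checks described above can be sketched outside Honeydew. The example below is a minimal illustration, not a Honeydew feature: it uses Python's built-in sqlite3 module as a stand-in for the warehouse, and the check_key helper, the lineitems sample rows, and the shipmode column are assumptions made up for the example.

```python
import sqlite3


def check_key(conn, table, key_columns):
    """Return (rows_with_null_key, duplicated_key_values) for the key column(s).

    Hypothetical helper: table and column names are trusted identifiers here.
    """
    cols = ", ".join(key_columns)
    null_pred = " OR ".join(f"{c} IS NULL" for c in key_columns)
    null_rows = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {null_pred}"
    ).fetchone()[0]
    duplicate_keys = conn.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {cols} FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return null_rows, duplicate_keys


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitems (orderkey INTEGER, linenumber INTEGER, shipmode TEXT)"
)
conn.executemany(
    "INSERT INTO lineitems VALUES (?, ?, ?)",
    [(1, 1, "AIR"), (1, 2, "RAIL"), (2, 1, None), (2, 1, "SHIP")],
)

# Composite key: (orderkey, linenumber) should be unique together.
composite_result = check_key(conn, "lineitems", ["orderkey", "linenumber"])

# A candidate key column that contains NULL values.
shipmode_result = check_key(conn, "lineitems", ["shipmode"])

# Replace NULL with a placeholder, as a calculated attribute would with COALESCE.
coalesced = [
    row[0]
    for row in conn.execute(
        "SELECT COALESCE(shipmode, 'N/A') FROM lineitems ORDER BY rowid"
    )
]
```

In this sample data the composite key has one duplicated value, the pair (2, 1), and the shipmode column has one NULL row. Either finding would need fixing before the column(s) are used as a granularity key.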

Metadata

Entities may include metadata such as their owner, business description, and labels. See the metadata section for more details.

Source Data

The source data for an entity is set in its Source Data definition. Entities don’t have to be based on a physical table in the data warehouse. The following source data types are possible:

A physical table or a view
A custom SQL query that defines the data for the entity
A virtual entity that is based on a calculation

The definition of the source table (regardless of its type) sets columns that become attributes of the entity.

When defining entities in the UI, the entity granularity key must come from its source table. If the entity key is based on a calculated attribute, first create a key-less entity, add the calculated attribute, and then set it as the key.

Virtual Entity

An Entity is defined by its granularity key and source table. However, sometimes that granularity comes from attributes that exist in the semantic model.

Use Cases

There are a few reasons to make a virtual entity:

Nested or denormalized data tables that combine a few levels of granularity.
Building 1:many relationships to a level of granularity that is not an entity key.

Interface

For a virtual entity, you must define:

Source entity (it can be virtual as well)
Granularity key that comes from the source entity
Attributes that come from the source entity that are at the virtual entity granularity

For example, an event table might include a user_id column and a user_name column, where user_name is determined by user_id and is repeated on every event row. If you have an events entity (on the event table), you can build from it a users entity with user_id as the key and user_name as an attribute.
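To make the resulting granularity concrete, here is a minimal sketch of the rows a users virtual entity would represent. It only illustrates the idea and does not show how Honeydew computes virtual entities: sqlite3 stands in for the warehouse, and the events table layout and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id INTEGER, user_id INTEGER, user_name TEXT, action TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, 10, "ada", "login"),
        (2, 10, "ada", "purchase"),
        (3, 20, "grace", "login"),
    ],
)

# One row per user_id: the granularity of the users entity.
# user_name is assumed constant per user_id, so MIN() just picks that value.
users = conn.execute(
    "SELECT user_id, MIN(user_name) AS user_name "
    "FROM events "
    "WHERE user_id IS NOT NULL "
    "GROUP BY user_id "
    "ORDER BY user_id"
).fetchall()
```

Three event rows collapse into two user rows, one per distinct user_id, each carrying the user_name that was duplicated in the event table.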

Virtual entity key

Virtual entities have a single granularity key - multi-attribute granularity is not supported.

To create a composite key from a few fields, build a calculated attribute for the composition. For example, if the key is order_id and line_id, you can create a calculated attribute on the source entity with SQL: HASH(order_lines.order_id, order_lines.line_id), and then use it as the key of the virtual entity.
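The sketch below illustrates the idea of collapsing two fields into one key value. SQLite has no built-in HASH function, so the example registers a hypothetical Python stand-in under that name. Its output values differ from Snowflake's HASH, and the order_lines sample rows are assumptions.

```python
import hashlib
import sqlite3


def hash_key(*parts):
    """Stand-in for a warehouse HASH function: maps several values to one
    signed 64-bit integer. Not the same values as Snowflake's HASH."""
    digest = hashlib.sha256(repr(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


conn = sqlite3.connect(":memory:")
# narg=-1 lets the SQL function accept any number of arguments.
conn.create_function("HASH", -1, hash_key)

conn.execute("CREATE TABLE order_lines (order_id INTEGER, line_id INTEGER)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)]
)

rows = conn.execute(
    "SELECT order_id, line_id, "
    "HASH(order_lines.order_id, order_lines.line_id) AS line_key "
    "FROM order_lines ORDER BY order_id, line_id"
).fetchall()

# The single-attribute key values, one per (order_id, line_id) pair.
line_keys = [row[2] for row in rows]
```

Each (order_id, line_id) pair gets its own single key value, which is what a virtual entity needs since it supports only one key attribute.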

NULL key values are filtered.

To create a filtered virtual entity that only has values for specific rows, build a calculated attribute that is not NULL for the relevant rows only. For example, for a virtual entity of valid order_id values only, you can create a calculated attribute in orders with SQL: CASE WHEN orders.valid THEN orders.order_id ELSE NULL END, and then use it as the key of the virtual entity.
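The effect of the NULL-key filter can be sketched as follows. This is an illustration under assumptions, not Honeydew's implementation: sqlite3 stands in for the warehouse, the valid flag is stored as 0/1, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, valid INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)]
)

# The calculated attribute is NULL for invalid orders; rows with a NULL key
# are then dropped, mirroring "NULL key values are filtered".
valid_keys = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT valid_order_id FROM ("
        "  SELECT CASE WHEN orders.valid THEN orders.order_id ELSE NULL END"
        "    AS valid_order_id"
        "  FROM orders"
        ") WHERE valid_order_id IS NOT NULL "
        "ORDER BY valid_order_id"
    )
]
```

Order 2 is not valid, so its calculated key is NULL and it does not appear among the virtual entity's key values.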

YAML Schema

Every entity is backed by a text file in git that defines it and keeps a history of every change. The schema for an entity is:

type: entity
name: [name]
display_name: [display name]
owner: [owner]
description: |-
  [description...]
labels: [...]
folder: [folder]
hidden: [True/False/Yes/No]
keys:
  - [key attr 1]
  - ...
key_dataset: [key dataset]
is_time_spine: [True/False/Yes/No]
relations: [relations - see relations for details]
metadata:
  [metadata]

Fields:

name: Name of entity
display_name, owner, description, labels, folder, hidden: Metadata
keys: Granularity keys of entity (can be multiple). Attributes must exist either in dataset or as calculated attributes.
key_dataset: Reference to source table (see below)
is_time_spine: If the entity is a time spine, used for time metrics
metadata: Additional metadata for the entity (see examples for AI)
