Creating Business Context

A user in Finance or Marketing on a semantic layer does not want or may not be allowed access to everything. They want to see semantics relevant to them, with their context applied.

That concept is called in Honeydew a “Domain”.

A Domain is a lightweight governance object that allows to build context for users, as well as maintain access control on data and on metadata.

That context includes:

  1. Selection of entities and sub-selection of fields, that are accessible when using the domain
  2. Selection of relationships when multiple ways to connect entities are available
  3. Selection of filters that must be applied on every query on the domain
  4. Selection of parameter overrides that apply in the domain context

Selecting entities

Domain allows to select entities that participate. For example, a domain that select a subset of entities from TPCH:

type: domain
name: orders_domain
description: Orders Tracing Domain

# Entities and fields that participate
entities:
  - name: lineitem
  - name: orders
  - name: partsupp
  - name: part

Filters

Domain may control how data is filtered.

Semantic Filters

Filters within a domain apply to every query on the domain. Think of them as filters every user must add to every query.

For example,

type: domain
name: orders_domain
description: Domestic Orders Tracing Domain

# Entities and fields that participate
entities:
  - name: lineitem
  - name: orders
  - name: partsupp
  - name: part

# Semantic Filters that always apply to every query on the domain
filters:
  - lineitem.l_shipmode in ('MAIL', 'RAIL', 'TRUCK')

Semantic filters can be ad-hoc expressions (as in the example above) or reference named predefined filters.

A semantic filter will be always added to a query in a domain context. That means that semantic filters may have a perfomrance impact when they introduce more JOINs.

For example, this SQL query on the domain above that asks for part count:

SELECT AGG("part.count") FROM domains.orders_domain

will include a JOIN to lineitems even though it was not directly referenced in the query to make sure only parts that where shipped over ground are included.

Filtering based on User Context

A common use case is filtering data based on a parameter that sets user context.

For example, if:

  1. Every user operates within a tenant context
  2. Data in Snowflake is partitioned by a tenant column (in the example below in a tenant dimension dim_tenant, though can be a column in every table)
  3. Users are only allowed to see data within their own tenant

Set $TENANT user parameter, and use it as a domain filter:

type: domain

# Entities that participate ..

filters:
 - dim_tenant.tenant_id = $TENANT

Apply the parameter in the manner appropriate for the user queries (whether it is through BI connection settings or Snowflake SET statements).

If filtering data through a filtering dimension, make sure it is cross filtering the data with a one-to-many filtering direction.

Source Filters

Source filters are filters that are applied at the source level (unlike semantic filters that apply to the semantic layer as a whole).

The main use case for source filter is improving performance with logically partitioned data by always pushing filters below calculated attributes.

Thus, source filters are only supported on attributes that come from an entity source table.

A source filter is only applied if the source is part of the query.

type: domain
name: orders_domain
description: Domestic Orders Tracing Domain

# Entities and fields that participate
entities:
  - name: lineitem
  - name: orders
  - name: partsupp
  - name: part

source_filters:
  - lineitem.l_shipdate >= '1994-01-01' and lineitem.l_shipdate < '1995-01-01'

Caution: Source filters apply before any other computation is done, which can change the values of calculated attributed. See filtering order for more details.

When in doubt, use a semantic filter, not a source filter.

Roles

Domain may control active relationships.

Active Roles

When there are multiple possible ways to connect entities (for example in TPCH, a nation might be connected to supplier or customer) domain allows to select the active role to use.

Active roles enable to choose a relation for entities that have multiple relations. But what if want to enable easily using more than one relation path? See aliases below.

Example (based on TPCH derived example) - include all entities relevant to supply chain, and connect nation and region always through supplier.

type: domain
name: tpch_supply_chain
entities:
  - name: lineitem
  - name: orders
  - name: partsupp
  - name: part
  - name: nation
  - name: region

active_roles:
  - supplier_nation
  1. Active roles must result in an acyclic graph
  2. If an entity that participates in the domain can assume more than one role, at most one of them may be set

Roles Aliases

Role Aliases in domains is in Beta - reach out to support@honeydew.ai to enable in your account

When a domain includes the same entity that connected via multiple relationships, may give an alias for each chosen role.

type: domain
name: tpch_example_when_nation_is_supplier
entities:
  - name: nation
    role_path: [supplier_nation]
    alias: supplier_nation
  - name: region
    role_path: [supplier_nation]
    alias: supplier_region
  - name: nation
    role_path: [customer_nation]
    alias: customer_nation
  - name: region
    role_path: [customer_nation]
    alias: customer_region

Notes:

  1. The entity under a chosen role(s) must have a unique alias (not a name of another entity)
  2. The entity should be able to assume the chosen role(s)
  3. Indirect entities are not allowed: when setting an explicit alias and role for an entity indirect to the role (i.e. for region when the role is on nation) , must also have an alias for every entity on the role path.

Interfaces

Domains on SQL interface

Domains are present as a flat table in the domains schema.

  • Attributes that are part of the domain are accessible as SQL columns.
  • Metrics that are part of the domain are accessible as SQL columns that can be aggregated on.
  • All filters of the domain apply (in addition to any filters in the SQL query).

See SQL interface for more details.

Domains as a context for dynamic datasets

A dynamic dataset can be associated with a domain. In that case, all domain configuration applies to the dynamic dataset query.

YAML Schema

Every domain is backed by a text file in git that defines it, and keeps history of every change.

The schema for a domain is:

type: domain
name: [name]
display_name: [display name]
owner: [owner]
description: |-
  [desciption]
labels: [...]
folder: [folder]
hidden: [True/False/Yes/No]
entities:
  - name: [entity name]
    fields:
      - name: [field name]
      - ...
    includes: [field pattern to include]
    excludes: [field pattern to exclude]
    role_path: [role]
    alias: [alias for role]
  - ...
filters:
  - [filter expression]
  - ...
source_filters:
  - [source filter expression]
  - ...
active_roles:
  - [active role]
  - ..
parameters:
  - name: [parameter name]
    value: [parameter value]
  - ...
short_term_aggregate_cache_ttl_seconds: [the duration for which a query result can be reused as part of the short-term aggregate cache]

Fields:

  • name: Name of domain
  • display_name, owner, description, labels, folder, hidden: Metadata
  • entities: List of entities that participate in the domain
    • name: Name of entity
    • fields: List of fields from the entity that participate in the domain
      • name: Name of field (attribute or metric)
    • includes: Optional field pattern to include
    • excludes: Optional field pattern to exclude
    • role_path: Optional role path
    • alias: Optional alias for entity
  • filters: List of semantic filters
  • source_filters: List of source filters
  • active_roles: List of active roles
  • parameters (optional): List of parameters and values (override workspace values if they exist)
  • short_term_aggregate_cache_ttl_seconds (optional): Control short term aggregate caching. See Automatic Aggregate Caching.