Introduction
A time spine entity is a list of times (like consecutive dates), that has relations to time columns
of other entities.
The main use cases for a time spine are:
-
Filling in missing data - for example when counting new users by day, a day without users should be counted as 0. But
without a time spine, if counting on the new users table on a missing date, then it will just not appear.
-
Providing a common time for different metrics - most metrics are based on time. When metrics are on different entities
how to count them to the same day? Connect those entities to a time spine, and count them to the date in the spine.
A time spine in Honeydew is modeled as an entity, marked as a time spine.
Building a time spine
The time spine is an entity. It can be based on a table in the data warehouse, or on a custom SQL query, such as:
-- Snowflake SQL for a per-day time spine spanning 2020-2030
SELECT
DATEADD(DAY, row_number() over (order by seq), '2020-01-01'::date)::DATE AS DATE,
DAYOFMONTH(DATE) as DAY_OF_MONTH,
TO_VARCHAR(DATE, 'dy') as DAY_OF_WEEK,
DAYOFWEEK(DATE) as DAY_OF_WEEK_INDEX,
DAYOFYEAR(DATE) as DAY_OF_YEAR,
DATE_TRUNC('month', DATE) as MONTH,
MONTHNAME(DATE) as MONTH_NAME,
MONTH(DATE) as MONTH_NUM,
DATE_TRUNC('quarter', DATE) as QUARTER,
'Q' || QUARTER(DATE) as QUARTER_OF_YEAR,
DATE_TRUNC('week', DATE) as WEEK,
WEEKOFYEAR(DATE) as WEEK_OF_YEAR,
YEAR(DATE) as YEAR,
FROM (SELECT SEQ4() as seq FROM TABLE(GENERATOR(ROWCOUNT=>365*10)))
- The time spine entity should be identified by setting the
is_time_spine metadata tag in its YAML configuration.
- Multiple time spines are not supported.
Always prefer creating time spines as tables and not as custom SQL entities.Snowflake does not cache queries that use table generators.
Using a time spine as custom SQL without materialization will disable caching for queries
that involve the time spine.
The time spine should have the granularity column as key - in a per-day spine, the date is the key.
Make sure to set the timegrain attribute for the granularity column, to enforce that granularity.
Connecting a time spine
time spines can be connected to entities via a relationship.
By convention, when connecting an entity to a time spines, the following relationship properties should be set:
- Key: Per-date granularity
- Join type:
many-to-one (a time spine is always a shared dimension)
- Join type:
right (a time spine is always filling missing dates)
- Cross filtering direction:
one-to-many (a time spine can filter other entities, but is never filtered by them)
When building a time spine from the UI, the default configuration is set automatically.
Default date field
When an entity is connected using a relationship to a time spine, the connecting field
is called the “default date field”.
For example, if orders is connected to a time spine on the orders.order_date = date.date field, then
order_date is the default date field for all metrics acting on orders.
Metrics within an entity that includes a default date field will use it for time metrics, utilizing the associated time spine.
In particular,
- For time metrics, Honeydew will populate any missing dates in the default date field.
For example, consider a user counting
TIME_METRIC(orders.count) by date.month (while date is a time spine and orders.order_date is
the default date field used to connect orders with date). If there are no valid order_date rows (i.e. no orders)
between March 1st and March 31st, then Honeydew will fill March as a month with zero orders.
- When using time offset syntax (
TIME_METRIC(orders.count, offset => '1 month')) the offset will be relative to
the default date field.
- When using period-to-date syntax the accumulation will be relative to the default field.
The Honeydew convention for a per-day time spine is to call the entity date