Slowly Changing Dimensions
Introduction
Slowly Changing Dimensions (SCDs) are a common data modeling technique used to manage historical changes in dimension data over time. This enables more accurate time-based analysis and reporting, such as understanding how KPIs were affected under previous attribute values.
SCDs are categorized into different types based on how they handle changes to dimension data:
SCD Type 0 - Fixed Dimensions
- No changes allowed. The data remains as it was when first inserted.
- Useful when historical accuracy is critical and the value should never change.
- Example: A product’s original launch date.
SCD Type 1 - Overwrite
- Changes overwrite existing data. No history is preserved.
- Simple to implement but loses historical context.
- Example: If a customer changes their email, the old one is replaced.
SCD Type 2 - Historical Tracking
- Each change creates a new record, often with start/end timestamps or versioning.
- Preserves full history of changes.
- Example: Tracking changes to a customer’s loyalty tier over time.
SCD Type 3 – Previous Value
- Stores only the previous value alongside the current one.
- Limited history, useful when only one change needs to be tracked.
- Example: Keeping a “current region” and a “previous region” field for a customer.
SCD0 and SCD3 are rarely used.
Modeling SCDs in Honeydew
In Honeydew, the modeling of SCDs of Types 0, 1, and 3 is straightforward. Joins between entities are defined using the standard foreign key relationships.
For SCD Type 2, Honeydew supports modeling SCDs using a combination of foreign keys and date ranges. Joins between entities are defined using a custom SQL expression that includes the date range logic.
Example: Fact and a Slowly Changing Dimension (SCD2)
Given two tables:
fact_sales
: a fact table tracking order transactions that has a foreign keycustomer_id
dim_customer
: a dimension table tracking customer information over time (SCD Type 2). It has multiple entries percustomer_id
with validity ranges.
Would want to connect the sales data with the correct historical customer information as it was at the time of the order.
Sample data
fact_sales
(Fact Table)
order_id | customer_id | order_date | amount |
---|---|---|---|
5001 | 101 | 2021-06-10 | 250 |
5002 | 101 | 2023-01-12 | 300 |
5003 | 102 | 2022-07-22 | 450 |
dim_customer
(SCD2 Dimension Table)
customer_id | customer_sk | name | region | valid_from | valid_to |
---|---|---|---|---|---|
101 | 1 | Alice Smith | East | 2021-01-01 | 2022-03-01 |
101 | 2 | Alice Smith | West | 2022-03-01 | 9999-12-31 |
102 | 3 | Bob Johnson | North | 2021-05-15 | 9999-12-31 |
The key for dim_customer
is not customer_id
(which is repeating across ranges), but rather a surrogate key (customer_sk
)
valid_from
/ valid_to
define the row’s effective period.
Also note that valid_to here is an infinity date (9999-12-31). In some settings it is used as NULL instead, in which case can adjust the join condition accordingly.
Relations
To associate each order with the correct customer version at that point in time, use a custom SQL expression on valid_from
and valid_to
:
And set a “many-to-one” relationship from fact_sales
to dim_customer
Example query
Result of a query on both:
order_id | customer_id | order_date | amount | name | region |
---|---|---|---|---|---|
5001 | 101 | 2021-06-10 | 250 | Alice Smith | East |
5002 | 101 | 2023-01-12 | 300 | Alice Smith | West |
5003 | 102 | 2022-07-22 | 450 | Bob Johnson | North |
Advanced: Multiple SCD2 (Fact and Dimension) + Point-in-Time Reference point
Advanced use cases for slowly changing dimensions allow to inspect the state of the world at any point in time (including “now”), while every data table has slowly changing dimension fields.
Here, the previous example is extended to support of consistent point-in-time queries on historical data where:
fact_sales
: a fact table with changing business logic over time (e.g. updated amount, revised status). It has multiple versions perorder_id
, each valid over a time range.dim_customer
: a dimension table with customer history over time (e.g. changed region), also with validity ranges.
A central dim_date
or dim_point_in_time
table is used to filter everything as of a specific point.
Users must filter on dim_point_in_time
to get correct results (whether for “today” or for any historical point of reference).
You can ensure a filter is always applied by configuring it directly within user-facing tools, such as BI dashboards.
This structure is used in auditable data models, financial snapshots, and analytics platforms.
Sample data
fact_sales
sample data:
order_sk | order_id | customer_id | order_date | amount | status | valid_from | valid_to |
---|---|---|---|---|---|---|---|
9001 | 5001 | 101 | 2021-06-10 | 250 | Pending | 2021-06-10 | 2021-07-01 |
9002 | 5001 | 101 | 2021-06-10 | 300 | Shipped | 2021-07-01 | 9999-12-31 |
9003 | 5002 | 102 | 2022-01-15 | 300 | Pending | 2022-01-15 | 9999-12-31 |
dim_customer
sample data:
customer_sk | customer_id | name | region | valid_from | valid_to |
---|---|---|---|---|---|
1 | 101 | Alice Smith | East | 2020-01-01 | 2022-03-01 |
2 | 101 | Alice Smith | West | 2022-03-01 | 9999-12-31 |
3 | 102 | Bob Johnson | North | 2021-05-01 | 9999-12-31 |
dim_point_in_time
: A joint reference point for all data
snapshot_date |
---|
2021-06-15 |
2022-01-01 |
2023-03-31 |
This is used to filter time centrally, so other joins respect that single reference point. This table can cover all possible dates.
Same approach can be extended for any type of data versioning - not only for point in time.
Relations
Fact to customers
To associate each order with the corresponding version of the customer that was valid at the time the order version was valid, use the following relation:
- Join on customer key and validity ranges
- Direction: Many to one (from
dim_customer
to point in time) - Cross-filtering is as needed (one-to-many or bi-directional)
Relation:
The result of that relation is that customers are resolved to the right appropriate customer to the time of the order, while keeping multiple versions of the order.
If a customer has multiple versions within the validity time of an order, it will not be resolved (i.e. will be resolved to NULL).
Entities to point in time reference
-
fact_sales
todim_point_in_time
: -
Many to one (from
fact_sales
to point in time) -
Cross-filtering is one-to-many (
dim_point_in_time
can filter the fact, but not vice versa)
Relation:
-
dim_customer
todim_point_in_time
: -
Many to one (from
dim_customer
to point in time) -
Cross-filtering is one-to-many (
dim_point_in_time
can filterdim_customer
, but not vice versa)
Relation:
The dim_point_in_time
is a shared dimension that can filter all associated entities.
Using cross-filtering one-to-many ensures that it will filter the entities, but will not be filtered by them.
Example query
Status of all orders given reference point of 2021-06-15
order_id | status | amount | name | region |
---|---|---|---|---|
5001 | Pending | 250 | Alice Smith | East |
- Only one valid version per
order_id
andcustomer_id
is active per point-in-time- Any rows not yet valid are excluded (e.g. 5002 is not visible on 2021-06-15)
Status of all orders given reference point of 2022-05-01
order_id | status | customer_id | order_date | amount | name | region |
---|---|---|---|---|---|---|
5001 | Shipped | 101 | 2021-06-10 | 300 | Alice Smith | East |
5002 | Pending | 102 | 2022-01-15 | 300 | Bob Johnson | North |
- Use
dim_point_in_time
to anchor the reference date - Join facts and dimensions based on SCD2 validity ranges
- Works seamlessly for time travel, reproducible snapshots, or data backfills
Was this page helpful?