fact_sales
: a fact table tracking order transactions that has a foreign key customer_id
dim_customer
: a dimension table tracking customer information over time (SCD Type 2). It has multiple entries
per customer_id
with validity ranges.fact_sales
(Fact Table)
order_id | customer_id | order_date | amount |
---|---|---|---|
5001 | 101 | 2021-06-10 | 250 |
5002 | 101 | 2023-01-12 | 300 |
5003 | 102 | 2022-07-22 | 450 |
dim_customer
(SCD2 Dimension Table)
customer_id | customer_sk | name | region | valid_from | valid_to |
---|---|---|---|---|---|
101 | 1 | Alice Smith | East | 2021-01-01 | 2022-03-01 |
101 | 2 | Alice Smith | West | 2022-03-01 | 9999-12-31 |
102 | 3 | Bob Johnson | North | 2021-05-15 | 9999-12-31 |
dim_customer
is not customer_id
(which is repeating across ranges), but rather a surrogate key (customer_sk
)
valid_from
/ valid_to
define the row’s effective period.Also note that valid_to here is an infinity date (9999-12-31). In some settings it is used as NULL instead, in which case can
adjust the join condition accordingly.valid_from
and valid_to
:
fact_sales
to dim_customer
order_id | customer_id | order_date | amount | name | region |
---|---|---|---|---|---|
5001 | 101 | 2021-06-10 | 250 | Alice Smith | East |
5002 | 101 | 2023-01-12 | 300 | Alice Smith | West |
5003 | 102 | 2022-07-22 | 450 | Bob Johnson | North |
fact_sales
: a fact table with changing business logic over time (e.g. updated amount, revised status). It has multiple versions per order_id
, each valid over a time range.dim_customer
: a dimension table with customer history over time (e.g. changed region), also with validity ranges.dim_date
or dim_point_in_time
table is used to filter everything as of a specific point.
dim_point_in_time
to get correct results (whether for “today” or for any historical point of reference).You can ensure a filter is always applied by configuring it directly within user-facing tools, such as BI dashboards.fact_sales
sample data:
order_sk | order_id | customer_id | order_date | amount | status | valid_from | valid_to |
---|---|---|---|---|---|---|---|
9001 | 5001 | 101 | 2021-06-10 | 250 | Pending | 2021-06-10 | 2021-07-01 |
9002 | 5001 | 101 | 2021-06-10 | 300 | Shipped | 2021-07-01 | 9999-12-31 |
9003 | 5002 | 102 | 2022-01-15 | 300 | Pending | 2022-01-15 | 9999-12-31 |
dim_customer
sample data:
customer_sk | customer_id | name | region | valid_from | valid_to |
---|---|---|---|---|---|
1 | 101 | Alice Smith | East | 2020-01-01 | 2022-03-01 |
2 | 101 | Alice Smith | West | 2022-03-01 | 9999-12-31 |
3 | 102 | Bob Johnson | North | 2021-05-01 | 9999-12-31 |
dim_point_in_time
: A joint reference point for all data
snapshot_date |
---|
2021-06-15 |
2022-01-01 |
2023-03-31 |
dim_customer
to point in time)fact_sales
to dim_point_in_time
:
fact_sales
to point in time)
dim_point_in_time
can filter the fact, but not vice versa)
dim_customer
to dim_point_in_time
:
dim_customer
to point in time)
dim_point_in_time
can filter dim_customer
, but not vice versa)
dim_point_in_time
is a shared dimension that can filter all associated entities.Using cross-filtering one-to-many ensures that it will filter the entities, but will not be filtered by them.2021-06-15
order_id | status | amount | name | region |
---|---|---|---|---|
5001 | Pending | 250 | Alice Smith | East |
Status of all orders given reference point of
- Only one valid version per
order_id
andcustomer_id
is active per point-in-time- Any rows not yet valid are excluded (e.g. 5002 is not visible on 2021-06-15)
2022-05-01
order_id | status | customer_id | order_date | amount | name | region |
---|---|---|---|---|---|---|
5001 | Shipped | 101 | 2021-06-10 | 300 | Alice Smith | East |
5002 | Pending | 102 | 2022-01-15 | 300 | Bob Johnson | North |
dim_point_in_time
to anchor the reference date