Introduction
A multi-grain table or a mixed-grain table mixes two or more such levels in the same structure. For instance:- A sales fact table that stores daily totals by store but also includes monthly totals in the same table.
- A web analytics table where some events are logged at the session level and others at the page-view level.
- A financial fact table that records both transaction-level entries and quarterly adjustments.
- Mixed-Grain: Each data point has a single grain, but they are mixed: some web events are at session-level and some are at page-view level.
- Multi-Grain: Each data point has multiple grains: store sales are recorded both daily and monthly.
Why it matters
- Complicates joins and aggregations: Queries can double-count if the analyst doesn’t know which grain they are working with.
- Harder governance: Business logic like “one row = one transaction” no longer holds.
Modeling Category 1: Different data, different grains (“Mixed-Grain”)
Each data point exists at a single grain, but grains differ across event types (e.g., session-level vs page-view-level events). To model different data recorded at different grains, separate grains to different fact tables. For example, for web analytics table that mixes session-level and page-view level events, separate to:fact_sessions
(grain = session)fact_pageviews
(grain = page view), typically connected to its session bysession_id
with a many:1 relationship.
Don’t denormalize into one wide table — it introduces double-counting traps and complex modeling.
Modeling Category 2: Same data, multiple versions (“Multi-Grain”)
Each data point exists at more than one grain (e.g., store sales recorded daily and monthly). The is a more complex case as that introduces duplicated data into the semantic data. Honeydew supports modeling multi-grain tables:- Ensure join consistency without row duplication.
- Adjust metric logic by the chosen grain.
- Hide the “multi-grain” data modeling complexity from the business user or AI agent.
Multiple grain tables for same data
While a multi-grain table is a single table with data at different grains, some data models have separate tables for the same data, each at a different grain, but not mixed together. For example:sales_daily_fact
: 1 row per store per day.sales_monthly_fact
: 1 row per store per month. The computation would include end-of-month adjustments, so it is not a simple transformation fromsales_daily_fact
.
Always use
UNION ALL
to union multiple grains rather than UNION
for significantly faster performanceHow to Handle Duplicate Grains
Consider a multi-grain factfact_sales
that records per store both the daily and the monthly sales volume, as indicated in the grain
column.
If naively counting per store sales volume as SUM(sales)
, the result would be wrong: it would double count daily and monthly totals.
When operating with multi grain data, must always deduplicate it.
There are two deduplication approaches:
- At the domain level, applying to all calculations at once
- At per-metric level, applying to only a particular metric
Domain-Level Deduplication
With domain-level deduplication, double counting is avoided by removing the duplicated data via a Domain Source Filter that enforces a choice of grain.Domain-level deduplication ensures any single user query sees only one grain of the data - the one it chose to show.Different queries can operate at different grains, but no query can see multiple grains at once.
- Create a table called
dim_grain
with the list of possible grains. That would be used by the user to choose current grain by filtering on it. - Create a source filter in a domain that enforces a specific grain is chosen:
For more information about
GET_FIELD_SELECTION
, see conditional filtering.Metric-Level Deduplication
With metric-level deduplication, double counting is avoided in metric definitions, but data itself stays duplicated.Metric level deduplication does not reduce data in the fact entity.It requires attention in every metric. Summing up sales without a filter (by mistake) would otherwise result in duplicated data.
For more information about
GET_FIELD_SELECTION
, see conditional filtering.