Introduction to calculated attributes
A calculated attribute (sometimes called a “dimension”) adds a virtual column to an entity. In other words, the attribute can be computed for each unique row of an entity. Attributes within entities can be simple computations (like aCASE WHEN
or multiplying a value). They can also be complex computations that span multiple related entities and aggregations.
Metadata
Attributes may include metadata such as their owner, business description, labels. See metadata section for more details.Basic calculated attribute
A basic calculated attribute only depends on other attributes in the same entity. For example, thelineitems
entity (which represents lines of an order), there is an attribute that is called extendedprice
. It represents the price not accounting for discount. You might want to have an attribute that reflects the actual price.
The definition of price
is:
This code is standard SQL. Anything in SQL (like window functions, nested data
unpacking, user-defined functions, and type conversions) just works.
Build the new calculated attribute
Build the new calculated attribute
Go to the
lineitems
entity, and press “+Attribute”. Call it price
, enter
the SQL calculation formula above and save it.Check it with Preview Data
Check it with Preview Data
You can preview data by going to the ‘Preview’ tab.
Combining entities
Calculated attributes can be based on multiple entities. Which entities can be used depends on their relationship and level of granularity: An attribute may use it its calculation- Any attribute from a related entity that has the same or lower level of granularity as its own.
- Any aggregation on a related entity that has a higher level of granularity
orders
entity has a value for each order.
If you want to combine it with attributes of other entities, it can use any attribute from the customers
entity, but not from the lineitem
entity:
See the relationship graph
See the relationship graph

- Each order has a single associated customer: the
customers
entity is related toorders
entity with a lower granularity (a many:1 relationship). Any attribute inorders
may refer to any attribute incustomers
. Technically, the row per order can be extended with columns coming from its customer. - Each order has many associated lineitems: the
lineitem
entity is related to theorders
entity with a higher granularity (a 1:many relationship). So the is no single lineitem per order to choose from for the calculation, so using it would result in an error.
lineitems
for an order: it possible to create an attribute per order that counts how many lineitems it had. See examples below.
Multi-entity attribute
The price of a line item is based on the quantity acquired times the price per unit (if you buy 3 apples for $2, then the price of the line is $6). The calculation would combineparts
and the lineitems
entity: every lineitem has an associated part that can be used in the calculation.
When a calculated attribute refers to multiple entities, the engine will verify it can build a JOIN between them to compute. This is possible when referring to related entities that have a many-to-one relationship.Since in this example, every row of
lineitems
has exactly one associated part from parts
this calculation is possible. If the relation was reversed (a line item would have multiple possible parts), it would result in an error as it would not be possible to choose a part.extendedprice
attribute of lineitems
. We can recreate the same calculation.
Build the new calculated attribute
Build the new calculated attribute
Go to the
lineitems
entity, and press “+Attribute”.
Call it extendedprice_calculated
, and enter the SQL calculation lineitems.quantity * parts.retailprice
and save it.Check it with a Dynamic Dataset
Check it with a Dynamic Dataset
Build a new dynamic dataset that has the following attributes:
lineitems.extendedprice
(the original from TPC-H)lineitems.extendedprice_calculated
(the new one you just built)
Update the `price` attribute to use it
Update the `price` attribute to use it
Go to the ToIt will now use the calculated attribute we just built
lineitems
entity, and edit the price
attribute.
Change its calculation fromLookup attributes
Assume there is a tablepromotion_part_types
that has a list of every promoted part type.
is_promotion
attribute for a part would be TRUE is it matches a value from that table.
This is called a lookup attribute:
An attribute lookup would be compiled to a
WHERE IN (SELECT ...)
over a
subquery of potential attribute values.Using aggregations
Attributes may use aggregations as part of the calculation. It is usually required when referring to entities with a higher granularity than the one of the attribute. For example, a customer has many orders, so an attribute incustomer
can’t refer to a single order, but it can look at any aggregation over them like a count.
See the relationship graph
See the relationship graph

While it is possible to use aggregations on entities of same or lower
granularity, it has little value: when referring to a lower granularity only a
single row matches (an order has a single customer), so aggregating over can
be meaningless.
- Ad-hoc aggregations (like a
SUM
or aCOUNT
) - Metrics - named aggregations (see metrics for more)
Attribute with aggregations
The total price of an order is the sum of the prices of every line item bought in that order.totalprice
attribute of orders
TPC-H sample data. We can recreate the same calculation as a calculated field.
Note this example uses the price
calculated attribute defined in the previous section.
Any aggregation defined in a calculated attribute of an entity is grouped by
that entity key. Here, every unique row of an order, gets the sum of related
line items.
Build the total price attribute
Build the total price attribute
Go to the
orders
entity, and press “+Attribute”.
Call it totalprice_calculated
, enter the SQL calculation formula:Build the order size classification attribute
Build the order size classification attribute
Go to the
orders
entity, and press “+Attribute”.
Call it order_size_classification
, enter the SQL calculation formula:Check it with a Dynamic Dataset
Check it with a Dynamic Dataset
Build a new dynamic dataset, with
orders.order_size_classification
attributes and a single metric COUNT(*) as count
. Preview its data to see how many orders of each size classification there are.
It should be something like:orders.order_size_classification | count |
---|---|
big | 228,149 |
small | 86,560 |
regular | 1,185,291 |
Attribute with metrics (named aggregations)
The TPC-H schema defines that the status of order this way:- When status of all lines is
F
(fulfilled) then status of order isF
- When status of all lines is
O
(ordered) then status of order isO
- Else the status of order is
P
(partially fulfilled)
lineitems.count
that is the count of lineitems, combined with a filter:
Metrics are a very powerful abstractions for reusability. When a metric is
combined with a
FILTER
qualifier like her it can reduced to aggregating over
a selection of related rows to the attribute instead of them all. Check out
metrics for more.orderstatus
attribute of orders
TPC-H sample data.
Constant attributes
Attributes can be constants and don’t have to compute anything. Using an attribute like that will create a virtual column with a constant value. However, in most use cases that call for a constant, it is better to use parameters both for reusability reasons (parameters allow to change their value in an ad-hoc computation) and for performance reasons.Semi-structured attributes
Attributes can be semi-structured objects like a JSON or an array. Can also create regular attributes by unpacking nested fields, e.g.Since most BI tools do not support semi-structured objects, their data type in SQL interface is reported as a string.
YAML Schema
Every attribute is backed by a text file in git that defines it, and keeps history of every change. The schema for an attribute is:-
entity
: Name of entity the attribute belongs to -
name
: Name of attribute -
display_name
,owner
,description
,labels
,folder
,hidden
: Metadata -
datatype
: Data type of attribute, as exposed to JDBC (string
is the default, if not specified). Possible values are:bool
- a boolean value.number
- an integer numberfloat
- a floating point numberdate
- a date valuetimestamp
- a timestamp valuestring
- a string value, used also for any other type of data (like array, JSON, etc.)
-
timegrain
: Time grain of the attribute, if it is a date/time attribute. This can be used in time-based aggregations. Possible values are:microsecond
,millisecond
,second
,minute
,hour
,day
,week
,month
,quarter
,year
. -
sql
: SQL statement of the calculated attribute -
metadata
: Additional metadata for the attribute (see examples for AI, Tableau, Power BI, Lightdash and ThoughtSpot)