Introduction to calculated attributes
A calculated attribute (sometimes called a “dimension”) adds a virtual column to an entity. In other words, the attribute can be computed for each unique row of an entity. Attributes within entities can be simple computations (like aCASE WHEN or multiplying a value). They can also be complex computations that span multiple related entities and aggregations.
Metadata
Attributes may include metadata such as their owner, business description, labels. See metadata section for more details.Basic calculated attribute
A basic calculated attribute only depends on other attributes in the same entity. For example, thelineitems entity (which represents lines of an order), there is an attribute that is called extendedprice. It represents the price not accounting for discount. You might want to have an attribute that reflects the actual price.
The definition of price is:
Build the new calculated attribute
Build the new calculated attribute
Go to the
lineitems entity, and press “+Attribute”. Call it price, enter
the SQL calculation formula above and save it.Check it with Preview Data
Check it with Preview Data
You can preview data by going to the ‘Preview’ tab.
Combining entities
Calculated attributes can be based on multiple entities. Which entities can be used depends on their relationship and level of granularity: An attribute may use it its calculation- Any attribute from a related entity that has the same or lower level of granularity as its own.
- Any aggregation on a related entity that has a higher level of granularity
orders entity has a value for each order.
If you want to combine it with attributes of other entities, it can use any attribute from the customers entity, but not from the lineitem entity:
See the relationship graph
See the relationship graph
You can view it in the product by going to the main page of entities.- Each order has a single associated customer: the
customersentity is related toordersentity with a lower granularity (a many:1 relationship). Any attribute inordersmay refer to any attribute incustomers. Technically, the row per order can be extended with columns coming from its customer. - Each order has many associated lineitems: the
lineitementity is related to theordersentity with a higher granularity (a 1:many relationship). So the is no single lineitem per order to choose from for the calculation, so using it would result in an error.
lineitems for an order: it possible to create an attribute per order that counts how many lineitems it had. See examples below.
Multi-entity attribute
The price of a line item is based on the quantity acquired times the price per unit (if you buy 3 apples for $2, then the price of the line is $6). The calculation would combineparts and the lineitems entity: every lineitem has an associated part that can be used in the calculation.
extendedprice attribute of lineitems. We can recreate the same calculation.
Build the new calculated attribute
Build the new calculated attribute
Go to the
lineitems entity, and press “+Attribute”.
Call it extendedprice_calculated, and enter the SQL calculation lineitems.quantity * parts.retailprice and save it.Check it with a Dynamic Dataset
Check it with a Dynamic Dataset
Build a new dynamic dataset that has the following attributes:
lineitems.extendedprice(the original from TPC-H)lineitems.extendedprice_calculated(the new one you just built)
Update the `price` attribute to use it
Update the `price` attribute to use it
Go to the ToIt will now use the calculated attribute we just built
lineitems entity, and edit the price attribute.
Change its calculation fromLookup attributes
Assume there is a tablepromotion_part_types that has a list of every promoted part type.
is_promotion attribute for a part would be TRUE is it matches a value from that table.
This is called a lookup attribute:
Using aggregations
Attributes may use aggregations as part of the calculation. It is usually required when referring to entities with a higher granularity than the one of the attribute. For example, a customer has many orders, so an attribute incustomer can’t refer to a single order, but it can look at any aggregation over them like a count.
See the relationship graph
See the relationship graph
You can view it in the product by going to the main page of entities.- Ad-hoc aggregations (like a
SUMor aCOUNT) - Metrics - named aggregations (see metrics for more)
Attribute with aggregations
The total price of an order is the sum of the prices of every line item bought in that order.totalprice attribute of orders TPC-H sample data. We can recreate the same calculation as a calculated field.
Note this example uses the price calculated attribute defined in the previous section.
Note that this can part of a more complex calculation:
Build the total price attribute
Build the total price attribute
Go to the
orders entity, and press “+Attribute”.
Call it totalprice_calculated , enter the SQL calculation formula:Build the order size classification attribute
Build the order size classification attribute
Go to the
orders entity, and press “+Attribute”.
Call it order_size_classification , enter the SQL calculation formula:Check it with a Dynamic Dataset
Check it with a Dynamic Dataset
Build a new dynamic dataset, with
orders.order_size_classification attributes and a single metric COUNT(*) as count. Preview its data to see how many orders of each size classification there are.
It should be something like:| orders.order_size_classification | count |
|---|---|
| big | 228,149 |
| small | 86,560 |
| regular | 1,185,291 |
Attribute with metrics (named aggregations)
The TPC-H schema defines that the status of order this way:- When status of all lines is
F(fulfilled) then status of order isF - When status of all lines is
O(ordered) then status of order isO - Else the status of order is
P(partially fulfilled)
lineitems.count that is the count of lineitems, combined with a filter:
orderstatus attribute of orders TPC-H sample data.
Constant attributes
Attributes can be constants and don’t have to compute anything. Using an attribute like that will create a virtual column with a constant value. However, in most use cases that call for a constant, it is better to use parameters both for reusability reasons (parameters allow to change their value in an ad-hoc computation) and for performance reasons.Semi-structured attributes
Attributes can be semi-structured objects like a JSON or an array. Can also create regular attributes by unpacking nested fields, e.g.YAML Schema
Each attribute is defined by a YAML file in Git, which also tracks and preserves the full history of every change. The schema for an attribute is:-
entity: Name of entity the attribute belongs to -
name: Name of attribute -
display_name,owner,description,labels,folder,hidden: Metadata -
datatype: Data type of attribute, as exposed to JDBC (stringis the default, if not specified). Possible values are:bool- a boolean value.number- an integer numberfloat- a floating point numberdate- a date valuetimestamp- a timestamp valuestring- a string value, used also for any other type of data (like array, JSON, etc.)
-
timegrain: Time grain of the attribute, if it is a date/time attribute. This can be used in time-based aggregations. Possible values are:microsecond,millisecond,second,minute,hour,day,week,month,quarter,year. -
sql: SQL statement of the calculated attribute -
metadata: Additional metadata for the attribute (see examples for AI, Tableau, Power BI, Lightdash and ThoughtSpot)