Source Data
Entity Source Data
The source data definition sets where the data for an entity comes from.
Source data is associated with a single entity.
The source data definition specifies:
- The origin of the data (e.g., a physical object or an SQL query).
- The columns to be included as attributes of an entity, along with their designated names.
- The metadata associated with those attributes, derived from the source.
Columns
Columns from the source data are added as attributes in the entity.
Attribute Mapping and Renaming
As part of the source data definition, you can optionally specify a name for the created attribute. If no name is provided, the column name is used as the attribute’s default name.
A common pattern for light data cleaning transformations involves adding a column as a hidden attribute and then creating a calculated attribute with the same name that applies the transformation.
For example, if the city_name
column in the source data contains surrounding whitespace, you can normalize the column by
adding it as a hidden attribute and creating a calculated attribute to remove the whitespace:
- Add the
city_name
column as_city_name_raw
attribute, set as hidden. - Create a calculated attribute called
city_name
with the expressionTRIM(entity._city_name_raw)
Structure and Type Synchronization
When source data is created, its column structure and data types are initially imported from the source.
If the structure or data types change over time, you can re-synchronize them by using the “synchronize from source” option available in the UI or API.
Data Mapping
Source data can be a database object (such as a table, view, or custom SQL query).
Source objects can include any Snowflake object containing data, such as TABLE, VIEW, DYNAMIC TABLE, MATERIALIZED VIEW, and more. All types of tables are supported, including regular tables, Iceberg tables, and external tables.
Custom SQL may result in lower performance due to the following reasons:
- Filters may not be pushed down into the query, leading to larger table scans.
- Column selection may not be pushed down, causing unnecessary data movement.
If the source data is a large dataset that is frequently filtered for performance reasons, it’s recommended to encapsulate it in a table. If the source data contains many columns, consider encapsulating it in a view.
In particular, it’s always better to reference a table directly rather than using a SELECT * FROM table
custom SQL source.
Semi-Structured and JSON Data
Source data, whether it’s a query, table, or view, can include semi-structured columns (such as JSON, arrays, or maps).
A Honeydew attribute can be of a semi-structured type and used directly as a structured object. However, most BI tools cannot process semi-structured data and require simple, structured columns.
There are two approaches to normalize semi-structured data for BI:
- Use the source as-is with semi-structured attributes and create calculated attributes to unpack specific fields..
- Unpack all fields with a custom SQL query source.
For a JSON column with different fields, a structured custom SQL query can look like this:
Note that when unpacking semi-structured data, it is recommended to explicitly cast the columns to the appropriate types.
In most cases, it is recommended to avoid using LATERAL FLATTEN
join to flatten a JSON object.
When using LATERAL FLATTEN
, each key in the JSON object becomes a separate row with a key and a value,
which can alter the granularity of the entity. If this behavior is desired, use the SEQ
column,
which provides a unique sequence number, as part of the entity key to maintain consistency and avoid issues with key duplication.
YAML Schema
Each entity is based on a source table defined in a separate text file stored in Git.
The schema for source table is:
Fields:
name
: Name of source tabledisplay_name
,owner
,description
,labels
,folder
,hidden
: Metadatadataset_type
: Choose if a table, a SQL query or a virtual entitysql
: The name of table (when type is a table) / the base entity (virtual table) / SQL query (custom sql)attributes
: List of attributes to bring from the source, with optional renaming
Was this page helpful?