> ## Documentation Index > Fetch the complete documentation index at: https://honeydew.ai/docs/llms.txt > Use this file to discover all available pages before exploring further. # Datasets > How to build datasets based on shared semantics

## Intro A Dynamic Dataset is a way to look at data using Honeydew. A typical Honeydew deployment has many thousands of pieces of logic. An analysis might need just a subset, for example, the orders of a particular customer. The simplest dataset is “all the data of an entity”. Dynamic datasets are built: 1. Attributes 2. Metrics 3. Filters The components can come from fields defined in the semantic layer (attributes and metrics) or defined at ad-hoc field SQL expressions. Unlike attributes or metrics defined in Entities, ad-hoc calculations defined in a dataset is not reusable by other calculations. ### Use Cases 1. Build static datasets for data science (i.e. to load into a Python data frame) 2. Share data with external tools that accept a table 3. Deliver semantic models into [dbt](/integration/etl-tools/dbt) 4. Setup [aggregate aware caches](/performance/aggregate-awareness) for query acceleration ### Metadata Dynamic datasets may include metadata such as their owner, business description, labels. See [metadata](/governance/metadata) section for more details. ### Example Build a new dynamic dataset, and select from `lineitem` two fields: `l_linestatus` (an attribute) and `sum_qty` (a metric). The sums up item quantity per their status delivery status. The schema for it is - ```yaml theme={null} attributes: - lineitem.l_linestatus metrics: - lineitem.sum_qty ``` ## Interface ### Accessing the data Dynamic datasets can power different delivery modes: * Snowflake - with a view, a table, a dynamic table or an interactive table * Databricks - with a view or a table * BI - creating an extract based on a dynamic dataset that was deployed to Snowflake or Databricks * Python - with the honeydew `hdapi` python module * 3rd party such as dbt model - using honeydew APIs ### Filtering Order Filters defined as part of a dynamic dataset definition will apply to its query. For more details, see [order of computation](/advanced-modeling/order-of-computation) ### Parameterized Datasets Dynamic datasets can be parameterized with [Parameters](/parameters). Parameters can be specific to a dynamic dataset, or can overrule global parameters. ### Domains Dynamic datasets operate with the context of a [domain](/domains) if set and default domain otherwise. Filters, parameters and any other domain configuration will apply to the dynamic dataset. ## Materialization When a dynamic dataset is delivered to a data warehouse, all fields are created in the chosen objects (a view or a table): * Attributes as columns. * Metrics as columns. Metrics are aggregated and grouped by default by all the attributes (unless specified otherwise with a `GROUP BY` in the metric definition). * All filters of the dataset apply. Dynamically changing the level of granularity is not possible after delivery - if you need to explore the same dataset at different levels of granularity use the SQL interface. ### Snowflake #### Views Datasets can be deployed as a Snowflake VIEW. Honeydew will maintain the underlying query and will detect if its logic is different from the shared source of truth. Use views for: * Build live datasets for sharing * Integration with tools that can't use live JDBC connection * Real-time data Parameterized Datasets can be deployed as VIEWs. Parameters can be controlled with the Snowflake `SET` SQL command. #### Tables Datasets can be deployed as a Snowflake TABLE. Honeydew will NOT automatically detect if the logic that built a TABLE has changed Use tables for: * "Freeze" a dataset for an ad-hoc analysis on it * Use cases where a dynamic table is impossible to use Following configuration is possible: * Target database and schema * Virtual data warehouse to use * Should it be a transient table (the default is to create transient tables) * Table clustering expressions (optional) You can find more information about Tables creation parameters [here](https://docs.snowflake.com/en/sql-reference/sql/create-table). #### Dynamic Tables Datasets can be deployed as a Snowflake [Dynamic Table](https://docs.snowflake.com/en/user-guide/dynamic-tables-about). Honeydew will maintain the underlying query and will detect if its logic different from the shared source of truth. Following configuration is possible: * Target database and schema * Virtual data warehouse to use * Dynamic table lag amount and units (seconds / minutes / hours / days) * Should it refresh downstream dynamic tables * Dynamic table refresh mode (`AUTO` / `FULL` / `INCREMENTAL`). Default is `FULL`, if not provided. * Dynamic table initialize mode (`ON_CREATE` / `ON_SCHEDULE`). Default is `ON_CREATE`, if not provided. * Dynamic table clustering expressions (optional) You can find more information about Dynamic Tables creation parameters [here](https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table). Use dynamic tables for: * Large data or complex queries that require performance * Continuous materialization with automatic refresh #### Interactive Tables Datasets can be deployed as a Snowflake [Interactive Table](https://docs.snowflake.com/en/user-guide/interactive). Snowflake Interactive Tables are currently in preview and are not generally available yet. Please refer to [Snowflake documentation](https://docs.snowflake.com/en/user-guide/interactive) for more details. Honeydew will maintain the underlying query and will detect if its logic different from the shared source of truth. Following configuration is possible: * Target database and schema * Virtual data warehouse to use * Interactive table clustering expressions (required) * Interactive table lag amount and units (seconds / minutes / hours / days) (optional) You can find more information about Interactive Tables creation parameters [here](https://docs.snowflake.com/en/sql-reference/sql/create-interactive-table). Use interactive tables for dynamic datasets that need to be optimized for low-latency, interactive queries ### Databricks #### Views Datasets can be deployed as a Databricks VIEW. Honeydew will maintain the underlying query and will detect if its logic is different from the shared source of truth. Use views for: * Build live datasets for sharing * Integration with tools that can't use a live JDBC connection * Real-time data #### Tables Datasets can be deployed as a Databricks TABLE. Honeydew will NOT automatically detect if the logic that built a TABLE has changed Use tables for: * "Freeze" a dataset for an ad-hoc analysis on it Following configuration is possible: * Target catalog and schema * HTTP path for cluster or warehouse connection * Table clustering expressions (optional) ## YAML Schema Each dynamic dataset is defined by a YAML file in Git, which also tracks and preserves the full history of every change. The schema for a dynamic dataset is: ```yaml theme={null} type: perspective name: owner: domain: description: labels: [label1, label2, ...] tags: - key: value: source: - ... folder: display_name: hidden: use_cache: attributes: - - ... metrics: - - ... filters: - - ... parameters: - name: value: - ... transform_sql: group_by_all: # group by all attributes to deduplicate the data, when no metrics are provided delivery: - snowflake: target: name: schema: database: warehouse: table_settings: - transient: - cluster_by: dynamic_table_settings: - lag: - num: - units: - downstream: - refresh_mode: - initialize: - cluster_by: interactive_table_settings: - lag: - num: - units: - cluster_by: enabled: - databricks: target: name: schema: catalog: http_path: table_settings: - cluster_by: enabled: ``` Fields: * `name`: Name of the dataset * `owner`, `description`, `labels`, `tags`, `folder`, `display_name`, `hidden` (all optional): Metadata * `domain` (optional): [Domain](/domains) in which context to run * `attributes` (optional): List of [attribute](/calculations/attributes) expressions (can be named attributes or ad-hoc expressions) * `metrics` (optional): List of [metric](/calculations/metrics) expressions (can be named metrics or ad-hoc expressions) * `filters` (optional): List of filter expressions (can be named boolean attributes or ad-hoc expressions) * `parameters` (optional): List of [parameters](/parameters) and values (override workspace values, if exist) * `use_cache` (optional): Whether to use caches (for [entities](/performance/entity-caching) and/or [pre-aggregations](/performance/aggregate-awareness)). Default - yes. * `transform_sql` (optional): An additional SQL transformation that will be applied to the dataset. A typical use case includes adding `ORDER BY` or `LIMIT` clauses.