> ## Documentation Index
> Fetch the complete documentation index at: https://honeydew.ai/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Google BigQuery

<Warning>
  Honeydew support for Google BigQuery is currently in Beta.

  Please contact [support@honeydew.ai](mailto:support@honeydew.ai) to get access to Google BigQuery integration.
</Warning>

# Google BigQuery Integration Setup

Honeydew requires access to Google BigQuery to operate.
You have two options to set up Google BigQuery access - either using a central
org-level connection or map your individual Google BigQuery user credentials to
Honeydew.

If you would like to use a central org-level connection,
it is advised to create a new dedicated service account for Honeydew integration.
The following Google BigQuery connection parameters are required for Honeydew setup:

1. Project ID
2. Dataset (for storing dynamic datasets)

## Authentication Methods

Honeydew supports the following authentication methods for Google BigQuery:

### [**Service Account Key authentication**](https://cloud.google.com/bigquery/docs/authentication/service-account-file)

This is the recommended method for org-level service accounts.
For this method, you will need to provide a **Service Account JSON Key File**.

<Steps>
  <Step title="Create a service account">
    If you do not already have a service account, create one in your Google Cloud
    project:

    1. Go to the [Google Cloud Console](https://console.cloud.google.com)
    2. Navigate to **IAM & Admin** > **Service Accounts**
    3. Click **Create Service Account**
    4. Enter a name for the service account (e.g. "Honeydew Integration")
    5. Optionally add a description
    6. Click **Create and Continue**
  </Step>

  <Step title="Grant required permissions">
    Grant the service account the necessary IAM roles to access Google BigQuery:

    1. In the service account creation flow,
       under **Grant this service account access to project**
    2. Add the role **BigQuery Job User** (allows running queries)
    3. Click **Continue**
    4. Click **Done**

    <Note>
      Additional dataset-level permissions will be configured in a later step
    </Note>
  </Step>

  <Step title="Generate and download the JSON key">
    To generate credentials for the service account:

    1. In the service accounts list, click the service account you just created
    2. Go to the **Keys** tab
    3. Click **Add Key** > **Create new key**
    4. Select **JSON** as the key type
    5. Click **Create**
    6. The JSON key file will be downloaded automatically
    7. Store this file securely

    <Warning>
      The JSON key file contains sensitive credentials. Store it in a secure location.
    </Warning>
  </Step>

  <Step title="Configure the Google BigQuery connection in Honeydew">
    In [Honeydew App settings page](https://app.honeydew.cloud/settings),
    configure the Google BigQuery connection by uploading the
    **Service Account JSON Key File** from the previous step, along with the Project ID.
  </Step>
</Steps>

### [**OAuth User Authentication**](https://docs.cloud.google.com/bigquery/docs/authentication/end-user-installed)

This method allows individual users to authenticate with their Google credentials.
Each user will need to authorize Honeydew to access their Google BigQuery resources using OAuth.

<Note>
  OAuth user authentication requires additional configuration.
  Contact [support@honeydew.ai](mailto:support@honeydew.ai) to enable this authentication method.
</Note>

## Connection Parameters

### Project ID

The Project ID is the unique identifier for your Google Cloud project.
You can find it in the Google Cloud Console:

1. Go to the [Google Cloud Console](https://console.cloud.google.com)
2. Select your project from the project drop-down at the top
3. The Project ID is displayed in the project info panel

<Tip>
  The Project ID is different from the project name.
  Make sure to use the Project ID, not the project name.
</Tip>

### Dataset Configuration

You will need to specify:

1. **Dataset** - the dataset where Honeydew will deploy dynamic datasets as views or tables
2. **Dev Dataset** - the dataset for dev branch deployments (optional, defaults to main dataset)

<Info>
  Google BigQuery uses a hierarchical namespace: `project.dataset.table`.
  Ensure your service account has appropriate permissions to access these resources.
</Info>

## Allowing Honeydew Client IP Addresses

If you have IP-based access restrictions in Google BigQuery using VPC Service
Controls,
add the IP addresses displayed in the Google BigQuery connection screen
in [Honeydew App settings page](https://app.honeydew.cloud/settings)
to the access level allowlist.

<Info>
  For the Honeydew Cloud deployment, the following IP addresses are used:

  * `34.86.209.90`
  * `34.145.147.92`

  If you are using a private Honeydew deployment, the IP addresses will be different.
  You can find them in the Google BigQuery connection screen in
  [Honeydew App settings page](https://app.honeydew.cloud/settings).
</Info>

<Tip>
  To configure VPC Service Controls:

  1. Create an access level with the Honeydew IP addresses
  2. Add the access level to your service perimeter protecting Google BigQuery
  3. Test with a dry-run perimeter first to avoid blocking legitimate traffic
</Tip>

## Permissions

Honeydew does not extract or store your data.
It only reads schema metadata and executes SQL queries inside your Google BigQuery environment.

You can find more security-related information [here](/security/security).

### Required Permissions

If using a service account, the following permissions are required:

#### Project-Level Permissions

**BigQuery Job User** role (`roles/bigquery.jobUser`) - Required to run queries and jobs

Grant this role at the project level:

```bash theme={null}
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
  --role="roles/bigquery.jobUser"
```

#### Dataset-Level Permissions

For datasets used in the semantic layer:

1. **BigQuery Data Viewer** role (`roles/bigquery.dataViewer`) -
   Read access to tables and views
2. **BigQuery Metadata Viewer** role (`roles/bigquery.metadataViewer`) -
   Read dataset and table metadata

For the dataset where dynamic datasets will be deployed:

**BigQuery Data Editor** role (`roles/bigquery.dataEditor`) - Create and manage tables/views

Grant dataset-level permissions:

```bash theme={null}
# For datasets used in the semantic layer
bq add-iam-policy-binding \
  --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
  --role="roles/bigquery.dataViewer" \
  PROJECT_ID:DATASET_NAME

bq add-iam-policy-binding \
  --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
  --role="roles/bigquery.metadataViewer" \
  PROJECT_ID:DATASET_NAME

# For the dataset where dynamic datasets will be deployed
bq add-iam-policy-binding \
  --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
  --role="roles/bigquery.dataEditor" \
  PROJECT_ID:DATASET_NAME
```

<Tip>
  For production deployments, consider creating a custom role with only the
  specific permissions needed rather than using predefined roles.
</Tip>

### Minimal Permission Set

For a minimal permission set, the service account needs:

* `bigquery.jobs.create` (at project level)
* `bigquery.tables.get`, `bigquery.tables.list`, `bigquery.tables.getData` (on source datasets)
* `bigquery.tables.create`, `bigquery.tables.update`,
  `bigquery.tables.delete` (on deployment dataset)
* `bigquery.datasets.get`, `bigquery.readsessions.create` (on all datasets)

## Tracking Honeydew Queries in Google BigQuery

You can track and monitor queries executed by Honeydew in Google BigQuery using
query labels and the INFORMATION\_SCHEMA views.

### Query Label Format

All Honeydew queries include labels with the following format:

```json theme={null}
{
  "application": "honeydew",
  "workspace": "some_workspace",
  "branch": "branch_name",
  "user": "username_example_com",
  "client": "honeydew_server"
}
```

<Note>
  Google BigQuery labels only support lowercase letters, numbers, hyphens, and
  underscores.
  Special characters like `@` and `.` in email addresses are replaced with
  underscores.
</Note>

The query labels contain:

* **application**: Always set to "honeydew"
* **workspace**: The Honeydew workspace name
* **branch**: The Honeydew workspace branch being used (e.g., "dev", "prod")
* **user**: The Honeydew user identifier (email address with special characters replaced)
* **client**: The client name, usually "honeydew\_server" for server-side operations

### Tracking Methods

You can track Honeydew queries using any of the following approaches:

#### 1. By Service Account

If you're using a dedicated service account for Honeydew integration,
you can filter queries by the service account email in the `INFORMATION_SCHEMA.JOBS` view.

#### 2. By Query Labels (Recommended)

The most comprehensive method is to filter by Honeydew query labels.
This approach allows you to track queries by workspace, branch, or user.

Example query to track Honeydew queries:

```sql theme={null}
SELECT
  job_id,
  creation_time,
  user_email,
  total_bytes_processed,
  total_slot_ms,
  query
FROM
  `PROJECT_ID.region-REGION.INFORMATION_SCHEMA.JOBS`
WHERE
  ARRAY_LENGTH(labels) > 0
  AND EXISTS(
    SELECT 1
    FROM UNNEST(labels) AS label
    WHERE label.key = 'application' AND label.value = 'honeydew'
  )
ORDER BY
  creation_time DESC;
```

#### 3. Filter by Specific Workspace or Branch

To track queries for a specific workspace:

```sql theme={null}
SELECT
  job_id,
  creation_time,
  user_email,
  query
FROM
  `PROJECT_ID.region-REGION.INFORMATION_SCHEMA.JOBS`
WHERE
  EXISTS(
    SELECT 1
    FROM UNNEST(labels) AS label
    WHERE label.key = 'workspace' AND label.value = 'your_workspace_name'
  )
ORDER BY
  creation_time DESC;
```

<Tip>
  Replace `PROJECT_ID` and `REGION` in the queries above with your actual
  project ID and Google BigQuery region (e.g., `us`, `eu`).
</Tip>
