# Databricks Asset Bundles
Deploy SchemaX schema changes through Databricks Asset Bundles (DABs) for a fully declarative, version-controlled deployment workflow.
## What are Databricks Asset Bundles?

Databricks Asset Bundles (DABs) are a way to define Databricks resources -- jobs, pipelines, dashboards, and more -- as YAML configuration files that live alongside your code. You manage them with the `databricks bundle` CLI, which handles packaging, deploying, and running resources across workspaces and environments (called *targets* in DAB terminology).
Key properties:

- Resources are declared in YAML and deployed with `databricks bundle deploy`.
- Targets (`dev`, `staging`, `prod`) map to different workspaces or configurations.
- The CLI syncs local files to the workspace and creates or updates the declared resources.
## How SchemaX integrates with DABs

SchemaX provides a `schemax bundle` command that generates two files:
| File | Purpose |
|---|---|
| `resources/schemax.yml` | DAB resource definition: a serverless Python job that runs `schemax apply` with the right parameters |
| `resources/schemax_deploy.py` | A Python script executed by the job; it invokes `schemax apply` in-process using Databricks runtime auth |
These files slot into an existing DAB project. You do not need to write the job definition by hand -- `schemax bundle` produces it from your SchemaX project configuration.

SchemaX does not replace your `databricks.yml`. It generates resource files that you include from your existing bundle configuration.
## Setup

### 1. Generate the bundle resources

From your SchemaX project root, run:

```shell
schemax bundle
```

This creates:

```
resources/
  schemax.yml
  schemax_deploy.py
```
### 2. Include the resource file in your `databricks.yml`

Add the generated resource file to the `include` list:

```yaml
include:
  - resources/schemax.yml
```
### 3. Configure file sync

The DAB job needs access to your `.schemax/` project files and the deploy script at runtime. Add them to the `sync` configuration:

```yaml
sync:
  include:
    - .schemax/**
    - resources/schemax_deploy.py
```

Without the sync configuration, the job will not have access to your SchemaX project files and will fail at runtime.
### 4. Set the `warehouse_id` variable per target

Each target must specify the SQL warehouse ID that SchemaX will use to execute DDL statements:

```yaml
targets:
  dev:
    variables:
      warehouse_id: "abc123def456"
  prod:
    variables:
      warehouse_id: "xyz789ghi012"
```
### 5. Match target names to SchemaX environment names

DAB target names must match your SchemaX environment names. If your SchemaX project defines environments `dev`, `test`, and `prod`, your DAB targets should be named identically:

```yaml
targets:
  dev:
    # ...
  test:
    # ...
  prod:
    # ...
```

The generated job uses `${bundle.target}` as the SchemaX environment, so the names must align.

Check your SchemaX environment names in `.schemax/environments/` and ensure your DAB targets use the same names.
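A small pre-deploy check can catch name mismatches before the job fails at runtime. The sketch below is a hypothetical helper, not part of SchemaX: it assumes each environment is a `*.yml` file under `.schemax/environments/`, and you supply the DAB target names yourself (for example, copied from your `databricks.yml`):

```python
from pathlib import Path

def check_target_alignment(dab_targets, schemax_envs):
    """Return the names present on one side but not the other."""
    targets = set(dab_targets)
    envs = set(schemax_envs)
    return {
        "targets_without_env": sorted(targets - envs),
        "envs_without_target": sorted(envs - targets),
    }

def schemax_env_names(project_root="."):
    """Gather environment names, assuming one file per environment
    (e.g. .schemax/environments/dev.yml)."""
    env_dir = Path(project_root) / ".schemax" / "environments"
    return [p.stem for p in env_dir.glob("*.yml")]

# Example with hard-coded names: a DAB target with no matching environment
mismatch = check_target_alignment(["dev", "test", "prod"], ["dev", "prod"])
print(mismatch)  # {'targets_without_env': ['test'], 'envs_without_target': []}
```

An empty result on both keys means the names line up and `${bundle.target}` will resolve to a valid SchemaX environment.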
## Example `databricks.yml`

A complete bundle configuration with SchemaX integration:

```yaml
bundle:
  name: my-data-project

include:
  - resources/schemax.yml

sync:
  include:
    - .schemax/**
    - resources/schemax_deploy.py

workspace:
  host: https://my-workspace.cloud.databricks.com

targets:
  dev:
    mode: development
    default: true
    variables:
      warehouse_id: "abc123def456"
  test:
    variables:
      warehouse_id: "def456ghi789"
  prod:
    variables:
      warehouse_id: "xyz789ghi012"
```
## Generated resource YAML

For reference, `schemax bundle` generates a resource file like this:

```yaml
variables:
  warehouse_id:
    description: "SQL warehouse ID (required for remote mode, ignored in local mode)"
    default: ""
  schemax_execution_mode:
    description: "Execution mode: local (spark.sql) or remote (SQL warehouse)"
    default: "local"
  schemax_auto_rollback:
    description: "Auto-rollback on failure (true/false)"
    default: "false"

resources:
  jobs:
    schemax_deploy_<project_name>:
      name: "schemax-deploy-<project_name>-${bundle.target}"
      tags:
        managed_by: "schemax"
        schemax_project: "<project_name>"
        schemax_version: "0.2.11"
      environments:
        - environment_key: schemax
          spec:
            environment_version: "2"
            dependencies:
              - schemaxpy>=0.2.11
      tasks:
        - task_key: schemax_apply
          environment_key: schemax
          spark_python_task:
            python_file: schemax_deploy.py
            parameters:
              - "${bundle.target}"
              - "${var.schemax_execution_mode}"
              - "${var.schemax_auto_rollback}"
              - "${var.warehouse_id}"
```

- `python_file` is relative to the YAML file's directory (both live in `resources/`).
- Parameters are passed via `sys.argv` to the deploy script, not environment variables.
- The job includes `tags` for easy filtering in the Databricks Jobs UI.

The job name includes `${bundle.target}` so each target gets a distinct job (e.g., `schemax-deploy-my_project-dev`).
## Deployment workflow

### Local deployment

Generate the bundle resources and deploy in one command:

```shell
schemax bundle && databricks bundle deploy -t dev
```

To also run the job immediately after deploying:

```shell
schemax bundle && databricks bundle deploy -t dev && databricks bundle run -t dev schemax_deploy_<project_name>
```
### CI/CD deployment

In a pipeline, the pattern is:

1. Generate the bundle resources.
2. Deploy the bundle to the target workspace.
3. Trigger the SchemaX job to apply schema changes.

```shell
schemax bundle
databricks bundle deploy -t prod
databricks bundle run -t prod schemax_deploy_<project_name>
```
## How it works under the hood

When the DAB job runs, the following happens:

1. **File sync** -- `databricks bundle deploy` uploads your `.schemax/` project files and `resources/schemax_deploy.py` to the workspace.
2. **Job creation** -- The DAB CLI creates (or updates) the job defined in `resources/schemax.yml`.
3. **Job execution** -- When the job runs (either via `databricks bundle run` or a manual trigger), it:
   - Spins up a serverless Python environment (version 2) with `schemaxpy>=0.2.11` installed.
   - Executes `resources/schemax_deploy.py` as a `spark_python_task`.
   - The deploy script locates the `.schemax/` project root (one level up from `resources/`) and invokes `schemax apply` in-process to inherit Databricks runtime authentication.
   - The target environment, warehouse ID, and auto-rollback flag are passed as parameters via `sys.argv`; `${bundle.target}` (e.g., `dev`, `prod`) maps to the SchemaX environment name, so target names must match.
   - No `--profile` is needed -- the serverless runtime provides authentication automatically.
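Based on the parameter order in the generated resource YAML, the deploy script's argument handling is roughly shaped like this. This is an illustrative sketch, not the file SchemaX actually generates; the function names are mine:

```python
import sys
from pathlib import Path

def parse_deploy_args(argv):
    """Unpack the positional parameters the DAB job passes via sys.argv.

    The order matches the generated resource YAML: target, execution
    mode, auto-rollback flag, warehouse ID.
    """
    target, mode, auto_rollback, warehouse_id = argv[1:5]
    return {
        "environment": target,                 # ${bundle.target}, e.g. "dev"
        "execution_mode": mode,                # "local" or "remote"
        "auto_rollback": auto_rollback == "true",
        "warehouse_id": warehouse_id or None,  # empty string means "not set"
    }

def project_root(script_path):
    """The .schemax/ project root sits one level up from resources/."""
    return Path(script_path).resolve().parent.parent

# Simulating the argv the job would pass:
cfg = parse_deploy_args(["schemax_deploy.py", "dev", "remote", "true", "abc123"])
print(cfg["environment"], cfg["auto_rollback"])  # dev True
```

The real script would then call `schemax apply` in-process with this configuration, relying on the runtime's ambient authentication rather than a CLI profile.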
## Auto-rollback

To automatically roll back schema changes on failure, set the `schemax_auto_rollback` variable to `"true"` on the desired target:

```yaml
targets:
  prod:
    variables:
      warehouse_id: "xyz789ghi012"
      schemax_auto_rollback: "true"
```
When enabled, if `schemax apply` encounters an error partway through execution, it automatically rolls back all changes that were applied in the current run.

Auto-rollback is especially useful in production targets, where partial schema changes could leave the environment in an inconsistent state.
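Conceptually, apply-with-rollback is a compensating-actions pattern: record each successful statement, and on failure undo them in reverse. The sketch below illustrates that pattern only; it is not SchemaX's actual implementation, and the callables and statement strings are made up:

```python
def apply_with_rollback(statements, execute, rollback_for, auto_rollback=True):
    """Apply DDL statements in order; on failure, undo the ones already applied.

    `execute` runs one statement; `rollback_for` returns the compensating
    statement for a statement that succeeded. Both come from the caller.
    """
    applied = []
    try:
        for stmt in statements:
            execute(stmt)
            applied.append(stmt)
    except Exception:
        if auto_rollback:
            # Undo in reverse order so later changes unwind before earlier ones
            for stmt in reversed(applied):
                execute(rollback_for(stmt))
        raise  # surface the original failure after cleanup
    return applied

# Simulated run: the second statement fails, so the first is compensated
log = []
compensations = {"ADD COLUMN a": "DROP COLUMN a", "ADD COLUMN b": "DROP COLUMN b"}

def execute(stmt):
    if stmt == "ADD COLUMN b":
        raise RuntimeError("warehouse error")
    log.append(stmt)

try:
    apply_with_rollback(["ADD COLUMN a", "ADD COLUMN b"], execute, compensations.get)
except RuntimeError:
    pass
print(log)  # ['ADD COLUMN a', 'DROP COLUMN a']
```

Note that the original exception is re-raised after cleanup, so the job still reports failure even when the rollback itself succeeds.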
## CI/CD integration

### GitHub Actions

Combine `schemax bundle` with `databricks bundle` in a GitHub Actions workflow:
```yaml
name: Deploy Schema Changes

on:
  push:
    branches: [main]
    paths:
      - '.schemax/**'

jobs:
  deploy-dev:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install Databricks CLI
        run: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      - name: Install SchemaX CLI
        run: pip install schemaxpy
      - name: Generate bundle resources
        run: schemax bundle
      - name: Deploy and run
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          databricks bundle deploy -t dev
          databricks bundle run -t dev schemax_deploy_my_project

  deploy-prod:
    runs-on: ubuntu-latest
    needs: deploy-dev
    environment: production  # Requires manual approval
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install Databricks CLI
        run: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      - name: Install SchemaX CLI
        run: pip install schemaxpy
      - name: Generate bundle resources
        run: schemax bundle
      - name: Deploy and run
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_PROD }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN_PROD }}
        run: |
          databricks bundle deploy -t prod
          databricks bundle run -t prod schemax_deploy_my_project
```
The `environment: production` setting on the `deploy-prod` job enables GitHub environment protection rules, such as required reviewers, before the production deployment proceeds.
### Azure DevOps

```yaml
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - .schemax/**

pool:
  vmImage: ubuntu-latest

stages:
  - stage: DeployDev
    jobs:
      - job: Deploy
        steps:
          - checkout: self
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.11'
          - script: |
              curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
              pip install schemaxpy
            displayName: Install CLIs
          - script: schemax bundle
            displayName: Generate bundle resources
          - script: |
              databricks bundle deploy -t dev
              databricks bundle run -t dev schemax_deploy_my_project
            displayName: Deploy and run
            env:
              DATABRICKS_HOST: $(DATABRICKS_HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
```
## Next steps

- Git and CI/CD setup -- General CI/CD guidance and authentication in pipelines
- Authentication -- Databricks SDK, profiles, and environment variables
- Environments and Scope -- Configure `dev`, `test`, and `prod` environments in SchemaX