# Databricks Asset Bundles
Deploy SchemaX schema changes through Databricks Asset Bundles (DABs) for a fully declarative, version-controlled deployment workflow.
## What are Databricks Asset Bundles?

Databricks Asset Bundles (DABs) are a way to define Databricks resources -- jobs, pipelines, dashboards, and more -- as YAML configuration files that live alongside your code. You manage them with the `databricks bundle` CLI, which handles packaging, deploying, and running resources across workspaces and environments (called *targets* in DAB terminology).
Key properties:

- Resources are declared in YAML and deployed with `databricks bundle deploy`.
- Targets (`dev`, `staging`, `prod`) map to different workspaces or configurations.
- The CLI syncs local files to the workspace and creates or updates the declared resources.
## How SchemaX integrates with DABs

SchemaX provides a `schemax bundle` command that generates two files:
| File | Purpose |
|---|---|
| `resources/schemax.yml` | DAB resource definition: a serverless Python job that runs `schemax apply` with the right parameters |
| `resources/schemax_deploy.py` | A Python script executed by the job; it invokes `schemax apply` in-process using Databricks runtime auth |
These files slot into an existing DAB project. You do not need to write the job definition by hand -- `schemax bundle` produces it from your SchemaX project configuration.

SchemaX does not replace your `databricks.yml`. It generates resource files that you include from your existing bundle configuration.
## Setup

### 1. Generate the bundle resources

From your SchemaX project root, run:

```shell
schemax bundle
```

This creates:

```
resources/
  schemax.yml
  schemax_deploy.py
```
### 2. Include the resource file in your `databricks.yml`

Add the generated resource file to the `include` list:

```yaml
include:
  - resources/schemax.yml
```
### 3. Configure file sync

The DAB job needs access to your `.schemax/` project files and the deploy script at runtime. Add them to the `sync` configuration:

```yaml
sync:
  include:
    - .schemax/**
    - resources/schemax_deploy.py
```

Without the sync configuration, the job will not have access to your SchemaX project files and will fail at runtime.
### 4. Set the `warehouse_id` variable per target

Each target must specify the SQL warehouse ID that SchemaX will use to execute DDL statements:

```yaml
targets:
  dev:
    variables:
      warehouse_id: "abc123def456"
  prod:
    variables:
      warehouse_id: "xyz789ghi012"
```
### 5. Match target names to SchemaX environment names

DAB target names must match your SchemaX environment names. If your SchemaX project defines environments `dev`, `test`, and `prod`, your DAB targets should be named identically:

```yaml
targets:
  dev:
    # ...
  test:
    # ...
  prod:
    # ...
```

The generated job uses `${bundle.target}` as the SchemaX environment, so the names must align.

Check your SchemaX environment names in `.schemax/environments/` and ensure your DAB targets use the same names.
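A small pre-deploy check can catch name mismatches before the job fails at runtime. The sketch below is a hypothetical helper, not part of SchemaX: it assumes each environment is a `*.yml` file under `.schemax/environments/`, and you supply the DAB target names yourself (for example, copied from your `databricks.yml`):

```python
from pathlib import Path

def check_target_alignment(dab_targets, schemax_envs):
    """Return the names present on one side but not the other."""
    targets = set(dab_targets)
    envs = set(schemax_envs)
    return {
        "targets_without_env": sorted(targets - envs),
        "envs_without_target": sorted(envs - targets),
    }

def schemax_env_names(project_root="."):
    """Gather environment names, assuming one file per environment
    (e.g. .schemax/environments/dev.yml)."""
    env_dir = Path(project_root) / ".schemax" / "environments"
    return [p.stem for p in env_dir.glob("*.yml")]

# Example with hard-coded names: a DAB target with no matching environment
mismatch = check_target_alignment(["dev", "test", "prod"], ["dev", "prod"])
print(mismatch)  # {'targets_without_env': ['test'], 'envs_without_target': []}
```

An empty result on both keys means the names line up and `${bundle.target}` will resolve to a valid SchemaX environment.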
## Example `databricks.yml`

A complete bundle configuration with SchemaX integration:

```yaml
bundle:
  name: my-data-project

include:
  - resources/schemax.yml

sync:
  include:
    - .schemax/**
    - resources/schemax_deploy.py

workspace:
  host: https://my-workspace.cloud.databricks.com

targets:
  dev:
    mode: development
    default: true
    variables:
      warehouse_id: "abc123def456"
  test:
    variables:
      warehouse_id: "def456ghi789"
  prod:
    variables:
      warehouse_id: "xyz789ghi012"
```
## Generated resource YAML

For reference, `schemax bundle` generates a resource file like this:

```yaml
variables:
  warehouse_id:
    description: "SQL warehouse ID (required for remote mode, ignored in local mode)"
    default: ""
  schemax_execution_mode:
    description: "Execution mode: local (spark.sql) or remote (SQL warehouse)"
    default: "local"
  schemax_auto_rollback:
    description: "Auto-rollback on failure (true/false)"
    default: "false"

resources:
  jobs:
    schemax_deploy_<project_name>:
      name: "schemax-deploy-<project_name>-${bundle.target}"
      tags:
        managed_by: "schemax"
        schemax_project: "<project_name>"
        schemax_version: "0.2.11"
      environments:
        - environment_key: schemax
          spec:
            environment_version: "2"
            dependencies:
              - schemaxpy>=0.2.11
      tasks:
        - task_key: schemax_apply
          environment_key: schemax
          spark_python_task:
            python_file: schemax_deploy.py
            parameters:
              - "${bundle.target}"
              - "${var.schemax_execution_mode}"
              - "${var.schemax_auto_rollback}"
              - "${var.warehouse_id}"
```

- `python_file` is relative to the YAML file's directory (both live in `resources/`).
- Parameters are passed via `sys.argv` to the deploy script, not environment variables.
- The job includes `tags` for easy filtering in the Databricks Jobs UI.

The job name includes `${bundle.target}` so each target gets a distinct job (e.g., `schemax-deploy-my_project-dev`).
## Deployment workflow

### Local deployment

Generate the bundle resources and deploy in one command:

```shell
schemax bundle && databricks bundle deploy -t dev
```

To also run the job immediately after deploying:

```shell
schemax bundle && databricks bundle deploy -t dev && databricks bundle run -t dev schemax_deploy_<project_name>
```
### CI/CD deployment

In a pipeline, the pattern is:

1. Generate the bundle resources.
2. Deploy the bundle to the target workspace.
3. Trigger the SchemaX job to apply schema changes.

```shell
schemax bundle
databricks bundle deploy -t prod
databricks bundle run -t prod schemax_deploy_<project_name>
```
## How it works under the hood

When the DAB job runs, the following happens:

1. **File sync** -- `databricks bundle deploy` uploads your `.schemax/` project files and `resources/schemax_deploy.py` to the workspace.
2. **Job creation** -- The DAB CLI creates (or updates) the job defined in `resources/schemax.yml`.
3. **Job execution** -- When the job runs (either via `databricks bundle run` or a manual trigger), it:
   - Spins up a serverless Python environment (version 2) with `schemaxpy>=0.2.11` installed.
   - Executes `resources/schemax_deploy.py` as a `spark_python_task`.
   - The deploy script locates the `.schemax/` project root (one level up from `resources/`) and invokes `schemax apply` in-process to inherit Databricks runtime authentication.
   - The target environment, warehouse ID, and auto-rollback flag are passed as parameters via `sys.argv`; `${bundle.target}` (e.g., `dev`, `prod`) maps to the SchemaX environment name, so target names must match.
   - No `--profile` is needed -- the serverless runtime provides authentication automatically.
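Based on the parameter order in the generated resource YAML, the deploy script's argument handling is roughly shaped like this. This is an illustrative sketch, not the file SchemaX actually generates; the function names are mine:

```python
import sys
from pathlib import Path

def parse_deploy_args(argv):
    """Unpack the positional parameters the DAB job passes via sys.argv.

    The order matches the generated resource YAML: target, execution
    mode, auto-rollback flag, warehouse ID.
    """
    target, mode, auto_rollback, warehouse_id = argv[1:5]
    return {
        "environment": target,                 # ${bundle.target}, e.g. "dev"
        "execution_mode": mode,                # "local" or "remote"
        "auto_rollback": auto_rollback == "true",
        "warehouse_id": warehouse_id or None,  # empty string means "not set"
    }

def project_root(script_path):
    """The .schemax/ project root sits one level up from resources/."""
    return Path(script_path).resolve().parent.parent

# Simulating the argv the job would pass:
cfg = parse_deploy_args(["schemax_deploy.py", "dev", "remote", "true", "abc123"])
print(cfg["environment"], cfg["auto_rollback"])  # dev True
```

The real script would then call `schemax apply` in-process with this configuration, relying on the runtime's ambient authentication rather than a CLI profile.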
## Auto-rollback

To automatically roll back schema changes on failure, set the `schemax_auto_rollback` variable to `"true"` on the desired target:

```yaml
targets:
  prod:
    variables:
      warehouse_id: "xyz789ghi012"
      schemax_auto_rollback: "true"
```
When enabled, if `schemax apply` encounters an error partway through execution, it automatically rolls back all changes that were applied in the current run.

Auto-rollback is especially useful in production targets, where partial schema changes could leave the environment in an inconsistent state.
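Conceptually, apply-with-rollback is a compensating-actions pattern: record each successful statement, and on failure undo them in reverse. The sketch below illustrates that pattern only; it is not SchemaX's actual implementation, and the callables and statement strings are made up:

```python
def apply_with_rollback(statements, execute, rollback_for, auto_rollback=True):
    """Apply DDL statements in order; on failure, undo the ones already applied.

    `execute` runs one statement; `rollback_for` returns the compensating
    statement for a statement that succeeded. Both come from the caller.
    """
    applied = []
    try:
        for stmt in statements:
            execute(stmt)
            applied.append(stmt)
    except Exception:
        if auto_rollback:
            # Undo in reverse order so later changes unwind before earlier ones
            for stmt in reversed(applied):
                execute(rollback_for(stmt))
        raise  # surface the original failure after cleanup
    return applied

# Simulated run: the second statement fails, so the first is compensated
log = []
compensations = {"ADD COLUMN a": "DROP COLUMN a", "ADD COLUMN b": "DROP COLUMN b"}

def execute(stmt):
    if stmt == "ADD COLUMN b":
        raise RuntimeError("warehouse error")
    log.append(stmt)

try:
    apply_with_rollback(["ADD COLUMN a", "ADD COLUMN b"], execute, compensations.get)
except RuntimeError:
    pass
print(log)  # ['ADD COLUMN a', 'DROP COLUMN a']
```

Note that the original exception is re-raised after cleanup, so the job still reports failure even when the rollback itself succeeds.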
## CI/CD integration

### GitHub Actions

Combine `schemax bundle` with `databricks bundle` in a GitHub Actions workflow:
```yaml
name: Deploy Schema Changes

on:
  push:
    branches: [main]
    paths:
      - '.schemax/**'

jobs:
  deploy-dev:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install Databricks CLI
        run: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      - name: Install SchemaX CLI
        run: pip install schemaxpy
      - name: Generate bundle resources
        run: schemax bundle
      - name: Deploy and run
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          databricks bundle deploy -t dev
          databricks bundle run -t dev schemax_deploy_my_project

  deploy-prod:
    runs-on: ubuntu-latest
    needs: deploy-dev
    environment: production  # Requires manual approval
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install Databricks CLI
        run: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      - name: Install SchemaX CLI
        run: pip install schemaxpy
      - name: Generate bundle resources
        run: schemax bundle
      - name: Deploy and run
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_PROD }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN_PROD }}
        run: |
          databricks bundle deploy -t prod
          databricks bundle run -t prod schemax_deploy_my_project
```
The `environment: production` setting on the `deploy-prod` job enables GitHub environment protection rules, such as required reviewers, before the production deployment proceeds.
### Azure DevOps

```yaml
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - .schemax/**

pool:
  vmImage: ubuntu-latest

stages:
  - stage: DeployDev
    jobs:
      - job: Deploy
        steps:
          - checkout: self
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.11'
          - script: |
              curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
              pip install schemaxpy
            displayName: Install CLIs
          - script: schemax bundle
            displayName: Generate bundle resources
          - script: |
              databricks bundle deploy -t dev
              databricks bundle run -t dev schemax_deploy_my_project
            displayName: Deploy and run
            env:
              DATABRICKS_HOST: $(DATABRICKS_HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
```
## Next steps

- Git and CI/CD setup -- General CI/CD guidance and authentication in pipelines
- Authentication -- Databricks SDK, profiles, and environment variables
- Environments and Scope -- Configure `dev`, `test`, and `prod` environments in SchemaX