SchemaX Quickstart Guide
A complete guide to getting started with SchemaX, covering both the VS Code extension and the Python SDK/CLI. See Prerequisites for what you need before starting.
Simple path (5 steps)
- Open a project: Open a folder in VS Code (or the Extension Development Host) that will hold your SchemaX project.
- Add a table: Open the designer (SchemaX: Open Designer), add a catalog and schema if needed, then add a table and columns (see Your First Schema below).
- Generate SQL: Run the SchemaX: Generate SQL Migration command to export DDL to .schemax/migrations/ (or run schemax sql in the project directory).
- Apply (or run in pipeline): Run schemax apply --target <env> to execute the SQL against Databricks, or run the same in CI/CD (see Git and CI/CD setup).
- Create a snapshot (optional): Run the SchemaX: Create Snapshot command or schemax snapshot create to version your state.
SchemaX can run in full mode (create and manage catalogs, schemas, tables, and governance) or governance-only mode (comments, tags, grants, row filters, column masks on existing objects). See Environments and deployment scope for how to configure this.
Table of Contents
- Providers
- VS Code Extension
- Python SDK & CLI
- Your First Schema
- Generating SQL
- CI/CD Integration
- Troubleshooting
Providers
SchemaX uses a provider-based architecture to support different data catalog systems.
Supported Providers
| Provider | Status | When to Use |
|---|---|---|
| Unity Catalog | ✅ Available (v1.0) | Databricks Unity Catalog projects |
| Hive Metastore | 🔜 Coming Q1 2026 | Apache Hive / legacy Databricks |
| PostgreSQL | 🔜 Coming Q1 2026 | PostgreSQL with Lakebase extensions |
Default Provider
Unity Catalog is the default provider for all new projects. When you create a new SchemaX project (by opening the designer for the first time), it automatically initializes with Unity Catalog.
Provider Selection (Future)
In v0.3.0+, you'll be able to select a provider when creating a new project:
# CLI (future)
schemax init --provider unity # Unity Catalog (default)
schemax init --provider hive # Hive Metastore
schemax init --provider postgres # PostgreSQL
# For now, all projects use Unity Catalog
schemax init
For this quickstart, we'll use Unity Catalog (the current provider).
VS Code Extension
Installation & Launch
If you installed the extension from the VS Code Marketplace or a .vsix file:
- Open VS Code
- File → Open Folder: open or create your project folder (e.g., ~/my-schema-project)
- Press Cmd+Shift+P → SchemaX: Open Designer
That's it — the extension is already loaded.
If you're running SchemaX from source (contributing to SchemaX itself):
cd /path/to/schemax-vscode
code .
Press F5 (or Fn+F5 on Mac) to launch the Extension Development Host, then in that new window open your project folder.
Using the Designer
Step 1: Launch SchemaX Designer
- Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux)
- Type: SchemaX: Open Designer
- Press Enter
The visual designer opens!
Step 2: Create Your First Catalog
- Click the "Add Catalog" button
- Enter name: main
- Click OK
Step 3: Add a Schema
- Select the main catalog in the tree
- Click the "Add Schema" button
- Enter name: sales
- Click OK
Step 4: Add a Table
- Select the sales schema
- Click the "Add Table" button
- Enter name: customers
- Select format: delta
- Click OK
Step 5: Add Columns
- Select the customers table
- Click the "Add Column" button
- Fill in details:
  - Name: customer_id
  - Type: BIGINT
  - Nullable: No
  - Comment: Primary key
- Add more columns as needed
Step 6: Add a View (Optional)
- Select the sales schema
- Click the "+" button → Choose "View"
- Enter SQL definition:
  SELECT customer_id, COUNT(*) as order_count
  FROM customers
  GROUP BY customer_id
- View name will be auto-extracted
- Click OK
Note: SchemaX automatically:
- Extracts dependencies from your view SQL
- Qualifies table references with fully-qualified names (FQN)
- Orders SQL generation so tables are created before views, and both tables and views are created before materialized views
- Detects circular dependencies between views
You can edit dependencies manually in the view or materialized view detail panel (Dependencies section → Edit) if you need to fix extraction gaps or force creation order.
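The ordering and cycle detection described above can be illustrated with a dependency graph and a topological sort. This is a minimal sketch using Python's standard graphlib module, not SchemaX's actual implementation; the object names and the creation_order helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def creation_order(dependencies):
    """Return object names ordered so that every dependency comes first.

    `dependencies` maps an object name to the set of names it reads from.
    Raises ValueError when the dependencies form a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that make up the detected cycle.
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"circular dependency: {cycle}") from exc


# Hypothetical objects: a table, a view on it, and a materialized view on the view.
deps = {
    "sales.customers": set(),
    "sales.customer_orders": {"sales.customers"},
    "sales.customer_orders_mv": {"sales.customer_orders"},
}
order = creation_order(deps)
print(order)
```

Two views that reference each other, for example {"a": {"b"}, "b": {"a"}}, make creation_order raise ValueError, which mirrors the circular-dependency check.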
Step 7: Create a Snapshot
- Press Cmd+Shift+P
- Type: SchemaX: Create Snapshot
- Enter name: v0.1.0
- Enter comment: Initial schema
Your schema is now versioned!
To apply the same grant or tag to all objects in a catalog or schema, select the catalog or schema in the tree and click Bulk operations in the detail panel. Choose Add grant (principal + privileges) or Add tag, then Apply. See Unity Catalog grants — Bulk grants and tags.
Checking the Files
ls -la .schemax/
cat .schemax/project.json
cat .schemax/changelog.json
ls -la .schemax/snapshots/
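The same check can be scripted. The sketch below uses only the standard library and assumes the layout shown in this guide: a name key in project.json, an ops list in changelog.json (both appear in the Python API example later in this guide), and one JSON file per snapshot. The summarize_project helper is hypothetical and not part of the SDK.

```python
import json
from pathlib import Path


def summarize_project(root):
    """Summarize the .schemax/ folder under `root` without importing the SDK."""
    schemax_dir = Path(root) / ".schemax"
    project = json.loads((schemax_dir / "project.json").read_text(encoding="utf-8"))
    changelog = json.loads((schemax_dir / "changelog.json").read_text(encoding="utf-8"))
    snapshots_dir = schemax_dir / "snapshots"
    # Snapshot files are named after their version, e.g. v1.0.0.json.
    snapshots = []
    if snapshots_dir.is_dir():
        snapshots = sorted(p.stem for p in snapshots_dir.glob("*.json"))
    return {
        "name": project.get("name"),
        "pending_ops": len(changelog.get("ops", [])),
        "snapshots": snapshots,
    }


# Only print a summary when run from inside a SchemaX project.
if (Path.cwd() / ".schemax").is_dir():
    print(summarize_project(Path.cwd()))
```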
Available Commands
- SchemaX: Open Designer - Launch visual designer
- SchemaX: Create Snapshot - Version your schema
- SchemaX: Generate SQL Migration - Export to SQL
- SchemaX: Show Last Emitted Changes - View operations
Python SDK & CLI
Installation
pip install schemaxpy
The SDK is published on PyPI. For a development install (contributing to SchemaX itself), see the Development guide.
Verify Installation
schemax --version
CLI Commands
Validate Schema
cd your-project
schemax validate
Output:
Validating project files...
✓ project.json (version 4)
✓ changelog.json (5 operations)
Project: my_project
Catalogs: 1
Schemas: 1
Tables: 2
✓ Schema files are valid
Generate SQL
# Output to stdout
schemax sql
# Save to file
schemax sql --output migration.sql
# View the file
cat migration.sql
Apply to Databricks
# Preview (dry-run)
schemax apply --target dev --profile my-profile --warehouse-id abc123 --dry-run
# Apply (tracks deployment automatically)
schemax apply --target dev --profile my-profile --warehouse-id abc123
# CI/CD (non-interactive)
schemax apply --target prod --profile my-profile --warehouse-id abc123 --no-interaction
Deployment tracking is built into schemax apply — no separate record step needed.
See Git and CI/CD setup for pipeline examples.
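If you drive the CLI from a script, one option is to build the argument list in Python and pass it to subprocess. The sketch below uses only the flags shown above; build_apply_command and preview_then_apply are hypothetical helpers, not part of the SDK, and the profile and warehouse ID are the placeholder values from this guide.

```python
import shlex
import subprocess


def build_apply_command(target, profile, warehouse_id, dry_run=False, no_interaction=False):
    """Build the argv list for `schemax apply` from the flags shown in this guide."""
    cmd = [
        "schemax", "apply",
        "--target", target,
        "--profile", profile,
        "--warehouse-id", warehouse_id,
    ]
    if dry_run:
        cmd.append("--dry-run")
    if no_interaction:
        cmd.append("--no-interaction")
    return cmd


def preview_then_apply(target, profile, warehouse_id):
    """Run a dry-run first and apply only if the preview exits with status 0."""
    subprocess.run(build_apply_command(target, profile, warehouse_id, dry_run=True), check=True)
    subprocess.run(build_apply_command(target, profile, warehouse_id), check=True)


# Print the preview command instead of running it.
print(shlex.join(build_apply_command("dev", "my-profile", "abc123", dry_run=True)))
# prints: schemax apply --target dev --profile my-profile --warehouse-id abc123 --dry-run
```

Passing a list rather than a shell string avoids quoting problems when a profile name contains spaces.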
Python API
Create a script to use SchemaX programmatically:
#!/usr/bin/env python3
from pathlib import Path
from schemax.core.storage import load_current_state, read_project
from schemax.providers.base.operations import Operation
# Load schema with provider
workspace = Path.cwd()
state, changelog, provider = load_current_state(workspace)
# Show summary
project = read_project(workspace)
print(f"Project: {project['name']}")
print(f"Provider: {provider.info.name}")
if "catalogs" in state:
    print(f"Catalogs: {len(state['catalogs'])}")
print(f"Pending operations: {len(changelog['ops'])}")
# Generate SQL
if changelog["ops"]:
    operations = [Operation(**op) for op in changelog["ops"]]
    generator = provider.get_sql_generator(state)
    sql = generator.generate_sql(operations)
    Path("migration.sql").write_text(sql)
    print("✓ SQL generated: migration.sql")
Your First Schema
Let's create a complete example from scratch.
Step 1: Create Project Directory
mkdir ~/my-first-schema
cd ~/my-first-schema
Step 2: Open in VS Code
code .
Step 3: Launch SchemaX (in Extension Development Host)
- Press F5 in the main VS Code window
- In the new window, open the ~/my-first-schema folder
- Press Cmd+Shift+P → SchemaX: Open Designer
Step 4: Build Schema
Create Catalog: ecommerce
Create Schema: production
Create Tables:
Table 1: customers
- Columns:
  - id (BIGINT, NOT NULL, Primary Key)
  - email (STRING, NOT NULL, Unique)
  - name (STRING, NOT NULL)
  - created_at (TIMESTAMP, NOT NULL)
- Properties:
  - delta.enableChangeDataFeed=true
- Constraints:
  - PRIMARY KEY (id)
Table 2: orders
- Columns:
  - id (BIGINT, NOT NULL, Primary Key)
  - customer_id (BIGINT, NOT NULL, Foreign Key)
  - amount (DECIMAL(10,2), NOT NULL)
  - status (STRING, NOT NULL)
  - created_at (TIMESTAMP, NOT NULL)
- Constraints:
  - PRIMARY KEY (id)
  - FOREIGN KEY (customer_id) REFERENCES customers(id)
Step 5: Create Snapshot
Press Cmd+Shift+P → SchemaX: Create Snapshot
- Name: v1.0.0
- Comment: Initial e-commerce schema
Step 6: Verify Files
tree .schemax/
Output:
.schemax/
├── changelog.json
├── project.json
└── snapshots/
    └── v1.0.0.json
Generating SQL
From VS Code
- Make some changes (add columns, tables, etc.)
- Press Cmd+Shift+P
- Type: SchemaX: Generate SQL Migration
- Review the SQL file that opens
The SQL is saved to:
.schemax/migrations/migration_YYYY-MM-DD_HH-MM-SS.sql
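Because the timestamp in the file name is zero-padded and ordered from year down to second, sorting the names alphabetically also sorts them chronologically. A minimal sketch for locating the newest migration, assuming the naming pattern above (latest_migration is a hypothetical helper, not part of the SDK):

```python
from pathlib import Path


def latest_migration(root):
    """Return the newest migration file under `root`, or None if there is none."""
    migrations_dir = Path(root) / ".schemax" / "migrations"
    if not migrations_dir.is_dir():
        return None
    # migration_YYYY-MM-DD_HH-MM-SS.sql sorts chronologically by name.
    migrations = sorted(migrations_dir.glob("migration_*.sql"), key=lambda p: p.name)
    return migrations[-1] if migrations else None


newest = latest_migration(Path.cwd())
print(newest if newest else "No migrations generated yet")
```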
From CLI
# Generate SQL from changelog
schemax sql --output deploy.sql
# Review
cat deploy.sql
Example Generated SQL
The schemax sql command produces idempotent DDL. Columns are included inline in the CREATE TABLE statement, so no separate ADD COLUMN step is needed. Statements are separated by semicolons, so the file can be run in any SQL tool or applied via schemax apply.
-- Op: op_abc123 (2025-10-13T12:00:00Z)
-- Type: add_catalog
CREATE CATALOG IF NOT EXISTS `ecommerce`;
-- Op: op_def456 (2025-10-13T12:01:00Z)
-- Type: add_schema
CREATE SCHEMA IF NOT EXISTS `ecommerce`.`production`;
-- Op: op_ghi789 (2025-10-13T12:02:00Z) | op_jkl012 (2025-10-13T12:03:00Z) | ...
-- Type: add_table + add_column (batched)
CREATE TABLE IF NOT EXISTS `ecommerce`.`production`.`customers` (
  `id` BIGINT NOT NULL COMMENT 'Primary key',
  `email` STRING NOT NULL,
  `name` STRING NOT NULL,
  `created_at` TIMESTAMP NOT NULL
) USING DELTA;
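To run the statements one at a time from your own tooling, the file has to be split at its semicolons. The sketch below is one way to do that and is not how schemax apply is implemented: split_statements is a hypothetical helper that ignores semicolons inside single-quoted strings, backtick-quoted identifiers, and -- line comments. It does not handle block comments or backslash-escaped quotes.

```python
def split_statements(script):
    """Split a SQL script into statements at top-level semicolons.

    Line comments (-- ...) are dropped from the returned statements.
    """
    statements, current = [], []
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(script):
        ch = script[i]
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", "`"):
            quote = ch
            current.append(ch)
        elif script.startswith("--", i):
            # Skip the comment text; the newline itself is handled next iteration.
            end = script.find("\n", i)
            i = len(script) if end == -1 else end
            continue
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    statements.append("".join(current).strip())
    return [s for s in statements if s]


script = """
-- Type: add_catalog
CREATE CATALOG IF NOT EXISTS `ecommerce`;
-- Type: add_schema
CREATE SCHEMA IF NOT EXISTS `ecommerce`.`production`;
COMMENT ON SCHEMA `ecommerce`.`production` IS 'staging; not final';
"""
statements = split_statements(script)
for statement in statements:
    print(statement)
```

The third statement keeps the semicolon inside its quoted comment, so the script yields three statements rather than four.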
Apply to Databricks
Use schemax apply to execute the SQL against Databricks — it handles authentication, executes each statement individually, and records the deployment automatically:
schemax apply \
  --target dev \
  --profile my-databricks-profile \
  --warehouse-id <warehouse-id>
See Git and CI/CD setup for pipeline examples, or continue below for a basic GitHub Actions template.
CI/CD Integration
schemax apply with --no-interaction is the recommended way to run SchemaX in pipelines. It handles SQL generation (or accepts a pre-generated file), executes statements one-by-one against Databricks, and records the deployment — no separate deploy step needed.
For detailed pipeline setup including Azure DevOps, see Git and CI/CD setup.
GitHub Actions (quick example)
name: Deploy Schema
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install SchemaX
        run: pip install schemaxpy
      - name: Validate schema
        run: schemax validate
      - name: Apply to dev
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          schemax apply \
            --target dev \
            --profile DEFAULT \
            --warehouse-id ${{ secrets.WAREHOUSE_ID }} \
            --no-interaction
GitLab CI (quick example)
deploy-dev:
  image: python:3.11
  script:
    - pip install schemaxpy
    - schemax validate
    - schemax apply
      --target dev
      --profile DEFAULT
      --warehouse-id $WAREHOUSE_ID
      --no-interaction
  variables:
    DATABRICKS_HOST: $DATABRICKS_HOST
    DATABRICKS_TOKEN: $DATABRICKS_TOKEN
  only:
    - main
Troubleshooting
VS Code Extension
Problem: Extension commands not appearing
Solution:
- Make sure you have the SchemaX extension installed (check the Extensions panel)
- Open a folder in VS Code (the extension activates only when a workspace is open)
- Check the "SchemaX" output channel (View → Output → SchemaX) for errors
If running from source (development):
- Make sure you pressed F5 in the schemax-vscode repo window
- Look for the "Extension Development Host" window title
- Open a folder in the Extension Development Host window
Problem: F5 doesn't work (development only)
Solution:
# Make sure you're in the right directory
cd /path/to/schemax-vscode
code .
# Wait for VS Code to fully load, then press F5 or Run → Start Debugging
Problem: Webview doesn't open
Solution:
- Check "SchemaX" output channel (View → Output → SchemaX)
- Look for build errors
- Rebuild: cd packages/vscode-extension && npm run build
Python CLI
Problem: schemax command not found
Solution:
# Check if installed
pip list | grep schemax
# Reinstall
pip install --upgrade schemaxpy
# Verify
which schemax
schemax --version
Problem: Import errors
Solution:
pip install --upgrade schemaxpy
Problem: Validation fails
Solution:
# Check file structure
ls .schemax/
# Validate JSON
python -m json.tool .schemax/project.json
python -m json.tool .schemax/changelog.json
# Check permissions
ls -la .schemax/
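The JSON checks above can also be run in one pass from Python. This sketch uses only the standard json module and reports a missing file or the position of the first syntax error; check_json_files is a hypothetical helper, and it checks JSON syntax only, not whether the content is a valid SchemaX project.

```python
import json
from pathlib import Path


def check_json_files(root, names=("project.json", "changelog.json")):
    """Return {filename: problem or None} for the JSON files in .schemax/."""
    results = {}
    for name in names:
        path = Path(root) / ".schemax" / name
        try:
            json.loads(path.read_text(encoding="utf-8"))
            results[name] = None
        except FileNotFoundError:
            results[name] = "file is missing"
        except json.JSONDecodeError as exc:
            results[name] = f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return results


for name, problem in check_json_files(Path.cwd()).items():
    print(f"{name}: {problem or 'ok'}")
```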
SQL Generation
Problem: No SQL generated
Solution:
- Make sure there are operations in the changelog
- Check: cat .schemax/changelog.json
- Create some changes in the designer first
Problem: SQL has errors
Solution:
- Review the generated SQL
- Check operation IDs in comments to trace back
- Verify table/column names in the visual designer
Next Steps
- Explore Examples: Check examples/basic-schema/
- Environments & scope: See Environments and deployment scope for governance-only mode and existing catalogs.
- Grants: See Unity Catalog grants for managing GRANT/REVOKE on catalogs, schemas, tables, and views.
- Read Architecture: See Architecture.
- Project Lifecycle & Workflows: See Workflows for single/multi-dev, greenfield/brownfield, and rollback timelines.
- Set Up CI/CD: Use templates in examples/github-actions/
- Join Community: GitHub Discussions
Quick Reference
VS Code Commands
| Command | What It Does |
|---|---|
| SchemaX: Open Designer | Open visual designer |
| SchemaX: Create Snapshot | Version your schema |
| SchemaX: Generate SQL Migration | Export to SQL file |
| SchemaX: Show Last Emitted Changes | View pending operations |
CLI Commands
| Command | What It Does |
|---|---|
| schemax validate | Check schema files |
| schemax sql | Generate SQL migration file |
| schemax apply | Execute SQL against Databricks (with tracking) |
| schemax rollback | Roll back a deployment |
| schemax snapshot create | Create a versioned snapshot |
| schemax diff | Compare two snapshot versions |
File Structure
.schemax/
├── project.json      # Metadata & configuration
├── changelog.json    # Pending operations
├── snapshots/        # Version snapshots
│   └── v*.json
└── migrations/       # Generated SQL
    └── migration_*.sql
You're all set! Start building your schemas! 🚀