SchemaX Quickstart Guide

A complete guide to getting started with SchemaX, covering both the VS Code extension and the Python SDK/CLI. See Prerequisites for what you need before starting.

Simple path (5 steps)

1. Open a project - Open a folder in VS Code (or the Extension Development Host) that will hold your SchemaX project.
2. Add a table - Open the designer (SchemaX: Open Designer), add a catalog and schema if needed, then add a table and columns (see Your First Schema below).
3. Generate SQL - Use SchemaX: Generate SQL Migration to export DDL to .schemax/migrations/ (or run schemax sql in the project directory).
4. Apply (or run in a pipeline) - Run schemax apply --target <env> to execute the SQL against Databricks, or run the same command in CI/CD (see Git and CI/CD setup).
5. Create a snapshot (optional) - Use SchemaX: Create Snapshot or schemax snapshot create to version your state.

SchemaX can run in full mode (create and manage catalogs, schemas, tables, and governance) or governance-only mode (comments, tags, grants, row filters, column masks on existing objects). See Environments and deployment scope for how to configure this.

Table of Contents

Providers
VS Code Extension
Python SDK & CLI
Your First Schema
Generating SQL
CI/CD Integration
Troubleshooting

Providers

SchemaX uses a provider-based architecture to support different data catalog systems.

Supported Providers

Provider	Status	When to Use
Unity Catalog	✅ Available (v1.0)	Databricks Unity Catalog projects
Hive Metastore	🔜 Coming Q1 2026	Apache Hive / legacy Databricks
PostgreSQL	🔜 Coming Q1 2026	PostgreSQL with Lakebase extensions

Default Provider

Unity Catalog is the default provider for all new projects. When you create a new SchemaX project (by opening the designer for the first time), it automatically initializes with Unity Catalog.

Provider Selection (Future)

In v0.3.0+, you'll be able to select a provider when creating a new project:

# CLI (future)
schemax init --provider unity      # Unity Catalog (default)
schemax init --provider hive       # Hive Metastore
schemax init --provider postgres   # PostgreSQL

# For now, all projects use Unity Catalog
schemax init

For this quickstart, we'll use Unity Catalog (the current provider).

VS Code Extension

Installation & Launch

If you installed the extension from the VS Code Marketplace or a .vsix file:

Open VS Code
File → Open Folder — open or create your project folder (e.g., ~/my-schema-project)
Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux) → SchemaX: Open Designer

That's it — the extension is already loaded.

If you're running SchemaX from source (contributing to SchemaX itself):

cd /path/to/schemax-vscode
code .

Press F5 (or Fn+F5 on Mac) to launch the Extension Development Host, then in that new window open your project folder.

Using the Designer

Step 1: Launch SchemaX Designer

Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux)
Type: SchemaX: Open Designer
Press Enter

The visual designer opens!

Step 2: Create Your First Catalog

Click the "Add Catalog" button
Enter name: main
Click OK

Step 3: Add a Schema

Select the main catalog in the tree
Click the "Add Schema" button
Enter name: sales
Click OK

Step 4: Add a Table

Select the sales schema
Click the "Add Table" button
Enter name: customers
Select format: delta
Click OK

Step 5: Add Columns

Select the customers table
Click the "Add Column" button
Fill in details:
- Name: customer_id
- Type: BIGINT
- Nullable: No
- Comment: Primary key
Add more columns as needed

Step 6: Add a View (Optional)

Select the sales schema
Click the "+" button → choose "View"

Enter SQL definition:

SELECT customer_id, COUNT(*) as order_count
FROM customers
GROUP BY customer_id

View name will be auto-extracted
Click OK

Note: SchemaX automatically:

Extracts dependencies from your view SQL
Qualifies table references with fully-qualified names (FQN)
Orders SQL generation by dependency: tables are created before views, and both tables and views are created before materialized views (MVs)
Detects circular dependencies between views

You can edit dependencies manually in the view or materialized view detail panel (Dependencies section → Edit) if you need to fix extraction gaps or force creation order.
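
The ordering and cycle detection described above can be illustrated with a short Python sketch. This is not SchemaX's implementation: the `creation_order` helper and the object names are hypothetical, and the sketch relies only on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def creation_order(dependencies):
    """Return object names ordered so that every dependency comes first.

    `dependencies` maps each object to the set of objects it reads from.
    Raises ValueError when the dependencies form a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"Circular dependency: {cycle}") from exc


# Hypothetical objects: a table, a view on the table, an MV on the view
deps = {
    "main.sales.customers": set(),
    "main.sales.customer_orders": {"main.sales.customers"},
    "main.sales.customer_orders_mv": {"main.sales.customer_orders"},
}
print(creation_order(deps))
```

Two views that select from each other would make `creation_order` raise `ValueError`, which is the kind of situation the manual dependency editor is meant to help resolve.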

Step 7: Create a Snapshot

Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux)
Type: SchemaX: Create Snapshot
Enter name: v0.1.0
Enter comment: Initial schema

Your schema is now versioned!

Bulk operations

To apply the same grant or tag to all objects in a catalog or schema, select the catalog or schema in the tree and click Bulk operations in the detail panel. Choose Add grant (principal + privileges) or Add tag, then Apply. See Unity Catalog grants — Bulk grants and tags.

Checking the Files

ls -la .schemax/
cat .schemax/project.json
cat .schemax/changelog.json
ls -la .schemax/snapshots/
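
The same inspection can be scripted. The sketch below is a minimal example that assumes `project.json` has a `name` key and `changelog.json` has an `ops` list (both key names are taken from the Python API example later in this guide); the `summarize_project` helper itself is hypothetical and not part of the SDK.

```python
import json
from pathlib import Path


def summarize_project(workspace):
    """Read the .schemax JSON files and return a small summary dict."""
    schemax_dir = Path(workspace) / ".schemax"
    project = json.loads((schemax_dir / "project.json").read_text(encoding="utf-8"))
    changelog = json.loads((schemax_dir / "changelog.json").read_text(encoding="utf-8"))

    # Snapshot files are named after their version, e.g. v0.1.0.json
    snapshots_dir = schemax_dir / "snapshots"
    snapshots = []
    if snapshots_dir.is_dir():
        snapshots = sorted(path.stem for path in snapshots_dir.glob("*.json"))

    return {
        "name": project.get("name"),
        "pending_ops": len(changelog.get("ops", [])),
        "snapshots": snapshots,
    }
```

Calling `summarize_project(Path.cwd())` from the project folder should list the `v0.1.0` snapshot created in Step 7.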

Available Commands

SchemaX: Open Designer - Launch visual designer
SchemaX: Create Snapshot - Version your schema
SchemaX: Generate SQL Migration - Export to SQL
SchemaX: Show Last Emitted Changes - View operations

Python SDK & CLI

Installation

pip install schemaxpy

The SDK is published on PyPI. For a development install (contributing to SchemaX itself), see the Development guide.

Verify Installation

schemax --version

CLI Commands

Validate Schema

cd your-project
schemax validate

Output:

Validating project files...
  ✓ project.json (version 4)
  ✓ changelog.json (5 operations)

Project: my_project
  Catalogs: 1
  Schemas: 1
  Tables: 2

✓ Schema files are valid

Generate SQL

# Output to stdout
schemax sql

# Save to file
schemax sql --output migration.sql

# View the file
cat migration.sql

Apply to Databricks

# Preview (dry-run)
schemax apply --target dev --profile my-profile --warehouse-id abc123 --dry-run

# Apply (tracks deployment automatically)
schemax apply --target dev --profile my-profile --warehouse-id abc123

# CI/CD (non-interactive)
schemax apply --target prod --profile my-profile --warehouse-id abc123 --no-interaction

Deployment tracking is built into schemax apply — no separate record step needed.
See Git and CI/CD setup for pipeline examples.
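
If you drive the CLI from a Python script, the commands above can be assembled as an argument list. The sketch below uses only the flags shown in this section; the `build_apply_command` helper is hypothetical and not part of the SDK.

```python
import shlex


def build_apply_command(target, profile, warehouse_id, dry_run=False, no_interaction=False):
    """Build the argument list for `schemax apply` from the flags shown above."""
    cmd = [
        "schemax", "apply",
        "--target", target,
        "--profile", profile,
        "--warehouse-id", warehouse_id,
    ]
    if dry_run:
        cmd.append("--dry-run")
    if no_interaction:
        cmd.append("--no-interaction")
    return cmd


preview = build_apply_command("dev", "my-profile", "abc123", dry_run=True)
print(shlex.join(preview))
# To execute it: subprocess.run(preview, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when a profile name or warehouse ID contains unusual characters.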

Python API

Create a script to use SchemaX programmatically:

#!/usr/bin/env python3
from pathlib import Path
from schemax.core.storage import load_current_state, read_project
from schemax.providers.base.operations import Operation

# Load schema with provider
workspace = Path.cwd()
state, changelog, provider = load_current_state(workspace)

# Show summary
project = read_project(workspace)
print(f"Project: {project['name']}")
print(f"Provider: {provider.info.name}")
if "catalogs" in state:
    print(f"Catalogs: {len(state['catalogs'])}")
print(f"Pending operations: {len(changelog['ops'])}")

# Generate SQL
if changelog["ops"]:
    operations = [Operation(**op) for op in changelog["ops"]]
    generator = provider.get_sql_generator(state)
    sql = generator.generate_sql(operations)
    Path("migration.sql").write_text(sql)
    print("✓ SQL generated: migration.sql")

Your First Schema

Let's create a complete example from scratch.

Step 1: Create Project Directory

mkdir ~/my-first-schema
cd ~/my-first-schema

Step 2: Open in VS Code

code .

Step 3: Launch SchemaX (in Extension Development Host)

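If you are running SchemaX from source, press F5 in the schemax-vscode window, then open the ~/my-first-schema folder in the Extension Development Host window that appears. If you installed the extension, stay in the window you just opened.
Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux) → SchemaX: Open Designer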

Step 4: Build Schema

Create Catalog: ecommerce

Create Schema: production

Create Tables:

Table 1: customers

Columns:
- id (BIGINT, NOT NULL, Primary Key)
- email (STRING, NOT NULL, Unique)
- name (STRING, NOT NULL)
- created_at (TIMESTAMP, NOT NULL)
Properties:
- delta.enableChangeDataFeed = true
Constraints:
- PRIMARY KEY (id)

Table 2: orders

Columns:
- id (BIGINT, NOT NULL, Primary Key)
- customer_id (BIGINT, NOT NULL, Foreign Key)
- amount (DECIMAL(10,2), NOT NULL)
- status (STRING, NOT NULL)
- created_at (TIMESTAMP, NOT NULL)
Constraints:
- PRIMARY KEY (id)
- FOREIGN KEY (customer_id) REFERENCES customers(id)

Step 5: Create Snapshot

Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux) → SchemaX: Create Snapshot

Name: v1.0.0
Comment: Initial e-commerce schema

Step 6: Verify Files

tree .schemax/

Output:

.schemax/
├── changelog.json
├── project.json
└── snapshots/
    └── v1.0.0.json

Generating SQL

From VS Code

Make some changes (add columns, tables, etc.)
Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux)
Type: SchemaX: Generate SQL Migration
Review the SQL file that opens

The SQL is saved to:

.schemax/migrations/migration_YYYY-MM-DD_HH-MM-SS.sql
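
Because the timestamp in the file name runs from year down to second, sorting the names alphabetically also sorts them chronologically, provided every field is zero-padded as the pattern suggests. The sketch below relies on that assumption; both helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def migration_filename(moment):
    """Format a file name following the migration_YYYY-MM-DD_HH-MM-SS.sql pattern."""
    return moment.strftime("migration_%Y-%m-%d_%H-%M-%S.sql")


def latest_migration(workspace):
    """Return the newest migration file by name, or None when there is none."""
    migrations_dir = Path(workspace) / ".schemax" / "migrations"
    if not migrations_dir.is_dir():
        return None
    candidates = sorted(migrations_dir.glob("migration_*.sql"), key=lambda p: p.name)
    return candidates[-1] if candidates else None


print(migration_filename(datetime(2025, 10, 13, 12, 0, 0)))
# prints: migration_2025-10-13_12-00-00.sql
```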

From CLI

# Generate SQL from changelog
schemax sql --output deploy.sql

# Review
cat deploy.sql

Example Generated SQL

schemax sql produces idempotent DDL. Columns are included inline in the CREATE TABLE statement, so no separate ADD COLUMN step is needed. Statements are terminated by semicolons, so the file can be run in any SQL tool or applied via schemax apply.

-- Op: op_abc123 (2025-10-13T12:00:00Z)
-- Type: add_catalog
CREATE CATALOG IF NOT EXISTS `ecommerce`;

-- Op: op_def456 (2025-10-13T12:01:00Z)
-- Type: add_schema
CREATE SCHEMA IF NOT EXISTS `ecommerce`.`production`;

-- Op: op_ghi789 (2025-10-13T12:02:00Z) | op_jkl012 (2025-10-13T12:03:00Z) | ...
-- Type: add_table + add_column (batched)
CREATE TABLE IF NOT EXISTS `ecommerce`.`production`.`customers` (
  `id` BIGINT NOT NULL COMMENT 'Primary key',
  `email` STRING NOT NULL,
  `name` STRING NOT NULL,
  `created_at` TIMESTAMP NOT NULL
) USING DELTA;
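
To make the batching idea concrete, here is a minimal Python sketch of how an add_table operation and its add_column operations could be folded into one CREATE TABLE statement. It is an illustration only, not SchemaX's generator: the `render_create_table` helper and the column dictionary layout are assumptions.

```python
def render_create_table(fqn_parts, columns, table_format="DELTA"):
    """Render one CREATE TABLE statement with every column inline.

    `columns` is a list of dicts with the keys name, type, and optionally
    nullable (default True) and comment.
    """
    def quote(identifier):
        # Backtick-quote the identifier and escape embedded backticks
        return "`" + identifier.replace("`", "``") + "`"

    lines = []
    for col in columns:
        parts = [quote(col["name"]), col["type"]]
        if not col.get("nullable", True):
            parts.append("NOT NULL")
        if col.get("comment"):
            comment = col["comment"].replace("'", "\\'")
            parts.append(f"COMMENT '{comment}'")
        lines.append("  " + " ".join(parts))

    table = ".".join(quote(part) for part in fqn_parts)
    body = ",\n".join(lines)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n) USING {table_format};"


customers = [
    {"name": "id", "type": "BIGINT", "nullable": False, "comment": "Primary key"},
    {"name": "email", "type": "STRING", "nullable": False},
    {"name": "name", "type": "STRING", "nullable": False},
    {"name": "created_at", "type": "TIMESTAMP", "nullable": False},
]
print(render_create_table(["ecommerce", "production", "customers"], customers))
```

Run as written, the sketch prints the same customers statement shown above. The IF NOT EXISTS clause is what makes re-running the statement safe.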

Apply to Databricks

Use schemax apply to execute the SQL against Databricks — it handles authentication, executes each statement individually, and records the deployment automatically:

schemax apply \
  --target dev \
  --profile my-databricks-profile \
  --warehouse-id <warehouse-id>

See Git and CI/CD setup for pipeline examples, or continue below for a basic GitHub Actions template.
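
If you want to review a generated file statement by statement before applying it, a naive splitter is often enough. The sketch below is an assumption-laden illustration, not how schemax apply parses SQL: it splits on semicolons and drops comment-only lines, so it will misbehave if a string literal or comment contains a semicolon.

```python
def split_statements(sql_text):
    """Naively split a SQL script into statements on semicolons.

    Blank lines and comment-only lines (starting with --) are dropped.
    Semicolons inside string literals or comments are NOT handled.
    """
    statements = []
    for chunk in sql_text.split(";"):
        lines = [
            line for line in chunk.splitlines()
            if line.strip() and not line.strip().startswith("--")
        ]
        if lines:
            statements.append("\n".join(lines))
    return statements


script = """-- Type: add_catalog
CREATE CATALOG IF NOT EXISTS `ecommerce`;

-- Type: add_schema
CREATE SCHEMA IF NOT EXISTS `ecommerce`.`production`;
"""
for number, statement in enumerate(split_statements(script), start=1):
    print(number, statement)
```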

CI/CD Integration

schemax apply with --no-interaction is the recommended way to run SchemaX in pipelines. It handles SQL generation (or accepts a pre-generated file), executes statements one by one against Databricks, and records the deployment, so no separate tracking step is needed.

For detailed pipeline setup including Azure DevOps, see Git and CI/CD setup.

GitHub Actions (quick example)

name: Deploy Schema

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install SchemaX
        run: pip install schemaxpy

      - name: Validate schema
        run: schemax validate

      - name: Apply to dev
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          schemax apply \
            --target dev \
            --profile DEFAULT \
            --warehouse-id ${{ secrets.WAREHOUSE_ID }} \
            --no-interaction

GitLab CI (quick example)

deploy-dev:
  image: python:3.11
  script:
    - pip install schemaxpy
    - schemax validate
    - schemax apply
        --target dev
        --profile DEFAULT
        --warehouse-id $WAREHOUSE_ID
        --no-interaction
  variables:
    DATABRICKS_HOST: $DATABRICKS_HOST
    DATABRICKS_TOKEN: $DATABRICKS_TOKEN
  only:
    - main

Troubleshooting

VS Code Extension

Problem: Extension commands not appearing

Solution:

Make sure you have the SchemaX extension installed (check the Extensions panel)
Open a folder in VS Code (the extension activates only when a workspace is open)
Check the "SchemaX" output channel (View → Output → SchemaX) for errors

If running from source (development):

Make sure you pressed F5 in the schemax-vscode repo window
Look for "Extension Development Host" window title
Open a folder in the Extension Development Host window

Problem: F5 doesn't work (development only)

Solution:

# Make sure you're in the right directory
cd /path/to/schemax-vscode
code .

# Wait for VS Code to fully load, then press F5 or Run → Start Debugging

Problem: Webview doesn't open

Solution:

Check "SchemaX" output channel (View → Output → SchemaX)
Look for build errors
Rebuild: cd packages/vscode-extension && npm run build

Python CLI

Problem: schemax command not found

Solution:

# Check if installed
pip list | grep schemax

# Reinstall
pip install --upgrade schemaxpy

# Verify
which schemax
schemax --version
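
A common cause of this problem is that pip installed the package into a different Python environment than the one on your PATH. The sketch below checks both from Python using only the standard library; the `find_cli` helper name is hypothetical.

```python
import shutil
import sys


def find_cli(name="schemax"):
    """Return the full path of the CLI on PATH, or None when it is not found."""
    return shutil.which(name)


path = find_cli()
if path is None:
    # Show which interpreter is active so it can be compared with pip's target
    print(f"schemax not found on PATH; interpreter in use: {sys.executable}")
else:
    print(f"schemax found at {path}")
```

If the interpreter shown is not the one you expected, activate the right virtual environment and reinstall there.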

Problem: Import errors

Solution:

pip install --upgrade schemaxpy

Problem: Validation fails

Solution:

# Check file structure
ls .schemax/

# Validate JSON
python -m json.tool .schemax/project.json
python -m json.tool .schemax/changelog.json

# Check permissions
ls -la .schemax/
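
The JSON checks can also be run in one pass from Python, which reports the line and column of the first syntax error in each file. This is a minimal sketch that checks JSON syntax only, not SchemaX's own validation rules; the `check_json_files` helper is hypothetical.

```python
import json
from pathlib import Path


def check_json_files(workspace, names=("project.json", "changelog.json")):
    """Return a list of human-readable problems found in the .schemax JSON files."""
    problems = []
    for name in names:
        path = Path(workspace) / ".schemax" / name
        if not path.is_file():
            problems.append(f"{name}: file not found")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: line {exc.lineno}, column {exc.colno}: {exc.msg}")
    return problems
```

An empty list means both files exist and parse; any remaining failure then comes from schemax validate's own checks.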

SQL Generation

Problem: No SQL generated

Solution:

Make sure there are operations in the changelog
Check: cat .schemax/changelog.json
Create some changes in the designer first

Problem: SQL has errors

Solution:

Review the generated SQL
Check operation IDs in comments to trace back
Verify table/column names in the visual designer

Next Steps

Explore Examples: Check examples/basic-schema/
Environments & scope: See Environments and deployment scope for governance-only mode and existing catalogs.
Grants: See Unity Catalog grants for managing GRANT/REVOKE on catalogs, schemas, tables, and views.
Read Architecture: See Architecture.
Project Lifecycle & Workflows: See Workflows for single/multi-dev, greenfield/brownfield, and rollback timelines.
Set Up CI/CD: Use templates in examples/github-actions/
Join Community: GitHub Discussions

Quick Reference

VS Code Commands

Command	What It Does
`SchemaX: Open Designer`	Open visual designer
`SchemaX: Create Snapshot`	Version your schema
`SchemaX: Generate SQL Migration`	Export to SQL file
`SchemaX: Show Last Emitted Changes`	View pending operations

CLI Commands

Command	What It Does
`schemax validate`	Check schema files
`schemax sql`	Generate SQL migration file
`schemax apply`	Execute SQL against Databricks (with tracking)
`schemax rollback`	Rollback a deployment
`schemax snapshot create`	Create a versioned snapshot
`schemax diff`	Compare two snapshot versions

File Structure

.schemax/
├── project.json          # Metadata & configuration
├── changelog.json        # Pending operations
├── snapshots/           # Version snapshots
│   └── v*.json
└── migrations/          # Generated SQL
    └── migration_*.sql

You're all set! Start building your schemas! 🚀
