Architecture
This document describes the technical architecture and design decisions behind SchemaX.
Overview
SchemaX implements a provider-based, snapshot-driven schema versioning system. The core principle is to maintain an append-only operation log with periodic snapshots, enabling both state-based and change-based workflows across multiple catalog types (Unity Catalog, Hive Metastore, PostgreSQL, etc.).
Design Goals
- Git-Friendly: Store schema definitions in human-readable JSON that produces clean diffs
- Reproducible: Replay operations from a snapshot to reconstruct current state
- Performant: Fast loading even with hundreds of operations
- Auditable: Complete history of who changed what and when
- Migration-Ready: Operations can be converted to SQL migration scripts
- Extensible: Easy to add support for new catalog types via providers
- Multi-Provider: Support multiple catalog systems with unified interface
Architectural Patterns
SchemaX follows several well-established architectural patterns that work together to provide a robust, maintainable, and extensible system.
Primary Pattern: Event Sourcing
The foundation of SchemaX is Event Sourcing - all changes are stored as immutable events (operations) in an append-only log.
Implementation:
// Operations are immutable events
interface Operation {
id: string;
ts: string;
provider: string;
op: string;
target: string;
payload: Record<string, any>;
}
// Current state = replay all operations from a snapshot
state = loadSnapshot(latestSnapshot);
for (const operation of changelog.ops) {
state = provider.applyOperation(state, operation);
}
Key Characteristics:
- ✅ Append-only log (changelog.json)
- ✅ Operations never modified or deleted
- ✅ Complete audit trail
- ✅ Time-travel capability via snapshots
- ✅ State is derived, not stored directly
Benefits:
- Full history of all changes
- Can reconstruct state at any point
- Easy debugging ("what happened?")
- Enables undo/redo capabilities
Snapshot + Delta Pattern
Optimization of Event Sourcing to prevent unbounded operation log growth.
Implementation:
State at v0.3.0 =
load_snapshot("v0.2.0") +
apply_operations(changelog.ops)
.schemax/
├── snapshots/v0.2.0.json ← Full state checkpoint
└── changelog.json ← Only ops since v0.2.0
Benefits:
- Fast state loading (no need to replay 1000s of operations)
- Bounded memory usage
- Clean separation of committed vs uncommitted changes
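The checkpoint-plus-replay mechanic can be sketched in a few lines; the reducer and data shapes below are illustrative stand-ins, not SchemaX's actual storage API:

```python
# Minimal sketch of snapshot + delta loading (hypothetical data shapes).

def apply_operation(state, op):
    """Pure reducer: returns a new state, never mutates the input."""
    new_state = {**state, "catalogs": list(state["catalogs"])}
    if op["op"] == "unity.add_catalog":
        new_state["catalogs"].append(op["payload"]["name"])
    return new_state

def load_current_state(snapshot, changelog_ops):
    """Current state = snapshot checkpoint + replay of ops since it."""
    state = snapshot["state"]
    for op in changelog_ops:
        state = apply_operation(state, op)
    return state

snapshot = {"version": "v0.2.0", "state": {"catalogs": ["main"]}}
ops = [{"op": "unity.add_catalog", "payload": {"name": "analytics"}}]
state = load_current_state(snapshot, ops)
print(state["catalogs"])  # ['main', 'analytics']
```

Only the operations accumulated since the last snapshot are replayed, which is what keeps loading fast regardless of total history length.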
Plugin Architecture (Provider System)
Extensibility through providers - catalog-specific implementations plugged into a common interface.
Implementation:
// Base contract
interface Provider {
info: ProviderInfo;
capabilities: ProviderCapabilities;
applyOperation(state: ProviderState, op: Operation): ProviderState;
getSQLGenerator(state: ProviderState): SQLGenerator;
validateOperation(op: Operation): ValidationResult;
}
// Implementations
class UnityProvider implements Provider { ... }
class HiveProvider implements Provider { ... }
class PostgresProvider implements Provider { ... }
// Registry
ProviderRegistry.register(unityProvider);
Key Characteristics:
- ✅ Open/Closed Principle (open for extension, closed for modification)
- ✅ Each provider is isolated and independent
- ✅ Core system doesn't know provider details
- ✅ New providers added without changing core
Strategy Pattern (Provider Operations)
Different algorithms (SQL generation, state reduction) selected based on provider.
Implementation:
// Context uses provider to select strategy
function generateSQL(ops: Operation[], project: Project) {
const provider = ProviderRegistry.get(project.provider.type);
const generator = provider.getSQLGenerator(state);
return generator.generateSQL(ops);
}
// Concrete strategies
class UnitySQLGenerator extends SQLGenerator {
generateSQL(ops: Operation[]): string {
// Unity Catalog-specific SQL
}
}
class HiveSQLGenerator extends SQLGenerator {
generateSQL(ops: Operation[]): string {
// Hive Metastore-specific SQL
}
}
Benefits:
- Swappable implementations
- Each strategy optimized for its system
- Clean separation of concerns
State Reducer Pattern (Redux-inspired)
Immutable state transformations through pure functions.
Implementation:
function applyOperation(state: ProviderState, operation: Operation): ProviderState {
// Pure function: state + operation → new_state
const newState = deepClone(state);
switch (operation.op) {
case 'unity.add_catalog':
newState.catalogs.push(createCatalog(operation.payload));
break;
case 'unity.add_table':
const schema = findSchema(newState, operation.payload.schemaId);
schema.tables.push(createTable(operation.payload));
break;
}
return newState; // Never mutate input
}
Key Principles:
- ✅ Pure functions (no side effects)
- ✅ Immutable state
- ✅ Predictable transformations
- ✅ Easy to test
- ✅ Time-travel debugging
Redux Comparison:
// Redux
newState = reducer(state, action)
// SchemaX
newState = provider.applyOperation(state, operation)
Registry Pattern (Provider Lookup)
Central registry for service discovery and dependency injection.
Implementation:
class ProviderRegistryClass {
private providers = new Map<string, Provider>();
register(provider: Provider): void {
this.providers.set(provider.info.id, provider);
}
get(providerId: string): Provider | undefined {
return this.providers.get(providerId);
}
}
// Singleton
export const ProviderRegistry = new ProviderRegistryClass();
// Auto-registration on import
ProviderRegistry.register(unityProvider);
Benefits:
- Service discovery
- Loose coupling
- Easy testing (swap implementations)
Repository Pattern (Storage Layer)
Abstraction over file system operations.
Implementation:
// storage_v3.ts/py acts as repository
class StorageRepository {
readProject(workspacePath: Path): ProjectFile;
writeProject(workspacePath: Path, project: ProjectFile): void;
readChangelog(workspacePath: Path): ChangelogFile;
writeChangelog(workspacePath: Path, changelog: ChangelogFile): void;
readSnapshot(workspacePath: Path, version: string): SnapshotFile;
writeSnapshot(workspacePath: Path, snapshot: SnapshotFile): void;
}
// Usage
const project = storage.readProject(workspace);
// Don't care if it's JSON, SQLite, or remote API
Benefits:
- Data access abstraction
- Easy to swap storage backend
- Testability (mock the repository)
Command Pattern (Operations)
Operations as command objects that encapsulate requests.
Implementation:
// Command = Operation
interface Operation {
id: string; // Command ID
op: string; // Command name
target: string; // Receiver
payload: object; // Parameters
ts: string; // Timestamp
}
// Command execution
function execute(state: State, command: Operation): State {
return applyOperation(state, command);
}
Characteristics:
- ✅ Encapsulates request as object
- ✅ Supports queuing and logging
- ✅ Can be serialized
- ✅ Enables undo (store reverse operations)
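Undo can be built by deriving a reverse command for each operation. The inverse-command table below is a hypothetical sketch; the real SchemaX operation vocabulary may differ:

```python
# Hypothetical inverse-command table; illustrative operation names only.
INVERSE_OPS = {
    "unity.add_catalog": "unity.drop_catalog",
    "unity.add_table": "unity.drop_table",
}

def invert(op: dict) -> dict:
    """Build the reverse command that undoes a creating operation."""
    return {
        "id": op["id"] + "_undo",
        "op": INVERSE_OPS[op["op"]],
        "target": op["target"],
        "payload": {},
    }

cmd = {"id": "op_1", "op": "unity.add_catalog", "target": "cat_1", "payload": {}}
print(invert(cmd)["op"])  # unity.drop_catalog
```

Because commands are serializable, the reverse operations can be stored alongside the originals and replayed to roll a workspace back.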
Adapter Pattern (Python ↔ TypeScript)
Translating between language ecosystems while maintaining compatibility.
Implementation:
# Python (Pydantic) - accepts both camelCase and snake_case
from typing import Optional
from pydantic import BaseModel, Field

class Column(BaseModel):
    id: str
    name: str
    mask_id: Optional[str] = Field(None, alias="maskId")

    class Config:
        populate_by_name = True  # Accept both maskId and mask_id
// TypeScript (Zod) - uses camelCase
const Column = z.object({
id: z.string(),
name: z.string(),
maskId: z.string().optional(),
});
Same JSON works in both:
{"id": "col_1", "name": "email", "maskId": "mask_1"}
Benefits:
- Seamless interoperability
- Single source of truth (JSON files)
- No code sharing required
Façade Pattern (CLI)
Simplified interface to complex subsystems.
Implementation:
# cli.py provides simple interface hiding complexity
@cli.command()
def sql(workspace: str):
    # Hides complexity of:
    # - File system operations
    # - Provider lookup
    # - State reconstruction
    # - SQL generation
    state, changelog, provider = load_current_state(Path(workspace))
    generator = provider.get_sql_generator(state)
    sql = generator.generate_sql(changelog.ops)
    console.print(sql)
Benefits:
- Simple API for complex operations
- Easy to use
- Decouples CLI from internal complexity
Pattern Interaction Diagram
┌─────────────────────────────────────────────────────────────────┐
│ CLI / Extension (Façade) │
└────────────┬────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Storage Repository (Repository Pattern) │
│ Reads/writes .schemax/ files │
└────────────┬────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Provider Registry (Registry + Strategy) │
│ ProviderRegistry.get(provider_id) │
└────────────┬────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Provider Instance (Plugin Architecture) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Unity │ │ Hive │ │ PostgreSQL │ │
│ │ Provider │ │ Provider │ │ Provider │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└────────────┬────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ State Reducer (State Reducer Pattern) │
│ state' = applyOperation(state, operation) │
│ (Immutable transformations) │
│ Based on Event Sourcing + Command Pattern │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ SQL Generator (Strategy Pattern) │
│ sql = generator.generateSQL(operations) │
└─────────────────────────────────────────────────────────────────┘
Architectural Style: Functional Core, Imperative Shell
SchemaX follows the Functional Core, Imperative Shell pattern:
Functional Core (Pure Logic):
- State reducers (pure functions)
- Operation validation
- SQL generation algorithms
- State transformations
Imperative Shell (Side Effects):
- File I/O (storage layer)
- CLI output
- VS Code webview communication
- Extension activation
Benefits:
- Easy to test (core is pure functions)
- Easy to reason about (no hidden state)
- Side effects isolated at boundaries
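A minimal sketch of the split, with illustrative file names: the core function is pure and unit-testable without touching disk, while the shell does all the I/O:

```python
# Functional core / imperative shell sketch (file names are illustrative).
import json
from pathlib import Path

def next_state(state: dict, ops: list) -> dict:
    """Functional core: pure, deterministic, trivially unit-testable."""
    catalogs = list(state.get("catalogs", []))
    for op in ops:
        if op["op"].endswith("add_catalog"):
            catalogs.append(op["payload"]["name"])
    return {"catalogs": catalogs}

def run(workspace: Path) -> None:
    """Imperative shell: all file I/O lives at the boundary."""
    state = json.loads((workspace / "snapshot.json").read_text())
    ops = json.loads((workspace / "changelog.json").read_text())["ops"]
    new_state = next_state(state, ops)  # pure call in the middle
    (workspace / "state.json").write_text(json.dumps(new_state))
```

Tests exercise `next_state` directly with plain dictionaries; only integration tests need a real workspace directory.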
Design Patterns Summary
| Pattern | Where Used | Purpose |
|---|---|---|
| Event Sourcing | changelog.json, operations | Core architectural foundation |
| Snapshot + Delta | snapshots/, changelog.json | Performance optimization |
| Plugin Architecture | providers/ system | Extensibility for new catalog types |
| Strategy | Provider implementations | Swappable algorithms |
| State Reducer | state_reducer.ts/py | Immutable state updates |
| Registry | ProviderRegistry | Service discovery |
| Repository | storage_v3.ts/py | Data access abstraction |
| Command | Operation objects | Operation encapsulation |
| Adapter | Pydantic/Zod models | Cross-language compatibility |
| Façade | cli.py, extension.ts | Simple interface to complexity |
Architectural Principles
1. Separation of Concerns
- Storage ≠ Provider ≠ CLI
- Each layer has single responsibility
2. Immutability
- Operations never change
- State transformations create new objects
- Snapshots are read-only
3. Idempotency
- SQL can be run multiple times safely
- Operations produce same result when replayed
4. Extensibility
- Add providers without changing core
- Plugin-based architecture
5. Type Safety
- Zod (TypeScript) and Pydantic (Python)
- Runtime validation
- IDE autocomplete
6. Testability
- Pure functions (state reducers)
- Mockable repositories
- Isolated providers
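Principle 3 can be sketched as a reducer that guards creation, so replaying the same operation is a no-op, mirroring `CREATE ... IF NOT EXISTS` on the SQL side; the data shapes are illustrative:

```python
# Idempotency sketch: guarded creation (illustrative shapes, not real API).

def add_catalog(state: dict, op: dict) -> dict:
    """Replaying the same op leaves the state unchanged."""
    names = {c["name"] for c in state["catalogs"]}
    if op["payload"]["name"] in names:
        return state  # already applied: no-op, like CREATE ... IF NOT EXISTS
    return {"catalogs": state["catalogs"] + [{"name": op["payload"]["name"]}]}

op = {"payload": {"name": "analytics"}}
once = add_catalog({"catalogs": []}, op)
twice = add_catalog(once, op)
print(once == twice)  # True
```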
Anti-Patterns Avoided
✅ No Mutable Global State - All state is passed explicitly
✅ No Tight Coupling - Providers are independent plugins
✅ No Direct File System Access - Goes through repository layer
✅ No Hardcoded Provider Logic - Uses registry + strategy
✅ No Side Effects in Reducers - Pure functions only
Architecture Inspirations
SchemaX's architecture draws inspiration from:
- Redux (State Management)
  - Immutable state
  - Pure reducers
  - Action dispatching → Operations
- Git (Version Control)
  - Commit log → Operations log
  - Branches → Environments
  - Tags → Snapshots
- Terraform (Infrastructure as Code)
  - Desired state → Schema definition
  - Plan → SQL preview
  - Apply → SQL execution
- Liquibase/Flyway (Database Migrations)
  - Version-controlled schema changes
  - Idempotent migrations
  - Rollback support
- Event-Driven Architecture
  - Events → Operations
  - Event store → Changelog
  - Projections → Current state
Provider-Based Architecture
What is a Provider?
A Provider is a plugin that adds support for a specific catalog system. Each provider implements a standard interface that SchemaX uses for:
- State Management - How objects are stored and modified
- Operations - What actions users can perform
- SQL Generation - How to convert operations to DDL statements
- Validation - What constraints and rules to enforce
- UI Metadata - How to display objects in the interface
Provider Registry
class ProviderRegistry {
private static providers = new Map<string, Provider>();
static register(provider: Provider): void;
static get(providerId: string): Provider | undefined;
static getAll(): Provider[];
}
Providers are registered at startup:
// TypeScript
import './providers'; // Auto-registers Unity provider
// Python
from schemax.providers import unity_provider
# Unity provider auto-registered on import
Current Providers
Available:
- ✅ Unity Catalog (unity) - Databricks Unity Catalog with full governance features
Planned (Stage 2):
- 🔜 Hive Metastore (hive) - Apache Hive Metastore
- 🔜 PostgreSQL/Lakebase (postgres) - PostgreSQL with Lakebase extensions
Base Provider Enhancements
Generic SQL Optimization Algorithms
The base provider layer includes reusable optimization algorithms that all providers inherit automatically:
1. ColumnReorderOptimizer - Novel single-column drag detection
- Detects when only one column moved in a reorder operation
- Generates 1 SQL statement instead of N statements
- Time complexity: O(n²) worst case, O(n) average case
- Works for any SQL database with column positioning support
2. OperationBatcher - Groups operations by target object
- Batches table operations to create complete CREATE TABLE statements
- Reduces "empty table + multiple ALTERs" to single CREATE statement
- Consolidates property settings and constraints
- Provider-agnostic algorithm with provider-specific SQL generation
3. Enhanced BaseSQLGenerator - Template pattern for providers
- Abstract methods for provider-specific SQL syntax
- Generic utilities (_build_fqn, escape_identifier, escape_string)
- Dependency-level ordering (catalog → schema → table)
- Optimization components auto-initialized
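The single-move check in ColumnReorderOptimizer can be sketched as follows (an illustrative reimplementation, not the shipped code): if deleting one column from both orders makes them identical, only that column moved, and a single positioned ALTER suffices:

```python
# Single-column-move detection sketch (illustrative, not the shipped code).

def find_single_move(old: list, new: list):
    """Return (column, new_index) if exactly one column moved, else None."""
    if sorted(old) != sorted(new) or old == new:
        return None
    for col in old:
        # Remove one candidate column from both orders; if the remainder
        # matches, that candidate is the only column that moved.
        if [c for c in old if c != col] == [c for c in new if c != col]:
            return col, new.index(col)
    return None  # more than one column moved

move = find_single_move(["id", "name", "email"], ["email", "id", "name"])
print(move)  # ('email', 0)
```

The loop over candidates gives the O(n²) worst case noted above; when the first candidate matches (the common drag-and-drop case), it behaves linearly.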
Benefits for Provider Implementers
When implementing a new provider (e.g., Postgres, Snowflake):
- 60% Less Code - Generic algorithms already implemented
- Automatic Optimizations - Batching and reordering work out-of-the-box
- Consistent Behavior - All providers generate high-quality SQL
- Faster Time-to-Market - Focus on SQL syntax, not algorithms
- Easier Maintenance - Bug fixes in base benefit all providers
Implementation Example
class PostgresSQLGenerator(BaseSQLGenerator):
    """Postgres provider using base optimizations"""

    def __init__(self, state, name_mapping=None):
        super().__init__(state, name_mapping)
        # self.batcher and self.optimizer inherited from base
        self.id_name_map = self._build_id_name_map()

    # Implement abstract methods
    def _get_target_object_id(self, op):
        """Extract table ID from Postgres operation"""
        if op.op == "postgres.create_table":
            return op.target
        return op.payload.get("table_id")

    def _is_create_operation(self, op):
        """Check if operation creates new object"""
        return op.op in ["postgres.create_schema", "postgres.create_table"]

    def _generate_batched_create_sql(self, object_id, batch_info):
        """Generate Postgres CREATE TABLE with columns"""
        # Postgres-specific SQL syntax here
        return f"CREATE TABLE {table_name} ({columns}) ..."

# Base handles: batching, reordering, dependency ordering
Code Reduction Metrics
Unity Catalog provider refactoring results:
- Before: 928 lines (everything Unity-specific + generic algorithms)
- After: 738 lines (only Unity-specific SQL syntax)
- Reduction: 190 lines (20% reduction)
- Moved to Base: 272 lines of generic code
Future providers will start with this reduced baseline.
File Structure
Version 3 Architecture (Current)
workspace-root/
└── .schemax/
├── project.json # Project metadata with provider info
├── changelog.json # Uncommitted operations
├── snapshots/
│ ├── v0.1.0.json # Full state snapshot
│ ├── v0.2.0.json
│ └── v0.3.0.json
└── migrations/ # Generated SQL files
└── migration_*.sql
Key Changes from V2:
- project.json now includes provider metadata
- Operations are prefixed with provider ID (e.g., unity.add_catalog)
- Snapshots include provider context
- Automatic V2 to V3 migration on first load
Project File Schema (V3)
project.json contains metadata and provider selection:
{
"version": 3,
"name": "my-databricks-schemas",
"provider": {
"type": "unity",
"version": "1.0.0"
},
"environments": ["dev", "test", "prod"],
"latestSnapshot": "v0.2.0",
"snapshots": [
{
"id": "snap_uuid",
"version": "v0.1.0",
"name": "Initial schema",
"file": ".schemax/snapshots/v0.1.0.json",
"ts": "2025-10-06T10:00:00Z",
"opsCount": 15,
"hash": "sha256...",
"previousSnapshot": null
}
],
"deployments": [],
"settings": {
"autoIncrementVersion": true,
"versionPrefix": "v",
"requireSnapshotForProd": true,
"allowDrift": false
}
}
New in V3:
- provider field specifies catalog type and version
- Provider info used to load correct reducer and SQL generator
Snapshot File Schema
snapshots/vX.Y.Z.json contains the full state:
{
"id": "snap_uuid",
"version": "v0.1.0",
"name": "Initial schema",
"ts": "2025-10-06T10:00:00Z",
"createdBy": "user@example.com",
"state": {
"catalogs": [
{
"id": "cat_uuid",
"name": "my_catalog",
"schemas": [
{
"id": "sch_uuid",
"name": "my_schema",
"tables": [...]
}
]
}
]
},
"operations": [
{
"id": "op_1",
"ts": "2025-01-01T10:00:00Z",
"provider": "unity",
"op": "unity.add_schema",
"target": "sch_uuid",
"payload": {...}
},
...
],
"previousSnapshot": null,
"hash": "sha256...",
"tags": ["production"],
"comment": "First release"
}
Note: The structure of state varies by provider. Unity Catalog uses catalogs, Hive uses databases, etc.
Changelog File Schema
changelog.json contains uncommitted operations:
{
"version": 1,
"sinceSnapshot": "v0.2.0",
"ops": [
{
"id": "op_abc123",
"ts": "2025-10-07T14:30:00Z",
"provider": "unity",
"op": "unity.add_catalog",
"target": "cat_new",
"payload": {
"catalogId": "cat_new",
"name": "analytics"
}
}
],
"lastModified": "2025-10-07T14:30:00Z"
}
New in V3:
- provider field on each operation
- op field prefixed with provider ID (e.g., unity.add_catalog)
- Operations validated by provider before being saved
Core Concepts
1. Operations
Operations are the fundamental unit of change in SchemaX. Every user action generates one or more operations.
Operation Structure:
interface Operation {
id: string; // Unique operation ID
ts: string; // ISO 8601 timestamp
provider: string; // Provider ID (e.g., 'unity')
op: string; // Operation type with provider prefix
target: string; // ID of object being modified
payload: Record<string, any>; // Operation-specific data
}
Example - Unity Catalog:
{
"id": "op_abc123",
"ts": "2025-10-07T14:30:00Z",
"provider": "unity",
"op": "unity.add_catalog",
"target": "cat_xyz",
"payload": {
"catalogId": "cat_xyz",
"name": "analytics"
}
}
Example - Hive Metastore (Future):
{
"id": "op_def456",
"ts": "2025-10-07T14:35:00Z",
"provider": "hive",
"op": "hive.add_database",
"target": "db_xyz",
"payload": {
"databaseId": "db_xyz",
"name": "analytics"
}
}
2. Snapshots
Snapshots are point-in-time captures of the complete state. They serve as:
- Performance optimization - No need to replay all operations
- Release markers - Tagged versions for deployment
- Rollback points - Revert to known good state
- Audit checkpoints - Verify system integrity
When to Create Snapshots:
- Before deploying to production
- After completing a major feature
- Before risky operations
- On a regular schedule (e.g., weekly)
3. State Loading (Provider-Aware)
Current state is computed as: Latest Snapshot + Changelog Operations
async function loadCurrentState(workspaceUri: Uri): Promise<{
state: ProviderState;
changelog: ChangelogFile;
provider: Provider;
}> {
// 1. Read project file
const project = await readProject(workspaceUri);
// 2. Get provider from registry
const provider = ProviderRegistry.get(project.provider.type);
// 3. Load latest snapshot (or start with empty state)
let state = project.latestSnapshot
? await readSnapshot(workspaceUri, project.latestSnapshot)
: provider.createInitialState();
// 4. Load changelog
const changelog = await readChangelog(workspaceUri);
// 5. Apply changelog operations using provider's reducer
state = provider.applyOperations(state, changelog.ops);
return { state, changelog, provider };
}
Key Points:
- Provider selected based on
project.provider.type - Provider's state reducer applies operations
- Operations validated by provider before applying
- State structure defined by provider
4. SQL Generation (Provider-Specific)
Each provider implements its own SQL generator that converts operations to DDL:
interface Provider {
getSQLGenerator(state: ProviderState): SQLGenerator;
}
interface SQLGenerator {
generateSQL(ops: Operation[]): string;
}
Example - Unity Catalog:
const generator = unityProvider.getSQLGenerator(state);
const sql = generator.generateSQL(changelog.ops);
Output:
-- Operation: unity.add_catalog (op_abc123)
CREATE CATALOG IF NOT EXISTS `analytics`;
-- Operation: unity.add_schema (op_def456)
CREATE SCHEMA IF NOT EXISTS `analytics`.`bronze`;
Example - Hive Metastore (Future):
-- Operation: hive.add_database (op_abc123)
CREATE DATABASE IF NOT EXISTS analytics;
-- Operation: hive.add_table (op_def456)
CREATE TABLE IF NOT EXISTS analytics.bronze_users (...);
Provider Hierarchy
Different providers have different object hierarchies:
Unity Catalog (3 levels)
Catalog
└─ Schema
└─ Table
Hive Metastore (2 levels)
Database
└─ Table
PostgreSQL (3 levels)
Database
└─ Schema
└─ Table
UI Adaptation:
The UI dynamically adapts to the provider's hierarchy:
const hierarchy = provider.capabilities.hierarchy;
const levels = hierarchy.levels; // Array of HierarchyLevel
// Render tree based on hierarchy depth
levels.forEach(level => {
renderLevel(level.name, level.displayName, level.icon);
});
Operation Flow
1. User Action in UI
// User clicks "Add Catalog"
useDesignerStore().addCatalog('analytics');
2. Store Creates Operation
addCatalog: (name) => {
const catalogId = `cat_${uuidv4()}`;
const provider = get().provider; // Get current provider
const op: Operation = {
id: `op_${uuidv4()}`,
ts: new Date().toISOString(),
provider: provider.id,
op: `${provider.id}.add_catalog`, // Provider prefix!
target: catalogId,
payload: { catalogId, name },
};
emitOps([op]); // Send to extension
},
3. Extension Validates and Saves
// Extension receives operation
const ops: Operation[] = message.payload;
// Validate using provider
const provider = ProviderRegistry.get(project.provider.type);
for (const op of ops) {
const validation = provider.validateOperation(op);
if (!validation.valid) {
throw new Error(`Invalid operation: ${validation.errors}`);
}
}
// Append to changelog
await appendOps(workspaceUri, ops);
4. State Updated
// Reload state
const { state, changelog, provider } = await loadCurrentState(workspaceUri);
// Apply operations using provider
const newState = provider.applyOperations(state, changelog.ops);
// Send back to UI
webview.postMessage({
type: 'project-updated',
payload: { project, state: newState, ops: changelog.ops, provider }
});
Code Organization
TypeScript (VSCode Extension)
src/
├── providers/
│ ├── base/
│ │ ├── provider.ts # Provider interface
│ │ ├── models.ts # Base types
│ │ ├── operations.ts # Operation types
│ │ ├── sql-generator.ts # SQL generator base
│ │ └── hierarchy.ts # Hierarchy types
│ ├── registry.ts # Provider registry
│ └── unity/
│ ├── index.ts # Unity provider exports
│ ├── models.ts # Unity-specific models
│ ├── operations.ts # Unity operations
│ ├── sql-generator.ts # Unity SQL generator
│ ├── state-reducer.ts # Unity state reducer
│ └── hierarchy.ts # Unity hierarchy config
├── storage-v3.ts # Provider-aware storage
├── extension.ts # Extension entry point
└── webview/
├── state/
│ └── useDesignerStore.ts # Provider-aware store
└── components/
└── ... UI components
Python (SDK/CLI)
src/schemax/
├── providers/
│ ├── base/
│ │ ├── provider.py
│ │ ├── models.py
│ │ ├── operations.py
│ │ ├── sql_generator.py
│ │ └── hierarchy.py
│ ├── registry.py
│ └── unity/
│ ├── __init__.py
│ ├── models.py
│ ├── operations.py
│ ├── sql_generator.py
│ ├── state_reducer.py
│ └── hierarchy.py
├── storage_v3.py
└── cli.py
Data Models
Unity Catalog Models
interface UnityCatalog {
id: string;
name: string;
schemas: UnitySchema[];
}
interface UnitySchema {
id: string;
name: string;
tables: UnityTable[];
views: UnityView[];
}
interface UnityTable {
id: string;
name: string;
format: 'delta' | 'iceberg';
columns: UnityColumn[];
properties: Record<string, string>;
constraints: UnityConstraint[];
rowFilters?: UnityRowFilter[];
columnMasks?: UnityColumnMask[];
grants: UnityGrant[];
comment?: string;
}
interface UnityColumn {
id: string;
name: string;
type: string;
nullable: boolean;
comment?: string;
tags?: Record<string, string>;
maskId?: string;
}
interface UnityView {
id: string;
name: string;
definition: string;
comment?: string;
properties?: Record<string, string>;
tags?: Record<string, string>;
extractedDependencies?: {
tables?: string[];
views?: string[];
};
}
Governance Features (Unity Catalog)
interface UnityConstraint {
id: string;
type: 'primary_key' | 'foreign_key' | 'check';
name?: string;
columns: string[];
// ... type-specific fields
}
interface UnityRowFilter {
id: string;
name: string;
enabled: boolean;
udfExpression: string;
description?: string;
}
interface UnityColumnMask {
id: string;
columnId: string;
name: string;
enabled: boolean;
maskFunction: string;
description?: string;
}
Migration from V2 to V3
Automatic Migration
When a V2 project is opened, it's automatically migrated to V3:
async function migrateV2ToV3(
workspaceUri: Uri,
v2Project: any,
providerId: string = 'unity'
): Promise<void> {
// 1. Add provider field
const v3Project = {
...v2Project,
version: 3,
provider: {
type: providerId,
version: '1.0.0',
},
};
// 2. Prefix operations with provider
const changelog = await readChangelog(workspaceUri);
const migratedOps = changelog.ops.map(op => ({
...op,
provider: providerId,
op: `${providerId}.${op.op}`, // Add provider prefix
}));
// 3. Save migrated files
await writeProject(workspaceUri, v3Project);
await writeChangelog(workspaceUri, { ...changelog, ops: migratedOps });
}
Migration is:
- ✅ Automatic (no user action required)
- ✅ Non-destructive (preserves all data)
- ✅ One-way (V3 projects don't downgrade to V2)
- ✅ Logged (migration events logged to output)
Benefits of Provider Architecture
For Users
- Unified Experience - Same tool for Unity Catalog, Hive, PostgreSQL
- Provider-Specific Features - Full support for each catalog's unique features
- Easy Migration - Switch providers if needed (future)
- Single Learning Curve - Learn once, use everywhere
For Developers
- Clear Boundaries - Providers are isolated modules
- No Merge Conflicts - Teams work in separate provider directories
- Independent Testing - Each provider has its own test suite
- Easy to Add - Well-documented provider contract
- Type Safety - TypeScript and Pydantic enforce contracts
For the Project
- Maintainable - Clear separation of concerns
- Scalable - Easy to add new providers
- Testable - Provider compliance tests ensure quality
- Future-Proof - Architecture supports any catalog type
Performance Considerations
State Loading
- Bounded Growth: Snapshots prevent unlimited operation replay
- Lazy Loading: Only load latest snapshot, not entire history
- Incremental Updates: Apply only new operations since last load
Example Timings (100 tables):
- Load snapshot: <10ms
- Apply 50 operations: <5ms
- Total: <15ms ⚡
SQL Generation
SQL generation in SchemaX uses a dependency-aware system that ensures DDL statements are generated in the correct execution order.
Dependency-Aware Ordering
Problem: Database objects have dependencies (e.g., views depend on tables, foreign keys reference other tables). Naive SQL generation in operation timestamp order can produce unexecutable SQL.
Solution: Build a dependency graph and use topological sorting to determine correct execution order.
Implementation:
// 1. Build dependency graph from operations
const graph = new DependencyGraph();
for (const op of operations) {
graph.addNode({
id: getTargetObjectId(op),
type: getObjectType(op),
hierarchyLevel: getDependencyLevel(op),
operation: op
});
// Extract dependencies (e.g., view → table)
const deps = extractOperationDependencies(op);
for (const [depId, depType] of deps) {
graph.addEdge(targetId, depId, depType);
}
}
// 2. Detect cycles
const cycles = graph.detectCycles();
if (cycles.length > 0) {
throw new Error('Circular dependencies detected');
}
// 3. Topological sort (dependencies first)
const sortedOps = graph.topologicalSort();
// 4. Generate SQL in dependency order
return sortedOps.map(op => generateSQLForOperation(op));
Key Features:
- Graph-Based Ordering: Uses NetworkX (Python) or custom graph implementation (TypeScript)
- Cycle Detection: Prevents invalid circular dependencies between views
- Breaking Change Detection: Warns when dropping objects with dependents
- Hierarchy-Aware: Respects natural hierarchy (catalog → schema → table/view)
- Provider-Extensible: Each provider defines its own dependency extraction logic
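The ordering step itself can be sketched with Python's stdlib `graphlib` (the real implementations use NetworkX or a custom graph, and the object names here are illustrative):

```python
# Topological ordering sketch using the stdlib; object names illustrative.
from graphlib import TopologicalSorter, CycleError

# graph maps each object to the set of objects it depends on,
# i.e. objects that must be created first
deps = {
    "catalog.schema.view_users": {"catalog.schema.users"},
    "catalog.schema.users": {"catalog.schema"},
    "catalog.schema": {"catalog"},
    "catalog": set(),
}

try:
    order = list(TopologicalSorter(deps).static_order())
except CycleError:
    # circular dependency: fall back to hierarchy-level + timestamp sorting
    order = []

print(order)
# ['catalog', 'catalog.schema', 'catalog.schema.users', 'catalog.schema.view_users']
```

`static_order()` yields dependencies before dependents, so generating SQL in this order guarantees each object's prerequisites already exist.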
Dependency Types
enum DependencyType {
HIERARCHY = 'hierarchy', // Schema-level (catalog → schema → table)
FOREIGN_KEY = 'foreign_key', // FK constraint dependency
VIEW_DEPENDENCY = 'view_dependency', // View depends on table/view
COLUMN_MASK = 'column_mask', // Column mask dependency
ROW_FILTER = 'row_filter' // Row filter dependency
}
enum DependencyEnforcement {
ENFORCED = 'enforced', // Must be satisfied (hierarchical deps)
INFORMATIONAL = 'informational', // Not enforced (Unity Catalog FKs)
WARNING = 'warning' // Only for warnings
}
View and Materialized View Dependency Extraction
Views and materialized views are the primary use case for dependency-aware ordering. SchemaX extracts dependencies from view/MV SQL (FROM and JOIN references). The same mechanism applies to both: each view or materialized view stores extractedDependencies (tables and views it references), and the SQL generator orders creation so that tables and views are created before any view or materialized view that depends on them. Chains of arbitrary depth are supported: e.g. table → view1 → view2 → … → view_n (and materialized views at the end) are ordered correctly via the dependency graph.
Python (SQLGlot):
from sqlglot import exp, parse_one

def extract_dependencies_from_view(view_sql: str) -> dict:
    parsed = parse_one(view_sql, dialect="databricks")
    tables = []
    for table_exp in parsed.find_all(exp.Table):
        fqn = f"{table_exp.catalog}.{table_exp.db}.{table_exp.name}"
        tables.append(fqn)
    return {"tables": tables, "views": [], ...}
TypeScript (Regex-Based):
function extractDependenciesFromView(viewSql: string) {
const tablePattern = /(?:FROM|JOIN)\s+(?:(\w+)\.)?(?:(\w+)\.)?(\w+)/gi;
const tables = [];
let match;
while ((match = tablePattern.exec(viewSql)) !== null) {
const fqn = [match[1], match[2], match[3]].filter(Boolean).join('.');
tables.push(fqn);
}
return { tables, views: [], catalogs: [], schemas: [] };
}
Manual Dependency Override
In the VS Code Designer, views and materialized views show a Dependencies section (tables and views they depend on). Dependencies are filled automatically from the SQL when you save the definition. You can also edit the list manually (add or remove table or view names) to fix extraction gaps or force creation order. Use object names as they appear in the project (e.g. table name or schema.table). Saving the definition again re-extracts from SQL and replaces the dependency list; use Edit dependencies after saving to adjust if needed.
Hierarchy Levels
Operations are assigned hierarchy levels for fallback ordering:
Level 0: Catalog operations (CREATE CATALOG)
Level 1: Schema operations (CREATE SCHEMA)
Level 2: Table/View/Materialized view creation (CREATE TABLE, CREATE VIEW, CREATE MATERIALIZED VIEW)
Level 3: Modifications (ALTER TABLE, ADD COLUMN, SET PROPERTIES)
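When dependency extraction yields no usable edges, these levels combine with operation timestamps to give a deterministic order. A minimal Python sketch of that fallback sort (the operation shape here is simplified for illustration; real operations carry more fields):

```python
# Fallback ordering: sort by hierarchy level, then by operation timestamp.
ops = [
    {"op": "unity.add_column", "level": 3, "ts": "2024-01-02T00:00:00Z"},
    {"op": "unity.add_catalog", "level": 0, "ts": "2024-01-01T00:00:00Z"},
    {"op": "unity.add_table", "level": 2, "ts": "2024-01-01T12:00:00Z"},
    {"op": "unity.add_schema", "level": 1, "ts": "2024-01-01T06:00:00Z"},
]

# ISO-8601 timestamps compare correctly as strings, so a tuple key suffices.
ordered = [o["op"] for o in sorted(ops, key=lambda o: (o["level"], o["ts"]))]
# catalog first, then schema, table, and finally the column modification
```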
Breaking Change Detection
When a DROP operation is detected, the system checks for dependents:
if (opType.startsWith('drop_')) {
  const dependents = graph.getDependents(targetId);
  if (dependents.length > 0) {
    warnings.push(
      `BREAKING CHANGE: Dropping ${targetId} will affect: ${dependents.join(', ')}`
    );
  }
}
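Conceptually, `getDependents` is a reachability walk over the dependency edges: everything downstream of the dropped object is affected. A hedged Python sketch (the edge representation is assumed for illustration, not SchemaX's actual graph class):

```python
from collections import defaultdict

def get_dependents(edges, target):
    """Direct and transitive dependents of `target`, given (prerequisite,
    dependent) edges -- e.g. dropping a table flags every view built on it."""
    dependents_of = defaultdict(list)
    for prereq, dependent in edges:
        dependents_of[prereq].append(dependent)
    found, stack = [], [target]
    while stack:
        for dep in dependents_of[stack.pop()]:
            if dep not in found:
                found.append(dep)
                stack.append(dep)
    return found

# table1 <- view1 <- view2: dropping table1 affects both views.
edges = [("table1", "view1"), ("view1", "view2")]
print(get_dependents(edges, "table1"))
# → ['view1', 'view2']
```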
Fallback Behavior
If dependency analysis fails (e.g., cycles detected, parsing errors):
- Detect Cycle: Report cycle path to user
- Fall Back: Use hierarchy level + timestamp sorting
- Add Warnings: Include warnings in generated SQL comments
Example Warning:
-- WARNINGS:
-- Circular dependencies detected:
-- view_a → view_b → view_a
-- Using level-based sorting as fallback
-- Statement 1
CREATE SCHEMA my_schema;
Automatic FQN Qualification
Problem: View SQL may contain unqualified table references (e.g., SELECT * FROM table4), which fail when executed without a current catalog/schema context (like in SQL Statement Execution API).
Solution: Automatically qualify all table/view references in view definitions with fully-qualified names (FQN).
Python Implementation (SQLGlot):
def _qualify_view_definition(self, definition: str, extracted_deps: dict) -> str:
    """Qualify unqualified table/view references with fully-qualified names"""
    import sqlglot
    from sqlglot import expressions as exp

    # Parse SQL
    parsed = sqlglot.parse_one(definition, dialect="databricks")

    # Build name → FQN mapping from id_name_map
    name_to_fqn = {}
    for object_id, fqn in self.id_name_map.items():
        if "." in fqn:
            parts = fqn.split(".")
            if len(parts) == 3:
                catalog, schema, name = parts
                name_to_fqn[name] = fqn  # unqualified
                name_to_fqn[f"{schema}.{name}"] = fqn  # partially qualified

    # Replace table references with FQNs
    for table_node in parsed.find_all(exp.Table):
        current_ref = ".".join([
            table_node.catalog, table_node.db, table_node.name
        ]).strip(".")
        if current_ref in name_to_fqn:
            fqn = name_to_fqn[current_ref]
            catalog, schema, name = fqn.split(".")
            # Use quoted identifiers for Databricks compatibility
            table_node.set("catalog", exp.to_identifier(catalog, quoted=True))
            table_node.set("db", exp.to_identifier(schema, quoted=True))
            table_node.set("this", exp.to_identifier(name, quoted=True))

    return parsed.sql(dialect="databricks", pretty=True)
Example Transformation:
-- Input (unqualified)
SELECT * FROM table4
-- Output (qualified with backticks)
SELECT * FROM `dev_sales_analytics`.`customer_analytics`.`table4`
Key Benefits:
- ✅ Views work regardless of current catalog/schema context
- ✅ Compatible with SQL Statement Execution API
- ✅ Prevents `TABLE_OR_VIEW_NOT_FOUND` errors
- ✅ Uses SQLGlot for accurate SQL parsing and transformation
- ✅ Supports unqualified (`table4`) and partially qualified (`schema.table4`) references
Performance
- Graph Construction: O(V + E) where V = operations, E = dependencies
- Cycle Detection: O(V + E) using DFS
- Topological Sort: O(V + E) using Kahn's algorithm
- Typical Scale: <10ms for 100 operations with 50 dependencies
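The topological sort can be sketched with Kahn's algorithm. This standalone Python version (node and edge representation assumed for illustration) also reports whether a cycle prevented a complete ordering, which is what triggers the fallback behavior described above:

```python
from collections import defaultdict, deque

def topo_sort(nodes, edges):
    """Kahn's algorithm, O(V + E). `edges` is a list of (prerequisite,
    dependent) pairs; returns (order, cycle_detected)."""
    indegree = {n: 0 for n in nodes}
    adjacent = defaultdict(list)
    for prereq, dependent in edges:
        adjacent[prereq].append(dependent)
        indegree[dependent] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dep in adjacent[node]:
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)
    # If some nodes were never emitted, the remainder contains a cycle.
    return order, len(order) != len(nodes)

# table1 -> view1 -> view2 chain: the table sorts before its dependent views.
order, has_cycle = topo_sort(
    ["view2", "view1", "table1"],
    [("table1", "view1"), ("view1", "view2")],
)
```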
Provider-Specific Implementation
Each provider implements:
abstract class BaseSQLGenerator {
  // Required: Define hierarchy levels
  protected abstract _getDependencyLevel(op: Operation): number;

  // Optional: Extract dependencies from operations
  protected _extractOperationDependencies(op: Operation): Dependency[] {
    return []; // Base implementation
  }
}

class UnitySQLGenerator extends BaseSQLGenerator {
  protected _getDependencyLevel(op: Operation): number {
    if (op.op.includes('catalog')) return 0;
    if (op.op.includes('schema')) return 1;
    if (op.op.includes('add_table') || op.op.includes('add_view')) return 2;
    return 3;
  }

  protected _extractOperationDependencies(op: Operation): Dependency[] {
    // Extract from view SQL, foreign keys, etc.
    if (op.op === 'unity.add_view') {
      const deps = extractDependenciesFromView(op.payload.definition);
      return deps.tables.map(tableId => [
        tableId,
        DependencyType.VIEW_DEPENDENCY,
        DependencyEnforcement.INFORMATIONAL
      ]);
    }
    return [];
  }
}
Benefits
- ✅ Correct Execution Order: SQL runs without dependency errors
- ✅ View Support: Enables complex view hierarchies (view → view → table)
- ✅ Breaking Change Warnings: Prevents accidental data loss
- ✅ Scalable: Handles projects with 1000+ objects efficiently
- ✅ Provider-Agnostic: Works with any catalog system
Memory Usage
- Single State: Only one state object in memory
- Immutable Operations: Operations are small and append-only
- Efficient Snapshots: Snapshots compressed when written
Security & Validation
Operation Validation
Every operation is validated before being saved:
const validation = provider.validateOperation(op);
if (!validation.valid) {
throw new Error(`Invalid operation: ${validation.errors}`);
}
Validation Checks:
- Required fields present
- Field types correct
- Provider supports operation
- References valid (IDs exist)
- Constraints satisfied
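The first three checks can be sketched in a few lines. This is an illustrative Python version only; field names come from the Operation interface shown earlier, and the real `provider.validateOperation` additionally performs reference and constraint checks:

```python
REQUIRED_FIELDS = ("id", "ts", "provider", "op", "target", "payload")

def validate_operation(op: dict, supported_ops: set) -> dict:
    """Required-field, type, and provider-support checks (illustrative)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in op:
            errors.append(f"missing required field: {field}")
    if "payload" in op and not isinstance(op["payload"], dict):
        errors.append("payload must be an object")
    if op.get("op") and op["op"] not in supported_ops:
        errors.append(f"provider does not support operation: {op['op']}")
    return {"valid": not errors, "errors": errors}

result = validate_operation(
    {"id": "op_001", "ts": "2024-01-01T00:00:00Z", "provider": "unity",
     "op": "unity.add_table", "target": "tbl_1", "payload": {"name": "orders"}},
    supported_ops={"unity.add_table", "unity.add_view"},
)
```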
State Validation
State can be validated at any time:
const validation = provider.validateState(state);
if (!validation.valid) {
console.error('State validation errors:', validation.errors);
}
Snapshot Integrity
Snapshots include SHA-256 hash for integrity verification:
const expectedHash = snapshot.hash;
const actualHash = calculateHash(snapshot.state, snapshot.operations);
if (expectedHash !== actualHash) {
throw new Error('Snapshot integrity check failed');
}
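One plausible implementation of `calculateHash` is a SHA-256 digest over a canonical JSON serialization. SchemaX's exact canonicalization scheme isn't specified here, so treat this Python sketch as illustrative:

```python
import hashlib
import json

def calculate_hash(state: dict, operations: list) -> str:
    # Serialize deterministically (sorted keys, fixed separators) so that
    # identical content always produces the same digest.
    canonical = json.dumps(
        {"state": state, "operations": operations},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

snapshot_state = {"catalogs": [{"name": "__implicit__"}]}
digest = calculate_hash(snapshot_state, [])
```

Any change to the state or the operation list yields a different digest, which is what makes the integrity check above meaningful.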
Extension Points
Adding a New Provider
- Implement Provider Interface - See Provider Contract
- Register Provider - Add to registry
- Test Compliance - Run provider compliance tests
- Document - Add provider-specific docs
Adding New Operations
- Define Operation - Add to provider's `operations.ts`
- Update State Reducer - Handle in `state-reducer.ts`
- Update SQL Generator - Generate DDL in `sql-generator.ts`
- Add UI - Create UI for operation
- Test - Add operation tests
Extending UI
- Use Provider Capabilities - Check what provider supports
- Adapt to Hierarchy - Use provider's hierarchy definition
- Dynamic Forms - Generate forms based on operation metadata
- Provider-Specific Components - Add provider-specific features
Multi-Environment Support (v4)
SchemaX v4 introduces comprehensive multi-environment support, enabling users to design schemas once and deploy them to multiple environments (dev, test, prod) with different physical catalog names.
Logical vs Physical Naming
Design Pattern:
- Logical names stored in SchemaX state (environment-agnostic)
- Physical names generated at SQL generation time (environment-specific)
- Mapping defined in `project.json` environment configuration
Example:
// Logical state (in changelog.json)
{
  "catalogs": [{ "name": "__implicit__" }],
  "schemas": [{ "name": "customer_360" }]
}

// Environment configuration (in project.json v4)
{
  "provider": {
    "environments": {
      "dev": { "catalog": "dev_my_analytics" },
      "prod": { "catalog": "prod_my_analytics" }
    }
  }
}

// Generated SQL
schemax sql --target dev  → CREATE SCHEMA `dev_my_analytics`.`customer_360`;
schemax sql --target prod → CREATE SCHEMA `prod_my_analytics`.`customer_360`;
Project Schema v4
Environment Configuration:
{
  "version": 4,
  "provider": {
    "type": "unity",
    "environments": {
      "dev": {
        "catalog": "dev_analytics",
        "description": "Development environment",
        "allowDrift": true,
        "requireSnapshot": false,
        "autoCreateCatalog": true,
        "autoCreateSchemaxSchema": true
      },
      "prod": {
        "catalog": "prod_analytics",
        "description": "Production environment",
        "allowDrift": false,
        "requireSnapshot": true,
        "autoCreateCatalog": false
      }
    }
  }
}
Environment Settings:
| Setting | Type | Description |
|---|---|---|
| `catalog` | string | Physical catalog name in target system |
| `description` | string | Human-readable description |
| `allowDrift` | boolean | Allow actual state to differ from SchemaX |
| `requireSnapshot` | boolean | Require snapshot before deployment |
| `autoCreateCatalog` | boolean | Create catalog if it doesn't exist |
| `autoCreateSchemaxSchema` | boolean | Auto-create tracking schema |
Catalog Name Mapping
Implementation (TypeScript):
// extension.ts
function buildCatalogMapping(
  state: any,
  envConfig: EnvironmentConfig
): Record<string, string> {
  const catalogs = state.catalogs || [];
  if (catalogs.length === 1) {
    const logicalName = catalogs[0].name;   // e.g., "__implicit__"
    const physicalName = envConfig.catalog; // e.g., "dev_analytics"
    return { [logicalName]: physicalName };
  }
  if (catalogs.length > 1) {
    throw new Error("Multi-catalog projects not yet supported");
  }
  return {};
}

// UnitySQLGenerator
constructor(state: UnityState, catalogNameMapping?: Record<string, string>) {
  this.catalogNameMapping = catalogNameMapping || {};
  // Mapping applied when building fully-qualified names
}
Benefits:
- ✅ Clean separation: logical names in state, physical names in deployment
- ✅ Automatic mapping for single-catalog projects (90%+ use case)
- ✅ Environment isolation: same schema → different catalogs per environment
- ✅ Git-friendly: logical names in version control, physical names in config
Implicit Catalog Mode
For single-catalog projects (the vast majority), SchemaX uses implicit catalog mode to simplify the user experience.
User Experience:
Instead of:
- Configure environment catalog: `dev_analytics`
- Create catalog in UI: `analytics`
- Map during deployment: `analytics` → `dev_analytics`
Users now experience:
- Configure environment catalog: `dev_analytics`
- Directly add schemas (no catalog creation step)
- Auto-map: `__implicit__` → `dev_analytics`
Implementation:
// Project settings
{
  "settings": {
    "catalogMode": "single" // Default mode
  }
}

// Auto-created implicit catalog
{
  "catalogs": [
    {
      "id": "cat_implicit",
      "name": "__implicit__", // Special marker
      "schemas": [...]
    }
  ]
}
UI Changes:
- ✅ No "+ Catalog" button in single-catalog mode
- ✅ Schemas shown at root level (flat hierarchy)
- ✅ Catalog layer invisible to user
- ✅ Physical catalog names shown in SQL generation prompts
Benefits:
- Simpler mental model (schemas and tables, not catalogs)
- Matches how users think ("I'm designing customer data")
- Physical catalog per environment already configured
- No confusion about logical vs physical names
Deployment Tracking
SchemaX tracks deployments in the target catalog itself using a dedicated `schemax` schema:
-- Auto-created on first deployment
CREATE SCHEMA IF NOT EXISTS <catalog>.schemax;
-- Deployment history
CREATE TABLE <catalog>.schemax.deployments (
  id STRING,
  environment STRING,
  snapshot_version STRING,
  deployed_at TIMESTAMP,
  deployed_by STRING,
  status STRING, -- pending/success/failed
  ops_count INT,
  error_message STRING,
  sql_executed STRING,
  PRIMARY KEY (id)
);
-- Per-operation tracking
CREATE TABLE <catalog>.schemax.deployment_ops (
  deployment_id STRING,
  op_id STRING,
  op_type STRING,
  sql_statement STRING,
  status STRING,
  execution_order INT,
  PRIMARY KEY (deployment_id, op_id)
);
Benefits:
- ✅ Queryable audit trail
- ✅ Multi-user visibility
- ✅ Compliance and governance
- ✅ Tracks partial failures
Apply Command Workflow
# Generate SQL for dev
schemax sql --target dev --output migration.sql
# Preview changes (dry run)
schemax apply --target dev --profile DEV --warehouse-id abc123 --dry-run
# Apply with confirmation
schemax apply --target dev --profile DEV --warehouse-id abc123
# Non-interactive (CI/CD)
schemax apply --target dev --profile DEV --warehouse-id abc123 --no-interaction
Execution Flow:
- Load project and validate environment config
- Build catalog name mapping (logical → physical)
- Generate SQL with mapped names
- Show preview and prompt for confirmation
- Execute SQL statements sequentially
- Record deployment in `<catalog>.schemax.deployments`
- On failure: stop immediately, record error, show status
Future Architecture Plans
Stage 2 Enhancements
- Multi-Catalog Projects - Support explicit multiple catalogs per environment
- Provider Plugins - Load providers from external packages
- Cross-Provider References - Reference objects across providers
- Provider Marketplace - Community-contributed providers
Advanced Features
- Drift Detection - Compare deployed state vs SchemaX state
- Impact Analysis - Show what a change will affect
- Rollback Support - Revert to previous snapshots
- State Diffs - Visual comparison between versions
- Schema Import - Reverse-engineer existing catalogs into SchemaX
- DAB Generation - Export as Databricks Asset Bundles
Testing & Quality Assurance
Test Coverage
SchemaX includes comprehensive test suites for both Python and TypeScript implementations:
Python SDK Test Status:
- ✅ 124 passing tests (91.2%)
- ⏸️ 12 skipped tests (8.8%) - documented in GitHub issues #19, #20
- Test Categories:
- Storage operations (28 tests)
- State reducer (29 tests)
- Provider system (14 tests)
- SQL generator (39 passing, 9 skipped)
- Integration workflows (13 passing, 4 skipped)
Test Helpers:
Tests use OperationBuilder pattern for creating operations:
from tests.utils import OperationBuilder
builder = OperationBuilder()
op = builder.add_catalog("cat_123", "bronze", op_id="op_001")
# Automatically includes: id, ts, provider, op, target, payload
SQL Validation
SchemaX includes optional SQLGlot integration for validating generated SQL:
import sqlglot
# Validate Databricks SQL
sql = "ALTER TABLE `catalog`.`schema`.`table` ADD COLUMN `id` BIGINT"
parsed = sqlglot.parse_one(sql, dialect="databricks")
assert parsed is not None # Valid SQL
Benefits:
- Catches SQL syntax errors early
- Supports multiple SQL dialects (Databricks, PostgreSQL, MySQL, etc.)
- Can be integrated into CI/CD pipelines
Installation:
pip install "sqlglot>=20.0.0"
# or
pip install "schemaxpy[validation]"
Conclusion
SchemaX's provider-based architecture provides:
✅ Extensibility - Easy to add new catalog types
✅ Flexibility - Each provider can have unique features
✅ Type Safety - Strong typing throughout
✅ Performance - Fast state loading and SQL generation
✅ Maintainability - Clear module boundaries
✅ Future-Proof - Ready for new catalog systems
✅ Well-Tested - Comprehensive test coverage with 124+ passing tests
✅ SQL Validation - Optional SQLGlot integration for quality assurance
For more details:
- Provider Development: Provider Contract
- Development Guide: Development
- Getting Started: Quickstart