Architecture
This document describes the technical architecture and design decisions behind SchemaX.
Overview
SchemaX implements a provider-based, snapshot-driven schema versioning system. The core principle is to maintain an append-only operation log with periodic snapshots, enabling both state-based and change-based workflows across multiple catalog types (Unity Catalog, Hive Metastore, PostgreSQL, etc.).
Design Goals
- Git-Friendly: Store schema definitions in human-readable JSON that produces clean diffs
- Reproducible: Replay operations from a snapshot to reconstruct current state
- Performant: Fast loading even with hundreds of operations
- Auditable: Complete history of who changed what and when
- Migration-Ready: Operations can be converted to SQL migration scripts
- Extensible: Easy to add support for new catalog types via providers
- Multi-Provider: Support multiple catalog systems through a unified interface
Architectural Patterns
SchemaX follows several well-established architectural patterns that work together to provide a robust, maintainable, and extensible system.
Primary Pattern: Event Sourcing
The foundation of SchemaX is Event Sourcing: all changes are stored as immutable events (operations) in an append-only log.
Implementation:
// Operations are immutable events
interface Operation {
  id: string;
  ts: string;
  provider: string;
  op: string;
  target: string;
  payload: Record<string, any>;
}
// Current state = latest snapshot + replay of the operations recorded since it
let state = loadSnapshot(latestSnapshot);
for (const operation of changelog.ops) {
  state = provider.applyOperation(state, operation);
}
Key Characteristics:
- ✅ Append-only log (changelog.json)
- ✅ Operations never modified or deleted
- ✅ Complete audit trail
- ✅ Time-travel capability via snapshots
- ✅ State is derived, not stored directly
Benefits:
- Full history of all changes
- Can reconstruct state at any point
- Easy debugging ("what happened?")
- Enables undo/redo capabilities
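A minimal, runnable sketch of the replay loop, under stated assumptions: `CatalogState`, `replay`, and the `unity.drop_catalog` op are hypothetical stand-ins, not SchemaX code. Replaying only a prefix of the log reconstructs an earlier state, which is the time-travel property listed above.

```typescript
// Operation shape mirrors the interface described above.
interface Operation {
  id: string;
  ts: string;
  provider: string;
  op: string;
  target: string;
  payload: Record<string, unknown>;
}

// Hypothetical provider state for this sketch: catalog names only.
interface CatalogState {
  catalogs: string[];
}

// Hypothetical reducer: returns a new state and never mutates its input.
// "unity.drop_catalog" is an illustrative op name, not taken from SchemaX.
function applyOperation(state: CatalogState, op: Operation): CatalogState {
  switch (op.op) {
    case "unity.add_catalog":
      return { catalogs: [...state.catalogs, String(op.payload.name)] };
    case "unity.drop_catalog":
      return { catalogs: state.catalogs.filter((c) => c !== op.target) };
    default:
      return state; // unknown ops are ignored in this sketch
  }
}

// State is derived, not stored: fold the log over a starting snapshot.
function replay(snapshot: CatalogState, ops: readonly Operation[]): CatalogState {
  return ops.reduce(applyOperation, snapshot);
}

// Usage: an append-only log of three operations.
const log: readonly Operation[] = [
  { id: "op_1", ts: "2024-01-01T10:00:00Z", provider: "unity", op: "unity.add_catalog", target: "sales", payload: { name: "sales" } },
  { id: "op_2", ts: "2024-01-01T10:05:00Z", provider: "unity", op: "unity.add_catalog", target: "hr", payload: { name: "hr" } },
  { id: "op_3", ts: "2024-01-01T10:10:00Z", provider: "unity", op: "unity.drop_catalog", target: "sales", payload: {} },
];
const empty: CatalogState = { catalogs: [] };

const current = replay(empty, log); // { catalogs: ["hr"] }
const afterTwoOps = replay(empty, log.slice(0, 2)); // { catalogs: ["sales", "hr"] }
```

Because the reducer is pure, the same log always yields the same state, and the starting snapshot is left untouched.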
Snapshot + Delta Pattern
An optimization of Event Sourcing that keeps the operation log from growing without bound.
Implementation:
State at v0.3.0 =
  load_snapshot("v0.2.0") +
  apply_operations(changelog.ops)
.schemax/
├── snapshots/v0.2.0.json ← Full state checkpoint
└── changelog.json ← Only ops since v0.2.0
Benefits:
- Fast state loading (no need to replay 1000s of operations)
- Bounded memory usage
- Clean separation of committed vs uncommitted changes
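The snapshot + delta flow might be sketched as follows, assuming in-memory objects in place of the `.schemax/` files. `baseVersion`, `createSnapshot`, and the `add_table` op are hypothetical names chosen for illustration.

```typescript
// In-memory stand-ins for .schemax/snapshots/<version>.json and changelog.json.
// Field names (baseVersion, tables, add_table) are hypothetical.
interface Op {
  id: string;
  op: string;
  payload: { name: string };
}

interface State {
  tables: string[];
}

interface Snapshot {
  version: string;
  state: State; // full state checkpoint
}

interface Changelog {
  baseVersion: string; // snapshot these ops are relative to
  ops: Op[]; // only ops since baseVersion
}

function apply(state: State, op: Op): State {
  return op.op === "add_table" ? { tables: [...state.tables, op.payload.name] } : state;
}

// Current state = latest snapshot + the ops recorded since it.
function loadCurrentState(snapshot: Snapshot, changelog: Changelog): State {
  if (changelog.baseVersion !== snapshot.version) {
    throw new Error(`changelog is based on ${changelog.baseVersion}, not ${snapshot.version}`);
  }
  return changelog.ops.reduce(apply, snapshot.state);
}

// Taking a snapshot folds pending ops into a new checkpoint and starts an
// empty changelog, so later loads replay only a short delta.
function createSnapshot(
  snapshot: Snapshot,
  changelog: Changelog,
  newVersion: string,
): { snapshot: Snapshot; changelog: Changelog } {
  const state = loadCurrentState(snapshot, changelog);
  return {
    snapshot: { version: newVersion, state },
    changelog: { baseVersion: newVersion, ops: [] },
  };
}

// Usage
const v020: Snapshot = { version: "v0.2.0", state: { tables: ["users"] } };
const pending: Changelog = {
  baseVersion: "v0.2.0",
  ops: [{ id: "op_1", op: "add_table", payload: { name: "orders" } }],
};
const current = loadCurrentState(v020, pending); // { tables: ["users", "orders"] }
const next = createSnapshot(v020, pending, "v0.3.0");
// next.snapshot.state.tables is ["users", "orders"]; next.changelog.ops is []
```

The `baseVersion` check is one way to catch a changelog written against a different snapshot; it is an assumption of this sketch rather than documented SchemaX behavior.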
Plugin Architecture (Provider System)
Extensibility comes from providers: catalog-specific implementations plugged into a common interface.
Implementation:
// Base contract
interface Provider {
  info: ProviderInfo;
  capabilities: ProviderCapabilities;
  applyOperation(state: ProviderState, op: Operation): ProviderState;
  getSQLGenerator(state: ProviderState): SQLGenerator;
  validateOperation(op: Operation): ValidationResult;
}
// Implementations
class UnityProvider implements Provider { ... }
class HiveProvider implements Provider { ... }
class PostgresProvider implements Provider { ... }
// Registry
ProviderRegistry.register(unityProvider);
Key Characteristics:
- ✅ Open/Closed Principle (open for extension, closed for modification)
- ✅ Each provider is isolated and independent
- ✅ Core system doesn't know provider details
- ✅ New providers added without changing core
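To illustrate the Open/Closed property, here is a sketch in which two toy providers share one core code path. It uses a trimmed version of the `Provider` interface; `makeToyProvider`, `applyWithProvider`, and the `ValidationResult` fields are assumptions for this example.

```typescript
// Simplified subset of the Provider contract above; the ValidationResult
// shape, state type, and op naming are assumptions for this sketch.
interface Operation {
  op: string;
  payload: { name: string };
}
interface ValidationResult {
  valid: boolean;
  errors: string[];
}
type ProviderState = Record<string, string[]>;

interface Provider {
  info: { id: string; name: string };
  validateOperation(op: Operation): ValidationResult;
  applyOperation(state: ProviderState, op: Operation): ProviderState;
}

// Toy provider factory: accepts only ops prefixed with its own id.
function makeToyProvider(id: string, name: string): Provider {
  return {
    info: { id, name },
    validateOperation(op) {
      const ok = op.op.startsWith(`${id}.`);
      return { valid: ok, errors: ok ? [] : [`${op.op} is not a ${id} operation`] };
    },
    applyOperation(state, op) {
      const kind = op.op.slice(id.length + 1); // "unity.add_catalog" -> "add_catalog"
      return { ...state, [kind]: [...(state[kind] ?? []), op.payload.name] };
    },
  };
}

// Core code depends only on the interface, never on a concrete provider.
function applyWithProvider(provider: Provider, state: ProviderState, op: Operation): ProviderState {
  const result = provider.validateOperation(op);
  if (!result.valid) throw new Error(result.errors.join("; "));
  return provider.applyOperation(state, op);
}

// Usage: two independent providers behind the same core function.
const unity = makeToyProvider("unity", "Unity Catalog");
const hive = makeToyProvider("hive", "Hive Metastore");
const state = applyWithProvider(unity, {}, { op: "unity.add_catalog", payload: { name: "main" } });
// state is { add_catalog: ["main"] }; passing the same op to `hive` throws.
```

Adding a third provider here means writing another object that satisfies `Provider`; `applyWithProvider` does not change.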
Strategy Pattern (Provider Operations)
Different algorithms (SQL generation, state reduction) selected based on provider.
Implementation:
// Context uses the provider to select a strategy
function generateSQL(ops: Operation[], project: Project, state: ProviderState): string {
  const provider = ProviderRegistry.get(project.provider.type);
  if (!provider) throw new Error(`Unknown provider: ${project.provider.type}`);
  const generator = provider.getSQLGenerator(state);
  return generator.generateSQL(ops);
}
// Concrete strategies
class UnitySQLGenerator extends SQLGenerator {
  generateSQL(ops: Operation[]): string {
    // Unity Catalog-specific SQL
  }
}
class HiveSQLGenerator extends SQLGenerator {
  generateSQL(ops: Operation[]): string {
    // Hive Metastore-specific SQL
  }
}
Benefits:
- Swappable implementations
- Each strategy optimized for its system
- Clean separation of concerns
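A self-contained sketch of strategy selection, assuming a single hypothetical `add_schema` operation. The SQL strings illustrate the kind of dialect difference a generator absorbs (Unity Catalog's `catalog.schema` names versus a Hive database); they are not SchemaX's actual output.

```typescript
// Hypothetical operation shape for this sketch.
interface AddSchemaOp {
  op: "add_schema";
  catalog: string;
  schema: string;
}

// Strategy interface: same input, provider-specific output.
interface SQLGenerator {
  generateSQL(ops: AddSchemaOp[]): string;
}

class UnitySQLGenerator implements SQLGenerator {
  // Unity Catalog addresses schemas as catalog.schema.
  generateSQL(ops: AddSchemaOp[]): string {
    return ops.map((o) => `CREATE SCHEMA IF NOT EXISTS \`${o.catalog}\`.\`${o.schema}\`;`).join("\n");
  }
}

class HiveSQLGenerator implements SQLGenerator {
  // In this sketch the Hive side has no catalog level, so a schema maps to a database.
  generateSQL(ops: AddSchemaOp[]): string {
    return ops.map((o) => `CREATE DATABASE IF NOT EXISTS \`${o.schema}\`;`).join("\n");
  }
}

// Context: picks the strategy by provider type; callers never branch on it.
const generators = new Map<string, () => SQLGenerator>([
  ["unity", () => new UnitySQLGenerator()],
  ["hive", () => new HiveSQLGenerator()],
]);

function generateSQL(providerType: string, ops: AddSchemaOp[]): string {
  const factory = generators.get(providerType);
  if (!factory) throw new Error(`No SQL generator for provider "${providerType}"`);
  return factory().generateSQL(ops);
}

// Usage
const ops: AddSchemaOp[] = [{ op: "add_schema", catalog: "main", schema: "sales" }];
const unitySql = generateSQL("unity", ops); // CREATE SCHEMA IF NOT EXISTS `main`.`sales`;
const hiveSql = generateSQL("hive", ops); // CREATE DATABASE IF NOT EXISTS `sales`;
```

A `Map` is used for the lookup so that inherited object keys such as `toString` can never be mistaken for a registered strategy.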
State Reducer Pattern (Redux-inspired)
Immutable state transformations through pure functions.
Implementation:
function applyOperation(state: ProviderState, operation: Operation): ProviderState {
  // Pure function: state + operation → new_state
  const newState = deepClone(state); // mutate only the copy
  switch (operation.op) {
    case 'unity.add_catalog':
      newState.catalogs.push(createCatalog(operation.payload));
      break;
    case 'unity.add_table': {
      // Braces scope `schema` to this case
      const schema = findSchema(newState, operation.payload.schemaId);
      if (!schema) throw new Error(`Schema not found: ${operation.payload.schemaId}`);
      schema.tables.push(createTable(operation.payload));
      break;
    }
  }
  return newState; // Never mutate input
}
Key Principles:
- ✅ Pure functions (no side effects)
- ✅ Immutable state
- ✅ Predictable transformations
- ✅ Easy to test
- ✅ Time-travel debugging
Redux Comparison:
// Redux
newState = reducer(state, action)
// SchemaX
newState = provider.applyOperation(state, operation)
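The reducer above deep-clones the whole state. A sketch of an alternative, assuming a trimmed-down state shape: build new objects along the changed path and reuse untouched ones, with a discriminated union so the compiler flags unhandled op types. The types and payload fields here are hypothetical.

```typescript
// Hypothetical, trimmed-down state shape for this sketch.
interface Table { id: string; name: string }
interface Schema { id: string; name: string; tables: Table[] }
interface Catalog { id: string; name: string; schemas: Schema[] }
interface ProviderState { catalogs: Catalog[] }

// A discriminated union lets the compiler check that every op type is handled.
type Operation =
  | { op: "unity.add_catalog"; payload: { id: string; name: string } }
  | { op: "unity.add_table"; payload: { id: string; name: string; schemaId: string } };

// Pure reducer without a deep clone: it builds new containers instead of
// mutating, and reuses unchanged schemas and tables by reference.
function applyOperation(state: ProviderState, operation: Operation): ProviderState {
  switch (operation.op) {
    case "unity.add_catalog": {
      const { id, name } = operation.payload;
      return { catalogs: [...state.catalogs, { id, name, schemas: [] }] };
    }
    case "unity.add_table": {
      const { id, name, schemaId } = operation.payload;
      // An unknown schemaId leaves every schema unchanged in this sketch.
      return {
        catalogs: state.catalogs.map((catalog) => ({
          ...catalog,
          schemas: catalog.schemas.map((schema) =>
            schema.id === schemaId ? { ...schema, tables: [...schema.tables, { id, name }] } : schema,
          ),
        })),
      };
    }
    default: {
      // Compile-time exhaustiveness check: a new op type makes this line fail to type-check.
      const unhandled: never = operation;
      throw new Error(`Unhandled operation: ${JSON.stringify(unhandled)}`);
    }
  }
}

// Usage
const s0: ProviderState = {
  catalogs: [{ id: "cat_1", name: "main", schemas: [{ id: "sch_1", name: "sales", tables: [] }] }],
};
const s1 = applyOperation(s0, {
  op: "unity.add_table",
  payload: { id: "tbl_1", name: "orders", schemaId: "sch_1" },
});
// s0 is untouched; s1 has table "orders" in schema sch_1.
```

Structural sharing avoids copying large, unchanged parts of the tree on every operation, at the cost of slightly more verbose update code.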
Registry Pattern (Provider Lookup)
Central registry for service discovery and dependency injection.
Implementation:
class ProviderRegistryClass {
  private providers = new Map<string, Provider>();
  register(provider: Provider): void {
    this.providers.set(provider.info.id, provider);
  }
  get(providerId: string): Provider | undefined {
    return this.providers.get(providerId);
  }
}
// Singleton
export const ProviderRegistry = new ProviderRegistryClass();
// Auto-registration on import
ProviderRegistry.register(unityProvider);
Benefits:
- Service discovery
- Loose coupling
- Easy testing (swap implementations)
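A runnable version of the registry, showing how a test can swap in a fake provider by registering under the same id. The trimmed `Provider` shape and the `getOrThrow` helper are assumptions for this sketch.

```typescript
// Minimal provider contract for this sketch; generateSQL() is a hypothetical
// stand-in for the getSQLGenerator() method shown earlier.
interface Provider {
  info: { id: string; name: string };
  generateSQL(): string;
}

class ProviderRegistryClass {
  private providers = new Map<string, Provider>();

  // Registering an id that already exists replaces the earlier provider.
  register(provider: Provider): void {
    this.providers.set(provider.info.id, provider);
  }

  get(providerId: string): Provider | undefined {
    return this.providers.get(providerId);
  }

  // Hypothetical helper: fail with a useful message instead of returning undefined.
  getOrThrow(providerId: string): Provider {
    const provider = this.providers.get(providerId);
    if (!provider) {
      const known = Array.from(this.providers.keys()).join(", ") || "none";
      throw new Error(`Unknown provider "${providerId}" (registered: ${known})`);
    }
    return provider;
  }
}

// Usage: production code registers the real provider...
const registry = new ProviderRegistryClass();
registry.register({ info: { id: "unity", name: "Unity Catalog" }, generateSQL: () => "-- real SQL" });

// ...and a test swaps in a fake under the same id.
registry.register({ info: { id: "unity", name: "Fake Unity" }, generateSQL: () => "-- fake SQL" });
const sql = registry.getOrThrow("unity").generateSQL(); // "-- fake SQL"
```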
Repository Pattern (Storage Layer)
Abstraction over file system operations.
Implementation:
// storage_v3.ts / storage_v3.py act as the repository
interface StorageRepository {
  readProject(workspacePath: Path): ProjectFile;
  writeProject(workspacePath: Path, project: ProjectFile): void;
  readChangelog(workspacePath: Path): ChangelogFile;
  writeChangelog(workspacePath: Path, changelog: ChangelogFile): void;
  readSnapshot(workspacePath: Path, version: string): SnapshotFile;
  writeSnapshot(workspacePath: Path, snapshot: SnapshotFile): void;
}
// Usage
const project = storage.readProject(workspace);
// The caller doesn't know whether the backend is JSON files, SQLite, or a remote API
Benefits:
- Data access abstraction
- Easy to swap storage backend
- Testability (mock the repository)
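One way to make the repository swappable is to code against an interface and provide an in-memory backend for tests. This sketch assumes simplified `ProjectFile` and `ChangelogFile` shapes, string paths instead of `Path`, and a hypothetical `project.json` file name; only `changelog.json` comes from the layout above.

```typescript
// Hypothetical, trimmed file shapes; the real ProjectFile/ChangelogFile carry more fields.
interface ProjectFile {
  name: string;
  provider: { type: string };
}
interface ChangelogOp {
  id: string;
  op: string;
}
interface ChangelogFile {
  ops: ChangelogOp[];
}

// Repository contract (subset of the methods listed above); callers depend only on this.
interface StorageRepository {
  readProject(workspacePath: string): ProjectFile;
  writeProject(workspacePath: string, project: ProjectFile): void;
  readChangelog(workspacePath: string): ChangelogFile;
  writeChangelog(workspacePath: string, changelog: ChangelogFile): void;
}

// In-memory backend, e.g. for tests. It stores JSON text, so every read
// returns a fresh copy that callers cannot use to corrupt stored data.
class InMemoryStorage implements StorageRepository {
  private files = new Map<string, string>();

  private read<T>(path: string): T {
    const text = this.files.get(path);
    if (text === undefined) throw new Error(`Not found: ${path}`);
    return JSON.parse(text) as T;
  }

  private write(path: string, value: unknown): void {
    this.files.set(path, JSON.stringify(value, null, 2));
  }

  // "project.json" is a hypothetical file name; changelog.json matches the layout above.
  readProject(ws: string): ProjectFile {
    return this.read<ProjectFile>(`${ws}/.schemax/project.json`);
  }
  writeProject(ws: string, project: ProjectFile): void {
    this.write(`${ws}/.schemax/project.json`, project);
  }
  readChangelog(ws: string): ChangelogFile {
    return this.read<ChangelogFile>(`${ws}/.schemax/changelog.json`);
  }
  writeChangelog(ws: string, changelog: ChangelogFile): void {
    this.write(`${ws}/.schemax/changelog.json`, changelog);
  }
}

// Logic written against the interface works with any backend.
function appendOp(storage: StorageRepository, ws: string, op: ChangelogOp): number {
  const changelog = storage.readChangelog(ws);
  const updated: ChangelogFile = { ops: [...changelog.ops, op] };
  storage.writeChangelog(ws, updated);
  return updated.ops.length;
}

// Usage
const storage = new InMemoryStorage();
storage.writeChangelog("/ws", { ops: [] });
const count = appendOp(storage, "/ws", { id: "op_1", op: "unity.add_catalog" }); // 1
```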
Command Pattern (Operations)
Operations as command objects that encapsulate requests.
Implementation:
// Command = Operation
interface Operation {
  id: string;      // Command ID
  op: string;      // Command name
  target: string;  // Receiver
  payload: object; // Parameters
  ts: string;      // Timestamp
}
// Command execution
function execute(state: State, command: Operation): State {
  return applyOperation(state, command);
}
Characteristics:
- ✅ Encapsulates request as object
- ✅ Supports queuing and logging
- ✅ Can be serialized
- ✅ Enables undo (store reverse operations)
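The undo bullet can be sketched by appending an inverse command rather than editing the log. The `add_column`/`drop_column` op names, the `State` shape, and `inverse` are hypothetical; a production inverse would need more information than this sketch keeps.

```typescript
// Hypothetical command shape for this sketch, following the Operation fields above.
interface Operation {
  id: string; // command ID
  op: "add_column" | "drop_column"; // command name (illustrative)
  target: string; // receiver: table name
  payload: { column: string }; // parameters
  ts: string; // timestamp
}

type State = Record<string, string[]>; // table name -> column names

function applyOperation(state: State, command: Operation): State {
  const columns = state[command.target] ?? [];
  const next =
    command.op === "add_column"
      ? [...columns, command.payload.column]
      : columns.filter((c) => c !== command.payload.column);
  return { ...state, [command.target]: next };
}

// Undo appends the inverse command instead of deleting history, so the log
// stays append-only. A real drop_column inverse would also need the column's
// full definition and position; this sketch tracks names only.
function inverse(command: Operation, id: string, ts: string): Operation {
  return {
    id,
    op: command.op === "add_column" ? "drop_column" : "add_column",
    target: command.target,
    payload: { ...command.payload },
    ts,
  };
}

// Usage
const add: Operation = {
  id: "op_1",
  op: "add_column",
  target: "users",
  payload: { column: "email" },
  ts: "2024-01-01T10:00:00Z",
};
const log: Operation[] = [add, inverse(add, "op_2", "2024-01-01T10:05:00Z")];
const empty: State = {};
const afterAdd = applyOperation(empty, add); // { users: ["email"] }
const afterUndo = log.reduce(applyOperation, empty); // { users: [] }
```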
Adapter Pattern (Python ↔ TypeScript)
Translating between language ecosystems while maintaining compatibility.
Implementation:
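# Python (Pydantic v2): accepts both camelCase and snake_case
from typing import Optional
from pydantic import BaseModel, ConfigDict, Field
class Column(BaseModel):
    model_config = ConfigDict(populate_by_name=True)  # Accept both maskId and mask_id
    id: str
    name: str
    mask_id: Optional[str] = Field(None, alias="maskId")
    # Serialize with model_dump(by_alias=True) to write maskId back out in camelCase
// TypeScript (Zod): uses camelCase
import { z } from "zod";
const Column = z.object({
  id: z.string(),
  name: z.string(),
  maskId: z.string().optional(),
});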
Same JSON works in both:
{"id": "col_1", "name": "email", "maskId": "mask_1"}
Benefits:
- Seamless interoperability
- Single source of truth (JSON files)
- No code sharing required
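On the TypeScript side, a similar tolerance for both spellings could be added with a small key-normalizing adapter. This is a sketch under assumptions: `toColumn` and `snakeToCamel` are hypothetical helpers. By itself, the Zod schema above would silently drop a `mask_id` key, since `z.object` strips unrecognized keys by default.

```typescript
// Hypothetical adapter for the TypeScript side: accept snake_case or camelCase
// keys (as the Pydantic model above does) and return the camelCase shape.
interface Column {
  id: string;
  name: string;
  maskId?: string;
}

// "mask_id" -> "maskId"; keys that are already camelCase pass through unchanged.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}

function toColumn(raw: Record<string, unknown>): Column {
  const fields: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(raw)) {
    fields[snakeToCamel(key)] = value; // if both spellings appear, the later key wins
  }

  const id = fields["id"];
  const name = fields["name"];
  const maskId = fields["maskId"];
  if (typeof id !== "string" || typeof name !== "string") {
    throw new Error("Column requires string `id` and `name`");
  }
  if (maskId !== undefined && typeof maskId !== "string") {
    throw new Error("`maskId` must be a string when present");
  }

  const column: Column = { id, name };
  if (typeof maskId === "string") column.maskId = maskId;
  return column;
}

// Usage: both spellings produce the same object.
const fromTs = toColumn({ id: "col_1", name: "email", maskId: "mask_1" });
const fromPy = toColumn({ id: "col_1", name: "email", mask_id: "mask_1" });
// both are { id: "col_1", name: "email", maskId: "mask_1" }
```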
Façade Pattern (CLI)
Simplified interface to complex subsystems.
Implementation:
# cli.py provides a simple interface that hides the complexity
@cli.command()
def sql(workspace: str):
    # Hides complexity of:
    # - File system operations
    # - Provider lookup
    # - State reconstruction
    # - SQL generation
    state, changelog, provider = load_current_state(Path(workspace))
    generator = provider.get_sql_generator(state)
    sql_text = generator.generate_sql(changelog.ops)
    console.print(sql_text)
Benefits:
- Simple API for complex operations
- Easy to use
- Decouples CLI from internal complexity
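The same façade idea on the TypeScript (extension) side, sketched with injected dependencies so it runs on its own. `generateWorkspaceSQL`, the `Storage` and `Provider` shapes, and the stubs are hypothetical.

```typescript
// Hypothetical subsystem interfaces, trimmed to what the façade needs.
interface Operation {
  op: string;
  name: string;
}
interface Storage {
  readProviderType(workspace: string): string;
  readSnapshotState(workspace: string): string[];
  readChangelogOps(workspace: string): Operation[];
}
interface Provider {
  applyOperation(state: string[], op: Operation): string[];
  generateSQL(state: string[], ops: Operation[]): string;
}

// Façade: one call hides provider lookup, state reconstruction, and SQL generation.
function generateWorkspaceSQL(
  workspace: string,
  storage: Storage,
  providers: Map<string, Provider>,
): string {
  const type = storage.readProviderType(workspace);
  const provider = providers.get(type);
  if (!provider) throw new Error(`Unknown provider: ${type}`);

  const ops = storage.readChangelogOps(workspace);
  const state = ops.reduce(
    (current: string[], op: Operation) => provider.applyOperation(current, op),
    storage.readSnapshotState(workspace),
  );
  return provider.generateSQL(state, ops);
}

// Usage with in-memory stubs.
const storage: Storage = {
  readProviderType: () => "unity",
  readSnapshotState: () => ["users"],
  readChangelogOps: () => [{ op: "add_table", name: "orders" }],
};
const toyProvider: Provider = {
  applyOperation: (state, op) => (op.op === "add_table" ? [...state, op.name] : state),
  generateSQL: (_state, ops) => ops.map((o) => `CREATE TABLE ${o.name} (id INT);`).join("\n"),
};
const sql = generateWorkspaceSQL("/ws", storage, new Map([["unity", toyProvider]]));
// sql is "CREATE TABLE orders (id INT);"
```

Injecting the storage and provider map keeps the façade thin: callers see one function, while each subsystem can still be tested or replaced on its own.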
Pattern Interaction Diagram
┌─────────────────────────────────────────────────────────────────┐
│ CLI / Extension (Façade) │
└────────────┬────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐