Provider Contract
This document defines the contract that all SchemaX catalog providers must implement. Following this contract ensures your provider integrates seamlessly with both the VSCode extension and Python SDK.
Overview
A Provider is a plugin that adds support for a specific catalog system (Unity Catalog, Hive Metastore, PostgreSQL, etc.). Each provider implements a standard interface that SchemaX uses for:
- State Management - How objects are stored and modified
- Operations - What actions users can perform
- SQL Generation - How to convert operations to DDL statements
- Validation - What constraints and rules to enforce
- UI Metadata - How to display objects in the interface
Quick Start
1. Provider Identification
Every provider must have a unique ID and metadata:
// TypeScript
const info: ProviderInfo = {
id: 'myprovider', // Short, lowercase identifier
name: 'My Provider', // Human-readable name
version: '1.0.0', // Semantic version
description: 'Provider for MyDB',
author: 'Your Name',
docsUrl: 'https://...',
};
# Python
info = ProviderInfo(
id="myprovider",
name="My Provider",
version="1.0.0",
description="Provider for MyDB",
author="Your Name",
docs_url="https://...",
)
2. Define Hierarchy
Specify the object hierarchy for your catalog:
// TypeScript - Example: 2-level hierarchy
const hierarchy = new Hierarchy([
{
name: 'database',
displayName: 'Database',
pluralName: 'databases',
icon: 'database',
isContainer: true,
},
{
name: 'table',
displayName: 'Table',
pluralName: 'tables',
icon: 'table',
isContainer: false,
},
]);
Common Hierarchies:
- Unity Catalog: Catalog → Schema → Table (3 levels)
- Hive Metastore: Database → Table (2 levels)
- PostgreSQL: Database → Schema → Table (3 levels)
3. Define Models
Create type-safe models for your objects:
// TypeScript
export interface MyDatabase extends BaseObject {
id: string;
name: string;
tables: MyTable[];
}
export interface MyTable extends BaseObject {
id: string;
name: string;
columns: MyColumn[];
}
export interface MyProviderState extends ProviderState {
databases: MyDatabase[];
}
# Python
class MyDatabase(BaseModel):
id: str
name: str
tables: List[MyTable] = []
class MyProviderState(BaseModel):
databases: List[MyDatabase] = []
4. Define Operations
List all operations your provider supports:
// TypeScript
export const MY_OPERATIONS = {
ADD_DATABASE: 'myprovider.add_database',
ADD_TABLE: 'myprovider.add_table',
// ... more operations
} as const;
export const myOperationMetadata: OperationMetadata[] = [
{
type: MY_OPERATIONS.ADD_DATABASE,
displayName: 'Add Database',
description: 'Create a new database',
category: OperationCategory.Schema,
requiredFields: ['databaseId', 'name'],
optionalFields: [],
isDestructive: false,
},
// ... more metadata
];
Operation Naming Convention: {providerId}.{action}_{object}
Examples:
unity.add_cataloghive.add_databasepostgres.drop_table
5. Implement State Reducer
Apply operations to state immutably:
// TypeScript
export function applyOperation(state: MyProviderState, op: Operation): MyProviderState {
const newState = JSON.parse(JSON.stringify(state)); // Deep clone
const opType = op.op.replace('myprovider.', '');
switch (opType) {
case 'add_database': {
const database: MyDatabase = {
id: op.payload.databaseId,
name: op.payload.name,
tables: [],
};
newState.databases.push(database);
break;
}
// ... handle all operations
}
return newState;
}
Key Rules:
- ✅ Always deep clone state (immutability)
- ✅ Handle ALL operations your provider supports
- ✅ Return new state, never mutate input
- ✅ Be defensive - check if objects exist before modifying
6. Implement SQL Generator
Convert operations to SQL DDL:
// TypeScript
export class MySQLGenerator extends BaseSQLGenerator {
constructor(protected state: MyProviderState) {
super(state);
}
generateSQLForOperation(op: Operation): SQLGenerationResult {
const opType = op.op.replace('myprovider.', '');
switch (opType) {
case 'add_database':
return {
sql: `CREATE DATABASE IF NOT EXISTS ${this.escapeIdentifier(op.payload.name)}`,
warnings: [],
isIdempotent: true,
};
// ... handle all operations
}
}
canGenerateSQL(op: Operation): boolean {
return op.provider === 'myprovider';
}
}
Key Rules:
- ✅ Generate idempotent SQL (use IF NOT EXISTS, IF EXISTS, etc.)
- ✅ Escape identifiers and strings properly
- ✅ Add warnings for operations that can't be fully translated
- ✅ Return empty string or comment for unsupported operations
- ✅ Validate SQL with SQLGlot (optional but recommended)
SQL Validation with SQLGlot:
import sqlglot
def validate_sql(sql: str, dialect: str = "databricks") -> bool:
"""Validate SQL syntax using SQLGlot"""
try:
parsed = sqlglot.parse_one(sql, dialect=dialect)
return parsed is not None
except Exception:
return False
# In tests
assert validate_sql(generated_sql, "databricks")
7. Leverage Base Optimization Algorithms
NEW: Providers should extend the enhanced BaseSQLGenerator to inherit generic optimization algorithms.
Using Base Components
from schemax.providers.base import BaseSQLGenerator, BatchInfo, Operation
class MyProviderSQLGenerator(BaseSQLGenerator):
"""Provider using base optimizations"""
def __init__(self, state, name_mapping=None):
# Initialize base with state and name mapping
super().__init__(state, name_mapping)
# self.batcher and self.optimizer now available
# self.name_mapping contains logical → physical name mappings
# Build provider-specific ID → name mapping
self.id_name_map = self._build_id_name_map()
# REQUIRED: Implement abstract methods
def _get_target_object_id(self, op: Operation) -> Optional[str]:
"""Extract target object ID from operation"""
if op.op == "myprovider.create_table":
return op.target
elif op.op == "myprovider.add_column":
return op.payload.get("table_id")
return None
def _is_create_operation(self, op: Operation) -> bool:
"""Check if operation creates new object"""
return op.op in [
"myprovider.create_database",
"myprovider.create_schema",
"myprovider.create_table"
]
def _get_dependency_level(self, op: Operation) -> int:
"""Get dependency level for ordering (0 = highest priority)"""
if "database" in op.op:
return 0
elif "schema" in op.op:
return 1
elif "create_table" in op.op:
return 2
else:
return 3
def _generate_batched_create_sql(
self,
object_id: str,
batch_info: BatchInfo
) -> str:
"""Generate CREATE statement with batched operations"""
# Use batch_info.create_op for CREATE operation
# Use batch_info.modify_ops for columns, properties, etc.
# Generate complete CREATE with all columns (not empty + ALTERs)
table_name = self._get_table_name(object_id)
columns = self._extract_columns_from_batch(batch_info)
return f"CREATE TABLE {table_name} ({columns})"
def _generate_batched_alter_sql(
self,
object_id: str,
batch_info: BatchInfo
) -> str:
"""Generate ALTER statements for batched operations"""
statements = []
for op in batch_info.modify_ops:
statements.append(self._generate_alter_for_op(op))
return ";\n".join(statements)
Benefits of Using Base
-
Operation Batching - Automatic via
self.batcher- Groups operations by target object
- Enables complete CREATE statements vs empty + ALTERs
-
Column Reorder Optimization - Automatic via
self.optimizer- Detects single-column moves
- Generates 1 statement instead of N statements
-
Generic Utilities - Available methods
self._build_fqn(*parts)- Build fully-qualified namesself.escape_identifier(id)- Escape SQL identifiersself.escape_string(s)- Escape string literals
-
Automatic Ordering - Dependency-based execution order
- Catalogs/databases first (level 0)
- Schemas second (level 1)
- Tables third (level 2)
- Modifications last (level 3+)
Example: Column Reorder with Base Optimizer
def _handle_column_reorder(self, table_id, original_order, final_order):
"""Use base optimizer for column reordering"""
# Base optimizer detects single-column moves
single_move = self.optimizer.detect_single_column_move(
original_order,
final_order
)
if single_move:
# Only one column moved - generate 1 SQL statement
col_id, orig_pos, new_pos = single_move
return self._generate_single_column_move_sql(col_id, new_pos)
else:
# Multiple columns moved - use general algorithm
return self._generate_multi_column_reorder_sql(original_order, final_order)
What to Implement vs What's Inherited
Implement (Provider-Specific):
- SQL syntax for CREATE, ALTER, DROP statements
- Provider-specific keywords (USING DELTA, CLUSTER BY, etc.)
- ID → name mapping for your hierarchy
- Operation type checking
Inherited (Generic):
- Operation batching algorithm
- Column reorder optimization
- FQN building with escaping
- Dependency-level ordering
- String/identifier escaping
8. Implement Provider Class
Tie everything together:
// TypeScript
export class MyProvider extends BaseProvider implements Provider {
readonly info = myProviderInfo;
readonly capabilities = myCapabilities;
constructor() {
super();
myOperationMetadata.forEach(m => this.registerOperation(m));
}
validateOperation(op: Operation): ValidationResult {
// Implement validation logic
}
applyOperation(state: ProviderState, op: Operation): ProviderState {
return applyOperation(state as MyProviderState, op);
}
getSQLGenerator(state: ProviderState): SQLGenerator {
return new MySQLGenerator(state as MyProviderState);
}
createInitialState(): ProviderState {
return { databases: [] };
}
validateState(state: ProviderState): ValidationResult {
// Implement state validation
}
}
export const myProvider = new MyProvider();
8. Register Provider
// TypeScript - in providers/index.ts
import { myProvider } from './myprovider';
export function initializeProviders(): void {
ProviderRegistry.register(myProvider);
// ... register other providers
}
# Python - in providers/__init__.py
from .my_provider import my_provider
ProviderRegistry.register(my_provider)
Required Interface Methods
Provider Info & Capabilities
interface Provider {
readonly info: ProviderInfo;
readonly capabilities: ProviderCapabilities;
}
ProviderCapabilities must specify:
supportedOperations: Array of operation typessupportedObjectTypes: Array of object type nameshierarchy: Your hierarchy definitionfeatures: Feature flags (constraints, tags, etc.)
Operation Metadata
getOperationMetadata(operationType: string): OperationMetadata | undefined;
getAllOperations(): OperationMetadata[];
Return metadata for UI and validation.
Validation
validateOperation(op: Operation): ValidationResult;
validateState(state: ProviderState): ValidationResult;
Validate operations and state structure.
State Management
applyOperation(state: ProviderState, op: Operation): ProviderState;
applyOperations(state: ProviderState, ops: Operation[]): ProviderState;
createInitialState(): ProviderState;
Immutably apply operations to state.
SQL Generation
getSQLGenerator(state: ProviderState): SQLGenerator;
Return a configured SQL generator instance.
File Structure
providers/
└── myprovider/
├── index.ts/py # Exports
├── hierarchy.ts/py # Hierarchy definition
├── models.ts/py # Data models
├── operations.ts/py # Operation definitions
├── state-reducer.ts/py # State reducer
├── sql-generator.ts/py # SQL generator
└── provider.ts/py # Provider implementation
Testing Requirements
Every provider must have tests for:
- Provider Compliance - Passes
testProviderCompliance()suite - Operation Application - Each operation correctly modifies state
- SQL Generation - Each operation generates valid SQL
- Validation - Validates operations and state correctly
- Idempotency - Operations can be applied multiple times safely
- SQL Syntax Validation - Generated SQL is syntactically valid (use SQLGlot)
// TypeScript example
import { testProviderCompliance } from '../base/__tests__/provider-compliance.test';
import { myProvider } from './provider';
describe('MyProvider', () => {
testProviderCompliance(myProvider);
// ... provider-specific tests
});
# Python example with SQLGlot validation
import sqlglot
from schemax.providers.myprovider import my_provider
def test_add_database_sql():
"""Test SQL generation with validation"""
state = my_provider.create_initial_state()
op = create_operation("myprovider.add_database", ...)
generator = my_provider.get_sql_generator(state)
result = generator.generate_sql_for_operation(op)
# Validate SQL syntax
parsed = sqlglot.parse_one(result.sql, dialect="databricks")
assert parsed is not None, "Generated SQL is invalid"
assert "CREATE DATABASE" in result.sql
Best Practices
DO ✅
- Use type-safe models - Leverage TypeScript/Pydantic types
- Make operations atomic - One operation = one logical change
- Generate idempotent SQL - Safe to run multiple times
- Validate early - Check operation validity before applying
- Document operations - Clear descriptions in metadata
- Test thoroughly - Cover all operations and edge cases
- Handle missing objects - Check existence before modifying
- Use meaningful IDs - UUIDs or structured identifiers
DON'T ❌
- Don't mutate state - Always return new state objects
- Don't assume order - Operations should be order-independent when possible
- Don't skip validation - Always validate before applying
- Don't generate destructive SQL - No DROP without IF EXISTS
- Don't hardcode names - Use ID-to-name mapping
- Don't forget edge cases - Handle nulls, empty arrays, etc.
Operation Design Guidelines
Granular Operations
Prefer small, focused operations over large composite ones:
✅ Good:
unity.add_table // Just creates table
unity.add_column // Adds one column
❌ Bad:
unity.create_table_with_columns // Does too much
Payload Structure
Use consistent payload structure:
{
id: 'op_123',
ts: '2024-01-01T00:00:00Z',
provider: 'myprovider',
op: 'myprovider.add_table',
target: 'table_id', // ID of object being modified
payload: {
// Operation-specific data
tableId: 'table_id', // Often duplicates target for convenience
name: 'my_table',
parentId: 'schema_id', // ID of parent object
// ... other fields
}
}
Naming Conventions
- Objects:
snake_caseinternally, display names in UI - Operations:
{provider}.{verb}_{noun}(e.g.,unity.add_catalog) - IDs: Prefixed by type (e.g.,
cat_uuid,tbl_uuid)
View Support and Dependency Extraction
Views are a key feature that require special handling for dependency-aware SQL generation:
View Model
interface MyView {
id: string;
name: string;
definition: string; // SQL definition
comment?: string;
properties?: Record<string, string>;
extractedDependencies?: { // Dependencies from SQL parsing
tables?: string[]; // Table IDs this view depends on
views?: string[]; // View IDs this view depends on
};
}
View Operations
Implement these view operations:
export const MY_OPERATIONS = {
ADD_VIEW: 'myprovider.add_view',
UPDATE_VIEW: 'myprovider.update_view',
RENAME_VIEW: 'myprovider.rename_view',
DROP_VIEW: 'myprovider.drop_view',
SET_VIEW_COMMENT: 'myprovider.set_view_comment',
} as const;
Dependency Extraction
Frontend (TypeScript) - Extract dependencies when user creates/updates view:
import { extractDependenciesFromView } from '@/providers/base/sql-parser';
function handleViewCreate(viewSql: string, schemaId: string) {
// Extract dependencies using regex-based parser
const extractedDeps = extractDependenciesFromView(viewSql);
// Create operation with dependencies
emitOps([{
id: `op_${uuidv4()}`,
ts: new Date().toISOString(),
provider: 'myprovider',
op: 'myprovider.add_view',
target: viewId,
payload: {
viewId,
name: viewName,
schemaId,
definition: viewSql,
extractedDependencies: extractedDeps, // Store for backend
},
}]);
}
SQL Generator - Dependency Graph
Backend (Python) - Build dependency graph and topologically sort operations:
from schemax.providers.base.dependency_graph import DependencyGraph
class MyProviderSQLGenerator(BaseSQLGenerator):
def generate_sql_with_mapping(self, operations: list[Operation]) -> SQLGenerationResult:
# Build dependency graph
graph = self._build_dependency_graph(operations)
# Topologically sort (dependencies first)
sorted_ops = self._topological_sort_with_fallback(operations, graph)
# Generate SQL in correct order
statements = []
for op in sorted_ops:
result = self._generate_sql_for_operation(op)
statements.extend(result.statements)
return SQLGenerationResult(statements=statements, warnings=[])
def _extract_operation_dependencies(self, op: Operation) -> list[tuple[str, str]]:
"""Extract dependencies from operation"""
deps = []
# View dependencies
if op.op == 'myprovider.add_view':
extracted = op.payload.get('extractedDependencies', {})
for table_id in extracted.get('tables', []):
deps.append((table_id, 'table'))
for view_id in extracted.get('views', []):
deps.append((view_id, 'view'))
return deps
Automatic FQN Qualification (Optional but Recommended)
Use SQLGlot to automatically qualify unqualified table references in view definitions:
import sqlglot
from sqlglot import expressions as exp
def _qualify_view_definition(self, definition: str) -> str:
"""Qualify unqualified table/view references"""
parsed = sqlglot.parse_one(definition, dialect="databricks")
# Build name → FQN mapping
name_to_fqn = {}
for object_id, fqn in self.id_name_map.items():
if "." in fqn:
parts = fqn.split(".")
name = parts[-1]
name_to_fqn[name] = fqn
# Replace table references
for table_node in parsed.find_all(exp.Table):
table_name = str(table_node.name)
if table_name in name_to_fqn:
fqn = name_to_fqn[table_name]
# Update with qualified names
parts = fqn.split(".")
table_node.set("catalog", exp.to_identifier(parts[0], quoted=True))
table_node.set("db", exp.to_identifier(parts[1], quoted=True))
table_node.set("this", exp.to_identifier(parts[2], quoted=True))
return parsed.sql(dialect="databricks", pretty=True)
Benefits:
- ✅ Views work regardless of current catalog/schema context
- ✅ Compatible with SQL Statement Execution API
- ✅ Prevents
TABLE_OR_VIEW_NOT_FOUNDerrors
Examples
See the Unity Catalog provider as a reference implementation:
- TypeScript:
packages/vscode-extension/src/providers/unity/ - Python:
packages/python-sdk/src/schemax/providers/unity/
Getting Help
- Check existing providers for patterns
- Review base provider tests for requirements
- Ask in GitHub Discussions
- Submit draft PR for early feedback
Checklist
Before submitting your provider:
- Implements all required interface methods
- Has complete operation metadata
- Generates valid SQL for all operations
- Passes provider compliance tests
- Has provider-specific tests
- Includes documentation
- Follows file structure convention
- Registers with ProviderRegistry
- Handles all edge cases
- Uses immutable state updates
Next Steps: See the provider examples (when available) for complete example implementations.