Provisioning - Infrastructure Automation Platform

A modular, declarative Infrastructure as Code (IaC) platform for managing complete infrastructure lifecycles

What is Provisioning?

Provisioning is a comprehensive Infrastructure as Code (IaC) platform designed to manage complete infrastructure lifecycles: cloud providers, infrastructure services, clusters, and isolated workspaces across multiple cloud/local environments.

Extensible and customizable by design, it delivers type-safe, configuration-driven workflows with enterprise security (encrypted configuration, Cosmian KMS integration, Cedar policy engine, secrets management, authorization and permissions control, compliance checking, anomaly detection) and adaptable deployment modes (interactive UI, CLI automation, unattended CI/CD) suitable for any scale from development to production.

Technical Definition

Declarative Infrastructure as Code (IaC) platform providing:

  • Type-safe, configuration-driven workflows with schema validation and constraint checking
  • Modular, extensible architecture: cloud providers, task services, clusters, workspaces
  • Multi-cloud abstraction layer with unified API (UpCloud, AWS, local infrastructure)
  • High-performance state management:
    • Graph database backend for complex relationships
    • Real-time state tracking and queries
    • Multi-model data storage (document, graph, relational)
  • Enterprise security stack:
    • Encrypted configuration and secrets management
    • Cosmian KMS integration for confidential key management
    • Cedar policy engine for fine-grained access control
    • Authorization and permissions control via platform services
    • Compliance checking and policy enforcement
    • Anomaly detection for security monitoring
    • Audit logging and compliance tracking
  • Hybrid orchestration: Rust-based performance layer + scripting flexibility
  • Production-ready features:
    • Batch workflows with dependency resolution
    • Checkpoint recovery and automatic rollback
    • Parallel execution with state management
  • Adaptable deployment modes:
    • Interactive TUI for guided setup
    • Headless CLI for scripted automation
    • Unattended mode for CI/CD pipelines
  • Hierarchical configuration system with inheritance and overrides

What It Does

  • Provisions Infrastructure - Create servers, networks, storage across multiple cloud providers
  • Installs Services - Deploy Kubernetes, containerd, databases, monitoring, and 50+ infrastructure components
  • Manages Clusters - Orchestrate complete cluster deployments with dependency management
  • Handles Configuration - Hierarchical configuration system with inheritance and overrides
  • Orchestrates Workflows - Batch operations with parallel execution and checkpoint recovery
  • Manages Secrets - SOPS/Age integration for encrypted configuration

Why Provisioning?

The Problems It Solves

1. Multi-Cloud Complexity

Problem: Each cloud provider has different APIs, tools, and workflows.

Solution: Unified abstraction layer with provider-agnostic interfaces. Write configuration once, deploy anywhere.

# Same configuration works on UpCloud, AWS, or local infrastructure
server: Server {
    name = "web-01"
    plan = "medium"      # Abstract size, provider-specific translation
    provider = "upcloud" # Switch to "aws" or "local" as needed
}

2. Dependency Hell

Problem: Infrastructure components have complex dependencies (Kubernetes needs containerd, Cilium needs Kubernetes, etc.).

Solution: Automatic dependency resolution with topological sorting and health checks.

# Provisioning resolves: containerd → etcd → kubernetes → cilium
taskservs = ["cilium"]  # Automatically installs all dependencies
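As a rough illustration of the resolution step, the sketch below orders the same four services with Python's standard `graphlib` module. The `DEPENDS_ON` map and the `install_order` helper are hypothetical names, not part of Provisioning, and the edges are an assumption chosen to reproduce the chain in the comment above.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> services that must be installed first.
DEPENDS_ON = {
    "containerd": [],
    "etcd": ["containerd"],
    "kubernetes": ["etcd"],
    "cilium": ["kubernetes"],
}


def install_order(requested):
    """Return the requested services plus all transitive dependencies, dependencies first."""
    needed, stack = set(), list(requested)
    while stack:
        service = stack.pop()
        if service not in needed:
            needed.add(service)
            stack.extend(DEPENDS_ON[service])
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {service: DEPENDS_ON[service] for service in needed}
    return list(TopologicalSorter(graph).static_order())


print(install_order(["cilium"]))
# prints ['containerd', 'etcd', 'kubernetes', 'cilium']
```

Requesting only `cilium` pulls in the whole chain. `TopologicalSorter` raises `graphlib.CycleError` on circular dependencies, which a real resolver would need to report.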

3. Configuration Sprawl

Problem: Environment variables, hardcoded values, scattered configuration files.

Solution: Hierarchical configuration system with 476+ config accessors replacing 200+ ENV variables.

Defaults → User → Project → Infrastructure → Environment → Runtime

4. Imperative Scripts

Problem: Brittle shell scripts that don't handle failures, don't support rollback, and are hard to maintain.

Solution: Declarative KCL configurations with validation, type safety, and automatic rollback.

5. Lack of Visibility

Problem: No insight into what is happening during a deployment, which makes failures hard to debug.

Solution:

  • Real-time workflow monitoring
  • Comprehensive logging system
  • Web-based control center
  • REST API for integration

6. No Standardization

Problem: Each team builds their own deployment tools, no shared patterns.

Solution: Reusable task services, cluster templates, and workflow patterns.


Core Concepts

1. Providers

Cloud infrastructure backends that handle resource provisioning.

  • UpCloud - Primary cloud provider
  • AWS - Amazon Web Services integration
  • Local - Local infrastructure (VMs, Docker, bare metal)

Providers implement a common interface, making infrastructure code portable.
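One way to picture such a common interface is an abstract base class that each backend implements. This is only a sketch: the class names, the `create_server` method, and the plan mappings are assumptions made for illustration, not the platform's actual provider API.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Hypothetical common interface; the real provider contract may differ."""

    # Abstract plan size -> provider-specific plan name (illustrative values).
    PLAN_MAP = {}

    def resolve_plan(self, abstract_plan):
        """Translate an abstract size such as "medium" into a provider plan."""
        try:
            return self.PLAN_MAP[abstract_plan]
        except KeyError:
            raise ValueError(f"unknown plan: {abstract_plan!r}") from None

    @abstractmethod
    def create_server(self, name, plan):
        """Create a server and return a description of it."""


class UpCloudProvider(Provider):
    PLAN_MAP = {"small": "1xCPU-1GB", "medium": "2xCPU-4GB"}

    def create_server(self, name, plan):
        # A real implementation would call the cloud API here.
        return {"provider": "upcloud", "name": name, "plan": self.resolve_plan(plan)}


class AwsProvider(Provider):
    PLAN_MAP = {"small": "t3.small", "medium": "t3.medium"}

    def create_server(self, name, plan):
        return {"provider": "aws", "name": name, "plan": self.resolve_plan(plan)}


for provider in (UpCloudProvider(), AwsProvider()):
    print(provider.create_server("web-01", "medium"))
```

The calling code stays the same for every backend; only the plan translation and the API call behind `create_server` change.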

2. Task Services (TaskServs)

Reusable infrastructure components that can be installed on servers.

Categories:

  • Container Runtimes - containerd, Docker, Podman, crun, runc, youki
  • Orchestration - Kubernetes, etcd, CoreDNS
  • Networking - Cilium, Flannel, Calico, ip-aliases
  • Storage - Rook-Ceph, local storage
  • Databases - PostgreSQL, Redis, SurrealDB
  • Observability - Prometheus, Grafana, Loki
  • Security - Webhook, KMS, Vault
  • Development - Gitea, Radicle, ORAS

Each task service includes:

  • Version management
  • Dependency declarations
  • Health checks
  • Installation/uninstallation logic
  • Configuration schemas

3. Clusters

Complete infrastructure deployments combining servers and task services.

Examples:

  • Kubernetes Cluster - HA control plane + worker nodes + CNI + storage
  • Database Cluster - Replicated PostgreSQL with backup
  • Build Infrastructure - BuildKit + container registry + CI/CD

Clusters handle:

  • Multi-node coordination
  • Service distribution
  • High availability
  • Rolling updates

4. Workspaces

Isolated environments for different projects or deployment stages.

workspace_librecloud/     # Production workspace
├── infra/                # Infrastructure definitions
├── config/               # Workspace configuration
├── extensions/           # Custom modules
└── runtime/              # State and runtime data

workspace_dev/            # Development workspace
├── infra/
└── config/

Switch between workspaces with a single command:

provisioning workspace switch librecloud

5. Workflows

Coordinated sequences of operations with dependency management.

Types:

  • Server Workflows - Create/delete/update servers
  • TaskServ Workflows - Install/remove infrastructure services
  • Cluster Workflows - Deploy/scale complete clusters
  • Batch Workflows - Multi-cloud parallel operations

Features:

  • Dependency resolution
  • Parallel execution
  • Checkpoint recovery
  • Automatic rollback
  • Progress monitoring
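The automatic rollback feature can be pictured as pairing every operation with an undo action and unwinding completed operations in reverse order when a later one fails. The sketch below assumes that model; `run_with_rollback` and the step tuples are hypothetical and do not reflect the orchestrator's real data structures.

```python
def run_with_rollback(steps):
    """Run (name, do, undo) steps in order; on failure undo completed steps in reverse.

    Returns a list of log entries describing what happened.
    """
    done, log = [], []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            # Unwind everything that succeeded, most recent first.
            for prev_name, prev_undo in reversed(done):
                prev_undo()
                log.append(f"rolled-back:{prev_name}")
            break
        done.append((name, undo))
        log.append(f"ok:{name}")
    return log


def failing_install():
    raise RuntimeError("quota exceeded")


steps = [
    ("servers", lambda: None, lambda: None),
    ("containerd", lambda: None, lambda: None),
    ("kubernetes", failing_install, lambda: None),
]
for entry in run_with_rollback(steps):
    print(entry)
```

The failed step itself is not undone here, on the assumption that it left nothing behind; a real system would also have to clean up partially applied operations.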

Architecture

System Components

┌─────────────────────────────────────────────────────────────────┐
│                     User Interface Layer                        │
│  • CLI (provisioning command)                                   │
│  • Web Control Center (UI)                                      │
│  • REST API                                                     │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                     Core Engine Layer                           │
│  • Command Routing & Dispatch                                   │
│  • Configuration Management                                     │
│  • Provider Abstraction                                         │
│  • Utility Libraries                                            │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                   Orchestration Layer                           │
│  • Workflow Orchestrator (Rust/Nushell hybrid)                  │
│  • Dependency Resolver                                          │
│  • State Manager                                                │
│  • Task Scheduler                                               │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                    Extension Layer                              │
│  • Providers (Cloud APIs)                                       │
│  • Task Services (Infrastructure Components)                    │
│  • Clusters (Complete Deployments)                              │
│  • Workflows (Automation Templates)                             │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                  Infrastructure Layer                           │
│  • Cloud Resources (Servers, Networks, Storage)                 │
│  • Kubernetes Clusters                                          │
│  • Running Services                                             │
└─────────────────────────────────────────────────────────────────┘

Directory Structure

project-provisioning/
├── provisioning/              # Core provisioning system
│   ├── core/                  # Core engine and libraries
│   │   ├── cli/               # Command-line interface
│   │   ├── nulib/             # Core Nushell libraries
│   │   ├── plugins/           # System plugins
│   │   └── scripts/           # Utility scripts
│   │
│   ├── extensions/            # Extensible components
│   │   ├── providers/         # Cloud provider implementations
│   │   ├── taskservs/         # Infrastructure service definitions
│   │   ├── clusters/          # Complete cluster configurations
│   │   └── workflows/         # Core workflow templates
│   │
│   ├── platform/              # Platform services
│   │   ├── orchestrator/      # Rust orchestrator service
│   │   ├── control-center/    # Web control center
│   │   ├── mcp-server/        # Model Context Protocol server
│   │   ├── api-gateway/       # REST API gateway
│   │   ├── oci-registry/      # OCI registry for extensions
│   │   └── installer/         # Platform installer (TUI + CLI)
│   │
│   ├── kcl/                   # KCL configuration schemas
│   ├── config/                # Configuration files
│   ├── templates/             # Template files
│   └── tools/                 # Build and distribution tools
│
├── workspace/                 # User workspaces and data
│   ├── infra/                 # Infrastructure definitions
│   ├── config/                # User configuration
│   ├── extensions/            # User extensions
│   └── runtime/               # Runtime data and state
│
└── docs/                      # Documentation
    ├── user/                  # User guides
    ├── api/                   # API documentation
    ├── architecture/          # Architecture docs
    └── development/           # Development guides

Platform Services

1. Orchestrator (platform/orchestrator/)

  • Language: Rust + Nushell
  • Purpose: Workflow execution, task scheduling, state management
  • Features:
    • File-based persistence
    • Priority processing
    • Retry logic with exponential backoff
    • Checkpoint-based recovery
    • REST API endpoints
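Retry with exponential backoff means waiting progressively longer between attempts. The following is a minimal sketch of the idea, not the orchestrator's Rust implementation; the `retry` helper and its default parameters are assumptions.

```python
import time


def retry(operation, attempts=4, base_delay=0.5, factor=2.0, sleep=time.sleep):
    """Call operation(); after each failure wait base_delay * factor**n, then try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * factor ** attempt)


# Usage: an operation that fails twice before succeeding.
state = {"calls": 0}


def flaky_api_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "created"


print(retry(flaky_api_call, base_delay=0.01))
```

With the defaults the waits are 0.5 s, 1 s and 2 s. Production retry loops usually add random jitter and retry only errors known to be transient.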

2. Control Center (platform/control-center/)

  • Language: Web UI + Backend API
  • Purpose: Web-based infrastructure management
  • Features:
    • Dashboard views
    • Real-time monitoring
    • Interactive deployments
    • Log viewing

3. MCP Server (platform/mcp-server/)

  • Language: Nushell
  • Purpose: Model Context Protocol integration for AI assistance
  • Features:
    • 7 AI-powered settings tools
    • Intelligent config completion
    • Natural language infrastructure queries

4. OCI Registry (platform/oci-registry/)

  • Purpose: Extension distribution and versioning
  • Features:
    • Task service packages
    • Provider packages
    • Cluster templates
    • Workflow definitions

5. Installer (platform/installer/)

  • Language: Rust (Ratatui TUI) + Nushell
  • Purpose: Platform installation and setup
  • Features:
    • Interactive TUI mode
    • Headless CLI mode
    • Unattended CI/CD mode
    • Configuration generation

Key Features

1. Modular CLI Architecture (v3.2.0)

84% code reduction with domain-driven design.

  • Main CLI: 211 lines (from 1,329 lines)
  • 80+ shortcuts: s → server, t → taskserv, etc.
  • Bi-directional help: provisioning help ws = provisioning ws help
  • 7 domain modules: infrastructure, orchestration, development, workspace, configuration, utilities, generation

2. Configuration System (v2.0.0)

Hierarchical, config-driven architecture.

  • 476+ config accessors replacing 200+ ENV variables
  • Hierarchical loading: defaults → user → project → infra → env → runtime
  • Variable interpolation: {{paths.base}}, {{env.HOME}}, {{now.date}}
  • Multi-format support: TOML, YAML, KCL
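Variable interpolation of the `{{section.key}}` form can be sketched as a regular-expression substitution over a nested context. The `interpolate` and `lookup` helpers and the context values below are hypothetical; how Provisioning actually resolves placeholders is not shown in this document.

```python
import re

# Matches {{ dotted.name }} with optional surrounding spaces.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(context, dotted):
    """Walk a nested dict using a dotted path such as "paths.base"."""
    value = context
    for key in dotted.split("."):
        value = value[key]
    return value


def interpolate(text, context):
    """Replace every {{dotted.name}} placeholder with its value from context."""
    return _PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), text)


# Illustrative context; real values would come from the loaded configuration.
context = {
    "paths": {"base": "/opt/provisioning"},
    "env": {"HOME": "/home/alice"},
}
print(interpolate("{{paths.base}}/templates", context))
# prints /opt/provisioning/templates
```

An unknown placeholder raises `KeyError` in this sketch, which makes typos visible instead of silently leaving them in the output.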

3. Batch Workflow System (v3.1.0)

Provider-agnostic batch operations with 85-90% token efficiency.

  • Multi-cloud support: Mixed UpCloud + AWS + local in single workflow
  • KCL schema integration: Type-safe workflow definitions
  • Dependency resolution: Topological sorting with soft/hard dependencies
  • State management: Checkpoint-based recovery with rollback
  • Real-time monitoring: Live progress tracking

4. Hybrid Orchestrator (v3.0.0)

Rust/Nushell architecture solving deep call stack limitations.

  • High-performance coordination layer
  • File-based persistence
  • Priority processing with retry logic
  • REST API for external integration
  • Comprehensive workflow system

5. Workspace Switching (v2.0.5)

Centralized workspace management.

  • Single-command switching: provisioning workspace switch <name>
  • Automatic tracking: Last-used timestamps, active workspace markers
  • User preferences: Global settings across all workspaces
  • Workspace registry: Centralized configuration in user_config.yaml

6. Interactive Guides (v3.3.0)

Step-by-step walkthroughs and quick references.

  • Quick reference: provisioning sc (fastest)
  • Complete guides: from-scratch, update, customize
  • Copy-paste ready: All commands include placeholders
  • Beautiful rendering: Uses glow, bat, or less

7. Test Environment Service (v3.4.0)

Automated container-based testing.

  • Three test types: Single taskserv, server simulation, multi-node clusters
  • Topology templates: Kubernetes HA, etcd clusters, etc.
  • Auto-cleanup: Optional automatic cleanup after tests
  • CI/CD integration: Easy integration into pipelines

8. Platform Installer (v3.5.0)

Multi-mode installation system with TUI, CLI, and unattended modes.

  • Interactive TUI: Beautiful Ratatui terminal UI with 7 screens
  • Headless Mode: CLI automation for scripted installations
  • Unattended Mode: Zero-interaction CI/CD deployments
  • Deployment Modes: Solo (2 CPU/4GB), MultiUser (4 CPU/8GB), CICD (8 CPU/16GB), Enterprise (16 CPU/32GB)
  • MCP Integration: 7 AI-powered settings tools for intelligent configuration

9. Version Management

Comprehensive version tracking and updates.

  • Automatic updates: Check for taskserv updates
  • Version constraints: Semantic versioning support
  • Grace periods: Cached version checks
  • Update strategies: major, minor, patch, none

Technology Stack

Core Technologies

| Technology | Version | Purpose | Why |
|------------|---------|---------|-----|
| Nushell | 0.107.1+ | Primary shell and scripting language | Structured data pipelines, cross-platform, modern built-in parsers (JSON/YAML/TOML) |
| KCL | 0.11.3+ | Configuration language | Type safety, schema validation, immutability, constraint checking |
| Rust | Latest | Platform services (orchestrator, control-center, installer) | Performance, memory safety, concurrency, reliability |
| Tera | Latest | Template engine | Jinja2-like syntax, configuration file rendering, variable interpolation, filters and functions |
Data & State Management

| Technology | Version | Purpose | Features |
|------------|---------|---------|----------|
| SurrealDB | Latest | High-performance graph database backend | Multi-model (document, graph, relational), real-time queries, distributed architecture, complex relationship tracking |

Platform Services (Rust-based)

| Service | Purpose | Security Features |
|---------|---------|-------------------|
| Orchestrator | Workflow execution, task scheduling, state management | File-based persistence, retry logic, checkpoint recovery |
| Control Center | Web-based infrastructure management | Authorization and permissions control, RBAC, audit logging |
| Installer | Platform installation (TUI + CLI modes) | Secure configuration generation, validation |
| API Gateway | REST API for external integration | Authentication, rate limiting, request validation |

Security & Secrets

| Technology | Version | Purpose | Enterprise Features |
|------------|---------|---------|---------------------|
| SOPS | 3.10.2+ | Secrets management | Encrypted configuration files |
| Age | 1.2.1+ | Encryption | Secure key-based encryption |
| Cosmian KMS | Latest | Key Management System | Confidential computing, secure key storage, cloud-native KMS |
| Cedar | Latest | Policy engine | Fine-grained access control, policy-as-code, compliance checking, anomaly detection |

Optional Tools

| Tool | Purpose |
|------|---------|
| K9s | Kubernetes management interface |
| nu_plugin_tera | Nushell plugin for Tera template rendering |
| nu_plugin_kcl | Nushell plugin for KCL integration (CLI required, plugin optional) |
| glow | Markdown rendering for interactive guides |
| bat | Syntax highlighting for file viewing and guides |

How It Works

Data Flow

1. User defines infrastructure in KCL
   ↓
2. CLI loads configuration (hierarchical)
   ↓
3. Configuration validated against schemas
   ↓
4. Workflow created with operations
   ↓
5. Orchestrator receives workflow
   ↓
6. Dependencies resolved (topological sort)
   ↓
7. Operations executed in order
   ↓
8. Providers handle cloud operations
   ↓
9. Task services installed on servers
   ↓
10. State persisted and monitored

Example Workflow: Deploy Kubernetes Cluster

Step 1: Define infrastructure in KCL

# infra/my-cluster.k
import provisioning.settings as cfg

settings: cfg.Settings = {
    infra = {
        name = "my-cluster"
        provider = "upcloud"
    }

    servers = [
        {name = "control-01", plan = "medium", role = "control"}
        {name = "worker-01", plan = "large", role = "worker"}
        {name = "worker-02", plan = "large", role = "worker"}
    ]

    taskservs = ["kubernetes", "cilium", "rook-ceph"]
}

Step 2: Submit to Provisioning

provisioning server create --infra my-cluster

Step 3: Provisioning executes workflow

1. Create workflow: "deploy-my-cluster"
2. Resolve dependencies:
   - containerd (required by kubernetes)
   - etcd (required by kubernetes)
   - kubernetes (explicitly requested)
   - cilium (explicitly requested, requires kubernetes)
   - rook-ceph (explicitly requested, requires kubernetes)

3. Execution order:
   a. Provision servers (parallel)
   b. Install containerd on all nodes
   c. Install etcd on control nodes
   d. Install kubernetes control plane
   e. Join worker nodes
   f. Install Cilium CNI
   g. Install Rook-Ceph storage

4. Checkpoint after each step
5. Monitor health checks
6. Report completion
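Checkpointing after each step is what lets an interrupted workflow resume instead of starting over. The sketch below shows one simple file-based approach; the JSON checkpoint format, the step names, and `run_with_checkpoints` are assumptions for illustration, not the orchestrator's actual persistence layer.

```python
import json
import os
import tempfile


def run_with_checkpoints(steps, checkpoint_path):
    """Run (name, func) steps in order, skipping any already recorded as done."""
    completed = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            completed = json.load(fh)
    for name, func in steps:
        if name in completed:
            continue  # finished in an earlier run
        func()  # an exception here leaves the checkpoint at the last good step
        completed.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(completed, fh)
    return completed


# Usage: both steps run on the first call and are skipped on the second.
steps = [
    ("provision-servers", lambda: print("provisioning servers")),
    ("install-containerd", lambda: print("installing containerd")),
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.json")
    run_with_checkpoints(steps, path)
    run_with_checkpoints(steps, path)  # prints nothing: all steps are checkpointed
```

Writing the checkpoint only after a step succeeds means a crash mid-step causes that step to be retried, so steps need to be safe to run twice.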

Step 4: Verify deployment

provisioning cluster status my-cluster

Configuration Hierarchy

Configuration values are resolved through a hierarchy:

1. System Defaults (provisioning/config/config.defaults.toml)
   ↓ (overridden by)
2. User Preferences (~/.config/provisioning/user_config.yaml)
   ↓ (overridden by)
3. Workspace Config (workspace/config/provisioning.yaml)
   ↓ (overridden by)
4. Infrastructure Config (workspace/infra/<name>/config.toml)
   ↓ (overridden by)
5. Environment Config (workspace/config/prod-defaults.toml)
   ↓ (overridden by)
6. Runtime Flags (--flag value)

Example:

# System default
[servers]
default_plan = "small"

# User preference
[servers]
default_plan = "medium"  # Overrides system default

# Infrastructure config
[servers]
default_plan = "large"   # Overrides user preference

# Runtime
provisioning server create --plan xlarge  # Overrides everything
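The override chain above behaves like a layered deep merge in which later layers win. A minimal sketch, assuming each layer has already been parsed into a dictionary; `deep_merge`, `resolve`, and the `ssh_port` key are hypothetical and only serve the illustration.

```python
def deep_merge(base, override):
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(*layers):
    """Merge configuration layers from lowest to highest priority."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


system_defaults = {"servers": {"default_plan": "small", "ssh_port": 22}}  # ssh_port is illustrative
user_prefs = {"servers": {"default_plan": "medium"}}
infra_config = {"servers": {"default_plan": "large"}}
runtime_flags = {"servers": {"default_plan": "xlarge"}}

config = resolve(system_defaults, user_prefs, infra_config, runtime_flags)
print(config)
# prints {'servers': {'default_plan': 'xlarge', 'ssh_port': 22}}
```

Keys that a higher layer does not mention, such as `ssh_port` here, keep the value from the lower layer.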

Use Cases

1. Multi-Cloud Kubernetes Deployment

Deploy Kubernetes clusters across different cloud providers with identical configuration.

# UpCloud cluster
provisioning cluster create k8s-prod --provider upcloud

# AWS cluster (same config)
provisioning cluster create k8s-prod --provider aws

2. Development → Staging → Production Pipeline

Manage multiple environments with workspace switching.

# Development
provisioning workspace switch dev
provisioning cluster create app-stack

# Staging (same config, different resources)
provisioning workspace switch staging
provisioning cluster create app-stack

# Production (HA, larger resources)
provisioning workspace switch prod
provisioning cluster create app-stack

3. Infrastructure as Code Testing

Test infrastructure changes before deploying to production.

# Test Kubernetes upgrade locally
provisioning test topology load kubernetes_3node | \
  test env cluster kubernetes --version 1.29.0

# Verify functionality
provisioning test env run <env-id>

# Cleanup
provisioning test env cleanup <env-id>

4. Batch Multi-Region Deployment

Deploy to multiple regions in parallel.

# workflows/multi-region.k
batch_workflow: BatchWorkflow = {
    operations = [
        {
            id = "eu-cluster"
            type = "cluster"
            region = "eu-west-1"
            cluster = "app-stack"
        }
        {
            id = "us-cluster"
            type = "cluster"
            region = "us-east-1"
            cluster = "app-stack"
        }
        {
            id = "asia-cluster"
            type = "cluster"
            region = "ap-south-1"
            cluster = "app-stack"
        }
    ]
    parallel_limit = 3  # All at once
}

provisioning batch submit workflows/multi-region.k
provisioning batch monitor <workflow-id>
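The effect of `parallel_limit` can be sketched with a bounded thread pool: at most that many operations run at the same time. The `run_batch` helper and the stand-in `deploy` function are hypothetical and only mimic the shape of the operations listed in the workflow above.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(operations, deploy, parallel_limit):
    """Run deploy(op) for each operation with at most parallel_limit in flight."""
    with ThreadPoolExecutor(max_workers=parallel_limit) as pool:
        results = list(pool.map(deploy, operations))  # map preserves input order
    return {op["id"]: result for op, result in zip(operations, results)}


def deploy(op):
    # Stand-in for a real cluster deployment.
    return f"{op['cluster']} deployed in {op['region']}"


operations = [
    {"id": "eu-cluster", "region": "eu-west-1", "cluster": "app-stack"},
    {"id": "us-cluster", "region": "us-east-1", "cluster": "app-stack"},
    {"id": "asia-cluster", "region": "ap-south-1", "cluster": "app-stack"},
]
for op_id, status in run_batch(operations, deploy, parallel_limit=3).items():
    print(op_id, "->", status)
```

With a limit of 3 all three regions start together; lowering it to 1 would run them one after another.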

5. Automated Disaster Recovery

Recreate infrastructure from configuration.

# Infrastructure destroyed
provisioning workspace switch prod

# Recreate from config
provisioning cluster create --infra backup-restore --wait

# All services restored with same configuration

6. CI/CD Integration

Automated testing and deployment pipelines.

# .gitlab-ci.yml
test-infrastructure:
  script:
    - provisioning test quick kubernetes
    - provisioning test quick postgres

deploy-staging:
  script:
    - provisioning workspace switch staging
    - provisioning cluster create app-stack --check
    - provisioning cluster create app-stack --yes

deploy-production:
  when: manual
  script:
    - provisioning workspace switch prod
    - provisioning cluster create app-stack --yes

Getting Started

Quick Start

  1. Install Prerequisites

    # Install Nushell
    brew install nushell  # macOS
    
    # Install KCL
    brew install kcl-lang/tap/kcl  # macOS
    
    # Install SOPS (optional, for secrets)
    brew install sops
    
  2. Add CLI to PATH

    ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning
    
  3. Initialize Workspace

    provisioning workspace init my-project
    
  4. Configure Provider

    # Edit workspace config
    provisioning sops workspace/config/provisioning.yaml
    
  5. Deploy Infrastructure

    # Check what will be created
    provisioning server create --check
    
    # Create servers
    provisioning server create --yes
    
    # Install Kubernetes
    provisioning taskserv create kubernetes
    

Learning Path

  1. Start with Guides

    provisioning sc                    # Quick reference
    provisioning guide from-scratch    # Complete walkthrough
    
  2. Explore Examples

    ls provisioning/examples/
    
  3. Read Architecture Docs

  4. Try Test Environments

    provisioning test quick kubernetes
    provisioning test quick postgres
    
  5. Build Custom Extensions

    • Create custom task services
    • Define cluster templates
    • Write workflow automation

Documentation Index

User Documentation

Architecture Documentation

Development Documentation

API Documentation


Project Status

Current Version: Active Development (2025-10-07)

Recent Milestones

  • v3.5.0 (2025-10-06) - Platform Installer with TUI and CI/CD modes
  • v3.4.0 (2025-10-06) - Test Environment Service with container management
  • v3.3.0 (2025-09-30) - Interactive Guides system
  • v3.2.0 (2025-09-30) - Modular CLI Architecture (84% code reduction)
  • v3.1.0 (2025-09-25) - Batch Workflow System (85-90% token efficiency)
  • v3.0.0 (2025-09-25) - Hybrid Orchestrator (Rust/Nushell)
  • v2.0.5 (2025-10-02) - Workspace Switching system
  • v2.0.0 (2025-09-23) - Configuration System (476+ accessors)

Roadmap

  • Platform Services

    • Web Control Center UI completion
    • API Gateway implementation
    • Enhanced MCP server capabilities
  • Extension Ecosystem

    • OCI registry for extension distribution
    • Community task service marketplace
    • Cluster template library
  • Enterprise Features

    • Multi-tenancy support
    • RBAC and audit logging
    • Cost tracking and optimization

Support and Community

Getting Help

  • Documentation: Start with provisioning help or provisioning guide from-scratch
  • Issues: Report bugs and request features on the issue tracker
  • Discussions: Join community discussions for questions and ideas

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Key areas for contribution:

  • New task service definitions
  • Cloud provider implementations
  • Cluster templates
  • Documentation improvements
  • Bug fixes and testing

License

See LICENSE file in project root.


Maintained By: Architecture Team
Last Updated: 2025-10-07
Project Home: provisioning/