Tasmanian Cloud Architecture Planning (2024)
This document outlines the current and planned architecture for Tasmanian Cloud infrastructure, reflecting recent decisions and technologies.
Overview
Tasmanian Cloud is evolving from a simple Proxmox-based VPS provider to a comprehensive multi-tenant cloud platform with Kubernetes, multi-cloud support, and modern developer tooling.
Current Architecture (As-Is)
Infrastructure Layer
- Proxmox VE 9.1.4 cluster (twnhost1-4)
- LXC containers for system services
- KVM VMs for customer workloads
- Local-lvm storage per node
- UniFi networking with multiple VLANs (3, 60-64, 545, 645)
- NetBird VPN for secure access
Network Configuration
| Host | Network | Purpose |
|---|---|---|
| twnhost1 | 25GbE Mellanox CX5 | High-performance workloads |
| twnhost2 | 2.5GbE | General workloads |
| twnhost3 | 10GbE SFP+ + 10GbE RJ45 | Storage + Network services |
| twnhost4 | 2.5GbE | General workloads |
Current Services
- GlobalSO (Wazuh, Pangolin, NetBird proxy)
- Paymenter (billing)
- Various monitoring tools
Planned Architecture (To-Be)
Platform Vision: Unified Management Experience
Currently, VPS (KVM/LXC), Kubernetes, and Templates are managed through separate interfaces. We are actively developing a unified platform that brings these together:
| Feature | Current State | Planned | Timeline |
|---|---|---|---|
| KVM/LXC Management | O2S Portal | Unified API + CLI | Q2 2024 |
| Kubernetes | Separate provisioning | Unified control plane | Q2 2024 |
| Docker PaaS | Not available | Coolify integration | Q3 2024 |
| Template System | Basic | Docker Compose + AI assistant | Q3 2024 |
| Unified API | Separate endpoints | Single REST/GraphQL API | Q2 2024 |
Customer Impact: You'll be able to manage VMs, Kubernetes clusters, and containerized applications from a single interface with consistent authentication, billing, and networking.
1. Proxmox Infrastructure Updates
LXC for System Services
Continue using LXC for:
- VPN gateways (NetBird/Headscale)
- Monitoring stack (Prometheus, Grafana)
- Databases (PostgreSQL primary)
- Reverse proxies (Pangolin → future Traefik)
- Management tools (Salt master, Temporal)
KVM for Customer Workloads
Standard VM sizes:
| Size | vCPU | RAM | Disk | Use Case |
|---|---|---|---|---|
| Small | 1 | 2GB | 20GB | Development |
| Medium | 2 | 4GB | 40GB | Production apps |
| Large | 4 | 8GB | 80GB | Databases |
| XLarge | 8 | 16GB | 160GB | High performance |
2. Talos Linux for Kubernetes
Replace Ubuntu with Talos Linux for all Kubernetes nodes:
Why Talos:
- Immutable OS (read-only root filesystem)
- API-driven management (no SSH)
- Automatic updates with rollback
- Minimal attack surface (~80MB)
Implementation:
Stage 1: Proxmox (OpenTofu)
├── 3x Control Plane VMs (Talos)
├── 3x Worker VMs (Talos)
└── Cilium CNI (L2 LB, Hubble)
Stage 2: Kubernetes
├── Cilium installation
├── Gateway API
└── Storage classes
Stage 3: ArgoCD
├── GitOps deployment
└── App of apps pattern
Reference: proxmox-talos-opentofu
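To prepare the Talos nodes above for Cilium in Stage 2, the machine configuration disables the bundled CNI and kube-proxy so Cilium's kube-proxy replacement can take over. A minimal sketch following the Talos machine-config schema (the patch file name is illustrative):

```yaml
# cilium-patch.yaml (illustrative) - applied via talosctl's --config-patch
cluster:
  network:
    cni:
      name: none      # no default CNI; Cilium is installed in Stage 2
  proxy:
    disabled: true    # Cilium's kube-proxy replacement handles services
```

The cluster will report nodes as NotReady until Cilium is installed, which is expected with this staging.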
3. vCluster for Multi-Tenancy
Option 1: vCluster per Tenant
Tenant A → vcluster-tenant-a → Full cluster access
Tenant B → vcluster-tenant-b → Full cluster access
Option 2: Namespaces for Simple Containers
Talos Cluster
├── namespace: tenant-a (NetworkPolicy isolation)
├── namespace: tenant-b (NetworkPolicy isolation)
└── Coolify for container deployments
Decision:
- vCluster for teams needing full K8s control
- Namespaces + Coolify for simple container hosting
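For the namespace-per-tenant option, the "NetworkPolicy isolation" noted above typically starts with a policy that denies cross-namespace traffic. A minimal sketch (the tenant-a namespace mirrors the diagram; DNS and other shared-service exceptions would be added on top):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: tenant-a
spec:
  podSelector: {}              # applies to every pod in tenant-a
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - podSelector: {}      # only pods within tenant-a may connect in
  egress:
    - to:
        - podSelector: {}      # only traffic to pods within tenant-a is allowed out
```

With Cilium as the CNI (Stage 1 above), these standard NetworkPolicy objects are enforced without extra configuration.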
4. Crossplane for Infrastructure as Code
Provider Stack:
- provider-proxmox-bpg: Proxmox VM/LXC management
- provider-kubernetes: In-cluster resources
- provider-helm: Application deployment
- provider-aws/gcp: Multi-cloud failover (future)
GitOps Integration:
Git Repo → Flux → Crossplane → Proxmox API
Customer-Facing API:
```yaml
apiVersion: tascloud.io/v1alpha1
kind: VirtualMachine
spec:
  size: medium
  image: ubuntu-22.04
  region: sydney
```
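Behind this customer-facing resource, Crossplane needs a matching CompositeResourceDefinition that declares the VirtualMachine claim and its schema. A sketch under the assumption that size, image, and region are the only spec fields (names mirror the example above; everything else is illustrative):

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xvirtualmachines.tascloud.io
spec:
  group: tascloud.io
  names:
    kind: XVirtualMachine
    plural: xvirtualmachines
  claimNames:                  # exposes the namespaced VirtualMachine claim
    kind: VirtualMachine
    plural: virtualmachines
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:   { type: string }
                image:  { type: string }
                region: { type: string }
```

A Composition (not shown) would then map each claim to provider-proxmox-bpg managed resources.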
5. Storage Architecture
Phase 1: Ceph Migration (Current)
Problem: VMs are tied to per-node local-lvm storage, so migrations require full disk copies and are slow
Solution:
Ceph Cluster (3+ nodes)
├── vm-fast pool (NVMe SSD)
│ └── VM root disks
├── vm-data pool (SATA SSD)
│ └── VM data volumes
└── backups pool (compressed)
└── Automated backups
Migration Strategy:
- Side-by-side deployment (keep existing network)
- Add vmbr50 (25GbE) for Ceph only
- Gradual VM migration with zero downtime
Phase 2: Tiered Storage
| Tier | Media | Use Case |
|---|---|---|
| Hot | NVMe | VM OS, active databases |
| Warm | SATA SSD | Logs, backups |
| Cold | Cloudflare R2 | Archives, compliance |
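Once Ceph is in place, the hot tier can be surfaced to Kubernetes through a StorageClass backed by the Ceph RBD CSI driver. A hedged sketch (the clusterID placeholder and secret wiring are deployment-specific and omitted; the pool name matches the layout above):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-fast
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-fsid>             # placeholder: from `ceph fsid`
  pool: vm-fast                      # NVMe pool from the Ceph layout above
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
```

A second class pointing at vm-data would cover the warm tier; the cold tier (R2) is object storage and sits outside the CSI path.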
6. Networking Updates
High-Speed Storage Network
vmbr50 (25GbE)
├── twnhost1 ↔ twnhost3
└── Ceph replication traffic only
SDN Zones per Tenant
Zone: tenant-acme (VXLAN 10000)
├── VNet: vms (10.100.1.0/24)
├── VNet: k8s (10.100.2.0/24)
└── VNet: services (10.100.3.0/24)
VPN Integration
- Tailscale - Primary VPN mesh
- NetBird - Alternative/backup
- Headscale - Self-hosted coordination
7. Application Platform
Coolify for Container Hosting
Best for: Simple container deployments
Features:
- Git-based deployment
- Automatic SSL
- One-click databases
- Preview environments
Integration:
- O2S frontend
- Logto authentication
- Paymenter billing
Custom Template System
Features:
- 50+ pre-built templates
- Docker Compose converter
- AI assistant for deployment
- Visual composer (drag-and-drop)
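Each template in this system would ultimately resolve to a Docker Compose file. An illustrative example of what a web-app-plus-database template might look like (service names, images, and variables are hypothetical):

```yaml
# template: web-app-postgres (illustrative)
services:
  app:
    image: ghcr.io/example/web-app:latest
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://app:${DB_PASSWORD}@db:5432/app
    depends_on: [db]
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
```

The Docker Compose converter would take files in this shape as input; the AI assistant could fill in environment variables and sizing.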
8. Developer Portal (Backstage)
Purpose: Internal development, not customer-facing deployment
Integrations:
- Service catalog (GitLab, Kubernetes)
- Proxmox management (custom plugin)
- Scalar API documentation
- Grafana dashboards (embedded)
- GitOps cluster view (Flux)
NOT for:
- Customer VM provisioning (use O2S)
- Production deployments (use Temporal)
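Services enter the Backstage catalog through a catalog-info.yaml committed alongside each repository. A sketch for one of the platform crates (the owner, system, and GitLab slug are assumptions for illustration):

```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: tascloud-api
  annotations:
    gitlab.com/project-slug: tascloud/tascloud-platform  # hypothetical slug
spec:
  type: service
  lifecycle: experimental
  owner: platform-team
  system: tascloud-platform
```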
9. Management API (Rust)
Core Services:
tascloud-platform/
├── crates/
│ ├── tascloud-core # Domain models
│ ├── tascloud-api # Axum REST API
│ ├── tascloud-cli # Binary Lane-style CLI
│ ├── tascloud-worker # Temporal workflows
│ └── tascloud-temporal # Orchestration
Technology Stack:
- Language: Rust
- Web Framework: Axum
- Database: PostgreSQL + Valkey (cache)
- Object Storage: RustFS (S3-compatible)
- Backups: Cloudflare R2
- Workflows: Temporal
- Search: Meilisearch
10. Central SDK
Architecture:
Rust Core (tascloud-sdk)
↓ FFI
├── Python bindings
├── Go bindings
├── Node.js bindings
└── Java bindings
Composite Design:
- Binary Lane - VPS management style
- Digital Ocean - App platform style
- Microsoft 365 - Organization/tenant style
Integration Points
Logto (Centralized Auth)
All Services → Logto OIDC
├── O2S (customer portal)
├── Paymenter (billing)
├── Backstage (internal)
└── Management API
Paymenter (Billing)
Customer → Paymenter → Proxmox
→ Lago (usage metering)
Temporal (Orchestration)
O2S → Temporal Workflow
├── Reserve quota
├── Paymenter.create_service
├── Proxmox.create_vm
├── Salt.apply_states
├── NetBox.register
└── NetBird.join
Multi-Cloud Strategy (Future)
Phase 1: Proxmox Primary
- 100% self-hosted
- Binary Lane as backup
Phase 2: Failover Capabilities
- Proxmox (primary)
- Binary Lane (secondary)
- OVHcloud (tertiary)
- Hetzner (quaternary)
Phase 3: Full Multi-Cloud
- Smart scheduling across providers
- Cost-based routing
- Geographic distribution
Migration Timeline
Phase 1 (Months 1-2): Foundation
- Deploy Logto, O2S, Paymenter
- Basic tenant isolation
- Ceph storage setup
Phase 2 (Months 3-4): Kubernetes
- Talos Linux deployment
- vCluster setup
- Coolify integration
Phase 3 (Months 5-6): Platform
- Crossplane integration
- Template system
- AI assistant
Phase 4 (Months 7-8): Polish
- Backstage portal
- Advanced monitoring
- Documentation
Phase 5 (Months 9-12): Scale
- Multi-cloud failover
- Advanced networking
- Enterprise features
Key Decisions
Yes
- ✅ Talos Linux for K8s
- ✅ Ceph for shared storage
- ✅ vCluster for multi-tenancy
- ✅ Coolify for simple containers
- ✅ Crossplane for IaC
- ✅ Rust for core services
- ✅ Temporal for workflows
No
- ❌ Backstage for customer deployments
- ❌ Pulumi (use OpenTofu)
- ❌ Direct Proxmox for apps (use abstraction)
- ❌ BGP for MVP
Maybe (Future)
- 🤔 Coolify long-term (vs custom)
- 🤔 Full multi-cloud (Phase 5)
- 🤔 Advanced SDN automation
Documentation Structure
content/docs/
├── index.mdx
├── proxmox.mdx # Proxmox/KVM/LXC
├── talos.mdx # Talos Linux
├── vcluster.mdx # Virtual K8s clusters
├── crossplane.mdx # Infrastructure as code
├── vps.mdx # VM management
├── templates.mdx # Template system
├── kubernetes.mdx # K8s management
├── storage.mdx # Ceph/storage
├── cli.mdx # CLI reference
├── api.mdx # API documentation
├── security.mdx # Security practices
└── twnstack.mdx # TWN-specific
Success Metrics
- Customer provisions VM in < 5 minutes
- 99% provisioning success rate
- 10 concurrent customers (MVP)
- < 1 hour Ceph migration time
- Zero-downtime Kubernetes updates
Related Documents
- proxmox-talos-opentofu
- Talos Documentation
- vCluster Documentation
- Crossplane Documentation
- Ceph Documentation
Contact
For questions or clarifications on this architecture, contact the Tasmanian Cloud engineering team.