Homelab GitOps Multi-Cluster Repository
Welcome to your homelab's GitOps source of truth! This repository uses ArgoCD and Bitnami Sealed Secrets to manage two k3s clusters utilizing Longhorn storage:
- Talyn (Dev)
- Moya (Prod)
Directory Structure
```text
gitops-homelab/
├── bootstrap/            # Cluster bootstrap (ArgoCD & Sealed Secrets)
│   └── apps/             # Bootstraps Dev (Talyn)
├── infrastructure/       # Infrastructure-level services
│   ├── bases/            # Traefik, Authentik, PostgreSQL, Redis, Qdrant
│   └── overlays/         # Environment-specific configuration
└── applications/         # Application-level workloads
    ├── bases/            # Home Assistant, n8n, Media Stack (Arrs)
    └── overlays/         # Environment-specific overlays (ingress, hostNetwork)
```
Cluster Node Topology (Talyn - Dev)
Your Dev cluster runs on the 192.168.1.0/24 network as follows. Enceladus is configured as an untainted control plane, so it actively executes workloads as a worker node alongside Callisto and Triton:
| Node Name | Node Type | IP Address | Exposed Roles |
|---|---|---|---|
| Enceladus | Control Plane / Worker | 192.168.1.33 | Master / API Server / Workload Execution |
| Callisto | Worker | 192.168.1.31 | Workload Execution / Storage |
| Triton | Worker | 192.168.1.32 | Workload Execution / Storage |
Enceladus Hardware Specifications
- CPU: Intel(R) Core(TM) i5-9500T @ 2.20GHz (6 Cores / 6 Threads, max 3.7GHz, Low-Power 35W TDP)
- System RAM: 15GiB (16GB)
- Graphics Accelerator (iGPU): Intel(R) CoffeeLake-S GT2 [UHD Graphics 630] (supports Intel Quick Sync Video (QSV) for low-power media transcoding)
- Exposed Role: Control Plane / Worker Node (hosting Dev API/scheduling controls and actively executing workloads)
Triton Hardware Specifications
- CPU: Intel(R) Core(TM) i5-9500T @ 2.20GHz (6 Cores / 6 Threads, max 3.7GHz, Low-Power 35W TDP)
- System RAM: 7.5GiB (8GB)
- Graphics Accelerator (iGPU): Intel(R) CoffeeLake-S GT2 [UHD Graphics 630] (supports Intel Quick Sync Video (QSV) for low-power media transcoding)
- Note: Triton uses the same 6-core processor and iGPU as Enceladus (with half the RAM), so it is also QSV-capable.
Callisto Hardware Specifications
- CPU: Intel(R) Core(TM) i5-9500T @ 2.20GHz (6 Cores / 6 Threads, max 3.7GHz, Low-Power 35W TDP)
- System RAM: 7.5GiB (8GB)
- Graphics Accelerator (iGPU): Intel(R) CoffeeLake-S GT2 [UHD Graphics 630] (supports Intel Quick Sync Video (QSV) for low-power media transcoding)
- Note: Callisto is identical to Triton (CPU, iGPU, and 8GB RAM) and shares the same CPU and iGPU as Enceladus, giving Talyn (Dev) a homogeneous CPU/GPU node pool with three QSV-capable nodes.
Cluster Node Topology (Moya - Prod)
Your Prod cluster runs on the 192.168.1.0/24 network as a 5-node setup. Titan is also untainted to maximize hardware utilization, so it performs worker duties in addition to hosting the control plane:
| Node Name | Node Type | IP Address | Exposed Roles |
|---|---|---|---|
| Titan | Control Plane / Worker | 192.168.1.38 | Master / API Server / Workload Execution |
| Ganymede | Worker (GPU Accelerated) | 192.168.1.39 | Workloads (RTX A2000 + Quadro K2200) / Storage |
| Europa | Worker | 192.168.1.40 | Workload Execution / Storage |
| Phobos | Worker | 192.168.1.37 | Workload Execution / Storage |
| Deimos | Worker | 192.168.1.36 | Workload Execution / Storage |
Titan Hardware Specifications
- CPU: Intel(R) Core(TM) i9-10900 @ 2.80GHz (10 Cores / 20 Threads, max 5.2GHz)
- System RAM: 31GiB (32GB)
- Graphics Accelerator (iGPU): Intel(R) CometLake-S GT2 [UHD Graphics 630] (supports Intel Quick Sync Video (QSV) for ultra-efficient video transcoding)
- Exposed Role: Control Plane / Worker Node (actively executing API, scheduling, database, and workload pods)
Ganymede Hardware Specifications
- CPU: Intel(R) Core(TM) i7-7700 @ 3.60GHz (4 Cores / 8 Threads, max 4.2GHz)
- System RAM: 62GiB (64GB)
- Graphics Accelerators (GPUs):
  - NVIDIA RTX A2000 (6GB VRAM, PCIe Gen3 x16, Persistence Mode: Off)
  - NVIDIA Quadro K2200 (4GB VRAM, PCIe Gen3 x16, Persistence Mode: Off)
- Driver & Tooling: NVIDIA driver 550.163.01 (as reported by `nvidia-smi`), CUDA 12.4
Deimos Hardware Specifications
- CPU: Intel(R) Core(TM) i7-7700T @ 2.90GHz (4 Cores / 8 Threads, max 3.8GHz, Low-Power 35W TDP)
- System RAM: 15GiB (16GB)
- Graphics Accelerator (iGPU): Intel(R) HD Graphics 630 (supports Intel Quick Sync Video (QSV) for low-power media transcoding)
Phobos Hardware Specifications
- CPU: Intel(R) Core(TM) i7-7700T @ 2.90GHz (4 Cores / 8 Threads, max 3.8GHz, Low-Power 35W TDP)
- System RAM: 15GiB (16GB)
- Graphics Accelerator (iGPU): Intel(R) HD Graphics 630 (supports Intel Quick Sync Video (QSV) for low-power media transcoding)
- Note: Phobos is a hardware twin of Deimos, making them a perfect pair for high-availability application replicas.
Europa Hardware Specifications
- CPU: Intel(R) Core(TM) i7-7700T @ 2.90GHz (4 Cores / 8 Threads, max 3.8GHz, Low-Power 35W TDP)
- System RAM: 15GiB (16GB)
- Graphics Accelerator (iGPU): Intel(R) HD Graphics 630 (supports Intel Quick Sync Video (QSV) for low-power media transcoding)
- Note: Europa is a hardware triplet with Phobos and Deimos, forming a highly symmetric, identical node pool on Moya.
GPU Workload Acceleration in k3s
With two NVIDIA GPUs, Ganymede is the natural host for Plex/Tdarr hardware transcoding or for running local LLMs (such as Ollama) for your AI workflows (n8n/Qdrant).
To expose these GPUs to your cluster pods:
- Install the NVIDIA Container Toolkit on Ganymede's host OS.
- In k3s, configure the container runtime to use `nvidia-container-runtime` by default.
- Deploy the NVIDIA Device Plugin for Kubernetes inside the cluster.

You can then request GPU capacity directly in your Deployments:
```yaml
resources:
  limits:
    nvidia.com/gpu: 1 # Requests one GPU; the device plugin assigns either the RTX A2000 or the K2200
```
Tip: Use a node selector (`kubernetes.io/hostname: ganymede`) to ensure your media containers schedule only on Ganymede, and add a taint to Ganymede if you also want to keep other workloads off it. Kubernetes registers node names in lowercase, so confirm the exact value with `kubectl get nodes`.
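To confirm that the device plugin is actually advertising Ganymede's GPUs, a throwaway Pod that runs `nvidia-smi` is a quick check. This is a minimal sketch: the Pod name `gpu-smoke-test` is hypothetical, the `nvidia/cuda:12.4.1-base-ubuntu22.04` image tag is an assumption chosen to match the CUDA 12.4 driver, and the lowercase node name `ganymede` assumes that is how Kubernetes registered the node.

```yaml
# Hypothetical one-shot Pod to verify GPU scheduling on Ganymede.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test                  # hypothetical name
spec:
  restartPolicy: Never                  # run once, then stop
  nodeSelector:
    kubernetes.io/hostname: ganymede    # assumes the node is registered as "ganymede"
  # runtimeClassName: nvidia            # only if nvidia is NOT the default runtime and this RuntimeClass exists
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed tag matching CUDA 12.4
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1             # one GPU: either the RTX A2000 or the K2200
```

After applying it against the `moya` context, `kubectl logs gpu-smoke-test --context=moya` should show the `nvidia-smi` table for whichever GPU was assigned; delete the Pod once you have checked it.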
Note
Taint Management: In lightweight distributions like k3s, control-plane nodes are untainted by default. If your control-plane nodes ever become tainted (preventing pods from scheduling), you can remove the taints manually; the trailing `-` tells kubectl to remove the taint, and node names are lowercase (check with `kubectl get nodes`):
```bash
kubectl taint nodes enceladus node-role.kubernetes.io/control-plane:NoSchedule- --context=talyn
kubectl taint nodes enceladus node-role.kubernetes.io/master:NoSchedule- --context=talyn
```
External Homelab Storage Inventory
Your external NAS assets serve crucial data layers, persistent backups, and snapshots:
| Server Type | OS / System | IP Address | Primary Role | Target Integration |
|---|---|---|---|---|
| TrueNAS | TrueNAS Scale/Core | 192.168.1.30 | Shared Media / Pictures / Docs | Mounted directly inside media pods via NFS |
| DSM | Synology DSM | 192.168.1.34 | Production Backup Target | Configured as Moya's Longhorn backup target (NFS) |
| OMV | OpenMediaVault | 192.168.1.35 | Dev Storage / Snapshots / Backups | Configured as Talyn's Longhorn backup target (NFS) |
TrueNAS Server Hardware Specifications (Dell T330)
- Hardware Chassis: Dell PowerEdge T330 Tower Server
- CPU: Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz (4 Cores / 8 Threads, max 3.9GHz, Enterprise Server Grade)
- System RAM: 62GiB (64GB ECC, well suited to high-speed ZFS caching via ARC)
- Graphics Accelerator (GPU): NVIDIA Corporation GP106GL [Quadro P2000] (5GB VRAM, Pascal architecture; as a Quadro card it is not subject to the concurrent NVENC session cap NVIDIA applies to consumer GeForce cards)
- Storage Devices (~98TB raw):
  - Data Drives (7 x ~14TB mechanical drives): `sda` through `sdg` (each partition is 12.7TiB), grouped into a RAID-Z2 vdev that survives 2 simultaneous drive failures and holds your Media, Pictures, and Documents.
  - System Boot Drive (240GB SSD): `sdh` (partitions `sdh1`, `sdh2` bootloader, `sdh3` TrueNAS OS core).
  - Application / Cache SSD (500GB): `sdi` (partition `sdi1` @ 463.8GiB). It currently backs AppPool (below), the high-speed pool for Plex metadata/index databases; it could alternatively serve as a dedicated ZFS read cache (L2ARC) or write log (SLOG) for the mechanical pool.
- ZFS Storage Pools & Dataset Catalog:
  - AppPool (SSD storage, ~450GB): Located on SSD `sdi`, currently using 4.73 GiB for high-speed application runtime caches (e.g., Plex SQLite database, indexing metadata).
  - Pool1 (redundant RAID-Z2, ~50TiB usable): Groups the 7 x 14TB mechanical drives (`sda`-`sdg`), currently using 6.62 TiB with 44.1 TiB available. It hosts:
    - `.truenas_containers` (internal TrueNAS containers)
    - `Cloud` (shared via SMB and NFS)
    - `Media` (mount path: `/mnt/Pool1/Media`), structured into dedicated datasets: `Books`, `Comics`, `CookingTV`, `HomeTV`, `Movies` (~3.4 TiB used), `Software` (~63 GiB used), `TV` (~3.15 TiB used)
- Active In-Cluster Mounting: Your Kubernetes workloads bind to this layout via `/mnt/Pool1/Media` using `truenas-media-pvc`. Because downloads (e.g., `/mnt/Pool1/Media/downloads`) and your final libraries (e.g., `/mnt/Pool1/Media/Movies` and `/mnt/Pool1/Media/TV`) sit under a single `/Media` NFS export, your Arrs can perform instant atomic moves and hardlinks between them. Note that every ZFS dataset is a separate filesystem: this only works when the download directory and the library directory belong to the same dataset. If `Movies` and `TV` are child datasets of `Media`, moves from `downloads` into them become copy-and-delete operations and hardlinks fail.
- Current Primary Workload: Hosting shared datasets (Media, Pictures, Documents) and running Plex Media Server natively as a container app using the Quadro P2000 for hardware-accelerated transcoding.
DNS & Ingress Routing (Cloudflare + Homelab LAN)
Since your domains are hosted on Cloudflare, and these clusters run in your local homelab, you have two elegant options to route traffic securely to Traefik:
Option A: Local DNS Rewrite (Recommended for Homelab Privacy)
To keep your dev/prod web traffic fully internal and secure:
- Keep your Cloudflare public DNS settings clean (no public records pointing to your private IPs).
- On your local LAN DNS server (e.g., your UniFi gateway or Pi-hole), set up a wildcard DNS rewrite (or local A records) for `*.davidcrilly.com` pointing to the worker node IPs `192.168.1.31` and `192.168.1.32`.
- Traefik's LoadBalancer service (exposed through k3s's built-in Klipper ServiceLB) listens on all node IPs and routes the traffic to the correct pods based on the HTTP `Host` header.
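Host-header routing through Traefik can be expressed with a standard Kubernetes Ingress. The sketch below assumes k3s's bundled Traefik (IngressClass `traefik`) and reuses the Overseerr Service from Option B (`overseerr` in `applications` on port 5055); the host `overseerr.davidcrilly.com` and the Ingress name are hypothetical.

```yaml
# Hypothetical Ingress: Traefik matches the Host header and forwards to the Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: overseerr                        # hypothetical name
  namespace: applications
spec:
  ingressClassName: traefik              # IngressClass shipped with k3s's Traefik
  rules:
    - host: overseerr.davidcrilly.com    # hypothetical; resolved by the LAN wildcard record
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: overseerr          # Service named in Option B's example URL
                port:
                  number: 5055
```

With the wildcard record in place, exposing another app internally only needs another Ingress rule; no further DNS changes are required.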
Option B: Cloudflare Tunnel (Cloudflared)
To securely expose select services (like Overseerr/Seerr) to the internet without opening ports on your router:
- Deploy a lightweight `cloudflared` Deployment to your cluster.
- The tunnel creates a secure, outbound-only connection to Cloudflare's edge.
- In the Cloudflare Dashboard, map subdomains directly to internal service names (e.g., `http://overseerr.applications.svc.cluster.local:5055`).
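A minimal sketch of such a `cloudflared` Deployment, assuming a remotely managed tunnel (created in the Cloudflare dashboard) whose token is stored in a hypothetical Secret named `cloudflared-token` under a hypothetical key `token`, sealed with kubeseal like the other credentials. The `cloudflare/cloudflared` image reads the token from the `TUNNEL_TOKEN` environment variable; the namespace and replica count are illustrative.

```yaml
# Hypothetical cloudflared Deployment for a remotely managed tunnel.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudflared
  namespace: applications              # illustrative; any namespace works
spec:
  replicas: 2                          # two connectors for redundancy
  selector:
    matchLabels:
      app: cloudflared
  template:
    metadata:
      labels:
        app: cloudflared
    spec:
      containers:
        - name: cloudflared
          image: cloudflare/cloudflared:latest   # pin a specific version in practice
          args: ["tunnel", "--no-autoupdate", "run"]
          env:
            - name: TUNNEL_TOKEN                 # token for the remotely managed tunnel
              valueFrom:
                secretKeyRef:
                  name: cloudflared-token        # hypothetical Secret
                  key: token                     # hypothetical key
```

Because the connectors only dial out, no router port forwarding is needed; the public hostname-to-service mapping lives entirely in the Cloudflare dashboard.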
Prerequisites
- Make sure your local machine has `kubectl` configured with access to both clusters.
- Verify that the context names match your configuration:
```bash
kubectl config get-contexts
# You should see contexts named 'talyn' and 'moya'
```
Step-by-Step Bootstrap Guide
Choose the cluster you want to bootstrap first (e.g., Talyn for Dev):
Phase 1: Install ArgoCD & Sealed Secrets Controller
1. Switch to the desired context:
   ```bash
   kubectl config use-context talyn
   ```
2. Deploy the bootstrap layer (ArgoCD and Sealed Secrets):
   ```bash
   kubectl apply --server-side --force-conflicts -k bootstrap/talyn
   ```
   This creates the ArgoCD resources in the `argocd` namespace and the Sealed Secrets controller in `kube-system`, and spins up both controllers.
3. Verify the pods are running:
   ```bash
   kubectl get pods -n argocd
   kubectl get pods -n kube-system -l app.kubernetes.io/name=sealed-secrets
   ```
Phase 2: Create Sealed Secrets for Databases & Credentials
Since credentials must not be stored in raw Git manifests, we use Bitnami Sealed Secrets.
1. Download the Public Encryption Key from your cluster:
Each cluster has its own distinct key pair. Fetch the active public key:
```bash
kubeseal --controller-name=sealed-secrets --controller-namespace=kube-system --fetch-cert > talyn-sealed-secrets-pub.pem
```
2. Generate a Secret and Seal it:
Let's create a database secret for PostgreSQL as an example.
Create a local raw secret file (do NOT commit this to Git!):
```bash
# Create database credentials
kubectl create secret generic postgres-secret \
  --namespace database \
  --from-literal=postgres-password="SuperSecurePassword123!" \
  --from-literal=password="UserSecurePassword123!" \
  --dry-run=client -o yaml > temp-postgres-secret.yaml
```
Encrypt (Seal) the secret using kubeseal and the cluster's public key:
```bash
kubeseal --format=yaml --cert=talyn-sealed-secrets-pub.pem < temp-postgres-secret.yaml > infrastructure/overlays/talyn/postgres-sealedsecret.yaml
```
Now delete `temp-postgres-secret.yaml`. The resulting `postgres-sealedsecret.yaml` is fully encrypted and safe to commit to your Git repository! Keep `talyn-sealed-secrets-pub.pem` until you have finished sealing: it is only a public certificate, and step 4 reuses it.
3. Update Kustomization:
Include the SealedSecret in your overlay kustomization.yaml:
```yaml
resources:
  - ../../bases/postgresql
  - postgres-sealedsecret.yaml
```
4. Seal the Cloudflare API Token for cert-manager:
To enable cert-manager to automatically solve Let's Encrypt DNS-01 challenges, we must create a SealedSecret containing your Cloudflare API Token.
- Generate a Cloudflare API Token with `Zone:DNS:Edit` and `Zone:Zone:Read` permissions.
- Create a local raw secret in the `cert-manager` namespace (do NOT commit this to Git!):
  ```bash
  kubectl create secret generic cloudflare-api-token-secret \
    --namespace cert-manager \
    --from-literal=api-token="YOUR_CLOUDFLARE_API_TOKEN_HERE" \
    --dry-run=client -o yaml > temp-cloudflare-secret.yaml
  ```
- Seal the secret for the target cluster (for Prod, use the Moya public key and write the output to the `moya` overlay directory):
  ```bash
  kubeseal --format=yaml --cert=talyn-sealed-secrets-pub.pem < temp-cloudflare-secret.yaml > infrastructure/overlays/talyn/cloudflare-sealedsecret.yaml
  ```
- Include the SealedSecret in your overlay `kustomization.yaml`:
  ```yaml
  resources:
    # ... existing resources
    - cloudflare-sealedsecret.yaml
  ```
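Once the token is sealed, cert-manager still needs an issuer that references it. The sketch below follows cert-manager's Cloudflare DNS-01 pattern; the issuer name `letsencrypt-prod`, the account-key Secret name, and the email address are hypothetical placeholders. A `ClusterIssuer` resolves `apiTokenSecretRef` in cert-manager's cluster resource namespace (`cert-manager` by default), which is why the secret above is created there.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod                     # hypothetical name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com                   # placeholder: your ACME account email
    privateKeySecretRef:
      name: letsencrypt-prod-account-key     # hypothetical; cert-manager creates this Secret
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token-secret   # the Secret sealed above
              key: api-token
```

Ingresses can then request certificates through the `cert-manager.io/cluster-issuer: letsencrypt-prod` annotation.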
Homelab Storage Configurations & GitOps
1. Mounting TrueNAS NFS Media Share (192.168.1.30)
Inside applications/bases/media-stack/truenas-nfs.yaml, we have defined a PersistentVolume (PV) and PersistentVolumeClaim (PVC) specifically configured for mounting your TrueNAS NFS dataset:
- PV: `truenas-media-pv`
- PVC: `truenas-media-pvc`
To bind your media pods (Sonarr, Radarr, Overseerr, SABnzbd, qBittorrent) to TrueNAS directly, update your container deployments (e.g., sonarr.yaml, radarr.yaml) to mount the TrueNAS NFS PVC:
```yaml
volumes:
  - name: media
    persistentVolumeClaim:
      claimName: truenas-media-pvc
```
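Tip: Standardizing on this single NFS mount for both download locations and library directories allows atomic moves and hardlinks directly on your TrueNAS pool, provided the download and library directories sit in the same ZFS dataset (each dataset is a separate filesystem).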
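The contents of `truenas-nfs.yaml` are not reproduced here, but a static NFS PV/PVC pair of this kind typically looks like the sketch below. The server and path come from the TrueNAS section; the `40Ti` capacity, the `applications` namespace, and the `ReadWriteMany` access mode are assumptions.

```yaml
# Sketch of a statically provisioned NFS volume for the TrueNAS Media export.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: truenas-media-pv
spec:
  capacity:
    storage: 40Ti                         # assumption; NFS does not enforce it
  accessModes:
    - ReadWriteMany                       # many media pods mount the same share
  persistentVolumeReclaimPolicy: Retain   # never delete data on TrueNAS
  storageClassName: ""                    # opt out of dynamic provisioning
  nfs:
    server: 192.168.1.30
    path: /mnt/Pool1/Media
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: truenas-media-pvc
  namespace: applications                 # assumption: namespace of the media pods
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                    # must match the PV for static binding
  volumeName: truenas-media-pv            # bind explicitly to the PV above
  resources:
    requests:
      storage: 40Ti
```

Every node that may run a media pod also needs an NFS client installed (e.g., `nfs-common` on Debian/Ubuntu) for the kubelet to mount the share.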
2. Automated Longhorn Backup Targets (OMV & DSM)
We have fully automated the configuration of Longhorn backup endpoints. When ArgoCD syncs the infrastructure layer, it applies a Longhorn `Setting` custom resource that sends all volume backups (built from Longhorn snapshots) to your target network shares:
- Talyn (Dev): Sends backups to OMV at `nfs://192.168.1.35:/export/dev-backups`
- Moya (Prod): Sends backups to DSM at `nfs://192.168.1.34:/volume1/prod-backups`
To check these in the Longhorn GUI, navigate to Settings > General > Backup Target and you will see the paths already pre-filled.
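For reference, a backup-target Setting of the kind described above can be sketched as follows (Talyn's value shown). This assumes a Longhorn release that stores the backup target in the `backup-target` Setting; newer Longhorn releases manage it through a dedicated `BackupTarget` resource instead, so check which form your version expects.

```yaml
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: backup-target            # Longhorn's built-in setting name
  namespace: longhorn-system
# Setting objects keep their value at the top level (no spec block).
value: "nfs://192.168.1.35:/export/dev-backups"   # Moya: nfs://192.168.1.34:/volume1/prod-backups
```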
High-Availability PostgreSQL with CloudNativePG (CNPG)
Rather than utilizing single-replica databases, this repository deploys CloudNativePG (CNPG), an advanced PostgreSQL operator that implements highly-available, self-healing database clusters:
- HA Design: Automatically provisions 3 instances (1 primary, 2 hot-standby replicas) utilizing Longhorn storage.
- Self-Healing Failovers: If the primary node crashes (e.g., a worker node goes offline), the CNPG operator automatically promotes a healthy standby replica to primary within seconds.
- Connection Routing: Applications connect through the read-write service, which always points at the current primary, so no connection settings need to change during a failover:
  - Hostname: `postgresql-rw.database.svc.cluster.local` (write operations)
  - Port: `5432`
  - Used by: Authentik and n8n.
To monitor your HA cluster state, install the kubectl cnpg plugin and run:
```bash
kubectl cnpg status postgresql -n database --context=moya
```
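A minimal CloudNativePG `Cluster` matching the design above might look like this sketch. Naming the cluster `postgresql` in the `database` namespace is what yields the `postgresql-rw` service (CNPG also creates `postgresql-ro` and `postgresql-r`); the `longhorn` StorageClass name and the `10Gi` size are assumptions, and `enablePodMonitor` may be deprecated in recent CNPG releases, so check your operator version.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql          # services: postgresql-rw / -ro / -r
  namespace: database
spec:
  instances: 3              # 1 primary + 2 hot-standby replicas
  storage:
    storageClass: longhorn  # assumption: Longhorn's StorageClass name
    size: 10Gi              # assumption: size to your data
  monitoring:
    enablePodMonitor: true  # lets the Prometheus Operator discover the instances
```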
Cluster Monitoring & Observability (kube-prometheus-stack)
To keep an eye on your cluster operations, this repository installs kube-prometheus-stack in the monitoring namespace. It bundles the Prometheus Operator, Prometheus Server, node-exporter, kube-state-metrics, and Grafana:
- Grafana Dashboards: Automatically exposes a web UI secured with cert-manager HTTPS:
  - Talyn (Dev): `grafana.davidcrilly.com`
  - Moya (Prod): `grafana.thecrillys.com`
  - Default login credentials: `admin`/`admin` (change immediately on first login!).
- Longhorn Observability: Automatically scrapes metrics from your Longhorn storage controllers. Import Grafana Dashboard ID `16524` or `13070` to view real-time storage bandwidth, read/write I/O operations, replica health, and volume capacity.
- PostgreSQL HA Observability: Automatically discovers your CloudNativePG database instances and their metrics. Import Grafana Dashboard ID `14114` to monitor write throughput, transaction latency, and replication lag between your primary and its replicas.
- Resource Constraints (Homelab Friendly): Configured with storage limits (7d retention, 15Gi capacity) and a strict 2Gi RAM ceiling to prevent Prometheus from draining your worker node resources.
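The homelab-friendly limits above map onto kube-prometheus-stack Helm values roughly as in this sketch; it is not the repository's actual values file, and the `longhorn` StorageClass is an assumption. Setting the two `...NilUsesHelmValues` flags to `false` is one common way to let Prometheus pick up ServiceMonitors and PodMonitors created outside the chart, such as those for Longhorn or CNPG.

```yaml
prometheus:
  prometheusSpec:
    retention: 7d                    # keep one week of metrics
    resources:
      limits:
        memory: 2Gi                  # hard RAM ceiling for Prometheus
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn # assumption
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 15Gi
    # Select monitors from every release, not only this chart's own
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
```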
Private Cloud & Local AI Suite (Nextcloud, Paperless, Ollama, Open WebUI)
This repository deploys a high-end, self-hosted private cloud and local AI processing suite in the applications namespace:
1. Nextcloud Private Cloud
- High-Performance DB & Cache: Connected to the CloudNativePG HA PostgreSQL primary (`postgresql-rw`) and the Redis base for session/transient caching.
- NFS Mount Target: Nextcloud's core data directory binds directly to your TrueNAS `Cloud` dataset (`/mnt/Pool1/Cloud` at `192.168.1.30`) via `truenas-cloud-pvc`. This keeps your files safely stored on your redundant ZFS array rather than filling up local node disks.
2. Paperless-ngx & Paperless-ai Document Management
- Paperless-ngx: Indexes and OCRs your scanned documents using the integrated Tesseract engine. All system configs, databases, and media paths utilize high-availability Longhorn storage.
- Paperless-ai Integration: A background analyzer that runs side-by-side with Paperless-ngx. It automatically watches your uploaded documents, connects to your local LLM engine (Ollama), and applies automated tags, document categories, and metadata using local artificial intelligence!
3. Ollama & Open WebUI Chat (GPU Accelerated)
- Ollama LLM Engine: Configured to download and run local LLMs (like Llama3 or Mistral).
  - Dev Overlay (Talyn): Runs on CPU.
  - Prod Overlay (Moya): Patched to schedule exclusively on your GPU-accelerated worker node Ganymede (`nodeSelector`) and to request one GPU through the device plugin (`nvidia.com/gpu: 1`), which receives either the RTX A2000 or the Quadro K2200. Running inference on the GPU via CUDA is far faster than on CPU; the card's VRAM (6GB or 4GB) limits which model sizes fit entirely on the GPU.
- Open WebUI Dashboard: Exposes a private, ChatGPT-like chat dashboard.
  - Exposed on Dev: `https://openwebui.davidcrilly.com`
  - Exposed on Prod: `https://openwebui.thecrillys.com`
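The Prod-only GPU scheduling can be expressed as a Kustomize strategic-merge patch in the Moya overlay. This is a sketch: the file path, the Deployment and container name `ollama`, and the lowercase node name `ganymede` are assumptions to match against your actual base.

```yaml
# applications/overlays/moya/ollama-gpu-patch.yaml (hypothetical path)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama                 # must match the Deployment in the base
  namespace: applications
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: ganymede   # pin to the GPU node
      containers:
        - name: ollama                     # merged with the base container by name
          resources:
            limits:
              nvidia.com/gpu: 1            # one GPU from the device plugin
```

The overlay's `kustomization.yaml` would reference it under `patches:` with an entry such as `- path: ollama-gpu-patch.yaml`, leaving the Talyn overlay on CPU.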
Phase 3: Apply the Root Application
This repository uses ArgoCD's "App of Apps" pattern: applying the `root-app.yaml` config causes ArgoCD to recursively scan the repository and manage your apps automatically.
1. Crucial step: Open `bootstrap/talyn/root-app.yaml` (and/or `bootstrap/moya/root-app.yaml`) and update the `repoURL` to point to your actual Git repository URL.
2. Apply the Root Application:
   ```bash
   kubectl apply -f bootstrap/talyn/root-app.yaml --context=talyn
   ```
3. Access the ArgoCD UI to watch it sync:
   - Retrieve the auto-generated admin password (PowerShell; on Linux/macOS, pipe the `kubectl` output to `base64 -d` instead):
     ```powershell
     kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" |
       ForEach-Object { [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($_)) }
     ```
   - Port-forward to access the dashboard:
     ```bash
     kubectl port-forward svc/argocd-server -n argocd 8080:443
     ```
   - Open `https://localhost:8080` in your browser, log in as `admin`, and watch the cluster bootstrap!
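For orientation, a root Application of the kind `root-app.yaml` describes could look like the sketch below. The `repoURL` is a placeholder, and the Application name, `path`, and `directory.recurse` settings are assumptions about the repository layout; compare them with the real file rather than applying this as-is.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root                                # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/<your-user>/gitops-homelab.git   # placeholder
    targetRevision: HEAD
    path: bootstrap/apps                    # hypothetical: folder holding child Applications
    directory:
      recurse: true                         # include manifests in subdirectories
  destination:
    server: https://kubernetes.default.svc  # the cluster ArgoCD runs in
    namespace: argocd
  syncPolicy:
    automated:
      prune: true                           # delete resources removed from Git
      selfHeal: true                        # revert manual drift
```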