
Building a production-grade Kubernetes lab

Author
Sam McGeown
Steely-eyed missile man

Kubernetes provides the essentials: container scheduling, lifecycle management, and a consistent declarative API for describing what you want to run. What it deliberately does not provide is everything else. Ingress, certificate management, secret storage, persistent volumes — these are left to you, by design, because Kubernetes is an extensible framework, not a complete platform. You compose the platform yourself from the ecosystem of controllers, operators, and tooling built on top of it.

I have been running homelabs in various forms for decades, so this is just my current iteration. The hardware runs across two clusters: three Dell 7040s and three Raspberry Pi 4s. The aim is that the patterns running on that hardware are the same ones you would use in a production environment, which is most of the point. Rather than walking through a list of tools, I want to explain the problems I was trying to solve and which tools I reached for, because the tools make considerably more sense in that context.

GitOps as the foundation

Before anything else, there is the question of how you manage a Kubernetes cluster over time. You can apply manifests with kubectl apply, which works fine until you need to rebuild the cluster, roll back a change, or understand what is actually running and why. GitOps is the answer I settled on: Git as the single source of truth, with a controller running inside the cluster that continuously reconciles the live state against what is committed.

At the moment, I use Flux CD for this. Flux watches a Git repository and applies whatever is there. If something drifts — a resource is manually edited, or a pod is deleted and comes back in a different state — Flux corrects it on the next reconciliation cycle. This means the repository is the cluster, which makes rebuilding, auditing, and understanding what you have deployed considerably more tractable. (ArgoCD is the other widely-used GitOps operator; it takes a more UI-centric approach and handles things slightly differently, but solves the same core problem.)

One thing Flux does well that is easy to overlook is dependency management. You cannot install an application that needs a PostgreSQL cluster custom resource before you have installed the PostgreSQL operator. Flux’s Kustomization resources have a dependsOn field that lets you express these ordering constraints explicitly — infrastructure controllers must finish before infrastructure configuration is applied, which must finish before applications are deployed. This sounds obvious, but getting it right is the difference between a cluster that bootstraps cleanly and one that requires manual intervention every time.
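As a sketch of how this looks in practice (the names here are illustrative, not lifted from my repository), a Flux Kustomization for the applications layer declares its dependencies explicitly:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  # Flux will not reconcile this Kustomization until the named
  # Kustomizations have reconciled successfully
  dependsOn:
    - name: infrastructure-controllers
    - name: infrastructure-configs
```

Because Flux blocks until each dependency reports Ready, a fresh cluster bootstraps in the right order without anyone babysitting it.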

The same repository also manages both clusters: the three-node Dell 7040 cluster and the three-node Raspberry Pi 4 cluster. Flux’s post-build variable substitution means the same manifests deploy to both, with cluster-specific values (domain names, network CIDRs, cluster names) injected at sync time rather than duplicated across files. This has saved me from a class of drift where the two clusters slowly diverge because a change was made in one place and forgotten in the other. It also makes it straightforward to spin up a new cluster and deploy the same set of infrastructure and applications without duplicating the configuration.
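Concretely, a Kustomization can pull values from a per-cluster ConfigMap and substitute them into every manifest it applies. A sketch, with hypothetical names:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  postBuild:
    substituteFrom:
      # cluster-vars is created once per cluster and holds values
      # such as domain, cluster_name, and network CIDRs; manifests
      # reference them as ${domain}, ${cluster_name}, and so on
      - kind: ConfigMap
        name: cluster-vars
```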

```mermaid
%%{init: {'flowchart': {'curve': 'step'}}}%%
flowchart LR
    git[("Git Repository")]
    flux["Flux\n(reconciler)"]

    subgraph cluster["Cluster"]
        direction TB
        ic["infrastructure-controllers"]
        cfg["infrastructure-configs"]
        apps["applications"]
        cfg -->|"dependsOn"| ic
        apps -->|"dependsOn"| cfg
    end

    git -->|"desired state"| flux
    flux -->|"applies & corrects"| cluster
    cluster -->|"actual state"| flux
```

Networking: more than just connecting pods together

Kubernetes networking is one of those areas where the defaults get you started but do not get you far. The built-in networking model handles pod-to-pod communication within the cluster, but it does not help with load balancing real traffic from outside, advertising services to your home network, or replacing the aging kube-proxy component that handles service routing.

I use Cilium as my Container Network Interface (CNI) plugin, which is doing considerably more work than the name suggests. Cilium uses eBPF (a Linux kernel technology that allows safe, sandboxed programmes to run in the kernel) to handle packet routing, which is both faster and more observable than the iptables rules that kube-proxy uses. With kubeProxyReplacement: true, Cilium takes over all service routing entirely, and kube-proxy is not deployed at all. (Calico and Flannel are the more traditional CNI alternatives; both are well-established and have smaller operational footprints, though neither offers the same depth of eBPF-based observability.)
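In Helm values terms, enabling the replacement looks roughly like this (a sketch; the API server address is a placeholder for your environment):

```yaml
# Cilium Helm values (sketch)
kubeProxyReplacement: true
# With kube-proxy gone, Cilium cannot rely on the in-cluster
# Service VIP to reach the API server, so it is set explicitly
k8sServiceHost: 192.168.5.10
k8sServicePort: 6443
```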

More practically for a homelab, Cilium implements the LoadBalancer service type without needing a cloud provider. It manages a pool of IP addresses from my home network range (192.168.5.128/25) and, critically, advertises those addresses to my UniFi gateway using BGP. This means that when a service requests a LoadBalancer IP, it gets one from the pool, and that IP is automatically advertised to my router so the rest of my network can reach it. No static routes, no manual configuration when things change. (MetalLB is the more commonly referenced solution for bare-metal LoadBalancer support and works well independently of whichever CNI you’re running.)
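The pool itself is a Cilium custom resource. The exact schema has shifted between Cilium releases (older versions use `cidrs` where newer ones use `blocks`), so treat this as a sketch:

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: homelab-pool
spec:
  blocks:
    # LoadBalancer services are allocated addresses from this range,
    # which Cilium then advertises to the router over BGP
    - cidr: 192.168.5.128/25
```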

For routing HTTP and HTTPS traffic, I have moved from the classic Ingress resource to the newer Gateway API. Gateway API is a Kubernetes-standard replacement for Ingress that is more expressive and better suited to the way modern load balancers work. Cilium acts as the GatewayClass implementation: I define a Gateway resource that describes a listening endpoint, and HTTPRoute resources that describe how traffic arriving at that gateway should be distributed to backend services. A few older applications in the repository still use Traefik with Ingress annotations, but those are being replaced as I get to them. (NGINX Gateway Fabric and Envoy Gateway are both solid Gateway API implementations if Cilium isn’t your CNI; Traefik and Contour also support Gateway API if you’re already using either of those.)
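The split between the two resource types looks like this — a sketch with hypothetical names. The Gateway describes the listening endpoint; the HTTPRoute attaches to it and describes where traffic goes:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: web-gateway
spec:
  gatewayClassName: cilium
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        certificateRefs:
          - name: wildcard-tls    # TLS secret, e.g. issued by cert-manager
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: gitea
spec:
  parentRefs:
    - name: web-gateway           # attach this route to the Gateway above
  hostnames:
    - gitea.example.com
  rules:
    - backendRefs:
        - name: gitea-http        # backend Service name (hypothetical)
          port: 3000
```

The useful property is that the Gateway is owned by whoever runs the platform, while HTTPRoutes can live alongside the applications that need them.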

```mermaid
%%{init: {'flowchart': {'curve': 'step'}}}%%
flowchart LR
    client["Client"]
    router["UniFi Router\n(BGP peer)"]

    subgraph cluster["Cluster"]
        cilium["Cilium\n(LoadBalancer IP pool)"]
        gw["Gateway"]
        hr["HTTPRoute"]
        pod["Pod"]
        cilium --> gw --> hr --> pod
    end

    cilium -.->|"advertises IPs via BGP"| router
    client -->|"request"| router
    router -->|"routes to advertised IP"| cilium
```

TLS and DNS: automate everything

Once you have external traffic reaching your cluster, you need HTTPS, and once you have HTTPS, you need certificates. Managing certificates manually is one of those tasks that seems manageable until it is not, usually at 2am when something has expired.

cert-manager handles the full certificate lifecycle. I have two ClusterIssuer resources configured against Let’s Encrypt — a staging issuer for testing and a production issuer for real certificates — both using the DNS-01 challenge type with Cloudflare. With DNS-01, cert-manager creates a TXT record in your DNS zone to prove you control the domain, which means you can get certificates for internal services that are not publicly reachable, as long as your DNS is hosted with a supported provider. Applications request certificates by annotating their Gateway or Ingress resources; cert-manager notices the annotation, issues the certificate, and handles renewal automatically.
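A production ClusterIssuer for this setup looks roughly as follows (the email address and secret names are placeholders; the Cloudflare API token itself lives in a Secret):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      # cert-manager stores the ACME account key here
      name: letsencrypt-production-account-key
    solvers:
      # DNS-01 proves domain control via a TXT record, so it works
      # for internal services that are not publicly reachable
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
```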

The same Cloudflare integration powers external-dns, which solves a related but separate problem: keeping DNS records in sync with what is actually deployed. When I create an HTTPRoute pointing at gitea.${domain}, external-dns watches for that route, creates the corresponding A record in Cloudflare, and removes it if the route is ever deleted. For a homelab where services come and go, this removes a whole category of forgotten DNS records pointing at things that no longer exist. It also manages a TXT record registry with a cluster-specific prefix, which means both clusters can manage their own DNS entries in the same Cloudflare zone without interfering with each other.
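In Helm values terms, the relevant knobs are roughly these (a sketch; values are illustrative and the exact keys vary between chart versions). The `txtOwnerId` and `txtPrefix` pair is what lets both clusters share one Cloudflare zone safely:

```yaml
# external-dns Helm values (sketch)
provider: cloudflare
sources:
  - gateway-httproute   # watch HTTPRoutes rather than Ingresses
domainFilters:
  - example.com
# Ownership registry: TXT records tagged per cluster, so each
# cluster only creates, updates, or deletes the records it owns
registry: txt
txtOwnerId: dell-cluster
txtPrefix: k8s-dell-
```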

Secret management: two different problems

Secrets in Kubernetes deserve more thought than they sometimes get, and I have ended up with two separate tools for secret management. This is not redundancy; they are solving different problems.

The first problem is how to store secrets safely in Git. A Kubernetes Secret resource is base64-encoded, not encrypted, which means committing it to a repository is roughly equivalent to writing the password in the commit message. Sealed Secrets solves this by providing a SealedSecret resource type: you encrypt a secret against a cluster-specific public key, commit the encrypted form to Git, and the Sealed Secrets controller running in the cluster decrypts it back into a regular Secret at apply time. The encrypted form is safe to commit because only that specific cluster’s private key can decrypt it. This is how Cloudflare API tokens, external-dns credentials, and other infrastructure secrets are stored in my repository. (SOPS with age encryption is the other common approach; it encrypts individual values within a file rather than wrapping the whole Secret resource, which some people find more readable in diffs.)
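The workflow is: create a normal Secret locally, encrypt it with the `kubeseal` CLI against the cluster’s public key, and commit only the result. What lands in Git looks roughly like this (names are hypothetical and the ciphertext is shortened):

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager
spec:
  encryptedData:
    # opaque ciphertext produced by kubeseal; only this cluster's
    # controller holds the private key that can decrypt it
    api-token: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...
  template:
    metadata:
      name: cloudflare-api-token
      namespace: cert-manager
```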

The second problem is how applications get secrets at runtime. Vault addresses this. I run a three-node HashiCorp Vault cluster in HA mode, using Raft for consensus and Longhorn for persistent storage. Applications that need secrets at runtime — database passwords, OAuth credentials, API keys — can get them from Vault rather than having them baked into Kubernetes Secrets at all. The Vault Agent sidecar injector handles the mechanics of this: annotate a pod, and Vault Agent is injected as a sidecar that authenticates to Vault and writes the requested secrets to a shared volume that the application can read. (If you’d rather not run your own Vault cluster, External Secrets Operator is worth looking at; it pulls secrets from external providers — AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, and others — and materialises them as Kubernetes Secrets.)
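The injection is driven entirely by pod annotations. A sketch of a Deployment’s pod template metadata, with a hypothetical role and secret path:

```yaml
# Pod template metadata on a Deployment (sketch)
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    # Vault role the sidecar authenticates as, via Kubernetes auth
    vault.hashicorp.com/role: "myapp"
    # Renders the secret at /vault/secrets/db-creds inside the pod
    vault.hashicorp.com/agent-inject-secret-db-creds: "secret/data/myapp/db"
```

The application never talks to Vault itself; it just reads a file from the shared volume.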

Vault’s TLS setup is worth mentioning because it is more involved than most services. The cluster uses an internal CA (provisioned by cert-manager) for pod-to-pod communication within the Vault cluster itself, and a separate Let’s Encrypt certificate for the public API endpoint. This gives you end-to-end encryption for inter-pod Raft replication without depending on Let’s Encrypt for internal communication.

Vault unseal: Vault’s Raft cluster requires unsealing after every restart. If you are running this in a homelab where nodes are rebooted regularly, you will need a plan for this — either auto-unseal using a cloud KMS, or a manual process. My current setup requires manual intervention after restarts, which is something I want to address.

Storage: persistent data across a distributed system

Kubernetes does not solve storage. It provides an abstraction (PersistentVolumeClaim) and a plugin interface (CSI), but what backs those abstractions is entirely up to you.

I use Longhorn as the primary storage backend. Longhorn is a distributed block storage system that runs as a set of controllers and daemons on your cluster nodes. When a PVC is created, Longhorn provisions a volume that is replicated across nodes, which means a node failure does not lose data. This makes it a reasonable choice for a homelab where you cannot guarantee hardware reliability in the way a cloud provider does. (Rook/Ceph is the other main option for distributed storage on bare metal; it is considerably more powerful but also considerably heavier to operate. For simpler setups with no redundancy requirement, local-path-provisioner gets the job done with minimal overhead.)
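Replication is controlled per StorageClass; a three-replica class looks roughly like this:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  # Each volume is replicated across three nodes, so losing a
  # single node does not lose the data
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
```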

For relational databases, I use the CloudNative-PG operator rather than deploying a PostgreSQL container directly. CloudNative-PG treats a PostgreSQL cluster as a first-class Kubernetes resource. You declare a Cluster with a number of instances and a storage size, and the operator handles provisioning, replication, failover, and connection secret generation. Gitea has a three-instance PostgreSQL cluster; my Spellingclash application has its own three-instance cluster. Having the operator manage failover automatically is meaningfully better than a single PostgreSQL container with a mounted PVC, and the automatic secret generation means application deployments can reference a secret that the operator creates rather than a secret that has to be kept in sync manually. (The Zalando postgres-operator and CrunchyData’s PGO are the other established options if CloudNative-PG doesn’t suit your needs.)
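The declarative surface is pleasingly small. A three-instance cluster is roughly this (the name and storage size are illustrative):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: gitea-db
spec:
  instances: 3            # one primary, two replicas; failover is automatic
  storage:
    size: 10Gi
    storageClass: longhorn
# The operator also generates connection Secrets (e.g. gitea-db-app)
# that application deployments can reference directly
```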

For caching and session storage, I run a six-node Valkey cluster (three primary, three replica). Valkey is a Redis-compatible key-value store, forked from Redis following its licence change in 2024. Running it as a cluster rather than a single instance is important for Gitea: with three Gitea replicas and no session affinity at the load balancer, sessions need to be stored somewhere that all three replicas can reach. The cluster mode also means the loss of a single Valkey node does not take down caching entirely. (Redis itself remains the obvious point of comparison; Dragonfly is worth knowing about if you want a more memory-efficient drop-in alternative.)

There is also an NFS CSI driver for network-attached storage, used in cases where multiple pods need to read and write the same volume simultaneously (ReadWriteMany), an access mode that Longhorn’s block volumes do not provide.

```mermaid
%%{init: {'flowchart': {'curve': 'step'}}}%%
flowchart TB
    longhorn["Longhorn\nDistributed Block Storage"]

    cnpg["CloudNative-PG\nPostgreSQL clusters"]
    valkey["Valkey\nCache & session store"]
    direct["Direct consumers\n(Vault, AdGuard, audio files)"]

    longhorn --> cnpg
    longhorn --> valkey
    longhorn --> direct
```

Observability: the work in progress

I will be honest: this is the weakest part of the setup, and the area I understand least well. I have the kube-prometheus-stack deployed — it bundles Prometheus, Alertmanager, node-exporter, and kube-state-metrics, and Prometheus is scraping metrics from the Kubernetes API server, scheduler, controller-manager, kubelet, cAdvisor, and etcd. The endpoint is exposed via Gateway API at monitoring.lab.definit.co.uk. The infrastructure is there. (VictoriaMetrics is worth knowing about as an alternative if Prometheus’s memory footprint becomes a problem at scale.)

What is not there, yet, is anything I would call a proper observability practice. Grafana is installed but disabled in the Helm values — I have been managing dashboards separately rather than keeping them in the repository, which means they are not version-controlled, not reproducible, and not consistent with the GitOps approach I use for everything else. That is the obvious first thing to fix.

More broadly, I am still learning what good Kubernetes observability actually looks like in practice: what to alert on, what to ignore, how to correlate metrics across the cluster, how to make the data useful rather than just present. This is an area I intend to spend more time on, and I will write about it properly once I have something worth writing about rather than just a Prometheus endpoint and good intentions.

How it fits together

The thing I find most useful about thinking in concepts rather than tools is that the dependencies become obvious. You cannot run applications until you have networking; you cannot automate TLS until you have cert-manager and a DNS provider integration; you cannot safely store application secrets in Git without Sealed Secrets; you cannot run HA databases without a storage layer you trust. The order in which things are deployed in the repository reflects this — Flux’s dependency ordering enforces it — but the conceptual order is what makes the decisions legible.

The other thing worth saying is that this is more infrastructure than a homelab strictly requires. A single-node cluster with k3s and a few manifests will run personal projects just fine. But if the goal is to learn the patterns you would use in a production environment, the overhead is worth it, and having the configuration in Git means you can tear it all down and rebuild it without losing anything important — which, in a homelab, you will eventually need to do.

Next: I want to look at how I’m using the Vault and Terraform MCP servers that are also running in this cluster — there’s an interesting pattern around using Kubernetes as the hosting layer for AI tooling.