As more services go live on my Kubernetes clusters and more people start relying on them, I get nervous. For the most part, I try to keep my applications and configurations stateless, relying on ConfigMaps, for example, to store application configuration. This means that with a handful of YAML files in my Git repository I can restore everything to working order. Sometimes, though, there’s no choice but to use a PersistentVolume to persist data that can’t be captured in a config file. This is where a backup of the cluster, and specifically of the PersistentVolumes, is really important.
Enter Velero - the artist formerly known as Heptio Ark.
Velero is an open source tool to safely backup and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes.
Velero uses plugins to integrate with various cloud providers, allowing you to back up to different targets. My aim is to back up my vSphere-based (CSI) Persistent Volumes to AWS S3.
Set up AWS
You can set up all the required components using the AWS console, but my preference is to use the AWS CLI.
Create a new Access Key
To use the AWS CLI you’ll need an Access Key. Log in to your AWS console, go to “My Security Credentials” and create an Access Key (if you haven’t already got one).
Keep the details safe (I store mine in my password manager).
Note: I’m using a named profile as I’ve got a few accounts. You can omit the profile if you’re only setting up one.
Let’s set up some variables first:
BUCKET=prod-cluster-backup # The name of your S3 bucket to create
REGION=us-west-1 # AWS Region in which to create the S3 bucket
PROFILE=my-profile # Only needed if you're creating a named profile for AWS CLI
Configure your AWS profile (omit --profile $PROFILE if you’re using the default profile):
aws configure --profile $PROFILE
> AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
> AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
> Default region name [None]: us-west-2
> Default output format [None]: ENTER
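Before going any further, it can be worth checking that the profile actually works. The following is a minimal sketch rather than a required step: check_aws_profile is a helper name I've made up for illustration, and it simply wraps two standard AWS CLI commands, aws configure list and aws sts get-caller-identity.

```shell
PROFILE=my-profile   # the named profile configured above

# Hypothetical helper: show what the CLI resolved for the profile,
# then ask AWS which identity the access key belongs to.
check_aws_profile() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found on PATH"
    return 1
  fi
  # Lists the access key (masked), secret key (masked) and region in use
  aws configure list --profile "$1" || return 1
  # Fails if the access key or secret is wrong
  aws sts get-caller-identity --profile "$1"
}

check_aws_profile "$PROFILE" || echo "profile $PROFILE is not usable yet"
```

If the second command returns an account ID and ARN, the key pair is valid. If you're using the default profile, the same two commands work without the --profile argument.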
At this point you could use velero backup create to start backing things up, but Velero won’t automatically back up your persistent volumes. You need to tell it which volumes to back up using an annotation. Without annotating the pods, the backup will complete and look successful, but it won’t include your data!
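For a pod that is already running, one way to add the annotation is with kubectl annotate. This is a sketch under assumptions: the namespace, pod name and volume name are placeholders I've invented, and annotate_pod_volumes is a hypothetical helper, not part of Velero or kubectl. The annotation key itself, backup.velero.io/backup-volumes, takes a comma-separated list of volume names.

```shell
NAMESPACE=my-namespace   # placeholder: namespace of the pod
POD=my-app-0             # placeholder: name of the running pod
VOLUMES=data             # placeholder: comma-separated volume names

# Hypothetical helper: annotate one pod with the volumes Velero should back up.
annotate_pod_volumes() {
  kubectl -n "$1" annotate "pod/$2" "backup.velero.io/backup-volumes=$3"
}

if command -v kubectl >/dev/null 2>&1; then
  annotate_pod_volumes "$NAMESPACE" "$POD" "$VOLUMES" \
    || echo "could not annotate pod/$POD"
else
  echo "kubectl not found on PATH"
fi
```

An annotation added this way lives on the pod object only, so it is lost when the pod is recreated by its controller. For anything managed by a Deployment or StatefulSet, putting the annotation in the pod template is the more durable option.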
Annotate deployments, stateful sets or pods
Let’s take my Vault deployment, for example. It consists of a stateful set of three pods, and each pod has a persistent volume called “data”. Prior to deployment I can add the backup.velero.io/backup-volumes: <volume name> annotation to the template metadata in my YAML configuration: