Backing up Tanzu Kubernetes Grid (TKG) on vSphere Workloads to AWS with Velero

Written by Sam McGeown
Published on 7/10/2020 - Read in about 6 min (1209 words)

As more services go live on my Kubernetes clusters and more people start relying on them, I get nervous. For the most part, I try to keep my applications and configurations stateless - relying on ConfigMaps, for example, to store application configuration. This means that with a handful of YAML files in my Git repository I can restore everything to working order. Sometimes, though, there’s no choice but to use a PersistentVolume to provide data persistence that can’t be captured in a config file. This is where a backup of the cluster - and specifically of the PersistentVolumes - is really important.

Enter Velero - the artist formerly known as Heptio Ark.

Velero is an open source tool to safely backup and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes.

Velero uses plugins to integrate with various cloud providers, allowing you to back up to different targets - my aim is to back up my vSphere-based (CSI) PersistentVolumes to AWS S3.

Set up AWS

You can set up all the required components using the AWS console, but my preference is to use the AWS CLI.

Create a new Access Key

To use the AWS CLI you’ll need an Access Key. Log in to your AWS console, go to “My Security Credentials” and create an Access Key (if you’ve not already got one).

Keep the details safe (I store mine in my password manager).

Install AWS CLI

I’m using homebrew to install the AWS CLI, and other packages, because I’m on a Mac - check out the official install docs for other OSes.

brew install awscli
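
You can confirm the CLI is on your path by checking the version:

aws --version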

Configure a new profile

Note: I’m using a named profile as I’ve got a few accounts - you can omit this if you’re just setting up the one.

Let’s set up some variables first:

BUCKET=prod-cluster-backup # The name of your S3 bucket to create
REGION=us-west-1 # AWS Region in which to create the S3 bucket
PROFILE=my-profile # Only needed if you're creating a named profile for AWS CLI

Configure your AWS profile (omit --profile $PROFILE if you’re using the default profile):

aws configure --profile $PROFILE
> AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
> AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
> Default region name [None]: us-west-1
> Default output format [None]: ENTER
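
A quick way to validate the profile is to ask AWS who you’re authenticated as:

aws sts get-caller-identity --profile $PROFILE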

Create an S3 bucket for the backups:

aws s3api create-bucket \
    --bucket $BUCKET \
    --region $REGION \
    --create-bucket-configuration LocationConstraint=$REGION \
    --profile $PROFILE
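
To check the bucket is reachable, head-bucket exits silently on success and errors if the bucket is missing or you don’t have access:

aws s3api head-bucket --bucket $BUCKET --profile $PROFILE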

Create an IAM user

I’m creating a user with the same name as my bucket - since this user and bucket will be used to back up a single cluster, it makes sense for me to be able to identify and link the two by name.

aws iam create-user --user-name $BUCKET --profile $PROFILE
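
You can verify the user was created with:

aws iam get-user --user-name $BUCKET --profile $PROFILE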

Create a JSON file with a policy definition of the permissions Velero needs - note that the S3 permissions are scoped to the specific bucket using the $BUCKET variable:

cat > velero-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}"
            ]
        }
    ]
}
EOF

Attach the policy to the user to allow it to access the S3 bucket.

aws iam put-user-policy \
    --user-name $BUCKET \
    --policy-name $BUCKET \
    --policy-document file://velero-policy.json \
    --profile $PROFILE
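
If you want to confirm the policy attached correctly, you can read it back:

aws iam get-user-policy \
    --user-name $BUCKET \
    --policy-name $BUCKET \
    --profile $PROFILE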

We can then create an Access Key for the newly created account, which will be used by Velero to upload the data.

aws iam create-access-key --user-name $BUCKET --profile $PROFILE

The response should include an AccessKeyId and SecretAccessKey - make a note of them:

{
    "AccessKey": {
        "UserName": "prod-cluster-backup",
        "AccessKeyId": "AKIA4Z..snip..N7QGT5",
        "Status": "Active",
        "SecretAccessKey": "gqjNLeZ..snip..hnGJjNKU",
        "CreateDate": "2020-10-07T15:14:08+00:00"
    }
}

Next we create a credentials file using the Access Key created above - this will be imported into the Velero deployment as a secret.

cat > credentials-prod-cluster-backup <<EOF
[default]
aws_access_key_id=<AccessKeyId>
aws_secret_access_key=<SecretAccessKey>
EOF
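
Since this file contains long-lived credentials, it’s worth locking the permissions down while it sits on disk:

chmod 600 credentials-prod-cluster-backup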

Install and Configure Velero

Once again I’m using homebrew to install Velero - installation instructions for other platforms are available in the official docs.

brew install velero
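
Before touching the cluster, you can sanity-check the client install:

velero version --client-only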

Install Velero into your Kubernetes cluster using the CLI - your kubectl context should be pointed to the cluster you want to install on!

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.1.0 \
    --bucket $BUCKET \
    --backup-location-config region=$REGION \
    --secret-file ./credentials-prod-cluster-backup \
    --use-volume-snapshots=false \
    --use-restic
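
Once the install completes, check that the Velero deployment and restic daemonset pods are running, and that the backup storage location was created pointing at your bucket:

kubectl get pods -n velero
velero backup-location get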

At this point you could use velero backup create to start backing things up, but Velero won’t automatically back up your persistent volumes - you need to tell it which volumes to include using an annotation. Without annotating the pods, the backup will complete and look successful, but it won’t include your data!

Annotate deployments, StatefulSets or pods

Let’s take my Vault deployment, for example. It consists of a StatefulSet of three pods, each with a persistent volume called “data”. Prior to deployment, I can add the backup.velero.io/backup-volumes: <volume name> annotation to the template metadata in my YAML configuration:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/name: vault
  name: vault
  namespace: vault
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/instance: vault
      app.kubernetes.io/name: vault
  serviceName: vault-internal
  template:
    metadata:
      annotations:
        backup.velero.io/backup-volumes: data
      labels:
        app.kubernetes.io/instance: vault
        app.kubernetes.io/name: vault

Alternatively I could annotate the pods directly (note the -n vault namespace flag), but remember that these annotations might be overwritten at a later point by the StatefulSet:

kubectl -n vault annotate pod/vault-0 backup.velero.io/backup-volumes=data
kubectl -n vault annotate pod/vault-1 backup.velero.io/backup-volumes=data
kubectl -n vault annotate pod/vault-2 backup.velero.io/backup-volumes=data

Now I can generate a backup of my vault namespace using:

velero backup create vault-backup-test-1 --include-namespaces=vault
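
The backup runs asynchronously, so it’s worth keeping an eye on its phase while it completes:

velero backup get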

You can create a backup of your entire cluster using velero backup create whole-cluster-backup, or you can create scheduled backups using a cron-like schedule:

velero create schedule whole-cluster-backup-daily --schedule="0 7 * * *"

It’s also worth noting that you can exclude specific namespaces as well as include them, using --exclude-namespaces.
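
For example, to back up everything except the system and Velero namespaces (the backup name here is just illustrative):

velero backup create cluster-backup-no-system \
    --exclude-namespaces kube-system,velero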

Once I’ve created a backup, I can view it using:

velero backup describe vault-backup-test-1
Name:         vault-backup-test-1
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.19.1+vmware.2
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=19

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  vault
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2020-10-09 16:25:13 +0100 BST
Completed:  2020-10-09 16:25:32 +0100 BST

Expiration:  2020-11-08 15:25:13 +0000 GMT

Total items to be backed up:  48
Items backed up:              48

Velero-Native Snapshots: <none included>

Restic Backups (specify --details for more information):
  Completed:  3
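
Of course, a backup is only useful if you can restore from it - that’s a topic for another post, but as a quick sketch, restoring the backup above would look something like:

velero restore create --from-backup vault-backup-test-1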