Using Volume Snapshot/Clone in Kubernetes

MySQL Performance Blog October 22, 2020


One of the most exciting storage-related features in Kubernetes is volume snapshot and clone. It lets you take a snapshot of a data volume and later clone it into a new volume, which opens up a variety of possibilities such as instant backups or testing upgrades. This feature also brings Kubernetes deployments closer to what cloud providers offer, where you can take a volume snapshot with one click.

A word of caution: for a database, you may still need to apply fsfreeze and FLUSH TABLES WITH READ LOCK or LOCK BINLOG FOR BACKUP before taking the snapshot. This is much easier in MySQL 8, because with atomic DDL, MySQL 8 should provide crash-safe, consistent snapshots without additional locking.

Let’s review how we can use this feature with Google Kubernetes Engine (GKE) and Percona Kubernetes Operator for XtraDB Cluster.

First, the snapshot feature is still in beta, so it is not available by default. You need GKE version 1.14 or later, and you need to enable the “Compute Engine persistent disk CSI Driver” on your cluster, as described here: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver#enabling_on_a_new_cluster.
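For an existing cluster, the driver can also be switched on from the command line. Here is a minimal sketch that wraps the gcloud call in a small function. The cluster name my-cluster is a placeholder of mine, and depending on your gcloud release the command may need to be run as gcloud beta.

```shell
# Enable the Compute Engine persistent disk CSI driver on an existing GKE cluster.
# $1: cluster name (placeholder, not taken from the original setup)
enable_pd_csi_driver() {
  gcloud container clusters update "$1" \
    --update-addons=GcePersistentDiskCsiDriver=ENABLED
}

# Usage (assumes gcloud is authenticated and a default zone or region is set):
#   enable_pd_csi_driver my-cluster
```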

Now we need to create a cluster that uses storageClassName: standard-rwo for its PersistentVolumeClaims. The relevant part of the resource definition looks like this:

persistentVolumeClaim:
  storageClassName: standard-rwo
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 11Gi
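Before creating the cluster, it may be worth checking that the standard-rwo class exists and is backed by the CSI driver. The helper names below are my own; the sketch only relies on kubectl get storageclass and its jsonpath output.

```shell
# Print the provisioner behind a storage class (defaults to standard-rwo).
storage_class_provisioner() {
  kubectl get storageclass "${1:-standard-rwo}" -o jsonpath='{.provisioner}'
}

# Succeeds only when the class is served by the GKE persistent disk CSI driver.
uses_pd_csi_driver() {
  [ "$(storage_class_provisioner "$1")" = "pd.csi.storage.gke.io" ]
}

# Usage:
#   uses_pd_csi_driver standard-rwo && echo "CSI-backed storage class found"
```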

Let’s assume we have cluster1 running:

NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          49m
cluster1-haproxy-1                                 2/2     Running   0          48m
cluster1-haproxy-2                                 2/2     Running   0          48m
cluster1-pxc-0                                     1/1     Running   0          50m
cluster1-pxc-1                                     1/1     Running   0          48m
cluster1-pxc-2                                     1/1     Running   0          47m
percona-xtradb-cluster-operator-79d786dcfb-btkw2   1/1     Running   0          5h34m

Suppose we want to clone this cluster into a new cluster provisioned with the same dataset. Of course, this can be done by restoring a backup into a new volume, but snapshot and clone make it much easier. Some additional steps are still required, so I will list them as a cheat sheet.

1. Create VolumeSnapshotClass (I am not sure why this one is not present by default)

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: onesc
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
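To apply this manifest, something along these lines should work. The file name volumesnapshotclass.yaml and the function names are placeholders of mine; the check simply asks the API server whether the snapshot API group is served at all before applying.

```shell
# Succeeds when the cluster serves the VolumeSnapshotClass resource.
snapshot_api_available() {
  kubectl api-resources --api-group=snapshot.storage.k8s.io -o name \
    | grep -q '^volumesnapshotclasses\.'
}

# Apply the class manifest; the default file name is a placeholder.
create_snapshot_class() {
  snapshot_api_available || { echo "snapshot API not available" >&2; return 1; }
  kubectl apply -f "${1:-volumesnapshotclass.yaml}"
}

# Usage:
#   create_snapshot_class volumesnapshotclass.yaml
```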

2. Create snapshot

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: snapshot-for-newcluster
spec:
  volumeSnapshotClassName: onesc
  source:
    persistentVolumeClaimName: datadir-cluster1-pxc-0
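The snapshot is not usable the moment the object is created, so it helps to wait until it reports readyToUse before cloning. This is a rough polling sketch; the function name, retry count, and delay are arbitrary choices of mine, and the variables are global because plain POSIX sh has no local.

```shell
# Poll until the snapshot reports readyToUse=true, or give up after N tries.
# $1: snapshot name, $2: max attempts (default 30), $3: seconds between tries (default 5)
wait_for_snapshot() {
  name="$1"; tries="${2:-30}"; delay="${3:-5}"
  while [ "$tries" -gt 0 ]; do
    ready="$(kubectl get volumesnapshot "$name" -o jsonpath='{.status.readyToUse}')"
    [ "$ready" = "true" ] && return 0
    tries=$((tries - 1))
    sleep "$delay"
  done
  echo "snapshot $name is not ready" >&2
  return 1
}

# Usage:
#   wait_for_snapshot snapshot-for-newcluster
```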

3. Clone into a new volume

Note that the new volume must follow the naming convention used by the Percona XtraDB Cluster Operator:

datadir-<CLUSTERNAME>-pxc-0

Where CLUSTERNAME is the name given to the cluster when it is created. So we can now clone the snapshot into a volume named:

datadir-newcluster-pxc-0

Where newcluster is the name of the new cluster.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-newcluster-pxc-0
spec:
  dataSource:
    name: snapshot-for-newcluster
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 11Gi
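If you clone often, the claim can be generated from the cluster name so the naming convention is never mistyped. This is only a sketch: the function name is mine, and the storage class and default size are copied from the example above rather than read from the source volume.

```shell
# Print a PVC manifest that clones a snapshot into the first PXC data volume.
# $1: new cluster name, $2: snapshot name, $3: size (default 11Gi)
clone_pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-$1-pxc-0
spec:
  dataSource:
    name: $2
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${3:-11Gi}
EOF
}

# Usage:
#   clone_pvc_manifest newcluster snapshot-for-newcluster | kubectl apply -f -
```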

Important: the storageClassName, accessModes, and storage size in the volume spec should match those of the original volume.
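One way to look up the values to match is to read them from the original claim. The function name is my own; the jsonpath fields are the standard PersistentVolumeClaim spec fields.

```shell
# Print storage class, access modes, and requested size of an existing claim.
pvc_spec_summary() {
  kubectl get pvc "$1" -o \
    jsonpath='{.spec.storageClassName}{"\n"}{.spec.accessModes}{"\n"}{.spec.resources.requests.storage}{"\n"}'
}

# Usage:
#   pvc_spec_summary datadir-cluster1-pxc-0
```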

After the volume claim is created, we can start newcluster. However, there is one more caveat. We need to set:

forceUnsafeBootstrap: true

Otherwise, Percona XtraDB Cluster will detect that the data from the snapshot did not come from a clean shutdown (which is true) and will refuse to start.
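The simplest route is to put the flag into the cluster manifest before creating newcluster. If the custom resource already exists, a merge patch is one way to set it. Treat the sketch below as an assumption-laden example: I am assuming the flag lives under spec.pxc, as in the operator's example cr.yaml, and that the pxc short name is registered for the custom resource; the function name is mine.

```shell
# Set pxc.forceUnsafeBootstrap on an existing PerconaXtraDBCluster resource.
# $1: cluster name, $2: true or false
# Assumption: the flag sits under spec.pxc in the custom resource.
set_force_unsafe_bootstrap() {
  case "$2" in
    true|false) ;;
    *) echo "second argument must be true or false" >&2; return 1 ;;
  esac
  kubectl patch pxc "$1" --type=merge \
    -p "{\"spec\":{\"pxc\":{\"forceUnsafeBootstrap\":$2}}}"
}

# Usage:
#   set_force_unsafe_bootstrap newcluster true
```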

This approach still has a limitation that you may find inconvenient: a volume can only be cloned within the same namespace, so it can’t easily be transferred from the PRODUCTION namespace into the QA namespace.

It can still be done, but it requires some extra steps and Kubernetes admin privileges. I will show how in a following blog post.
