Kubernetes Persistent Volumes with Deployment and StatefulSet

I get many questions about Kubernetes and persistence. Persistence is, of course, essential for stateful apps, and the usual advice is: use a StatefulSet for stateful apps and a Deployment for stateless ones. That doesn't mean you can't run stateful apps using Deployments with persistent volumes; the official MySQL Helm chart, for example, uses a Deployment. So it can be done, but it confuses users. What is the deal? When should you use a Deployment, and when a StatefulSet?

Persistent Volume Claim

To have persistence in Kubernetes, you create a Persistent Volume Claim, or PVC, which is later consumed by a pod. It is easy to get confused here because there is also a Persistent Volume, or PV. A PV holds information about the physical storage; a PVC is just a request for a PV. If you have a default storage class, or you specify which storage class to use when creating a PVC, PV creation is automatic. The other, less desirable way is to create a PV manually and bind the PVC to it, skipping the storage class altogether.
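To see which storage class a PVC will use by default, you can list the classes in your cluster; the default one is marked with `(default)`. The class name `rbd` below matches my setup and is just an example:

```shell
# List storage classes; the default is annotated "(default)" in the output.
kubectl get storageclass

# Mark a class as the cluster default (class name "rbd" is an example).
# This uses the standard is-default-class annotation.
kubectl patch storageclass rbd \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```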

You can define a PVC and set the desired size, access modes, storage class name, etc. Let's create a zookeeper-vol PVC:

⚡ cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: zookeeper-vol
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: rbd
EOF

⚡ kubectl get pvc zookeeper-vol
NAME            STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
zookeeper-vol   Bound     pvc-693857a8-3a8b-11e8-a34e-0238efc27e9c   8Gi        RWO            rbd            10s

In my example, I have a storage class named rbd which points to a Ceph cluster. When the new PVC is created, a new 8Gi volume is ready to use. The important thing here is the access mode:

  • ReadWriteOnce – Mount a volume as read-write by a single node
  • ReadOnlyMany – Mount the volume as read-only by many nodes
  • ReadWriteMany – Mount the volume as read-write by many nodes

The access mode defines how a pod consumes this volume. In most cases, you set ReadWriteOnce so that only one node can mount the volume read-write. Please note that multiple pods on that single node can still use the same volume. In some cases, for stateless apps, you want read-only volumes shared across nodes, and for that you need ReadOnlyMany.

ReadWriteMany is the rare case because only a few storage providers support it. Think of NFS as the typical ReadWriteMany backend.
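A ReadWriteMany claim looks the same as the one above apart from the access mode. This is a sketch only; the `nfs` storage class name is an assumption, and your cluster needs a provisioner that actually supports ReadWriteMany for the claim to bind:

```shell
cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: shared-vol
spec:
  accessModes:
    - ReadWriteMany      # multiple nodes may mount this volume read-write
  resources:
    requests:
      storage: 8Gi
  storageClassName: nfs  # assumed class backed by an RWX-capable provisioner
EOF
```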

Define a Deployment with PVC

It is possible to create a PVC with the ReadWriteOnce access mode and then create a Deployment which runs a stateful application and uses this PVC. It works perfectly fine, but only as long as you don't scale the Deployment. If you try to scale it, you will probably get an error that the volume is already in use when a pod starts on another node. Even if both pods end up on the same node, they will still write to the same volume, which you don't want either.

I created a Kubernetes-ready Zookeeper Docker image for this blog post. Let's use the zookeeper-vol PVC created above in a new Zookeeper Deployment which mounts this volume:

⚡ cat <<EOF | kubectl create -f -
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: zookeeper
spec:
  selector:
    matchLabels:
      app: zookeeper
  replicas: 1
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
      - env:
        - name: ZOOKEEPER_SERVERS
          value: "1"
        image: "komljen/zookeeper:3.4.10"
        imagePullPolicy: IfNotPresent
        name: zookeeper
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        readinessProbe:
          exec:
            command:
            - /opt/zookeeper/bin/zkOK.sh
          initialDelaySeconds: 10
          timeoutSeconds: 2
          periodSeconds: 5
        livenessProbe:
          exec:
            command:
            - /opt/zookeeper/bin/zkOK.sh
          initialDelaySeconds: 120
          timeoutSeconds: 2
          periodSeconds: 5
        volumeMounts:
        - mountPath: /data
          name: zookeeper-data
      restartPolicy: Always
      volumes:
      - name: zookeeper-data
        persistentVolumeClaim:
          claimName: zookeeper-vol
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
spec:
  ports:
  - name: client
    port: 2181
    targetPort: 2181
  selector:
    app: zookeeper
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-server
spec:
  clusterIP: None
  ports:
  - name: server
    port: 2888
    targetPort: 2888
  - name: leader-election
    port: 3888
    targetPort: 3888
  selector:
    app: zookeeper
EOF

If you try to scale this Deployment, the other replicas will try to mount and use the same volume. That is okay if your volume is read-only, but how do you work around it for read-write volumes?
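You can see the problem for yourself by scaling the Deployment and inspecting the new pod. The exact event text depends on your storage provider, but for a ReadWriteOnce volume like rbd, the second pod typically hangs in ContainerCreating with a Multi-Attach error:

```shell
# Scale to two replicas; both pods reference the same RWO claim.
kubectl scale deployment zookeeper --replicas=2

# Inspect the pods and their events to see why the new replica is stuck.
kubectl get pods -l app=zookeeper
kubectl describe pods -l app=zookeeper
```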

Define a Stateful Set with PVC

When you have an app which requires persistence, you should create a StatefulSet instead of a Deployment. There are many benefits: you don't have to create PVCs in advance, and you can scale it easily (how far you can scale, of course, depends on the app you are deploying). In a StatefulSet you can define volumeClaimTemplates so that a new PVC is created for each replica automatically. You also end up with a single file which defines both your app and its persistent volumes. Now let's deploy Zookeeper using a StatefulSet:

⚡ cat <<EOF | kubectl create -f -
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
  name: zookeeper
spec:
  selector:
    matchLabels:
      app: zookeeper
  replicas: 1
  serviceName: zookeeper-server
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
      - env:
        - name: ZOOKEEPER_SERVERS
          value: "1"
        image: "komljen/zookeeper:3.4.10"
        imagePullPolicy: IfNotPresent
        name: zookeeper
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        readinessProbe:
          exec:
            command:
            - /opt/zookeeper/bin/zkOK.sh
          initialDelaySeconds: 10
          timeoutSeconds: 2
          periodSeconds: 5
        livenessProbe:
          exec:
            command:
            - /opt/zookeeper/bin/zkOK.sh
          initialDelaySeconds: 120
          timeoutSeconds: 2
          periodSeconds: 5
        volumeMounts:
        - mountPath: /data
          name: zookeeper-vol
      restartPolicy: Always
  volumeClaimTemplates:
  - metadata:
      name: zookeeper-vol
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      storageClassName: rbd
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
spec:
  ports:
  - name: client
    port: 2181
    targetPort: 2181
  selector:
    app: zookeeper
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-server
spec:
  clusterIP: None
  ports:
  - name: server
    port: 2888
    targetPort: 2888
  - name: leader-election
    port: 3888
    targetPort: 3888
  selector:
    app: zookeeper
EOF

The major difference compared to the Deployment is this part:

spec:
  volumeClaimTemplates:
  - metadata:
      name: zookeeper-vol
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      storageClassName: rbd

After you create this StatefulSet, a new PVC is also created for the pod zookeeper-0:

⚡ kubectl get pvc | grep zookeeper-0
zookeeper-vol-zookeeper-0              Bound     pvc-68891ba1-3a94-11e8-a34e-0238efc27e9c   8Gi        RWO            rbd            2m

For each new replica, the StatefulSet will create a separate volume, which makes it much easier to manage pods and their PVCs together.
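Scaling makes the naming scheme visible: each claim is named `<template>-<statefulset>-<ordinal>`. Note that for a real Zookeeper ensemble you would also need to adjust ZOOKEEPER_SERVERS to match the replica count; the commands below only demonstrate the per-replica PVC creation:

```shell
# Scale the StatefulSet; pods zookeeper-0, zookeeper-1, zookeeper-2 are
# created in order, each with its own claim from the template.
kubectl scale statefulset zookeeper --replicas=3

# One PVC per replica: zookeeper-vol-zookeeper-0, -1, -2.
kubectl get pvc | grep zookeeper-vol
```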

Summary

Stateful sets are somewhat overlooked, and many users don't even consider them, yet they are much better at managing stateful apps and persistent volumes. If you want to learn more about stateful sets in general, check the blog post I wrote a few months ago - Stateful Applications on Kubernetes. Stay tuned for the next one.