kubernetes, statefulset

Stateful Applications on Kubernetes

Stateless applications work on Kubernetes out of the box. You can create a replicated application quickly because the application doesn't store any data, so simple load balancing is enough to scale it. However, if your application stores data somewhere, if you care about the order in which its instances start, or if you need a stable hostname and network ID, then you are dealing with a stateful application, and in the Kubernetes world StatefulSet is the workload object to use. It differs from a Deployment in a few ways, and this post will help you get started.

Stateful Applications

I will not show yet another StatefulSet workload definition. There are many examples out there, so find one and start it. If you then list the Pods of the stateful application, the first thing you will notice is that their names don't contain autogenerated IDs. Instead, you will get something like this:

⚡ kubectl get po
NAME          READY     STATUS    RESTARTS   AGE
zookeeper-0   1/1       Running   0          31m
zookeeper-1   1/1       Running   0          31m
zookeeper-2   1/1       Running   0          31m

As you can see, each Pod gets a unique and stable name of the form <StatefulSet-Name>-<Ordinal-Index>. Also, because the controller starts Pods one at a time, in ordinal order, zookeeper-1 will not be created before zookeeper-0 is running and ready (define liveness and readiness probes so that Kubernetes knows when a Pod is actually ready).
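To make the naming and ordering rules concrete, here is a small Python sketch. It is a simplified model under the ordered startup described above, not the controller's actual code; `pod_names` and `next_ordinal_to_create` are hypothetical helpers invented for illustration.

```python
from typing import List, Optional


def pod_names(statefulset_name: str, replicas: int) -> List[str]:
    """Return the stable Pod names <StatefulSet-Name>-<Ordinal-Index>."""
    return [f"{statefulset_name}-{ordinal}" for ordinal in range(replicas)]


def next_ordinal_to_create(ready: List[bool], replicas: int) -> Optional[int]:
    """Simplified model of ordered startup (an assumption, not real controller code).

    ready[i] is True when the already created Pod with ordinal i is running
    and ready. Returns the ordinal that may be created next, or None when the
    controller has to wait or every replica already exists.
    """
    if not all(ready):
        return None  # a predecessor is not ready yet, so wait
    if len(ready) >= replicas:
        return None  # every replica has been created
    return len(ready)


print(pod_names("zookeeper", 3))                 # ['zookeeper-0', 'zookeeper-1', 'zookeeper-2']
print(next_ordinal_to_create([True], 3))         # 1
print(next_ordinal_to_create([True, False], 3))  # None
```

The point of the model is that a Pod's name depends only on the StatefulSet name and its ordinal, so it can be computed without asking the cluster.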

Headless Service

To give Pods a stable network ID, you need to define a headless service for the stateful application. The definition of a headless service is similar to that of a standard service, but it has no cluster IP. Simply setting clusterIP: None in the YAML definition creates a headless service. Here is one example:

apiVersion: v1
kind: Service
metadata:
  name: zookeeper-server
  labels:
    app: zookeeper
spec:
  clusterIP: None # <--
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  selector:
    app: zookeeper

In the example above we defined a headless service just for the ZooKeeper servers; clients could use a standard service. Note that you can always define multiple services for one Kubernetes workload object. So, how is a headless service different?

The main benefit of a headless service is that you can reach each Pod directly. A standard service acts as a load balancer or proxy, and you access your workload object just by the service name zookeeper-server. With a headless service, the Pod zookeeper-0 can use zookeeper-1.zookeeper-server to talk to zookeeper-1 directly. The form is <StatefulSet-Name>-<Ordinal-Index>.<ServiceName>. For these names to resolve, the serviceName field of the StatefulSet has to reference the headless service.
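The name form above is easy to build programmatically. The following Python sketch shows one way to do it. `pod_dns_name` is a hypothetical helper, and the namespace `default` and the cluster domain `cluster.local` are assumptions that match a typical cluster; yours may be configured differently.

```python
from typing import Optional


def pod_dns_name(
    statefulset_name: str,
    ordinal: int,
    service_name: str,
    namespace: Optional[str] = None,
    cluster_domain: str = "cluster.local",  # assumed default cluster domain
) -> str:
    """Build the DNS name of one Pod behind a headless service.

    Without a namespace this returns the short form
    <StatefulSet-Name>-<Ordinal-Index>.<ServiceName>, which resolves from
    Pods in the same namespace. With a namespace it returns the fully
    qualified form ending in .<namespace>.svc.<cluster-domain>.
    """
    short_name = f"{statefulset_name}-{ordinal}.{service_name}"
    if namespace is None:
        return short_name
    return f"{short_name}.{namespace}.svc.{cluster_domain}"


print(pod_dns_name("zookeeper", 1, "zookeeper-server"))
# zookeeper-1.zookeeper-server
print(pod_dns_name("zookeeper", 1, "zookeeper-server", namespace="default"))
# zookeeper-1.zookeeper-server.default.svc.cluster.local
```

The short form is enough between Pods of the same namespace; the fully qualified form is the safer choice when the caller lives in a different namespace.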

Pod Discovery with nslookup

With a headless service, we can easily discover the other Pods of a StatefulSet workload with a simple tool like nslookup. Let's take a look at two examples:

Standard service - you will get the clusterIP value:

⚡ kubectl exec zookeeper-0 -- nslookup zookeeper
Server:        10.0.0.10
Address:    10.0.0.10#53

Name:    zookeeper.default.svc.cluster.local
Address: 10.0.0.213

Headless service - you will get the IP of each Pod:

⚡ kubectl exec zookeeper-0 -- nslookup zookeeper
Server:        10.0.0.10
Address:    10.0.0.10#53

Name:    zookeeper.default.svc.cluster.local
Address: 172.17.0.6
Name:    zookeeper.default.svc.cluster.local
Address: 172.17.0.7
Name:    zookeeper.default.svc.cluster.local
Address: 172.17.0.8

These two examples just show how the two types of Kubernetes services differ. Something like this cannot be relied on in start scripts, because Pods are started one by one, so only the last Pod to start could discover all the others.
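Because the Pod names are predictable, a start script does not have to discover its peers through DNS at all; it can compute them. The Python sketch below illustrates the idea under several assumptions: the Pod's hostname equals its Pod name, the replica count is passed in from outside (for example through an environment variable), and ZooKeeper server ids are taken as ordinal + 1 so that they start at 1. The helper names are hypothetical; the ports 2888 and 3888 are the ones from the headless service above.

```python
import re
from typing import List, Tuple


def split_pod_hostname(hostname: str) -> Tuple[str, int]:
    """Split 'zookeeper-2' into ('zookeeper', 2)."""
    match = re.fullmatch(r"(.+)-(\d+)", hostname)
    if match is None:
        raise ValueError(f"not a StatefulSet Pod hostname: {hostname!r}")
    return match.group(1), int(match.group(2))


def zookeeper_server_lines(
    hostname: str,
    service_name: str,
    replicas: int,
    peer_port: int = 2888,
    election_port: int = 3888,
) -> Tuple[int, List[str]]:
    """Return this Pod's server id and the server.N lines for all replicas.

    Assumption: server id = ordinal + 1. The lines are derived purely from
    the naming scheme, so they do not depend on which Pods are already up.
    """
    name, ordinal = split_pod_hostname(hostname)
    lines = [
        f"server.{i + 1}={name}-{i}.{service_name}:{peer_port}:{election_port}"
        for i in range(replicas)
    ]
    return ordinal + 1, lines


# Inside a Pod you would pass socket.gethostname() instead of a literal.
my_id, servers = zookeeper_server_lines("zookeeper-0", "zookeeper-server", 3)
print(my_id)  # 1
for line in servers:
    print(line)
# server.1=zookeeper-0.zookeeper-server:2888:3888
# server.2=zookeeper-1.zookeeper-server:2888:3888
# server.3=zookeeper-2.zookeeper-server:2888:3888
```

With this approach zookeeper-0 already knows the names of zookeeper-1 and zookeeper-2 before they exist, which is exactly what a DNS lookup at startup cannot give you.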

Operators

All looks great, but there is a problem with StatefulSet workloads. Kubernetes cannot provide a general solution for stateful applications, so you might need to look at Kubernetes Operators. If you think about it, each stateful application behaves differently, and it is almost impossible to fit all of them into StatefulSet and expect everything to work seamlessly. Kubernetes would need a different workload API for each application type, and that is not likely to happen. Instead, there are Operators, each created for a specific stateful application.

So, the question is, should you create or use an Operator for each stateful application? Probably not. You could just use StatefulSet to run your stateful application, but scaling, upgrading, and other operational tasks then have to be done manually, with a lot of testing and planning. Don't expect Kubernetes to solve that for you. If you really don't want to run stateful applications inside Kubernetes and would rather use them externally, you can do that too. That is actually a great idea for an easier transition of production workloads. Check my previous blog post if you want to see how to access external services from Kubernetes, the right way.

Summary

This was just a small introduction to StatefulSet workloads. There are many great things about them, and rolling upgrades are also available from Kubernetes v1.7 and above. But like many other things they are not perfect, and I think Operators are here to solve that. To be precise, Operators use StatefulSet features and are just another layer on top to make things easier. Stay tuned for the next one.