Stateful Applications on Kubernetes

Last update: January 14, 2019

Kubernetes and stateless applications work together out of the box. You can create a replicated application quickly: a stateless application doesn't store any data, so simple load balancing is enough to scale it. However, if your application stores data somewhere, if you care about the order in which its instances start, or if you need a stable hostname and network identity, then you are dealing with a stateful application, and in the Kubernetes world StatefulSet is the workload object you need. It differs from a Deployment in a few ways. This post will help you get started.

Stateful Applications

I will not show yet another example of a StatefulSet definition. There are many examples available, so find one and start it. Now, if you list the pods of the stateful application, the first thing you notice is that their names don't contain autogenerated IDs. Instead, you get something like this:

⚡ kubectl get po
NAME          READY     STATUS    RESTARTS   AGE
zookeeper-0   1/1       Running   0          31m
zookeeper-1   1/1       Running   0          31m
zookeeper-2   1/1       Running   0          31m

As you can see, each pod gets a unique and stable name of the form <StatefulSet-Name>-<Ordinal-Index>. Also, because the controller by default starts one pod at a time, in ordinal order, zookeeper-1 is not created before zookeeper-0 is running and ready. Define liveness and readiness probes so that "ready" reflects the real state of the application.
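Because the name is stable, a container can derive its own ordinal from its hostname, which is handy when an application needs a numeric ID per instance. The following is a minimal sketch, assuming the hostname follows the <StatefulSet-Name>-<Ordinal-Index> form shown above; the function name ordinal_from_hostname is hypothetical.

```shell
#!/bin/sh
# Sketch: extract the ordinal index from a StatefulSet pod name.
# Assumes the name ends in "-<number>", e.g. zookeeper-2.
ordinal_from_hostname() {
  name="$1"
  ordinal="${name##*-}"   # drop everything up to and including the last hyphen
  case "$ordinal" in
    ''|*[!0-9]*)
      # Empty or non-numeric suffix: not a StatefulSet-style name.
      echo "not a StatefulSet pod name: $name" >&2
      return 1
      ;;
  esac
  echo "$ordinal"
}

# Inside a pod you would typically pass "$(hostname)" instead of a literal.
ordinal_from_hostname "zookeeper-2"
```

Inside a StatefulSet pod the hostname matches the pod name, so calling the function with "$(hostname)" should yield that pod's index.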

Headless Service

To ensure a stable network identity, you need to define a headless service for a stateful application. The definition of a headless service is similar to that of a standard service, but it has no cluster IP. Simply setting clusterIP: None in the service definition creates a headless service. Here is one example:

apiVersion: v1
kind: Service
metadata:
  name: zookeeper-server
  labels:
    app: zookeeper
spec:
  clusterIP: None # <--
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  selector:
    app: zookeeper

In the above example we defined a headless service just for the ZooKeeper servers; clients could use a standard service. Note that you can always define multiple services for one Kubernetes workload object. So, how is a headless service different?

The main benefit of a headless service is that you can reach each pod directly. A standard service acts as a load balancer or proxy, and you access your workload only through the service name, zookeeper-server. With a headless service, the pod zookeeper-0 can use zookeeper-1.zookeeper-server to talk to zookeeper-1 directly. The form is <StatefulSet-Name>-<Ordinal-Index>.<ServiceName>.
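Since the per-pod names are predictable, the full list of peers can be computed rather than looked up. Here is a small sketch, assuming a StatefulSet named zookeeper with three replicas whose serviceName field points at the zookeeper-server headless service; the function name peer_hosts and the replica count are illustrative, not taken from a real manifest.

```shell
#!/bin/sh
# Sketch: print the per-pod DNS names for a StatefulSet behind a headless service.
# Arguments: <StatefulSet-Name> <ServiceName> <replica count>
peer_hosts() {
  set_name="$1"
  service="$2"
  replicas="$3"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # Form: <StatefulSet-Name>-<Ordinal-Index>.<ServiceName>
    echo "${set_name}-${i}.${service}"
    i=$((i + 1))
  done
}

# Assumed values: 3 replicas of "zookeeper" behind "zookeeper-server".
peer_hosts zookeeper zookeeper-server 3
```

These short names resolve only from within the same namespace; from another namespace you would need the longer <pod>.<service>.<namespace>.svc.cluster.local form.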

Pod Discovery with nslookup

With a headless service, you can easily discover the other pods in the StatefulSet with a simple tool like nslookup. Let's take a look at two examples:

Standard service - you get the clusterIP value:

⚡ kubectl exec zookeeper-0 -- nslookup zookeeper
Server:        10.0.0.10
Address:    10.0.0.10#53

Name:    zookeeper.default.svc.cluster.local
Address: 10.0.0.213

Headless service - you get the IP of each pod:

⚡ kubectl exec zookeeper-0 -- nslookup zookeeper
Server:        10.0.0.10
Address:    10.0.0.10#53

Name:    zookeeper.default.svc.cluster.local
Address: 172.17.0.6
Name:    zookeeper.default.svc.cluster.local
Address: 172.17.0.7
Name:    zookeeper.default.svc.cluster.local
Address: 172.17.0.8

Those two examples show how the two types of Kubernetes services differ. Note that this kind of DNS discovery cannot be relied on in start scripts: pods are started one by one, so only the last pod to start could discover all the others.
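For ad hoc inspection (not for start scripts, given the caveat above), the pod IPs can be pulled out of the nslookup output with a short filter. This is a rough sketch that assumes the output layout shown in the examples above, where each Name: line is followed by an Address: line; other nslookup implementations format their output differently. The function name pod_ips_from_nslookup is hypothetical.

```shell
#!/bin/sh
# Sketch: extract the resolved addresses from nslookup output on stdin.
# Only an "Address:" line that directly follows a "Name:" line is printed,
# which skips the DNS server's own address at the top of the output.
pod_ips_from_nslookup() {
  awk '
    /^Name:/             { found = 1; next }
    found && /^Address:/ { print $2 }
                         { found = 0 }
  '
}

# Demo using the headless-service output from the example above.
pod_ips_from_nslookup <<'EOF'
Server:        10.0.0.10
Address:    10.0.0.10#53

Name:    zookeeper.default.svc.cluster.local
Address: 172.17.0.6
Name:    zookeeper.default.svc.cluster.local
Address: 172.17.0.7
Name:    zookeeper.default.svc.cluster.local
Address: 172.17.0.8
EOF
```

Against a live cluster you could pipe the real command into it, for example kubectl exec zookeeper-0 -- nslookup zookeeper | pod_ips_from_nslookup.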

Operators

It all looks great, but there is a minor problem with StatefulSet workloads. Kubernetes cannot provide a general solution for stateful applications, so you might need to look at Kubernetes Operators. If you think about it, each stateful application behaves differently, and it is almost impossible to generalize all of them into a StatefulSet and expect everything to work seamlessly. Kubernetes would need a different workload API for each application type, and that is not likely to happen. Operators, in contrast, are each specific to one stateful application. Here are some of them:

So, should you create or use an operator for each stateful application? Probably not. You can use a StatefulSet to run your stateful application, but scaling, upgrading, and other operational tasks then need to be done manually, with a lot of testing and planning. Don't expect Kubernetes to solve that. If you don't want to run stateful applications inside Kubernetes, you can also run them externally and access them from the cluster. That is an excellent option for a smoother transition of production workloads. Check my previous blog post if you want to see how to access external services from Kubernetes the right way.

Summary

This article is just a small introduction to StatefulSet workloads. There are many great things about them, and rolling updates are available from Kubernetes v1.7 onward, but like many other things they are not perfect, and I think operators are here to solve that. To be precise, operators use StatefulSet features and are just another layer on top to make things easier. Stay tuned for the next one.
