Kubernetes works with stateless applications right out of the box. You can create a replicated application really fast, because the application doesn't store any data and simple load balancing does the job when it comes to scaling it. However, if you want to run an application that stores data somewhere, that cares about the order in which its instances are started, or that needs a stable hostname and network ID, then we are talking about stateful applications, and in the Kubernetes world StatefulSet is the type of workload object you need to use. It differs from a Deployment in several ways, and this post will help you get started.
I will not show just another example of a StatefulSet workload definition. There are many examples available out there, so find one and deploy it. Now, if you list the Pods of a stateful application, the first thing you will notice is that they are not named with autogenerated IDs. Instead, you will get something like this:
```
$ kubectl get po
NAME          READY     STATUS    RESTARTS   AGE
zookeeper-0   1/1       Running   0          31m
zookeeper-1   1/1       Running   0          31m
zookeeper-2   1/1       Running   0          31m
```
As you can see, each Pod gets a unique and stable name of the form <StatefulSet-Name>-<Ordinal-Index>. Also, because the controller (with the default OrderedReady Pod management policy) starts Pods one at a time, zookeeper-1 will not be deployed before zookeeper-0 is running and ready. A readiness probe is what tells Kubernetes when a Pod is ready, and a liveness probe restarts a container that stops responding.
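To make the naming and ordering concrete, here is a minimal sketch of the StatefulSet fields involved, not a complete ZooKeeper deployment. It assumes the headless service zookeeper-server described below and a generic zookeeper:3.8 image; the ensemble configuration (myid, server list) is omitted, and the tcpSocket probes on ZooKeeper's default client port 2181 are a simplification rather than a production-grade health check.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper                  # Pods become zookeeper-0, zookeeper-1, zookeeper-2
spec:
  serviceName: zookeeper-server    # headless Service that governs the Pods' DNS names
  replicas: 3
  podManagementPolicy: OrderedReady  # the default: create Pods one at a time, in ordinal order
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
        - name: zookeeper
          image: zookeeper:3.8     # assumption: any ZooKeeper image; ensemble config not shown
          ports:
            - containerPort: 2181
              name: client
            - containerPort: 2888
              name: server
            - containerPort: 3888
              name: leader-election
          readinessProbe:          # the next Pod is created only after this one is Running and Ready
            tcpSocket:
              port: 2181
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:           # restarts the container if the port stops accepting connections
            tcpSocket:
              port: 2181
            initialDelaySeconds: 30
            periodSeconds: 10
```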
To get a stable network ID, you need to define a headless service for the stateful application and reference it in the StatefulSet's serviceName field. A headless service is defined just like a standard service, except that it doesn't have a cluster IP: simply adding clusterIP: None (note the capital N) to the YAML definition creates a headless service. Here is an example:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-server
  labels:
    app: zookeeper
spec:
  clusterIP: None # <--
  ports:
    - port: 2888
      name: server
    - port: 3888
      name: leader-election
  selector:
    app: zookeeper
```
In the above example, we defined a headless service just for the ZooKeeper servers (ports 2888 and 3888); clients could use a standard service. Note that you can always define multiple services for one Kubernetes workload object. So, how is a headless service different?
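A sketch of such a client-facing standard service follows. The name zookeeper and ZooKeeper's default client port 2181 are assumptions, not part of the headless example above. Both services select the same Pods through app: zookeeper; the only structural difference is that this one keeps its cluster IP.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zookeeper          # assumed name for the client-facing Service
  labels:
    app: zookeeper
spec:
  # No "clusterIP: None" here, so Kubernetes allocates a virtual IP
  # and spreads client connections across the ready Pods.
  ports:
    - port: 2181
      name: client
  selector:
    app: zookeeper         # same selector as the headless zookeeper-server Service
```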
The main benefit of a headless service is that you can reach each Pod directly. If this were a standard service, it would act as a load balancer or proxy, and you would access your workload object only through the service name, zookeeper-server. With a headless service, the Pod zookeeper-0 can use zookeeper-1.zookeeper-server to talk to zookeeper-1 directly. The form is <pod-name>.<service-name>, and the fully qualified name is <pod-name>.<service-name>.<namespace>.svc.cluster.local.
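These stable per-Pod names are what let the ensemble members find each other. As a sketch, a zoo.cfg stored in a ConfigMap could list the servers by their DNS names. The ConfigMap name zookeeper-config and the timing values are hypothetical; the ConfigMap would still have to be mounted into the Pods, and each server also needs a myid file in dataDir matching its server.N entry (for example, derived from the Pod's ordinal), which is not shown here.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: zookeeper-config   # hypothetical name
data:
  zoo.cfg: |
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/data
    clientPort=2181
    # server.N=<pod-name>.<headless-service-name>:<peer-port>:<leader-election-port>
    server.1=zookeeper-0.zookeeper-server:2888:3888
    server.2=zookeeper-1.zookeeper-server:2888:3888
    server.3=zookeeper-2.zookeeper-server:2888:3888
```

Because the names never change when a Pod is rescheduled, this file does not need to be rewritten when a Pod's IP address changes.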
Pod discovery with nslookup
With a headless service, we can easily discover the other Pods of a StatefulSet with a simple tool like nslookup. Let's take a look at these two examples:
Standard service - you will get the cluster IP of the service:
```
$ kubectl exec zookeeper-0 -- nslookup zookeeper
Server:    10.0.0.10
Address:   10.0.0.10#53

Name:      zookeeper.default.svc.cluster.local
Address:   10.0.0.213
```
Headless service - you will get the IP of each Pod:
```
$ kubectl exec zookeeper-0 -- nslookup zookeeper
Server:    10.0.0.10
Address:   10.0.0.10#53

Name:      zookeeper.default.svc.cluster.local
Address:   172.17.0.6
Name:      zookeeper.default.svc.cluster.local
Address:   172.17.0.7
Name:      zookeeper.default.svc.cluster.local
Address:   172.17.0.8
```
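These two examples only illustrate how the two types of Kubernetes services differ. You cannot rely on this kind of lookup in start scripts: Pods are started one by one, and by default a headless service only returns records for Pods that are ready, so each Pod sees only the Pods started before it, and only the last one would discover all the others.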
All looks great, but there is a minor problem with StatefulSet workloads: Kubernetes cannot provide a general solution for stateful applications, so you might need to look at Kubernetes Operators. If you think about it, each stateful application behaves differently, and it is almost impossible to generalize all of them into a StatefulSet and expect them to work seamlessly. Kubernetes would need a different workload API for each application type, and that is not likely to happen. Instead, there are Operators, which are created for specific stateful applications. Here are some of them:
So, the question is: should you create or use an Operator for each stateful application? Probably not. You can just use a StatefulSet for your stateful application, but scaling it, upgrading it, and any other operational work need to be done manually, with a lot of testing and planning. Don't expect Kubernetes to solve that for you. If you really don't want to run stateful applications inside Kubernetes and would rather run them externally, you can do that too. That is actually a great idea for an easier transition of production workloads. Check my previous blog post if you want to see how to access external services from Kubernetes, the right way.
This was just a small introduction to StatefulSet workloads. There are many great things about them, and rolling upgrades are available from Kubernetes v1.7 onward, but like many other things they are not perfect, and I think that Operators are here to solve that. To be precise, Operators use StatefulSet features and are just another layer on top that makes things easier.
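The rolling upgrades mentioned above are configured through the StatefulSet's updateStrategy field, and its partition setting is one way to stage the manual, carefully planned upgrades a StatefulSet leaves to you. Below is a sketch of the relevant fragment, assuming the three-replica zookeeper StatefulSet from the Pod listing at the top; merge it into the full manifest rather than applying it on its own.

```yaml
# Fragment of a StatefulSet spec, not a standalone manifest.
spec:
  updateStrategy:
    type: RollingUpdate      # the default in apps/v1
    rollingUpdate:
      partition: 2           # only Pods with ordinal >= 2 (here zookeeper-2) get the new template
```

With partition: 2, a change to the Pod template updates only zookeeper-2, which acts as a canary. Lowering partition to 0 afterwards rolls the change out to the remaining Pods, which the controller replaces in reverse ordinal order, waiting for each one to become Running and Ready.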