Stateless applications work on Kubernetes out of the box. You can create a replicated application quickly: a stateless application doesn't store any data, so simple load balancing is enough to scale it. However, if your application stores data somewhere, depends on the order in which its instances start, or needs a stable hostname and network ID, then it is a stateful application, and in the Kubernetes world StatefulSet is the workload object you need to use. It differs from a Deployment in a few ways, and this post will help you get started.
I will not show yet another StatefulSet workload definition. There are many examples out there, so find one and deploy it. Now, if you list the pods of the stateful application, the first thing you notice is that the pods don't have autogenerated IDs in their names. Instead, you get something like this:
⚡ kubectl get po
NAME          READY   STATUS    RESTARTS   AGE
zookeeper-0   1/1     Running   0          31m
zookeeper-1   1/1     Running   0          31m
zookeeper-2   1/1     Running   0          31m
As you can see, each pod gets a unique and stable name of the form <StatefulSet-Name>-<Ordinal-Index>. Also, because the controller starts one pod at a time, zookeeper-1 is not deployed before zookeeper-0 is running and ready (readiness is something you define with liveness and readiness probes).
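The naming and ordering rules above can be sketched in a few lines of Python. This is only a toy model of the behavior described here, not how the controller is implemented; the function names `pod_names` and `ordered_start` and the `is_ready` callback are made up for illustration.

```python
from typing import Callable, List


def pod_names(statefulset_name: str, replicas: int) -> List[str]:
    """Build pod names of the form <StatefulSet-Name>-<Ordinal-Index>."""
    return [f"{statefulset_name}-{ordinal}" for ordinal in range(replicas)]


def ordered_start(names: List[str], is_ready: Callable[[str], bool]) -> List[str]:
    """Toy model: start pods in ordinal order, one at a time.

    The next pod is only started once the previous one reports ready.
    `is_ready` is a hypothetical stand-in for the pod's readiness probe.
    """
    started = []
    for name in names:
        started.append(name)
        if not is_ready(name):
            break  # later ordinals wait until this pod becomes ready
    return started


names = pod_names("zookeeper", 3)
# Pretend zookeeper-1 never becomes ready: zookeeper-2 is never started.
print(ordered_start(names, lambda name: name != "zookeeper-1"))
```

If every pod reports ready, all three names come back in ordinal order, which matches the listing above.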
To ensure a stable network ID, you need to define a headless service for stateful applications. The definition of a headless service is similar to that of a standard service, but it doesn't get a clusterIP. Simply setting clusterIP: None in the service definition creates a headless service. Here is one example:
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-server
  labels:
    app: zookeeper
spec:
  clusterIP: None # <--
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  selector:
    app: zookeeper
In the above example, we defined a headless service just for the ZooKeeper servers; clients could use a standard service. Note that you can always define multiple services for one Kubernetes workload object. So, how is a headless service different?
The main benefit of using a headless service is being able to reach each pod directly. A standard service would act as a load balancer or proxy, and you would access your workload object just by using the service name zookeeper-server. With a headless service, the pod zookeeper-0 could use zookeeper-1.zookeeper-server to talk to zookeeper-1 directly. The form is the pod name, a dot, and then the headless service name.
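To make the per-pod addresses concrete, here is a small Python sketch that builds the hostnames for every pod behind a headless service. The helper `pod_dns_names` is hypothetical, and the `default` namespace and `cluster.local` cluster domain are assumptions (they are common defaults, but your cluster may differ).

```python
from typing import List, Tuple


def pod_dns_names(
    statefulset: str,
    service: str,
    replicas: int,
    namespace: str = "default",           # assumed namespace
    cluster_domain: str = "cluster.local",  # assumed cluster domain
) -> List[Tuple[str, str]]:
    """Return (short name, fully qualified name) for each pod.

    The short name is <pod-name>.<service-name>; the fully qualified name
    appends <namespace>.svc.<cluster-domain>.
    """
    result = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"
        short = f"{pod}.{service}"
        fqdn = f"{short}.{namespace}.svc.{cluster_domain}"
        result.append((short, fqdn))
    return result


for short, fqdn in pod_dns_names("zookeeper", "zookeeper-server", 3):
    print(short, "->", fqdn)
```

The short form works from pods in the same namespace; the fully qualified form is the safer choice when you address pods from another namespace.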
Pod Discovery with nslookup
With a headless service, you can easily discover the other pods in the stateful set with a simple tool like nslookup. Let's take a look at these two examples:
Standard service - you get the single cluster IP of the service:
⚡ kubectl exec zookeeper-0 -- nslookup zookeeper
Server:     10.0.0.10
Address:    10.0.0.10#53

Name:       zookeeper.default.svc.cluster.local
Address:    10.0.0.213
Headless service - you get the IP of each pod:
⚡ kubectl exec zookeeper-0 -- nslookup zookeeper
Server:     10.0.0.10
Address:    10.0.0.10#53

Name:       zookeeper.default.svc.cluster.local
Address:    172.17.0.6
Name:       zookeeper.default.svc.cluster.local
Address:    172.17.0.7
Name:       zookeeper.default.svc.cluster.local
Address:    172.17.0.8
These two examples show how the two types of Kubernetes services differ. Keep in mind that a lookup like this cannot be relied on in start scripts: because pods are started one by one, only the last pod to start can discover all the other pods.
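The same lookup can be done from application code instead of nslookup. The following is a minimal Python sketch, assuming it runs inside a pod where the service name resolves; `resolve_pod_ips` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without a cluster.

```python
import socket
from typing import Callable, List


def resolve_pod_ips(
    service_host: str,
    resolver: Callable = socket.getaddrinfo,
) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name.

    For a headless service this should be one address per ready pod;
    for a standard service it should be the single cluster IP.
    """
    infos = resolver(
        service_host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    try:
        print(resolve_pod_ips("zookeeper"))
    except socket.gaierror as err:
        # Expected when run outside the cluster, where the name is unknown.
        print(f"lookup failed: {err}")
```

Because of the start-order caveat above, a result from this function is only a snapshot: a pod that starts early sees fewer peers than one that starts last, so anything built on it should re-resolve rather than cache the first answer.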
All looks great, but there is a minor problem with stateful set workloads. Kubernetes cannot provide a general solution for stateful applications, so you might need to look at Kubernetes Operators. If you think about it, each stateful application behaves differently, and it is almost impossible to generalize all of them into a stateful set and expect it to work seamlessly. Kubernetes would need a different workload API for each application type, and that is not likely to happen. Operators, by contrast, are each specific to one stateful application. Here are some of them:
So, the question is: should you create or use an operator for each stateful application? Probably not. You can use a stateful set to run your stateful application, but scaling, upgrading, and other operational tasks then need to be done manually, with a lot of testing and planning. Don't expect Kubernetes to solve that for you. If you don't want to run stateful applications inside Kubernetes and prefer to use them externally, you can do that too. That is an excellent idea for a smoother transition of production workloads. Check my previous blog post if you want to see how to access external services from Kubernetes, the right way.
This article is just a small introduction to stateful set workloads. There are many great things about them, and rolling upgrades are available from Kubernetes v1.7 onward. But like many other things they are not perfect, and I think that operators are here to solve that. To be precise, operators use stateful set features; they are just another layer on top that makes things easier. Stay tuned for the next one.