I would say that "starting" a Kubernetes cluster is a relatively easy job. Deploying your application on top of Kubernetes requires more effort, especially if you are new to containers. For people who have worked with Docker this can also be relatively easy, but of course you need to master new tools, Helm for example. Then, when you put it all together and try to run your application in production, you will find out there are a lot of missing pieces. Probably Kubernetes doesn't do much, right? Well, Kubernetes is extensible, and there are some plugins or add-ons that will make your life easier.
What are Kubernetes Add-ons?
In short, add-ons extend the functionality of Kubernetes. There are many of them, and chances are that you are already using some: network plugins (CNIs) like Calico or Flannel, CoreDNS (now the default DNS manager), or the famous Kubernetes Dashboard. I say famous because that is probably the first thing you will try to deploy once the cluster is running :). Some of those are core components: a CNI is a must-have, and so is DNS, for your cluster to function properly. But there is much more you can do once you start deploying your applications. Enter the Kubernetes add-ons for more efficient computing!
Cluster Autoscaler - CA
Cluster Autoscaler scales your cluster nodes based on utilization. CA will scale up the cluster if you have pending pods that can't be scheduled on the existing nodes, and scale it down if nodes are underutilized. The scale-down utilization threshold defaults to 0.5 and is configurable with --scale-down-utilization-threshold. You definitely don't want to have pods in a pending state, and at the same time you don't want to run underutilized nodes, which is a waste of money!
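To make the threshold a bit more concrete, here is a rough sketch of the scale-down check. As far as I understand it, CA treats node utilization as the sum of pod requests divided by the node's allocatable resources, and takes the higher of CPU and memory. The sketch below looks at a single resource only, and the function names are made up for illustration. It is not CA's actual code, and the real check has more conditions (for example, whether the pods can be moved elsewhere).

```shell
#!/bin/sh
# Hypothetical helpers illustrating the scale-down check; not CA's actual code.

# Utilization in percent: requested resources / allocatable resources.
# Both arguments are integers in the same unit (e.g. CPU millicores).
utilization_pct() {
  echo $(( $1 * 100 / $2 ))
}

# A node is a scale-down candidate when its utilization is below the
# threshold (50 corresponds to --scale-down-utilization-threshold=0.5).
# $1 = requested, $2 = allocatable, $3 = threshold in percent (default 50)
is_scale_down_candidate() {
  [ "$(utilization_pct "$1" "$2")" -lt "${3:-50}" ]
}

# Example: pods request 600m CPU on a node with 2000m allocatable (30%).
if is_scale_down_candidate 600 2000; then
  echo "node is a scale-down candidate"
fi
```

Note that the check is based on what pods request, not on what they actually use, so a node can look "full" to CA while its CPU is mostly idle.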
Use case: You have two instance groups (auto scaling groups) in your AWS cluster, running in two availability zones, 1 and 2. You want to scale your cluster based on utilization, but you also want a similar number of nodes in both zones. You want to use the CA auto-discovery feature, so that you don't need to define the minimum and maximum number of nodes in CA, as those are already defined in your auto scaling groups. And you want to deploy CA on your master nodes.
Here is an example installation of CA via Helm that matches the above use case:
⚡ helm install --name autoscaler \
    --namespace kube-system \
    --set autoDiscovery.clusterName=k8s.test.akomljen.com \
    --set extraArgs.balance-similar-node-groups=true \
    --set awsRegion=eu-west-1 \
    --set rbac.create=true \
    --set rbac.pspEnabled=true \
    --set nodeSelector."node-role\.kubernetes\.io/master"="" \
    --set "tolerations[0].effect=NoSchedule" \
    --set "tolerations[0].key=node-role.kubernetes.io/master" \
    stable/cluster-autoscaler
There are some additional changes you need to make for this to work. For more details, check this post: Kubernetes Cluster Autoscaling on AWS.
Horizontal Pod Autoscaler - HPA
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization. With custom metrics support, it can scale on other application-provided metrics as well.
HPA is not something new in the Kubernetes world, but Banzai Cloud recently released HPA Operator, which simplifies it. All you need to do is add annotations to your Deployment or StatefulSet, and the HPA operator will do the rest. The supported annotations are listed in the HPA operator documentation.
Installation of the HPA operator is fairly simple with Helm:
⚡ helm repo add akomljen-charts https://raw.githubusercontent.com/komljen/helm-charts/master/charts/

⚡ helm install --name hpa \
    --namespace kube-system \
    akomljen-charts/hpa-operator

⚡ kubectl get po --selector=release=hpa -n kube-system
NAME                                  READY     STATUS    RESTARTS   AGE
hpa-hpa-operator-7c4d47dd4-9khpv      1/1       Running   0          1m
hpa-metrics-server-7766d7bc78-lnhn8   1/1       Running   0          1m
With Metrics Server deployed you also have the kubectl top pods command available. It could be useful to monitor your CPU or memory usage for pods! ;)
HPA can fetch metrics from a series of aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io). Usually, though, HPA will use the metrics.k8s.io API provided by Heapster (deprecated as of Kubernetes 1.11) or Metrics Server.
After you add annotations to your Deployment you should be able to monitor it with:
⚡ kubectl get hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
test-app   Deployment/test-app   0%/70%    1         4         1          10m
Keep in mind that the CPU target that you see above is based on defined CPU requests for this particular pod, not overall CPU available on the node.
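To see how the target relates to CPU requests, here is a small sketch of the scaling rule described in the Kubernetes docs: desired replicas = ceil(current replicas * current utilization / target utilization), where utilization is CPU usage as a percentage of the pod's CPU request. The function names are mine, and the sketch ignores the tolerance window and the min/max pod limits that the real HPA applies.

```shell
#!/bin/sh
# Sketch of the HPA scaling rule; helper names are hypothetical.
#   desired = ceil(current_replicas * current_utilization / target_utilization)

# CPU utilization as a percentage of the pod's CPU *request*.
# $1 = usage in millicores, $2 = CPU request in millicores
cpu_utilization_pct() {
  echo $(( $1 * 100 / $2 ))
}

# $1 = current replicas, $2 = current utilization %, $3 = target utilization %
desired_replicas() {
  # Integer ceiling division: (a + b - 1) / b
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# A pod using 210m with a 200m request is at 105% of its request.
current=$(cpu_utilization_pct 210 200)

# With 2 replicas and a 70% target: ceil(2 * 105 / 70) = 3
desired_replicas 2 "$current" 70
```

This is why a pod with a small CPU request can show a high percentage even on a mostly idle node: the percentage is relative to the request only.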
Addon resizer is an interesting plugin that you could use with Metrics Server in the above scenario. As you deploy more pods to your cluster eventually Metrics Server will need more resources. Addon resizer container watches over another container in a Deployment (Metrics Server for example) and vertically scales the dependent container up and down. Addon resizer can scale Metrics Server linearly based on the number of nodes. For more details check official docs.
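The linear scaling mentioned above boils down to a base amount plus an extra amount per node. Here is a minimal sketch of that arithmetic. The function name and the numbers are illustrative assumptions, not Metrics Server or addon resizer defaults.

```shell
#!/bin/sh
# Linear scaling: total = base + per_node * node_count.
# The numbers used below are illustrative only.

# $1 = base amount, $2 = extra amount per node, $3 = number of nodes
linear_resources() {
  echo $(( $1 + $2 * $3 ))
}

# CPU in millicores: 40m base + 1m per node, on a 50-node cluster.
echo "cpu: $(linear_resources 40 1 50)m"

# Memory in Mi: 40Mi base + 4Mi per node, on the same cluster.
echo "memory: $(linear_resources 40 4 50)Mi"
```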
Vertical Pod Autoscaler - VPA
You need to define CPU and memory requests for services that you deploy on Kubernetes. If you don't, the default CPU request is 0.1 CPU (100m), assuming your cluster ships a LimitRange with that default, as many do; Kubernetes itself has no built-in default. Resource requests help kube-scheduler decide on which node to run a particular pod. But it is hard to define "good enough" values that will suit multiple environments. Vertical Pod Autoscaler adjusts CPU and memory requests automatically based on the resources used by a pod. It uses Metrics Server to get pod metrics. Keep in mind that you still need to define resource limits manually.
I will not cover the details here as VPA really needs a dedicated blog post, but there are a few things that you should know:
- VPA is still an early-stage project, so be careful
- Your cluster must support MutatingAdmissionWebhooks, which are enabled by default since Kubernetes 1.9
- It doesn't work together with HPA when HPA scales on CPU or memory
- It will restart your pods when resource requests are updated, which is kind of expected
Descheduler
kube-scheduler is the component responsible for scheduling in Kubernetes. But sometimes pods can end up on the wrong node due to the dynamic nature of Kubernetes. You could be editing existing resources to add node affinity or pod (anti-)affinity, or you might have more load on some servers while others are running almost idle. Once a pod is running, kube-scheduler will not try to reschedule it. Depending on the environment, you might have a lot of moving parts.
Descheduler checks for pods that can be moved and evicts them based on defined policies. Descheduler is not a replacement for the default scheduler; it depends on it to place the evicted pods again. This project is currently in the Kubernetes incubator and not considered ready for production yet, but I found it very stable and it worked nicely. Descheduler runs in your cluster as a CronJob.
I wrote a dedicated post, Meet a Kubernetes Descheduler, which you should check for more details.
k8s Spot Rescheduler
I was trying to solve the issue of managing multiple auto scaling groups on AWS, where one group consists of on-demand instances and the others of spot instances. The problem is that once you scale up the spot instance group, you want to move the pods off the on-demand instances so you can scale that group down. k8s spot rescheduler tries to reduce the load on on-demand instances by evicting pods to spot instances if they are available. In reality, the rescheduler can be used to move load from any group of nodes onto a different group of nodes. They just need to be labeled appropriately.
I also created a Helm chart for easier deployment:
⚡ helm repo add akomljen-charts https://raw.githubusercontent.com/komljen/helm-charts/master/charts/

⚡ helm install --name spot-rescheduler \
    --namespace kube-system \
    --set image.tag=v0.2.0 \
    --set cmdOptions.delete-non-replicated-pods="true" \
    akomljen-charts/k8s-spot-rescheduler
For a full list of cmdOptions check here.
For k8s spot rescheduler to work properly you need to label your nodes:
- on-demand nodes -
- spot nodes -
Also add a PreferNoSchedule taint on on-demand instances to ensure that k8s spot rescheduler prefers spots when making scheduling decisions.
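A PreferNoSchedule taint can be applied to a whole group of nodes at once with kubectl taint and a label selector. Below is a minimal sketch of a helper for that. The label selector and the taint key in the usage comment are placeholders I made up for illustration, not the labels that k8s spot rescheduler expects, so substitute your own.

```shell
#!/bin/sh
# Hypothetical helper: the selector and taint key used in the example are
# placeholders, not the labels k8s spot rescheduler expects by default.

# Set KUBECTL=echo to print the command instead of running it (dry run).
KUBECTL="${KUBECTL:-kubectl}"

# Taint all nodes matching a label selector with PreferNoSchedule.
# $1 = label selector matching on-demand nodes, $2 = taint as key=value
taint_on_demand_nodes() {
  "$KUBECTL" taint nodes -l "$1" "$2:PreferNoSchedule"
}

# Usage against a real cluster (placeholder label and taint key):
#   taint_on_demand_nodes "example.com/lifecycle=on-demand" "example.com/on-demand=true"
```

PreferNoSchedule is a soft rule: the scheduler tries to avoid the tainted nodes but will still use them when there is no room elsewhere, which is what you want for on-demand capacity acting as a fallback.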
Please keep in mind that some of the above add-ons are not compatible with each other! Also, there might be some interesting add-ons that I missed here, so please let us know in the comments. Stay tuned for the next one.