
Troubleshoot applications running on Kubernetes

I have been writing a lot about Kubernetes for the past few months. The reason is that I like the idea of cloud-native apps. There is a huge number of easily installable apps for Kubernetes out there. One of the biggest sources of "Kube ready" apps is Kubeapps Hub. A week ago, when I submitted the Sematext Docker Agent to the stable charts repo, there were 149 apps. At the time of writing, there are already 157 of them, and the number keeps growing fast.

All this is a good thing, but there is also a small problem. People run commands that they don't quite understand. A command copied from a document or a blog post will not always work; sometimes unpredictable issues appear because of a different setup or a wrong command. The problem is that people don't know how to troubleshoot Kubernetes applications when something goes wrong. I get those questions from the community almost daily. Because of that, I decided to write a small guide for you to follow when the app that you want to use is not working. If you are new to the Kubernetes ecosystem and want to use the available Kube apps, please read more about Helm first.

Narrow down the issue

The first check to do after you install an application is to list all the pods. Start from the top. If there are errors, they will probably be visible here:

⚡  kubectl get pods --all-namespaces

Before moving forward, you need to understand how pods are named. As an example, I will use the pod name my-app-fd77c448c-7p4jr. From this name you can work out the names of the related resources:

# Get the pod
⚡  kubectl get po my-app-fd77c448c-7p4jr
# Get the replica set
⚡  kubectl get rs my-app-fd77c448c
# Get the deployment
⚡  kubectl get deploy my-app
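
The naming convention above can be sketched as a small shell helper. This is only an illustration: it assumes the pod was created by a deployment and follows the <deployment>-<hash>-<suffix> pattern, and the function names rs_name and deploy_name are made up for this example.

```shell
# Hypothetical helpers: derive owner names from a deployment-managed pod name.
# Assumes the pattern <deployment>-<replica-set-hash>-<pod-suffix>.

# Strip the trailing pod suffix to get the replica set name.
rs_name() {
  printf '%s\n' "${1%-*}"
}

# Strip the replica set hash as well to get the deployment name.
deploy_name() {
  rs=$(rs_name "$1")
  printf '%s\n' "${rs%-*}"
}

pod="my-app-fd77c448c-7p4jr"
echo "replica set: $(rs_name "$pod")"
echo "deployment:  $(deploy_name "$pod")"
```

The results can be passed straight to kubectl, for example kubectl get rs "$(rs_name "$pod")". Pods that are not managed by a deployment follow other naming patterns, so treat the output as a guess to verify.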

With a stateful set there is no replica set, and the pod name is slightly different, for example zookeeper-1. An error can occur in any of these resources, so you need to check each of them. Let's cover some issues.

ImagePullBackOff - It can happen due to a wrong image name or repository in your config, a wrong tag, or simply because you are trying to pull the image from a private repository. Check if you can pull the image directly with Docker: docker pull repo/image_name:tag. If the Docker image is not accessible due to access restrictions, you should create a secret. For images stored on DockerHub you can create a secret named regsecret like this:

⚡  kubectl create secret docker-registry regsecret --docker-username=USERNAME --docker-password='PASSWORD' --docker-email=EMAIL

Then refer to this secret in the pod spec of your deployment:

spec:
  containers:
  - name: private-reg-container
    image: <your-private-image>
  imagePullSecrets:
  - name: regsecret

Running, but with 0/1 ready state - Even if the container is in the running state, you need to check its readiness. Here the health check (readiness probe) has failed. You may wonder what it means if you see something like 0/2 or 0/3. Remember, one pod can have many containers, and the second number is the container count. Usually, it takes 20-30 seconds until this turns to 1/1, but if it doesn't change for a while, check the logs. You can use the logs command to do that:

⚡  kubectl logs <POD_NAME> -n default
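
To spot pods stuck in a state like 0/1 quickly, the READY column can be filtered. The following is a minimal sketch, assuming the default column layout of kubectl get pods for a single namespace (NAME, READY, STATUS, and so on); the function name not_ready is hypothetical.

```shell
# Hypothetical helper: print pods whose ready count is below the container count.
# Reads the output of `kubectl get pods --no-headers` on stdin and assumes
# the single-namespace column order NAME READY STATUS ...
not_ready() {
  awk '{
    # READY looks like "0/1" or "2/3"
    split($2, ready, "/")
    if (ready[1] + 0 < ready[2] + 0) { print $1, $2, $3 }
  }'
}

# Usage against a live cluster:
#   kubectl get pods --no-headers | not_ready
```

With --all-namespaces the columns shift by one because of the extra NAMESPACE column, so the field numbers would need adjusting.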

NOTE: Kubetail is an interesting tool to aggregate the logs from many containers.

Sometimes the deployment has an Init container. If the Init container is failing, the above command will not help because the main container has not started yet. To check the logs of an Init container, use this command instead:

⚡  kubectl logs <POD_NAME> -c <INIT_CONTAINER_NAME>

Pending - The pod is waiting for something, often because of a wrong config. The logs command is useless here because the container will never start. It can be that your cluster doesn't have enough compute resources, or maybe you defined pod affinity rules that cannot be satisfied. Whatever the reason, you can find more information about the error using the describe command:

⚡  kubectl describe po <POD_NAME>

I use the above command a lot. The describe command shows all the information about the pod as well as its latest events, and errors are displayed in the events section. There is also a command to see the events for everything in the current namespace, not just one pod. Events are your friend:

⚡  kubectl get events
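
On a busy cluster the event list gets long, so it can help to narrow it to a single pod and sort it by time. A sketch using kubectl's --field-selector and --sort-by flags; the wrapper name pod_events and its default namespace are assumptions made for this example.

```shell
# Hypothetical helper: list the events for one pod, oldest first.
# --field-selector filters on the object the event is about,
# --sort-by orders the rows by creation time.
pod_events() {
  pod=$1
  ns=${2:-default}
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.metadata.creationTimestamp
}

# Usage:
#   pod_events <POD_NAME> [NAMESPACE]
```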

NOTE: With kubewatch you can send all Kubernetes events to a Slack channel.

If you want to check what the configuration of your pod looks like, you can always get the YAML file:

⚡  kubectl get po <POD_NAME> -o yaml

Commands like describe or get can be used on all resources. So, for example, if describe on a pod shows that something is wrong with a persistent volume claim, the next step is to get and describe that particular PVC. You need to narrow down the issue until you find the problem. Also, while you wait for an app to start, you can watch the status of your pods with the -w (watch) flag:

⚡  kubectl get pods -w

If you want to get more information with kubectl, you can use the wide output. For example, to see which host each pod is running on:

⚡  kubectl get pods -o wide

This will give you pod IP addresses and worker node names. It can be used with other commands as well.

Connection between pods

If pods are running without errors, but you still cannot access them, check the services. You can track the errors using the get and describe commands, but there are a few more tricks that will help. Each service with a selector gets a matching endpoints object, so it should be visible here:

⚡  kubectl get endpoints

If some endpoints are missing, check that the service selector actually matches the labels on your pods:

⚡  kubectl get pods --selector=app=my-app,type=frontend
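
Typing the selector by hand is error-prone, so it can be read from the service itself. A rough sketch, assuming that kubectl get svc -o wide prints the selector as its last column in key=value,key=value form; the function name pods_for_service is hypothetical.

```shell
# Hypothetical helper: list the pods matched by a service's selector.
# Assumes the SELECTOR column is the last field of `kubectl get svc -o wide`.
pods_for_service() {
  selector=$(kubectl get svc "$1" -o wide --no-headers | awk '{ print $NF }')
  if [ -z "$selector" ] || [ "$selector" = "<none>" ]; then
    echo "service $1 has no selector" >&2
    return 1
  fi
  kubectl get pods --selector="$selector"
}

# Usage:
#   pods_for_service my-service
```

If this prints no pods while the pods exist, the labels and the selector most likely do not match.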

Sometimes connected apps cannot reach each other. To troubleshoot this kind of issue I usually use nslookup and curl. You don't need to have them installed in your app container; use readily available Docker images for the test. Try with curl first:

⚡  kubectl run client --image=appropriate/curl --rm -ti --restart=Never --command -- curl http://my-service:80

And if the name is not resolved try nslookup:

⚡  kubectl run busybox --image=busybox --rm -ti --restart=Never --command -- nslookup my-service
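
If the short name does not resolve, for instance because the test pod runs in a different namespace than the service, it may be worth trying the fully qualified name. A small sketch that builds it, assuming the common default cluster domain cluster.local; svc_fqdn is a hypothetical helper.

```shell
# Hypothetical helper: build the fully qualified DNS name of a service.
# Assumes the cluster domain is cluster.local unless a third argument is given.
svc_fqdn() {
  name=$1
  ns=${2:-default}
  domain=${3:-cluster.local}
  printf '%s.%s.svc.%s\n' "$name" "$ns" "$domain"
}

svc_fqdn my-service

# Usage with the lookup above:
#   kubectl run busybox --image=busybox --rm -ti --restart=Never \
#     --command -- nslookup "$(svc_fqdn my-service default)"
```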

Kubernetes API access

If your app needs access to the Kubernetes API (as almost any add-on does), first check that the API is reachable:

⚡  kubectl run curl --image=appropriate/curl --rm -ti --restart=Never --command -- sh -c 'KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) && curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" \
https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api/v1/namespaces/default/pods'

Please note that on an RBAC-enabled Kubernetes cluster the default service account doesn't have any permissions, and the above command would result in error code 403 - Forbidden. In that case you should create and use a dedicated service account for your app. This will probably happen if you deployed the cluster with kubeadm, because it enables RBAC by default.
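
Creating such a service account could look roughly like this. It is a sketch, not a recommendation for production permissions: the role name pod-reader, the binding name, and both function names are placeholders chosen for this example, and the role only allows reading pods in one namespace.

```shell
# Hypothetical helper: create a service account that may read pods.
# pod-reader and the rolebinding name are placeholders for illustration.
grant_pod_read() {
  sa=$1
  ns=${2:-default}
  kubectl create serviceaccount "$sa" -n "$ns" &&
  kubectl create role pod-reader -n "$ns" --verb=get,list,watch --resource=pods &&
  kubectl create rolebinding "${sa}-pod-reader" -n "$ns" \
    --role=pod-reader --serviceaccount="$ns:$sa"
}

# Hypothetical helper: ask the API server whether the account may list pods.
can_list_pods() {
  ns=${2:-default}
  kubectl auth can-i list pods -n "$ns" \
    --as="system:serviceaccount:$ns:$1"
}

# Usage:
#   grant_pod_read my-app default
#   can_list_pods my-app default
```

To make the app use the account, set serviceAccountName in the pod spec of its deployment.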

Summary

This guide should be handy when you are starting with Kubernetes. Maybe you have been using Kubernetes for a while already but weren't aware of some of these commands and ways to troubleshoot and check your apps.


Alen Komljen

Building and automating infrastructure with Docker, Kubernetes, kops, Helm, Rancher, Terraform, Ansible, SaltStack, Jenkins, AWS, GKE and many others.
  • Sarajevo