kubernetes, troubleshooting

Learn How to Troubleshoot Applications Running on Kubernetes

I have been writing a lot about Kubernetes for a few months now. I like the idea of cloud-native applications, and there are plenty of easily installable applications for Kubernetes out there. One of the biggest sources of "Kube ready" applications is Kubeapps Hub, and the list keeps growing fast. All of this is a good thing, but there is also a small problem: people run commands that they don't quite understand, often copied and pasted from some document or blog post. Sometimes unexpected issues show up because of a different setup or a wrong command. When something goes wrong, many people don't know how to troubleshoot Kubernetes applications. I get those questions from the community almost daily, so I decided to write a small guide to follow when the application you want to use is not working. If you are new to the Kubernetes ecosystem and want to use the available Kube apps, please read more about Helm first.

Narrow Down the Issue

The first check to do after you install an application is to list all pods. Start from the top; if there are errors, they will be visible here:

⚡ kubectl get pods --all-namespaces

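Before moving forward, you need to understand how pods are named. As an example, I will use the pod name my-app-fd77c448c-7p4jr. For a pod created by a deployment, the name is made of the deployment name (my-app), the replica set hash (fd77c448c), and a random pod suffix (7p4jr), so from the name alone you can look up each related resource: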

# Get the pod
⚡ kubectl get po my-app-fd77c448c-7p4jr

# Get the replica set
⚡ kubectl get rs my-app-fd77c448c

# Get the deployment
⚡ kubectl get deploy my-app
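
The name breakdown above can also be scripted. This is only a minimal sketch, assuming the pod is owned by a deployment and follows the deployment-hash-suffix pattern; rs_from_pod and deploy_from_pod are hypothetical helper names, not kubectl features:

```shell
# Hypothetical helpers: derive owner names from a deployment-managed pod name.

# Strip the trailing "-<pod-suffix>" to get the replica set name.
rs_from_pod() {
  printf '%s\n' "${1%-*}"
}

# Strip both trailing segments to get the deployment name.
deploy_from_pod() {
  rs_name=$(rs_from_pod "$1")
  printf '%s\n' "${rs_name%-*}"
}
```

For example, kubectl get rs "$(rs_from_pod my-app-fd77c448c-7p4jr)" would look up the replica set without typing its name by hand.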

In the case of a statefulset, there is no replica set and the pod name looks slightly different, for example zookeeper-1. An error can surface at any of these levels, so check each of them. Let's cover some common issues.

ImagePullBackOff - This can happen because of a wrong image name, repository, or tag in your config, or simply because you are trying to pull the image from a private repository. Check whether you can pull the image directly with docker on your local machine: docker pull repo/image_name:tag. If the image is not accessible due to access restrictions, create a registry secret. For images stored on DockerHub, you can create a secret named regsecret like this:

⚡ kubectl create secret docker-registry regsecret \
    --docker-username=USERNAME \
    --docker-password='PASSWORD' \
    --docker-email=EMAIL

Then reference this secret in the pod spec, for example in a deployment's pod template:

spec:
  containers:
  - name: private-reg-container
    image: <your-private-image>
  imagePullSecrets:
  - name: regsecret

Running, but with 0/1 ready state - Even if the container is in the Running state, you need to check its readiness. In this case, the health check (readiness probe) is failing. You may wonder what it means if you see something like 0/2 or 0/3: remember, one pod can have many containers, and the second number is their total. It usually takes 20-30 seconds until this turns to 1/1, but if it doesn't change for a while, check the logs. You can use the logs command to do that:

⚡ kubectl logs <POD_NAME>

NOTE: Kubetail is an interesting tool to aggregate the logs from many containers.

Sometimes the deployment has an init container. If the init container is failing, kubectl logs <POD_NAME> will not help, because the main container has not started yet. To check the logs of an init container, or of a specific container when the pod has several, use this command instead:

⚡ kubectl logs <POD_NAME> -c <CONTAINER_NAME>
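
To know which name to pass to -c, you can ask the pod spec for its container names. This is a small sketch using kubectl's jsonpath output; the wrapper name pod_containers is my own and purely illustrative:

```shell
# Hypothetical helper: print the init container names on the first line
# and the regular container names on the second line.
pod_containers() {
  kubectl get pod "$1" \
    -o jsonpath='{.spec.initContainers[*].name}{"\n"}{.spec.containers[*].name}{"\n"}'
}
```

Call it as pod_containers <POD_NAME> and use one of the printed names with kubectl logs <POD_NAME> -c <CONTAINER_NAME>. The first line stays empty when the pod has no init containers.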

Pending - When the config is wrong, the logs command can be useless because the container never starts. Pending means the pod is waiting for something: the cluster may not have enough compute resources, or you may have defined pod affinity rules that cannot be satisfied. Whatever the reason, you can find more information about the error with the describe command:

⚡ kubectl describe po <POD_NAME>

I use the describe command a lot. It shows all the information about the pod as well as its latest events, and errors are displayed in that events section. There is also a command to see the events for the whole namespace. Events are your friend:

⚡ kubectl get events

NOTE: With kubewatch you can send all Kubernetes events to a Slack channel.
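
The plain event list can get noisy. As a rough sketch, these two wrappers (the function names are hypothetical, while the --sort-by and --field-selector flags are standard kubectl) sort events by time and keep only warnings:

```shell
# Hypothetical helper: list events in a namespace ordered by creation time,
# so the most recent ones end up at the bottom.
events_sorted() {
  kubectl get events --namespace "${1:-default}" --sort-by=.metadata.creationTimestamp
}

# Hypothetical helper: show only events of type Warning.
events_warnings() {
  kubectl get events --namespace "${1:-default}" --field-selector type=Warning
}
```

Both take an optional namespace argument and fall back to default, for example events_warnings kube-system.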

If you want to check what the configuration of your pod looks like, you can always get it as yaml:

⚡ kubectl get po <POD_NAME> -o yaml

Commands like describe or get can be used on all resources. For example, if describe on a pod shows that something is wrong with a persistent volume claim, the next step is to get and describe that particular PVC. Keep narrowing down until you find the problem. You can also watch the status of your pods with the -w (watch) flag while you wait for an application to come up:

⚡ kubectl get po -w
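
With many pods it is easy to miss the one stuck at 0/1. The filter below is only a sketch: it assumes the default column layout of kubectl get pods in a single namespace (NAME, READY, STATUS, RESTARTS, AGE), and not_ready_pods is a hypothetical name:

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods whose READY column (for example 0/1 or 1/2) is not complete.
# The header line is skipped.
not_ready_pods() {
  awk 'NR > 1 { split($2, ready, "/"); if (ready[1] != ready[2]) print $1 }'
}
```

Use it as kubectl get po | not_ready_pods. With --all-namespaces the columns shift by one, so the filter would need adjusting, and pods of finished jobs will also show up because they report 0/1.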

If you want more information from kubectl, you can use the wide output. For example, to see which hosts the pods are running on:

⚡ kubectl get po -o wide

This will give you pod IP addresses and worker node names. The wide output works with other resource types as well.

Connection Between Pods

If pods are running without errors but you still cannot access them, check the services. You can track the errors using the get and describe commands, but there are a few more tricks that will help. Each service with a selector gets an endpoints object listing the pods behind it, so they should be visible here:

⚡ kubectl get endpoints

If some endpoints are missing, check whether the service's selector actually matches your pods:

⚡ kubectl get pods --selector=app=my-app,type=frontend
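
Instead of typing the labels by hand, the selector can be read from the service itself. This is a sketch under a few assumptions: the service has a selector, it lives in your current namespace, and the go-template used to flatten the selector is just one illustrative way to do it. The name pods_for_service is hypothetical:

```shell
# Hypothetical helper: list the pods matched by a service's own selector.
pods_for_service() {
  # Render the selector map as "key=value," pairs.
  selector=$(kubectl get service "$1" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  # Drop the trailing comma before passing the selector on.
  kubectl get pods --selector="${selector%,}"
}
```

If pods_for_service my-service returns nothing while your pods are running, the labels on the pods and the selector on the service most likely do not match.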

Sometimes connected applications cannot reach each other. To troubleshoot this kind of issue I usually use nslookup and curl. You don't need to have them installed in your application container; use readily available docker images for the test instead. Try curl first:

⚡ kubectl run client --image=appropriate/curl --rm -ti --restart=Never --command -- curl http://my-service:80

And if the name does not resolve, try nslookup:

⚡ kubectl run busybox --image=busybox --rm -ti --restart=Never --command -- nslookup my-service

NOTE: If something is wrong with name resolution, you are probably hitting kube-dns or CoreDNS issues. Please check those pods first.

You can also simply forward a container port to your localhost for troubleshooting:

⚡ kubectl port-forward <POD_NAME> <POD_PORT>

Kubernetes API Access

If your application needs access to the Kubernetes API (almost all addons require it), first check whether the API is reachable from inside the cluster:

⚡ kubectl run curl --image=appropriate/curl --rm -ti --restart=Never --command -- sh -c 'KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) && curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" \
https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api/v1/namespaces/default/pods'

Please note that on an RBAC-enabled Kubernetes cluster the default service account has no permission to read resources such as pods, so the above command results in error code 403 Forbidden. In that case, create and use a dedicated service account for your application.
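
Creating such a service account could look roughly like this. It is a sketch, not a recommended policy: the names my-app and pod-reader are placeholders, the permissions are limited to reading pods in one namespace, and the two function names are hypothetical:

```shell
# Hypothetical helper: create a service account that may get and list pods
# in a single namespace. Arguments: namespace, service account name.
grant_pod_read() {
  ns="${1:-default}"
  sa="${2:-my-app}"
  kubectl create serviceaccount "$sa" --namespace "$ns" &&
  kubectl create role pod-reader --verb=get,list --resource=pods --namespace "$ns" &&
  kubectl create rolebinding "${sa}-pod-reader" --role=pod-reader \
    --serviceaccount="${ns}:${sa}" --namespace "$ns"
}

# Hypothetical helper: ask the API server whether that account may list pods.
can_list_pods() {
  kubectl auth can-i list pods --namespace "${1:-default}" \
    --as="system:serviceaccount:${1:-default}:${2:-my-app}"
}
```

After grant_pod_read default my-app, running can_list_pods default my-app should answer yes. Your application then needs serviceAccountName: my-app in its pod spec to actually use the account.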

Summary

This guide should be handy when you start with Kubernetes. Maybe you have been using Kubernetes for a while already but weren't aware of some of these commands and ways to troubleshoot and check your applications. Stay tuned for the next one.