Getting a Kubernetes cluster up and running is pretty easy these days. But once you start using the cluster and deploying applications, you can expect some issues over time. Kubernetes is a distributed system, which makes it hard to troubleshoot. You need a good monitoring solution, and because Prometheus is a CNCF project, just like Kubernetes, it is probably the best fit. In this post, I will show you how to get Prometheus running and start monitoring your Kubernetes cluster in 5 minutes.
CoreOS introduced operators as a way to encode the business logic of running an application. I wrote about the Elasticsearch operator and how it works a few months ago, so you might want to check it out. In my opinion, operators are the best way to deploy stateful applications on Kubernetes.
The CoreOS team also provides a Prometheus operator, which I will use for the deployment. Here is the official operator workflow and relationships view:
From the picture above, you can see that you can create a ServiceMonitor resource, which tells Prometheus to scrape metrics from a defined set of pods. For example, if you have a frontend app which exposes Prometheus metrics on port web, all you need to do is create a service monitor, which will configure the Prometheus server:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: frontend-app
      labels:
        app: frontend-app
    spec:
      selector:
        matchLabels:
          app: frontend-app
      endpoints:
      - port: web
        interval: 10s
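A ServiceMonitor selects Services by label, and the port it references is the name of a Service port. As a minimal sketch of what the other side could look like, here is a Service for the frontend app. The file name, the port number 8080, and the pod label are my assumptions for illustration; only the app: frontend-app label and the web port name come from the service monitor above.

```shell
# Write a hypothetical Service manifest for the frontend app.
# Port 8080 and the pod selector are assumptions, adjust them to your app.
cat > frontend-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: frontend-app
  labels:
    app: frontend-app      # matched by spec.selector.matchLabels of the ServiceMonitor
spec:
  selector:
    app: frontend-app      # assumed label on the frontend pods
  ports:
  - name: web              # port name referenced by "port: web" in the ServiceMonitor
    port: 8080             # assumed metrics port
    targetPort: 8080
EOF
echo "wrote frontend-service.yaml"
```

You would then apply it with kubectl apply -f frontend-service.yaml in the namespace where the frontend app runs.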
Installing the operator is pretty easy with Helm. Let's add the CoreOS repository and install it:
    ⚡ helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/

    ⚡ helm install \
        --name prometheus-operator \
        --namespace monitoring \
        --set rbacEnable=false \
        coreos/prometheus-operator

    ⚡ kubectl get pods -n monitoring
    NAME                                   READY     STATUS    RESTARTS   AGE
    prometheus-operator-67f87d659c-rtpwq   1/1       Running   0          1m
When you install the Prometheus operator, you get new Custom Resource Definitions (CRDs). You can check that with this command:
    ⚡ kubectl get CustomResourceDefinition
    NAME                                    AGE
    alertmanagers.monitoring.coreos.com     1m
    prometheuses.monitoring.coreos.com      1m
    servicemonitors.monitoring.coreos.com   1m
As you can see, the Prometheus operator manages Alertmanager, the Prometheus server, and service monitors.
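To make the role of these custom resources more concrete, here is a rough sketch of a standalone Prometheus resource that the operator would turn into a running Prometheus server. This is only an illustration: the resource name, the file name, and the choice of selector label are my assumptions, and you do not need to create it by hand when installing through a chart that already ships its own Prometheus resource.

```shell
# Write a hypothetical Prometheus custom resource.
# The name "example" and the selector label are assumptions for illustration.
cat > prometheus-example.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  replicas: 1
  serviceMonitorSelector:    # which ServiceMonitor resources this server picks up
    matchLabels:
      app: frontend-app
EOF
echo "wrote prometheus-example.yaml"
```

The serviceMonitorSelector is what ties a Prometheus server to its service monitors: only ServiceMonitor resources carrying the matching labels end up in the generated scrape configuration.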
For the Prometheus installation I will use the Helm chart kube-prometheus. This chart has a lot of options, so I encourage you to take a look at the default values file and override some values if needed. Among other services, this chart installs Grafana and exporters ready to monitor your cluster.
kube-prometheus is an umbrella chart with many dependencies, which you can find in the requirements file.
I will enable persistent storage for all components and disable RBAC. You should have RBAC enabled, but in my test cluster it is not. I will expose Grafana with Ingress, so I disabled anonymous authentication and changed the admin password. This is my custom values file:
    ⚡ cat > custom-values.yaml <<EOF
    global:
      rbacEnable: false

    alertmanager:
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: rbd
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi

    prometheus:
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: rbd
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi

    grafana:
      auth:
        anonymous:
          enabled: "false"
      adminPassword: "YourPass123#"
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: nginx
          kubernetes.io/tls-acme: "true"
        hosts:
          - grafana.test.akomljen.com
        tls:
          - secretName: grafana-tls
            hosts:
              - grafana.test.akomljen.com
      storageSpec:
        class: rbd
        accessMode: "ReadWriteOnce"
        resources:
          requests:
            storage: 10Gi
    EOF
The next step is to install the kube-prometheus chart using the custom values file I created before:
    ⚡ helm install \
        --name mon \
        --namespace monitoring \
        -f custom-values.yaml \
        coreos/kube-prometheus
NOTE: Don't use prometheus as the Helm release name! You might experience some issues if you do.
Wait a few minutes and the whole stack will be up and running. Check all pods in the monitoring namespace:
    ⚡ kubectl get pods -n monitoring
    NAME                                       READY     STATUS    RESTARTS   AGE
    alertmanager-mon-0                         1/2       Running   0          1m
    mon-exporter-kube-state-77b4847f76-wxzcz   1/2       Running   0          1m
    mon-exporter-kube-state-7cbfc65568-rxs54   1/2       Running   0          1m
    mon-exporter-node-9bngl                    1/1       Running   0          1m
    mon-exporter-node-d2hnb                    1/1       Running   0          1m
    mon-exporter-node-l7fgh                    1/1       Running   0          1m
    mon-exporter-node-rvxlg                    1/1       Running   0          1m
    mon-grafana-969d44bff-ctmd2                2/2       Running   0          1m
    prometheus-mon-0                           1/2       Running   0          1m
    prometheus-operator-67f87d659c-rtpwq       1/1       Running   0          10m

    ⚡ kubectl get ingress -n monitoring
    NAME          HOSTS                       ADDRESS   PORTS     AGE
    mon-grafana   grafana.test.akomljen.com             80, 443   1m
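Instead of re-running the pod listing until everything is ready, you could let kubectl block for you. The helper below is a small sketch, assuming your kubectl is recent enough to ship the wait subcommand; the function name and the default timeout are my own choices, not part of the charts.

```shell
# Block until every pod in the namespace reports Ready, or fail after the timeout.
# Assumes a kubectl release that includes the "wait" subcommand.
wait_for_stack() {
  namespace="${1:-monitoring}"   # namespace used throughout this post
  timeout="${2:-300s}"           # assumed default, tune for your cluster
  kubectl wait --for=condition=Ready pods --all \
    -n "$namespace" --timeout="$timeout"
}
```

Calling wait_for_stack monitoring 600s would wait up to ten minutes for the pods in the monitoring namespace and return a non-zero exit code if some of them never become ready.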
When you log in to Grafana, these dashboards are available by default:
- Kubernetes Capacity Planning
- Kubernetes Cluster Health
- Kubernetes Cluster Status
- Kubernetes Control Plane Status
- Kubernetes Resource Requests
Of course, you can always update them, or create a completely new dashboard if you need to. In the example below you can see what the node view looks like:
If you want to access other services you can forward the port to localhost, for example:
    # Alert manager
    ⚡ kubectl port-forward -n monitoring alertmanager-mon-0 9093

    # Prometheus server
    ⚡ kubectl port-forward -n monitoring prometheus-mon-0 9090
When you expose the Prometheus server to your localhost, you can also check for alerts at http://localhost:9090/alerts. You could also use Ingress to expose those services, but they don't have authentication, so you would need something like OAuth Proxy in front.
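With the Prometheus port forwarded, you can also query the server from the command line through its HTTP API. Here is a minimal sketch, assuming the port-forward to localhost:9090 is running and curl is installed; the PROM_URL variable and the prom_query helper are names I made up for this example.

```shell
# Base URL of the forwarded Prometheus server (assumes the port-forward above).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query against the Prometheus HTTP API and print the JSON response.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

For example, prom_query 'up' returns the scrape status of every target, and prom_query 'ALERTS{alertstate="firing"}' lists the alerts that are currently firing.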
It is almost impossible not to run into issues with a Kubernetes cluster once you start using it. This monitoring setup will help you along the way. Of course, this is only one part of monitoring, and it covers the cluster only. Many cloud native applications have Prometheus support out of the box, so getting application metrics should be easy. I will cover this in a future blog post. Stay tuned for the next one.