<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Alen Komljen]]></title><description><![CDATA[Doing DevOps the Cloud Native Way.]]></description><link>https://akomljen.com/</link><image><url>https://akomljen.com/favicon.png</url><title>Alen Komljen</title><link>https://akomljen.com/</link></image><generator>Ghost 3.40</generator><lastBuildDate>Mon, 12 Jan 2026 20:40:32 GMT</lastBuildDate><atom:link href="https://akomljen.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Running Java Apps on Kubernetes ARM Nodes]]></title><description><![CDATA[<p><strong>This article is originally posted on the Faire’s technical blog - <a href="https://medium.com/faire-the-craft">The Craft</a>.</strong></p><p>Major cloud providers like Amazon are betting on custom-built ARM processors. Amazon built the first version of the <a href="https://aws.amazon.com/ec2/graviton/">Graviton processor</a> in 2018. 
Two years later, they introduced a new version, Graviton2, with some significant improvements and</p>]]></description><link>https://akomljen.com/running-java-apps-on-kubernetes-arm-nodes/</link><guid isPermaLink="false">601442348a12d50001bbb8eb</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[docker]]></category><category><![CDATA[arm]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Fri, 29 Jan 2021 17:29:15 GMT</pubDate><media:content url="https://akomljen.com/content/images/2021/01/100-Potrero-office.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2021/01/100-Potrero-office.jpg" alt="Running Java Apps on Kubernetes ARM Nodes"><p><strong>This article is originally posted on the Faire’s technical blog - <a href="https://medium.com/faire-the-craft">The Craft</a>.</strong></p><p>Major cloud providers like Amazon are betting on custom-built ARM processors. Amazon built the first version of the <a href="https://aws.amazon.com/ec2/graviton/">Graviton processor</a> in 2018. Two years later, they introduced a new version, Graviton2, with some significant improvements and a 40% better price/performance over comparable x86-based instances. Those are big numbers.</p><p>Also, you probably heard about Apple's M1 ARM-based SoC and how good it is. Soon, likely all Mac lineups will be powered by ARM. There is a really interesting post on <a href="https://debugger.medium.com/why-is-apples-m1-chip-so-fast-3262b158cba2">why M1 is superior</a> and different from traditional CPUs, which I highly recommend you to read if you want to learn more.</p><p>At Faire, we are exploring Amazon's Graviton2 based instances to run Java/Kotlin apps on the Kubernetes platform for some gains in performance and lower prices. Running Kubernetes on ARM instances and building ARM containers sounds like a significant change. 
However, it is not that complicated because Kubernetes and docker are built with multiple architectures in mind. Let’s see how it works.</p><h3 id="kubernetes-arm-nodes-on-aws">Kubernetes ARM Nodes on AWS</h3><p>Our Kubernetes cluster is running on AWS EKS. The setup can have small differences for other cloud providers or standalone installations, but it should be similar. It doesn't matter which platform your master nodes are running, as you will not run any of your apps on those.</p><p>Before you can utilize Amazon ARM instances for Kubernetes worker nodes, there are a few preparation steps. The first step would be to add another node group of ARM instances. There is nothing special about it, except that you need to choose an ARM-based instance type, for example, M6g (g stands for Graviton2). It is pretty simple and <a href="https://aws.amazon.com/about-aws/whats-new/2020/08/amazon-eks-support-for-arm-based-instances-powered-by-aws-graviton-now-generally-available/">officially supported</a> in EKS, so please check the docs.</p><p>In a nutshell, ARM support means that you have to use ARM-based instance type, ARM OS, with all dependencies like docker, kubelet daemon, etc., built for ARM. When adding a new Kubernetes node group, I suggest that you taint and label those nodes to prevent running non-compatible containers.</p><p>We use the following services on each worker node, and in practice, this means that each one of those containers needs to be compatible with ARM architecture:</p><ul><li>AWS node (AWS VPC native CNI plugin)</li><li>kube-proxy</li><li><a href="https://akomljen.com/integrating-aws-iam-and-kubernetes-with-kube2iam/">kube2iam</a> (IAM authentication)</li><li>Datadog agent</li><li>Linkerd proxy</li></ul><p>Only the kube2iam image wasn't available for the ARM platform when writing this post, and we had to build it. 
The process is straightforward, so let's get into that.</p><h3 id="docker-multi-architecture-images">Docker Multi-Architecture Images</h3><p>Each docker image carries an architecture label. You can check an image's architecture with the <code>docker image inspect</code> command, for example:</p><pre><code>$ docker pull openjdk:15
$ docker image inspect openjdk:15 --format='{{.Architecture}}'
amd64
</code></pre><p>Docker pulled the amd64 image because it's running on an amd64 machine in this case. If you run the same command on the ARM platform, you would probably get arm64 as an architecture label. So how is this possible when the image has the same tag?</p><p>One way to achieve this is by creating a <a href="https://docs.docker.com/engine/reference/commandline/manifest/">docker manifest</a> list that groups functionally identical images built for different architectures. First, you can create two images with separate tags, e.g., <code>openjdk:15-amd64</code> and <code>openjdk:15-arm64</code>, and then create a manifest list combining those two images as <code>openjdk:15</code> and push it to the registry. You will see the whole process with the Java app example below.</p><h3 id="docker-buildx-plugin">Docker Buildx Plugin</h3><p>Buildx is a docker plugin that extends the docker build command with the full support of <a href="https://github.com/moby/buildkit">BuildKit</a>. The BuildKit library is bundled into the docker daemon. One of many interesting features of BuildKit is that it is designed to build multi-platform images without relying on the underlying architecture and operating system.</p><p>If you are using <strong>Docker for Desktop</strong> on Mac, you can enable experimental features in preferences to use buildx. Make sure you are running Docker for Desktop version 3.0.0 and up. Older versions had experimental features in Edge builds only.</p><p>To verify that buildx is available on the system, run:</p><pre><code>$ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS  PLATFORMS
default * docker
  default default         running linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6</code></pre><p>From the above output, you can see all supported platform versions.</p><p><strong>On Linux nodes</strong>, the easiest way to build multi-architecture images is by utilizing QEMU, a well-known machine emulator and virtualizer. The setup is pretty straightforward as well, and here is a short version of it.</p><p>First, you have to install the latest buildx plugin:</p><pre><code>$ wget https://github.com/docker/buildx/releases/download/v0.5.1/buildx-v0.5.1.linux-amd64
$ mkdir -p ~/.docker/cli-plugins
$ mv buildx-v0.5.1.linux-amd64 ~/.docker/cli-plugins/docker-buildx
$ chmod a+x ~/.docker/cli-plugins/docker-buildx</code></pre><p>The command <code>docker buildx</code> should work now. The next step is to install a cross-platform emulator collection distributed as a docker image:</p><pre><code>$ docker run --privileged --rm tonistiigi/binfmt --install all
</code></pre><p>The above command should install and print out all supported emulators. Now let's create a new multi-platform builder instance and switch to it:</p><pre><code>$ docker buildx create --name multiplatform
$ docker buildx use multiplatform</code></pre><p>Running <code>docker buildx inspect --bootstrap</code> should show all supported platforms. For more details on this setup and how it works under the hood, please check <a href="https://github.com/docker/buildx">buildx docs</a>.</p><p>Once the buildx is ready, you can try to build an ARM kube2iam image with it. Kube2iam is written in Go, which means that you need to cross-compile the code for different platforms to produce a valid binary. This is true for many other programming languages as well. Let’s try it:</p><pre><code>$ git clone https://github.com/jtblin/kube2iam.git
$ cd kube2iam
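
# Note: depending on the builder driver, buildx may keep the result only in
# the build cache; add --load (single platform) or --push to the build
# command below to export the image for use afterwards.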

$ docker buildx build \
    --platform linux/arm64 \
    -t fairewholesale/kube2iam:0.10.11-arm64 .</code></pre><p>Building with buildx for the arm64 platform automatically produces ARM binaries because the kube2iam Dockerfile includes a build stage, and the docker engine automatically pulls the ARM version of the builder image <code>golang:1.14.0</code>.</p><h3 id="java-apps-on-arm">Java Apps on ARM</h3><p>The whole point of this blog post was to see how you can run Java apps. It turns out to be the easiest part once you figure out how to run ARM nodes and build docker images that can run on them.</p><p>You already saw how to build a kube2iam container for ARM, so you might be wondering how Java is different. For Java apps, the only thing you need is a JVM capable of running on the ARM platform. You don't need to cross-compile the code as in the previous example. When producing an ARM container that will run Java code, you can compile the code on any platform, then copy the build artifact in and build the image for the ARM platform. What you get, however, is a docker image capable of running on the ARM platform only.</p><p>Let's see this through a simple Java app and docker build process. Get the app and run it locally:</p><pre><code>$ git clone https://github.com/Faire/javaosarch
$ cd javaosarch
$ gradle run

&gt; Task :run
OS name : Mac OS X
OS arch : x86_64

BUILD SUCCESSFUL in 1s
2 actionable tasks: 2 executed</code></pre><p>Assuming you are also on the <code>x86_64</code> platform, now build the jar and create a docker image for the arm64 platform:</p><pre><code>$ gradle build

BUILD SUCCESSFUL in 943ms
5 actionable tasks: 4 executed, 1 up-to-date

$ docker buildx build \
    --platform linux/arm64 \
    -t fairewholesale/javaosarch:1.0-arm64 .

$ docker run \
    --platform linux/arm64 \
    fairewholesale/javaosarch:1.0-arm64 \
    java -cp javaosarch-1.0.jar javaosarch.JavaOsArch
OS name : Linux
OS arch : aarch64</code></pre><p>From this example, you can see that it is not important on which platform you compile the Java code. The only thing that matters is that you build a container for ARM.</p><p>Now, let's build the amd64 version of the above image and create a manifest to combine both images under the same tag:</p><pre><code>$ docker buildx build \
    --platform linux/amd64 \
    -t fairewholesale/javaosarch:1.0-amd64 .

$ docker push fairewholesale/javaosarch:1.0-arm64
$ docker push fairewholesale/javaosarch:1.0-amd64
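
# Alternatively, a single buildx invocation can build both platforms and
# push a manifest list directly (equivalent sketch of the commands below):
# $ docker buildx build \
#     --platform linux/amd64,linux/arm64 \
#     -t fairewholesale/javaosarch:1.0 --push .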

$ docker manifest create \
    fairewholesale/javaosarch:1.0 \
    --amend fairewholesale/javaosarch:1.0-arm64 \
    --amend fairewholesale/javaosarch:1.0-amd64
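# (on older Docker versions, docker manifest requires experimental CLI
#  features, e.g. export DOCKER_CLI_EXPERIMENTAL=enabled)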

$ docker manifest push fairewholesale/javaosarch:1.0</code></pre><p>You can also inspect the manifest, which shows information about the available image architectures:</p><pre><code>$ docker manifest inspect fairewholesale/javaosarch:1.0 | jq .manifests[].platform
{
  "architecture": "amd64",
  "os": "linux"
}
{
  "architecture": "arm64",
  "os": "linux"
}
</code></pre><p>And that should be it, the <a href="https://hub.docker.com/r/fairewholesale/javaosarch/tags">fairewholesale/javaosarch:1.0</a> will work on both amd64 and arm64 machines.</p><p>When setting up a deployment for Kubernetes, you don’t need to take care of which image a particular node will pull. Docker engine will figure that out and pull the right image based on its running platform.</p><p>Some other important details you need to be aware of:</p><ul><li>If you are using natively-built libraries through JNI, you may need to get those libraries for the ARM platform.</li><li>The official OpenJDK image starting from openjdk:15 is available for the ARM platform. Older versions are not supported.</li><li>The final container image may have some other dependencies, small binaries that you want to include, and all of those need to be compiled for ARM. In our case, that was <a href="https://github.com/krallin/tini">tini</a> and <a href="https://github.com/olix0r/linkerd-await">linkerd-await</a>.</li></ul><h3 id="summary">Summary</h3><p>We are still experimenting with ARM builds. The hardest part is changing the CI pipeline, where we build a bunch of other stuff. Jenkins nodes are one of the most significant computing expenses, and the speed of our builds is critical to us. So, using Graviton2 instances would be a big money saver, maybe a time saver as well. I cannot talk about real-world performance, as this is still a work in progress. One thing is sure, Java is ready for production deployments on ARM, and it seems like the right time to explore it further.</p>]]></content:encoded></item><item><title><![CDATA[Stopping Docker Containers Gracefully]]></title><description><![CDATA[<p><strong>This is a post from my old blog, originally written in 2015. The old blog is gone, and I decided to repost it here to redirect old links. The content is just slightly adjusted.</strong></p><p>I started to work with Docker containers seven years ago. 
I made my first <a href="https://github.com/komljen/dockerfile-examples">Docker playground</a></p>]]></description><link>https://akomljen.com/stopping-docker-containers-gracefully/</link><guid isPermaLink="false">5ecd0260f73d4a000185265f</guid><category><![CDATA[docker]]></category><category><![CDATA[tips]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Wed, 27 May 2020 10:31:38 GMT</pubDate><media:content url="https://akomljen.com/content/images/2020/05/florian-schmaezz-ByX4Xoy4ifo-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2020/05/florian-schmaezz-ByX4Xoy4ifo-unsplash.jpg" alt="Stopping Docker Containers Gracefully"><p><strong>This is a post from my old blog, originally written in 2015. The old blog is gone, and I decided to repost it here to redirect old links. The content is just slightly adjusted.</strong></p><p>I started to work with Docker containers seven years ago. I made my first <a href="https://github.com/komljen/dockerfile-examples">Docker playground</a> with a bunch of different images. As I began to work on enterprise-level applications deployment, I found out that there were a lot of things I was doing wrong. One of them was how I started applications inside a container.</p><p>Almost all my Dockerfiles have some bash script at the end to make some minor changes before starting the application. I usually add a bash script to <code>CMD</code> instruction in the Dockerfile. I thought there isn't anything wrong with that, except that <code>docker stop</code> didn't work as it should.</p><h3 id="docker-containers-and-pid-1">Docker Containers and PID 1</h3><p>When you run a bash script in a container, it will get PID 1, and the application will be a child process as PPID 1. 
This is a problem because Bash will not forward the <a href="https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html">termination signal</a> SIGTERM to the app when the container is stopped. Instead, Docker kills the container after 10 seconds. You can adjust the stop timeout, of course, but the main reason to do so is when the application needs more time to stop gracefully.</p><p>There is an easy way to handle this with the <a href="https://www.man7.org/linux/man-pages/man3/exec.3.html">exec</a> command inside a bash script. Exec replaces the shell without creating a new process, so the application gets PID 1 instead.<br><br>Let’s test both scenarios. For testing purposes, you can use this simple Redis Dockerfile:</p><pre><code>FROM ubuntu:trusty
ENV DEBIAN_FRONTEND noninteractive

RUN \
  apt-get update &amp;&amp; \
  apt-get -y install \
          software-properties-common &amp;&amp; \
  add-apt-repository -y ppa:chris-lea/redis-server &amp;&amp; \
  apt-get update &amp;&amp; \
  apt-get -y install \
          redis-server &amp;&amp; \
  rm -rf /var/lib/apt/lists/*

COPY start.sh start.sh
RUN chmod +x start.sh

EXPOSE 6379

RUN rm /usr/sbin/policy-rc.d
CMD ["/start.sh"]</code></pre><p>And here is the <code>start.sh</code> script, which applies the recommended kernel settings and starts the Redis server:</p><pre><code>#!/usr/bin/env bash

# Disable THP Support in kernel
echo never &gt; /sys/kernel/mm/transparent_hugepage/enabled
# TCP backlog setting (defaults to 128)
sysctl -w net.core.somaxconn=16384
#---------------------------------------------------------------
/usr/bin/redis-server</code></pre><p>Now let’s build and run this container (privileged option must be set to true when starting a container because of kernel changes):</p><pre><code>docker build -t my/redis .
docker run -d --privileged --name test my/redis</code></pre><p>Then check for the processes running inside the container:</p><pre><code>docker exec test ps -ef
  UID        PID  PPID  C STIME TTY          TIME CMD
  root         1     0  0 13:20 ?        00:00:00 bash /start.sh
  root         6     1  0 13:20 ?        00:00:00 /usr/bin/redis-server *:6379</code></pre><p>As you can see, Redis is running as PID 6, which is why a graceful stop doesn't work. Let’s try to stop this container with <code>docker stop test</code>. Docker will kill the container after 10 seconds. If you check the container logs with <code>docker logs test</code>, the last message will be <code>Ready to accept connections</code>, meaning Redis didn't receive the termination signal.</p><p>The simplest way to stop the Redis container gracefully is to change the last line of the <code>start.sh</code> script to <code>exec /usr/bin/redis-server</code>:</p><pre><code>#!/usr/bin/env bash

# Disable THP Support in kernel
echo never &gt; /sys/kernel/mm/transparent_hugepage/enabled
# TCP backlog setting (defaults to 128)
sysctl -w net.core.somaxconn=16384
#---------------------------------------------------------------
exec /usr/bin/redis-server</code></pre><p>Rebuild the image, start the container, and again check for running processes:</p><pre><code>docker exec test ps -ef
  UID        PID  PPID  C STIME TTY          TIME CMD
  root         1     0  1 13:24 ?        00:00:00 /usr/bin/redis-server *:6379</code></pre><p>As you can see now, Redis is running as PID 1, and <code>docker stop</code> will work just fine. Try it and recheck the Docker logs. You should see this message in the Redis log: <code>Received SIGTERM scheduling shutdown...</code></p><p>In the above cases, Redis is running as the root user, which is a bad practice. Here are a few more examples of how I’m using exec with Postgres and Tomcat containers where processes are not running as the root user:</p><pre><code>exec sudo -E -u tomcat7 ${CATALINA_HOME}/bin/catalina.sh run
exec su postgres -c "${POSTGRES_BIN} -D ${PGDATA} -c config_file=${CONF}"</code></pre><p>I will not go into the details for the above commands, but in this case, processes will not be running as PID 1 because of <code>sudo</code> and <code>su</code> commands. However, Docker stop works correctly in both cases. Those commands will forward the SIGTERM signal to child processes, unlike Bash.</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet" data-width="550"><p lang="en" dir="ltr">Interesting post on &quot;Stopping <a href="https://twitter.com/Docker?ref_src=twsrc%5Etfw">@Docker</a> <a href="https://twitter.com/hashtag/Containers?src=hash&amp;ref_src=twsrc%5Etfw">#Containers</a> Gracefully&quot; by <a href="https://twitter.com/alenkomljen?ref_src=twsrc%5Etfw">@alenkomljen</a> via last week&#39;s <a href="https://twitter.com/hashtag/DockerWeekly?src=hash&amp;ref_src=twsrc%5Etfw">#DockerWeekly</a>: <a href="http://t.co/KSibQSBJ8c">http://t.co/KSibQSBJ8c</a> <a href="https://twitter.com/hashtag/Docker?src=hash&amp;ref_src=twsrc%5Etfw">#Docker</a></p>&mdash; Docker (@Docker) <a href="https://twitter.com/Docker/status/628788890472681472?ref_src=twsrc%5Etfw">August 5, 2015</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</figure>]]></content:encoded></item><item><title><![CDATA[Kubernetes Backup and Restore with Velero]]></title><description><![CDATA[<p>Recently I migrated some Kubernetes clusters, managed by Amazon EKS. The clusters were running in public subnets, so I wanted to make them more secure by utilizing private and public subnets where needed. Changing networking settings is not possible once you create the service in AWS. Any service, not just</p>]]></description><link>https://akomljen.com/kubernetes-backup-and-restore-with-velero/</link><guid isPermaLink="false">5e9aa7742a9f5400015aa5d5</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[addons]]></category><category><![CDATA[velero]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sat, 23 May 2020 20:02:59 GMT</pubDate><media:content url="https://akomljen.com/content/images/2020/05/maksym-kaharlytskyi-Q9y3LRuuxmg-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2020/05/maksym-kaharlytskyi-Q9y3LRuuxmg-unsplash.jpg" alt="Kubernetes Backup and Restore with Velero"><p>Recently I migrated some Kubernetes clusters, managed by Amazon EKS. The clusters were running in public subnets, so I wanted to make them more secure by utilizing private and public subnets where needed. Changing networking settings is not possible once you create the service in AWS. Any service, not just EKS. Since I already had <a href="https://velero.io/">Velero</a> installed for backups with S3 provider, the most natural thing was to use it to restore all resources on the new cluster as well.</p><h2 id="velero-installation">Velero Installation</h2><blockquote>Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. </blockquote><p>With Velero, you can do disaster recovery, data migration, and data protection. 
Once installed and running, it will back up all Kubernetes resources to an S3-compatible object store and snapshot persistent volumes. It supports all major cloud providers and <a href="https://velero.io/docs/master/supported-providers/">many more</a>.<br><br><strong>NOTE:</strong> Installation instructions assume that you are familiar with tools like kube2iam or kiam for providing secure access to AWS resources. If you are not, please first check my post <a href="https://akomljen.com/integrating-aws-iam-and-kubernetes-with-kube2iam/">Integrating AWS IAM and Kubernetes with kube2iam</a>.<br><br>There are a few preparation steps. Install the <a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html">AWS CLI</a>, and follow the commands below.<br><br>1. Create an S3 bucket, and make it private:</p><pre><code>aws s3api create-bucket \
  --bucket velero-test-backups \
  --region us-east-1

aws s3api put-public-access-block --bucket velero-test-backups \
  --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" \
  --region us-east-1</code></pre><p>2. Create an IAM Role which will be assumed by Velero pod (more details on node trust policy in <a href="https://akomljen.com/integrating-aws-iam-and-kubernetes-with-kube2iam/">kube2iam blog post</a> mentioned before):</p><pre><code>cat &gt; node-trust-policy.json &lt;&lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1234567890:role/k8s-worker-nodes-NodeInstanceRole-1W9NK0A56SMQ6"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role \
  --role-name k8s-velero \
  --assume-role-policy-document \
  file://node-trust-policy.json</code></pre><p>3. Create and attach IAM policy to previously created IAM role for EC2 and S3 access:</p><pre><code>cat &gt; s3-velero-policy.json &lt;&lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::velero-test-backups/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::velero-test-backups"
      ]
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name k8s-velero \
  --policy-name s3 \
  --policy-document file://s3-velero-policy.json</code></pre><p>4. Install Velero with official <a href="https://github.com/vmware-tanzu/helm-charts/tree/master/charts/velero">Helm chart</a>:</p><pre><code>cat &gt; velero-values.yaml &lt;&lt;EOF
podAnnotations:
  iam.amazonaws.com/role: k8s-velero

configuration:
  provider: aws
  backupStorageLocation:
    name: aws
    bucket: velero-test-backups
    config:
      region: us-east-1
  volumeSnapshotLocation:
    name: aws
    config:
      region: us-east-1

initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.0.0
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
EOF

helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts

helm install backup \
  --namespace kube-system \
  -f velero-values.yaml \
vmware-tanzu/velero</code></pre><p>5. <a href="https://velero.io/docs/master/basic-install/#install-the-cli">Install the Velero CLI</a> to manage backups and restores:</p><pre><code>brew install velero</code></pre><p>If everything is OK, you should be able to run Velero commands, for example, to create a new backup:</p><pre><code>velero -n kube-system backup create test
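
# optionally follow the backup's progress and details (describe and logs are
# standard velero CLI subcommands):
velero -n kube-system backup describe test
velero -n kube-system backup logs test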

velero -n kube-system backup get
NAME	STATUS      CREATED                          EXPIRES   STORAGE LOCATION   SELECTOR
test	Completed   2020-05-22 14:06:12 +0200 CEST   29d       default            &lt;none&gt;</code></pre><p><strong>NOTE:</strong> If you have persistent volumes backed by EBS, each time you create a new backup, Velero will create an EBS snapshot. Snapshot creation can take some time, and the Completed status you get from Velero doesn't mean that the snapshot is ready.</p><p>Or, you can run backups on a schedule, for example, to create a daily backup with the expiration set to 90 days:</p><pre><code>velero -n kube-system create schedule daily \
  --schedule="0 0 * * *" \
  --ttl 2160h0m0s</code></pre><p>Here are some more backup and restore command examples to get you started:</p><pre><code>Backup:

# create a backup containing all resources
velero backup create backup1

# create a backup including only the nginx namespace
velero backup create nginx-backup --include-namespaces nginx

# create a backup excluding the velero and default namespaces
velero backup create backup2 --exclude-namespaces velero,default

# view the YAML for a backup that doesn't snapshot volumes, without sending it to the server
velero backup create backup3 --snapshot-volumes=false -o yaml

# wait for a backup to complete before returning from the command
velero backup create backup4 --wait

Restore:

# create a restore named "restore-1" from backup "backup-1"
velero restore create restore-1 --from-backup backup-1

# create a restore with a default name ("backup-1-&lt;timestamp&gt;") from backup "backup-1"
velero restore create --from-backup backup-1

# create a restore from the latest successful backup triggered by schedule "schedule-1"
velero restore create --from-schedule schedule-1

# create a restore from the latest successful OR partially-failed backup triggered by schedule "schedule-1"
velero restore create --from-schedule schedule-1 --allow-partially-failed

# create a restore for only persistentvolumeclaims and persistentvolumes within a backup
velero restore create --from-backup backup-2 --include-resources persistentvolumeclaims,persistentvolumes</code></pre><p>Instead of relying on this post, which will be outdated in a few months, do yourself a favor and just run the commands with the help argument to check all available options.</p><h3 id="cluster-migration">Cluster Migration</h3><p>As I mentioned at the beginning, I used Velero to migrate to a new cluster. Since all backups are in the S3 bucket, you can do a full restore fairly easily.<br><br>1. Install Velero on the new cluster, using the same config. If you are using kube2iam, you will have to install it as well. At this point, if you try to get backups on the new cluster, you should see the same data.<br><br>2. Create a manual backup on the old cluster and wait for it to finish: <code>velero -n kube-system backup create migration</code>.<br><br>3. Switch to the new cluster and do a full restore: <code>velero -n kube-system restore create --from-backup migration</code>.<br><br>Depending on what you run in the cluster, you might need to adjust a few more things, but all resources that you have in the old cluster should be in the new one as well.<br><br>A few more notes:</p><ul><li>If a resource already exists and differs from the one in the backup, Velero will not overwrite it. Instead, you should get a warning message.</li><li>Even if you remove a restore point from Velero with <code>velero restore delete</code>, Kubernetes resources will be left intact.</li><li>All resources after a restore will have the additional Velero labels <code>velero.io/backup-name</code> and <code>velero.io/restore-name</code>.</li><li>The pods that had persistent volumes attached will have new volumes created from snapshots. I suggest stopping stateful services in the original cluster before taking a backup, to capture all the data.</li></ul><h3 id="summary">Summary</h3><p>Velero is a must-have tool when running critical apps on a Kubernetes cluster.
This post is just a quick introduction to show you how it works, but feel free to explore all options and make it work for your particular use case.</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet" data-width="550"><p lang="en" dir="ltr">Check out my latest blog post on Kubernetes Velero <a href="https://t.co/IvQ0JkHrZp">https://t.co/IvQ0JkHrZp</a></p>&mdash; Alen (@alenkomljen) <a href="https://twitter.com/alenkomljen/status/1264288316007964672?ref_src=twsrc%5Etfw">May 23, 2020</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</figure>]]></content:encoded></item><item><title><![CDATA[Moving the Remote Terraform State Items]]></title><description><![CDATA[<p>Due to recent refactoring, I figured out that I need to move some Terraform state items from one S3 path to another. And then to merge configurations with other stuff at the destination directory. Terraform can <a href="https://www.terraform.io/docs/commands/state/mv.html">move state items around</a>, but this feature doesn't work with remote states. Here is</p>]]></description><link>https://akomljen.com/moving-the-remote-terraform-state-items/</link><guid isPermaLink="false">5e79e23d2a9f5400015aa3b3</guid><category><![CDATA[terraform]]></category><category><![CDATA[aws]]></category><category><![CDATA[tips]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sat, 28 Mar 2020 19:56:51 GMT</pubDate><media:content url="https://akomljen.com/content/images/2020/03/jordi-fernandez-9IdzBgFTHnk-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2020/03/jordi-fernandez-9IdzBgFTHnk-unsplash.jpg" alt="Moving the Remote Terraform State Items"><p>Due to recent refactoring, I figured out that I need to move some Terraform state items from one S3 path to another. And then to merge configurations with other stuff at the destination directory. Terraform can <a href="https://www.terraform.io/docs/commands/state/mv.html">move state items around</a>, but this feature doesn't work with remote states. Here is one way of doing it.</p><h3 id="example-use-case">Example Use Case</h3><p>First, let's consider the following situation, this is configuration directory tree output:</p><pre><code>.
├── db
│   └── test
│       ├── main.tf (s3 key: aws/db/test/terraform.tfstate)
│       └── rds.tf
├── test
│   ├── main.tf (s3 key: aws/test/terraform.tfstate)
│   ├── sqs.tf</code></pre><p>You want to merge <code>db/test</code> state items into <code>aws/test/terraform.tfstate</code> state and move Terraform files together to match the following directory structure:</p><pre><code>.
├── test
│   ├── main.tf (s3 key: aws/test/terraform.tfstate)
│   ├── sqs.tf
│   ├── rds.tf</code></pre><p><strong>NOTE:</strong> If you are using multiple workspaces, make sure they are the same when going through directories.<br><br>First, make local backups of both remote state files with the following command:</p><pre><code>terraform state pull &gt; terraform.tfstate
</code></pre><p>Then go to the directory whose state items you want to migrate, in this case, <code>db/test</code>. Run the following commands:</p><pre><code># List all available items
terraform state list
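# Example output, given the rds.tf above (hypothetical; depends on your config):
#   aws_rds_cluster.test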

# Move item from one state to another
terraform state mv -state-out=../test/terraform.tfstate aws_rds_cluster.test aws_rds_cluster.test</code></pre><p>Each time you run the <code>terraform state mv</code> command, Terraform automatically creates a backup of the state file as well. Keep in mind that all this is happening locally, regardless of how you configured the backend. Go to the consolidated directory <code>test</code> and push the state file to the remote:</p><pre><code>terraform state push terraform.tfstate</code></pre><p>Then, if you do <code>terraform state list</code> in this directory, you should see the moved state items. You can copy the remaining <code>.tf</code> files into this directory and make changes as needed.</p><p>If you want to move all state items with one command, you could use a simple bash one-liner:</p><pre><code>for i in $(terraform state list); do terraform state mv -state-out=../test/terraform.tfstate $i $i; done</code></pre><p>After you check that all is good, you can delete the local state files and all auto-created backups with <code>rm terraform.tfstate*</code>.</p><h3 id="summary">Summary</h3><p>Maybe there is a better way to do this, but I didn't want to spend too much time researching. I hope this article will help someone with a similar issue.</p>]]></content:encoded></item><item><title><![CDATA[Quick Working From Home Tips]]></title><description><![CDATA[<p>You will see hundreds of new <em>working from home tips</em> blog posts in the coming months. Most tech companies shifted to working from home (WFH) because of a global pandemic. We are all different, and what works for me will not work for you. 
I wrote this post to tell</p>]]></description><link>https://akomljen.com/quick-working-from-home-tips/</link><guid isPermaLink="false">5e75ec162a9f5400015aa159</guid><category><![CDATA[career]]></category><category><![CDATA[tips]]></category><category><![CDATA[wfh]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sun, 22 Mar 2020 12:57:16 GMT</pubDate><media:content url="https://akomljen.com/content/images/2020/03/christopher-gower-vjMgqUkS8q8-unsplash-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2020/03/christopher-gower-vjMgqUkS8q8-unsplash-1.jpg" alt="Quick Working From Home Tips"><p>You will see hundreds of new <em>working from home tips</em> blog posts in the coming months. Most tech companies shifted to working from home (WFH) because of a global pandemic. We are all different, and what works for me will not work for you. I wrote this post to tell you that working from home is not rocket science. All you need is some routine. Don't try to copy, or blindly follow the rules. Build your own. Here is my daily routine, what works for me, and what I discovered during the last four years. Keep in mind, some of those will not work during the global lockdown, but here you have it, to get some new ideas.</p><h3 id="my-daily-routine"><strong>My Daily Routine</strong></h3><p>I usually wake up at 7 am, brush my teeth, take a shower, eat a 1000 kcal breakfast, and then start my working day. But sometimes I break this routine by waking up and lying in bed for an hour, checking my phone for no reason. Yeah, it happens, and I'm aware that this is not a great way to start the day. I try to leave my phone far away from the bed because I don't want to grab it first thing in the morning.</p><p>While at home, I wear pants, a t-shirt, and socks, like on a typical day spent indoors. Wearing a suit will not make me more productive. I like to make myself comfortable, but not too comfortable. 
Yeah, I have had moments of working in pajamas. It doesn't work, period. Taking your pajamas off tells your brain that it's time for something new. I'm making this up, but that is how I see it.</p><p>I have built my working space in an extra room, with a standing desk. But I'm too lazy to work standing. I bought a comfortable and expensive chair; it would be a bad investment if I were standing all day. Jokes aside, a good chair and desk are essential, and standing up from time to time helps maintain a healthy lifestyle. Also, no matter how good your laptop is, your posture while working on a laptop is very bad for your back. If you don't have space or don't want to invest in a monitor, consider getting a laptop stand to bring the display up a little higher.</p><p>Lunchtime: Do not eat at your desk! That is my friendly advice. I have had my moments, but it is a bad habit. Make time for lunch even when working from home. My routine is to go to the kitchen, prepare a meal if I didn't do it the day before, eat, and take some rest after. Also, I try to stay away from the phone or the TV during my lunch break. Meditation is great, but I have a hard time sticking to it. And if I want a quick snack, I eat it while standing in the kitchen. Sometimes I go outside for lunch with my friends.</p><p>I take short breaks during the day. Yeah, coding is all fun, but I need to stay healthy. Walking around the home is an excellent way to think if you live alone. Also, I do some other stuff, like going to the grocery store or walking outside when the weather is nice.</p><p>Finishing a working day from home can be hard to do. You don't see other people leaving the office, and you don't need to catch public transport. Writing a sync of what I did that day, and a plan for the next one, helps me a lot. 
Then, going to the gym helps me stay away from my laptop for a few hours, which flips an internal switch telling my body that the working day has ended.</p><p>And last, do whatever makes you happy afterward. Talk to people, go out, enjoy your family time. Simple stuff done today will make you more productive tomorrow.</p><h3 id="summary"><strong>Summary</strong></h3><p>After you read all this, you will see that productivity has nothing to do with working from home. Some of those things are good habits when working in the office as well. Organized people will be fine working from home, no question. Some will even thrive more. Don't try to spice it up too much; it will go wrong eventually. Good luck to everyone!</p>]]></content:encoded></item><item><title><![CDATA[Alerting on Kubernetes Events with EFK Stack]]></title><description><![CDATA[<p>You probably care about gathering application logs only. Still, since the application is running on Kubernetes, you could get a lot of information about what is happening in the cluster by gathering events as well. Whatever happens inside the cluster, an event is recorded. You can check those events with</p>]]></description><link>https://akomljen.com/alerting-on-kubernetes-events-with-efk-stack/</link><guid isPermaLink="false">5d07cdf712a0ff0001f6ea2d</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[efk]]></category><category><![CDATA[alerting]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sun, 03 Nov 2019 07:09:45 GMT</pubDate><media:content url="https://akomljen.com/content/images/2019/10/jeremy-yap-PQWDsr78l8w-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2019/10/jeremy-yap-PQWDsr78l8w-unsplash.jpg" alt="Alerting on Kubernetes Events with EFK Stack"><p>You probably care about gathering application logs only. 
Still, since the application is running on Kubernetes, you could get a lot of information about what is happening in the cluster by gathering events as well. Whatever happens inside the cluster, an event is recorded. You can check those events with <code>kubectl get events</code>, but they are short-lived. To search or alert on a particular activity, you need to store them in a central place first. Now, let's see how to do that and then how to configure alerts.</p><h3 id="storing-events-in-elasticsearch">Storing Events In Elasticsearch</h3><p>The main requirement for this setup is the Elasticsearch cluster. If you don't know how to run the EFK stack on Kubernetes, I suggest that you go through my post <a href="https://akomljen.com/get-kubernetes-logs-with-efk-stack-in-5-minutes/"><em>Get Kubernetes Logs with EFK Stack in 5 Minutes</em></a> to learn more about it. If you already use my helm chart to deploy the EFK stack, you should know that <a href="https://github.com/komljen/helm-charts/commit/ff35598cba0696ba9ac30761be03f4339385da94">I improved it</a> and added a switch to enable gathering events as well. However, if you already have your "version" of the EFK cluster, you could install Elastic's <a href="https://www.elastic.co/products/beats/metricbeat">Metricbeat</a> agent and configure it to ship events to that cluster instead. So, assuming you already have an EFK stack, go ahead and install Metricbeat with helm:</p><pre><code>$ cat &gt; values-metricbeat.yaml&lt;&lt;EOF 
daemonset:
  enabled: false

deployment:
  config:
    setup.template.name: "kubernetes_events"
    setup.template.pattern: "kubernetes_events-*"
    output.elasticsearch:
      hosts: ["http://elasticsearch-efk-cluster:9200"]
      index: "kubernetes_events-%{[beat.version]}-%{+yyyy.MM.dd}"
    output.file:
      enabled: false
  modules:
    kubernetes:
      enabled: true
      config:
        - module: kubernetes
          metricsets:
            - event
EOF

$ helm install --name events \
    --namespace logging \
    -f values-metricbeat.yaml \
    stable/metricbeat
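
# Optional sanity check: exactly one Metricbeat pod should be running
# (the "app=metricbeat" label is an assumption based on the chart defaults)
$ kubectl get pods -n logging -l app=metricbeat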
</code></pre><p><strong>NOTE:</strong> Use your hostname in the above configuration.</p><p>Events are available through the Kubernetes API, and only one Metricbeat agent pod is enough to feed all events into Elasticsearch. The next step is to configure Kibana for the new index. Go to settings, set the index pattern to <code>kubernetes_events-*</code>, choose <code>@timestamp</code> as the time field, and Kibana is ready. In the Discover tab, you should see all the events from all namespaces in your Kubernetes cluster. You can search for events as needed.</p><p><strong>NOTE:</strong> Metricbeat adds quite a lot of fields, and by default, the Kibana wildcard search will not work as expected because it is limited to 1024 fields. You can still search a particular field, or increase the limit.</p><h3 id="configuring-alerts">Configuring Alerts</h3><p>Now that all events are indexed, you can send alerts when a particular query matches. After some research, I found <a href="https://github.com/Yelp/elastalert">ElastAlert</a> quite excellent and simple to configure. You can install it with helm as well, again pointing it to your Elasticsearch host:</p><pre><code>$ cat &gt; values-elastalert.yaml&lt;&lt;EOF 
replicaCount: 1

elasticsearch:
  host: elasticsearch-efk-cluster
  port: 9200

realertIntervalMins: "0"

rules:
  k8s_events_killing_pod: |-
    ---
    name: Kubernetes Events
    index: kubernetes_events-*

    type: any

    filter:
    - query:
        query_string:
          query: "kubernetes.event.message: Killing*probe*"
          analyze_wildcard: true

    alert:
    - "slack"
    alert_text_type: exclude_fields
    alert_text: |
      Event count {0}
      ```{1}```
      ```
      Kind - {2}
      Name - {3}
      ```
    alert_text_args:
    - kubernetes.event.count
    - kubernetes.event.message
    - kubernetes.event.involved_object.kind
    - kubernetes.event.involved_object.name

    slack:
    slack_title: Kubernetes Events
    slack_title_link: &lt;YOUR_KIBANA_URL_SAVED_SEARCH&gt;
    slack_webhook_url: &lt;YOUR_SLACK_URL&gt;
    slack_msg_color: warning
EOF

$ helm install --name efk-alerts \
    --namespace logging \
    -f values-elastalert.yaml \
    stable/elastalert</code></pre><p>In the above example, I configured ElastAlert to send an alert to a Slack channel when a pod gets killed because of a failing liveness probe. You need to set <code>slack_webhook_url</code> and <code>slack_title_link</code>. For the Slack title link, I usually put a saved Kibana search URL that matches the same query, <code>kubernetes.event.message: Killing*probe*</code>.</p><p>The ElastAlert instance can be used for other alerts as well, like matching particular application log messages. Just add a new alert rule to <code>values-elastalert.yaml</code> and upgrade the helm chart to configure it:</p><pre><code>$ helm upgrade efk-alerts \
    --namespace logging \
    -f values-elastalert.yaml \
    stable/elastalert
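
# Wait for the upgraded pod to roll out before expecting the new rule to fire
# (the deployment name assumes the "efk-alerts" release name used above)
$ kubectl rollout status -n logging deploy/efk-alerts-elastalert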
</code></pre><p>To learn more about all the options for ElastAlert, please check the <a href="https://elastalert.readthedocs.io/en/latest/">official documentation</a>. There are a lot of options and ways to configure it.</p><h3 id="summary">Summary</h3><p>This article was just a short introduction to the primary use case where you want to gather all Kubernetes events in one place and to send an alert when a particular circumstance happens. I found it very useful, and I hope it will help you as well. Stay tuned for the next one.</p>]]></content:encoded></item><item><title><![CDATA[Installing Kubernetes Dashboard per Namespace]]></title><description><![CDATA[<p>Even though I'm not a <a href="https://github.com/kubernetes/dashboard">Kubernetes Dashboard</a> user, I understand why it is the easiest way for most people to interact with their apps running on top of Kubernetes. If you are interacting with it daily or managing the cluster itself, you are probably more comfortable with the CLI, aka kubectl. Kubernetes</p>]]></description><link>https://akomljen.com/installing-kubernetes-dashboard-per-namespace/</link><guid isPermaLink="false">5cd974865ccf7f000176d636</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[addons]]></category><category><![CDATA[dashboard]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sun, 26 May 2019 12:05:22 GMT</pubDate><media:content url="https://akomljen.com/content/images/2019/05/ui-dashboard.png" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2019/05/ui-dashboard.png" alt="Installing Kubernetes Dashboard per Namespace"><p>Even though I'm not a <a href="https://github.com/kubernetes/dashboard">Kubernetes Dashboard</a> user, I understand why it is the easiest way for most people to interact with their apps running on top of Kubernetes. If you are interacting with it daily or managing the cluster itself, you are probably more comfortable with the CLI, aka kubectl. 
Kubernetes Dashboard is easy to install, but you might want to have it per namespace and to limit what users can do. Let's see how to install and configure it for this scenario.</p><h3 id="the-problem">The Problem</h3><p>The latest version of Kubernetes Dashboard, v2.0, runs without the <code>cluster-admin</code> role, which was too dangerous. On top of that, all secrets are explicitly created, and the ServiceAccount doesn't have permission to create any secrets. This is excellent news. But you probably want to stop users from touching anything that is not part of their namespace. Or even more, make it read-only and disable access to some sensitive info, like secrets.</p><h3 id="installation-and-configuration">Installation and Configuration</h3><p>Kubernetes Dashboard does have namespace support. Installing the dashboard is a pretty straightforward process. So, let's say you want to install it in the <code>default</code> namespace. First, create a custom config for the <a href="https://github.com/kubernetes/dashboard/tree/master/aio/deploy/helm-chart/kubernetes-dashboard">kubernetes-dashboard</a> helm chart:</p><pre><code>cat &gt; values-dashboard.yaml&lt;&lt;EOF 
extraArgs:
  - --system-banner="Test Cluster"
  - --namespace=default

metricsScraper:
  enabled: true
EOF</code></pre><p><strong>NOTE:</strong> If you have a single sign-on solution with ingress, you can set <code>--enable-skip-login</code> and <code>--enable-insecure-login</code> extra args to disable dashboard authentication. With ingress terminating SSL, you need to set <code>protocolHttp: true</code> as well.</p><p>In the above config, I enabled the metrics scraper. A metrics server must be running in the cluster for this to work. If you don't have it installed, you can enable it <a href="https://github.com/kubernetes/dashboard/blob/master/aio/deploy/helm-chart/kubernetes-dashboard/values.yaml#L153-L164">via config</a>.<br><br>Let's make this dashboard read-only by adding a custom role:</p><pre><code>cat &gt; kubernetes-dashboard-role.yaml&lt;&lt;EOF
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: dash-kubernetes-dashboard-read-only
rules:
- apiGroups:
  - "*"
  resources:
  - configmaps
  - cronjobs
  - daemonsets
  - deployments
  - events
  - ingresses
  - jobs
  - persistentvolumeclaims
  - persistentvolumes
  - pods
  - pods/log
  - replicasets
  - replicationcontrollers
  - services
  - statefulsets
  verbs:
  - get
  - list
  - watch
EOF

cat &gt; kubernetes-dashboard-rolebinding.yaml&lt;&lt;EOF
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: dash-kubernetes-dashboard-read-only
  labels:
    app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dash-kubernetes-dashboard-read-only
subjects:
- kind: ServiceAccount
  name: dash-kubernetes-dashboard
  namespace: default
EOF</code></pre><p><strong>NOTE:</strong> Set a preferred namespace - <code>default</code> in this case.</p><p>The last step is to apply all those resources and to install the dashboard:</p><pre><code>kubectl apply -f kubernetes-dashboard-role.yaml -n default
kubectl apply -f kubernetes-dashboard-rolebinding.yaml -n default
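# Optional: confirm the Role and RoleBinding exist before installing the chart
kubectl get role,rolebinding -n default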

helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
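# Refresh the local chart index so the newly added repo is available
helm repo update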

helm install --name dash \
  --namespace default \
  -f values-dashboard.yaml \
  kubernetes-dashboard/kubernetes-dashboard</code></pre><p>You can go ahead and check if the dashboard works as expected. If you try to list namespaces or check cluster-wide resources, you will get a <code>namespaces is forbidden</code> error message, which is what you wanted to achieve by running it per namespace. You can further adjust the <code>dash-kubernetes-dashboard-read-only</code> role to make it work for your use case.</p><h3 id="summary">Summary</h3><p>Installing a Kubernetes Dashboard per namespace and with limited options is a simple way of adding more visibility to the Kubernetes cluster without violating security. Stay tuned for the next one.</p>]]></content:encoded></item><item><title><![CDATA[Integrating AWS IAM and Kubernetes 
with kube2iam]]></title><description><![CDATA[<p>Containers deployed on top of Kubernetes sometimes requires easy access to AWS services. You have a few options to configure this. Most common is providing AWS access credentials to a particular pod or updating existing worker nodes IAM role with additional access rules. Pods in the AWS environment, by default,</p>]]></description><link>https://akomljen.com/integrating-aws-iam-and-kubernetes-with-kube2iam/</link><guid isPermaLink="false">5cd2d4395ccf7f000176d601</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[aws]]></category><category><![CDATA[security]]></category><category><![CDATA[iam]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sat, 18 May 2019 10:57:34 GMT</pubDate><media:content url="https://akomljen.com/content/images/2019/05/larry-farr-1211714-unsplash-crop.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2019/05/larry-farr-1211714-unsplash-crop.jpg" alt="Integrating AWS IAM and Kubernetes 
with kube2iam"><p>Containers deployed on top of Kubernetes sometimes require easy access to AWS services. You have a few options to configure this. The most common are providing AWS access credentials to a particular pod or updating the existing worker nodes' IAM role with additional access rules. Pods in the AWS environment, by default, have the same access rules as the underlying nodes. However, both solutions are a terrible practice, because there are projects that resolve this issue more elegantly. The two most popular are <a href="https://github.com/jtblin/kube2iam">kube2iam</a> and <a href="https://github.com/uswitch/kiam">KIAM</a>. They are pretty similar, but let's focus on kube2iam in this post.</p><h3 id="the-problem-and-a-solution">The Problem and a Solution</h3><p>I usually go ahead with installation and configuration, but you should understand <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html">AWS IAM</a> and the problem in environments like Kubernetes where containers share the underlying nodes. I am quoting a few sentences from <a href="https://github.com/jtblin/kube2iam/blob/master/README.md">the official kube2iam readme</a>.<br><br><em>Traditionally in AWS, service level isolation is done using IAM roles. IAM roles are attributed through instance profiles and are accessible by services through the transparent usage by the aws-sdk of the ec2 metadata API. When using the aws-sdk, a call is made to the EC2 metadata API which provides temporary credentials that are then used to make calls to the AWS service.</em></p><p>The problem with this approach is that you cannot isolate a particular container for access to some AWS service with IAM roles - shared nodes.</p><p><em>The solution is to redirect the traffic that is going to the ec2 metadata API for docker containers to a container running on each instance, make a call to the AWS API to retrieve temporary credentials and return these to the caller. 
Other calls will be proxied to the EC2 metadata API. This container will need to run with host networking enabled so that it can call the EC2 metadata API itself.</em></p><h3 id="installation-and-configuration">Installation and Configuration</h3><p>The tools that you need to follow this guide are <a href="https://github.com/helm/helm">helm</a> for installation and the AWS CLI for interacting with AWS. First, gather some info about your cluster to be able to configure the kube2iam pods. For EKS-based clusters, use <code>eni+</code> as the interface name. You can find the interface names for other CNI providers <a href="https://github.com/jtblin/kube2iam#iptables">here</a>. Also, to get the Amazon Resource Names (ARNs) from instance profiles, you can use this command:</p><pre><code>$ aws iam list-instance-profiles | jq -r '.InstanceProfiles[].Roles[].Arn'</code></pre><p>With output like this <code>arn:aws:iam::1234567890:role/test-worker-nodes-NodeInstanceRole-1W9NK0A56SMQ6</code>, the first part is the base role ARN <code>arn:aws:iam::1234567890:role/</code> and the second part is the node instance role name <code>test-worker-nodes-NodeInstanceRole-1W9NK0A56SMQ6</code>. You will need those below.</p><p>Here is the finalized config and installation command:</p><pre><code>$ cat &gt; values-kube2iam.yaml &lt;&lt;EOF
extraArgs:
  base-role-arn: arn:aws:iam::1234567890:role/
  default-role: kube2iam-default

host:
  iptables: true
  interface: "eni+"

rbac:
  create: true
EOF

$ helm install --name iam \
    --namespace kube-system \
    -f values-kube2iam.yaml \
    stable/kube2iam
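
# kube2iam runs as a DaemonSet with one pod per node; verify it is up
# (the "iam-kube2iam" name assumes the "iam" release name used above)
$ kubectl get daemonset -n kube-system iam-kube2iam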
</code></pre><p><strong>NOTE:</strong> iptables rules prevent containers from having direct access to EC2 metadata API. Please <a href="https://github.com/jtblin/kube2iam#iptables">read this part carefully</a> to understand what is happening in the background.</p><p><em>Kube2iam works by intercepting traffic from the containers to the EC2 Metadata API, calling the AWS Security Token Service (STS) API to obtain temporary credentials using the pod configured role, then using these temporary credentials to perform the original request.</em></p><p>You have to create a policy file to add permissions for AWS STS to assume roles on worker nodes:</p><pre><code>$ cat &gt; kube2iam-policy.json &lt;&lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "sts:AssumeRole"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:iam::1234567890:role/k8s-*"
      ]
    }
  ]
}
EOF

$ aws iam put-role-policy \
    --role-name test-worker-nodes-NodeInstanceRole-1W9NK0A56SMQ6 \
    --policy-name kube2iam \
    --policy-document file://kube2iam-policy.json</code></pre><p><strong>NOTE:</strong> When you create the roles that the pods can assume, they need to start with <code>k8s-</code>, and that is why I put a wildcard in the above policy.</p><p>If everything works as expected, curl command from a new pod to a metadata API, should return <code>kube2iam</code>:</p><pre><code>$ curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
kube2iam</code></pre><h3 id="real-world-examples">Real World Examples</h3><p>Let's see how to use kube2iam to give <a href="https://github.com/jetstack/cert-manager">Cert Manager</a> pods access to Route53 to manage records. DNS cluster issuer needs access to Route53 for DNS records validation. </p><p>First, you need to define the trust policy of the role to allow kube2iam (via the worker node IAM Instance Profile Role) to assume the pod role:</p><pre><code>$ cat &gt; node-trust-policy.json &lt;&lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1234567890:role/test-worker-nodes-NodeInstanceRole-1W9NK0A56SMQ6"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

$ aws iam create-role \
    --role-name k8s-cert-manager \
    --assume-role-policy-document \
    file://node-trust-policy.json</code></pre><p>Then define and attach Route53 policy to the above role name:</p><pre><code>$ cat &gt; route53-policy.json &lt;&lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "route53:GetChange",
      "Resource": "arn:aws:route53:::change/*"
    },
    {
      "Effect": "Allow",
      "Action": "route53:ChangeResourceRecordSets",
      "Resource": "arn:aws:route53:::hostedzone/*"
    },
    {
      "Effect": "Allow",
      "Action": "route53:ListHostedZonesByName",
      "Resource": "*"
    }
  ]
}
EOF

$ aws iam put-role-policy \
    --role-name k8s-cert-manager \
    --policy-name route53 \
    --policy-document file://route53-policy.json</code></pre><p>If you want to add some other services and to use kube2iam, reuse the existing node trust policy file to define a new role. For example, if you want to deploy <a href="https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler">Cluster Autoscaler</a>:</p><pre><code>$ aws iam create-role \
    --role-name k8s-cluster-autoscaler \
    --assume-role-policy-document \
    file://node-trust-policy.json</code></pre><p>Then define a new policy and attach it to the <code>k8s-cluster-autoscaler</code> role. <br><br>The last step is to configure pods to use a particular role by providing the annotation <code>iam.amazonaws.com/role: k8s-cert-manager</code> or <code>iam.amazonaws.com/role: k8s-cluster-autoscaler</code>, as defined in those examples.</p><p>Another useful feature of kube2iam is <a href="https://github.com/jtblin/kube2iam#namespace-restrictions">namespace restrictions</a>, but I'm sure you will figure it out after reading this post.</p><h3 id="summary">Summary</h3><p>When I was working with Kubernetes and AWS IAM roles for the first time, I spent more time than planned to figure it out. Maybe it was a lack of AWS IAM knowledge, but I hope that this guide will help you get started more easily. I also recommend trying KIAM before deciding which solution works best for you.</p>]]></content:encoded></item><item><title><![CDATA[An Easy Way to Track New Releases on GitHub]]></title><description><![CDATA[<p>As a software developer, you need to keep track of many projects/tools hosted on GitHub. While GitHub has a watch feature, I found it too noisy. I open GitHub notifications once a year, or maybe less. 
If you like GitHub notifications, great, you can watch a repo only</p>]]></description><link>https://akomljen.com/an-easy-way-to-track-new-releases-on-github/</link><guid isPermaLink="false">5c38a004265cfd00012d5dbd</guid><category><![CDATA[github]]></category><category><![CDATA[tips]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sat, 12 Jan 2019 13:35:24 GMT</pubDate><media:content url="https://akomljen.com/content/images/2019/01/andreas-klassen-401337-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2019/01/andreas-klassen-401337-unsplash.jpg" alt="An Easy Way to Track New Releases on GitHub"><p>As a software developer, you need to keep track of many projects/tools hosted on GitHub. While GitHub has a watch feature, I found it too noisy. I open GitHub notifications once a year, or maybe less. If you like GitHub notifications, great, you can watch a repo only for releases; <a href="https://github.com/isaacs/github/issues/410#issuecomment-442385230">they made this possible two months ago</a>. This blog post probably doesn't make much sense for you then.</p><p>However, I don't want to get a notification on new releases; I want a simple way to check if a new release came out for the projects that I use, when I want.</p><p>Background story: I switched from Firefox to Chrome a few years ago, when Firefox was so slow that I needed to ditch it. When I finally switched to Chrome, I immediately missed the <a href="https://support.mozilla.org/en-US/kb/live-bookmarks">Firefox live bookmarks</a> feature, to be able to subscribe to RSS/Atom feeds and read them from the bookmarks toolbar. For Chrome, I found the <a href="https://chrome.google.com/webstore/detail/foxish-live-rss/jpgagcapnkccceppgljfpoadahaopjdb">Foxish Live RSS</a> add-on, which does, well, almost the same job. 
So, you need to install this or a similar extension for this guide.</p><p>Now, go to <code>chrome://bookmarks/</code>, and in the upper right corner create a new folder; I called it GitHub releases. You can show the bookmarks toolbar under the URL bar so it is always visible, but I guess you already know how to do it.</p><p>So, how do you subscribe to GitHub releases? For example, I want to add the Kubernetes project. Go to the releases page of this project <a href="https://github.com/kubernetes/kubernetes/releases">https://github.com/kubernetes/kubernetes/releases</a>, and add <code>.atom</code> to the URL path, like this <a href="https://github.com/kubernetes/kubernetes/releases.atom">https://github.com/kubernetes/kubernetes/releases.atom</a>. Yes, GitHub supports Atom feeds. The new page pops up (if you have the Foxish Live RSS add-on installed):</p><figure class="kg-card kg-image-card"><img src="https://akomljen.com/content/images/2019/01/Screenshot-2019-01-11-15.19.24.png" class="kg-image" alt="An Easy Way to Track New Releases on GitHub"></figure><p>Choose a parent folder, GitHub releases in my case, and click on subscribe. Now, you can check if there is a new Kubernetes release with one click:</p><figure class="kg-card kg-image-card"><img src="https://akomljen.com/content/images/2019/01/Screenshot-2019-01-11-15.04.32.png" class="kg-image" alt="An Easy Way to Track New Releases on GitHub"></figure><p>You can click on the version number, and a new page opens with the release details. I hope this will help you to track some exciting projects and boost your productivity while having fewer notifications.</p>]]></content:encoded></item><item><title><![CDATA[AWS ALB Ingress Controller for Kubernetes]]></title><description><![CDATA[<p>More than one year ago CoreOS introduced AWS ALB (Application Load Balancer) support for Kubernetes. 
<em>This project was born out of Ticketmaster's tight relationship with CoreOS.</em> It was in an alpha state for a long time, so I waited for some beta/stable release to get my hands on it.</p>]]></description><link>https://akomljen.com/aws-alb-ingress-controller-for-kubernetes/</link><guid isPermaLink="false">5b76a52752f9e80001f4cdaa</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[ingress]]></category><category><![CDATA[aws]]></category><category><![CDATA[alb]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sun, 06 Jan 2019 18:54:52 GMT</pubDate><media:content url="https://akomljen.com/content/images/2019/01/mat-reding-695856-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2019/01/mat-reding-695856-unsplash.jpg" alt="AWS ALB Ingress Controller for Kubernetes"><p>More than one year ago, CoreOS introduced AWS ALB (Application Load Balancer) support for Kubernetes. <em>This project was born out of Ticketmaster's tight relationship with CoreOS.</em> It was in an alpha state for a long time, so I waited for some beta/stable release to get my hands on it. The project was donated to Kubernetes SIG-AWS on June 1, 2018, and now there is a lot more activity. A few months ago, the first stable version was released. 
Let's try the ALB ingress and see how it compares to the <a href="https://akomljen.com/kubernetes-nginx-ingress-controller/">Nginx ingress</a> or the more advanced <a href="https://akomljen.com/kubernetes-contour-ingress-controller-for-envoy-proxy/">Contour ingress</a>, which I wrote about in previous posts.</p><h2 id="how-does-it-work">How Does it Work?</h2><p>In this picture, you can see how the ALB ingress fits together with Kubernetes:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/master/docs/imgs/controller-design.png" class="kg-image" alt="AWS ALB Ingress Controller for Kubernetes"><figcaption>1) ALB ingress controller (https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/master/docs/imgs/controller-design.png)</figcaption></figure><p>And here is a <em>standard</em> ingress controller like Nginx for comparison:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://akomljen.com/content/images/2019/01/kubernetes_with_ingress_aws.png" class="kg-image" alt="AWS ALB Ingress Controller for Kubernetes"><figcaption>2) Standard ingress controller</figcaption></figure><p>The significant difference between the two is that a <em>standard</em> ingress runs in the cluster. Name-based routing and SSL termination happen inside a pod, which shares the same cluster resources as your app. The ELB/NLB (Network Load Balancer) acts just as a gateway to the outside world, and only the ingress controller service is connected to it.</p><p>In the case of ALB, you can clearly see that app services are exposed to nodes via NodePort and all routing happens inside the ALB. This also means that <a href="https://github.com/kubernetes-sigs/aws-alb-ingress-controller/issues/168">you cannot use cert-manager with the ALB ingress</a> to automatically get SSL certificates, for example, because the ALB is outside of the cluster's scope. 
Instead, you could use AWS certificates: create them in advance and just select which one to use. More on SSL with ALB <a href="https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/master/docs/guide/ingress/annotation.md#ssl">here</a>.</p><p>Managing ALBs is automatic, and you only need to define your ingress resources as you typically would. The ALB ingress controller pod, which runs inside the Kubernetes cluster, communicates with the Kubernetes API and does all the work. However, this pod is only a control plane; it doesn't do any proxying.</p><p>Keep in mind that the ALB is a layer 7 load balancer, so there is no TCP here. If you want TCP capabilities, you could create an NLB and put it in front of the ALB. Putting an NLB in front also helps if you want static IPs, but it is a manual step.</p><h2 id="deployment">Deployment</h2><p>Let's deploy the ALB ingress controller with Helm:</p><pre><code>$ helm repo add akomljen-charts https://raw.githubusercontent.com/komljen/helm-charts/master/charts/

$ helm install --name=alb \
    --namespace ingress \
    --set-string autoDiscoverAwsRegion=true \
    --set-string autoDiscoverAwsVpcID=true \
    --set clusterName=k8s.test.akomljen.com \
    --set extraEnv.AWS_ACCESS_KEY_ID=&lt;YOUR_ACCESS_KEY&gt; \
    --set extraEnv.AWS_SECRET_ACCESS_KEY=&lt;YOUR_SECRET_KEY&gt; \
    akomljen-charts/alb-ingress
</code></pre><p><strong>NOTE:</strong> Keep in mind that using AWS access and secret keys is not a good idea for production. Check this post - <a href="https://akomljen.com/integrating-aws-iam-and-kubernetes-with-kube2iam/">Integrating AWS IAM and Kubernetes with kube2iam</a>.</p><p>After a few minutes, the ALB controller should be up and running:</p><pre><code>$ kubectl get pods -l "app=alb-ingress,release=alb" -n ingress
NAME                               READY   STATUS    RESTARTS   AGE
alb-alb-ingress-5bcd44fb59-mtf65   1/1     Running   0          1m</code></pre><p>The actual ALB will not be created until you create an ingress object, which is expected. Let's create a sample app:</p><pre><code>$ cat &gt; sample-app.yaml &lt;&lt;EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blog
spec:
  selector:
    matchLabels:
      app: blog
  replicas: 3
  template:
    metadata:
      labels:
        app: blog
    spec:
      containers:
      - name: blog
        image: dockersamples/static-site
        env:
        - name: AUTHOR
          value: blog
        ports:
        - containerPort: 80
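        # (Not in the original manifest - optional sketch.) A readiness probe
        # keeps traffic away from pods that are not ready yet; the "/" path
        # is an assumption for this sample image:
        # readinessProbe:
        #   httpGet:
        #     path: /
        #     port: 80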
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blog
  name: blog
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: blog
EOF

$ kubectl create -f sample-app.yaml
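
# (Optional, not in the original post) wait for the rollout to complete
# before inspecting the resources:
$ kubectl rollout status deployment/blog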

$ kubectl get all --selector=app=blog
NAME                        READY   STATUS    RESTARTS   AGE
pod/blog-696457695f-6h9h2   1/1     Running   0          21s
pod/blog-696457695f-bws5v   1/1     Running   0          21s
pod/blog-696457695f-qqc8h   1/1     Running   0          21s

NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/blog   ClusterIP   100.67.82.181   &lt;none&gt;        80/TCP    20s

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/blog-696457695f   3         3         3       21s</code></pre><p>The app's service<strong> needs to be exposed as NodePort</strong> (picture 1) for the ALB to function properly.</p><p>When creating an ALB ingress resource, you need to specify at least two subnets using the <code>alb.ingress.kubernetes.io/subnets</code> annotation. You could also rely on subnet auto-discovery, but then you need to tag your subnets with:</p><ul><li><code>kubernetes.io/cluster/&lt;CLUSTER_NAME&gt;: owned</code></li><li><code>kubernetes.io/role/internal-elb: 1</code> (for internal ELB)</li><li><code>kubernetes.io/role/elb: 1</code> (for external ELB)</li></ul><p>If you deployed your Kubernetes cluster with <a href="https://github.com/kubernetes/kops">kops</a>, the subnet tags already exist and you should be fine. By default, the ALB will be internal, so you need to add the <code>alb.ingress.kubernetes.io/scheme: internet-facing</code> annotation if you want to access the app externally. Check <a href="https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/ingress/annotation/">the full list of annotations</a> supported by the ALB ingress to suit your needs. Now, let's create the ALB ingress resource for the above app:</p><pre><code>$ cat &gt; ingress.yaml &lt;&lt;EOF
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: blog
  labels:
    app: blog
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
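    # A few other commonly used ALB annotations, for reference (placeholder
    # values, not part of the original example - adjust or drop):
    # alb.ingress.kubernetes.io/subnets: subnet-aaaa1111,subnet-bbbb2222
    # alb.ingress.kubernetes.io/target-type: instance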
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: blog
              servicePort: 80
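    # For host-based routing instead of the catch-all rule above, a rule
    # would look like this (hypothetical domain, not in the original example):
    # - host: blog.example.com
    #   http:
    #     paths:
    #       - path: /*
    #         backend:
    #           serviceName: blog
    #           servicePort: 80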
EOF

$ kubectl apply -f ingress.yaml

$ kubectl get ingress -o wide
NAME   HOSTS   ADDRESS                                                             PORTS   AGE
blog   *       f5427a54-default-blog-c930-1940190599.eu-west-1.elb.amazonaws.com   80      5m21s</code></pre><p><strong>NOTE:</strong> If you specify a host for the ingress, you need to add the ALB address to Route53 to be able to access it externally. Or, deploy <a href="https://github.com/kubernetes-incubator/external-dns">external DNS</a> to manage Route53 records automatically, which is recommended.</p><p>The sample app should be available using the above ingress address.</p><h3 id="one-alb-for-all-hosts">One ALB for All Hosts</h3><p>For each additional ALB ingress resource, a completely new ALB will be created. You may want as few ALBs as possible for all ingresses instead of a 1-to-1 mapping - <a href="https://github.com/kubernetes-sigs/aws-alb-ingress-controller/issues/298">check this issue for more details</a>. Luckily, there is a workaround you could use: the <a href="https://github.com/jakubkulhan/ingress-merge">ingress merge controller</a>.</p><p>This is how it works:</p><ol><li>You create ingress objects as usual, but they need to be annotated with <code>kubernetes.io/ingress.class: merge</code> and <code>merge.ingress.kubernetes.io/config: &lt;CONFIG_MAP_NAME&gt;</code>, where you specify a config map name that holds the annotations for the resulting ingress</li><li>The merge controller watches for ingress resources annotated with <code>kubernetes.io/ingress.class: merge</code> and, using the defined config map, merges them into a new ingress object</li></ol><p>Let's deploy the ingress merge controller and try it:</p><pre><code>$ helm repo add akomljen-charts https://raw.githubusercontent.com/komljen/helm-charts/master/charts/

$ helm install --name imc \
    --namespace ingress \
    akomljen-charts/merge-ingress

$ kubectl get pods --selector=app=merge-ingress -n ingress
NAME                                 READY   STATUS    RESTARTS   AGE
imc-merge-ingress-579fcd6f54-kdg8f   1/1     Running   0          55s</code></pre><p>First, you need to create a config map that holds the annotations for the resulting ingress:</p><pre><code>$ cat &gt; merged-ingress-cm.yaml &lt;&lt;EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: blog-ingress
data:
  annotations: |
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
EOF

$ kubectl apply -f merged-ingress-cm.yaml</code></pre><p>Then you can delete the existing ingress created in the previous step and create a new one, but now with the <code>kubernetes.io/ingress.class: merge</code> and <code>merge.ingress.kubernetes.io/config: blog-ingress</code> annotations:</p><pre><code>$ kubectl delete -f ingress.yaml

$ cat &gt; merged-ingress.yaml &lt;&lt;EOF
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: blog
  labels:
    app: blog
  annotations:
    kubernetes.io/ingress.class: merge
    merge.ingress.kubernetes.io/config: blog-ingress
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: blog
              servicePort: 80
EOF

$ kubectl create -f merged-ingress.yaml

$ kubectl get ingress -o wide
NAME           HOSTS   ADDRESS                                                                 PORTS   AGE
blog           *       f5427a54-default-blogingre-b47d-460392593.eu-west-1.elb.amazonaws.com   80      48s
blog-ingress   *       f5427a54-default-blogingre-b47d-460392593.eu-west-1.elb.amazonaws.com   80      48s
</code></pre><p>Seeing two ingresses with the same ALB address is confusing, but the merge ingress controller is just propagating the status of the merged ingress <code>blog-ingress</code> to the <code>blog</code> ingress.</p><p>The <em>downside</em> of using the ingress merge controller is that all ingresses share the same annotations defined in the config map. However, you can create more config maps, one per ALB <em>ingress group</em>. Then you have the flexibility of using just enough ALBs to cover all groups of ingress resources. Also, you still need a separate ALB for each namespace.</p><h3 id="summary">Summary</h3><p>Here you go, a new option to expose your services when running a Kubernetes cluster on AWS. I hope this post helps you decide which ingress controller to use and understand the significant differences between them.</p>]]></content:encoded></item><item><title><![CDATA[10 Most Read Kubernetes Articles on My Blog in 2018]]></title><description><![CDATA[<p>Let's start this year with some stats from the last one, 2018. Probably 99% of the articles on this blog are Kubernetes related. I wrote 28 articles in 2018 which is good, but my goal was 50 actually. I think that Kubernetes adoption in 2019 will grow, at least stats</p>]]></description><link>https://akomljen.com/10-most-read-kubernetes-articles-on-my-blog/</link><guid isPermaLink="false">5c2a3c2be3d7ed0001673e05</guid><category><![CDATA[year in review]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Tue, 01 Jan 2019 19:37:41 GMT</pubDate><media:content url="https://akomljen.com/content/images/2019/01/mpho-mojapelo-127562-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2019/01/mpho-mojapelo-127562-unsplash.jpg" alt="10 Most Read Kubernetes Articles on My Blog in 2018"><p>Let's start this year with some stats from the last one, 2018. Probably 99% of the articles on this blog are Kubernetes related. 
I wrote 28 articles in 2018, which is good, but my goal was actually 50. I think that Kubernetes adoption will grow in 2019; at least the stats from my blog show that, and you can see the same in Google Trends:</p><figure class="kg-card kg-image-card"><img src="https://akomljen.com/content/images/2019/01/Screenshot-2019-01-01-18.51.20.png" class="kg-image" alt="10 Most Read Kubernetes Articles on My Blog in 2018"></figure><p>Before listing the 10 most read Kubernetes articles on my blog in 2018, I will share some stats from Google Analytics.</p><h3 id="stats-from-google-analytics">Stats from Google Analytics</h3><p>Most of the users on this blog come from organic search, 74.8% of them; 13.4% are direct, 7.1% referral, and 4.7% come from social networks, email, and other channels:</p><figure class="kg-card kg-image-card"><img src="https://akomljen.com/content/images/2019/01/Screenshot-2019-01-01-18.56.09.png" class="kg-image" alt="10 Most Read Kubernetes Articles on My Blog in 2018"></figure><p>This blog was visited by <strong>146,367 unique users</strong> from all over the world. 
Here are the top 10 countries:</p><figure class="kg-card kg-image-card"><img src="https://akomljen.com/content/images/2019/01/Screenshot-2019-01-01-20.35.39.png" class="kg-image" alt="10 Most Read Kubernetes Articles on My Blog in 2018"></figure><h3 id="10-most-read-kubernetes-articles-in-2018">10 Most Read Kubernetes Articles in 2018</h3><p>Now let's see which articles are the most visited:</p><ol><li><a href="https://akomljen.com/set-up-a-jenkins-ci-cd-pipeline-with-kubernetes/">Set Up a Jenkins CI/CD Pipeline with Kubernetes</a> - 43,717 (13.85%)</li><li><a href="https://akomljen.com/get-kubernetes-cluster-metrics-with-prometheus-in-5-minutes/">Get Kubernetes Cluster Metrics with Prometheus in 5 Minutes</a> - 29,744 (9.43%)</li><li><a href="https://akomljen.com/kubernetes-nginx-ingress-controller/">Kubernetes Nginx Ingress Controller</a> - 28,528 (9.04%)</li><li><a href="https://akomljen.com/kubernetes-persistent-volumes-with-deployment-and-statefulset/">Kubernetes Persistent Volumes with Deployment and StatefulSet</a> - 21,092 (6.68%)</li><li><a href="https://akomljen.com/get-kubernetes-logs-with-efk-stack-in-5-minutes/">Get Kubernetes Logs with EFK Stack in 5 Minutes</a> - 17,151 (5.44%)</li><li><a href="https://akomljen.com/kubernetes-service-mesh/">Kubernetes Service Mesh</a> - 15,589 (4.94%)</li><li><a href="https://akomljen.com/kubernetes-environment-variables/">Kubernetes Environment Variables</a> - 13,843 (4.39%)</li><li><a href="https://akomljen.com/get-automatic-https-with-lets-encrypt-and-kubernetes-ingress/">Get Automatic HTTPS with Let's Encrypt and Kubernetes Ingress</a> - 12,930 (4.10%)</li><li><a href="https://akomljen.com/rook-cloud-native-on-premises-persistent-storage-for-kubernetes-on-kubernetes/">Rook: Cloud Native On-Premises Persistent Storage for Kubernetes on Kubernetes</a> - 10,305 (3.27%)</li><li><a href="https://akomljen.com/kubernetes-cluster-autoscaling-on-aws/">Kubernetes Cluster Autoscaling on AWS</a> - 9,350 
(2.96%)</li></ol><p>Please keep in mind that some posts were published earlier, so they naturally have more visits. For example, the first on the list was published in February, while the second came out two months later.</p><p>I wish you a happy new year and a smooth transition to Kubernetes if that is your tool of choice!</p>]]></content:encoded></item><item><title><![CDATA[Kubernetes API Resources: Which Group and Version to Use?]]></title><description><![CDATA[Find out which "kind" and "apiVersion" to use next time you see a Kubernetes resource definition.]]></description><link>https://akomljen.com/kubernetes-api-resources-which-group-and-version-to-use/</link><guid isPermaLink="false">5c00f236d4abf300011f2bd4</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[api]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Fri, 30 Nov 2018 13:43:23 GMT</pubDate><media:content url="https://akomljen.com/content/images/2018/11/javier-allegue-barros-761133-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2018/11/javier-allegue-barros-761133-unsplash.jpg" alt="Kubernetes API Resources: Which Group and Version to Use?"><p>Kubernetes uses a declarative API, which makes the system more robust. But this means that we create an object using the CLI or REST to represent what we want the system to do. For that representation, we need to define things like the API resource name, group, and version. But users get confused. The main reason for the confusion is that we as humans are not good at remembering things like this. In one deployment definition you could see <code>apiVersion: apps/v1beta2</code>, and in another <code>apiVersion: apps/v1</code>. Which one is correct? Which one should you use? How do you check which are supported on your Kubernetes cluster? 
Those are all valid questions, and I will try to answer them using a simple trick: <code>kubectl</code>.</p><h3 id="api-resources">API Resources</h3><p>You can get all API resources supported by your Kubernetes cluster using this command:</p><pre><code>$ kubectl api-resources -o wide
NAME                              SHORTNAMES   APIGROUP                       NAMESPACED   KIND                             VERBS
bindings                                                                      true         Binding                          [create]
componentstatuses                 cs                                          false        ComponentStatus                  [get list]
configmaps                        cm                                          true         ConfigMap                        [create delete deletecollection get list patch update watch]
endpoints                         ep                                          true         Endpoints                        [create delete deletecollection get list patch update watch]
events                            ev                                          true         Event                            [create delete deletecollection get list patch update watch]
limitranges                       limits                                      true         LimitRange                       [create delete deletecollection get list patch update watch]
namespaces                        ns                                          false        Namespace                        [create delete get list patch update watch]
nodes                             no                                          false        Node                             [create delete deletecollection get list patch proxy update watch]
persistentvolumeclaims            pvc                                         true         PersistentVolumeClaim            [create delete deletecollection get list patch update watch]
persistentvolumes                 pv                                          false        PersistentVolume                 [create delete deletecollection get list patch update watch]
pods                              po                                          true         Pod                              [create delete deletecollection get list patch proxy update watch]
podtemplates                                                                  true         PodTemplate                      [create delete deletecollection get list patch update watch]
replicationcontrollers            rc                                          true         ReplicationController            [create delete deletecollection get list patch update watch]
resourcequotas                    quota                                       true         ResourceQuota                    [create delete deletecollection get list patch update watch]
secrets                                                                       true         Secret                           [create delete deletecollection get list patch update watch]
serviceaccounts                   sa                                          true         ServiceAccount                   [create delete deletecollection get list patch update watch]
services                          svc                                         true         Service                          [create delete get list patch proxy update watch]
mutatingwebhookconfigurations                  admissionregistration.k8s.io   false        MutatingWebhookConfiguration     [create delete deletecollection get list patch update watch]
validatingwebhookconfigurations                admissionregistration.k8s.io   false        ValidatingWebhookConfiguration   [create delete deletecollection get list patch update watch]
customresourcedefinitions         crd          apiextensions.k8s.io           false        CustomResourceDefinition         [create delete deletecollection get list patch update watch]
apiservices                                    apiregistration.k8s.io         false        APIService                       [create delete deletecollection get list patch update watch]
controllerrevisions                            apps                           true         ControllerRevision               [create delete deletecollection get list patch update watch]
daemonsets                        ds           apps                           true         DaemonSet                        [create delete deletecollection get list patch update watch]
deployments                       deploy       apps                           true         Deployment                       [create delete deletecollection get list patch update watch]
replicasets                       rs           apps                           true         ReplicaSet                       [create delete deletecollection get list patch update watch]
statefulsets                      sts          apps                           true         StatefulSet                      [create delete deletecollection get list patch update watch]
...</code></pre><p>I trimmed the output as there are many of them. There is a lot of useful information here; let's explain the interesting columns:</p><ul><li>SHORTNAMES - you can use those shortcuts with <code>kubectl</code></li><li>APIGROUP - <a href="https://kubernetes.io/docs/reference/using-api/#api-groups">check the official docs to learn more</a>, but in short, you will use it like this: <code>apiVersion: &lt;APIGROUP&gt;/v1</code> in YAML files</li><li>KIND - the resource name</li><li>VERBS - available methods, also useful when you want to define <code>ClusterRole</code> RBAC rules</li></ul><p>You can also get the API resources for a particular API group, for example:</p><pre><code>$ kubectl api-resources --api-group apps -o wide
NAME                  SHORTNAMES   APIGROUP   NAMESPACED   KIND                 VERBS
controllerrevisions                apps       true         ControllerRevision   [create delete deletecollection get list patch update watch]
daemonsets            ds           apps       true         DaemonSet            [create delete deletecollection get list patch update watch]
deployments           deploy       apps       true         Deployment           [create delete deletecollection get list patch update watch]
replicasets           rs           apps       true         ReplicaSet           [create delete deletecollection get list patch update watch]
statefulsets          sts          apps       true         StatefulSet          [create delete deletecollection get list patch update watch]</code></pre><p>For each of those kinds you could use <code>kubectl explain</code> to get more info about the particular resource:</p><pre><code>$ kubectl explain configmap
KIND:     ConfigMap
VERSION:  v1

DESCRIPTION:
     ConfigMap holds configuration data for pods to consume.

FIELDS:
   apiVersion	&lt;string&gt;
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/api-conventions.md#resources

   data	&lt;map[string]string&gt;
     Data contains the configuration data. Each key must consist of alphanumeric
     characters, '-', '_' or '.'.

   kind	&lt;string&gt;
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds

   metadata	&lt;Object&gt;
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata</code></pre><p>Please note that explain may show an old group/version, but you can explicitly set it with <code>--api-version</code>, for example, <code>kubectl explain replicaset --api-version apps/v1</code>. Thanks <a href="https://twitter.com/markoluksa">@markoluksa</a> for the tip!</p><h3 id="api-versions">API Versions</h3><p>You can also get all API versions supported by your cluster using this command:</p><pre><code>$ kubectl api-versions
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1beta1
apps/v1
apps/v1beta1
apps/v1beta2
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
authorization.k8s.io/v1beta1
autoscaling/v1
autoscaling/v2beta1
batch/v1
batch/v1beta1
certificates.k8s.io/v1beta1
certmanager.k8s.io/v1alpha1
enterprises.upmc.com/v1
events.k8s.io/v1beta1
extensions/v1beta1
metrics.k8s.io/v1beta1
monitoring.coreos.com/v1
networking.k8s.io/v1
policy/v1beta1
rbac.authorization.k8s.io/v1
rbac.authorization.k8s.io/v1beta1
storage.k8s.io/v1
storage.k8s.io/v1beta1
v1</code></pre><p>The output is presented in the form "group/version". <a href="https://kubernetes.io/docs/reference/using-api/#api-versioning">Check this page</a> to learn more about API versioning in Kubernetes.<br><br>Sometimes, you just want to check whether a particular group/version is available for some resource. Most resources support the <code>get</code> verb, so just try to get a resource while providing the API version and group: <code>kubectl get &lt;API_RESOURCE_NAME&gt;.&lt;API_VERSION&gt;.&lt;API_GROUP&gt;</code>. For example:</p><pre><code>$ kubectl get deployments.v1.apps -n kube-system
NAME                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
autoscaler-aws-cluster-autoscaler       1         1         1            1           55d
calico-kube-controllers                 1         1         1            1           317d
calico-policy-controller                0         0         0            0           317d
dns-controller                          1         1         1            1           317d
elasticsearch-operator                  1         1         1            1           274d
hpa-hpa-operator                        1         1         1            1           60d
hpa-metrics-server                      1         1         1            1           60d
kube-dns                                2         2         2            2           317d
kube-dns-autoscaler                     1         1         1            1           317d
spot-rescheduler-k8s-spot-rescheduler   1         1         1            1           136d
tiller-deploy                           1         1         1            1           315d</code></pre><p>You will get an error if the resource doesn't exist with the specified group/version combination, or if the resource doesn't exist at all.</p><h3 id="summary">Summary</h3><p>This article will help you understand those two lines, kind and apiVersion, next time you see them in YAML. If you want to learn more about Kubernetes design, I recommend <a href="https://thenewstack.io/kubernetes-design-and-development-explained/">checking this post</a>. Stay tuned for the next one!</p>]]></content:encoded></item><item><title><![CDATA[How to Run Dev.to on Kubernetes]]></title><description><![CDATA[<p>Recently I was checking <a href="https://dev.to">Dev.to</a> community. I must say, I really like how the application looks, clean and simple. And more important I like the community there. I also started to republish some posts because I want to show Kubernetes to the larger audience, preferably developers. But, any time</p>]]></description><link>https://akomljen.com/how-to-run-dev-to-on-kubernetes/</link><guid isPermaLink="false">5bf49689e398770001775889</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[devto]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sun, 25 Nov 2018 19:31:57 GMT</pubDate><media:content url="https://akomljen.com/content/images/2018/11/rawpixel-778699-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2018/11/rawpixel-778699-unsplash.jpg" alt="How to Run Dev.to on Kubernetes"><p>Recently, I was checking out the <a href="https://dev.to">Dev.to</a> community. I must say, I really like how the application looks: clean and simple. And more importantly, I like the community there. I also started to republish some posts because I want to show Kubernetes to a larger audience, preferably developers. But any time I check something new, I get new ideas. 
This time, I saw that Dev.to is open source and thought it would be pretty interesting for people to see how to run it on Kubernetes. I will explain how I do it as an experienced Kubernetes user. So, let's start!</p><h3 id="check-the-installation-docs">Check the Installation Docs</h3><p>Kind of obvious: this is the first step. Look into the repo docs and getting started guides, and check for Dockerfiles and other goodies. If the repo has a Dockerfile available, great, you will save some time on building the Docker image yourself.</p><p>I also found one Dev.to image on DockerHub, but it is old and I will not use it. The Dockerfile in the repo was last updated a month ago at the time of writing, which is good. I will just have to build the image and store it under my namespace on <a href="https://hub.docker.com/">DockerHub</a>. You can use any Docker registry; it doesn't matter.</p><p>Also, I will need to run PostgreSQL. I know that there is an image available for sure, so I went right ahead and checked whether there is also a Helm chart available. I will use Helm to package this app for easier installation on Kubernetes. I have written about Helm before, so you might check it out - <a href="https://akomljen.com/package-kubernetes-applications-with-helm/">Package Kubernetes Applications with Helm</a>. Seems like there is an <a href="https://github.com/helm/charts/tree/master/stable/postgresql">upstream PostgreSQL chart</a>, so less work for me 😎.</p><p>Helm is not a must-have, but it is how I like to run apps on Kubernetes.</p><h3 id="build-docker-image">Build Docker Image</h3><p>I see that there is a <a href="https://github.com/thepracticaldev/dev.to#docker-installation-beta">getting started guide for Docker</a> in the repo. This is good; if you can run and test it using docker-compose, it will be easier to migrate to Kubernetes later. 
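</p><p>As a rough sketch, that local smoke test boils down to something like this (assuming the docker-compose setup shipped in the repo; the exact steps in the guide may differ):</p><pre><code>$ git clone https://github.com/thepracticaldev/dev.to
$ cd dev.to
$ docker-compose up</code></pre><p>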
Also, you can learn a lot about the app architecture just by looking into the Dockerfile, docker-compose.yaml, and other config files.</p><p>While checking the available Dockerfile I figured out that it will not work for Kubernetes. Not because it needs to be special, but because this Dockerfile is for development only and assumes that you will mount the local repo data into the container. I will have to clone the repo inside the Dockerfile to get the code. Here is the updated Dockerfile:</p><pre><code>FROM ruby:2.5.3

# Add nodejs and yarn package repositories
RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add -
RUN curl -sL https://deb.nodesource.com/setup_8.x | bash
RUN echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list

# Install dependencies and perform clean-up
RUN apt-get update -qq &amp;&amp; apt-get install -y \
   build-essential \
   nodejs \
   yarn \
 &amp;&amp; apt-get -q clean \
 &amp;&amp; rm -rf /var/lib/apt/lists

RUN git clone https://github.com/thepracticaldev/dev.to /usr/src/app
WORKDIR /usr/src/app
ENV RAILS_ENV development

# Installing Ruby dependencies
RUN gem install bundler
RUN bundle install --jobs 20 --retry 5

# Install JavaScript dependencies
ENV YARN_INTEGRITY_ENABLED "false"
RUN yarn install &amp;&amp; yarn check --integrity
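
# Document the port the Rails dev server listens on (matches the CMD below)
EXPOSE 3000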

ENTRYPOINT ["bundle", "exec"]

CMD ["rails", "server", "-b", "0.0.0.0", "-p", "3000"]
</code></pre><p><strong>NOTE:</strong> This Dockerfile is not good for production. I'm well aware of its size and other issues, but that is not the point of this blog post.</p><p>And here is the diff compared to the original Dockerfile in the repo:</p><pre><code>&lt; RUN git clone https://github.com/thepracticaldev/dev.to /usr/src/app
&gt; COPY Gemfile* ./
&gt; COPY yarn.lock ./
</code></pre><p>As you can see, just a simple change. Let's build the Dev.to docker image:</p><pre><code>$ docker build -t komljen/devto:latest .
$ docker push komljen/devto:latest</code></pre><p>Building the image will take some time. The image is a little on the large side, 1.76GB, but that's ok. Don't go into optimizations at this point.</p><p>I always skip some steps from getting started guides. I'm definitely not proud of that. I didn't even try to follow the instructions, but that is how I learn, the hard way.</p><h3 id="time-for-kubernetes">Time for Kubernetes</h3><p>And here is the tricky part, but only because I need to learn more about how everything is glued together for the app to work correctly. After some research, I think I got it. This is how it will look:</p><ul><li>Create a new helm chart for the Dev.to frontend</li><li>Put PostgreSQL as a requirement, so that when you install the Dev.to chart, PG will be installed as well. PG will be persistent, of course</li><li>Set default values for both PG and Dev.to in one values.yaml file</li><li>Run rails <code>db:setup</code> as an init container on install; on upgrade, it becomes rails <code>db:migrate</code>. It would be nicer if I could run only migrate</li><li>For config, create a config map and a secret. Both will be available as env variables in the container</li><li>For DB access, simply use the config map/secret from the PG chart</li><li>Worker jobs will run in a separate container, but in the same pod</li></ul><p>And after some trial and error, here is the <a href="https://github.com/komljen/helm-charts/tree/master/devto">Dev.to helm chart</a>. Most of the issues I had during the development of this chart were because I don't have much experience with Ruby apps, and especially not with running Dev.to. Also, I found it very difficult to run the app in production mode because of too many dependencies on external services. I gave up on that after I saw 30+ API keys missing. 
The app is definitely not developed with Kubernetes or containers in mind, and some changes might be required to make it a first-class citizen.</p><p>I will not go into the details of this helm chart, it would be too much for a single post, but here are some tips for developing a new chart.</p><p>When writing a new helm chart I often reuse an existing one that I wrote before, and just adjust what is needed for the new app. You could also run <code>helm create &lt;chart_name&gt;</code> and helm will prepare some templates to begin with. During testing, I often use the <code>helm template</code> command, which renders templates without using <em>Tiller</em> (the server component of helm running in the Kubernetes cluster). You could also use the <code>--dry-run --debug</code> options to see the rendered files with <code>helm install</code>.</p><h3 id="installation">Installation</h3><p>First, prepare your values file to match your environment. You only need Algolia Search API access for the development environment, and you can find instructions for setting up a new account <a href="https://docs.dev.to/backend/algolia/">here</a>.</p><p>Now, you can install the chart:</p><pre><code>$ cat &gt; myvalues.yaml &lt;&lt;EOF
secret:
  ALGOLIASEARCH_API_KEY: &lt;YOUR_API_KEY&gt;
  ALGOLIASEARCH_APPLICATION_ID: &lt;YOUR_APPLICATION_ID&gt;
  ALGOLIASEARCH_SEARCH_ONLY_KEY: &lt;YOUR_SEARCH_ONLY_KEY&gt;

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: "true"
  hosts:
    - devto.test.akomljen.com
  tls:
   - secretName: devto
     hosts:
       - devto.test.akomljen.com
EOF

$ helm repo add akomljen-charts https://raw.githubusercontent.com/komljen/helm-charts/master/charts/      
      
$ helm install --name dev \
    -f myvalues.yaml \
    akomljen-charts/devto
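
# Later, when you change values, upgrade the release; per the chart design
# described above, the init container then runs rails db:migrate instead
# of db:setup:
$ helm upgrade dev -f myvalues.yaml akomljen-charts/devto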

$ kubectl get po
NAME                        READY     STATUS    RESTARTS   AGE
dev-devto-d7f847969-dbtkq   2/2       Running   0          3m
dev-postgresql-0            1/1       Running   0          3m      
</code></pre><p>My Kubernetes cluster has an Nginx ingress running with<a href="https://akomljen.com/get-automatic-https-with-lets-encrypt-and-kubernetes-ingress/"> automatic SSL</a>, so after installation I was able to access the app immediately. It is slow when you access the app for the first time, because in development mode it runs a webpack compile.</p><p>I also see that the app fails to load profile pictures, but I didn't have time to dig further into it. Maybe you know the answer. Either way, you saw how I run a <em>not so ready for containers</em> app in Kubernetes.</p><h3 id="summary">Summary</h3><p>The time I spent writing this post and preparing everything was around 6 hours. Not too much, I would say, considering that Dev.to is pretty new to me, and most of that time went into polishing and proofreading this post. I hope it helps you understand the process of running open source apps on Kubernetes better. Stay tuned for the next one!</p>]]></content:encoded></item><item><title><![CDATA[Kubernetes Contour Ingress Controller for Envoy Proxy]]></title><description><![CDATA[Introduction to Kubernetes contour ingress controller for Envoy proxy and ingress route objects.]]></description><link>https://akomljen.com/kubernetes-contour-ingress-controller-for-envoy-proxy/</link><guid isPermaLink="false">5bd6dd88a0aa62000197fbcb</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[ingress]]></category><category><![CDATA[contour]]></category><category><![CDATA[envoy]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Thu, 08 Nov 2018 18:49:16 GMT</pubDate><media:content url="https://akomljen.com/content/images/2018/11/pierre-chatel-innocenti-654090-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2018/11/pierre-chatel-innocenti-654090-unsplash.jpg" alt="Kubernetes Contour Ingress Controller for Envoy Proxy"><p>Most users starting to learn Kubernetes will get to the point of 
exposing some resources outside the cluster. This is like a <em>Hello World</em> example in the <a href="https://akomljen.com/tag/kubernetes/">Kubernetes</a> world. In most cases, the solution to this problem is an ingress controller. Think of ingress as a reverse proxy that sits between a Kubernetes service and the Internet. It provides name-based routing, SSL termination, and other goodies. When approaching this problem, users will often choose Nginx, and the reason is simple: it is all over the place, and almost every article about ingress refers to Nginx, mainly because Nginx was here almost from the start. I referred to it in <a href="https://akomljen.com/tag/ingress/">my blog posts as well</a>. But the situation is quite different today, as we have some great alternatives. Welcome to the <a href="https://github.com/heptio/contour">Heptio Contour</a> ingress controller.</p><h3 id="the-raise-of-crds">The Rise of CRDs</h3><p>Before talking about Contour and how it differs from Nginx, or any other "standard" ingress controller, I have to mention<a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions"> Custom Resource Definitions or CRDs</a>. I mention them a lot on this blog, but you need to appreciate how easy it is to extend Kubernetes with custom resources.</p><p>As the name suggests, with custom resources you can define additional objects and extend your Kubernetes cluster with new features. The Contour team did a great job introducing the <code>IngressRoute</code> object, which doesn't depend on standard ingress. I encourage you to take a look at the <a href="https://github.com/heptio/contour/blob/master/design/ingressroute-design.md">design doc</a> to learn more. This means that the team behind Contour can extend its functionality without depending on the whole community, while at the same time giving us new ideas. 
In the end, we can expect that some of those things will end up in upstream Kubernetes as well. Maybe an ingress v2 😉.</p><h3 id="deployment">Deployment</h3><p>I created a <a href="https://github.com/komljen/helm-charts/tree/master/contour">helm chart for Contour</a> deployment. The chart installs Contour and the Envoy proxy as a deployment, both running in the same pod. We could have those separate, or even run them as a daemon set. Maybe I will add that as an option to the helm chart later. I know, I also need to add a README.</p><p>Some notes:</p><ul><li>If you are running on-premises, you could expose the Envoy proxy as a node port, and then you will be able to access your service on each k8s node.</li><li>When running in the cloud, you will have an additional component that sits between the Envoy proxy and the Internet, a load balancer. If you are running on AWS, the preferred load balancer is NLB which, compared to the classic ELB, doesn't terminate the connection and has lower latency. It is also cheaper.</li></ul><p>Let's deploy the Contour ingress controller with the Envoy proxy, and use an NLB as my cluster is running on AWS:</p><pre><code>$ helm repo add akomljen-charts https://raw.githubusercontent.com/komljen/helm-charts/master/charts/

$ helm install --name heptio \
  --namespace ingress \
  --set proxy.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-type"=nlb \
  akomljen-charts/contour

$ kubectl get pod -n ingress --selector=app=contour
NAME                              READY     STATUS    RESTARTS   AGE
heptio-contour-7b7694f98d-cxfnx   2/2       Running   0          1m</code></pre><p><strong>NOTE:</strong> If you are running k8s v1.9 or lower, NLB will not work! More info <a href="https://github.com/kubernetes/kubernetes/pull/56759">here</a>.</p><p>If everything goes well you should get ELB/NLB running in your cluster. You can get its address with:</p><pre><code>$ kubectl get svc heptio-contour -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' -n ingress
a00950ebcfd0411e740ee0207cf10ce8-1089949860.eu-west-1.nlb.amazonaws.com</code></pre><p>Then use this address to create a wildcard DNS record for <code>*.test.example.com</code> in Route53.</p><p><strong>NOTE:</strong> <a href="https://github.com/kubernetes-incubator/external-dns">External DNS</a> is a project you might want to look at, but it is out of scope for this post, and the above wildcard DNS will be fine for ingress testing.</p><h3 id="example-workloads">Example Workloads</h3><p>You can now run different workloads and use ingress route objects to create ingress rules. Of course, standard ingress is also supported. Let's test a few examples. First I need to run a test app. I will create a simple web app based on the <code>dockersamples/static-site</code> docker image. This is an Nginx container that displays a unique name, which will help us identify which app we are accessing. Let's create a deployment:</p><pre><code>$ cat &gt; blog.yaml &lt;&lt;EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blog
spec:
  selector:
    matchLabels:
      app: blog
  replicas: 3
  template:
    metadata:
      labels:
        app: blog
    spec:
      containers:
      - name: blog
        image: dockersamples/static-site
        env:
        - name: AUTHOR
          value: blog
        ports:
        - containerPort: 80
EOF</code></pre><p>And a service:</p><pre><code>$ cat &gt; s1.yaml &lt;&lt;EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blog
  name: s1
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: blog
EOF</code></pre><p>Let's create both:</p><pre><code>$ kubectl apply -f blog.yaml -f s1.yaml

$ kubectl get po --selector=app=blog
NAME                    READY     STATUS    RESTARTS   AGE
blog-5d4d466cc7-4vh9l   1/1       Running   0          10s
blog-5d4d466cc7-fj489   1/1       Running   0          10s
blog-5d4d466cc7-wndhn   1/1       Running   0          10s</code></pre><p>Ok, so the app is running and we can expose it now. Let's say I want to have this service available at <code>app.test.example.com</code>:</p><pre><code>$ cat &gt; main.yaml &lt;&lt;EOF
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: main
spec: 
  virtualhost:
    fqdn: app.test.example.com
  routes: 
    - match: /
      services:
        - name: s1
          port: 80
EOF

$ kubectl apply -f main.yaml

$ kubectl get ingressroute main -o jsonpath='{.status.currentStatus}'
valid</code></pre><p>If you try to access <code>app.test.example.com</code> you should get this page:</p><figure class="kg-card kg-image-card"><img src="https://akomljen.com/content/images/2018/11/blog.png" class="kg-image" alt="Kubernetes Contour Ingress Controller for Envoy Proxy"></figure><p>Nothing special here, so let's try a different path now. Instead of <code>match: /</code> set <code>match: /blog</code> and apply the changes. If you try to access <code>app.test.example.com/blog</code> it will not work. This is expected, because the service itself doesn't serve the <code>/blog</code> path. You can resolve this by rewriting to <code>/</code>. Just add <code>prefixRewrite: "/"</code> and apply the changes again:</p><pre><code>spec: 
  routes: 
    - match: /blog
      prefixRewrite: "/"
      services: 
        - name: s1
          port: 80</code></pre><p>Now it should work again. The big difference compared to the standard ingress object is the ability to set a prefix rewrite per route. This is not possible with Nginx ingress because it uses annotations. You could do some workaround, but it's messy.</p><p>All of the above is not much different from standard ingress. The key features of ingress route are:</p><ul><li>Better support for multi-team Kubernetes clusters</li><li>Delegation of routing configuration for a path or namespace</li><li>Multiple services within a single route</li><li>Support for defining service weighting and load balancing strategy (no annotations here)</li></ul><p>Probably the most interesting Contour feature is the ability to delegate one route to another. Basically, you can connect multiple ingress route objects to work as one. In the above example, you might want to delegate the <code>/</code> path to another ingress route object. That object can also be in a different namespace.</p><p>Let's create another deployment, <code>app2</code>, in the test namespace this time:</p><pre><code>$ cat &gt; app2.yaml &lt;&lt;EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app2
spec:
  selector:
    matchLabels:
      app: app2
  replicas: 3
  template:
    metadata:
      labels:
        app: app2
    spec:
      containers:
      - name: app2
        image: dockersamples/static-site
        env:
        - name: AUTHOR
          value: app2
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: app2
  name: s2
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: app2
EOF
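
# The test namespace may not exist yet, so create it first:
$ kubectl create namespace test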

$ kubectl apply -f app2.yaml -n test</code></pre><p>I already have the ingress route <code>main</code> in the default namespace, and now I want that ingress route to delegate the <code>/</code> path to an ingress route in the <strong>test namespace</strong>:</p><pre><code>$ cat &gt; delegate-from-main.yaml &lt;&lt;EOF
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: delegate-from-main
spec:
  routes:
    - match: /
      services:
        - name: s2
          port: 80
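  # Note: there is no virtualhost block here on purpose - this route stays
  # inactive until the main IngressRoute delegates to it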
EOF

$ kubectl apply -f delegate-from-main.yaml -n test

$ kubectl get ingressroute delegate-from-main -o jsonpath='{.status.currentStatus}' -n test
orphaned</code></pre><p>As you can see, the status is <code>orphaned</code> because this ingress route doesn't have a host. The last step is to edit the existing main ingress route in the <strong>default namespace</strong> and add a delegate rule:</p><pre><code>apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: main
spec:
  virtualhost:
    fqdn: app.test.example.com
  routes:
    - match: /blog
      prefixRewrite: "/"
      services:
        - name: s1
          port: 80
    - match: /
      delegate:
        name: delegate-from-main
        namespace: test</code></pre><p>And if you check the status of the new ingress route, it has changed from <code>orphaned</code> to <code>valid</code>. Finally, <code>app.test.example.com</code> will point to <code>app2</code> and <code>app.test.example.com/blog</code> to <code>blog</code>.</p><p>There are other interesting features which I didn't cover here:</p><ul><li>The ability to <a href="https://github.com/heptio/contour/blob/master/docs/ingressroute.md#per-upstream-active-health-checking">run health checks</a> from the Envoy proxy (completely separate from k8s health checks)</li><li>The ability to <a href="https://github.com/heptio/contour/blob/master/docs/ingressroute.md#upstream-weighting">add weights</a> to different routes (canary deployments)</li><li>Support for different <a href="https://github.com/heptio/contour/blob/master/docs/ingressroute.md#load-balancing-strategy">load-balancing strategies</a></li><li><a href="https://github.com/heptio/contour/blob/master/docs/ingressroute.md#websocket-support">WebSocket support</a></li></ul><p>So, is there anything missing? Most users use <a href="https://akomljen.com/get-automatic-https-with-lets-encrypt-and-kubernetes-ingress/">automatic Let's Encrypt SSL</a> with <a href="https://github.com/jetstack/cert-manager">cert manager</a>. Unfortunately, cert manager will not work with ingress route yet. For more details please check <a href="https://github.com/heptio/contour/issues/509">this issue</a>. In any case, you can still use Contour with standard ingress objects and have SSL.</p><h3 id="summary">Summary</h3><p>I hope I gave you some ideas to consider Contour as your default ingress controller and embrace the ingress route. The more we use it, the better it gets. 
Stay tuned for the next one!</p>]]></content:encoded></item><item><title><![CDATA[Kubernetes Add-ons for more Efficient Computing]]></title><description><![CDATA[A list of Kubernetes add-ons or plugins for more efficient computing.]]></description><link>https://akomljen.com/kubernetes-add-ons-for-more-efficient-computing/</link><guid isPermaLink="false">5b9fcfa8f31724000121d865</guid><category><![CDATA[kubernetes]]></category><category><![CDATA[addons]]></category><dc:creator><![CDATA[Alen Komljen]]></dc:creator><pubDate>Sun, 30 Sep 2018 21:23:32 GMT</pubDate><media:content url="https://akomljen.com/content/images/2018/09/luca-bravo-204056-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://akomljen.com/content/images/2018/09/luca-bravo-204056-unsplash.jpg" alt="Kubernetes Add-ons for more Efficient Computing"><p>I will say that "starting" a <a href="https://akomljen.com/tag/kubernetes/">Kubernetes</a> cluster is a relatively easy job. Deploying your application to work on top of Kubernetes requires more effort, especially if you are new to containers. For people who have worked with Docker this can also be a relatively easy job, but of course, you need to master new tools, like Helm for example. Then, when you put it all together and try to run your application in production, you will find out there are a lot of missing pieces. Kubernetes probably doesn't do much, right? Well, Kubernetes is extensible, and there are some plugins or add-ons that will make your life easier.</p><h3 id="what-are-kubernetes-add-ons">What are Kubernetes Add-ons?</h3><p>In short, <em>add-ons extend the functionality of Kubernetes</em>. There are many of them, and chances are you are already using some. For example, network plugins or CNIs like Calico or Flannel, or CoreDNS (now a default DNS manager), or the famous Kubernetes Dashboard. I say famous because that is probably the first thing that you will try to deploy once the cluster is running :). 
Those listed above are core components; CNIs are a must-have, and the same goes for DNS, for your cluster to function properly. But there is much more you can do once you start deploying your applications. Enter the Kubernetes add-ons for more efficient computing!</p><h3 id="cluster-autoscaler-ca">Cluster Autoscaler - CA</h3><p><strong><a href="https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler">Cluster Autoscaler</a> scales your cluster nodes based on utilization.</strong> CA will scale up the cluster if you have pending pods, and scale it down if nodes are underutilized - the default threshold is <code>0.5</code>, configurable with <code>--scale-down-utilization-threshold</code>. You definitely don't want pods in a pending state, and at the same time you don't want to run underutilized nodes - a waste of money!</p><p><strong>Use case:</strong> You have two instance groups or autoscaling groups in your AWS cluster, running in two availability zones, 1 and 2. You want to scale your cluster based on utilization, but you also want a similar number of nodes in both zones. You also want to use the CA auto-discovery feature, so that you don't need to define the min and max number of nodes in CA, as those are already defined in your auto scaling groups. And you want to deploy CA on your master nodes.</p><p>Here is an example installation of CA via Helm to match the above use case:</p><pre><code>⚡ helm install --name autoscaler \
    --namespace kube-system \
    --set autoDiscovery.clusterName=k8s.test.akomljen.com \
    --set extraArgs.balance-similar-node-groups=true \
    --set awsRegion=eu-west-1 \
    --set rbac.create=true \
    --set rbac.pspEnabled=true \
    --set nodeSelector."node-role\.kubernetes\.io/master"="" \
    --set tolerations[0].effect=NoSchedule \
    --set tolerations[0].key=node-role.kubernetes.io/master \
stable/cluster-autoscaler</code></pre><p>There are some additional changes you need to make for this to work. Please check this post for more details - <a href="https://akomljen.com/kubernetes-cluster-autoscaling-on-aws/">Kubernetes Cluster Autoscaling on AWS</a>.</p><h3 id="horizontal-pod-autoscaler-hpa">Horizontal Pod Autoscaler - HPA</h3><p><em><strong>The <a href="https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/">Horizontal Pod Autoscaler</a> automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization. With <a href="https://git.k8s.io/community/contributors/design-proposals/instrumentation/custom-metrics-api.md">custom metrics</a> support, it can scale on other application-provided metrics as well. </strong></em></p><p>HPA is not something new in the Kubernetes world, but Banzai Cloud recently released the <a href="https://github.com/banzaicloud/hpa-operator">HPA Operator</a>, which simplifies it. All you need to do is provide annotations on your Deployment or StatefulSet, and the HPA operator will do the rest. Take a look at the supported annotations <a href="https://github.com/banzaicloud/hpa-operator#annotations-explained">here</a>.</p><p>Installation of the HPA operator is fairly simple with Helm:</p><pre><code>⚡ helm repo add akomljen-charts https://raw.githubusercontent.com/komljen/helm-charts/master/charts/

⚡ helm install --name hpa \
    --namespace kube-system \
    akomljen-charts/hpa-operator
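
# Example annotations to put on a Deployment so the operator creates an HPA.
# Annotation names are as I recall them from the hpa-operator README linked
# above - double-check there; the values are just an example:
#   hpa.autoscaling.banzaicloud.io/minReplicas: "1"
#   hpa.autoscaling.banzaicloud.io/maxReplicas: "4"
#   cpu.hpa.autoscaling.banzaicloud.io/targetAverageUtilization: "70"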

⚡ kubectl get po --selector=release=hpa -n kube-system
NAME                                  READY     STATUS    RESTARTS   AGE
hpa-hpa-operator-7c4d47dd4-9khpv      1/1       Running   0          1m
hpa-metrics-server-7766d7bc78-lnhn8   1/1       Running   0          1m</code></pre><p>With Metrics Server deployed, you also have the <code>kubectl top pods</code> command available. It can be useful for monitoring CPU or memory usage of your pods! ;)</p><p>HPA can fetch metrics from a series of aggregated APIs (<code>metrics.k8s.io</code>, <code>custom.metrics.k8s.io</code>, and <code>external.metrics.k8s.io</code>). Usually, though, HPA will use the <code>metrics.k8s.io</code> API provided by Heapster (deprecated as of Kubernetes 1.11) or Metrics Server.</p><p>After you add annotations to your Deployment, you should be able to monitor it with:</p><pre><code>⚡ kubectl get hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
test-app   Deployment/test-app   0%/70%    1         4         1          10m</code></pre><p>Keep in mind that the CPU target you see above is based on the defined CPU requests for this particular pod, not the overall CPU available on the node.</p><h3 id="addon-resizer">Addon Resizer</h3><p><a href="https://github.com/kubernetes/autoscaler/tree/master/addon-resizer">Addon resizer</a> is an interesting plugin that you could use with Metrics Server in the above scenario. As you deploy more pods to your cluster, Metrics Server will eventually need more resources. <strong>The addon resizer container watches over another container in a Deployment (Metrics Server for example) and vertically scales the dependent container up and down.</strong> Addon resizer can scale Metrics Server linearly based on the number of nodes. For more details check the official docs.</p><h3 id="vertical-pod-autoscaler-vpa">Vertical Pod Autoscaler - VPA</h3><p>You need to define CPU and memory requests for the services you deploy on Kubernetes. If you don't, the default CPU request is set to <code>100m</code>, or <code>0.1</code> of a CPU. Resource requests help <code>kube-scheduler</code> decide on which node to run a particular pod. But it is hard to define "good enough" values that are suitable for all environments. <strong><a href="https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler">Vertical Pod Autoscaler</a> adjusts CPU and memory requests automatically based on the resources used by a pod.</strong> It uses Metrics Server to get pod metrics. 
Keep in mind that you still need to define resource limits manually.</p><p>I will not cover the details here as VPA really needs a dedicated blog post, but there are a few things you should know:</p><ul><li>VPA is still an early stage project, so be aware</li><li>Your cluster must support <code>MutatingAdmissionWebhooks</code>, which has been enabled by default since Kubernetes 1.9</li><li>It doesn't work together with HPA</li><li>It will restart all your pods when resource requests are updated, which is kind of expected</li></ul><h3 id="descheduler">Descheduler</h3><p>The <code>kube-scheduler</code> is the component responsible for scheduling in Kubernetes. But sometimes pods can end up on the wrong node due to Kubernetes' dynamic nature. You could be editing existing resources to add node affinity or (anti) pod affinity, or some servers could be under more load while others run almost idle. Once a pod is running, <code>kube-scheduler</code> will not try to reschedule it again. Depending on the environment, you might have a lot of moving parts.</p><p><strong><a href="https://github.com/kubernetes-incubator/descheduler">Descheduler</a> checks for pods that can be moved and evicts them based on defined policies.</strong> Descheduler is not a replacement for the default scheduler and depends on it. The project is currently in the Kubernetes incubator and not ready for production yet, but I found it very stable and it worked nicely. Descheduler runs in your cluster as a CronJob.</p><p>I wrote a dedicated post, <a href="https://akomljen.com/meet-a-kubernetes-descheduler/">Meet a Kubernetes Descheduler</a>, which you should check for more details.</p><h3 id="k8s-spot-rescheduler">k8s Spot Rescheduler </h3><p>I was trying to solve the issue of managing multiple auto scaling groups on AWS, where one group is on-demand instances and the others are spot. The problem is that once you scale up the spot instance group, you want to move the pods off the on-demand instances so you can scale that group down. 
<strong><a href="https://github.com/pusher/k8s-spot-rescheduler">k8s spot rescheduler</a> tries to reduce the load on on-demand instances by evicting pods to spot instances if they are available.</strong> <em>In reality, the rescheduler can be used to remove load from any group of nodes onto a different group of nodes. They just need to be labeled appropriately.</em></p><p>I also created a Helm chart for easier deployment:</p><pre><code>⚡ helm repo add akomljen-charts https://raw.githubusercontent.com/komljen/helm-charts/master/charts/
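
# Label both node groups so the rescheduler can tell them apart, and taint the
# on-demand nodes so the scheduler prefers spots (node names are just examples):
⚡ kubectl label node ip-10-0-1-10 node-role.kubernetes.io/worker="true"
⚡ kubectl label node ip-10-0-2-20 node-role.kubernetes.io/spot-worker="true"
⚡ kubectl taint node ip-10-0-1-10 node-role.kubernetes.io/worker="true":PreferNoSchedule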

⚡ helm install --name spot-rescheduler \
    --namespace kube-system \
    --set image.tag=v0.2.0 \
    --set cmdOptions.delete-non-replicated-pods="true" \
    akomljen-charts/k8s-spot-rescheduler</code></pre><p>For a full list of <code>cmdOptions</code> check <a href="https://github.com/pusher/k8s-spot-rescheduler#flags">here</a>.</p><p>For the k8s spot rescheduler to work properly you need to label your nodes:</p><ul><li>on-demand nodes - <code>node-role.kubernetes.io/worker: "true"</code></li><li>spot nodes - <code>node-role.kubernetes.io/spot-worker: "true"</code></li></ul><p>and add a <code>PreferNoSchedule</code> taint on the on-demand instances to ensure that the k8s spot rescheduler prefers spots when making scheduling decisions.</p><h3 id="summary">Summary</h3><p>Please keep in mind that some of the above add-ons are not compatible with each other! Also, there might be some interesting add-ons that I missed here, so please let us know in the comments. Stay tuned for the next one.</p>]]></content:encoded></item></channel></rss>