Recently I migrated some Kubernetes clusters managed by Amazon EKS. The clusters were running in public subnets, and I wanted to make them more secure by using private and public subnets where appropriate. However, you cannot change the networking settings of an EKS cluster once it is created, and the same is true for many other AWS services. Since I already had Velero installed for backups with the S3 provider, the most natural approach was to use it to restore all resources on the new cluster as well.
Velero Installation
Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes.
With Velero, you can do disaster recovery, data migration, and data protection. Once installed and running, it backs up all Kubernetes resources to an S3-compatible object store and takes snapshots of persistent volumes. Plugins exist for all major cloud providers and for many other storage backends.
NOTE: The installation instructions assume that you are familiar with tools like kube2iam or kiam for providing secure access to AWS resources. If you are not, please read my post Integrating AWS IAM and Kubernetes with kube2iam first.
A few preparation steps are needed. Install the AWS CLI, then follow the commands below.
1. Create an S3 bucket, and make it private:
aws s3api create-bucket \
  --bucket velero-test-backups \
  --region us-east-1

aws s3api put-public-access-block \
  --bucket velero-test-backups \
  --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" \
  --region us-east-1
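To double-check that public access is really blocked before putting backups there, you can read the configuration back (a standard AWS CLI call, nothing Velero-specific):

aws s3api get-public-access-block \
  --bucket velero-test-backups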
2. Create an IAM Role which will be assumed by Velero pod (more details on node trust policy in kube2iam blog post mentioned before):
cat > node-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1234567890:role/k8s-worker-nodes-NodeInstanceRole-1W9NK0A56SMQ6"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role \
  --role-name k8s-velero \
  --assume-role-policy-document file://node-trust-policy.json
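If you run kube2iam with namespace restrictions enabled (the --namespace-restrictions flag), the namespace where Velero will live must also be allowed to assume the role. A minimal sketch, assuming Velero goes into kube-system as in the rest of this post:

kubectl annotate namespace kube-system \
  "iam.amazonaws.com/allowed-roles=[\"k8s-velero\"]"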
3. Create and attach IAM policy to previously created IAM role for EC2 and S3 access:
cat > s3-velero-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::velero-test-backups/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::velero-test-backups"
      ]
    }
  ]
}
EOF
aws iam put-role-policy \
  --role-name k8s-velero \
  --policy-name s3 \
  --policy-document file://s3-velero-policy.json
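You can confirm that the role and its inline policy are in place before moving on (standard IAM read calls):

aws iam get-role --role-name k8s-velero

aws iam get-role-policy \
  --role-name k8s-velero \
  --policy-name s3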
4. Install Velero with official Helm chart:
cat > velero-values.yaml <<EOF
podAnnotations:
  iam.amazonaws.com/role: k8s-velero
configuration:
  provider: aws
  backupStorageLocation:
    name: aws
    bucket: velero-test-backups
    config:
      region: us-east-1
  volumeSnapshotLocation:
    name: aws
    config:
      region: us-east-1
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.0.0
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
EOF
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts

helm install backup \
  --namespace kube-system \
  -f velero-values.yaml \
  vmware-tanzu/velero
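Before creating any backups, it's worth checking that the server pod is running and that the backup storage location is reachable. A quick sanity check, assuming the chart's standard app.kubernetes.io/name=velero label:

kubectl -n kube-system get pods -l app.kubernetes.io/name=velero

velero -n kube-system backup-location get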
5. Install the Velero CLI to manage backups and restores:
brew install velero
If everything is set up correctly, you should be able to run Velero commands, for example, to create a new backup and list it:
velero -n kube-system backup create test
velero -n kube-system backup get
NAME   STATUS      CREATED                          EXPIRES   STORAGE LOCATION   SELECTOR
test   Completed   2020-05-22 14:06:12 +0200 CEST   29d       default            <none>
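To inspect what a backup actually contains, describe it; the --details flag also lists the volume snapshots that were taken:

velero -n kube-system backup describe test --details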
NOTE: If you have persistent volumes backed by EBS, each time you create a new backup, Velero will create an EBS snapshot. Snapshot creation can take some time, and the Completed status reported by Velero doesn't mean that the snapshot is ready.
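If you want to watch snapshot progress directly, you can query EC2 yourself; this is a generic check of your account's snapshots, not something Velero provides:

aws ec2 describe-snapshots \
  --owner-ids self \
  --query "Snapshots[].{ID:SnapshotId,State:State,Progress:Progress,Started:StartTime}" \
  --output table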
Or, you can run backups on a schedule, for example, to create a daily backup with expiration set to 90 days:
velero -n kube-system create schedule daily \
  --schedule="0 0 * * *" \
  --ttl 2160h0m0s
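Schedules can be listed and inspected the same way as backups:

velero -n kube-system schedule get
velero -n kube-system schedule describe daily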
Here are some more backup and restore command examples to get you started:
Backup:
# create a backup containing all resources
velero backup create backup1
# create a backup including only the nginx namespace
velero backup create nginx-backup --include-namespaces nginx
# create a backup excluding the velero and default namespaces
velero backup create backup2 --exclude-namespaces velero,default
# view the YAML for a backup that doesn't snapshot volumes, without sending it to the server
velero backup create backup3 --snapshot-volumes=false -o yaml
# wait for a backup to complete before returning from the command
velero backup create backup4 --wait
Restore:
# create a restore named "restore-1" from backup "backup-1"
velero restore create restore-1 --from-backup backup-1
# create a restore with a default name ("backup-1-<timestamp>") from backup "backup-1"
velero restore create --from-backup backup-1
# create a restore from the latest successful backup triggered by schedule "schedule-1"
velero restore create --from-schedule schedule-1
# create a restore from the latest successful OR partially-failed backup triggered by schedule "schedule-1"
velero restore create --from-schedule schedule-1 --allow-partially-failed
# create a restore for only persistentvolumeclaims and persistentvolumes within a backup
velero restore create --from-backup backup-2 --include-resources persistentvolumeclaims,persistentvolumes
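After a restore finishes, describe it and check its logs for warnings about skipped or already-existing resources, for example with the restore-1 name from above:

velero restore describe restore-1
velero restore logs restore-1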
Instead of relying on this post, which will be outdated in a few months, do yourself a favor and run the commands with the --help flag to see all available options.
Cluster Migration
As I mentioned at the beginning, I used Velero to migrate to a new cluster. Since all backups are in the S3 bucket, you can do a full restore fairly easily.
1. Install Velero on the new cluster, using the same configuration. If you are using kube2iam, you will have to install it as well. At this point, if you list backups on the new cluster, you should see the same data.
2. Create a manual backup on the old cluster and wait for it to finish: velero -n kube-system backup create migration
3. Switch to the new cluster and do a full restore: velero -n kube-system restore create --from-backup migration
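You can follow the restore on the new cluster while it runs:

velero -n kube-system restore get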
Depending on what you run in the cluster, you might need to adjust a few more things, but all the resources from the old cluster should now be in the new one as well.
A few more notes:
- If a resource already exists and differs from the one in the backup, Velero will not overwrite it. Instead, you will get a warning message.
- Even if you remove a restore point from Velero with velero restore delete, the Kubernetes resources will be left intact.
- After a restore, all resources carry additional Velero labels, velero.io/backup-name and velero.io/restore-name. You can use them to find everything a restore created, as shown below.
- Pods that had a persistent volume attached will get a new volume created from a snapshot. I suggest stopping stateful services in the original cluster before taking a backup, so that all the data is captured.
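For example, to list everything restored from the migration backup created earlier (Velero may sanitize label values slightly, so if a selector comes back empty, check the actual values with --show-labels):

kubectl get all --all-namespaces \
  -l velero.io/backup-name=migration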
Summary
Velero is a must-have tool when running critical apps on a Kubernetes cluster. This post is just a quick introduction to show how it works; feel free to explore all the options and make it work for your particular use case.