Sunday, April 26, 2020

Backup and restore a Kubernetes cluster

I am checking how I can back up and restore my Kubernetes cluster.



First, we need object storage like S3.
MinIO is an open-source object storage server that is compatible with the Amazon S3 API.

Setting up MinIO:
====================
We can run MinIO in a Docker environment using the stable image:

docker pull minio/minio
docker run -p 9000:9000 minio/minio server /data


root@testing:/home/ubuntu# docker pull minio/minio
Using default tag: latest
latest: Pulling from minio/minio
4167d3e14976: Already exists
275c32df8f5e: Pull complete
cf0c84ce4772: Pull complete
70885164616a: Pull complete
Digest: sha256:6f8db3d7a1060cb1fcd6855791e9befe2d7f51644be65183680c1189eb196177
Status: Downloaded newer image for minio/minio:latest
root@testing:/home/ubuntu# docker run --name minio -p 9000:9000 -v data:/data minio/minio server /data
Endpoint:  http://172.17.0.3:9000  http://127.0.0.1:9000
Browser Access:
   http://172.17.0.3:9000  http://127.0.0.1:9000
Object API (Amazon S3 compatible):
   Go:         https://docs.min.io/docs/golang-client-quickstart-guide
   Java:       https://docs.min.io/docs/java-client-quickstart-guide
   Python:     https://docs.min.io/docs/python-client-quickstart-guide
   JavaScript: https://docs.min.io/docs/javascript-client-quickstart-guide
   .NET:       https://docs.min.io/docs/dotnet-client-quickstart-guide
Detected default credentials 'minioadmin:minioadmin', please change the credentials immediately using 'MINIO_ACCESS_KEY' and 'MINIO_SECRET_KEY'
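As the startup message warns, the default credentials should not be left in place. A minimal sketch of starting the container with custom keys via the MINIO_ACCESS_KEY and MINIO_SECRET_KEY environment variables (the key values below are placeholders):

# docker run --name minio -p 9000:9000 -v data:/data -e MINIO_ACCESS_KEY=myaccesskey -e MINIO_SECRET_KEY=mysecretkey minio/minio server /data
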
VELERO SETUP:
==========================
Installing the Velero binary
ajeesh@Aspire-A515-51G:~/Downloads/valero$ wget https://github.com/vmware-tanzu/velero/releases/download/v1.3.2/velero-v1.3.2-linux-amd64.tar.gz
--2020-04-26 21:27:41--  https://github.com/vmware-tanzu/velero/releases/download/v1.3.2/velero-v1.3.2-linux-amd64.tar.gz
Resolving github.com (github.com)... 13.234.176.102
Connecting to github.com (github.com)|13.234.176.102|:443... connected.

velero-v1.3.2-linux-amd64.tar.gz      100%[=======================================================================>]  23.39M  1.30MB/s    in 25s

2020-04-26 21:28:08 (956 KB/s) - ‘velero-v1.3.2-linux-amd64.tar.gz’ saved [24528427/24528427]


ajeesh@Aspire-A515-51G:~/Downloads/valero$ tar zxf velero-v1.3.2-linux-amd64.tar.gz
ajeesh@Aspire-A515-51G:~/Downloads/valero$ sudo mv velero-v1.3.2-linux-amd64/velero /usr/local/bin/
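
To confirm the client binary is on the PATH, we can print its version; I am using the --client-only flag so it does not try to query the server side (plain 'velero version' also works, but complains if no cluster is reachable):

ajeesh@Aspire-A515-51G:~/Downloads/valero$ velero version --client-only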



Next, create a credentials file containing the MinIO access keys so Velero can connect to the object store.

# cat <<EOF > minio.credentials
> [default]
> aws_access_key_id=minioadmin
> aws_secret_access_key=minioadmin
> EOF
root@Aspire-A515-51G:/velero# ls
minio.credentials
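
The bucket that Velero will write to ("bucketone" below) must already exist in MinIO. It can be created from the browser UI on port 9000, or with the mc command-line client; a sketch, assuming mc is installed (the alias name "minio" is arbitrary, and myip is the MinIO host as before):

# mc config host add minio http://myip:9000 minioadmin minioadmin
# mc mb minio/bucketone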


velero$ echo $KUBECONFIG
/home/ajeesh/.kube/config

velero$ /usr/local/bin/velero install   --provider aws  --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket bucketone  --secret-file ./minio.credentials    --backup-location-config region=minio,s3ForcePathStyle=true,s3Url=http://myip:9000

CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: created
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
CustomResourceDefinition/backupstoragelocations.velero.io: created
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
CustomResourceDefinition/deletebackuprequests.velero.io: created
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
CustomResourceDefinition/downloadrequests.velero.io: created
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
CustomResourceDefinition/podvolumebackups.velero.io: created
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
CustomResourceDefinition/podvolumerestores.velero.io: created
CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource
CustomResourceDefinition/resticrepositories.velero.io: created
CustomResourceDefinition/restores.velero.io: attempting to create resource
CustomResourceDefinition/restores.velero.io: created
CustomResourceDefinition/schedules.velero.io: attempting to create resource
CustomResourceDefinition/schedules.velero.io: created
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
CustomResourceDefinition/serverstatusrequests.velero.io: created
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
CustomResourceDefinition/volumesnapshotlocations.velero.io: created
Waiting for resources to be ready in cluster...
Namespace/velero: attempting to create resource
Namespace/velero: created
ClusterRoleBinding/velero: attempting to create resource
ClusterRoleBinding/velero: created
ServiceAccount/velero: attempting to create resource
ServiceAccount/velero: created
Secret/cloud-credentials: attempting to create resource
Secret/cloud-credentials: created
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: created
VolumeSnapshotLocation/default: attempting to create resource
VolumeSnapshotLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

ajeesh@Aspire-A515-51G:~$ kubectl get all -n velero
NAME                          READY   STATUS    RESTARTS   AGE
pod/velero-795c8d58cd-fc86d   1/1     Running   2          5m17s

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/velero   1/1     1            1           5m17s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/velero-795c8d58cd   1         1         1       5m17s
ajeesh@Aspire-A515-51G:~$

ajeesh@Aspire-A515-51G:~$ kubectl -n velero get crds
NAME                                          CREATED AT
backups.velero.io                             2020-04-26T16:42:32Z
backupstoragelocations.velero.io              2020-04-26T16:42:32Z
bgpconfigurations.crd.projectcalico.org       2019-11-06T07:35:55Z
bgppeers.crd.projectcalico.org                2019-11-06T07:35:55Z
blockaffinities.crd.projectcalico.org         2019-11-06T07:35:55Z
clusterinformations.crd.projectcalico.org     2019-11-06T07:35:55Z
deletebackuprequests.velero.io                2020-04-26T16:42:32Z
downloadrequests.velero.io                    2020-04-26T16:42:32Z
felixconfigurations.crd.projectcalico.org     2019-11-06T07:35:55Z
globalnetworkpolicies.crd.projectcalico.org   2019-11-06T07:35:55Z
globalnetworksets.crd.projectcalico.org       2019-11-06T07:35:55Z
hostendpoints.crd.projectcalico.org           2019-11-06T07:35:55Z
ipamblocks.crd.projectcalico.org              2019-11-06T07:35:55Z
ipamconfigs.crd.projectcalico.org             2019-11-06T07:35:55Z
ipamhandles.crd.projectcalico.org             2019-11-06T07:35:55Z
ippools.crd.projectcalico.org                 2019-11-06T07:35:55Z
networkpolicies.crd.projectcalico.org         2019-11-06T07:35:55Z
networksets.crd.projectcalico.org             2019-11-06T07:35:55Z
podvolumebackups.velero.io                    2020-04-26T16:42:32Z
podvolumerestores.velero.io                   2020-04-26T16:42:32Z
resticrepositories.velero.io                  2020-04-26T16:42:32Z
restores.velero.io                            2020-04-26T16:42:32Z
schedules.velero.io                           2020-04-26T16:42:32Z
serverstatusrequests.velero.io                2020-04-26T16:42:32Z
volumesnapshotlocations.velero.io             2020-04-26T16:42:32Z
ajeesh@Aspire-A515-51G:~$

Here I have used the following values and variables for configuring the Velero installation.
=========
Velero version: v1.3.2
Velero plugin for AWS: velero/velero-plugin-for-aws:v1.0.0
http://myip:9000 is the address of the MinIO container
$KUBECONFIG = /home/ajeesh/.kube/config
===========

ajeesh@Aspire-A515-51G:~$ kubectl get ns
NAME                   STATUS   AGE
default                Active   172d
kube-node-lease        Active   172d
kube-public            Active   172d
kube-system            Active   172d
kubernetes-dashboard   Active   146d
metallb-system         Active   161d
velero                 Active   14m
ajeesh@Aspire-A515-51G:~$

To enable shell autocompletion for Velero commands:

ajeesh@Aspire-A515-51G:~$ source <(velero completion bash)
ajeesh@Aspire-A515-51G:~$ velero backup
backup           backup-location
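
To make the completion persist across sessions, the same line can be appended to ~/.bashrc:

ajeesh@Aspire-A515-51G:~$ echo 'source <(velero completion bash)' >> ~/.bashrc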

For testing, I am creating a test namespace with an nginx deployment for Velero to back up:

ajeesh@Aspire-A515-51G:~$ kubectl create ns nginxtest
namespace/nginxtest created

ajeesh@Aspire-A515-51G:~$ kubectl get ns
NAME                   STATUS   AGE
default                Active   172d
kube-node-lease        Active   172d
kube-public            Active   172d
kube-system            Active   172d
kubernetes-dashboard   Active   146d
metallb-system         Active   161d
nginxtest              Active   4s
velero                 Active   20m
ajeesh@Aspire-A515-51G:~$ kubectl -n nginxtest run nginx --image nginx --replicas 2
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/nginx created
ajeesh@Aspire-A515-51G:~$
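
As the warning above notes, kubectl run with --replicas is deprecated for creating deployments; the non-deprecated equivalent is kubectl create deployment plus kubectl scale:

ajeesh@Aspire-A515-51G:~$ kubectl -n nginxtest create deployment nginx --image=nginx
ajeesh@Aspire-A515-51G:~$ kubectl -n nginxtest scale deployment nginx --replicas=2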

Create a VELERO BACKUP
----------------------
ajeesh@Aspire-A515-51G:~$ velero backup create namespacenginx --include-namespaces=nginxtest
Backup request "namespacenginx" submitted successfully.
Run `velero backup describe namespacenginx` or `velero backup logs namespacenginx` for more details.
ajeesh@Aspire-A515-51G:~$

ajeesh@Aspire-A515-51G:~$ velero backup get
NAME             STATUS   CREATED   EXPIRES   STORAGE LOCATION   SELECTOR
namespacenginx   New           29d                         
ajeesh@Aspire-A515-51G:~$


ajeesh@Aspire-A515-51G:~$ kubectl -n velero get backups
NAME             AGE
namespacenginx   97s
ajeesh@Aspire-A515-51G:~$
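
Velero can also take recurring backups via the schedules.velero.io CRD seen earlier; a sketch of a daily schedule for the same namespace (the schedule name and cron expression are just examples):

ajeesh@Aspire-A515-51G:~$ velero schedule create nginxtest-daily --schedule="0 1 * * *" --include-namespaces nginxtest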

While checking the logs, I can see the following:

ajeesh@Aspire-A515-51G:~$ velero backup logs namespacenginx
Logs for backup "namespacenginx" are not available until it's finished processing. Please wait until the backup has a phase of Completed or Failed and try again.
ajeesh@Aspire-A515-51G:~$ 

So the backup appears to be stuck in the "New" phase and never starts processing; I need to check this further.
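
A reasonable first step for a backup stuck in the "New" phase is to check the Velero server logs and the configured backup storage location; if the server cannot reach the MinIO endpoint or the bucket, the errors should show up here:

ajeesh@Aspire-A515-51G:~$ kubectl -n velero logs deployment/velero
ajeesh@Aspire-A515-51G:~$ velero backup-location get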


ajeesh@Aspire-A515-51G:~$ velero backup describe namespacenginx
Name:         namespacenginx
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  New

Namespaces:
  Included:  nginxtest
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  0

Started:    <n/a>
Completed:  <n/a>

Expiration:  <n/a>

Persistent Volumes: <none included>
ajeesh@Aspire-A515-51G:~$

Next, check the restore options (I mistyped --help as -help here, but Velero prints the usage anyway):

ajeesh@Aspire-A515-51G:~$ velero restore create -help
Error: unknown shorthand flag: 'e' in -elp
Usage:
  velero restore create [RESTORE_NAME] [--from-backup BACKUP_NAME | --from-schedule SCHEDULE_NAME] [flags]

Examples:
  # create a restore named "restore-1" from backup "backup-1"
  velero restore create restore-1 --from-backup backup-1

  # create a restore with a default name ("backup-1-<timestamp>") from backup "backup-1"
  velero restore create --from-backup backup-1

  # create a restore from the latest successful backup triggered by schedule "schedule-1"
  velero restore create --from-schedule schedule-1

  # create a restore from the latest successful OR partially-failed backup triggered by schedule "schedule-1"
  velero restore create --from-schedule schedule-1 --allow-partially-failed

  # create a restore for only persistentvolumeclaims and persistentvolumes within a backup
  velero restore create --from-backup backup-2 --include-resources persistentvolumeclaims,persistentvolumes
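
Once a backup has completed, restoring my test namespace should then be a single command (untested here, since my backup never left the "New" phase):

ajeesh@Aspire-A515-51G:~$ velero restore create --from-backup namespacenginx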


For a stand-alone cluster, we would require the following data for a backup (a quick way to archive the certificates follows the list):
1. The root certificate files /etc/kubernetes/pki/ca.crt and /etc/kubernetes/pki/ca.key
2. An etcd backup
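
A minimal sketch for item 1, archiving the root certificate files (the archive path is just an example):

# tar czf /root/k8s-ca-backup.tar.gz /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key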

etcd backup:
I followed the steps below:

$ kubectl get pods -n kube-system | grep etcd
etcd-kmaster.example.com                      1/1     Running   21         186d
$ kubectl exec -it -n kube-system etcd-kmaster.example.com  -- /bin/sh

# etcdctl snapshot save
No help topic for 'snapshot'

This fails because etcdctl defaults to API version 2, which has no snapshot command, so the v3 API must be selected first:

# export ETCDCTL_API=3

Then run the backup command:

# export ETCDCTL_API=3
# etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key snapshot save etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db
{"level":"warn","ts":"2020-05-10T17:01:10.150Z","caller":"clientv3/retry_interceptor.go:116","msg":"retry stream intercept"}
Snapshot saved at etcd-snapshot-2020-05-10_17:01:10_UTC.db
# du -shc etcd-snapshot-2020-05-10_17:01:10_UTC.db
3.9M etcd-snapshot-2020-05-10_17:01:10_UTC.db
3.9M total
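
To restore from this snapshot later, the matching v3 restore command would look something like this (it runs offline, so no cluster certificates are needed; the data directory below is just an example, and it would then be used as etcd's data-dir when rebuilding the node):

# export ETCDCTL_API=3
# etcdctl snapshot restore etcd-snapshot-2020-05-10_17:01:10_UTC.db --data-dir=/var/lib/etcd-restore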