Extend Persistent Volume Claim (PVC) Size Without Data Loss in OpenShift 4

Ari Sukarno
9 min read · May 20, 2024


In some cases we need to extend a Persistent Volume Claim (PVC) because the storage behind it is no longer sufficient for our OpenShift/Kubernetes applications. Extending a PVC can be challenging because a mistake could result in data loss. Here I’ll share how to extend a PVC; in this case we will extend a PVC used by the OpenShift cluster monitoring stack.

Depending on your application and cluster, these steps may differ. Use the steps below if you want to extend a PVC the same way I did. I hope this will be interesting for you, so don’t forget to prepare your coffee :)

Disable Cluster Monitoring

  1. Create a YAML file to disable cluster monitoring, since Prometheus runs under this operator.
$ cat <<EOF > disable-monitoring.yaml
- op: add
  path: /spec/overrides
  value:
  - kind: Deployment
    group: apps
    name: cluster-monitoring-operator
    namespace: openshift-monitoring
    unmanaged: true
  - kind: Deployment
    group: apps
    name: prometheus-operator
    namespace: openshift-monitoring
    unmanaged: true
EOF

2. Patch CVO (Cluster Version Operator)

$ oc patch clusterversion version --type json -p "$(cat disable-monitoring.yaml)"
clusterversion.config.openshift.io/version patched <-- output
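
To confirm the overrides were applied, you can print them back from the ClusterVersion spec (just a quick sanity check):

$ oc get clusterversion version -o jsonpath='{.spec.overrides}'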

3. Scale Down Deployments

$ oc get deployment -n openshift-monitoring | grep -v NAME | awk '{print $1}'
-- output
cluster-monitoring-operator
grafana
kube-state-metrics
openshift-state-metrics
prometheus-adapter
prometheus-operator
telemeter-client
thanos-querier

$ deployments=$(oc get deployment -n openshift-monitoring | grep -v NAME | awk '{print $1}'); for i in $deployments; do oc scale deployment/$i --replicas=0 -n openshift-monitoring; done
--output
deployment.apps/cluster-monitoring-operator scaled
deployment.apps/grafana scaled
deployment.apps/kube-state-metrics scaled
deployment.apps/openshift-state-metrics scaled
deployment.apps/prometheus-adapter scaled
deployment.apps/prometheus-operator scaled
deployment.apps/telemeter-client scaled
deployment.apps/thanos-querier scaled
(Screenshots: the openshift-monitoring pods before and after scaling down the deployments.)

4. Scale Down StatefulSets

$ oc get statefulsets -n openshift-monitoring | grep -v NAME | awk '{print $1}'
--output
alertmanager-main
prometheus-k8s

$ statefulsets=$(oc get statefulsets -n openshift-monitoring | grep -v NAME | awk '{print $1}'); for i in $statefulsets; do oc scale statefulset/$i --replicas=0 -n openshift-monitoring; done
--output
statefulset.apps/alertmanager-main scaled
statefulset.apps/prometheus-k8s scaled
(Screenshot: the remaining pods after scaling down the StatefulSets.)

Only the node-exporter pods remain, so all other pods under openshift-monitoring will be down during the extension.
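
You can double-check that only the node-exporter pods are still running:

$ oc get pod -n openshift-monitoring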

Migrate PVC Data Manually

  1. Switch to the openshift-monitoring project
$ oc project openshift-monitoring

2. Scale down the prometheus-k8s Statefulset

$ oc scale --replicas=0 sts/prometheus-k8s

3. Start a new deployment

We deploy a dummy pod from a Red Hat image so that we can mount both the old and the new PVC.

$ oc create deployment sleep --image=registry.access.redhat.com/rhel7/rhel-tools -- tail -f /dev/null
deployment.apps/sleep created

$ oc get pod
NAME READY STATUS RESTARTS
sleep-659d44b8c7-m5szk 1/1 Running 0

4. Mount old PVC

The current PVC (size 510Gi)

(Screenshot: the PVC list before the extension.)

We’ll extend the PVC prometheus-k8s-db-prometheus-k8s-0 from 510Gi to 1TB.

$ oc set volume deployment/sleep --add -t pvc --name=old-claim --claim-name=prometheus-k8s-db-prometheus-k8s-0 --mount-path=/old-claim
deployment.apps/sleep volume updated

$ oc get pod
sleep-55cc7d449d-djkbd 0/1 ContainerCreating 0 9s
sleep-659d44b8c7-m5szk 1/1 Running 0 8m53s

The pod will automatically be redeployed to mount the old PVC.
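
If you want to wait for that redeploy to finish before checking the mounts, you can watch the rollout (standard oc rollout usage, nothing specific to this cluster):

$ oc rollout status deployment/sleep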

$ oc get pod sleep-55cc7d449d-djkbd -o yaml | grep volumeMounts -A2
volumeMounts:
- mountPath: /old-claim
name: old-claim

$ oc get pod sleep-55cc7d449d-djkbd -o yaml | grep volumes -A2
volumes:
- name: old-claim
persistentVolumeClaim:
claimName: prometheus-k8s-db-prometheus-k8s-0

With that command we attach the old/current PVC prometheus-k8s-db-prometheus-k8s-0 to the sleep deployment under the volume name old-claim, mounted at /old-claim inside the pod.

Once the pod is up, the claim appears in the volumes and volumeMounts sections of the pod configuration, and we can verify it by opening a shell inside the pod.

$ oc rsh sleep-55cc7d449d-djkbd
sh-4.2$ ls -l /old-claim
total 20
drwxrws---. 2 root 1000420000 16384 Jan 26 14:45 lost+found
drwxrwsr-x. 26 1000420000 1000420000 4096 May 13 13:31 prometheus-db
sh-4.2$ ls -l /old-claim/prometheus-db/
total 116
drwxrwsr-x. 3 65534 1000420000 4096 Apr 29 23:00 01HWP1CWAXQ9WVVEB2A7FSHX0T
drwxrwsr-x. 3 65534 1000420000 4096 Apr 30 17:00 01HWQZ6N92JTH2C9HBDSM3NFZ4
drwxrwsr-x. 3 65534 1000420000 4096 May 1 11:00 01HWSWZZCHN71G9JTA4YSVVSQ7
....
....

5. Create and mount a new PV/PVC (make the new size the same as the old/current PVC size, 510Gi, since we need to sync the data before migrating)

$ oc set volume deployment/sleep --add -t pvc --name=new-claim --claim-name=new-claim --mount-path=/new-claim --claim-mode=ReadWriteOnce --claim-size=510Gi --claim-class=sc-xx-prometheus-k8s-ds01
deployment.apps/sleep volume updated

$ oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
new-claim Pending sc-xx-prometheus-k8s-ds01 15s

-> Wait until the status is Bound; this can take a while depending on the size.
-> OpenShift will call the vCenter API to provision the new volume.
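
You can watch the claim until it binds:

$ oc get pvc new-claim -w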

Why do we create the new PVC with the same size? Because keeping the source and target the same size avoids surprises during the rsync. Before running the command, make sure the backing storage has enough free space.

(Screenshot: free storage in the vCenter datastore.)

Behind the scenes, OpenShift asks vCenter to provision a new virtual disk, so we need to wait until the PVC is available and bound to the pod.
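
If binding takes long, the events at the bottom of oc describe usually show the provisioning progress:

$ oc describe pvc new-claim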

(Screenshot: virtual disk creation in vCenter.)

Once the new PVC of the same size has been created, it shows as Bound.

NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES
new-claim                            Bound    pvc-e3a2f0c8-4fd7-48b5-82fb-a7f1b2ae3998   510Gi      RWO
prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-5f48ba54-6d7e-4814-abb0-e07f17513f11   510Gi      RWO

$ oc get pod sleep-6d446bc5cb-t22dz -o yaml | grep volumes -A6
volumes:
- name: old-claim
persistentVolumeClaim:
claimName: prometheus-k8s-db-prometheus-k8s-0
- name: new-claim
persistentVolumeClaim:
claimName: new-claim

$ oc get pod sleep-6d446bc5cb-t22dz -o yaml | grep volumeMount -A4
volumeMounts:
- mountPath: /old-claim
name: old-claim
- mountPath: /new-claim
name: new-claim
The pod now has two PVCs mounted at different paths.

6. Connect to the pod and migrate the data

$ oc rsh sleep-X-XXXXX

sh-4.2$ ls -l | grep claim
drwxrwsr-x. 3 root 1000420000 4096 May 13 15:17 new-claim
drwxrwsr-x. 4 root 1000420000 4096 Jan 26 14:46 old-claim
sh-4.2$

sh-4.2$ rsync -avxHAX --progress /old-claim/* /new-claim
sending incremental file list
rsync: failed to set times on "/new-claim/lost+found": Operation not permitted (1)
lost+found/
prometheus-db/
prometheus-db/queries.active
20,001 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=161/164)
prometheus-db/01HWP1CWAXQ9WVVEB2A7FSHX0T/
...
...
sent 13,341,548,468 bytes received 2,676 bytes 45,612,140.66 bytes/sec
total size is 13,338,281,889 speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1179) [sender=3.1.2]
sh-4.2$

That error is okay and can be ignored; it only concerns the lost+found directory.
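
If you prefer a clean rsync exit code, you can exclude lost+found from the copy instead (a small variation on the command above, same flags otherwise):

sh-4.2$ rsync -avxHAX --progress --exclude 'lost+found' /old-claim/ /new-claim/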

sh-4.2$ du -sh /old-claim/prometheus-db
13G /old-claim/prometheus-db
sh-4.2$ du -sh /new-claim/prometheus-db
13G /new-claim/prometheus-db
sh-4.2$

7. Scale down the sleep deployment

$ oc scale --replicas=0 deployment/sleep
deployment.apps/sleep scaled

sleep-6d446bc5cb-t22dz 1/1 Terminating

Wait until the pod is gone.
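
You can watch the pods until the sleep pod disappears:

$ oc get pod -l app=sleep -w   # app=sleep is the label oc create deployment sets by default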

8. Delete the PVC prometheus-k8s-db-prometheus-k8s-0, as the data is now copied to the new PVC new-claim.

$ oc delete pvc prometheus-k8s-db-prometheus-k8s-0

$ oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES
new-claim Bound pvc-e3a2f0c8-4fd7-48b5-82fb-a7f1b2ae3998 510Gi RWO
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-04b34ca3-534b-48f3-9795-1c9a8b55a964 476838Mi RWO

9. Mount the new-claim PVC at the /old-claim path

Because the data has already been moved and the old PVC was deleted, we can mount the temporary new-claim PVC at the old path (/old-claim). Below is the illustration:

(Screenshot: the old PVC has been deleted.)
$ oc set volume deployment/sleep --add -t pvc --name=old-claim --claim-name=new-claim --mount-path=/old-claim  --overwrite
(Screenshot: new-claim is now mounted at the old path.)

Looking at the deployment configuration, we can see that the new PVC new-claim is now mounted as old-claim as well.

$ oc get deploy sleep -o yaml | grep volumeMounts -A4
volumeMounts:
- mountPath: /new-claim
name: new-claim
- mountPath: /old-claim
name: old-claim
$ oc get deploy sleep -o yaml | grep volumes -A6
volumes:
- name: new-claim
persistentVolumeClaim:
claimName: new-claim
- name: old-claim
persistentVolumeClaim:
claimName: new-claim

10. Create PVC prometheus-k8s-db-prometheus-k8s-0 and mount it under /new-claim

Right now /new-claim still points at the temporary new-claim PVC, but that claim is already mounted at /old-claim as well, so we can overwrite the new-claim volume with a freshly created PVC of the larger size.

$ oc set volume deployment/sleep --add -t pvc --name=new-claim --claim-name=prometheus-k8s-db-prometheus-k8s-0 --mount-path=/new-claim --claim-mode=ReadWriteOnce --claim-size=953675Mi --claim-class=sc-xx-prometheus-k8s-ds01 --overwrite

Again, this can take a while depending on the size.
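
As before, you can watch the new claim until it is Bound:

$ oc get pvc prometheus-k8s-db-prometheus-k8s-0 -w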

Once the PVC has been created, the pod’s mounts look like this:

.....
.....
volumeMounts:
- mountPath: /old-claim
name: old-claim
- mountPath: /new-claim
name: new-claim
.....
.....
volumes:
- name: old-claim
persistentVolumeClaim:
claimName: new-claim
- name: new-claim
persistentVolumeClaim:
claimName: prometheus-k8s-db-prometheus-k8s-0

$ oc get pvc
NAME STATUS VOLUME CAPACITY
new-claim Bound pvc-e3a2f0c8-4fd7-48b5-82fb-a7f1b2ae3998 510Gi
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-0d314eb5-2fce-4f3c-9866-552b3a737915 953675Mi
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-04b34ca3-534b-48f3-9795-1c9a8b55a964 476838Mi

11. Scale up the deployment sleep

$ oc scale --replicas=1 deploy/sleep

12. Rsh into the container and migrate the data to the new storage

$ oc rsh sleep-X-XXXXX
sh-4.2$ ls -l new-claim/
total 16
drwxrws---. 2 root 1000420000 16384 May 19 14:21 lost+found
sh-4.2$ ls -l old-claim/
total 20
drwxrws---. 2 root 1000420000 16384 May 13 15:17 lost+found
drwxrwsr-x. 26 1000420000 1000420000 4096 May 13 13:31 prometheus-db
sh-4.2$ ls -l old-claim/prometheus-db/
total 116
drwxrwsr-x. 3 1000420000 1000420000 4096 Apr 29 23:00 01HWP1CWAXQ9WVVEB2A7FSHX0T
drwxrwsr-x. 3 1000420000 1000420000 4096 Apr 30 17:00 01HWQZ6N92JTH2C9HBDSM3NFZ4
....
...
sh-4.2$ rsync -avxHAX --progress /old-claim/* /new-claim
....
....
prometheus-db/wal/checkpoint.00004293/00000000
24,903,680 100% 42.64MB/s 0:00:00 (xfr#115, to-chk=0/164)

sent 13,341,548,615 bytes received 2,676 bytes 43,106,789.31 bytes/sec
total size is 13,338,281,889 speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1179) [sender=3.1.2]
sh-4.2$

13. Delete the sleep deployment and the temporary new-claim PVC

$ oc delete deploy/sleep
$ oc delete pvc new-claim

14. Scale the prometheus-k8s StatefulSet back up

$ oc scale --replicas=2 sts/prometheus-k8s

Wait until the pods are ready.

$ oc get pod | grep k8s
prometheus-k8s-0 6/6 Running 0 2m40s
prometheus-k8s-1 6/6 Running 0 2m40s

Re-enable Cluster Monitoring

  1. Create the enable-monitoring.yaml file
$ cat <<EOF > enable-monitoring.yaml

- op: remove
  path: /spec/overrides
  value:
  - kind: Deployment
    group: apps
    name: cluster-monitoring-operator
    namespace: openshift-monitoring
    unmanaged: true
  - kind: Deployment
    group: apps
    name: prometheus-operator
    namespace: openshift-monitoring
    unmanaged: true
EOF

2. Patch CVO

$ oc patch clusterversion version --type json -p "$(cat enable-monitoring.yaml)"

Wait until all deployments are in the READY condition.

$ oc get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
cluster-monitoring-operator 1/1 1 1 213d
grafana 1/1 1 1 213d
kube-state-metrics 1/1 1 1 213d
openshift-state-metrics 1/1 1 1 213d
prometheus-adapter 2/2 2 2 213d
prometheus-operator 1/1 1 1 213d
telemeter-client 1/1 1 1 213d
thanos-querier 2/2 2 2 213d

Verify Prometheus Data

Connect to the current pod and check the data (all of the data is back on the current pod).
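
One quick way to check is to list the TSDB directory inside the Prometheus pod (assuming the default container name prometheus and mount path /prometheus used by the cluster monitoring stack):

$ oc rsh -c prometheus prometheus-k8s-0 ls -l /prometheus   # container name and path assumed from the default config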

That’s the journey; hopefully it will be helpful. Thanks a million for reading :)

References:

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.10/html/storage/expanding-persistent-volumes
