Rook-Ceph crush map has legacy tunables (require firefly, min is hammer)
Rook turns distributed storage systems into self-managing, self-scaling, self-healing storage services using Ceph. It automates the tasks of a storage administrator: deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management.
So I’m working on upgrading Rook-Ceph on a few Kubernetes clusters and running into some issues. This time, after upgrading from v1.2.4 to v1.2.7, I’m getting a HEALTH_WARN on our Ceph cluster.
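All of the commands below run through the Rook toolbox pod, and the pod name (rook-ceph-tools-c8dff9fb6-lxvzv) is specific to my cluster, so look yours up first. Assuming the toolbox was deployed from the standard Rook manifest with the app=rook-ceph-tools label, this should find it:

% kubectl -n rook-ceph get pods -l app=rook-ceph-tools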
% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph status
  cluster:
    id:     41610384-2020-4655-837c-f7f71ae578e3
    health: HEALTH_WARN
            crush map has legacy tunables (require firefly, min is hammer)
            617 daemons have recently crashed

  services:
    mon: 3 daemons, quorum v,x,aa (age 9m)
    mgr: a(active, since 8m)
    mds: clusterfs:1 {0=clusterfs-a=up:active} 1 up:standby-replay
    osd: 5 osds: 5 up (since 6m), 5 in (since 10h)

  data:
    pools:   3 pools, 300 pgs
    objects: 211.67k objects, 20 GiB
    usage:   194 GiB used, 692 GiB / 885 GiB avail
    pgs:     300 active+clean

  io:
    client:   852 B/s rd, 1 op/s rd, 0 op/s wr
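Before changing anything, it’s worth looking at what the CRUSH tunables are currently set to. This uses the stock ceph osd crush show-tunables command through the same toolbox pod; the output will vary by cluster, but on mine it confirmed an older profile.

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph osd crush show-tunables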
We should be able to correct the tunables warning by running the following command against the Ceph cluster directly.
% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph osd crush tunables optimal
adjusted tunables profile to optimal
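A couple of caveats on this step: optimal isn’t always the right choice, since very old kernel clients may not support the newest tunables, and bumping the profile can trigger some data movement while PGs rebalance. If that’s a concern, the warning itself only asks for hammer as a minimum, so a more conservative sketch would be to set that profile instead and watch the cluster settle:

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph osd crush tunables hammer
% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph -s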
While we’re here, I’m also going to clear all those crash errors from an earlier issue during the upgrade where all the OSDs were crashing. I’ll explain that one in another post.
% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph crash archive-all
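If you’d rather see what you’re archiving before (or after) running archive-all, the crash module can list the reports and show the details of any one of them. The crash ID below is just a placeholder; use one of the IDs from the ls output.

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph crash ls
% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph crash info <crash-id>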
After running those two commands, we can verify that all the Ceph warnings have cleared.
% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph status
  cluster:
    id:     41610384-2020-4655-837c-f7f71ae578e3
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum v,x,aa (age 26m)
    mgr: a(active, since 25m)
    mds: clusterfs:1 {0=clusterfs-a=up:active} 1 up:standby-replay
    osd: 5 osds: 5 up (since 23m), 5 in (since 10h)

  data:
    pools:   3 pools, 300 pgs
    objects: 211.67k objects, 20 GiB
    usage:   193 GiB used, 692 GiB / 885 GiB avail
    pgs:     300 active+clean

  io:
    client:   769 B/s rd, 1 op/s rd, 0 op/s wr
As you can see, all of our warnings are gone now.
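If you want to double-check beyond the HEALTH_OK flag, both fixes can be verified individually: show-tunables should now report the new profile, and while the archived crashes still appear in ceph crash ls, ceph crash ls-new should come back empty.

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph osd crush show-tunables
% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph crash ls-new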