Rook-Ceph crush map has legacy tunables (require firefly, min is hammer)

Rook-Ceph

Rook turns distributed storage systems such as Ceph into self-managing, self-scaling, self-healing storage services. It automates the tasks of a storage administrator: deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management.

So I’m working on upgrading Rook-Ceph on a few Kubernetes clusters and running into some issues. This time, after upgrading from v1.2.4 to v1.2.7, I’m getting a HEALTH_WARN on our Ceph cluster.
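
All of the commands below are run from inside the Rook toolbox pod, so the pod name in the examples is specific to my cluster. Assuming the standard Rook toolbox deployment, which labels the pod with app=rook-ceph-tools, you can find yours with:

% kubectl -n rook-ceph get pods -l app=rook-ceph-tools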

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph status
  cluster:
    id:     41610384-2020-4655-837c-f7f71ae578e3
    health: HEALTH_WARN
            crush map has legacy tunables (require firefly, min is hammer)
            617 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum v,x,aa (age 9m)
    mgr: a(active, since 8m)
    mds: clusterfs:1 {0=clusterfs-a=up:active} 1 up:standby-replay
    osd: 5 osds: 5 up (since 6m), 5 in (since 10h)
 
  data:
    pools:   3 pools, 300 pgs
    objects: 211.67k objects, 20 GiB
    usage:   194 GiB used, 692 GiB / 885 GiB avail
    pgs:     300 active+clean
 
  io:
    client:   852 B/s rd, 1 op/s rd, 0 op/s wr
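
The tunables warning means the CRUSH map is still using tunables from the old firefly era, while this Ceph release wants at least the hammer profile. Before changing anything, you can dump the tunables currently in effect from the toolbox pod (again, the pod name is the one from my cluster; yours will differ):

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph osd crush show-tunables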

So we should be able to correct this by running the following command against the Ceph cluster directly.

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph osd crush tunables optimal
adjusted tunables profile to optimal
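
Note that changing the CRUSH tunables profile can cause Ceph to recalculate placements and move data around, so on a larger or busier cluster you may want to keep an eye on recovery while things settle. One way to do that from the toolbox pod is to follow the cluster log:

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph -w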

While we’re here, I’m also going to clear all those crash reports from an earlier issue during the upgrade when all the OSDs were crashing. I’ll explain that one in another post.
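
If you want to review what actually crashed before clearing the reports, the crash module can list them and show the details of a single report; for example (the crash ID here is just a placeholder):

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph crash ls
% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph crash info <crash-id>

In this case I just want them all gone, so I’ll archive everything in one go: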

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph crash archive-all

After adjusting the tunables and archiving the crash reports, we can verify that all the Ceph warnings have cleared.

% kubectl -n rook-ceph exec -it rook-ceph-tools-c8dff9fb6-lxvzv -- ceph status           
  cluster:
    id:     41610384-2020-4655-837c-f7f71ae578e3
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum v,x,aa (age 26m)
    mgr: a(active, since 25m)
    mds: clusterfs:1 {0=clusterfs-a=up:active} 1 up:standby-replay
    osd: 5 osds: 5 up (since 23m), 5 in (since 10h)
 
  data:
    pools:   3 pools, 300 pgs
    objects: 211.67k objects, 20 GiB
    usage:   193 GiB used, 692 GiB / 885 GiB avail
    pgs:     300 active+clean
 
  io:
    client:   769 B/s rd, 1 op/s rd, 0 op/s wr

As you can see, all of our warnings are now gone.