Rook Ceph Failed to complete rook-ceph-mon0: signal: aborted (core dumped)
I’ve got an installation of Rook + Ceph running on our self-hosted Kubernetes environment, and after running for a few days or a week we end up with a problem where 2 of the 3 mons stop working. They go into CrashLoopBackOff and I haven’t been able to recover them. When that happens, a number of pods that rely on Rook/Ceph also have issues. I’ve had to completely remove our Rook installation and start over fresh about 3 times now, so we needed to determine what the heck was causing the problem.
We’re using versions:
Ceph 12.2.4
Rook v0.7.0-40.g284c1b3
CentOS 7.4
Kubernetes 1.9.6
The error I’m getting is
failed to run mon. failed to start mon: Failed to complete rook-ceph-mon0: signal: aborted (core dumped)
I’ve spent hours trying to determine the issue as well as trying to fix it. The Rook website documents a procedure that sounds exactly like our situation: recovering from a failed mon. I tried it twice, and neither time did it help me get out of the mess.
I even jumped into the Rook Slack channel and finally found a solution to my issue. It looks like it comes down to the fact that CephFS support is experimental. We had created multiple file systems, one for each group of pods that needed to share the same data. That sounded like a good way of sharing data, but it is what caused the issue.
According to the guys in the Slack channel, the filesystem implementation is experimental. I didn’t see that anywhere in the documentation; the docs mention the kernel version, but nothing about not using multiple file systems. Based on that understanding, we started over again with a clean Rook installation and set up each of the different pods to use a different sub-directory on a single common file system.
To help you determine whether you’re having a similar issue, this is what our pods looked like at the time. The issue always starts with 2 of the mons going out and reporting CrashLoopBackOff; once that started, other things would be affected, and we couldn’t find a way to recover.
$ kubectl get pods -n rook
NAME                                      READY     STATUS             RESTARTS   AGE
rook-ceph-mds-myfs-5f74b67c6d-8cbrz       0/1       CrashLoopBackOff   85         9h
rook-ceph-mds-myfs-5f74b67c6d-pp9x9       0/1       CrashLoopBackOff   85         9h
rook-ceph-mds-fsvolume-65d5985578-5snlm   0/1       Error              86         9h
rook-ceph-mds-fsvolume-65d5985578-p65hq   1/1       Running            86         9h
rook-ceph-mgr0-cfccfd6b8-4gwhg            1/1       Running            0          9h
rook-ceph-mon0-jdgnw                      0/1       CrashLoopBackOff   11         32m
rook-ceph-mon1-j5vpm                      0/1       CrashLoopBackOff   9          22m
rook-ceph-mon2-z7pnd                      0/1       CrashLoopBackOff   10         30m
rook-ceph-osd-5fj7q                       1/1       Running            1          1d
rook-ceph-osd-kt5zb                       1/1       Running            2          1d
rook-ceph-osd-nqlp6                       1/1       Running            1          1d
rook-ceph-osd-wzzjm                       1/1       Running            0          1d
rook-tools                                1/1       Running            0          25m
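If you want to check for the same pattern without eyeballing the listing, something like the following could help. It is only a minimal sketch, assuming the default column layout of `kubectl get pods` (NAME, READY, STATUS, RESTARTS, AGE); the function name `unhealthy_pods` and the `SAMPLE_LISTING` text are made up for illustration, with the rows copied from the listing above.

```python
def unhealthy_pods(listing: str) -> dict:
    """Return {pod name: status} for every pod whose STATUS is not Running."""
    unhealthy = {}
    rows = listing.strip().splitlines()
    for row in rows[1:]:  # skip the NAME/READY/STATUS/RESTARTS/AGE header
        fields = row.split()
        if len(fields) < 3:
            continue  # ignore blank or truncated rows
        name, status = fields[0], fields[2]
        if status != "Running":
            unhealthy[name] = status
    return unhealthy


# A few rows copied from the pod listing in this post.
SAMPLE_LISTING = """\
NAME                                 READY  STATUS            RESTARTS  AGE
rook-ceph-mds-myfs-5f74b67c6d-8cbrz  0/1    CrashLoopBackOff  85        9h
rook-ceph-mgr0-cfccfd6b8-4gwhg       1/1    Running           0         9h
rook-ceph-mon0-jdgnw                 0/1    CrashLoopBackOff  11        32m
rook-ceph-osd-5fj7q                  1/1    Running           1         1d
"""

if __name__ == "__main__":
    for name, status in unhealthy_pods(SAMPLE_LISTING).items():
        print(f"{name}: {status}")
```

Mons and mds pods showing up here while the osd pods stay Running matches what we saw each time.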
Here’s the output of the logs as well; the last line shows where it gets the core dump.
$ kubectl -n rook logs rook-ceph-mon0-8khz6
2018-04-17 04:35:17.285811 I | rook: starting Rook v0.7.0-40.g284c1b3 with arguments '/usr/local/bin/rook mon --config-dir=/var/lib/rook --name=rook-ceph-mon0 --port=6790 --fsid=d4f1a1ca-b919-4c5b-89f2-1aed3d913a97'
. . .
2018-04-17 04:35:27.536807 I | rook-ceph-mon0: -8> 2018-04-17 04:35:27.510728 7f382bd67700 5 -- 10.107.181.203:6790/0 >> 10.111.191.96:6790/0 conn(0x55d39147a800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1518 cs=1 l=0). rx mon.2 seq 776240659 0x55d3913c4000 global_id (34096) v1
2018-04-17 04:35:27.536836 I | rook-ceph-mon0: -7> 2018-04-17 04:35:27.510857 7f382bd67700 5 -- 10.107.181.203:6790/0 >> 10.111.191.96:6790/0 conn(0x55d39147a800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1518 cs=1 l=0). rx mon.2 seq 776240660 0x55d3913c4200 global_id (34096) v1
2018-04-17 04:35:27.536855 I | rook-ceph-mon0: -6> 2018-04-17 04:35:27.510920 7f382bd67700 5 -- 10.107.181.203:6790/0 >> 10.111.191.96:6790/0 conn(0x55d39147a800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1518 cs=1 l=0). rx mon.2 seq 776240661 0x55d3913c4400 global_id (34096) v1
2018-04-17 04:35:27.536879 I | rook-ceph-mon0: -5> 2018-04-17 04:35:27.510987 7f382bd67700 5 -- 10.107.181.203:6790/0 >> 10.111.191.96:6790/0 conn(0x55d39147a800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1518 cs=1 l=0). rx mon.2 seq 776240662 0x55d3913c4600 global_id (34096) v1
2018-04-17 04:35:27.536905 I | rook-ceph-mon0: -4> 2018-04-17 04:35:27.511058 7f382bd67700 5 -- 10.107.181.203:6790/0 >> 10.111.191.96:6790/0 conn(0x55d39147a800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1518 cs=1 l=0). rx mon.2 seq 776240663 0x55d3913c4800 global_id (34096) v1
2018-04-17 04:35:27.536923 I | rook-ceph-mon0: -3> 2018-04-17 04:35:27.511108 7f382bd67700 5 -- 10.107.181.203:6790/0 >> 10.111.191.96:6790/0 conn(0x55d39147a800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1518 cs=1 l=0). rx mon.2 seq 776240664 0x55d3913c4a00 global_id (34096) v1
2018-04-17 04:35:27.536941 I | rook-ceph-mon0: -2> 2018-04-17 04:35:27.511235 7f382bd67700 5 -- 10.107.181.203:6790/0 >> 10.111.191.96:6790/0 conn(0x55d39147a800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1518 cs=1 l=0). rx mon.2 seq 776240665 0x55d3913c4c00 global_id (34096) v1
2018-04-17 04:35:27.536997 I | rook-ceph-mon0: -1> 2018-04-17 04:35:27.511298 7f382bd67700 5 -- 10.107.181.203:6790/0 >> 10.111.191.96:6790/0 conn(0x55d39147a800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1518 cs=1 l=0). rx mon.2 seq 776240666 0x55d3913c4e00 global_id (34096) v1
2018-04-17 04:35:27.537028 I | rook-ceph-mon0: 0> 2018-04-17 04:35:27.518367 7f382fd6f700 -1 /build/ceph-12.2.4/src/mds/FSMap.cc: In function 'void FSMap::assign_standby_replay(mds_gid_t, fs_cluster_id_t, mds_rank_t)' thread 7f382fd6f700 time 2018-04-17 04:35:27.510001
2018-04-17 04:35:27.537047 I | rook-ceph-mon0: /build/ceph-12.2.4/src/mds/FSMap.cc: 876: FAILED assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE)
2018-04-17 04:35:27.537070 I | rook-ceph-mon0:
2018-04-17 04:35:27.537096 I | rook-ceph-mon0: ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
2018-04-17 04:35:27.537122 I | rook-ceph-mon0: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55d387221672]
2018-04-17 04:35:27.537146 I | rook-ceph-mon0: 2: (FSMap::assign_standby_replay(mds_gid_t, int, int)+0x457) [0x55d3874b9eb7]
2018-04-17 04:35:27.537180 I | rook-ceph-mon0: 3: (MDSMonitor::try_standby_replay(MDSMap::mds_info_t const&, Filesystem const&, MDSMap::mds_info_t const&)+0x222) [0x55d38719bfe2]
2018-04-17 04:35:27.537198 I | rook-ceph-mon0: 4: (MDSMonitor::maybe_promote_standby(std::shared_ptr )+0xc7c) [0x55d3871a004c]
2018-04-17 04:35:27.537226 I | rook-ceph-mon0: 5: (MDSMonitor::tick()+0x8ea) [0x55d3871a6c8a]
2018-04-17 04:35:27.537244 I | rook-ceph-mon0: 6: (MDSMonitor::on_active()+0x28) [0x55d38719bc88]
2018-04-17 04:35:27.537287 I | rook-ceph-mon0: 7: (PaxosService::_active()+0x40a) [0x55d3870fb77a]
2018-04-17 04:35:27.537306 I | rook-ceph-mon0: 8: (Context::complete(int)+0x9) [0x55d386fd1629]
2018-04-17 04:35:27.537325 I | rook-ceph-mon0: 9: (void finish_contexts (CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x20b) [0x55d386fdb01b]
2018-04-17 04:35:27.537345 I | rook-ceph-mon0: 10: (Paxos::finish_round()+0x188) [0x55d3870f3358]
2018-04-17 04:35:27.537364 I | rook-ceph-mon0: 11: (Paxos::handle_last(boost::intrusive_ptr )+0xf9d) [0x55d3870f486d]
2018-04-17 04:35:27.537382 I | rook-ceph-mon0: 12: (Paxos::dispatch(boost::intrusive_ptr )+0x263) [0x55d3870f51c3]
2018-04-17 04:35:27.537405 I | rook-ceph-mon0: 13: (Monitor::dispatch_op(boost::intrusive_ptr )+0xefe) [0x55d386fc72ce]
2018-04-17 04:35:27.537430 I | rook-ceph-mon0: 14: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55d386fc7e5b]
2018-04-17 04:35:27.537456 I | rook-ceph-mon0: 15: (Monitor::ms_dispatch(Message*)+0x23) [0x55d386ff7d93]
2018-04-17 04:35:27.537480 I | rook-ceph-mon0: 16: (DispatchQueue::entry()+0xf4a) [0x55d38752282a]
2018-04-17 04:35:27.537506 I | rook-ceph-mon0: 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x55d3872d1a8d]
2018-04-17 04:35:27.537530 I | rook-ceph-mon0: 18: (()+0x76ba) [0x7f3837abc6ba]
2018-04-17 04:35:27.537548 I | rook-ceph-mon0: 19: (clone()+0x6d) [0x7f38362e641d]
2018-04-17 04:35:27.537576 I | rook-ceph-mon0: NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
2018-04-17 04:35:27.537595 I | rook-ceph-mon0:
2018-04-17 04:35:27.537619 I | rook-ceph-mon0: --- logging levels ---
2018-04-17 04:35:27.537644 I | rook-ceph-mon0: 0/ 5 none
2018-04-17 04:35:27.537669 I | rook-ceph-mon0: 0/ 1 lockdep
2018-04-17 04:35:27.537726 I | rook-ceph-mon0: 0/ 1 context
2018-04-17 04:35:27.537746 I | rook-ceph-mon0: 1/ 1 crush
2018-04-17 04:35:27.537769 I | rook-ceph-mon0: 1/ 5 mds
2018-04-17 04:35:27.537793 I | rook-ceph-mon0: 1/ 5 mds_balancer
2018-04-17 04:35:27.537820 I | rook-ceph-mon0: 1/ 5 mds_locker
2018-04-17 04:35:27.537844 I | rook-ceph-mon0: 1/ 5 mds_log
2018-04-17 04:35:27.537869 I | rook-ceph-mon0: 1/ 5 mds_log_expire
2018-04-17 04:35:27.537894 I | rook-ceph-mon0: 1/ 5 mds_migrator
2018-04-17 04:35:27.537919 I | rook-ceph-mon0: 0/ 1 buffer
2018-04-17 04:35:27.537945 I | rook-ceph-mon0: 0/ 1 timer
2018-04-17 04:35:27.537969 I | rook-ceph-mon0: 0/ 1 filer
2018-04-17 04:35:27.537993 I | rook-ceph-mon0: 0/ 1 striper
2018-04-17 04:35:27.538019 I | rook-ceph-mon0: 0/ 1 objecter
2018-04-17 04:35:27.538038 I | rook-ceph-mon0: 0/ 0 rados
2018-04-17 04:35:27.538053 I | rook-ceph-mon0: 0/ 5 rbd
2018-04-17 04:35:27.538071 I | rook-ceph-mon0: 0/ 5 rbd_mirror
2018-04-17 04:35:27.538088 I | rook-ceph-mon0: 0/ 5 rbd_replay
2018-04-17 04:35:27.538105 I | rook-ceph-mon0: 0/ 5 journaler
2018-04-17 04:35:27.538122 I | rook-ceph-mon0: 0/ 5 objectcacher
2018-04-17 04:35:27.538139 I | rook-ceph-mon0: 0/ 5 client
2018-04-17 04:35:27.538157 I | rook-ceph-mon0: 0/ 0 osd
2018-04-17 04:35:27.538176 I | rook-ceph-mon0: 0/ 5 optracker
2018-04-17 04:35:27.538194 I | rook-ceph-mon0: 0/ 5 objclass
2018-04-17 04:35:27.538213 I | rook-ceph-mon0: 0/ 0 filestore
2018-04-17 04:35:27.538231 I | rook-ceph-mon0: 0/ 0 journal
2018-04-17 04:35:27.538248 I | rook-ceph-mon0: 0/ 5 ms
2018-04-17 04:35:27.538272 I | rook-ceph-mon0: 0/ 0 mon
2018-04-17 04:35:27.538290 I | rook-ceph-mon0: 0/10 monc
2018-04-17 04:35:27.538317 I | rook-ceph-mon0: 1/ 5 paxos
2018-04-17 04:35:27.538336 I | rook-ceph-mon0: 0/ 5 tp
2018-04-17 04:35:27.538362 I | rook-ceph-mon0: 1/ 5 auth
2018-04-17 04:35:27.538387 I | rook-ceph-mon0: 1/ 5 crypto
2018-04-17 04:35:27.538412 I | rook-ceph-mon0: 1/ 1 finisher
2018-04-17 04:35:27.538438 I | rook-ceph-mon0: 1/ 1 reserver
2018-04-17 04:35:27.538462 I | rook-ceph-mon0: 1/ 5 heartbeatmap
2018-04-17 04:35:27.538488 I | rook-ceph-mon0: 1/ 5 perfcounter
2018-04-17 04:35:27.538513 I | rook-ceph-mon0: 1/ 5 rgw
2018-04-17 04:35:27.538538 I | rook-ceph-mon0: 1/10 civetweb
2018-04-17 04:35:27.538563 I | rook-ceph-mon0: 1/ 5 javaclient
2018-04-17 04:35:27.538588 I | rook-ceph-mon0: 1/ 5 asok
2018-04-17 04:35:27.538613 I | rook-ceph-mon0: 1/ 1 throttle
2018-04-17 04:35:27.538638 I | rook-ceph-mon0: 0/ 0 refs
2018-04-17 04:35:27.538663 I | rook-ceph-mon0: 1/ 5 xio
2018-04-17 04:35:27.538712 I | rook-ceph-mon0: 1/ 5 compressor
2018-04-17 04:35:27.538733 I | rook-ceph-mon0: 0/ 0 bluestore
2018-04-17 04:35:27.538750 I | rook-ceph-mon0: 1/ 5 bluefs
2018-04-17 04:35:27.538777 I | rook-ceph-mon0: 1/ 3 bdev
2018-04-17 04:35:27.538796 I | rook-ceph-mon0: 1/ 5 kstore
2018-04-17 04:35:27.538825 I | rook-ceph-mon0: 4/ 5 rocksdb
2018-04-17 04:35:27.538843 I | rook-ceph-mon0: 0/ 0 leveldb
2018-04-17 04:35:27.538860 I | rook-ceph-mon0: 4/ 5 memdb
2018-04-17 04:35:27.538877 I | rook-ceph-mon0: 1/ 5 kinetic
2018-04-17 04:35:27.538893 I | rook-ceph-mon0: 1/ 5 fuse
2018-04-17 04:35:27.538910 I | rook-ceph-mon0: 1/ 5 mgr
2018-04-17 04:35:27.538927 I | rook-ceph-mon0: 1/ 5 mgrc
2018-04-17 04:35:27.538944 I | rook-ceph-mon0: 1/ 5 dpdk
2018-04-17 04:35:27.538963 I | rook-ceph-mon0: 1/ 5 eventtrace
2018-04-17 04:35:27.539006 I | rook-ceph-mon0: -2/-2 (syslog threshold)
2018-04-17 04:35:27.539025 I | rook-ceph-mon0: -1/-1 (stderr threshold)
2018-04-17 04:35:27.539042 I | rook-ceph-mon0: max_recent 10000
2018-04-17 04:35:27.539072 I | rook-ceph-mon0: max_new 1000
2018-04-17 04:35:27.539091 I | rook-ceph-mon0: log_file /dev/stdout
2018-04-17 04:35:27.539119 I | rook-ceph-mon0: --- end dump of recent events ---
2018-04-17 04:35:27.571591 I | rook-ceph-mon0: /build/ceph-12.2.4/src/mds/FSMap.cc: In function 'void FSMap::assign_standby_replay(mds_gid_t, fs_cluster_id_t, mds_rank_t)' thread 7f382fd6f700 time 2018-04-17 04:35:27.510001
2018-04-17 04:35:27.571769 I | rook-ceph-mon0: /build/ceph-12.2.4/src/mds/FSMap.cc: 876: FAILED assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE)
2018-04-17 04:35:27.571812 I | rook-ceph-mon0: ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
2018-04-17 04:35:27.571829 I | rook-ceph-mon0: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55d387221672]
2018-04-17 04:35:27.571842 I | rook-ceph-mon0: 2: (FSMap::assign_standby_replay(mds_gid_t, int, int)+0x457) [0x55d3874b9eb7]
2018-04-17 04:35:27.571856 I | rook-ceph-mon0: 3: (MDSMonitor::try_standby_replay(MDSMap::mds_info_t const&, Filesystem const&, MDSMap::mds_info_t const&)+0x222) [0x55d38719bfe2]
2018-04-17 04:35:27.571870 I | rook-ceph-mon0: 4: (MDSMonitor::maybe_promote_standby(std::shared_ptr )+0xc7c) [0x55d3871a004c]
2018-04-17 04:35:27.571883 I | rook-ceph-mon0: 5: (MDSMonitor::tick()+0x8ea) [0x55d3871a6c8a]
2018-04-17 04:35:27.571902 I | rook-ceph-mon0: 6: (MDSMonitor::on_active()+0x28) [0x55d38719bc88]
2018-04-17 04:35:27.571934 I | rook-ceph-mon0: 7: (PaxosService::_active()+0x40a) [0x55d3870fb77a]
2018-04-17 04:35:27.571971 I | rook-ceph-mon0: 8: (Context::complete(int)+0x9) [0x55d386fd1629]
2018-04-17 04:35:27.572011 I | rook-ceph-mon0: 9: (void finish_contexts (CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x20b) [0x55d386fdb01b]
2018-04-17 04:35:27.572054 I | rook-ceph-mon0: 10: (Paxos::finish_round()+0x188) [0x55d3870f3358]
2018-04-17 04:35:27.572082 I | rook-ceph-mon0: 11: (Paxos::handle_last(boost::intrusive_ptr )+0xf9d) [0x55d3870f486d]
2018-04-17 04:35:27.572097 I | rook-ceph-mon0: 12: (Paxos::dispatch(boost::intrusive_ptr )+0x263) [0x55d3870f51c3]
2018-04-17 04:35:27.572128 I | rook-ceph-mon0: 13: (Monitor::dispatch_op(boost::intrusive_ptr )+0xefe) [0x55d386fc72ce]
2018-04-17 04:35:27.572146 I | rook-ceph-mon0: 14: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55d386fc7e5b]
2018-04-17 04:35:27.572160 I | rook-ceph-mon0: 15: (Monitor::ms_dispatch(Message*)+0x23) [0x55d386ff7d93]
2018-04-17 04:35:27.572187 I | rook-ceph-mon0: 16: (DispatchQueue::entry()+0xf4a) [0x55d38752282a]
2018-04-17 04:35:27.572201 I | rook-ceph-mon0: 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x55d3872d1a8d]
2018-04-17 04:35:27.572223 I | rook-ceph-mon0: 18: (()+0x76ba) [0x7f3837abc6ba]
2018-04-17 04:35:27.572237 I | rook-ceph-mon0: 19: (clone()+0x6d) [0x7f38362e641d]
2018-04-17 04:35:27.572251 I | rook-ceph-mon0: NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
2018-04-17 04:35:27.572266 I | rook-ceph-mon0: 2018-04-17 04:35:27.518367 7f382fd6f700 -1 /build/ceph-12.2.4/src/mds/FSMap.cc: In function 'void FSMap::assign_standby_replay(mds_gid_t, fs_cluster_id_t, mds_rank_t)' thread 7f382fd6f700 time 2018-04-17 04:35:27.510001
2018-04-17 04:35:27.572279 I | rook-ceph-mon0: /build/ceph-12.2.4/src/mds/FSMap.cc: 876: FAILED assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE)
2018-04-17 04:35:27.572292 I | rook-ceph-mon0:
2018-04-17 04:35:27.572306 I | rook-ceph-mon0: ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
2018-04-17 04:35:27.572334 I | rook-ceph-mon0: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55d387221672]
2018-04-17 04:35:27.572350 I | rook-ceph-mon0: 2: (FSMap::assign_standby_replay(mds_gid_t, int, int)+0x457) [0x55d3874b9eb7]
2018-04-17 04:35:27.572364 I | rook-ceph-mon0: 3: (MDSMonitor::try_standby_replay(MDSMap::mds_info_t const&, Filesystem const&, MDSMap::mds_info_t const&)+0x222) [0x55d38719bfe2]
2018-04-17 04:35:27.572389 I | rook-ceph-mon0: 4: (MDSMonitor::maybe_promote_standby(std::shared_ptr )+0xc7c) [0x55d3871a004c]
2018-04-17 04:35:27.572403 I | rook-ceph-mon0: 5: (MDSMonitor::tick()+0x8ea) [0x55d3871a6c8a]
2018-04-17 04:35:27.572417 I | rook-ceph-mon0: 6: (MDSMonitor::on_active()+0x28) [0x55d38719bc88]
2018-04-17 04:35:27.572430 I | rook-ceph-mon0: 7: (PaxosService::_active()+0x40a) [0x55d3870fb77a]
2018-04-17 04:35:27.572454 I | rook-ceph-mon0: 8: (Context::complete(int)+0x9) [0x55d386fd1629]
2018-04-17 04:35:27.572468 I | rook-ceph-mon0: 9: (void finish_contexts (CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x20b) [0x55d386fdb01b]
2018-04-17 04:35:27.572481 I | rook-ceph-mon0: 10: (Paxos::finish_round()+0x188) [0x55d3870f3358]
2018-04-17 04:35:27.572504 I | rook-ceph-mon0: 11: (Paxos::handle_last(boost::intrusive_ptr )+0xf9d) [0x55d3870f486d]
2018-04-17 04:35:27.572518 I | rook-ceph-mon0: 12: (Paxos::dispatch(boost::intrusive_ptr )+0x263) [0x55d3870f51c3]
2018-04-17 04:35:27.572531 I | rook-ceph-mon0: 13: (Monitor::dispatch_op(boost::intrusive_ptr )+0xefe) [0x55d386fc72ce]
2018-04-17 04:35:27.572550 I | rook-ceph-mon0: 14: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55d386fc7e5b]
2018-04-17 04:35:27.572564 I | rook-ceph-mon0: 15: (Monitor::ms_dispatch(Message*)+0x23) [0x55d386ff7d93]
2018-04-17 04:35:27.572588 I | rook-ceph-mon0: 16: (DispatchQueue::entry()+0xf4a) [0x55d38752282a]
2018-04-17 04:35:27.572602 I | rook-ceph-mon0: 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x55d3872d1a8d]
2018-04-17 04:35:27.572617 I | rook-ceph-mon0: 18: (()+0x76ba) [0x7f3837abc6ba]
2018-04-17 04:35:27.572638 I | rook-ceph-mon0: 19: (clone()+0x6d) [0x7f38362e641d]
2018-04-17 04:35:27.572654 I | rook-ceph-mon0: NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
2018-04-17 04:35:27.572667 I | rook-ceph-mon0:
2018-04-17 04:35:27.572711 I | rook-ceph-mon0: 0> 2018-04-17 04:35:27.518367 7f382fd6f700 -1 /build/ceph-12.2.4/src/mds/FSMap.cc: In function 'void FSMap::assign_standby_replay(mds_gid_t, fs_cluster_id_t, mds_rank_t)' thread 7f382fd6f700 time 2018-04-17 04:35:27.510001
2018-04-17 04:35:27.572735 I | rook-ceph-mon0: /build/ceph-12.2.4/src/mds/FSMap.cc: 876: FAILED assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE)
2018-04-17 04:35:27.572749 I | rook-ceph-mon0:
2018-04-17 04:35:27.572768 I | rook-ceph-mon0: ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
2018-04-17 04:35:27.572801 I | rook-ceph-mon0: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55d387221672]
2018-04-17 04:35:27.572829 I | rook-ceph-mon0: 2: (FSMap::assign_standby_replay(mds_gid_t, int, int)+0x457) [0x55d3874b9eb7]
2018-04-17 04:35:27.572844 I | rook-ceph-mon0: 3: (MDSMonitor::try_standby_replay(MDSMap::mds_info_t const&, Filesystem const&, MDSMap::mds_info_t const&)+0x222) [0x55d38719bfe2]
2018-04-17 04:35:27.572857 I | rook-ceph-mon0: 4: (MDSMonitor::maybe_promote_standby(std::shared_ptr )+0xc7c) [0x55d3871a004c]
2018-04-17 04:35:27.572884 I | rook-ceph-mon0: 5: (MDSMonitor::tick()+0x8ea) [0x55d3871a6c8a]
2018-04-17 04:35:27.572898 I | rook-ceph-mon0: 6: (MDSMonitor::on_active()+0x28) [0x55d38719bc88]
2018-04-17 04:35:27.572911 I | rook-ceph-mon0: 7: (PaxosService::_active()+0x40a) [0x55d3870fb77a]
2018-04-17 04:35:27.572934 I | rook-ceph-mon0: 8: (Context::complete(int)+0x9) [0x55d386fd1629]
2018-04-17 04:35:27.572948 I | rook-ceph-mon0: 9: (void finish_contexts (CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x20b) [0x55d386fdb01b]
2018-04-17 04:35:27.572963 I | rook-ceph-mon0: 10: (Paxos::finish_round()+0x188) [0x55d3870f3358]
2018-04-17 04:35:27.572995 I | rook-ceph-mon0: 11: (Paxos::handle_last(boost::intrusive_ptr )+0xf9d) [0x55d3870f486d]
2018-04-17 04:35:27.573010 I | rook-ceph-mon0: 12: (Paxos::dispatch(boost::intrusive_ptr )+0x263) [0x55d3870f51c3]
2018-04-17 04:35:27.573023 I | rook-ceph-mon0: 13: (Monitor::dispatch_op(boost::intrusive_ptr )+0xefe) [0x55d386fc72ce]
2018-04-17 04:35:27.573044 I | rook-ceph-mon0: 14: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55d386fc7e5b]
2018-04-17 04:35:27.573058 I | rook-ceph-mon0: 15: (Monitor::ms_dispatch(Message*)+0x23) [0x55d386ff7d93]
2018-04-17 04:35:27.573071 I | rook-ceph-mon0: 16: (DispatchQueue::entry()+0xf4a) [0x55d38752282a]
2018-04-17 04:35:27.573085 I | rook-ceph-mon0: 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x55d3872d1a8d]
2018-04-17 04:35:27.573106 I | rook-ceph-mon0: 18: (()+0x76ba) [0x7f3837abc6ba]
2018-04-17 04:35:27.573121 I | rook-ceph-mon0: 19: (clone()+0x6d) [0x7f38362e641d]
2018-04-17 04:35:27.573144 I | rook-ceph-mon0: NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
2018-04-17 04:35:27.573157 I | rook-ceph-mon0: failed to run mon. failed to start mon: Failed to complete rook-ceph-mon0: signal: aborted (core dumped)
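The useful part of that dump is the assertion line, which gets buried because the same backtrace is logged several times. Here is a rough sketch for pulling it out of a saved log, assuming the `<file>: <line>: FAILED assert(<expr>)` format seen above; the names `ASSERT_RE`, `failed_asserts` and `SAMPLE_LOG` are made up for illustration and are not part of Rook or Ceph.

```python
import re

# Matches Ceph assertion lines such as:
#   /build/ceph-12.2.4/src/mds/FSMap.cc: 876: FAILED assert(<expr>)
ASSERT_RE = re.compile(r"(\S+): (\d+): FAILED assert\((.+)\)\s*$")


def failed_asserts(log_text: str) -> list:
    """Return unique (source file, line number, expression) tuples, in order seen."""
    seen = []
    for line in log_text.splitlines():
        match = ASSERT_RE.search(line)
        if match is None:
            continue
        entry = (match.group(1), int(match.group(2)), match.group(3))
        if entry not in seen:  # the same assert is logged several times
            seen.append(entry)
    return seen


# Three lines copied from the mon log in this post.
SAMPLE_LOG = """\
2018-04-17 04:35:27.537047 I | rook-ceph-mon0: /build/ceph-12.2.4/src/mds/FSMap.cc: 876: FAILED assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE)
2018-04-17 04:35:27.537226 I | rook-ceph-mon0: 5: (MDSMonitor::tick()+0x8ea) [0x55d3871a6c8a]
2018-04-17 04:35:27.571769 I | rook-ceph-mon0: /build/ceph-12.2.4/src/mds/FSMap.cc: 876: FAILED assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE)
"""

if __name__ == "__main__":
    for source, lineno, expr in failed_asserts(SAMPLE_LOG):
        print(f"{source}:{lineno} assert({expr})")
```

In our case the only assert is in FSMap.cc, inside the standby-replay handling for the MDS, which fits with the file systems being the trigger rather than the mons themselves.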
Now we’ve been running for almost a week without any problem.