Rook Ceph OSD getting error creating empty object store in /var/lib/rook/osd215: (22) Invalid argument
I’d been struggling with the ceph-container project, trying to get it running but hitting lots of errors. Then I saw some information on Rook and watched the KubeCon YouTube video Building a Storage Cluster on Kubernetes, and thought: let’s try it out. Rook completely automates the setup and provisioning of a Ceph cluster on Kubernetes.
After doing the installation and running through the steps in the documentation, everything was looking very promising; however, one of my OSDs wouldn’t come online and its pod kept going into CrashLoopBackOff.
Still, I was able to deploy a test WordPress installation using Rook + Ceph, but that one node kept acting up and its OSD never became active.
I went through a lot of steps, but here’s a simplified version of how the folks in the Rook Slack channel pointed me in the right direction so I could solve my issue.
The error I was getting:
ERROR: error creating empty object store in /var/lib/rook/osd215: (22) Invalid argument
Versions:
CentOS 7.4
Kubernetes 1.9.3
Ceph 12.2.4 (luminous)
Rook v0.7.0-40.g284c1b3
I went through all the steps in Rook’s Quickstart documentation.
$ git clone git@github.com:rook/rook.git
$ cd rook/cluster/examples/kubernetes
$ kubectl create -f rook-operator.yaml
$ kubectl create -f rook-cluster.yaml
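To spot the failing OSD, it’s enough to watch the pods in the cluster namespace and pull logs from the crashing one. Something along these lines works (the rook namespace and pod name match my cluster, as in the log output further down; substitute your own):

# A broken OSD shows up as CrashLoopBackOff
$ kubectl -n rook get pods -o wide

# Pull the logs from the crashing OSD pod
$ kubectl -n rook logs rook-ceph-osd-pxgnw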
No matter what I did, one of the OSDs kept giving an error. This is the log output:
[master] $ kubectl logs -n rook rook-ceph-osd-pxgnw
2018-03-23 19:56:57.420288 I | rook: starting Rook v0.7.0-40.g284c1b3 with arguments '/usr/local/bin/rook osd'
2018-03-23 19:56:57.420488 I | rook: flag values: --admin-secret=*****, --ceph-config-override=/etc/rook/config/override.conf, --cluster-id=c8c43f30-2e3a-11e8-97e7-00163ec25389, --cluster-name=rook, --config-dir=/var/lib/rook, --data-device-filter=, --data-devices=, --data-directories=, --force-format=false, --fsid=, --help=false, --location=, --log-level=INFO, --metadata-device=, --mon-endpoints=rook-ceph-mon0=10.96.59.3:6790,rook-ceph-mon3=10.111.59.17:6790,rook-ceph-mon1=10.105.201.7:6790, --mon-secret=*****, --node-name=kublaxnode3.example.com, --osd-database-size=1024, --osd-journal-size=1024, --osd-store=, --osd-wal-size=576, --private-ipv4=192.168.248.193, --public-ipv4=192.168.248.193
2018-03-23 19:56:57.423335 I | cephmon: parsing mon endpoints: rook-ceph-mon0=10.96.59.3:6790,rook-ceph-mon3=10.111.59.17:6790,rook-ceph-mon1=10.105.201.7:6790
2018-03-23 19:56:57.456099 I | cephmon: writing config file /var/lib/rook/rook/rook.config
2018-03-23 19:56:57.456259 I | cephmon: generated admin config in /var/lib/rook/rook
2018-03-23 19:56:57.456284 I | cephosd: discovering hardware
2018-03-23 19:56:57.456304 I | exec: Running command: lsblk --all --noheadings --list --output KNAME
2018-03-23 19:56:57.460809 I | exec: Running command: lsblk /dev/xvda1 --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2018-03-23 19:56:57.464287 I | exec: Running command: lsblk /dev/xvda2 --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2018-03-23 19:56:57.468585 I | cephosd: creating and starting the osds
2018-03-23 19:56:57.480379 I | cephosd: configuring osd devices: {"Entries":{}}
2018-03-23 19:56:57.480411 I | cephosd: configuring removed osd devices: {"Entries":{}}
2018-03-23 19:56:57.480446 I | cephosd: configuring osd dirs: map[/var/lib/rook:-1]
2018-03-23 19:56:57.480711 I | exec: Running command: ceph osd create f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c --cluster=rook --conf=/var/lib/rook/rook/rook.config --keyring=/var/lib/rook/rook/client.admin.keyring --format json --out-file /tmp/773701184
2018-03-23 19:56:58.287056 I | cephosd: successfully created OSD f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c with ID 215
2018-03-23 19:56:58.287165 I | cephosd: osd.215 appears to be new, cleaning the root dir at /var/lib/rook/osd215
2018-03-23 19:56:58.287541 I | cephmon: writing config file /var/lib/rook/osd215/rook.config
2018-03-23 19:56:58.287712 I | exec: Running command: ceph auth get-or-create osd.215 -o /var/lib/rook/osd215/keyring osd allow * mon allow profile osd --cluster=rook --conf=/var/lib/rook/rook/rook.config --keyring=/var/lib/rook/rook/client.admin.keyring --format plain
2018-03-23 19:56:58.700582 I | cephosd: Initializing OSD 215 file system at /var/lib/rook/osd215...
2018-03-23 19:56:58.700810 I | exec: Running command: ceph mon getmap --cluster=rook --conf=/var/lib/rook/rook/rook.config --keyring=/var/lib/rook/rook/client.admin.keyring --format json --out-file /tmp/867996831
2018-03-23 19:56:59.088460 I | exec: got monmap epoch 5
2018-03-23 19:56:59.088970 I | exec: Running command: ceph-osd --mkfs --id=215 --cluster=rook --conf=/var/lib/rook/osd215/rook.config --osd-data=/var/lib/rook/osd215 --osd-uuid=f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c --monmap=/var/lib/rook/osd215/tmp/activate.monmap --keyring=/var/lib/rook/osd215/keyring --osd-journal=/var/lib/rook/osd215/journal
2018-03-23 19:56:59.125798 I | mkfs-osd215: 2018-03-23 19:56:59.125528 7fd452c0de00  0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 82
2018-03-23 19:56:59.241491 I | mkfs-osd215: 2018-03-23 19:56:59.241303 7fd452c0de00  0 filestore(/var/lib/rook/osd215) backend generic (magic 0xef53)
2018-03-23 19:56:59.260569 I | mkfs-osd215: 2018-03-23 19:56:59.260383 7fd452c0de00 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2018-03-23 19:56:59.260618 I | mkfs-osd215: 2018-03-23 19:56:59.260451 7fd452c0de00 -1 journal FileJournal::_open_file : unable to preallocation journal to 1073741824 bytes: (22) Invalid argument
2018-03-23 19:56:59.260629 I | mkfs-osd215: 2018-03-23 19:56:59.260486 7fd452c0de00 -1 filestore(/var/lib/rook/osd215) mkjournal(1066): error creating journal on /var/lib/rook/osd215/journal: (22) Invalid argument
2018-03-23 19:56:59.260646 I | mkfs-osd215: 2018-03-23 19:56:59.260532 7fd452c0de00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
2018-03-23 19:56:59.260786 I | mkfs-osd215: 2018-03-23 19:56:59.260662 7fd452c0de00 -1 ** ERROR: error creating empty object store in /var/lib/rook/osd215: (22) Invalid argument
2018-03-23 19:56:59.265399 I | mkfs-osd215: 2018-03-23 19:56:59.260383 7fd452c0de00 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2018-03-23 19:56:59.265436 I | mkfs-osd215: 2018-03-23 19:56:59.260451 7fd452c0de00 -1 journal FileJournal::_open_file : unable to preallocation journal to 1073741824 bytes: (22) Invalid argument
2018-03-23 19:56:59.265449 I | mkfs-osd215: 2018-03-23 19:56:59.260486 7fd452c0de00 -1 filestore(/var/lib/rook/osd215) mkjournal(1066): error creating journal on /var/lib/rook/osd215/journal: (22) Invalid argument
2018-03-23 19:56:59.265459 I | mkfs-osd215: 2018-03-23 19:56:59.260532 7fd452c0de00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
2018-03-23 19:56:59.265469 I | mkfs-osd215: 2018-03-23 19:56:59.260662 7fd452c0de00 -1 ** ERROR: error creating empty object store in /var/lib/rook/osd215: (22) Invalid argument
2018-03-23 19:56:59.265619 E | cephosd: failed to config osd in path /var/lib/rook. failed to initialize OSD at /var/lib/rook/osd215: failed osd mkfs for OSD ID 215, UUID f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c, dataDir /var/lib/rook/osd215: Failed to complete mkfs-osd215: exit status 1
2018-03-23 19:56:59.265640 I | cephosd: 0/1 osd dirs succeeded on this node failed to configure dirs map[/var/lib/rook:215]. failed to initialize OSD at /var/lib/rook/osd215: failed osd mkfs for OSD ID 215, UUID f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c, dataDir /var/lib/rook/osd215: Failed to complete mkfs-osd215: exit status 1
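The lines that matter are the journal ones: ceph-osd can’t preallocate the 1073741824-byte (1 GiB, matching --osd-journal-size=1024) FileStore journal, and every error after that is fallout from it. Error (22) is EINVAL; if you ever want to decode an errno number like this yourself, the kernel headers spell them out (path assumes a typical kernel-headers install):

# Look up errno 22 in the kernel's errno table
$ grep -w 22 /usr/include/asm-generic/errno-base.h
#define EINVAL          22      /* Invalid argument */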
To test the theory, we logged directly into the affected Kubernetes node and tried the same kind of preallocation by hand:
$ cd /var/lib/rook
$ fallocate -l 1024 test.txt
fallocate: test.txt: fallocate failed: Operation not supported
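That reproduced it: the preallocation fallocate performs appears to be the same operation the FileStore journal setup was failing at, and ext3 simply doesn’t support it. To see what filesystem a directory actually sits on as the kernel sees it (rather than what fstab claims), something like this works:

# Both read the live mount table (/proc/mounts), not fstab
$ findmnt -T /var/lib/rook
$ df -T /var/lib/rook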
Checking /etc/fstab, it said ext4:
/dev/xvda1 / ext4 defaults 1 1
but mount showed ext3:
/dev/xvda1 on / type ext3 (rw,relatime,data=ordered)
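It’s also worth checking whether the running kernel supports ext4 at all; on a container-style VPS (my assumption for this SolusVM setup) the kernel comes from the host, so the guest’s fstab can’t ask for more than the host kernel provides:

# Filesystems the running kernel knows about
$ grep ext /proc/filesystems

# Or look for the module directly
$ modinfo ext4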
That was the problem!
It turned out my VPS host was running an older version of SolusVM on a CentOS 5.x server, which didn’t have ext4 file system support. So even though fstab said ext4, it was falling back to ext3 when the VPS node booted and mounted the root file system.
After fixing the disk so it was actually recognized as ext4, which is what Ceph needed, everything started working.
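In my case that fix had to happen on the provider’s side, but once the root file system really was ext4, re-running the earlier checks confirms it, and deleting the stuck OSD pod lets the operator reschedule it cleanly (again, the pod name is from my cluster; substitute your own):

# Confirm the mount and that preallocation now succeeds
$ findmnt -T /var/lib/rook
$ fallocate -l 1024 /var/lib/rook/test.txt && rm /var/lib/rook/test.txt

# Kick the crashed OSD pod so a fresh one is created
$ kubectl -n rook delete pod rook-ceph-osd-pxgnw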
You can see each of the OSDs is running, and my WordPress + MySQL deployment properly acquired its PVCs.
NAME                             READY     STATUS    RESTARTS   AGE       IP                NODE
rook-ceph-mgr0-cfccfd6b8-v5ll5   1/1       Running   0          21h       192.168.108.143   kublaxnode4.example.com
rook-ceph-mon0-vdlgx             1/1       Running   0          21h       192.168.33.7      kublaxnode1.example.com
rook-ceph-mon1-nzvn8             1/1       Running   0          21h       192.168.59.25     kublaxnode2.example.com
rook-ceph-mon3-2pvbn             1/1       Running   0          17h       192.168.108.146   kublaxnode4.example.com
rook-ceph-osd-cp9fh              1/1       Running   0          12m       192.168.249.0     kublaxnode3.example.com
rook-ceph-osd-hjqn8              1/1       Running   0          21h       192.168.33.5      kublaxnode1.example.com
rook-ceph-osd-mvl5w              1/1       Running   0          21h       192.168.59.26     kublaxnode2.example.com
rook-ceph-osd-r22kl              1/1       Running   1          21h       192.168.108.144   kublaxnode4.example.com

NAME                             READY     STATUS    RESTARTS   AGE       IP                NODE
rook-agent-6f9dl                 1/1       Running   0          21h       10.96.132.12      kublaxnode4.example.com
rook-agent-8ts5c                 1/1       Running   0          21h       10.81.236.202     kublaxnode2.example.com
rook-agent-kj949                 1/1       Running   0          21h       10.96.132.11      kublaxnode1.example.com
rook-agent-qf24k                 1/1       Running   1          12m       10.230.111.10     kublaxnode3.example.com
rook-operator-77cf655476-77s8b   1/1       Running   0          17h       192.168.108.145   kublaxnode4.example.com

NAME                               READY     STATUS    RESTARTS   AGE       IP                NODE
echoheaders-xxllz                  1/1       Running   0          21h       192.168.59.24     kublaxnode2.example.com
nginx-ingress-controller-ldlfn     1/1       Running   0          21h       192.168.108.142   kublaxnode4.example.com
nginx-ingress-controller-lqqdv     1/1       Running   0          21h       192.168.59.22     kublaxnode2.example.com
wordpress-55cbcdd99b-2x4mf         1/1       Running   0          14h       192.168.33.13     kublaxnode1.example.com
wordpress-mysql-557ffc4f69-6wkvv   1/1       Running   0          14h       192.168.33.8      kublaxnode1.example.com

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                    STORAGECLASS   REASON    AGE
pvc-61176903-2e6f-11e8-97e7-00163ec25389   1Gi        RWO            Delete           Bound     default/mysql-pv-claim   rook-block               14h
pvc-6f0b39c7-2e6f-11e8-97e7-00163ec25389   1Gi        RWO            Delete           Bound     default/wp-pv-claim      rook-block               14h
Hope that helps you avoid running into the same issue I did.