Rook Ceph OSD getting error creating empty object store in /var/lib/rook/osd215: (22) Invalid argument

I had been struggling with the ceph-container project, trying to get it running and hitting lots of errors.  I had seen some information about Rook and watched the KubeCon YouTube talk Building a Storage Cluster on Kubernetes, so I thought I'd try it out.  Rook completely automates the setup and provisioning of a Ceph cluster on Kubernetes.

After doing the installation and running through the steps in the documentation, everything was looking very promising; however, one of my OSDs wouldn't come online.  The pod kept going into CrashLoopBackOff.

I was happy that I was still able to deploy a test WordPress installation using Rook + Ceph, but one node kept acting up and its OSD never became active.

I went through a lot of steps, but here's a simplified version of how the folks in the Rook Slack channel pointed me in the right direction and helped me solve the issue.

The error I was getting:

ERROR: error creating empty object store in /var/lib/rook/osd215: (22) Invalid argument

Versions:

CentOS 7.4
Kubernetes 1.9.3
Ceph 12.2.4 (luminous)
Rook v0.7.0

I went through all the steps that Rook provides in its Quickstart documentation.

$ git clone git@github.com:rook/rook.git
$ cd rook/cluster/examples/kubernetes
$ kubectl create -f rook-operator.yaml
$ kubectl create -f rook-cluster.yaml

One of the OSDs kept giving an error no matter what I did.  This is the log output:

[master] $ kubectl logs -n rook rook-ceph-osd-pxgnw
2018-03-23 19:56:57.420288 I | rook: starting Rook v0.7.0-40.g284c1b3 with arguments '/usr/local/bin/rook osd'
2018-03-23 19:56:57.420488 I | rook: flag values: --admin-secret=*****, --ceph-config-override=/etc/rook/config/override.conf, --cluster-id=c8c43f30-2e3a-11e8-97e7-00163ec25389, --cluster-name=rook, --config-dir=/var/lib/rook, --data-device-filter=, --data-devices=, --data-directories=, --force-format=false, --fsid=, --help=false, --location=, --log-level=INFO, --metadata-device=, --mon-endpoints=rook-ceph-mon0=10.96.59.3:6790,rook-ceph-mon3=10.111.59.17:6790,rook-ceph-mon1=10.105.201.7:6790, --mon-secret=*****, --node-name=kublaxnode3.example.com, --osd-database-size=1024, --osd-journal-size=1024, --osd-store=, --osd-wal-size=576, --private-ipv4=192.168.248.193, --public-ipv4=192.168.248.193
2018-03-23 19:56:57.423335 I | cephmon: parsing mon endpoints: rook-ceph-mon0=10.96.59.3:6790,rook-ceph-mon3=10.111.59.17:6790,rook-ceph-mon1=10.105.201.7:6790
2018-03-23 19:56:57.456099 I | cephmon: writing config file /var/lib/rook/rook/rook.config
2018-03-23 19:56:57.456259 I | cephmon: generated admin config in /var/lib/rook/rook
2018-03-23 19:56:57.456284 I | cephosd: discovering hardware
2018-03-23 19:56:57.456304 I | exec: Running command: lsblk --all --noheadings --list --output KNAME
2018-03-23 19:56:57.460809 I | exec: Running command: lsblk /dev/xvda1 --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2018-03-23 19:56:57.464287 I | exec: Running command: lsblk /dev/xvda2 --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2018-03-23 19:56:57.468585 I | cephosd: creating and starting the osds
2018-03-23 19:56:57.480379 I | cephosd: configuring osd devices: {"Entries":{}}
2018-03-23 19:56:57.480411 I | cephosd: configuring removed osd devices: {"Entries":{}}
2018-03-23 19:56:57.480446 I | cephosd: configuring osd dirs: map[/var/lib/rook:-1]
2018-03-23 19:56:57.480711 I | exec: Running command: ceph osd create f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c --cluster=rook --conf=/var/lib/rook/rook/rook.config --keyring=/var/lib/rook/rook/client.admin.keyring --format json --out-file /tmp/773701184
2018-03-23 19:56:58.287056 I | cephosd: successfully created OSD f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c with ID 215
2018-03-23 19:56:58.287165 I | cephosd: osd.215 appears to be new, cleaning the root dir at /var/lib/rook/osd215
2018-03-23 19:56:58.287541 I | cephmon: writing config file /var/lib/rook/osd215/rook.config
2018-03-23 19:56:58.287712 I | exec: Running command: ceph auth get-or-create osd.215 -o /var/lib/rook/osd215/keyring osd allow * mon allow profile osd --cluster=rook --conf=/var/lib/rook/rook/rook.config --keyring=/var/lib/rook/rook/client.admin.keyring --format plain
2018-03-23 19:56:58.700582 I | cephosd: Initializing OSD 215 file system at /var/lib/rook/osd215...
2018-03-23 19:56:58.700810 I | exec: Running command: ceph mon getmap --cluster=rook --conf=/var/lib/rook/rook/rook.config --keyring=/var/lib/rook/rook/client.admin.keyring --format json --out-file /tmp/867996831
2018-03-23 19:56:59.088460 I | exec: got monmap epoch 5
2018-03-23 19:56:59.088970 I | exec: Running command: ceph-osd --mkfs --id=215 --cluster=rook --conf=/var/lib/rook/osd215/rook.config --osd-data=/var/lib/rook/osd215 --osd-uuid=f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c --monmap=/var/lib/rook/osd215/tmp/activate.monmap --keyring=/var/lib/rook/osd215/keyring --osd-journal=/var/lib/rook/osd215/journal
2018-03-23 19:56:59.125798 I | mkfs-osd215: 2018-03-23 19:56:59.125528 7fd452c0de00  0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 82
2018-03-23 19:56:59.241491 I | mkfs-osd215: 2018-03-23 19:56:59.241303 7fd452c0de00  0 filestore(/var/lib/rook/osd215) backend generic (magic 0xef53)
2018-03-23 19:56:59.260569 I | mkfs-osd215: 2018-03-23 19:56:59.260383 7fd452c0de00 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2018-03-23 19:56:59.260618 I | mkfs-osd215: 2018-03-23 19:56:59.260451 7fd452c0de00 -1 journal FileJournal::_open_file : unable to preallocation journal to 1073741824 bytes: (22) Invalid argument
2018-03-23 19:56:59.260629 I | mkfs-osd215: 2018-03-23 19:56:59.260486 7fd452c0de00 -1 filestore(/var/lib/rook/osd215) mkjournal(1066): error creating journal on /var/lib/rook/osd215/journal: (22) Invalid argument
2018-03-23 19:56:59.260646 I | mkfs-osd215: 2018-03-23 19:56:59.260532 7fd452c0de00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
2018-03-23 19:56:59.260786 I | mkfs-osd215: 2018-03-23 19:56:59.260662 7fd452c0de00 -1  ** ERROR: error creating empty object store in /var/lib/rook/osd215: (22) Invalid argument
2018-03-23 19:56:59.265399 I | mkfs-osd215: 2018-03-23 19:56:59.260383 7fd452c0de00 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2018-03-23 19:56:59.265436 I | mkfs-osd215: 2018-03-23 19:56:59.260451 7fd452c0de00 -1 journal FileJournal::_open_file : unable to preallocation journal to 1073741824 bytes: (22) Invalid argument
2018-03-23 19:56:59.265449 I | mkfs-osd215: 2018-03-23 19:56:59.260486 7fd452c0de00 -1 filestore(/var/lib/rook/osd215) mkjournal(1066): error creating journal on /var/lib/rook/osd215/journal: (22) Invalid argument
2018-03-23 19:56:59.265459 I | mkfs-osd215: 2018-03-23 19:56:59.260532 7fd452c0de00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
2018-03-23 19:56:59.265469 I | mkfs-osd215: 2018-03-23 19:56:59.260662 7fd452c0de00 -1  ** ERROR: error creating empty object store in /var/lib/rook/osd215: (22) Invalid argument
2018-03-23 19:56:59.265619 E | cephosd: failed to config osd in path /var/lib/rook. failed to initialize OSD at /var/lib/rook/osd215: failed osd mkfs for OSD ID 215, UUID f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c, dataDir /var/lib/rook/osd215: Failed to complete mkfs-osd215: exit status 1
2018-03-23 19:56:59.265640 I | cephosd: 0/1 osd dirs succeeded on this node
failed to configure dirs map[/var/lib/rook:215]. failed to initialize OSD at /var/lib/rook/osd215: failed osd mkfs for OSD ID 215, UUID f0da9d8b-6f43-4f2c-9d31-b88fc6f1d18c, dataDir /var/lib/rook/osd215: Failed to complete mkfs-osd215: exit status 1

To test the issue, we logged directly into the affected Kubernetes node:

$ cd /var/lib/rook
$ fallocate -l 1024 test.txt
fallocate: test.txt: fallocate failed: Operation not supported
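That failed `fallocate` is exactly what ceph-osd trips over: FileStore preallocates its journal with fallocate(2), which ext3 does not support.  Here's a small probe for this, written as a sketch of my own (the function name and paths are mine, not part of Rook; on the node you would point it at /var/lib/rook):

```shell
# Probe whether a directory's filesystem supports fallocate(2), the call
# ceph-osd relies on to preallocate its FileStore journal. Defaults to the
# current directory; on the affected node, run it against /var/lib/rook.
probe_fallocate() {
  dir="${1:-.}"
  if fallocate -l 1M "$dir/fallocate-probe" 2>/dev/null; then
    result="fallocate supported"
  else
    result="fallocate not supported"
  fi
  rm -f "$dir/fallocate-probe"
  echo "$result"
}

probe_fallocate /tmp
```

On an ext4 or xfs mount this prints "fallocate supported"; on the ext3-mounted node it would print the opposite, matching the failure above.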

We checked /etc/fstab, which says ext4:

/dev/xvda1 /                       ext4    defaults        1 1

but mount shows ext3:

/dev/xvda1 on / type ext3 (rw,relatime,data=ordered)
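To script that comparison, here's a small helper (my own sketch, not a standard tool) that extracts the declared type for a mountpoint from fstab-style text; compare its output against what `mount` reports for the same mountpoint:

```shell
# Print the filesystem type declared for a given mountpoint in fstab-style
# input, skipping comment lines. Pipe /etc/fstab into it on the node.
fstab_fstype() {
  awk -v mp="$1" '$1 !~ /^#/ && $2 == mp {print $3}'
}

# With the entry from this node's fstab:
echo '/dev/xvda1 / ext4 defaults 1 1' | fstab_fstype /
# prints: ext4
```

If the helper prints ext4 but `mount` shows ext3, you have the same mismatch described here.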

That was the problem!

It turns out my VPS host was running an older version of SolusVM on a CentOS 5.x server that didn't have ext4 support installed.  So even though fstab said ext4, the mount fell back to ext3 when the VPS node booted and mounted the root file system.  And since ext3 doesn't support the fallocate(2) call Ceph uses to preallocate the journal, ceph-osd's mkfs failed with (22) Invalid argument.

After fixing the file system so it was actually mounted as ext4, as Ceph needed, everything started working.
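If you suspect the same silent fallback, one quick check (a sketch; on any modern kernel ext4 will normally be present) is whether the running kernel can mount ext4 at all:

```shell
# /proc/filesystems lists every filesystem type the running kernel can
# mount. If ext4 is missing, an "ext4" entry in fstab cannot be honored,
# and the boot-time mount may silently fall back, as happened here.
if grep -qw ext4 /proc/filesystems; then
  echo "kernel supports ext4"
else
  echo "kernel cannot mount ext4"
fi
```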

You can see that each of the OSDs is running and that my WordPress + MySQL deployment properly acquired its PVCs:

NAME                             READY     STATUS    RESTARTS   AGE       IP                NODE
rook-ceph-mgr0-cfccfd6b8-v5ll5   1/1	   Running   0          21h       192.168.108.143   kublaxnode4.example.com
rook-ceph-mon0-vdlgx             1/1	   Running   0          21h       192.168.33.7      kublaxnode1.example.com
rook-ceph-mon1-nzvn8             1/1	   Running   0          21h       192.168.59.25     kublaxnode2.example.com
rook-ceph-mon3-2pvbn             1/1	   Running   0          17h       192.168.108.146   kublaxnode4.example.com
rook-ceph-osd-cp9fh              1/1	   Running   0          12m       192.168.249.0     kublaxnode3.example.com
rook-ceph-osd-hjqn8              1/1	   Running   0          21h       192.168.33.5      kublaxnode1.example.com
rook-ceph-osd-mvl5w              1/1	   Running   0          21h       192.168.59.26     kublaxnode2.example.com
rook-ceph-osd-r22kl              1/1	   Running   1          21h       192.168.108.144   kublaxnode4.example.com

NAME                             READY     STATUS    RESTARTS   AGE       IP                NODE
rook-agent-6f9dl                 1/1	   Running   0          21h       10.96.132.12     kublaxnode4.example.com
rook-agent-8ts5c                 1/1	   Running   0          21h       10.81.236.202    kublaxnode2.example.com
rook-agent-kj949                 1/1	   Running   0          21h       10.96.132.11     kublaxnode1.example.com
rook-agent-qf24k                 1/1	   Running   1          12m       10.230.111.10    kublaxnode3.example.com
rook-operator-77cf655476-77s8b   1/1	   Running   0          17h       192.168.108.145   kublaxnode4.example.com

NAME                                        READY     STATUS             RESTARTS   AGE       IP                NODE
echoheaders-xxllz                           1/1       Running            0          21h       192.168.59.24     kublaxnode2.example.com
nginx-ingress-controller-ldlfn              1/1       Running            0          21h       192.168.108.142   kublaxnode4.example.com
nginx-ingress-controller-lqqdv              1/1       Running            0          21h       192.168.59.22     kublaxnode2.example.com
wordpress-55cbcdd99b-2x4mf                  1/1       Running            0          14h       192.168.33.13     kublaxnode1.example.com
wordpress-mysql-557ffc4f69-6wkvv            1/1       Running            0          14h       192.168.33.8      kublaxnode1.example.com

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                    STORAGECLASS   REASON    AGE
pvc-61176903-2e6f-11e8-97e7-00163ec25389   1Gi        RWO            Delete           Bound     default/mysql-pv-claim   rook-block               14h
pvc-6f0b39c7-2e6f-11e8-97e7-00163ec25389   1Gi        RWO            Delete           Bound     default/wp-pv-claim	 rook-block               14h


Hope that helps you avoid running into the same issue.