How to solve the ceph-mgr error “clock skew detected”
The clock skew error message indicates that Ceph Monitors’ clocks are not synchronized. Clock synchronization is important because Ceph Monitors depend on time precision and behave unpredictably if their clocks are not synchronized.
The mon_clock_drift_allowed parameter determines what disparity between the clocks is tolerated. By default, this parameter is set to 0.05 seconds.
Important: Do not change the default value of mon_clock_drift_allowed without previous testing. Changing this value might affect the stability of the Ceph Monitors and the Ceph Storage Cluster in general.
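As a rough illustration of how the threshold applies, the sketch below compares a measured skew against the allowed drift. The function name skew_exceeds and the sample values are hypothetical and only for illustration. On a live cluster the configured value can be read with `ceph config get mon mon_clock_drift_allowed`, and the skew seen by each monitor with `ceph time-sync-status`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a measured clock skew (in seconds)
# exceeds the allowed drift. The second argument defaults to 0.05 s,
# the default value of mon_clock_drift_allowed mentioned above.
skew_exceeds() {
    local skew="$1" allowed="${2:-0.05}"
    # awk does the floating-point comparison. The sign is dropped so that a
    # clock running behind counts the same as one running ahead.
    awk -v s="$skew" -v a="$allowed" 'BEGIN {
        s = s + 0; a = a + 0          # force numeric comparison
        if (s < 0) s = -s
        if (s > a) print "yes"; else print "no"
    }'
}

# Sample values only, not taken from a real cluster.
skew_exceeds 0.12     # prints "yes": 0.12 s is above the 0.05 s default
skew_exceeds -0.01    # prints "no": 0.01 s is within tolerance
```

This only reads and compares values. It does not change mon_clock_drift_allowed.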
Possible causes of the clock skew error include network problems or problems with chrony Network Time Protocol (NTP) synchronization, if that is configured. In addition, time synchronization can be unreliable on Ceph Monitors deployed on virtual machines.
How to resolve the MON clock skew issue in OCS 4.x
From the bastion server, open a shell in the rook-ceph-operator pod:
oc rsh -n openshift-storage $(oc get pods -n openshift-storage -o name -l app=rook-ceph-operator)
From that shell, export the OpenShift Storage configuration:
sh-5.1$ export CEPH_ARGS='-c /var/lib/rook/openshift-storage/openshift-storage.config'
Then run the ceph status command:
sh-5.1$ ceph -s
  cluster:
    id:     496246f1-f423-4366-8543-2ac1fa5bbbf5
    health: HEALTH_WARN
            clock skew detected on mon.b, mon.c

  services:
    mon: 3 daemons, quorum a,b,c (age 2d)
    mgr: a(active, since 2d)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 4w), 3 in (since 5M)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 6.73k objects, 22 GiB
    usage:   63 GiB used, 1.4 TiB / 1.5 TiB avail
    pgs:     169 active+clean

  io:
    client: 853 B/s rd, 129 KiB/s wr, 1 op/s rd, 12 op/s wr
The output of ceph -s shows that one or more MONs are out of time sync (here mon.b and mon.c).
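When the status output is long, the affected monitors can be pulled out of it with a small filter. The following is a minimal sketch, not part of Ceph itself: the function name skewed_mons is hypothetical, and it assumes the warning line has the same wording as in the status output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ceph -s` text on stdin and print, one per line,
# the monitors named in a "clock skew detected on ..." warning.
skewed_mons() {
    sed -n 's/.*clock skew detected on \(.*\)$/\1/p' \
        | tr -d ' ' \
        | tr ',' '\n' \
        | sort -u
}

# Sample input copied from the status output above.
skewed_mons <<'EOF'
health: HEALTH_WARN
clock skew detected on mon.b, mon.c
EOF
# prints:
# mon.b
# mon.c
```

Inside the operator shell, with CEPH_ARGS exported as shown earlier, the same function could be fed from the live cluster with `ceph -s | skewed_mons`.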
Resolution
- Manually force chronyc to sync the clocks by running the following…
- Connect to the ODF (or Ceph) node that is reporting the issue above and turn off SELinux enforcement. For ODF, you can either use ‘oc debug node/’ or use ssh (if keys are configured for the core user).
NOTE: Make sure you turn SELinux back on afterwards. Do NOT leave it off for an extended period of time.
Temporarily disable SELinux enforcement:
$ setenforce 0
Then run the makestep command to manually force a time adjustment using chronyc, and restart chronyd:
$ chronyc -a makestep
$ systemctl stop chronyd; systemctl start chronyd; systemctl enable chronyd
Now re-enable SELinux:
$ setenforce 1
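Because it is easy to forget the final setenforce 1, the steps above can be wrapped in a small script that restores SELinux even if one of the chrony commands fails. This is only a sketch under stated assumptions: the names run, resync_clock and DRY_RUN are hypothetical, and the commands themselves are exactly the ones listed above. By default it only prints what it would do.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the steps above. With DRY_RUN=1 (the default
# here) each command is only printed, so the sequence can be reviewed first.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

resync_clock() {
    local rc=0
    # Stop here if SELinux cannot be switched, nothing has been changed yet.
    run setenforce 0 || return 1
    # Remember any failure but keep going, so SELinux is always restored.
    run chronyc -a makestep       || rc=1
    run systemctl stop chronyd    || rc=1
    run systemctl start chronyd   || rc=1
    run systemctl enable chronyd  || rc=1
    run setenforce 1              || rc=1
    return "$rc"
}

resync_clock
```

To actually apply the changes, the script would be run as root on the affected node with DRY_RUN=0.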
The root cause is that the ODF nodes are unable to sync with the NTP servers.
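One way to confirm whether a node is syncing again is to look at the "Leap status" field of `chronyc tracking`, which reads "Normal" when chronyd is synchronised and "Not synchronised" otherwise. The helper below is a hedged sketch: the function name chrony_synced is hypothetical, and the sample text is an abbreviated, illustrative excerpt rather than output captured from the cluster in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `chronyc tracking` output on stdin and succeed
# only if the "Leap status" field reports Normal.
chrony_synced() {
    awk -F' *: *' '
        /^Leap status/ { sub(/[ \t]+$/, "", $2); status = $2 }
        END { if (status == "Normal") exit 0; exit 1 }
    '
}

# Illustrative, abbreviated sample of an unsynchronised node.
sample='Reference ID    : 00000000 ()
Stratum         : 0
Leap status     : Not synchronised'

if printf '%s\n' "$sample" | chrony_synced; then
    echo "synchronised"
else
    echo "not synchronised"
fi
# prints "not synchronised"
```

On the node itself this could be used as `chronyc tracking | chrony_synced`, and `chronyc sources -v` shows which NTP servers are reachable.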
Reference: https://access.redhat.com/solutions/5244631 (this article provides more detailed steps)