IBM API Connect: Crash Loop on the OS CronJob (oscron)
In this article, I want to share an error related to the oscron CronJob that impacts indexing in the analytics subsystem (Elasticsearch).
CronJob is meant for performing regular scheduled actions such as backups, report generation, and so on. One CronJob object is like one line of a crontab (cron table) file on a Unix system. It runs a Job periodically on a given schedule, written in Cron format.
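As a quick way to see which CronJobs exist in the API Connect namespace and how often they run, you can list them with oc or kubectl. This is only a sketch; the namespace and CronJob name below are the ones from this environment, so adjust them to match yours.

# list the CronJobs and their schedules (cron format) in the namespace
[root@bastion ~]# oc get cronjobs -n cp4i
# inspect the full spec of the oscron CronJob
[root@bastion ~]# oc get cronjob apiconnect-9ecaa918-oscron -n cp4i -o yaml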
This is the error that we found:
back-off 2m40s restarting failed container=oscron pod=apiconnect-9ecaa918-oscron-28686405-wtvd8_cp4i(599b02d7-5c94-497f-a7ae-5bdb421c55c7)
CrashLoopBackOff indicates that the application within the container is failing to start properly.
We found one oscron pod stuck in CrashLoopBackOff. At first we simply deleted the failing pod and its related Jobs, but the oscron Job kept being re-created again and again.
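This behaviour is expected: the failing pods belong to Jobs that are spawned by the oscron CronJob, so deleting a pod or Job only clears the way for the next scheduled run. If you want to stop new runs temporarily while investigating, the CronJob itself can be suspended. The sketch below is only a suggestion: the CronJob name is derived from the Job names above, and the API Connect operator may reconcile the CronJob and revert the change.

# temporarily stop the CronJob from creating new Jobs
[root@bastion ~]# kubectl patch cronjob apiconnect-9ecaa918-oscron -n cp4i -p '{"spec":{"suspend":true}}'
# resume scheduling once the investigation is done
[root@bastion ~]# kubectl patch cronjob apiconnect-9ecaa918-oscron -n cp4i -p '{"spec":{"suspend":false}}'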
Then we checked the failing pod, which was in Error state with repeated restarts, and looked at its logs:

# oc get po -o wide | grep oscron
apiconnect-9ecaa918-oscron-28688010-vzcp9 0/1 Error 4 (71s ago) 3m1s 12.123.1.123 csworker-2.dev.ocp.bankabc.co.id <none> <none>

# oc logs -f apiconnect-9ecaa918-oscron-28688010-vzcp9
Fetching current index
Making request to: https://apiconnect-9ecaa918-storage:9200/apic-api-w
file:///app/summary-management.js:175
const indexWithWriteAlias = filteredIndices.find((obj) => obj.aliases[OS_WRITE_INDEX].is_write_index);
^
TypeError: Cannot read properties of undefined (reading 'is_write_index')
at file:///app/summary-management.js:175:89
at Array.find (<anonymous>)
at getCurrentWriteIndex (file:///app/summary-management.js:175:47)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async processTransformJobs (file:///app/summary-management.js:428:24)
at async main (file:///app/summary-management.js:512:3)
Node.js v18.19.0
The log indicates that the pod cannot read the current write alias. From the indices output below we can see why: what should be the write alias, apic-api-w, has instead been created as a regular index:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open all-content-data 76o234u0SQeM-G3fi006cA 1 1 131 0 55.7kb 55.7kb
yellow open reports clozT4pQRMaIF68lZqmVtQ 1 1 0 0 208b 208b
green open .plugins-ml-config VG2wrqadTHKUQu8CzdKSUA 1 0 1 0 3.9kb 3.9kb
yellow open all-content-count A35Vk6dIQ-OL6RToONBKEA 1 1 1876 0 252.9kb 252.9kb
yellow open apic-api-w wFc_3JVvTdWMB_nS0en2iw 1 1 1644 0 4.3mb 4.3mb
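To double-check that apic-api-w really exists as a concrete index rather than as a write alias, you can query the alias catalog from inside the storage pod. This is a minimal check; the pod name and the client certificate paths are the same ones used later in Step 2 and may differ in your environment.

[root@bastion ~]# kubectl exec -it apiconnect-9ecaa918-storage-0 -n cp4i -- bash
bash-4.4$ export CURL_CMD="curl -sk --key /etc/velox/certs/client/tls.key --cert /etc/velox/certs/client/tls.crt https://localhost:9200"
bash-4.4$ # in a healthy deployment this grep should find apic-api-w listed as an alias
bash-4.4$ $CURL_CMD/_cat/aliases | grep apic-api-w
bash-4.4$ exit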
Action Plan
To recover from the issue, try the following steps in order:
Step 1: Scale ingestion down to 0 replicas
[root@bastion ~]# oc get sts -n cp4i
NAME READY AGE
apiconnect-9ecaa918-ingestion 1/1 132d
apiconnect-9ecaa918-storage 1/1 132d
apiconnect-cd20b410-cd20b410-db 1/1 132d
apiconnect-cd20b410-cd20b410-www 1/1 132d
apiconnect-cd20b410-nginx 1/1 132d
apiconnect-cefeb06f-natscluster 1/1 132d
apiconnect-development-gw 1/1 132d
[root@bastion ~]#
kubectl scale sts <ingestion-sts-name> -n cp4i --replicas=0
kubectl scale sts apiconnect-9ecaa918-ingestion -n cp4i --replicas=0
Wait until the StatefulSet shows 0/0:
[root@bastion ~]# oc get sts -n cp4i | grep ingestion
NAME READY AGE
apiconnect-9ecaa918-ingestion 0/0 132d
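If you prefer not to re-run oc get sts by hand, a small sketch like the one below polls until the ingestion StatefulSet reports zero replicas. The StatefulSet name is the one from this environment.

# poll until the ingestion StatefulSet has scaled down to zero replicas
[root@bastion ~]# while [ "$(oc get sts apiconnect-9ecaa918-ingestion -n cp4i -o jsonpath='{.status.replicas}')" != "0" ]; do sleep 5; done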
Step 2: Delete index apic-api-w
Warning: you are going to lose all of the analytics data held under the wrongly created index, i.e. apic-api-w.
[root@bastion ~]# oc get po -n cp4i | grep storage
apiconnect-9ecaa918-storage-0 1/1 Running 0 23d
[root@bastion ~]# kubectl exec -it apiconnect-9ecaa918-storage-0 -n cp4i -- bash
bash-4.4$ export CURL_CMD="curl -sk --key /etc/velox/certs/client/tls.key --cert /etc/velox/certs/client/tls.crt https://localhost:9200"
bash-4.4$ $CURL_CMD/apic-api-w -X DELETE
{"acknowledged":true}
bash-4.4$ exit
exit
Step 3: Re-run osinit
[root@bastion ~]# kubectl get jobs -n cp4i | grep osinit
apiconnect-9ecaa918-osinit 1/1 38s 132d
[root@bastion ~]# kubectl delete job apiconnect-9ecaa918-osinit -n cp4i
job.batch "apiconnect-9ecaa918-osinit" deleted
Step 4: Scale ingestion up to original replica count
[root@bastion ~]# kubectl get sts -n cp4i
NAME READY AGE
apiconnect-9ecaa918-ingestion 0/0 132d
apiconnect-9ecaa918-storage 1/1 132d
apiconnect-cd20b410-cd20b410-db 1/1 132d
apiconnect-cd20b410-cd20b410-www 1/1 132d
apiconnect-cd20b410-nginx 1/1 132d
apiconnect-cefeb06f-natscluster 1/1 132d
apiconnect-development-gw 1/1 132d
[root@bastion ~]# kubectl scale sts apiconnect-9ecaa918-ingestion -n cp4i --replicas=1
statefulset.apps/apiconnect-9ecaa918-ingestion scaled
After this, wait for 15 to 18 minutes and then check the status of your analytics.
The analytics subsystem should now be running properly again, but with empty data: in Step 2 we deleted the wrongly created index, so all of the data held under it is lost.
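To confirm that the write alias has been re-created correctly this time, exec into the storage pod again as in Step 2 and check the alias and index catalogs. The expectation, based on the error above, is that apic-api-w now appears in _cat/aliases with is_write_index set to true instead of showing up in _cat/indices as a standalone index:

bash-4.4$ # apic-api-w should now be an alias on a backing index, not an index itself
bash-4.4$ $CURL_CMD/_cat/aliases | grep apic-api-w
bash-4.4$ $CURL_CMD/_cat/indices | grep apic-api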
If the oscron pod is still not in a Completed state, try deleting the failed oscron Jobs (see the example after the listing below) and wait another 15 to 18 minutes.
[root@bastion ~]# oc get jobs
NAME COMPLETIONS DURATION AGE
apiconnect-36e322ff-configurator 1/1 10m 132d
apiconnect-9ecaa918-oscron-28693515 0/1 78m 78m
apiconnect-9ecaa918-oscron-28693530 0/1 63m 63m
apiconnect-9ecaa918-oscron-28693545 0/1 48m 48m
apiconnect-9ecaa918-oscron-28693560 0/1 33m 33m
apiconnect-9ecaa918-oscron-28693575 0/1 18m 18m
apiconnect-9ecaa918-oscron-28693590 1/1 23s 3m50s
apiconnect-9ecaa918-osinit 1/1 86s 6m3s
apiconnect-cefeb06f-analytics-push-28565415 0/1 89d 89d
apiconnect-cefeb06f-analytics-push-28693575 1/1 107s 18m
apiconnect-cefeb06f-up-apim-data-populate-0-to-605-f767074c 1/1 2m49s 132d
apiconnect-cefeb06f-up-apim-schema-0-to-605-f767074c 1/1 2m10s 132d
apiconnect-cefeb06f-up-lur-data-populate-0-to-103-f767074c 1/1 103s 132d
apiconnect-cefeb06f-up-lur-schema-0-to-103-f767074c 1/1 21s 132d
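As the listing shows, the oscron Jobs created while the write alias was broken remain failed (0/1), while the most recent run completes in seconds. If the latest oscron run still does not complete, the stuck Jobs can be deleted so that the CronJob schedules a fresh one; the Job name below is just one example from the listing above.

# delete a failed oscron Job (repeat for each stuck one); the CronJob will create a new run
[root@bastion ~]# oc delete job apiconnect-9ecaa918-oscron-28693515 -n cp4i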