How to reset Zenko queue counters
The object counters for target clouds can get out of sync when objects are deleted before they are replicated across regions (CRR), or when deleted or old object versions are removed before the corresponding delete operations are executed on the target cloud. If this happens, you need to reset the Zenko queue counters in Redis; the instructions below show how.
Step-by-step guide
To clear the counters you first need to make sure the replication queues are empty and then reset the counters in Redis.
1) To check the queues, set maintenance.enabled=true and maintenance.debug.enabled=true for the deployment. You can do this either by enabling the values in the chart and running a “helm upgrade”, or by setting them on the command line like this:
% helm upgrade my-zenko -f options.yml --set maintenance.enabled=true --set maintenance.debug.enabled=true zenko
This enables some extra pods for performing maintenance activities and debugging. After the deployment is done, make sure the “my-zenko-zenko-debug-kafka-client” pod is running.
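You can confirm this with a quick pod listing (the grep just narrows the output; adjust the name if your release is not called my-zenko):
% kubectl get pods | grep debug-kafka-client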
2) Then you can enter the pod and check the queues:
% kubectl exec -it [kafka-client pod] bash
# List the available queues (replacing "my-zenko-zenko-queue" with "[your release name]-zenko-queue")
root@[pod-name]:/opt/kafka# ./bin/kafka-consumer-groups.sh --bootstrap-server my-zenko-zenko-queue:9092 --list
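If you only care about the CRR consumer groups, you can narrow the listing (this assumes the default backbeat group naming, which you can see in the next step):
root@[pod-name]:/opt/kafka# ./bin/kafka-consumer-groups.sh --bootstrap-server my-zenko-zenko-queue:9092 --list | grep backbeat-replication-group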
3) Identify the target cloud replication groups relevant to the counters you want to reset and check the queue lag like this:
root@[pod-name]:/opt/kafka# ./bin/kafka-consumer-groups.sh --bootstrap-server my-zenko-zenko-queue:9092 --group backbeat-replication-group-example-location --describe
Check the “LAG” column for pending actions; the lag should be zero if the queues are empty. If the queues for all of the targets are quiescent, we can move on.
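If you have several target locations, a loop like this sketch will describe every replication group in one pass (same bootstrap server and group naming as above):
root@[pod-name]:/opt/kafka# for g in $(./bin/kafka-consumer-groups.sh --bootstrap-server my-zenko-zenko-queue:9092 --list | grep backbeat-replication-group); do
>   ./bin/kafka-consumer-groups.sh --bootstrap-server my-zenko-zenko-queue:9092 --group "$g" --describe
> done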
4) Now we can head over to a Redis pod and start resetting counters.
% kubectl exec -it my-zenko-redis-ha-server-0 bash
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli KEYS [location constraint]* | grep pending
# (for example: redis-cli KEYS aws-eu-west-1* | grep pending)
# This will return two keys, one for bytespending and one for opspending
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli KEYS aws-eu-west-1* | grep pending
aws-eu-west-1:bb:crr:opspending
aws-eu-west-1:bb:crr:bytespending
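If you want to note the current values before resetting them, GET will show them:
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli GET aws-eu-west-1:bb:crr:opspending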
# Set the counters to 0
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli SET aws-eu-west-1:bb:crr:opspending 0
OK
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli SET aws-eu-west-1:bb:crr:bytespending 0
OK
Do this for each target location that you wish to clear.
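If you have more than one location to clear, a small loop saves some typing (a sketch; the location names below are placeholders, substitute your own):
# Sketch: reset both pending counters for a list of locations
no-name!@my-zenko-redis-ha-server-0:/data$ for loc in aws-eu-west-1 azure-us-east; do
>   redis-cli SET "${loc}:bb:crr:opspending" 0
>   redis-cli SET "${loc}:bb:crr:bytespending" 0
> done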
Failed Object Counters
Failed object markers for a location will clear out in 24 hours (if they are not manually or automatically retried). You can force them to clear by removing the “failed” keys: find the keys with “failed” in their name and delete them. Something like this:
##
# Grep out the redis keys that house the failed object pointers
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli KEYS aws-eu-west-1* | grep failed
##
# Now delete those keys
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli DEL [key name]
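To remove every matching key in one pass instead of one at a time, something like this works (the quotes keep the shell from expanding the pattern, and -r is the GNU xargs flag that skips the DEL when grep finds nothing):
# Delete all keys for this location that contain "failed"
no-name!@my-zenko-redis-ha-server-0:/data$ redis-cli KEYS 'aws-eu-west-1*' | grep failed | xargs -r redis-cli DEL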
Developing and debugging a highly distributed system can be hard, and sharing what we learn is a way to help others. For everything else, please use the forum to ask more questions 🙂