The object counters for target clouds can get out of sync when objects are deleted before they are replicated across regions (CRR), or when old or deleted versions of objects are removed before the delete operations are executed on the target cloud. If this happens, you need to reset the Zenko queue counters in Redis; the instructions below explain how.
Step-by-step guide
To clear the counters you first need to make sure the replication queues are empty and then reset the counters in Redis.
1) To check the queues, first set maintenance.enabled = true and maintenance.debug = true for the deployment. You can do this by enabling the values in the chart and running a "helm upgrade", or by setting them directly with an upgrade like this:
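As a minimal sketch, assuming your Helm release is named my-zenko and you deployed from the zenko chart directory (substitute your own names), the upgrade might look like:

```shell
# Hypothetical release and chart names -- adjust to your deployment.
helm upgrade my-zenko ./zenko \
  --set maintenance.enabled=true \
  --set maintenance.debug=true
```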
This enables some extra pods for performing maintenance activities and debugging. After it's done deploying, make sure the "my-zenko-zenko-debug-kafka-client" pod is running.
2) Then you can enter the pod and check the queues:
% kubectl exec -it [kafka-client pod] bash
# List the avail queues (replacing "my-zenko-zenko-queue" with "[your name]-zenko-queue")
root@[pod-name]/opt/kafka# ./bin/kafka-consumer-groups.sh --bootstrap-server my-zenko-zenko-queue:9092 --list
3) Identify the target cloud replication groups relevant to the counters you want to reset and check the queue lag like this:
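The lag check itself is done with the same Kafka tool; a sketch, assuming the same bootstrap server as above (the consumer group name here is illustrative):

```shell
# Describe one replication consumer group to see per-partition lag.
./bin/kafka-consumer-groups.sh --bootstrap-server my-zenko-zenko-queue:9092 \
  --describe --group backbeat-replication-group-aws-eu-west-1
```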
Check the "LAG" column for pending actions; lag should be zero if the queues are empty. If the queues for all of the targets are quiescent, you can move on.
4) Now we can head over to a Redis pod and start resetting counters.
% kubectl exec -it my-zenko-redis-ha-server-0 bash
no-name!@galaxy-z-redis-ha-server-0:/data$ redis-cli KEYS [location constraint]* |grep pending
# (for example: redis-cli KEYS aws-eu-west-1* |grep pending)
# This will return two keys, one for bytespending and one for opspending
no-name!@galaxy-z-redis-ha-server-0:/data$ redis-cli KEYS aws-eu-west-1* |grep pending
aws-eu-west-1:bb:crr:opspending
aws-eu-west-1:bb:crr:bytespending
# Set the counters to 0
no-name!@galaxy-z-redis-ha-server-0:/data$ redis-cli SET aws-eu-west-1:bb:crr:opspending 0
OK
no-name!@galaxy-z-redis-ha-server-0:/data$ redis-cli SET aws-eu-west-1:bb:crr:bytespending 0
OK
Do this for each target location that you wish to clear.
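When several target locations need clearing, the two SET commands above can be wrapped in a small loop; a sketch (the location names are examples):

```shell
# Reset pending CRR counters for a list of target locations.
# Run from inside the Redis pod, where redis-cli is available.
for loc in aws-eu-west-1 azure-west-us; do
  redis-cli SET "${loc}:bb:crr:opspending" 0
  redis-cli SET "${loc}:bb:crr:bytespending" 0
done
```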
Failed Object Counters
Failed object markers for a location clear out within 24 hours (if they are not manually or automatically retried). You can force them to clear by setting the "failed" counters to zero. You'll need to find the keys with "failed" in the name and delete them, something like this:
##
# Grep out the redis keys that house the failed object pointers
no-name!@galaxy-z-redis-ha-server-0:/data$ redis-cli KEYS aws-eu-west-1* |grep failed
##
# Now delete those keys
no-name!@galaxy-z-redis-ha-server-0:/data$ redis-cli DEL [key name]
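If there are many failed keys, the grep and delete steps can be combined into one pass; a sketch, assuming the same location prefix:

```shell
# Delete every "failed" key for one location in a single pass.
# Note: KEYS blocks Redis while it scans; it is fine for small keyspaces,
# but prefer SCAN on large production instances.
redis-cli KEYS 'aws-eu-west-1*' | grep failed | xargs -r redis-cli DEL
```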
Developing and debugging a highly distributed system can be hard, and sharing what we learn is a way to help others. For everything else, please use the forum to ask more questions.
One of the best ways to improve your programming skills, get involved with a community, meet people, and find new opportunities is to collaborate with others on open source projects. If it's your first time creating a pull request, it can be quite intimidating. I'm here to tell you not to be afraid of making even a tiny change, because it's likely that your pull request will help make Zenko better.
Feel free to ask
The best idea is to reach out to us first. We can discuss what you want to contribute and check whether someone is already working on a similar change, or whether you can get started right away. Wherever possible, we want to make sure you have a clear path that makes your work easier, faster, and more relevant. And if you are not sure what exactly you can do, we would be happy to help you find a way to contribute.
To do that you can create an issue on GitHub or ask your question on the Zenko forum.
Where you can find Zenko
If you visit the Zenko repository you will find that it includes installation resources (helm charts) to deploy the full Zenko stack over an orchestration system. A helm chart is a collection of files that describes a related set of Kubernetes resources.
The actual components of Zenko are spread across two repositories: Backbeat (core engine for asynchronous replication, optimized for queuing metadata updates and dispatching work to long-running tasks in the background) and CloudServer (Node.js implementation of the Amazon S3 protocol on the front-end and backend storage capabilities to multiple clouds, including Azure and Google).
Another great way to help is contributing to Zenko-specs, a repository that contains design.md documents for upcoming features, where you are more than welcome to suggest or comment. Additionally, every repository has a design.md describing its existing features.
Let's get down to it
Step 1
After you have chosen a repository to contribute to, go ahead and fork it to your GitHub account. In the forked repository, you have "write" access and can push changes. Eventually, you will contribute back to the original repository using pull requests.
Let's say you want to add some changes to Backbeat.
Clone the forked repository to your local machine:
$ git clone https://github.com/dashagurova/backbeat.git
$ cd backbeat
Step 2
You will find yourself in the default development branch of some version (development/major.minor). There is no master branch. Want to know why? Learn more about Scality's own GitWaterFlow delivery model here.
The next step is to create your own branch where all your work will be done:
$ git checkout -b type_of_branch/name_your_fix
Step 3
Important: "type_of_branch" should be one of these prefixes: feature/*, improvement/*, bugfix/*, hotfix/*.
Do your magic! Fix something, improve existing code, add a feature or document one.
Note: Scality uses the TDD (Test-Driven Development) model, so it is highly appreciated if any code submission is accompanied by related unit tests, or by changes to the existing tests (more info), depending on the type of code submitted. You will find a tests/ folder in the root directory of every repository.
Step 4
While working in your branch, you might end up with many commits. To keep things easy to navigate, it is common practice to "squash" many small commits down to a few, or to a single logical changeset, before submitting a pull request.
To squash three commits into one, you can do the following:
$ git rebase -i HEAD~3
where 3 is the number of commits.
In the text editor that comes up, replace the word "pick" with "squash" next to each commit you want to squash into the one before it.
Save and close the editor, and git will combine the squashed commits with the one before them. Git will then give you the opportunity to change your commit message to describe your fix or feature (in no more than 50 characters).
Step 5
If you’ve already pushed commits to GitHub and then squashed them locally, you will have to force the push to your branch.
$ git push -f origin type_of_branch/myfix
Otherwise just:
$ git push origin type_of_branch/myfix
Important: make sure that you push the changes to your type_of_branch/myfix!
Make the pull request
Now you're ready to create a pull request. You can open a pull request from the upstream (original) repository or from your fork. One option is to create it in your fork: search for your bugfix/myfix branch and hit "New pull request".
After that, you are presented with the page where you can go into the details about your work.
After you click "Create pull request," you are greeted by Bert-E. Bert-E is the gatekeeping and merging bot Scality developed in-house to automate GitWaterFlow. Its purpose is to help developers merge their feature branches on multiple development branches.
Now it's time to relax and have some tea. Our core developers will review your request and get back to you shortly. If you want to contribute code, docs, issues, or proposals, or just ask a question, come find me on the forum.
Backbeat, a key Zenko microservice, dispatches work to long-running background tasks. Backbeat uses Apache Kafka, the popular open-source distributed streaming platform, for scalability and high availability. This gives Zenko functionalities like:
Asynchronous multi-site replication
Lifecycle policies
Metadata ingestion (supporting Scality RING today, with other backends coming soon)
As with the rest of the Zenko stack, Backbeat is an open-source project, with code organized to let you use extensions to add features. Using extensions, you can create rules to manipulate objects based on metadata logs. For example, an extension can recognize music files by artist and move objects in buckets named after the artist. Or an extension can automatically move objects to separate buckets, based on data type (zip, jpeg, text, etc.) or on the owner of the object.
All Backbeat interactions go through CloudServer, which means they are not restricted to one backend and you can reuse existing solutions for different backends.
The Backbeat service publishes a stream of bucket and object metadata updates to Kafka. Each extension applies its own filters to the metadata stream, picking only metadata that meets its filter criteria. Each extension has its own Kafka consumers that consume and process metadata entries as defined.
To help you develop new extensions, we've added a basic extension called "helloWorld." This extension filters the metadata stream to select only object key names with the name "helloworld" (case insensitive) and, when processing each metadata entry, applies a basic AWS S3 putObjectTagging where the key is "hello" and the value is "world."
This example extension shows:
How to add your own extension using the existing metadata stream from a Zenko 1.0 deployment
How to add your own filters for your extension
How to add a queue processor to subscribe to and consume from a Kafka topic
There are two kinds of Backbeat extensions: populators and processors. The populator receives all the metadata logs, filters them as needed, and publishes them to Kafka. The processor subscribes to the extension's Kafka topic, thus receiving these filtered metadata log entries from the populator. The processor then applies any required changes (in our case, adding object tags to all "helloworld" object keys).
Begin by working on the populator side of the extension. Within Backbeat, add all the configs needed to set up a new helloWorld extension, following the examples in this commit. These configurations are placeholders. Zenko will overwrite them with its own values, as you'll see in later commits.
Every extension must have an index.js file in its extension directory ("helloWorld/" in the present example). This file must contain the extension's definitions in its name, version, and configValidator fields. The index.js file is the entry point for the main populator process to load the extension.
Add filters for the helloWorld extension by creating a new class that extends the existing architecture defined by the QueuePopulatorExtension class. It is important to add this new filter class to the index.js definition as "queuePopulatorExtension".
On the processor side of the extension, you need to create service accounts in Zenko to be used as clients to complete specific S3 API calls. In the HelloWorldProcessor class, this._serviceAuth is the credential set we pass from Zenko to Backbeat to help us perform the putObjectTagging S3 operation. For this demo, borrow the existing replication service account credentials.
Create an entry point for the new extension's processor by adding a new script in the package.json file. This part may be a little tricky, but the loadManagementDatabase method helps sync Backbeat extensions with the latest changes in the Zenko environment, including config changes and service account information updates.
Instantiate the new extension processor class and finish the setup of the class by calling the start method, defined here.
Update the docker-entrypoint.sh file. These variables point to specific fields in the config.json file. For example, ".extensions.helloWorld.topic" points to the config.json value currently defined as "topic": "backbeat-hello-world".
These variable names (e.g. EXTENSION_HELLOWORLD_TOPIC) are set when Zenko is upgraded or deployed as a new Kubernetes pod, which updates these config.json values in Backbeat.
Some config environment variables aren't so obvious to add because we did not add them to our extension configs, but they are necessary for running some of Backbeat's internal processes. Also, because this demo borrows the replication service account, those variables (EXTENSIONS_REPLICATION_SOURCE_AUTH_TYPE, EXTENSIONS_REPLICATION_SOURCE_AUTH_ACCOUNT) must be defined as well.
Here the Kubernetes deployment name is "zenko". You must also update the "backbeat" Docker image with the new extension changes.
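The exact commands are not shown here, but rebuilding the image and rolling it out might look like the following sketch. The registry, image name, tag, and chart value paths are all assumptions to be adapted to your deployment:

```shell
# Build and push a custom Backbeat image containing the helloWorld extension,
# then point the "zenko" release's chart values at the new image.
docker build -t myregistry/backbeat:hello-world .
docker push myregistry/backbeat:hello-world
helm upgrade zenko ./zenko \
  --set backbeat.image.repository=myregistry/backbeat \
  --set backbeat.image.tag=hello-world
```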
With the Helm upgrade, you've added a new Backbeat extension! Now, whenever you create an object with the key name "helloworld" (case insensitive), Backbeat automatically adds object tagging, with a "hello" key and a "world" value, to the object.
Have any questions or comments? Please let us know on our forum. We would love to hear from you.
Do you have half an hour and an AWS account? If so, you can install Zenko and use Orbit to manage your data. Below is a step-by-step guide with time estimates to get started.
If you are an AWS user with appropriate permissions or policies to create EC2 instances and EKS clusters, you can dive into this tutorial. Otherwise, contact your administrator, who can add permissions (full documentation).
For this tutorial, we use a jumper EC2 instance with Amazon Linux to deploy and manage our Kubernetes cluster. A power user can use their own workstation or laptop to manage the Kubernetes cluster.
Follow this guide to set up your EC2 instance and connect to your new instance using the information here. Once connected to the instance, install applications that will help set up the Kubernetes cluster.
Install Kubectl, a command-line tool for running commands against Kubernetes clusters.
Find the Instance ID to use for registering your instance:
$ kubectl logs $(kubectl get pods --no-headers=true -o custom-columns=:metadata.name | grep cloudserver-manager) | grep Instance | tail -n 1
{"name":"S3","time":1548793280888,"req_id":"a67edf37254381fc4781","level":"info","message":"this deployment's Instance ID is fb3c8811-88c6-468c-a2f4-aebd309707ef","hostname":"zenko-cloudserver-manager-8568c85497-5k5zp","pid":17}
Copy the ID and head to Orbit to paste it in the Settings page. Once the Zenko instance is connected to Orbit, you'll be able to attach cloud storage from different providers. If you have any questions, or want to show off a faster time than 30 minutes, join us at the Zenko forum.
We want to provide all the tools our customers need for data and storage, but sometimes the best solution is one the customer creates on their own. In this tutorial, available in full on the Zenko forums, our Head of Research Vianney Rancurel demonstrates how to set up a CloudServer instance to perform additional functions from a Python script.
The environment for this instance includes a modified version of CloudServer deployed in Kubernetes (Minikube will also work) with Helm, AWS CLI, Kubeless and Kafka. Kubeless is a serverless framework designed to be deployed on a Kubernetes cluster, which allows users to call functions in other languages through Kafka triggers (full documentation). We're taking advantage of this feature to call a Python script that produces two thumbnails of any image that is uploaded to CloudServer.
The modified version of CloudServer will generate Kafka events in a specific topic for each S3 operation. When a user uploads a photo, CloudServer pushes a message to the Kafka topic and the Kafka trigger runs the Python script to create two thumbnail images based on the image uploaded.
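As a sketch of how such a function and trigger are wired up with the Kubeless CLI (the function, file, handler, and topic names here are all assumptions for illustration):

```shell
# Deploy the Python thumbnail handler as a Kubeless function.
kubeless function deploy thumbnail \
  --runtime python3.7 \
  --from-file thumbnail.py \
  --handler thumbnail.handler

# Bind it to the Kafka topic where CloudServer publishes S3 events,
# so each upload event invokes the function.
kubeless trigger kafka create thumbnail-trigger \
  --function-selector function=thumbnail \
  --trigger-topic s3-events
```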
This setup allows users to create scripts in popular languages like Python, Ruby and Node.js to configure the best solutions to automate their workflows. Check out the video below to see Kubeless and Kafka triggers in action.
As the media and entertainment industry modernizes, companies are leveraging private and public cloud technology to meet the ever-increasing demands of consumers. Scality Zenko can be integrated with existing public cloud tools, such as Microsoft Azure's Video Indexer, to help "cloudify" media assets.
Azure's Video Indexer utilizes machine learning and artificial intelligence to automate a number of tasks, including face detection, thumbnail extraction and object identification. When paired with the Zenko Orbit multi-cloud browser, metadata can be automatically created by the Indexer and imported as tags into Zenko Orbit.
Check out the demo of Zenko Orbit and Video Indexer to see them in action. A raw video file, with no information on its content beyond a filename, is uploaded with Zenko Orbit and automatically indexed through the Azure tool, and the newly created metadata is fed back into Zenko as tags for the video file. Note that Orbit also supports user-created tags, so more information can be added if Indexer misses something important.
Why is this relevant?
Applications don't need to support multiple APIs to use the best cloud features. Zenko Orbit uses the S3 API and seamlessly translates the calls to the Azure Blob Storage API.
The metadata catalog is the same wherever the data is stored. The metadata added by Video Indexer remains available even if the files are expired from Azure and replicated to other locations.
Enjoy the demo:
Don't hesitate to reach out on the Zenko Forums with questions.