For my first week at Scality as the Technical Community Manager, I was lucky enough to join the yearly Open Source Leadership Summit hosted by the Linux Foundation. Around 400 leaders of the open source community met to “drive digital transformation with open source technologies and learn how to collaboratively manage the largest shared technology investment of our time.”
Here are some interesting insights from the three-day event in Half Moon Bay.
Open Data is as important as open source
This movement is becoming more and more prominent with the advances in machine learning. The idea is that data should be freely shared, used, and redistributed by anyone. It should not be personal data, of course, or contain information about specific individuals. Many impressive machine learning tools (like TensorFlow) have been open sourced, which is great, but the real value in machine learning is the data. While many valuable datasets remain carefully guarded secrets inside big corporations, a lot of good data is coming from governments, research institutions, and organizations like Mozilla. With interest in machine learning solutions growing, open data seems to be the younger sibling of open source.
Hybrid cloud and multi-cloud are real
I was curious about what these world-class leaders and thinkers had to say about the future of the cloud and cloud storage. The most powerful comment I heard came from Sarah Novotny, program manager for the Kubernetes community: “Multi-cloud is something that people want nowadays.” The more I thought about it, the more obvious it became: multi-cloud is essential to preserve freedom, avoid vendor lock-in, and give the open source community an opportunity to implement great ideas. Nobody wants to be tied to a single vendor.
On that note, a surprising discovery was The Permanent Legacy Foundation, a nonprofit with a small but dedicated team working to provide permanent and accessible data storage to all people, for all time. They use storage services on multiple public clouds to lower costs (since they store your data forever) and stay flexible enough to archive precious memories for generations to come. It’s an interesting use case, with a business model somewhere between a charity (like the Internet Archive) and a for-profit company (like ancestry.com).
Open source is made of people
The Linux Foundation’s Executive Director announced CommunityBridge, a platform created to empower open source developers and the individuals and organizations who support them, advancing sustainability, security, and diversity in open source technology. It’s a big step toward supporting people who want to contribute but don’t know where to start or don’t have the opportunity. CommunityBridge provides grants and access to world-class specialists to help people contribute to open source communities and grow and innovate together. Go check it out.
More food for thought from fellow community managers
As a community manager, I want to foster a strong and healthy community for Zenko. The Summit was a goldmine of ideas about what can be done better to sustain a happy, engaged community. The four most important nuggets of wisdom I gathered:
Diversity means strength – we have to make sure clear routines and a nurturing environment are in place for our communities to thrive.
Transparency and openness are key – plans and roadmaps are open, and every opinion in the community is heard and taken into consideration.
Anyone can contribute – anything that new or existing members want to bring to the table is important and appreciated.
The future is in open source communities – the number of people with diverse expertise from all over the world who collaborate and create together grows every day, and it’s a fascinating place to be right now.
The location was a treat!
The summit was held at the Ritz hotel, right on the coast in Half Moon Bay. You can check out more beautiful pictures on the Linux Foundation website. It was a great three days at the beginning of my Zenko journey. From now on you’ll see me more regularly on this and other channels. Let’s create some exciting things together!
Backbeat, a key Zenko microservice, dispatches work to long-running background tasks. Backbeat uses Apache Kafka, the popular open-source distributed streaming platform, for scalability and high availability. This gives Zenko functionalities like:
Asynchronous multi-site replication
Lifecycle policies
Metadata ingestion (supporting Scality RING today, with other backends coming soon)
As with the rest of the Zenko stack, Backbeat is an open-source project, with code organized to let you use extensions to add features. Using extensions, you can create rules to manipulate objects based on metadata logs. For example, an extension can recognize music files by artist and move objects in buckets named after the artist. Or an extension can automatically move objects to separate buckets, based on data type (zip, jpeg, text, etc.) or on the owner of the object.
All Backbeat interactions go through CloudServer, which means they are not restricted to one backend and you can reuse existing solutions for different backends.
The Backbeat service publishes a stream of bucket and object metadata updates to Kafka. Each extension applies its own filters to the metadata stream, picking only metadata that meets its filter criteria. Each extension has its own Kafka consumers that consume and process metadata entries as defined.
To help you develop new extensions, we’ve added a basic extension called “helloWorld.” This extension filters the metadata stream to select only objects whose key name is “helloworld” (case insensitive) and, when processing each metadata entry, applies a basic AWS S3 putObjectTagging call with the key “hello” and the value “world.”
This example extension shows:
How to add your own extension using the existing metadata stream from a Zenko 1.0 deployment
How to add your own filters for your extension
How to add a queue processor to subscribe to and consume from a Kafka topic
There are two kinds of Backbeat extensions: populators and processors. The populator receives all the metadata logs, filters them as needed, and publishes them to Kafka. The processor subscribes to the extension’s Kafka topic, thus receiving these filtered metadata log entries from the populator. The processor then applies any required changes (in our case, adding object tags to all “helloworld” object keys).
Begin by working on the populator side of the extension. Within Backbeat, add all the configs needed to set up a new helloWorld extension, following the examples in this commit. These configurations are placeholders. Zenko will overwrite them with its own values, as you’ll see in later commits.
Every extension must have an index.js file in its extension directory (“helloWorld/” in the present example). This file must contain the extension’s definitions in its name, version, and configValidator fields. The index.js file is the entry point for the main populator process to load the extension.
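For reference, a minimal sketch of what that entry point might look like follows. The field names are the ones described above; the joi-based validator body and the placeholder config fields are illustrative assumptions, not the exact code from the commit.

```js
// extensions/helloWorld/index.js -- a sketch of the extension's entry point.
// The validator body and config fields below are assumptions for illustration.
const joi = require('joi');

// Validate the extension's section of config.json. The values there are
// placeholders; Zenko overwrites them at deployment time.
function configValidator(backbeatConfig, extConfig) {
    const schema = joi.object({
        topic: joi.string().required(),   // e.g. "backbeat-hello-world"
        groupId: joi.string().required(), // Kafka consumer group for the processor
    });
    return joi.attempt(extConfig, schema);
}

module.exports = {
    name: 'helloWorld',
    version: '1.0.0',
    configValidator,
    // Registered in the next step, once the filter class exists:
    queuePopulatorExtension: require('./HelloWorldQueuePopulator'),
};
```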
Add filters for the helloWorld extension by creating a new class that extends the existing architecture defined by the QueuePopulatorExtension class. It is important to add this new filter class to the index.js definition as “queuePopulatorExtension”.
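A minimal sketch of that filter class could look like the following — the module path, the shape of a metadata log entry, the publish() helper, and the extConfig property are assumptions modeled on how the existing extensions are written:

```js
// extensions/helloWorld/HelloWorldQueuePopulator.js -- populator-side filter
// sketch; entry shape, publish(), and extConfig are assumptions based on the
// existing extensions.
const QueuePopulatorExtension =
    require('../../lib/queuePopulator/QueuePopulatorExtension');

class HelloWorldQueuePopulator extends QueuePopulatorExtension {
    // Called for every bucket/object metadata log entry.
    filter(entry) {
        // Forward only entries whose object key is "helloworld"
        // (case insensitive) to the extension's Kafka topic.
        if (!entry.key || entry.key.toLowerCase() !== 'helloworld') {
            return;
        }
        this.publish(
            this.extConfig.topic,            // assumed to hold "backbeat-hello-world"
            `${entry.bucket}/${entry.key}`,  // Kafka partition key
            JSON.stringify({ bucket: entry.bucket, key: entry.key })
        );
    }
}

module.exports = HelloWorldQueuePopulator;
```

Filtering on the populator side keeps the extension’s Kafka topic small: only the entries the processor actually cares about ever reach it.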
On the processor side of the extension, you need to create service accounts in Zenko to be used as clients to complete specific S3 API calls. In the HelloWorldProcessor class, this._serviceAuth is the credential set we pass from Zenko to Backbeat to help us perform the putObjectTagging S3 operation. For this demo, borrow the existing replication service account credentials.
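The heart of the processor is the tagging call itself. Here is a rough sketch, assuming the processor is handed an endpoint and the borrowed service account credentials; the constructor parameters, credential field names, and endpoint are assumptions, while the putObjectTagging call is the operation described above:

```js
// extensions/helloWorld/HelloWorldProcessor.js -- processor-side sketch.
// Constructor parameters, credential fields, and endpoint are assumptions.
const AWS = require('aws-sdk');

class HelloWorldProcessor {
    constructor(params) {
        // Credentials of a Zenko service account (the demo borrows the
        // replication service account, as noted above).
        this._serviceAuth = params.serviceAuth;
        this._s3 = new AWS.S3({
            endpoint: params.s3Endpoint,   // CloudServer endpoint all calls go through
            s3ForcePathStyle: true,
            accessKeyId: params.serviceAuth.accessKey,
            secretAccessKey: params.serviceAuth.secretKey,
        });
    }

    // Called for each filtered metadata entry consumed from the Kafka topic.
    processKafkaEntry(kafkaEntry, done) {
        const { bucket, key } = JSON.parse(kafkaEntry.value);
        // Tag the "helloworld" object with hello=world.
        this._s3.putObjectTagging({
            Bucket: bucket,
            Key: key,
            Tagging: { TagSet: [{ Key: 'hello', Value: 'world' }] },
        }, done);
    }
}

module.exports = HelloWorldProcessor;
```

Because the call goes through CloudServer, the same code works regardless of which backend the bucket lives on.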
Create an entry point for the new extension’s processor by adding a new script in the package.json file. This part may be a little tricky, but the loadManagementDatabase method helps sync up Backbeat extensions with the latest changes in the Zenko environment, including config changes and service account information updates.
Instantiate the new extension processor class and finish the setup of the class by calling the start method, defined here.
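Putting those two steps together, the entry point could look roughly like the sketch below. The script name, module paths, and the loadManagementDatabase callback signature are assumptions based on the description above; start() is the method mentioned in the post (defined in the linked commit), which subscribes to the extension’s Kafka topic and hands each entry to processKafkaEntry().

```js
// bin/helloWorldProcessor.js -- sketch of the processor entry point, wired up
// via a new package.json script, e.g.:
//   "hello_world_processor": "node bin/helloWorldProcessor.js"
// Module paths and the callback signature below are assumptions.
const config = require('../conf/Config');
const management = require('../lib/management');
const HelloWorldProcessor =
    require('../extensions/helloWorld/HelloWorldProcessor');

// Sync Backbeat with the latest Zenko changes (configs, service account
// credentials) before starting the processor.
management.loadManagementDatabase(err => {
    if (err) {
        process.stderr.write(`could not load management database: ${err}\n`);
        process.exit(1);
    }
    const processor = new HelloWorldProcessor({
        s3Endpoint: 'http://cloudserver:80',                    // assumed CloudServer endpoint
        serviceAuth: config.extensions.replication.source.auth, // borrowed service account
    });
    // start() (defined in the linked commit) subscribes to the
    // "backbeat-hello-world" topic and routes entries to processKafkaEntry().
    processor.start();
});
```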
Update the docker-entrypoint.sh file. These variables point to specific fields in the config.json file. For example, “.extensions.helloWorld.topic” points to the config.json value currently defined as “topic”: “backbeat-hello-world”.
These variable names (i.e. EXTENSION_HELLOWORLD_TOPIC) are set when Zenko is upgraded or deployed as a new Kubernetes pod, which updates these config.json values in Backbeat.
Some config environment variables are less obvious to add because we did not include them in our extension configs, but they are necessary for running some of Backbeat’s internal processes. Also, because this demo borrows the replication service account, its variables (EXTENSIONS_REPLICATION_SOURCE_AUTH_TYPE, EXTENSIONS_REPLICATION_SOURCE_AUTH_ACCOUNT) must be defined as well.
Finally, run the Helm upgrade against your Zenko deployment (named “zenko” in this example), making sure you have updated the “backbeat” Docker image with the new extension changes.
With the Helm upgrade, you’ve added a new Backbeat extension! Now, whenever you create an object with the key name “helloworld” (case insensitive), Backbeat automatically adds an object tag with the key “hello” and the value “world” to it.
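If you want a quick way to see it working, here is a small check using the AWS SDK for JavaScript. The endpoint, credentials, and bucket name are placeholders for your own Zenko deployment, and the bucket is assumed to already exist:

```js
// checkHelloWorld.js -- upload a "helloworld" object and verify its tags.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
    endpoint: 'http://zenko.local',   // your Zenko endpoint (placeholder)
    accessKeyId: 'accessKey1',        // your account credentials (placeholders)
    secretAccessKey: 'verySecretKey1',
    s3ForcePathStyle: true,
});

const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function main() {
    // Any object whose key name is "helloworld" (case insensitive) is picked up.
    await s3.putObject({ Bucket: 'demo-bucket', Key: 'HelloWorld', Body: 'hi' }).promise();

    // Give Backbeat a moment to consume the metadata entry, then check the tags.
    await delay(5000);
    const tagging = await s3.getObjectTagging({ Bucket: 'demo-bucket', Key: 'HelloWorld' }).promise();
    console.log(tagging.TagSet); // expect [{ Key: 'hello', Value: 'world' }]
}

main().catch(console.error);
```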
Have any questions or comments? Please let us know on our forum. We would love to hear from you.