Our team at Scality has been working for years on QuadIron, an open source library which implements Fast Fourier Transform (FFT) based erasure codes in binary, prime, and extension fields, along with a set of optimizations that make them practical for a very large number of symbols. These FFT-based erasure codes could be useful for applications like decentralized storage over the internet. We'll be presenting QuadIron at the SNIA Storage Developer Conference in Santa Clara, CA, on Tuesday, September 25.
As drive density continues to be driven higher in keeping with Moore's Law, the price of storage continues to fall. This makes extra data copies cheaper: data can be stored reliably on relatively unreliable Internet servers by keeping extra copies. For example, generating and spreading hundreds of fragments of a file makes it possible to reconstruct the data while having only a fraction of those fragments available. The QuadIron library provides a framework to generate such erasure codes efficiently.
Erasure codes are a form of error-correcting code designed for erasures: they transform a message into a longer message made of pieces, such that the original message can be recovered from a subset of those pieces.
A C(n,k) erasure code is defined by n = k + m, where k is the number of data fragments and m is the number of desired erasure fragments. An application then transmits or stores all n fragments. A Maximum Distance Separable (MDS) code guarantees that any k fragments can be used to decode the file. Erasure codes can be either systematic or non-systematic. Systematic codes generate n - k erasure fragments and keep the k data fragments as-is. Non-systematic codes generate n encoded fragments. With systematic codes, we primarily try to retrieve the k data fragments when possible, because then there is nothing to decode; decoding is necessary only if one or more data fragments are missing. With non-systematic codes, we always need to decode from k fragments. Erasure codes can also be compared by their sensitivity to the rate r = k/n, which may or may not impact the encoding and decoding speed. Another comparison criterion is support for adaptive rate: does the erasure code allow changing k and m dynamically, without having to regenerate the whole set of erasure fragments? Another important property is confidentiality: whether an attacker who obtains fewer than k fragments can partially decode the data. Finally, we can also compare erasure codes according to their repair bandwidth, i.e. the number of fragments required to repair a single fragment. To sum up, here is a list of code properties that are of interest to us (a small illustrative sketch follows the list):
- MDS/non-MDS
- Systematic/Non-systematic
- Encoding/Decoding speed according to various n
- Encoding/Decoding speed predictability and stability according to n
- Rate sensitivity
- Adaptive vs non-adaptive rate
- Confidentiality
- Repair bandwidth
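To make the parameters above concrete, here is a minimal sketch in plain Python (for illustration only; this is not QuadIron's API, and the k and m values are arbitrary):

```python
k = 10        # data fragments
m = 4         # desired erasure fragments
n = k + m     # total fragments to generate and spread
r = k / n     # code rate

print(f"C(n={n}, k={k}), rate r = {r:.2f}")   # -> C(n=14, k=10), rate r = 0.71

# MDS contract: any k of the n fragments are enough to recover the data.
# Systematic layout: the k data fragments are kept as-is plus m parity fragments,
# so nothing needs decoding if all k data fragments are retrieved.
# Non-systematic layout: all n fragments are encoded, so we always decode from k.
print(f"fragments needed to decode (MDS): {k}")
```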
Reed-Solomon (RS) codes are MDS codes constructed from Vandermonde or Cauchy matrices; they support both the systematic and adaptive-rate properties. The RS encoding process is traditionally performed in ways that lead to high complexity. Optimizing those codes has been widely discussed, but mostly around hardware optimizations.
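To illustrate where that cost comes from, here is a toy, non-systematic matrix-based RS encoder over the prime field GF(257) (a simplified sketch for illustration only; it is not QuadIron code, and the field and parameters are chosen arbitrarily):

```python
P = 257  # small prime field, large enough to hold byte values 0..255

def vandermonde(n, k):
    """n x k Vandermonde matrix over GF(P): row i is (x_i^0, x_i^1, ..., x_i^(k-1))."""
    return [[pow(i + 1, j, P) for j in range(k)] for i in range(n)]

def encode(data, n):
    """Encode k data symbols into n fragments (non-systematic) via a matrix-vector product."""
    k = len(data)
    V = vandermonde(n, k)
    # O(n * k) field multiplications per stripe of k symbols.
    return [sum(V[i][j] * data[j] for j in range(k)) % P for i in range(n)]

# Example: k = 4 data symbols -> n = 6 fragments.
data = [12, 34, 56, 78]
print(encode(data, n=6))
```

Decoding from any k fragments amounts to inverting the corresponding k x k submatrix (or Lagrange interpolation), which is why the matrix approach becomes costly as n grows.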
Low-Density Parity-Check (LDPC) codes are another important class of erasure codes, constructed over sparse parity-check matrices. Although initially used in networking applications, researchers have recently shown that they can also be used in distributed storage scenarios. Those codes still store n = k + m fragments (like MDS codes), but they need to retrieve k*f fragments to recover the data (instead of only k for MDS codes), f being called the overhead or inefficiency. In general f oscillates between 1.10 and 1.30 for n < 100, and as n grows beyond 1000, f approaches 1.0. Those codes are more sensitive to network latency because of the extra fragments they must retrieve due to this overhead, so in cases where latency matters, RS codes seem more interesting than LDPC codes. More recently, hybrid-LDPC schemes have reduced the retrieval requirement to k + f fragments with a very small f. It is also possible to design LDPC codes that beat RS codes when taking repair bandwidth into account, because RS codes always need to retrieve k fragments to repair a fragment, while LDPC codes can be designed to require fewer than k fragments for the repair process (a back-of-the-envelope example follows the list below). However:
- LDPC codes are not MDS: there always exist erasure patterns (e.g. stopping sets) that cannot be decoded, even when k fragments out of n are available.
- You can always find or design an LDPC code optimized for a few properties (i.e. tailored for a specific use case) that beats other codes on those properties, but there is no silver bullet: it will be sub-optimal for the other properties (it's a trade-off, e.g. good for large n with an optimal repair bandwidth, but poor for small n and unable to support adaptive rate). Such codes cannot be used in a generic library.
- Designing a good LDPC code is something of a black art that requires a lot of fine tuning and experimentation. An LDPC code optimal for all the properties of interest for a given use case may well exist, but it would be very complex and/or only available in a commercial library.
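As a back-of-the-envelope illustration of the retrieval overhead mentioned above (the f values below are taken from the illustrative ranges quoted in the text, not from measurements):

```python
def fragments_to_retrieve(k, f=1.0):
    """Fragments needed to decode: k for an MDS code, roughly f*k for an LDPC code."""
    return round(f * k)

for n, k, f in [(60, 40, 1.2),        # small n: f typically between 1.10 and 1.30
                (1500, 1000, 1.01)]:  # very large n: f approaches 1.0
    print(f"n={n}, k={k}: MDS needs {fragments_to_retrieve(k)}, "
          f"LDPC needs ~{fragments_to_retrieve(k, f)} fragments")
```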
Recently, other types of codes, called Locally Repairable Codes (LRC), have tackled the repair-bandwidth issue of RS codes. They combine multiple layers of RS codes: local codes and global codes. However, those codes are not MDS and they require a higher storage overhead than MDS codes.
Fast Fourier transform (FFT) based RS codes remain relatively simple and can perform encoding over finite fields with clearer and lower published complexities, giving them a good set of desirable properties. We focus our research on optimizing their encoding and decoding speed.
Since the choice of finite field dominates the computational complexity of FFT operations, we investigate two types of finite fields: prime fields and binary extension fields. For each type, there are different approaches to accelerating the FFT.
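As a simplified illustration of the idea (not the QuadIron implementation), here is a tiny radix-2 number-theoretic transform (NTT) over the Fermat prime field GF(65537): encoding k data symbols into n fragments becomes "evaluate the data polynomial at n roots of unity", i.e. O(n log n) work instead of the O(n*k) Vandermonde matrix product. The field, generator and sizes below are chosen for the example only:

```python
P = 65537    # 2^16 + 1 (prime); its multiplicative group has order 2^16
GEN = 3      # 3 is a generator of that multiplicative group

def ntt(coeffs, n):
    """Evaluate the polynomial 'coeffs' at the n n-th roots of unity (n a power of 2)."""
    a = (list(coeffs) + [0] * n)[:n]     # zero-pad the k data symbols up to n
    if n == 1:
        return a
    w = pow(GEN, (P - 1) // n, P)        # primitive n-th root of unity mod P
    even = ntt(a[0::2], n // 2)          # Cooley-Tukey split into two half-size NTTs
    odd = ntt(a[1::2], n // 2)
    out = [0] * n
    wk = 1
    for i in range(n // 2):
        t = wk * odd[i] % P
        out[i] = (even[i] + t) % P
        out[i + n // 2] = (even[i] - t) % P
        wk = wk * w % P
    return out

# Non-systematic encoding: k = 4 data symbols -> n = 8 fragments.
data = [12, 34, 56, 78]                  # symbol values must be < P
print(ntt(data, n=8))
```

Decoding from any k of these evaluations then reduces to polynomial interpolation; the availability of roots of unity of the right order is one reason the choice of finite field dominates the FFT cost, as noted above.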
These codes offer superior encoding performance compared to matrix-based RS erasure codes for applications requiring more than 24 symbols, and are simpler than LDPC codes, while supporting all the desirable properties: fast, MDS, systematic or non-systematic, confidential when non-systematic, predictable and rate-insensitive. Systematicity is not critical for a decentralized storage application because as k grows, the chance of losing at least one data fragment also grows. Optimizing repair bandwidth is not critical for a decentralized storage archive application either, since people download their files in their entirety. The most important property for us remains the MDS property, as a rock-solid contract: being sure that if k fragments are available, then the data is recoverable.
All of those properties lead us to believe that the QuadIron library and NTT-based erasure codes are well suited to a decentralized storage archival application over the Internet.
The library is open source (BSD 3-clause license), available on GitHub. Ask questions on the forum.
Photo by Ant Rozetsky on Unsplash.