
Optimal Ceph Erasure Code Repair Bandwidth and Disk I/O by Clay Code


Distributed software-defined storage systems such as Ceph and Swift use replication and erasure coding to keep data available despite multiple disk failures. Replication has the advantage of simplicity: it heals lost data by re-copying it from a surviving replica. However, replication consumes at least three times the original data size, which increases the cost of using this technique.

In contrast, erasure coding uses much less capacity to offer the same, or even higher, availability than 3-replica. Erasure coding divides the original data into k data chunks and generates an additional m coding chunks; the coding chunks are produced by the encoding function and are used to heal lost data. For example, a k=4, m=2 erasure code consumes (4+2)/4 = 1.5 times the original data size, and the data survives up to 2 concurrent disk or host failures. Compared with replication, erasure coding is far more flexible and space-efficient, so storage applications with large data sets can save substantial capital expenditure by using it.
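The overhead arithmetic above is straightforward; here is a minimal sketch in Python (the function names are ours, for illustration only):

```python
# Storage overhead: how many bytes are stored per byte of user data.

def replication_overhead(copies: int) -> float:
    return float(copies)          # n full copies -> n x the original size

def erasure_overhead(k: int, m: int) -> float:
    return (k + m) / k            # k data chunks + m coding chunks

print(f"3-replica:   {replication_overhead(3):.2f}x")   # 3.00x
print(f"EC k=4, m=2: {erasure_overhead(4, 2):.2f}x")    # (4+2)/4 = 1.50x
```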


The disadvantage of traditional erasure codes such as the popular Reed-Solomon (RS) code is that they take much longer to heal lost chunks: repairing even a single chunk requires reading k full chunks from the surviving nodes, and creating and repairing the code chunks is computationally intensive.
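A toy illustration of this repair cost, using a simple XOR parity in place of a real RS code (the principle is the same: rebuilding one lost chunk forces a read of k full surviving chunks):

```python
from functools import reduce

def xor_chunks(chunks):
    """Bytewise XOR across equal-sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

k = 4
data = [bytes([i]) * 8 for i in range(k)]     # four 8-byte data chunks
parity = xor_chunks(data)                     # one coding chunk (m=1)

lost = 2                                      # chunk 2 fails
survivors = [c for i, c in enumerate(data) if i != lost] + [parity]
rebuilt = xor_chunks(survivors)               # must read every survivor in full

assert rebuilt == data[lost]
print(f"read {len(survivors)} full chunks to rebuild 1 lost chunk")
```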


Ceph version Nautilus supports a new erasure code plugin, Clay (coupled-layer) codes, which combines the best of Maximum Distance Separable (MDS) erasure codes and modern Minimum Storage Regenerating (MSR) codes. Clay codes significantly improve repair performance: an example code with 1.25x storage overhead was shown to reduce repair network traffic by a factor of 2.9 compared with an RS code, with similar reductions in both repair time and disk reads [1].
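The savings follow from the MSR repair guarantee: instead of reading k full chunks, an (n, k, d) MSR code repairs one chunk by downloading a 1/(d - k + 1) fraction of a chunk from each of d helpers. A sketch of the resulting traffic (theoretical figures only; the 1.25x-overhead code in [1] corresponds to k=16, m=4, and its measured 2.9x reduction sits below the theoretical value because of real-world overheads):

```python
def repair_traffic(k: int, d: int, chunk: float = 1.0):
    """Network traffic to repair one lost chunk, in units of one chunk."""
    rs = k * chunk                    # Reed-Solomon: k full chunks
    msr = d * chunk / (d - k + 1)     # MSR: d partial chunks
    return rs, msr

for k, m in [(16, 4), (4, 2)]:        # the paper's code, and the one tested below
    d = k + m - 1                     # Clay plugin default: d = k + m - 1
    rs, msr = repair_traffic(k, d)
    print(f"k={k}, m={m}, d={d}: RS reads {rs:.1f} chunks, "
          f"Clay/MSR reads {msr:.2f} -> {rs / msr:.2f}x less traffic")
```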


Ambedded supports the Clay erasure code in the latest version of its Unified Virtual Storage (UVS) Manager, enabling users to create Clay erasure code pools. Ambedded also benchmarked the read/write and recovery performance of 3-replica, Jerasure, and Clay erasure code pools. The tests used a 21-OSD Ceph cluster running on the Ambedded Arm microserver Ceph appliance Mars 400, with erasure code k=4 and m=2. The benchmarks show that the Clay code reduces recovery time by 62% compared with the Jerasure code, and its 4KB object random read IOPS are 17% higher than Jerasure's. The 4KB object random write and 4MB sequential read and write performance is similar to the Jerasure code. These improvements make Clay code a strong motivation to use erasure coding in Ceph storage.
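In the UVS Manager this is done from the web UI; underneath, it corresponds to the standard Ceph CLI for the clay plugin. A minimal sketch, wrapped in Python here for consistency, with hypothetical profile and pool names:

```python
import subprocess

# Create a Clay erasure-code profile (k=4 data chunks, m=2 coding chunks;
# d is the number of helpers contacted during repair and defaults to
# k + m - 1 when omitted), then create a pool that uses it.
commands = [
    ["ceph", "osd", "erasure-code-profile", "set", "clay_k4m2",
     "plugin=clay", "k=4", "m=2", "d=5", "crush-failure-domain=host"],
    ["ceph", "osd", "pool", "create", "clay_pool", "64", "64",
     "erasure", "clay_k4m2"],
]

for cmd in commands:
    subprocess.run(cmd, check=True)   # raises if the ceph command fails
```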


References

[1] M. Vajha et al., "Clay Codes: Moulding MDS Codes to Yield an MSR Code," USENIX FAST '18. https://www.usenix.org/system/files/conference/fast18/fast18-vajha.pdf

[2] "Ye, Barg Win IEEE Data Storage Best Paper Award," University of Maryland ECE News. https://ece.umd.edu/news/story/ye-barg-win-ieee-data-storage-best-paper-award