Modern Erasure Codes for Distributed Storage Systems

Srinivasan Narayanamurthy

The Data Deluge

Cisco, in its recently published Global Cloud Index (2014-2019), forecasts that by the end of 2019 cloud traffic will more than quadruple to 8.6 zettabytes (ZB) and data center traffic will more than triple to 10.4 ZB. This includes the digital data continuously generated by individuals as well as business and government organizations, and it calls for scalable solutions that store data reliably and securely.

In 2012, Microsoft said that it would soon surpass an exabyte of data, and one can only guess how much more data Google, Facebook, Microsoft, and Amazon store in their data centers [6].

Changing Storage Technologies

In traditional RAID-based storage systems, disks are collocated: all data objects are stored in the same set of storage disks, and these disks share an exclusive communication bus within a stand-alone unit. In modern distributed storage systems, a shared interconnect is used across storage nodes, and different objects may be stored across arbitrarily different (possibly intersecting) subsets of storage nodes, so there is competition and interference in the use of network resources.

- While data centers comprise thousands of compute and storage nodes, individual clusters such as those of the Google File System (GFS) [2] are formed from hundreds to thousands of nodes.

- Hardware failures are very common in such large environments. Facebook, for instance, counted an average of 20 node failures per day on a relatively small 3,000-node Hadoop cluster [14].

Classical Protection Mechanisms (Read: RAID) Fall Short

Disk capacities are increasing faster than disk data transfer rates. This is creating data integrity exposures that demand more robust data resilience schemes than traditional RAID schemes can deliver. More modern forms of erasure coding are a potential solution to this problem.

New Dimensions for Optimization

Traditional erasure codes, including RAID, were designed to optimize tradeoffs across two dimensions: storage overhead and reliability. In recent years, the coding theory community has focused on designing codes that better suit the nuances of modern distributed storage systems, particularly with respect to repairing lost redundancy efficiently. The focus of such works has been on the following aspects:

1. repair bandwidth, the network bandwidth consumed during a repair, which is typically a scarce resource in a modern distributed storage system (see the sketch after this list for a rough comparison). Facebook noted that with 8% of their data RAIDed, repair traffic consumes 10-20% of their average network utilization, and with 50% of their data encoded with RAID, repair traffic would saturate their network [14].
2. repair degree, the number of storage nodes involved in a repair process, and
3. I/O, the number of disk accesses at the nodes facilitating a repair, which is currently the biggest contributor to the repair time.
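To make the first two metrics concrete, here is a minimal back-of-the-envelope sketch in Python comparing single-block repair for a classical (k, m) Reed-Solomon layout against a layout with locality r, in which a block can be rebuilt from r other blocks. The block size and the parameter values (k = 10, r = 5) are illustrative assumptions, not figures from this article or from [14].

```python
# Rough repair cost for a single lost block, under two assumptions:
#   - classical (k, m) Reed-Solomon rebuilds a block by reading k survivors
#   - a code with locality r rebuilds a block by reading only r blocks

def rs_repair_cost(k: int, block_mib: float) -> dict:
    """Classical Reed-Solomon repair: contact k nodes, transfer k blocks."""
    return {"repair_degree": k, "repair_bandwidth_mib": k * block_mib}

def local_repair_cost(r: int, block_mib: float) -> dict:
    """Locality-r repair: contact only the r blocks of the local group."""
    return {"repair_degree": r, "repair_bandwidth_mib": r * block_mib}

if __name__ == "__main__":
    block_mib = 256.0  # assumed block size
    print("RS, k=10:     ", rs_repair_cost(k=10, block_mib=block_mib))
    print("locality r=5: ", local_repair_cost(r=5, block_mib=block_mib))
    # Here the local layout halves both the repair degree and the repair
    # bandwidth, at the price of storing extra (local) parity blocks.
```

The sketch is only meant to show why repair bandwidth and repair degree are treated as first-class optimization targets alongside storage overhead and reliability.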

Modern Erasure Codes

There are two specific families of erasure codes that are noteworthy:

1. Locally Repairable Codes, which optimize for repair degree (a toy illustration of local repair follows this list), and
2. Regenerating Codes, which optimize for repair bandwidth.
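To give a flavor of the locality idea behind Locally Repairable Codes, the toy Python sketch below protects each small group of data blocks with a plain XOR parity, so a single lost block is rebuilt from its own group instead of from blocks spread across the whole stripe. This is only a minimal illustration of the repair-degree benefit under that simplifying assumption; published LRC constructions such as those in [6] and [14] additionally use global parities and Galois-field arithmetic.

```python
# Toy local repair: two groups of three data blocks, each with an XOR parity.
import os

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Six 4 KiB data blocks arranged in two local groups of three.
data = [os.urandom(4096) for _ in range(6)]
groups = [data[0:3], data[3:6]]
local_parities = [xor_blocks(group) for group in groups]

# Lose block 4 (second group) and repair it locally: XOR the two surviving
# group members with the group's local parity.
lost = 4
survivors = [blk for i, blk in enumerate(groups[1]) if i != lost - 3]
repaired = xor_blocks(survivors + [local_parities[1]])
assert repaired == data[lost]  # repair touched 3 blocks, not the whole stripe
```

Regenerating codes attack repair from the bandwidth side instead: the replacement node may contact more helper nodes, but each helper sends only a fraction of its block, so the total data transferred during a repair can be much smaller than reading k full blocks.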

Figure 1 shows a timeline of the theoretical evolution of erasure codes for storage systems (above the horizontal line) and the adoption of these codes in systems (below the horizontal line). The intention of this article is not to go through the details of every erasure code, but rather to point out that codes have evolved along whole new dimensions. More details about these modern codes can be found in the citations provided below the timeline.

One Size Does Not Fit All!

A storage system designer has a gamut of dimensions to juggle, such as:
- Type of storage system: general-purpose storage array, geo-distributed storage, secure storage, etc.
- Type of workload: read-intensive analytics workload, archive, object store, latency-sensitive, etc.
- Properties to optimize on: reliability, performance, cost, storage overhead, repair time/availability, I/O, repair bandwidth, etc.

Permutations and combinations of these dimensions define a set of requirements for a code, and that is only the first step in designing the right code for a given storage system. Thus, a one-size-fits-all solution using classical protection mechanisms such as RAID no longer holds good.

Figure 2 provides a zoo of erasure codes: a self-explanatory picture of the researchers in academia and practitioners in industry, including startups, who represent this space. Note that the work and people above the horizontal line are involved in theory research, while the ones below are involved in systems implementations.


Note that acronyms and years mentioned in Figure 1 and Figure 2 refer to individual classes of work that are explained in their respective papers: Xorbas [14], Piggyback [12], Hitchhiker [11], PM (RBT) [13], Double Rep [7], PMDS [1], Pyramid [5], Locality [3], Flat XOR [4], GF Intel SIMD [10], SD Codes [9], STAIR [8], and mLRC [6].

Long Story Short

In summary, storage systems have evolved over the past couple of decades, and so have data protection mechanisms. Using age-old erasure codes on modern distributed storage systems does not let those systems operate at their optimal performance. This article provided a bouquet of modern erasure codes that are at a storage system designer's disposal. Some of these modern erasure codes are yet to mature for industrial consumption. However, several practical constructions have already made their way into storage systems, and the rest are expected to cross that mark in the near future.

About the Author

Srinivasan Narayanamurthy works in the Advanced Technology Group at NetApp.







