
    Coding for the Clouds: Coding Techniques for Enabling Security, Locality, and Availability in Distributed Storage Systems

    Cloud systems have become the backbone of many applications such as multimedia streaming, e-commerce, and cluster computing. At the foundation of any cloud architecture lies a large-scale, distributed data storage system. To accommodate the massive amount of data being stored on the cloud, these distributed storage systems (DSS) have been scaled to contain hundreds to thousands of nodes that are connected through a networking infrastructure. Such data centers are usually built out of commodity components, which makes failures the norm rather than the exception. In order to combat node failures, data is typically stored in a redundant fashion. Due to the exponential data growth rate, many DSS are beginning to resort to error control coding over conventional replication methods, as coding offers high storage space efficiency. This paradigm shift from replication to coding, along with the need to guarantee reliability, efficiency, and security in DSS, has created a new set of challenges and opportunities, opening up a new area of research. This thesis addresses several of these challenges and opportunities by broadly making the following contributions: (i) we design practically amenable, low-complexity coding schemes that guarantee security of cloud systems, ensure quick recovery from failures, and provide high availability for retrieving partial information; and (ii) we analyze fundamental performance limits and optimal trade-offs between the key performance metrics of these coding schemes. More specifically, we first consider the problem of achieving information-theoretic security in DSS against an eavesdropper that can observe a limited number of nodes. We present a framework that enables design of secure repair-efficient codes through a joint construction of inner and outer codes.
Then, we consider a practically appealing notion of weakly secure coding, and construct coset codes that can weakly secure a wide class of regenerating codes that reduce the amount of data downloaded during node repair. Second, we consider the problem of meeting repair locality constraints, which specify the number of nodes participating in the repair process. We propose a notion of unequal locality, which enables different locality values for different nodes, ensuring quick recovery for nodes storing important data. We establish tight upper bounds on the minimum distance of linear codes with unequal locality, and present optimal code constructions. Next, we extend the notion of locality from the Hamming metric to the rank and subspace metrics, with the goal of designing codes for efficient data recovery from special types of correlated failures in DSS. We construct a family of locally recoverable rank-metric codes with optimal data recovery properties. Finally, we consider the problem of providing high availability, which is ensured by enabling node repair from multiple disjoint subsets of nodes of small size. We study codes with availability from a queuing-theoretical perspective by analyzing the average time necessary to download a block of data under the Poisson request arrival model when each node takes a random amount of time to fetch its contents. We compare the delay performance of the availability codes with several alternatives such as conventional erasure codes and replication schemes.

    Locally repairable convertible codes with optimal access costs

    Modern large-scale distributed storage systems use erasure codes to protect against node failures with low storage overhead. In practice, the failure rates and other characteristics of the storage devices in the system may vary significantly over time, which changes the ideal code parameters. To maintain storage efficiency, the system must then adjust the parameters of the codes currently in use. The process of changing the code parameters of already encoded data is called code conversion. As an important class of storage codes, locally repairable codes (LRCs) can repair any codeword symbol using a small number of other symbols. This feature makes LRCs highly efficient for addressing single node failures in storage systems. In this paper, we investigate code conversions for locally repairable codes in the merge regime. We establish a lower bound on the access cost of code conversion for general LRCs and propose a general construction of LRCs that can perform code conversions with access cost matching this bound. This construction provides a family of LRCs together with an optimal conversion process over a field of size linear in the code length. Comment: 25 pages
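The local repair property described in this abstract can be illustrated with a toy example. The sketch below is not the construction from the paper: it assumes the simplest possible scheme, in which data symbols are split into groups of `r` and each group gets one XOR parity, so any single lost symbol is rebuilt from the `r` other symbols of its group. The function names `encode_groups` and `repair` are hypothetical, and all symbols are assumed to have equal length.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_groups(data: List[bytes], r: int) -> List[List[bytes]]:
    """Split data into groups of r symbols and append one XOR parity to each.

    Toy scheme (assumption): local parities only, no global parities.
    """
    if r <= 0 or len(data) % r != 0:
        raise ValueError("number of data symbols must be a positive multiple of r")
    groups = []
    for start in range(0, len(data), r):
        group = data[start:start + r]
        parity = reduce(xor_bytes, group)
        groups.append(group + [parity])
    return groups


def repair(group: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing symbol (marked None) of one local group.

    Only the surviving symbols of the same group are read, which is what
    "locality r" means: r symbols are accessed per single-node repair.
    """
    missing = [i for i, symbol in enumerate(group) if symbol is None]
    if len(missing) != 1:
        raise ValueError("this toy scheme repairs exactly one erasure per group")
    survivors = [symbol for symbol in group if symbol is not None]
    repaired = list(group)
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired
```

With `r = 2`, for instance, a lost data symbol or a lost parity is recovered by reading just two surviving symbols of its group, regardless of how many groups the stored object spans.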

    Locally Repairable Convolutional Codes With Sliding Window Repair

    Locally repairable convolutional codes (LRCCs) for distributed storage systems (DSSs) are introduced in this work. They enable local repair, for a single node erasure (or more generally, $\partial - 1$ erasures per local group), and sliding-window global repair, which can correct erasure patterns with up to $d^c_j - 1$ erasures in every window of $j+1$ consecutive blocks of $n$ nodes, where $d^c_j - 1$ is the $j$th column distance of the code. The parameter $j$ can be adjusted, for a fixed LRCC, according to different catastrophic erasure patterns, requiring only to contact $n(j+1) - d^c_j + 1$ nodes, plus less than $\mu n$ other nodes, in the storage system, where $\mu$ is the memory of the code. A Singleton-type bound is provided for $d^c_j - 1$. If it attains such a bound, an LRCC can correct the same number of catastrophic erasures in a window of length $n(j+1)$ as an optimal locally repairable block code of the same rate and locality, and with block length $n(j+1)$. In addition, the LRCC is able to perform the flexible and somewhat local sliding-window repair by adjusting $j$. Furthermore, by adjusting and/or sliding the window, the LRCC can potentially correct more erasures in the original window of $n(j+1)$ nodes than an optimal locally repairable block code of the same rate and locality, and length $n(j+1)$. Finally, the concept of partial maximum distance profile (partial MDP) codes is introduced. Partial MDP codes can correct all information-theoretically correctable erasure patterns for a given locality, local distance and information rate. An explicit construction of partial MDP codes whose column distances attain the provided Singleton-type bound, up to a certain parameter $j = L$, is obtained based on known maximum sum-rank distance convolutional codes. This work was supported in part by the Independent Research Fund Denmark under Grant DFF-7027-00053B, in part by the Generalitat Valenciana under Grant AICO/2017/128, and in part by the Universitat d’Alacant under Grant VIGROB-287.

    Singleton-Optimal LRCs and Perfect LRCs via Cyclic and Constacyclic Codes

    Locally repairable codes (LRCs) have emerged as an important coding scheme in distributed storage systems (DSSs), offering relatively low repair cost by accessing fewer non-failed nodes. Theoretical bounds and optimal constructions of LRCs have been widely investigated. Optimal LRCs via cyclic and constacyclic codes provide the significant benefits of an elegant algebraic structure and an efficient encoding procedure. In this paper, we continue to consider the constructions of optimal LRCs via cyclic and constacyclic codes with long code length. Specifically, we first obtain two classes of $q$-ary cyclic Singleton-optimal $(n, k, d=6; r=2)$-LRCs with length $n=3(q+1)$ when $3 \mid (q-1)$ and $q$ is even, and length $n=\frac{3}{2}(q+1)$ when $3 \mid (q-1)$ and $q \equiv 1 \pmod{4}$, respectively. To the best of our knowledge, this is the first construction of $q$-ary cyclic Singleton-optimal LRCs with length $n>q+1$ and minimum distance $d \geq 5$. On the other hand, an LRC achieving the Hamming-type bound is called a perfect LRC. By using cyclic and constacyclic codes, we construct two new families of $q$-ary perfect LRCs with length $n=\frac{q^m-1}{q-1}$, minimum distance $d=5$ and locality $r=2$.
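The abstract does not state the bound that "Singleton-optimal" refers to, nor the dimension of the codes. A minimal sketch, assuming it is the standard Singleton-type bound for LRCs, d <= n - k - ceil(k/r) + 2, shows how the parameters fit together. Under that assumption the dimension can be recovered from the stated n, d and r by search. The function names are hypothetical, and the resulting values of k are derived here, not quoted from the paper.

```python
from typing import List


def lrc_singleton_bound(n: int, k: int, r: int) -> int:
    """Upper bound on the minimum distance of an (n, k) LRC with locality r.

    Assumed form: d <= n - k - ceil(k / r) + 2.
    """
    ceil_k_over_r = -(-k // r)  # integer ceiling, avoids floating point
    return n - k - ceil_k_over_r + 2


def is_singleton_optimal(n: int, k: int, d: int, r: int) -> bool:
    """True when the distance d meets the assumed bound with equality."""
    return d == lrc_singleton_bound(n, k, r)


def dimensions_meeting_bound(n: int, d: int, r: int) -> List[int]:
    """All dimensions k for which an (n, k, d; r) code would meet the bound."""
    return [k for k in range(1, n) if is_singleton_optimal(n, k, d, r)]


if __name__ == "__main__":
    # First family in the abstract: n = 3(q + 1), d = 6, r = 2,
    # with q even and 3 | (q - 1); q = 4 and q = 16 both qualify.
    for q in (4, 16):
        n = 3 * (q + 1)
        print(q, n, dimensions_meeting_bound(n, d=6, r=2))
```

For the first family, solving k + ceil(k/2) = n - 4 = 3q - 1 by hand gives k = 2q - 1, which agrees with the search for the two field sizes tried above.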
