18 research outputs found

Coding for the Clouds: Coding Techniques for Enabling Security, Locality, and Availability in Distributed Storage Systems

Author: Kadhe Swanand Ravindra
Publication date: 16/01/2019

Cloud systems have become the backbone of many applications such as multimedia streaming, e-commerce, and cluster computing. At the foundation of any cloud architecture lies a large-scale, distributed, data storage system. To accommodate the massive amount of data being stored on the cloud, these distributed storage systems (DSS) have been scaled to contain hundreds to thousands of nodes that are connected through a networking infrastructure. Such data centers are usually built out of commodity components, which make failures the norm rather than the exception. In order to combat node failures, data is typically stored in a redundant fashion. Due to the exponential data growth rate, many DSS are beginning to resort to error control coding over conventional replication methods, as coding offers high storage space efficiency. This paradigm shift from replication to coding, along with the need to guarantee reliability, efficiency, and security in DSS, has created a new set of challenges and opportunities, opening up a new area of research. This thesis addresses several of these challenges and opportunities by broadly making the following contributions. (i) We design practically amenable, low-complexity coding schemes that guarantee security of cloud systems, ensure quick recovery from failures, and provide high availability for retrieving partial information; and (ii) we analyze fundamental performance limits and optimal trade-offs between the key performance metrics of these coding schemes. More specifically, we first consider the problem of achieving information-theoretic security in DSS against an eavesdropper that can observe a limited number of nodes. We present a framework that enables the design of secure repair-efficient codes through a joint construction of inner and outer codes. Then, we consider a practically appealing notion of weakly secure coding, and construct coset codes that can weakly secure a wide class of regenerating codes that reduce the amount of data downloaded during node repair. Second, we consider the problem of meeting repair locality constraints, which specify the number of nodes participating in the repair process. We propose a notion of unequal locality, which enables different locality values for different nodes, ensuring quick recovery for nodes storing important data. We establish tight upper bounds on the minimum distance of linear codes with unequal locality, and present optimal code constructions. Next, we extend the notion of locality from the Hamming metric to the rank and subspace metrics, with the goal of designing codes for efficient data recovery from special types of correlated failures in DSS. We construct a family of locally recoverable rank-metric codes with optimal data recovery properties. Finally, we consider the problem of providing high availability, which is ensured by enabling node repair from multiple disjoint subsets of nodes of small size. We study codes with availability from a queuing-theoretical perspective by analyzing the average time necessary to download a block of data under the Poisson request arrival model when each node takes a random amount of time to fetch its contents. We compare the delay performance of the availability codes with several alternatives such as conventional erasure codes and replication schemes.
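The notion of repair locality in the abstract above can be illustrated with a deliberately simple sketch. This is not the construction from the thesis: it assumes plain XOR parities, one per local group, and the function names and block contents are hypothetical. Groups of different sizes give different locality values, which loosely mimics the idea of unequal locality.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def encode_groups(groups):
    """Append one XOR parity block to each local group (assumed toy code)."""
    return [list(g) + [xor_blocks(g)] for g in groups]

def repair(coded_group, lost_index):
    """Rebuild one lost block using only the other blocks of its own group."""
    survivors = [b for i, b in enumerate(coded_group) if i != lost_index]
    return xor_blocks(survivors)

# Hypothetical layout: a small group for "important" data (repair reads 2 blocks)
# and a larger group for the rest (repair reads 4 blocks).
groups = [
    [b"hot1", b"hot2"],
    [b"cld1", b"cld2", b"cld3", b"cld4"],
]
coded = encode_groups(groups)
recovered_hot = repair(coded[0], 0)   # reads hot2 and the group parity
recovered_cold = repair(coded[1], 2)  # reads cld1, cld2, cld4 and the group parity
```

Each group tolerates only a single lost block in this sketch; the codes studied in the thesis additionally optimize the minimum distance, which this toy layout does not attempt.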

Coding Schemes for Distributed Storage Systems

Author: Ye Min
Publication date: 01/01/2017

This thesis is devoted to problems in error-correcting codes motivated by data integrity problems arising in large-scale distributed storage systems. We study properties and constructions of Maximum Distance Separable (MDS) codes, which are widely used in storage applications since they provide the maximum failure tolerance for a given amount of storage overhead. Among the parameters of the code that are important for storage applications are the amount of data transferred in the system during node repair (the repair bandwidth), which characterizes the network usage, and the volume of accessed data, which corresponds to the number of disk I/O operations. Therefore, recent research on MDS codes for distributed storage has focused on codes that can minimize these two quantities. A lower bound on the repair bandwidth of a code, called the cut-set bound, was proved by Dimakis et al. in 2010, and codes that attain this bound are said to have the optimal repair property. Explicit optimal-repair low-rate (rate $\le 1/2$) MDS codes were constructed by Rashmi et al. in 2011. At the same time, large-scale distributed systems such as the Google File System and Hadoop Distributed File System employ high-rate (rate $> 1/2$) MDS codes because they need to reduce storage overhead. Until recently, except for some particular cases, no general explicit constructions of high-rate optimal-repair MDS codes were known. In this thesis, we present the first explicit constructions of optimal-repair MDS codes, thereby providing a solution to the general construction problem of such codes for the high-rate regime. More specifically, we construct explicit MDS codes that can repair any number of failed nodes from any number of helper nodes with the smallest possible amount of downloaded/accessed data. For the particular case of repairing a single node failure, we further present an explicit family of MDS codes that minimize the amount of accessed data during the repair. This family of codes has an additional favorable property that the node size (the amount of information stored in the node) is also the smallest possible. Reducing the node size directly translates into reducing the complexity of storage systems. While most studies on MDS codes with optimal repair bandwidth focus on array codes, the repair problem of widely used scalar codes such as Reed-Solomon codes has also recently attracted the attention of researchers. It has been an open problem whether scalar linear MDS codes can achieve the cut-set bound. In this thesis, we answer this question in the affirmative by giving explicit constructions of Reed-Solomon codes that can be repaired at the cut-set bound. We also prove a lower bound on the node size of optimally repairable scalar MDS codes, showing that the node size of our RS codes is close to the best possible for scalar linear codes. Finally, we extend the concept of repair bandwidth from erasure correction to error correction, which forms a new problem in coding theory. We prove a bound on the amount of downloaded information for this problem and present explicit code families that attain this bound for a wide range of parameters.
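For reference, the cut-set bound mentioned in the abstract above is commonly stated as follows for the repair of a single failed node in an $(n,k)$ MDS array code. The symbols $l$ (node size), $d$ (number of helper nodes), $\beta$ (amount downloaded from each helper, assumed equal across helpers), and $\gamma$ (total repair bandwidth) are notation assumed here and do not appear in the abstract.

```latex
\[
  \beta \;\ge\; \frac{l}{d-k+1},
  \qquad
  \gamma \;=\; d\,\beta \;\ge\; \frac{d\,l}{d-k+1},
  \qquad k \le d \le n-1 .
\]
```

As a worked instance under these assumptions, $n=14$, $k=10$, $d=13$ gives $\gamma \ge 13l/4 = 3.25\,l$, compared with $10\,l$ for a naive repair that downloads $k$ full nodes.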

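A Study on Outer Bounds on the Storage-Bandwidth Tradeoff of Linear Exact-Repair Regenerating Codes (선형 동일 복구 재생 부호의 저장량과 통신량 간 상충 관계의 외부 경계에 관한 연구)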

Author: 이혁
Publication venue: 서울대학교 대학원 (Seoul National University Graduate School)
Publication date: 01/08/2017

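Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. 이정우.

With the recent growth in the use of social networking and cloud services, distributed storage systems, which can store large volumes of data on a network efficiently and reliably, have become an active area of research. A distributed storage system stores a large data file in a distributed manner across many nodes connected by a network. When some nodes are lost, they must be recoverable from information transmitted by other surviving nodes. Minimizing the repair bandwidth, the total amount of information needed in this repair process, is one of the important performance goals of a distributed storage system. Cooperative regenerating codes are a class of erasure codes that minimize this otherwise high repair bandwidth. An $(n,k,d,r)$-cooperative regenerating code allows the original file to be reconstructed from the information stored on $k$ of the $n$ storage nodes, and allows $r$ lost nodes to be repaired by receiving information from any $d$ surviving nodes. The per-node storage $\alpha$ and the repair bandwidth $\gamma$ of a regenerating code are known to be in a tradeoff in general. Under the functional repair model, which allows a newly repaired node to hold information different from that of the original node, this tradeoff is fully characterized; under the exact repair model, which requires the repaired node to be identical to the node before the loss, it is not. This dissertation presents two outer bounds on the tradeoff under the exact repair model. An outer bound on the tradeoff identifies $(\alpha,\gamma)$ operating points that are achievable with functional repair codes but cannot be achieved by any exact repair code. The first outer bound is derived for cooperative regenerating codes with general parameters $(n,k,d,r)$. It can be viewed as a generalization of the result of Prakash et al., which established the optimal tradeoff only for the case $d=k=n-1$, $r=1$. The first outer bound is found to perform better when $k$ is large, $r$ is small, or $k$ and $d$ are close. The second outer bound considers the restricted case in which only one lost node is repaired at a time. It is expressed as the union of two independent sub-bounds. Experiments confirm that the two sub-bounds perform well under different conditions. Like the first outer bound proposed in this dissertation, the first sub-bound performs better as the code rate, defined as $k/n$, approaches 1; the second sub-bound, conversely, outperforms other existing outer bounds as the code rate decreases.

Distributed storage systems disperse data to a large number of storage nodes connected in a network. When some of the storage nodes fail, a storage system should be able to repair them by downloading data from other surviving nodes. The amount of data traffic during the repair, called repair bandwidth, is one of the important performance metrics of distributed storage systems. Cooperative regenerating codes are a class of recently developed erasure codes which are optimal in terms of minimizing the repair bandwidth. An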
$(n,k,d,r)$-cooperative regenerating code has $n$ storage nodes, where $k$ arbitrary nodes are enough to reconstruct the original data, and $r$ failed nodes can be repaired cooperatively with the help of $d$ arbitrary surviving nodes. In the regenerating-code framework, there exists a tradeoff between the storage capacity of each node $\alpha$ and the repair bandwidth $\gamma$. The tradeoff of functional repair codes is fully characterized by Shum et al., but the problem of specifying the optimal storage-bandwidth tradeoff of exact repair codes remains open. In this dissertation, two outer bounds on the storage-bandwidth tradeoff under the exact repair model are proposed. The outer bounds identify the $(\alpha,\gamma)$ pairs that functional repair codes can achieve but no exact repair code can. The first outer bound considers a general set of parameters $(n,k,d,r)$. This result can be regarded as a generalization of the outer bound proposed by Prakash et al., which specifies the optimal tradeoff of exact-repair regenerating codes for the case of $d=k=n-1$ and $r=1$. It is verified that the proposed outer bound becomes more effective when $k$ is large, $r$ is small, or $d~(\geq k)$ is close to $k$. The second outer bound is developed for the case of single node repair ($r=1$). The bound is the union of two independently derived sub-bounds. Each sub-bound has its own condition to be tighter than the other. One sub-bound can be regarded as an extension of the first outer bound for $r=1$, and becomes more effective at high rates ($k/n > \frac{1}{2}$). The other sub-bound is derived based on the symmetric property of the storage nodes, and is tight at low rates ($k/n < \frac{1}{2}$).

1 Introduction
1.1 The Family of Regenerating Codes
1.2 The Exact Repair Model
1.3 Existing Results on the S-B Tradeoff of Exact Repair Codes
1.4 Main Contribution
2 An Outer Bound on the Storage-Bandwidth Tradeoff of Cooperative Regenerating Codes
2.1 Conditions for Parity Check Matrices of Linear Cooperative Regenerating Codes
2.1.1 Proof of Lemma 1
2.2 An Alternative Proof of Functional Repair Cutset Bound
2.2.1 Construction of Hrepair
2.2.2 Lower Bounds of rank(Hrepair)
2.2.3 Upper Bounds of B
2.3 Block Matrices with Full-Rank Diagonal Blocks
2.3.1 Definitions
2.3.2 Properties of Block Matrices with Full-Rank Diagonal Blocks
2.4 An Outer Bound of Linear and Exact-Repair Cooperative Regenerating Codes
2.4.1 Construction of Hrepair
2.4.2 Lower Bound of rank(Hrepair)
2.4.3 Derivation of the Proposed Outer Bound
2.5 Evaluation of the Proposed Outer Bound
3 An Improved Outer Bound for the Case of Single Node Repair
3.1 Symmetric Exact-Repair codes
3.2 Conditions for Parity Check Matrices of Single Repair Codes
3.3 Construction of Hsingle
3.4 Derivation of Two Sub-Bounds
3.4.1 Proof of Theorem 2
3.4.2 Proof of Theorem 3
3.5 Performance Evaluation
4 Conclusion
Bibliography
Abstract (In Korean)
Acknowledgements (In Korean)
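The storage-bandwidth tradeoff described in the abstract can be made concrete for the single-node case ($r=1$), where the functional-repair tradeoff is given by the classical cut-set bound $B \le \sum_{i=0}^{k-1} \min(\alpha, (d-i)\beta)$ with $\gamma = d\beta$. The sketch below evaluates that bound and its two extreme points. The function names and the parameter values are hypothetical, and neither the cooperative ($r>1$) tradeoff nor the exact-repair outer bounds proposed in the dissertation are reproduced here.

```python
from fractions import Fraction

def max_file_size(k, d, alpha, beta):
    """Cut-set bound for single-node functional repair (r = 1):
    B <= sum_{i=0}^{k-1} min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))

def msr_point(B, k, d):
    """Minimum-storage point: alpha = B/k, gamma = d*B / (k*(d-k+1))."""
    alpha = Fraction(B, k)
    gamma = Fraction(d * B, k * (d - k + 1))
    return alpha, gamma

def mbr_point(B, k, d):
    """Minimum-bandwidth point: alpha = gamma = 2*d*B / (k*(2*d-k+1))."""
    gamma = Fraction(2 * d * B, k * (2 * d - k + 1))
    return gamma, gamma

# Hypothetical parameters: a file of B = 6 units, k = 3, d = 4.
B, k, d = 6, 3, 4
msr_alpha, msr_gamma = msr_point(B, k, d)   # (2, 4)
mbr_alpha, mbr_gamma = mbr_point(B, k, d)   # (8/3, 8/3)
```

Both points meet the bound with equality for these parameters, so a file of size 6 is the largest that either operating point can support. The exact-repair question studied in the dissertation concerns the points between these two extremes.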

An erasure-resilient and compute-efficient coding scheme for storage applications

Author: Kalcher Sebastian
Publication date: 01/01/2013

Driven by rapid technological advancements, the amount of data that is created, captured, communicated, and stored worldwide has grown exponentially over the past decades. Along with this development it has become critical for many disciplines of science and business to be able to gather and analyze large amounts of data. The sheer volume of the data often exceeds the capabilities of classical storage systems, with the result that current large-scale storage systems are highly distributed and comprise a large number of individual storage components. As with any other electronic device, the reliability of storage hardware is governed by certain probability distributions, which in turn are influenced by the physical processes utilized to store the information. The traditional way to deal with the inherent unreliability of combined storage systems is to replicate the data several times. Another popular approach to achieve failure tolerance is to calculate the block-wise parity in one or more dimensions. With better understanding of the different failure modes of storage components, it has become evident that sophisticated high-level error detection and correction techniques are indispensable for the ever-growing distributed systems. The utilization of powerful cyclic error-correcting codes, however, comes with a high computational penalty, since the required operations over finite fields do not map very well onto current commodity processors. This thesis introduces a versatile coding scheme with fully adjustable fault-tolerance that is tailored specifically to modern processor architectures. To reduce stress on the memory subsystem, the conventional table-based algorithm for multiplication over finite fields has been replaced with a polynomial version. This arithmetically intense algorithm is better suited to the wide SIMD units of the currently available general purpose processors, but also displays significant benefits when used with modern many-core accelerator devices (for instance the popular general purpose graphics processing units). A CPU implementation using SSE and a GPU version using CUDA are presented. The performance of the multiplication depends on the distribution of the polynomial coefficients in the finite field elements. This property has been used to create suitable matrices that generate a linear systematic erasure-correcting code which shows a significantly increased multiplication performance for the relevant matrix elements. Several approaches to obtain the optimized generator matrices are elaborated and their implications are discussed. A Monte-Carlo-based construction method makes it possible to influence the specific shape of the generator matrices and thus to adapt them to special storage and archiving workloads. Extensive benchmarks on CPU and GPU demonstrate the superior performance and the future application scenarios of this novel erasure-resilient coding scheme.
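The contrast between table-based and polynomial finite-field multiplication described in the abstract can be sketched in scalar form. The following is an illustration only, not the SSE or CUDA implementation from the thesis: it assumes the field GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common choice in storage codes, and all function names are hypothetical.

```python
POLY = 0x11D  # assumed reduction polynomial x^8 + x^4 + x^3 + x^2 + 1

def gf_mul_poly(a, b):
    """Polynomial (shift-and-XOR) multiplication in GF(2^8); no memory lookups."""
    result = 0
    while b:
        if b & 1:          # add the current multiple of a for each set bit of b
            result ^= a
        a <<= 1            # multiply a by x
        if a & 0x100:      # reduce modulo POLY when the degree reaches 8
            a ^= POLY
        b >>= 1
    return result

def build_tables():
    """Build log/antilog tables using 2 as the generator (primitive for 0x11D)."""
    exp = [0] * 510
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x = gf_mul_poly(x, 2)
    for i in range(255, 510):   # duplicate so log sums need no modulo
        exp[i] = exp[i - 255]
    return exp, log

EXP, LOG = build_tables()

def gf_mul_table(a, b):
    """Conventional table-based multiplication: three lookups, one addition."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

In the polynomial version the number of XORs into the result equals the number of set bits in `b`, which gives one intuition for why multiplication cost can depend on the coefficient distribution of the field elements, as the abstract notes.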

GSI Repository