
    Locality and Availability in Distributed Storage

    This paper studies the problem of code symbol availability: a code symbol is said to have $(r, t)$-availability if it can be reconstructed from $t$ disjoint groups of other symbols, each of size at most $r$. For example, 3-replication supports $(1, 2)$-availability, as each symbol can be read from its $t = 2$ other (disjoint) replicas, i.e., $r = 1$. However, the rate of replication must vanish like $\frac{1}{t+1}$ as the availability increases. This paper shows that it is possible to construct codes that support a scaling number of parallel reads while keeping the rate at an arbitrarily high constant. It further shows that this is possible with the minimum distance arbitrarily close to the Singleton bound. This paper also presents a bound demonstrating a trade-off between minimum distance, availability and locality. Our codes match the aforementioned bound, and their construction relies on combinatorial objects called resolvable designs. From a practical standpoint, our codes seem useful for distributed storage applications involving hot data, i.e., information that is frequently accessed by multiple processes in parallel. Comment: Submitted to ISIT 201
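    To make the definition of $(r, t)$-availability concrete, here is a minimal Python sketch. It uses a simple 2D XOR grid (product) code as a stand-in, not the resolvable-design construction described above; the grid code and the names `encode_grid`, `recovery_groups`, `is_witness` and `xor_of` are illustrative assumptions. Each of the $m^2$ data bits can be rebuilt from its row (the other $m - 1$ row bits plus the row parity) or from its column, which gives the data symbols $(m, 2)$-availability at rate $m/(m+2)$. For comparison, 3-replication reaches the same $t = 2$ at rate $1/3$.

```python
from itertools import combinations


def encode_grid(bits, m):
    """Encode m*m data bits as [data..., m row parities..., m column parities...]."""
    rows, cols = [0] * m, [0] * m
    for i in range(m):
        for j in range(m):
            b = bits[i * m + j]
            rows[i] ^= b
            cols[j] ^= b
    return list(bits) + rows + cols


def recovery_groups(m, i, j):
    """Two disjoint recovery groups for data bit (i, j): its row and its column."""
    row = {i * m + c for c in range(m) if c != j} | {m * m + i}
    col = {r * m + j for r in range(m) if r != i} | {m * m + m + j}
    return [row, col]


def is_witness(symbol, groups, r, t):
    """True if `groups` are t pairwise-disjoint sets of size <= r that exclude `symbol`."""
    if len(groups) != t or any(len(g) > r or symbol in g for g in groups):
        return False
    return all(not (a & b) for a, b in combinations(groups, 2))


def xor_of(codeword, group):
    """XOR of the codeword symbols indexed by `group` (the repair rule)."""
    v = 0
    for idx in group:
        v ^= codeword[idx]
    return v


if __name__ == "__main__":
    m = 4
    data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0]
    cw = encode_grid(data, m)
    for i in range(m):
        for j in range(m):
            s, groups = i * m + j, recovery_groups(m, i, j)
            assert is_witness(s, groups, r=m, t=2)
            assert all(xor_of(cw, g) == cw[s] for g in groups)
    print("rate:", m * m / len(cw))  # 16 / 24
```

    With `m = 4` the rate is 16/24 = 2/3. Unlike the codes in the abstract, this toy code does not let $t$ grow while the rate stays constant; it only shows what a valid availability witness looks like.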

    Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

    Data centres that use consumer-grade disk drives and distributed peer-to-peer systems are unreliable environments for archiving data without enough redundancy. Most redundancy schemes are not completely effective at providing high availability, durability and integrity in the long term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large-scale storage system. Our motivation is to design flexible and practical erasure codes with high fault tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the number of possible paths to recover data exponentially. The two other parameters increase fault tolerance even further without the need for additional storage. As a result, an entangled storage system can provide high availability and durability, and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical codes. Remarkably, they excel at code locality; hence, they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios. Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN
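    The idea of recovery paths through entangled blocks can be illustrated with the simplest case. The sketch below assumes a single strand ($\alpha = 1$): each parity is the XOR of the current data block and the previous parity. A data block can then be rebuilt from the two parities around it, and a parity can be rebuilt from either neighbour. The helical multi-strand layout and the two other parameters of alpha entanglement codes are not modelled. The names `entangle`, `repair_data` and `repair_parity`, the integer blocks and the zero initial parity are assumptions for illustration.

```python
def entangle(data, p0=0):
    """Single-strand entanglement: p[i] = data[i] XOR p[i-1], with p[-1] = p0."""
    parities, prev = [], p0
    for d in data:
        prev ^= d
        parities.append(prev)
    return parities


def repair_data(parities, i, p0=0):
    """Rebuild data[i] from the two parities adjacent to it."""
    prev = parities[i - 1] if i > 0 else p0
    return prev ^ parities[i]


def repair_parity(data, parities, i, p0=0):
    """Rebuild parities[i] along two independent paths (left, right).

    The left path uses data[i] and parities[i-1]; the right path uses
    data[i+1] and parities[i+1] and is None at the end of the chain.
    """
    prev = parities[i - 1] if i > 0 else p0
    left = data[i] ^ prev
    right = data[i + 1] ^ parities[i + 1] if i + 1 < len(data) else None
    return left, right


if __name__ == "__main__":
    data = [0x3A, 0x11, 0xFF, 0x42]
    p = entangle(data)
    assert [repair_data(p, i) for i in range(len(data))] == data
    print(repair_parity(data, p, 1))  # (43, 43): both paths agree on p[1]
```

    Every repair here reads only two blocks, the kind of small repair locality the abstract highlights. According to the abstract, raising alpha is what increases the number of such recovery paths.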

    Locality and Availability with Multiple Erasure Tolerance in Distributed Storage

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2019. 2. ์ด์ •์šฐ.์ตœ๊ทผ ์—ฌ๋Ÿฌ ์‹œ์Šคํ…œ์—์„œ ๋‹ค๋ฃจ๋Š” ๋ฐ์ดํ„ฐ์˜ ์–‘์ด ๋ฐฉ๋Œ€ํ•ด์ง€๋ฉด์„œ ๋ถ„์‚ฐ ์ €์žฅ ์‹œ์Šคํ…œ์˜ ์ค‘์š”์„ฑ์ด ์ปค์ง€๊ณ  ์žˆ๋‹ค. ๋ถ„์‚ฐ ์ €์žฅ ์‹œ์Šคํ…œ์—์„œ๋Š” ๋„คํŠธ์›Œํฌ ์ƒ์˜ ๋ฌธ์ œ ํ˜น์€ ์žฅ๋น„์˜ ๋ฌธ์ œ๋กœ ์ธํ•ด ๋…ธ๋“œ ์†์‹ค์ด๋ผ๋Š” ๊ฒฐํ•จ์ด ์ƒ๊ธด๋‹ค. ์ด ๊ฒฝ์šฐ ์†์‹ค๋˜์ง€ ์•Š์€ ๋…ธ๋“œ๋ฅผ ํ†ตํ•ด ์†์‹ค๋œ ๋…ธ๋“œ๋ฅผ ์›์ƒํƒœ๋กœ ๋ณต๊ตฌํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋‹ค. ์ด ๋•Œ ๋ถ„์‚ฐ ์ €์žฅ์— ์‚ฌ์šฉ๋œ ๋ถ€ํ˜ธ๊ฐ€ ๋ณต๊ตฌ์˜ ์„ฑ๋Šฅ์„ ๊ฒฐ์ •์ง“๊ฒŒ ๋œ๋‹ค. ์‹œ์Šคํ…œ์˜ ์šฉ๋„์— ๋”ฐ๋ผ ๋ถ„์‚ฐ ์ €์žฅ์— ์‚ฌ์šฉ๋˜๋Š” ๋ถ€ํ˜ธ์˜ ์„ฑ๋Šฅ์„ ๊ฒฐ์ •ํ•˜๋Š” ์š”์†Œ๊ฐ€ ๋‹ค๋ฅด๋‹ค. ๊ทธ ์ค‘ ๋ถ€๋ถ„์ ‘์†์ˆ˜(locality)๋Š” ์–ด๋–ค ์†์‹ค๋œ ๋…ธ๋“œ๋ฅผ ๋ณต๊ตฌํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ๋…ธ๋“œ์˜ ์ˆ˜๋ฅผ ์˜๋ฏธํ•˜๊ณ  ๊ฐ€์šฉ๋„๋Š” ์–ด๋–ค ์†์‹ค๋œ ๋…ธ๋“œ๋ฅผ ๋ณต๊ตฌํ•  ์ˆ˜ ์žˆ๋Š” ์„œ๋กœ์†Œ(disjoint)์ธ ๋ณต๊ตฌ์ง‘ํ•ฉ์˜ ์ˆ˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ์‹ค์šฉ์ ์ธ ์ธก๋ฉด์—์„œ ๊ฐ€์šฉ๋„ ๊ฐœ๋…์„ ๋„์ž…ํ•  ๊ฒฝ์šฐ ๋‹ค์ˆ˜์˜ ์‚ฌ์šฉ์ž๊ฐ€ ๋™์‹œ์— ์—ฌ๋Ÿฌ ๋ฐ์ดํ„ฐ์— ๋ณ‘๋ ฌ์ ์œผ๋กœ ์ ‘๊ทผํ•จ์œผ๋กœ์จ ๋™์‹œ์— ๋ฐ์ดํ„ฐ๋ฅผ ์ฝ์„ ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์ด ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ๊ฐ€์šฉ๋„๋ฅผ ๊ณ ๋ คํ•œ ๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ๋Š” ํ•ซ ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ๋กœ ์ €์žฅ๋œ ๋ถ„์‚ฐ ์ €์žฅ ์‹œ์Šคํ…œ์— ๋งค์šฐ ์œ ์šฉํ•˜๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ถ„์‚ฐ ์ €์žฅ ์‹œ์Šคํ…œ์—์„œ ๋‹ค์ค‘ ๋…ธ๋“œ ์†์‹ค๊ณผ ๊ฐ€์šฉ๋„๋ฅผ ํ•จ๊ป˜ ๊ณ ๋ คํ•œ ๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ๋ฅผ ์ƒˆ๋กญ๊ฒŒ ์ œ์•ˆํ•˜๊ณ  ๊ทธ ๋ถ€ํ˜ธ์— ๋Œ€ํ•œ ์ตœ์†Œ ๊ฑฐ๋ฆฌ์˜ ์ƒ๊ณ„๋ฅผ ๊ตฌํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ƒˆ๋กญ๊ฒŒ ์ œ์•ˆํ•œ ๋ถ€ํ˜ธ์˜ ์ตœ์†Œ ๊ฑฐ๋ฆฌ์˜ ์ƒ๊ณ„์˜ achievability๋ฅผ ๋ณด์ด๊ธฐ ์œ„ํ•ด ์ตœ์†Œ ๊ฑฐ๋ฆฌ ์ƒ๊ณ„์— ๋Œ€ํ•œ ๋“ฑ์‹์„ ๋งŒ์กฑํ•˜๋Š” ๋ถ€ํ˜ธ๋ฅผ ์„ค๊ณ„ํ•œ๋‹ค. ํŠนํžˆ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ •๋ณด ์‹ฌ๋ณผ๋“ค์— ๋Œ€ํ•œ ๋ณต๊ตฌ์ง‘ํ•ฉ๋“ค์˜ ๋…ธ๋“œ ์†์‹ค๊นŒ์ง€ ๊ณ ๋ คํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ๊ธฐ์กด์˜ ๊ฐ€์šฉ๋„๋งŒ์„ ๊ณ ๋ คํ•œ ๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ์— ๋น„ํ•ด ์†์‹ค์— ๋Œ€ํ•œ tolerance๊ฐ€ ๋” ํฌ๋‹ค. 
๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•˜๋Š” (n,k,r,t,ฮด)-๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ๋Š” ์†์‹ค์ด ์ž์ฃผ ์ผ์–ด๋‚˜๋ฉฐ ๋™์‹œ์— ์ ‘์†ํ•  ํ•„์š”๊ฐ€ ์žˆ๋Š” ํ•ซ ๋ฐ์ดํ„ฐ ์‚ฌ์šฉ์— ๋”์šฑ ์ ํ•ฉํ•˜๋‹ค.Recently, as the amount of data to be handled by various systems has increased, the importance of distributed storage systems has increased. In a distributed storage system, there is a flaw in the node loss due to network problems or equipment problems. In this case, it is important to reconstruct the lost node through the non-lost node. At this time, the code used for distributed storage determines the performance of recovery. Depending on the use of the system, the factors that determine the performance of the codes used for distributed storage are different. Among them, 'locality' means the number of nodes needed to recover a lost node, and availability means the number of disjoint recovery sets that can recover a lost node. In practical terms, when the availability is introduced, it is advantageous that a plurality of users simultaneously access data at the same time and simultaneously read data. Therefore, locally repairable code considering availability is very useful for distributed storage systems where hot data is mainly stored. In this paper, we propose a locally repairable code considering multi - node loss and availability in a distributed storage system. Moreover, we find the upper bound of minimum distance for the code. In order to show the achievability of the upper bound of the minimum distance of the newly proposed code, a code satisfying the equation for the bound is designed. In particular, since we consider multiple node loss of recovery sets, we have more tolerance for loss than locally repairable code considering only the availability. 
Therefore, the (n, k, r, t, ฮด) โ€“ locally repairable code proposed in this paper is more suitable for using hot data which has frequent loss and frequent connection.์ œ 1 ์žฅ ์„œ ๋ก  1 ์ œ 1 ์ ˆ ์—ฐ๊ตฌ์˜ ๋ฐฐ๊ฒฝ 1 ์ œ 2 ์ ˆ ์—ฐ๊ตฌ์˜ ๋‚ด์šฉ 1 ์ œ 2 ์žฅ ๋ฐฐ๊ฒฝ์ด๋ก  2 ์ œ 1 ์ ˆ ์ƒ์„ฑ ํ–‰๋ ฌ๊ณผ ํŒจ๋ฆฌํ‹ฐ ๊ฒ€์‚ฌ ํ–‰๋ ฌ 2 ์ œ 2 ์ ˆ ๋ถ€ํ˜ธ์˜ ์ตœ์†Œ ๊ฑฐ๋ฆฌ์™€ ์‹ฑ๊ธ€ํ†ค ์ƒ๊ณ„ 3 ์ œ 3 ์žฅ ๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ 6 ์ œ 1 ์ ˆ ๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ 6 ์ œ 2 ์ ˆ ๋ถ„์‚ฐ์ €์žฅ์—์„œ ๋ถ€๋ถ„์ ‘์†์ˆ˜์™€ ๊ฐ€์šฉ๋„ 8 ์ œ 4 ์žฅ ๋‹ค์ค‘ ๋…ธ๋“œ ์†์‹ค์„ ๊ณ ๋ คํ•œ ๋ถ€๋ถ„์ ‘์†์ˆ˜์™€ ๊ฐ€์šฉ๋„ 11 ์ œ 1 ์ ˆ ๋‹ค์ค‘ ๋…ธ๋“œ ์†์‹ค๊ณผ ๊ฐ€์šฉ๋„๋ฅผ ๊ณ ๋ คํ•œ ๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ 11 ์ œ 2 ์ ˆ (n,k,r,t,ฮด)-๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ์˜ ์ตœ์†Œ ๊ฑฐ๋ฆฌ์— ๋Œ€ํ•œ ์ƒ๊ณ„ 12 ์ œ 5 ์žฅ (n,k,r,t,ฮด)-๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ์˜ ์ตœ์†Œ ๊ฑฐ๋ฆฌ ์ƒ๊ณ„์— ๋Œ€ํ•œachievability 18 ์ œ 1 ์ ˆ 5(n,k,r,t,ฮด)-๋ถ€๋ถ„์ ‘์†๋ณต๊ตฌ ๋ถ€ํ˜ธ์˜ ์„ค๊ณ„ 18 ์ œ 2 ์ ˆ ์ตœ์†Œ ๊ฑฐ๋ฆฌ ์ƒ๊ณ„์— ๋Œ€ํ•œ achievability 20 ์ œ 6 ์žฅ ๊ฒฐ ๋ก  23 ์ฐธ๊ณ ๋ฌธํ—Œ 24Maste
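    One common way to let a recovery set survive losses of its own is to make every local group a small MDS code of length $r + \delta - 1$. This assumes δ follows the usual $(r, \delta)$-locality convention, in which each local group tolerates $\delta - 1$ erasures. The Python sketch below builds such a group by polynomial evaluation over the prime field GF(257), so any $r$ surviving symbols of a group rebuild the others by Lagrange interpolation. The field size, the systematic layout and the names `interpolate_at`, `encode_local_group` and `repair` are illustrative assumptions, not the construction from the thesis. The modular inverse `pow(den, -1, p)` requires Python 3.8 or later.

```python
P = 257  # small prime field GF(257); an assumption for illustration


def interpolate_at(points, x, p=P):
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (Lagrange, mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def encode_local_group(info, delta, p=P):
    """Systematically extend r info symbols to an MDS group of length r + delta - 1."""
    r = len(info)
    points = list(enumerate(info))  # info symbol k is the evaluation at x = k
    return list(info) + [interpolate_at(points, x, p) for x in range(r, r + delta - 1)]


def repair(group, erased, r, p=P):
    """Rebuild every erased position from any r surviving symbols of the group."""
    survivors = [(x, y) for x, y in enumerate(group) if x not in erased][:r]
    if len(survivors) < r:
        raise ValueError("more than delta - 1 erasures in this group")
    return {x: interpolate_at(survivors, x, p) for x in erased}


if __name__ == "__main__":
    group = encode_local_group([5, 17, 200], delta=3)  # r = 3, group length 5
    print(repair(group, erased={0, 3}, r=3) == {0: 5, 3: group[3]})  # True
```

    With $r = 3$ and $\delta = 3$, each five-symbol group still repairs itself after two erasures, which a local group with a single parity ($\delta = 2$) cannot do.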

    Coding for the Clouds: Coding Techniques for Enabling Security, Locality, and Availability in Distributed Storage Systems

    Cloud systems have become the backbone of many applications such as multimedia streaming, e-commerce, and cluster computing. At the foundation of any cloud architecture lies a large-scale, distributed data storage system. To accommodate the massive amount of data stored on the cloud, these distributed storage systems (DSS) have been scaled to contain hundreds to thousands of nodes connected through a networking infrastructure. Such data centers are usually built out of commodity components, which make failures the norm rather than the exception. To combat node failures, data is typically stored redundantly. Due to the exponential growth of data, many DSS are beginning to adopt error-control coding instead of conventional replication, as coding offers high storage space efficiency. This paradigm shift from replication to coding, along with the need to guarantee reliability, efficiency, and security in DSS, has created a new set of challenges and opportunities, opening up a new area of research. This thesis addresses several of these challenges and opportunities by making the following broad contributions: (i) we design practically amenable, low-complexity coding schemes that guarantee the security of cloud systems, ensure quick recovery from failures, and provide high availability for retrieving partial information; and (ii) we analyze fundamental performance limits and optimal trade-offs between the key performance metrics of these coding schemes. More specifically, we first consider the problem of achieving information-theoretic security in DSS against an eavesdropper that can observe a limited number of nodes. We present a framework that enables the design of secure, repair-efficient codes through a joint construction of inner and outer codes.
Then, we consider a practically appealing notion of weakly secure coding and construct coset codes that can weakly secure a wide class of regenerating codes, which reduce the amount of data downloaded during node repair. Second, we consider the problem of meeting repair locality constraints, which specify the number of nodes participating in the repair process. We propose a notion of unequal locality, which allows different locality values for different nodes, ensuring quick recovery for nodes storing important data. We establish tight upper bounds on the minimum distance of linear codes with unequal locality and present optimal code constructions. Next, we extend the notion of locality from the Hamming metric to the rank and subspace metrics, with the goal of designing codes for efficient data recovery from special types of correlated failures in DSS. We construct a family of locally recoverable rank-metric codes with optimal data recovery properties. Finally, we consider the problem of providing high availability, which is ensured by enabling node repair from multiple disjoint subsets of nodes of small size. We study codes with availability from a queuing-theoretical perspective by analyzing the average time necessary to download a block of data under the Poisson request arrival model when each node takes a random amount of time to fetch its contents. We compare the delay performance of the availability codes with several alternatives, such as conventional erasure codes and replication schemes.
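    The latency benefit of availability already shows up without any queueing. The Monte Carlo sketch below drops the Poisson arrival process and the queues studied in the thesis. It assumes a single request and i.i.d. exponential node response times with rate `mu`. A data symbol is served by the fastest of two sources: its own node, or any of $t$ disjoint recovery groups, where a group finishes only when the slowest of its $r$ nodes responds. The model and the names `read_time` and `mean_read_time` are assumptions for illustration.

```python
import random


def read_time(rng, r, t, mu=1.0):
    """Time to read one data symbol: the fastest of its own node and t disjoint groups of r nodes.

    Node response times are i.i.d. Exp(mu) (assumption); a group finishes when its slowest node does.
    """
    direct = rng.expovariate(mu)
    groups = [max(rng.expovariate(mu) for _ in range(r)) for _ in range(t)]
    return min([direct] + groups)


def mean_read_time(r, t, trials=20000, seed=1):
    """Monte Carlo estimate of the expected read time (no queueing, one request at a time)."""
    rng = random.Random(seed)
    return sum(read_time(rng, r, t) for _ in range(trials)) / trials


if __name__ == "__main__":
    for label, r, t in [("single copy", 1, 0), ("3-replication", 1, 2), ("r=2, t=2 code", 2, 2)]:
        print(f"{label:>14}: {mean_read_time(r, t):.3f}")
```

    With unit-rate service, the analytical expected read times under these assumptions are $1$ for a single copy ($t = 0$), $1/3$ for 3-replication ($r = 1, t = 2$), and $8/15 \approx 0.53$ for groups of size $r = 2$ with $t = 2$. The simulated means should land close to these values.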

    Improving capacity-performance tradeoffs in the storage tier

    Data-set sizes are growing, and new techniques are emerging to organize and analyze these data-sets. These techniques share a key access pattern: large sequential file accesses. The trend toward bigger files helps amortize the cost of data accesses from the storage layer, as many workloads are recognized to be I/O bound. The storage layer is widely recognized as the slowest layer in the system. This work focuses on the tradeoff one can make with storage capacity to improve system performance.

    Capacity can be leveraged for improved availability or improved performance. This tradeoff is key in the storage layer, as it allows for data loss prevention and bandwidth aggregation. Typically, these tradeoffs do not allow much choice with regard to capacity use. This work leverages replication as the enabling mechanism to improve the capacity-performance tradeoff in the storage tier while still providing availability.

    This capacity-performance tradeoff can be made at both the local and the distributed file system level. I propose two techniques that allow for an improved tradeoff of capacity. The local file system can be employed on scale-out or scale-up infrastructures to improve performance. The distributed file system is targeted at distributed frameworks, such as MapReduce, to improve cluster performance. The local file system design is MorphStore, and the distributed file system is BoostDFS.

    MorphStore is a file system that significantly improves performance when accessing large files by using two innovations. MorphStore combines (a) load-adaptive I/O access scheduling to dynamically optimize throughput (aggregation), and (b) utility-driven replication to make the best use of capacity for performance. Additionally, adaptive-access scheduling can be used to optimize the scheduling of requests (for throughput) on systems with a large number of storage devices. Replication is used to make high-utility files available, and the throughput of these high-utility files is then optimized based on system load.

    BoostDFS is a distributed file system that allows a better capacity-performance tradeoff via inter-node file replication. BoostDFS is built on the observation that distributed file systems currently use inter-node replication for availability but provide no mechanism to further improve performance. Replication for availability provides diminishing returns on performance; this is due to saturation of locality. BoostDFS exploits the common case by improving the I/O performance of these local tasks. This is done via intra-node replication, using MorphStore as the local file system. This technique allows capacity to be traded for availability as well as performance, with a small capacity overhead under constant availability.

    Both MorphStore and BoostDFS use replication. Replication allows for both bandwidth aggregation and availability. This work primarily focuses on the performance utility of replication but does not sacrifice availability in the process. These techniques provide an improved capacity-performance tradeoff while allowing the desired level of availability.
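    The two MorphStore ingredients named above can be sketched in a few lines. The Python below is a hypothetical illustration, not MorphStore's actual policy. `choose_replicas` greedily spends spare capacity on the files with the highest utility per byte, which is one simple reading of utility-driven replication. `schedule` sends each read to the least-loaded device that holds a replica of the requested file, which is a simple load-adaptive rule. File names, sizes, utilities and device IDs are made up.

```python
def choose_replicas(files, spare_capacity):
    """Greedy utility-driven replication (hypothetical policy).

    files: list of (name, size, utility). Files are considered in order of
    utility per byte and replicated while the spare capacity allows it.
    """
    chosen = []
    for name, size, utility in sorted(files, key=lambda f: f[2] / f[1], reverse=True):
        if size <= spare_capacity:
            chosen.append(name)
            spare_capacity -= size
    return chosen


def schedule(requests, replicas, num_devices):
    """Load-adaptive read routing (hypothetical policy).

    requests: list of (file, size); replicas: dict file -> device ids holding a copy.
    Each request goes to the replica device with the least outstanding load
    (ties broken by lowest device id). Returns (per-device load, assignment).
    """
    load = [0] * num_devices
    assignment = []
    for name, size in requests:
        dev = min(replicas[name], key=lambda d: (load[d], d))
        load[dev] += size
        assignment.append(dev)
    return load, assignment


if __name__ == "__main__":
    files = [("hot", 10, 100), ("bulk", 50, 100), ("cold", 5, 1)]
    print(choose_replicas(files, spare_capacity=20))  # ['hot', 'cold']
    reqs = [("hot", 10)] * 4 + [("cold", 5)]
    print(schedule(reqs, {"hot": [0], "cold": [2]}, 3)[0])     # [40, 0, 5]
    print(schedule(reqs, {"hot": [0, 1], "cold": [2]}, 3)[0])  # [20, 20, 5]
```

    In this toy workload, a second replica of the hot file halves the busiest device's load (from 40 to 20 units). This is the bandwidth-aggregation effect the abstract attributes to replication.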