
    Exploration of Erasure-Coded Storage Systems for High Performance, Reliability, and Inter-operability

    The unprecedented growth of data and the use of low-cost commodity drives in local disk-based storage systems and remote cloud-based servers have increased the risk of data loss and the user-perceived system latency. To guarantee high reliability, replication has been the most popular choice for decades because of its simplicity in data management. With the high volume of data being generated every day, however, the storage cost of replication is very high and is no longer a viable approach. Erasure coding is another approach to adding redundancy in storage systems, which provides high reliability at a fraction of the cost of replication. However, the choice of erasure code affects the storage efficiency, reliability, and overall system performance. At the same time, performance and interoperability are adversely affected by slower device components and complex central management systems and operations. To address the problems encountered in the various layers of an erasure-coded storage system, in this dissertation we explore the different aspects of storage and design several techniques to improve reliability, performance, and interoperability. These techniques range from a comprehensive evaluation of erasure codes and the application of erasure codes to a highly reliable, high-performance SSD system, to the design of new erasure coding and caching schemes for the Hadoop Distributed File System, one of the central management systems for distributed storage. Detailed evaluation and results are also provided in this dissertation.
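    The replication-versus-coding cost claim is easy to quantify. A minimal sketch follows, with parameters that are illustrative assumptions rather than configurations from the dissertation:

```python
# Raw bytes stored per logical byte: n-way replication versus a (k+m, k)
# MDS erasure code. The 3-way and (10, 4) figures are illustrative only.

def replication_overhead(copies: int) -> float:
    """Each logical byte is stored `copies` times."""
    return float(copies)

def erasure_overhead(k: int, m: int) -> float:
    """k data fragments plus m parity fragments per k logical fragments."""
    return (k + m) / k

# 3-way replication tolerates any 2 losses at 3.0x raw storage;
# a (14, 10) MDS code tolerates any 4 losses at only 1.4x.
print(replication_overhead(3))    # 3.0
print(erasure_overhead(10, 4))    # 1.4
```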

    Optimal Rebuilding of Multiple Erasures in MDS Codes

    MDS array codes are widely used in storage systems due to their computationally efficient encoding and decoding procedures. An MDS code with r redundancy nodes can correct any r node erasures by accessing all the remaining information in the surviving nodes. However, in practice, e erasures are a more likely failure event, for 1≤e<r. Hence, a natural question is how much information do we need to access in order to rebuild e storage nodes? We define the rebuilding ratio as the fraction of remaining information accessed during the rebuilding of e erasures. In our previous work, we constructed MDS codes, called zigzag codes, that achieve the optimal rebuilding ratio of 1/r for the rebuilding of any systematic node when e=1; however, all the information needs to be accessed for the rebuilding of a parity node erasure. The (normalized) repair bandwidth is defined as the fraction of information transmitted from the remaining nodes during the rebuilding process. For codes that are not necessarily MDS, Dimakis et al. proposed the regenerating codes framework, where any r erasures can be corrected by accessing some of the remaining information, and any e=1 erasure can be rebuilt from some subsets of surviving nodes with optimal repair bandwidth. In this work, we study three questions on the rebuilding of codes: (i) we show a fundamental trade-off between the storage size of the node and the repair bandwidth, similar to the regenerating codes framework, and show that zigzag codes achieve the optimal rebuilding ratio of e/r for MDS codes, for any 1≤e≤r; (ii) we construct systematic codes that achieve the optimal rebuilding ratio of 1/r for any systematic or parity node erasure; (iii) we present error correction algorithms for zigzag codes, and in particular demonstrate how these codes can be corrected beyond their minimum Hamming distances. Comment: There is an overlap of this work with our two previous submissions: Zigzag Codes: MDS Array Codes with Optimal Rebuilding; On Codes for Optimal Rebuilding Access. arXiv admin note: text overlap with arXiv:1112.037
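    The e/r result lends itself to a quick numeric illustration. The sketch below is illustrative only (not code from the paper), and the choice r = 3 is arbitrary; it simply contrasts naive repair, which reads everything that survives, with the optimal rebuilding ratio:

```python
# Illustrative comparison: naive MDS repair reads all surviving information
# (ratio 1.0), while zigzag codes achieve the optimal rebuilding ratio e/r.
# r = 3 redundancy nodes is an arbitrary choice for the demonstration.

def naive_ratio(e: int, r: int) -> float:
    """Fraction of surviving information read by naive MDS repair."""
    return 1.0

def zigzag_ratio(e: int, r: int) -> float:
    """Optimal fraction of surviving information read to rebuild e erasures."""
    assert 1 <= e <= r
    return e / r

r = 3
for e in range(1, r + 1):
    print(f"e={e}: naive={naive_ratio(e, r):.2f}, optimal={zigzag_ratio(e, r):.2f}")
```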

    HTSC and FH_HTSC: XOR-based codes to reduce access latency in distributed storage systems

    A massive distributed storage system is the foundation for big data operations. Access latency is a key performance metric in distributed storage systems, since it greatly impacts user experience, yet existing codes mainly focus on improving metrics such as storage overhead and repair cost. By generating parity nodes from parity nodes, in this paper we design new XOR-based erasure codes, the hierarchical tree structure code (HTSC) and the high failure tolerant HTSC (FH_HTSC), to reduce access latency in distributed storage systems. By comparing with other popular and representative codes, we show that, under the same repair cost, HTSC and FH_HTSC codes can reduce access latency while maintaining favorable performance in other metrics. In particular, under the same repair cost, FH_HTSC achieves lower access latency, higher or equal failure tolerance, and lower computation cost than the representative codes while requiring similar storage overhead. Accordingly, FH_HTSC is a superior choice for applications requiring both low access latency and outstanding failure tolerance.
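    The "parity nodes from parity nodes" idea can be sketched with plain XOR. The tree shape, block contents, and local-rebuild read pattern below are assumptions for illustration, not the exact HTSC construction:

```python
# Two first-level parity nodes each cover half the data; a root parity is
# generated from the parities themselves, giving a small hierarchical tree.

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    out = bytearray(blocks[0])
    for block in blocks[1:]:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [bytes([i] * 4) for i in range(1, 5)]   # four equal-size data nodes
p1 = xor_blocks(data[0:2])                     # first-level parity node
p2 = xor_blocks(data[2:4])                     # first-level parity node
root = xor_blocks([p1, p2])                    # parity generated from parities

# A failed data node can be rebuilt inside its small subtree:
assert xor_blocks([p1, data[1]]) == data[0]
# ...while the root parity still covers all four data nodes at once:
assert root == xor_blocks(data)
```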

    RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)

    The RAID proposal advocated replacing large disks with arrays of PC disks, but as the capacity of small disks increased 100-fold in the 1990s, the production of large disks was discontinued. Storage dependability is increased via replication or erasure coding. Cloud storage providers store multiple copies of data, obviating the need for further redundancy. Variations of RAID based on local recovery codes and partial MDS codes reduce recovery cost. NAND flash solid state disks (SSDs) have low latency and high bandwidth, are more reliable, consume less power, and have a lower TCO than hard disk drives, making them the more viable option for hyperscalers. Comment: Submitted to ACM Computing Surveys. arXiv admin note: substantial text overlap with arXiv:2306.0876
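    As a rough illustration of the recovery-cost point, the following sketch (with assumed parameters, not figures from the tutorial) counts the surviving blocks a single-failure repair must read under a plain MDS code versus a local recovery code:

```python
# Assumed parameters for illustration: k = 12 data blocks, locality groups
# of 4. A plain (k, m) MDS code reads k survivors to repair one block; an
# LRC repairs within one local group using its local parity.

def mds_repair_reads(k: int) -> int:
    """Blocks read to repair one failure under a classic MDS code."""
    return k

def lrc_repair_reads(group_size: int) -> int:
    """Blocks read to repair one failure inside an LRC locality group."""
    return group_size

print(mds_repair_reads(12))   # 12 reads per failed block
print(lrc_repair_reads(4))    # 4 reads per failed block
```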

    An erasure-resilient and compute-efficient coding scheme for storage applications

    Driven by rapid technological advancements, the amount of data that is created, captured, communicated, and stored worldwide has grown exponentially over the past decades. Along with this development, it has become critical for many disciplines of science and business to be able to gather and analyze large amounts of data. The sheer volume of the data often exceeds the capabilities of classical storage systems, with the result that current large-scale storage systems are highly distributed and comprise a large number of individual storage components. As with any other electronic device, the reliability of storage hardware is governed by certain probability distributions, which in turn are influenced by the physical processes utilized to store the information. The traditional way to deal with the inherent unreliability of combined storage systems is to replicate the data several times. Another popular approach to achieve failure tolerance is to calculate the block-wise parity in one or more dimensions. With better understanding of the different failure modes of storage components, it has become evident that sophisticated high-level error detection and correction techniques are indispensable for ever-growing distributed systems. The utilization of powerful cyclic error-correcting codes, however, comes with a high computational penalty, since the required operations over finite fields do not map very well onto current commodity processors. This thesis introduces a versatile coding scheme with fully adjustable fault tolerance that is tailored specifically to modern processor architectures. To reduce stress on the memory subsystem, the conventional table-based algorithm for multiplication over finite fields has been replaced with a polynomial version. This arithmetically intense algorithm is better suited to the wide SIMD units of currently available general-purpose processors, and also displays significant benefits when used with modern many-core accelerator devices (for instance, the popular general-purpose graphics processing units). A CPU implementation using SSE and a GPU version using CUDA are presented. The performance of the multiplication depends on the distribution of the polynomial coefficients in the finite field elements. This property has been used to create suitable matrices that generate a linear systematic erasure-correcting code with significantly increased multiplication performance for the relevant matrix elements. Several approaches to obtain the optimized generator matrices are elaborated and their implications are discussed. A Monte-Carlo-based construction method makes it possible to influence the specific shape of the generator matrices and thus to adapt them to special storage and archiving workloads. Extensive benchmarks on CPU and GPU demonstrate the superior performance and the future application scenarios of this novel erasure-resilient coding scheme.
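    To make the table-based versus polynomial contrast concrete, here is a minimal scalar sketch of both multiplication styles in GF(2^8). The AES field polynomial 0x11b and generator 3 are placeholders (the thesis may use a different irreducible polynomial), and a real SIMD kernel would run the shift-and-XOR loop across many lanes at once:

```python
def gf_mul_poly(a: int, b: int, poly: int = 0x11b) -> int:
    """Polynomial (carry-less) multiply in GF(2^8): shift, XOR, reduce."""
    acc = 0
    for _ in range(8):
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:        # degree-8 overflow: reduce by the field polynomial
            a ^= poly
    return acc

# Table-based variant: two lookups plus an index addition. Fast for scalar
# code, but the lookup tables stress caches and map poorly onto SIMD units.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul_poly(x, 3)    # 3 generates the multiplicative group of this field
for i in range(255, 512):
    EXP[i] = EXP[i - 255]    # doubled table avoids a modulo on lookup

def gf_mul_table(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

# Worked check (the classic AES example): 0x57 * 0x83 = 0xc1 in GF(2^8).
assert gf_mul_poly(0x57, 0x83) == gf_mul_table(0x57, 0x83) == 0xc1
```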

    Optimal Rebuilding of Multiple Erasures in MDS Codes

    Maximum distance separable (MDS) array codes are widely used in storage systems due to their computationally efficient encoding and decoding procedures. An MDS code with r redundancy nodes can correct any r node erasures by accessing (reading) all the remaining information in the surviving nodes. However, in practice, e erasures are a more likely failure event, for some 1≤e<r. Hence, a natural question is how much information do we need to access in order to rebuild e storage nodes? We define the rebuilding ratio as the fraction of remaining information accessed during the rebuilding of e erasures. In our previous work, we constructed MDS codes, called zigzag codes, that achieve the optimal rebuilding ratio of 1/r for the rebuilding of any systematic node when e=1; however, all the information needs to be accessed for the rebuilding of the parity node erasure. The (normalized) repair bandwidth is defined as the fraction of information transmitted from the remaining nodes during the rebuilding process. For codes that are not necessarily MDS, Dimakis et al. proposed the regenerating codes framework where any r erasures can be corrected by accessing some of the remaining information, and any e=1 erasure can be rebuilt from some subsets of surviving nodes with optimal repair bandwidth. In this paper, we present three results on rebuilding of codes: 1) we show a fundamental outer bound on the storage size of the node and the repair bandwidth similar to the regenerating codes framework, and show that zigzag codes achieve the optimal rebuilding ratio of e/r for systematic nodes of MDS codes, for any 1≤e≤r; 2) we construct systematic codes that achieve the optimal rebuilding ratio of 1/r for any systematic or parity node erasure; and 3) we present error correction algorithms for zigzag codes, and in particular demonstrate how these codes can be corrected beyond their minimum Hamming distances.

    Hyfs: design and implementation of a reliable file system

    Building reliable data storage systems is crucial to any commercial or scientific application. Modern storage systems are complicated, comprising many components, from hardware to software. Problems may occur in any component of a storage system and cause data loss. When such failures happen, storage systems cannot continue their data services, which may result in large revenue loss or even catastrophe for enterprises. Therefore, it is critically important to build reliable storage systems to ensure data reliability. In this dissertation, we propose to employ general erasure codes to build a reliable file system, called HyFS. HyFS is a cluster system, which can aggregate distributed storage servers to provide reliable data service. On the client side, HyFS is implemented as a native file system so that applications can transparently run on top of HyFS. On the server side, HyFS utilizes multiple distributed storage servers to provide highly reliable data service by employing erasure codes. HyFS is able to offer high throughput for either random or sequential file access, which makes HyFS an attractive choice for primary or backup storage systems. This dissertation studies five relevant topics of HyFS. Firstly, it presents several algorithms that can perform the encoding operation efficiently for XOR-based erasure codes. Secondly, it discusses an efficient decoding algorithm for RAID-6 erasure codes, which can recover various types of disk failures. Thirdly, it describes an efficient algorithm to detect and correct errors for the STAR code, which further improves a storage system's reliability. Fourthly, it describes efficient implementations of the arithmetic operations of large finite fields, to improve a storage system's security. Lastly, and most importantly, it presents the design and implementation of HyFS and evaluates its performance.
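    For context on the XOR-based codes HyFS employs, here is a minimal single-stripe sketch; the layout and strip sizes are invented for illustration (this is the generic RAID-style XOR parity idea, not HyFS's actual on-disk format):

```python
# Generic XOR parity over one stripe: any single lost strip equals the XOR
# of the survivors and the parity. Strip contents here are placeholders.

def xor(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    out = bytearray(blocks[0])
    for block in blocks[1:]:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

stripe = [b"data0001", b"data0002", b"data0003"]   # strips on 3 data servers
parity = xor(stripe)                               # parity strip on a 4th server

# Server 1 fails: rebuild its strip from the survivors plus the parity.
assert xor([stripe[0], stripe[2], parity]) == stripe[1]
```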

    Communication Cost for Updating Linear Functions when Message Updates are Sparse: Connections to Maximally Recoverable Codes

    We consider a communication problem in which an update of the source message needs to be conveyed to one or more distant receivers that are interested in maintaining specific linear functions of the source message. The setting is one in which the updates are sparse in nature, and where neither the source nor the receiver(s) is aware of the exact difference vector, but they only know the amount of sparsity present in the difference vector. Under this setting, we are interested in devising linear encoding and decoding schemes that minimize the communication cost involved. We show that the optimal solution to this problem is closely related to the notion of maximally recoverable codes (MRCs), which were originally introduced in the context of coding for storage systems. In the context of storage, MRCs guarantee optimal erasure protection when the system is partially constrained to have local parity relations among the storage nodes. In our problem, we show that optimal solutions exist if and only if MRCs of a certain kind (identified by the desired linear functions) exist. We consider point-to-point and broadcast versions of the problem, and identify connections to MRCs under both settings. For the point-to-point setting, we show that our linear-encoder-based achievable scheme is optimal even when non-linear encoding is permitted. The theory is illustrated in the context of updating erasure-coded storage nodes. We present examples based on modern storage codes such as the minimum bandwidth regenerating codes. Comment: To appear in IEEE Transactions on Information Theory
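    The update model itself is easy to sketch. In the toy example below, the sizes, the GF(2) arithmetic, and the assumption that the sparse difference is conveyed exactly are all illustrative; the paper's optimal scheme communicates less than this by compressing the unknown sparse difference with MRC-derived encoders:

```python
# A receiver holding a linear function A(x) of the source message refreshes
# it from the sparse difference d alone, by linearity: A(x + d) = A(x) + A(d).

import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 4
A = rng.integers(0, 2, size=(m, n))      # linear function the receiver maintains
x = rng.integers(0, 2, size=n)           # current source message
y = (A @ x) % 2                          # receiver's stored value A(x)

d = np.zeros(n, dtype=np.int64)          # sparse difference vector
d[[3, 7]] = 1                            # only 2 of 12 coordinates change
x_new = (x + d) % 2

y_new = (y + A @ d) % 2                  # update from the difference alone
assert np.array_equal(y_new, (A @ x_new) % 2)
```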

    SDSF : social-networking trust based distributed data storage and co-operative information fusion.

    As of 2014, about 2.5 quintillion bytes of data are created each day, and 90% of the data in the world was created in the last two years alone. This data can be stored on external hard drives, on unused space in peer-to-peer (P2P) networks, or, in the currently popular approach, in the Cloud. When users store their data in the Cloud, the entire data is exposed to the administrators of the services, who can view and possibly misuse it. With the growing popularity and usage of Cloud storage services like Google Drive, Dropbox, etc., the concerns of privacy and security are increasing. Searching this distributed stored data for content or documents, given the rate of data generation, is a big challenge. Information fusion is used to extract information based on the query of the user, and to combine the data and learn useful information. This problem is challenging if the data sources are distributed and heterogeneous in nature, where the trustworthiness of the documents may vary. This thesis proposes two innovative solutions to resolve both of these problems. Firstly, to remedy the situation of security and privacy of stored data, we propose an innovative Social-based Distributed Data Storage and Trust based co-operative Information Fusion Framework (SDSF). The main objective is to create a framework that assists in providing a secure storage system while not overloading a single system, using a P2P-like approach. This framework allows users to share storage resources among friends and acquaintances without compromising security or privacy, while enjoying all the benefits that Cloud storage offers. The system fragments the data and encodes it to securely store it on the unused storage capacity of the data owner's friends' resources. The system thus gives the user centralized control over the selection of peers to store the data. Secondly, to retrieve the stored distributed data, the proposed system also performs the fusion from distributed sources. The technique uses several algorithms to ensure the correctness of the query used to retrieve and combine the data, improving the accuracy and efficiency of information fusion over the heterogeneous, distributed, and massive data on the Cloud for time-critical operations. We demonstrate that the retrieved documents are genuine when trust scores are also used while retrieving the data sources. The thesis makes several research contributions. First, we implement Social Storage using erasure coding, which fragments the data, encodes it, and, through the introduction of redundancy, resolves issues resulting from device failures. Second, we exploit the inherent concept of trust embedded in social networks to determine the nodes and build a secure network where the fragmented data should be stored, since the social network consists of friends, family, and acquaintances. The trust between friends and the availability of devices allow the user to make an informed choice about where the information should be stored, using 'k' optimal paths. Third, for the retrieval of this distributed stored data, we propose information fusion on distributed data using a combination of Enhanced N-grams (to ensure correctness of the query), Semantic Machine Learning (to extract documents based on context rather than just a bag of words, while also considering the trust score), and Map Reduce (NSM) algorithms. Lastly, we evaluate the performance of the distributed storage of SDSF using erasure coding, identify the social storage providers based on trust, and evaluate their trustworthiness. We also evaluate the performance of our information fusion algorithms in distributed storage systems. Thus, using the SDSF framework, the system implements the beneficial features of P2P networks and Cloud storage while avoiding the pitfalls of both. The multi-layered encryption ensures that all other users, including system administrators, cannot decode the stored data. The application of the NSM algorithm improves the effectiveness of fusion, since a large number of genuine documents are retrieved for fusion.
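    The storage-side flow described above can be compressed into a few lines; the friend names, trust scores, and single XOR parity fragment below are invented placeholders, since SDSF's actual coding parameters and trust model are richer than this:

```python
# Hypothetical flow: fragment a file, add an XOR parity fragment, and place
# the pieces on the most-trusted friends' spare storage. All names, trust
# scores, and the k = 3 choice are invented for illustration.

def fragment(data: bytes, k: int):
    """Split data into k equal-size fragments (zero-padded)."""
    size = -(-len(data) // k)                      # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def xor_parity(frags):
    out = bytearray(frags[0])
    for f in frags[1:]:
        for i, byte in enumerate(f):
            out[i] ^= byte
    return bytes(out)

friends = {"alice": 0.9, "bob": 0.8, "carol": 0.7, "dave": 0.4}
frags = fragment(b"private document", k=3)
frags.append(xor_parity(frags))                    # redundancy against one loss

# Assign fragments to the k + 1 most-trusted friends.
ranked = sorted(friends, key=friends.get, reverse=True)
placement = dict(zip(ranked, frags))
print(list(placement))                             # alice, bob, carol, dave
```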