11 research outputs found

    Node Repair for Distributed Storage Systems over Fading Channels

    Distributed storage systems and associated storage codes can efficiently store a large amount of data while ensuring that data is retrievable in case of node failure. The study of such systems, particularly the design of storage codes over finite fields, assumes that the physical channel through which the nodes communicate is error-free. This is not always the case, for example, in a wireless storage system. We study the probability that a subpacket is repaired incorrectly during node repair in a distributed storage system in which the nodes communicate over AWGN or Rayleigh fading channels. The asymptotic probability (as SNR increases) that a node is repaired incorrectly is shown to be completely determined by the repair locality of the DSS and the symbol error rate of the wireless channel. Lastly, we propose some design criteria for physical layer coding in this scenario, and use them to compute optimally rotated QAM constellations for use in wireless distributed storage systems. Comment: To appear in ISITA 201
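The dependence on repair locality and symbol error rate described above can be illustrated with a minimal sketch. It is not the paper's derivation: it assumes (hypothetically) that the r helper symbols are received with independent errors at a common symbol error rate p, and that any single erroneous symbol corrupts the repair. The function names are illustrative only.

```python
def node_repair_error_probability(symbol_error_rate: float, repair_locality: int) -> float:
    """Probability that at least one of the r helper symbols is received in error.

    Assumption (not from the source): symbol errors are independent across
    helper nodes, and one bad symbol is enough to make the repair incorrect.
    """
    if not 0.0 <= symbol_error_rate <= 1.0:
        raise ValueError("symbol_error_rate must lie in [0, 1]")
    if repair_locality < 1:
        raise ValueError("repair_locality must be at least 1")
    return 1.0 - (1.0 - symbol_error_rate) ** repair_locality


def high_snr_approximation(symbol_error_rate: float, repair_locality: int) -> float:
    """First-order approximation r * p, accurate when p is small (high SNR)."""
    return repair_locality * symbol_error_rate


if __name__ == "__main__":
    # Compare the exact expression with the first-order approximation.
    for p in (1e-2, 1e-4, 1e-6):
        exact = node_repair_error_probability(p, repair_locality=4)
        approx = high_snr_approximation(p, repair_locality=4)
        print(f"p={p:g}  exact={exact:.3e}  approx={approx:.3e}")
```

Under these assumptions the two expressions converge as p shrinks, which matches the qualitative claim that only the locality and the symbol error rate matter asymptotically.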

    A Repair Framework for Scalar MDS Codes

    Several works have developed vector-linear maximum-distance separable (MDS) storage codes that minimize the total communication cost required to repair a single coded symbol after an erasure, referred to as repair bandwidth (BW). Vector codes allow communicating fewer sub-symbols per node, instead of the entire content. This allows nontrivial savings in repair BW. In sharp contrast, classic codes, like Reed-Solomon (RS), used in current storage systems, are deemed to suffer from naive repair, i.e., downloading the entire stored message to repair one failed node. This mainly happens because they are scalar-linear. In this work, we present a simple framework that treats scalar codes as vector-linear. In some cases, this allows significant savings in repair BW. We show that vectorized scalar codes exhibit properties that simplify the design of repair schemes. Our framework can be seen as a finite field analogue of real interference alignment. Using our simplified framework, we design a scheme that we call clique-repair, which provably identifies the best linear repair strategy for any scalar 2-parity MDS code, under some conditions on the sub-field chosen for vectorization. We specify optimal repair schemes for specific (5,3)- and (6,4)-Reed-Solomon (RS) codes. Further, we present a repair strategy for the RS code currently deployed in the Facebook Analytics Hadoop cluster that leads to 20% repair BW savings over naive repair, which is the repair scheme currently used for this code. Comment: 10 pages; accepted to IEEE JSAC - Distributed Storage 201

    Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

    Data centres that use consumer-grade disk drives and distributed peer-to-peer systems are unreliable environments to archive data without enough redundancy. Most redundancy schemes are not completely effective for providing high availability, durability and integrity in the long term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large-scale storage system. Our motivation is to design flexible and practical erasure codes with high fault tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault tolerance even further without the need for additional storage. As a result, an entangled storage system can provide high availability and durability, and offers additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical codes. Remarkably, they excel at code locality; hence they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios. Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

    Sparsity Exploiting Erasure Coding for Resilient Storage and Efficient I/O Access in Delta based Versioning Systems

    In this paper we study the problem of reliably storing an archive of versioned data. Specifically, we focus on systems where the differences (deltas) between subsequent versions, rather than the whole objects, are stored, which is a typical model for storing versioned data. For reliability, we propose erasure encoding techniques that exploit the sparsity of information in the deltas while storing them reliably in a distributed back-end storage system, resulting in improved I/O read performance when retrieving the whole versioned archive. Along with the basic techniques, we propose a few optimization heuristics, and evaluate the techniques' efficacy analytically and with numerical simulations. Comment: 10 pages, 8 figures
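The delta-based storage model mentioned above can be sketched as follows. This is only an illustration of storing sparse deltas between versions, not the erasure coding scheme proposed in the paper. It assumes (hypothetically) that every version is an equal-length list of integer symbols and that a delta is the symbol-wise XOR of consecutive versions; all function names are illustrative.

```python
from typing import List, Tuple


def make_delta(old: List[int], new: List[int]) -> List[int]:
    """Symbol-wise XOR of two equal-length versions; unchanged symbols give 0."""
    if len(old) != len(new):
        raise ValueError("versions must have the same length")
    return [a ^ b for a, b in zip(old, new)]


def sparsity(delta: List[int]) -> int:
    """Number of non-zero symbols, i.e. positions that actually changed."""
    return sum(1 for symbol in delta if symbol != 0)


def build_archive(versions: List[List[int]]) -> Tuple[List[int], List[List[int]]]:
    """Store the first version in full and each later version as a delta."""
    base = list(versions[0])
    deltas = [make_delta(prev, cur) for prev, cur in zip(versions, versions[1:])]
    return base, deltas


def reconstruct(base: List[int], deltas: List[List[int]], index: int) -> List[int]:
    """Rebuild version `index` (0-based) by applying the first `index` deltas."""
    current = list(base)
    for delta in deltas[:index]:
        current = [a ^ b for a, b in zip(current, delta)]
    return current


if __name__ == "__main__":
    versions = [[1, 2, 3, 4], [1, 2, 7, 4], [1, 9, 7, 4]]
    base, deltas = build_archive(versions)
    print([sparsity(d) for d in deltas])  # changed positions per delta
    print(reconstruct(base, deltas, 2) == versions[2])
```

When consecutive versions differ in only a few positions, each delta has few non-zero symbols, which is the sparsity a coding scheme could exploit.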

    Secure Cooperative Regenerating Codes for Distributed Storage Systems

    Regenerating codes enable trading off repair bandwidth for storage in distributed storage systems (DSS). Due to their distributed nature, these systems are intrinsically susceptible to attacks, and they may also be subject to multiple simultaneous node failures. Cooperative regenerating codes allow bandwidth-efficient repair of multiple simultaneous node failures. This paper analyzes storage systems that employ cooperative regenerating codes that are robust to (passive) eavesdroppers. The analysis is divided into two parts, studying both minimum bandwidth and minimum storage cooperative regenerating scenarios. First, the secrecy capacity for minimum bandwidth cooperative regenerating codes is characterized. Second, for minimum storage cooperative regenerating codes, a secure file size upper bound and achievability results are provided. These results establish the secrecy capacity for the minimum storage scenario for certain special cases. In all scenarios, the achievability results correspond to exact repair, and secure file size upper bounds are obtained using min-cut analyses over a suitable secrecy graph representation of DSS. The main achievability argument is based on an appropriate pre-coding of the data to eliminate the information leakage to the eavesdropper.

    Hadoop: processament distribuït de gran volum de dades en el núvol d'Apache (Hadoop: distributed processing of large data volumes in the Apache cloud)

    Nowadays, an overwhelming volume of data of different types is generated from a multitude of sources. Distributed storage and processing systems are the technology that makes it possible to capture this avalanche of data and to extract value from it through various kinds of analysis. Hadoop, which integrates distributed storage and processing, has become the de facto standard for applications that need a large storage capacity, even on the order of tens of PBs. In this work, we study Hadoop, analyze the efficiency of its durability system, and propose an alternative to it.