825 research outputs found

    Improving Data Availability in Decentralized Storage Systems

    Get PDF
    PhD thesis in Information technologyPreserving knowledge for future generations has been a primary concern for humanity since the dawn of civilization. State-of-the-art methods have included stone carvings, papyrus scrolls, and paper books. With each advance in technology, it has become easier to record knowledge. In the current digital age, humanity may preserve enormous amounts of knowledge on hard drives with the click of a button. The aggregation of several hard drives into a computer forms the basis for a storage system. Traditionally, large storage systems have comprised many distinct computers operated by a single administrative entity. With the rise in popularity of blockchain and cryptocurrencies, a new type of storage system has emerged. This new type of storage system is fully decentralized and comprises a network of untrusted peers cooperating to act as a single storage system. During upload, files are split into chunks and distributed across a network of peers. These storage systems encode files using Merkle trees, a hierarchical data structure that provides integrity verification and lookup services. While decentralized storage systems are popular and have a user base in the millions, many technical aspects are still in their infancy. As such, they have yet to prove themselves viable alternatives to traditional centralized storage systems. In this thesis, we contribute to the technical aspects of decentralized storage systems by proposing novel techniques and protocols. We make significant contributions with the design of three practical protocols that each improve data availability in different ways. Our first contribution is Snarl and entangled Merkle trees. Entangled Merkle trees are resilient data structures that decrease the impact hierarchical dependencies have on data availability. Whenever a chunk loss is detected, Snarl uses the entangled Merkle trees to find parity chunks to repair the lost chunk. Our results show that by encoding data as an entangled Merkle tree and using Snarl’s repair algorithm, the storage utilization in current systems could be improved by over five times, with improved data availability. Second, we propose SNIPS, a protocol that efficiently synchronizes the data stored on peers to ensure that all peers have the same data. We designed a Proof of Storage-like construction using a Minimal Perfect Hash Function. Each peer uses the PoS-like construction to create a storage proof for those chunks it wants to synchronize. Peers exchange storage proofs and use them to efficiently determine which chunks they are missing. The evaluation shows that by using SNIPS, the amount of synchronization data can be reduced by three orders of magnitude in current systems. Lastly, in our third contribution, we propose SUP, a protocol that uses cryptographic proofs to check if a chunk is already stored in the network before doing wasteful uploads. We show that SUP may reduce the amount of data transferred by up to 94 % in current systems. The protocols may be deployed independently or in combination to create a decentralized storage system that is more robust to major outages. Each of the protocols has been implemented and evaluated on a large cluster of 1,000 peers

    Snarl : entangled merkle trees for improved file availability and storage utilization

    Get PDF
    In cryptographic decentralized storage systems, files are split into chunks and distributed across a network of peers. These storage systems encode files using Merkle trees, a hierarchical data structure that provides integrity verification and lookup services. A Merkle tree maps the chunks of a file to a single root whose hash value is the file's content-address. A major concern is that even minor network churn can result in chunks becoming irretrievable due to the hierarchical dependencies in the Merkle tree. For example, chunks may be available but can not be found if all peers storing the root fail. Thus, to reduce the impact of churn, a decentralized replication process typically stores each chunk at multiple peers. However, we observe that this process reduces the network's storage utilization and is vulnerable to cascading failures as some chunks are replicated 10X less than others. We propose Snarl, a novel storage component that uses a variation of alpha entanglement codes to add user-controlled redundancy to address these problems. Our contributions are summarized as follows: 1) the design of an entangled Merkle tree, a resilient data structure that reduces the impact of hierarchical dependencies, and 2) the Snarl prototype to improve file availability and storage utilization in a real-world storage network. We evaluate Snarl using various failure scenarios on a large cluster running the Ethereum Swarm network. Our evaluation shows that Snarl increases storage utilization by 5X in Swarm with improved file availability. File recovery is bandwidth-efficient and uses less than 2X chunks on average in scenarios with up to 50% of total chunk loss.publishedVersio

    Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

    Full text link
    Data centres that use consumer-grade disks drives and distributed peer-to-peer systems are unreliable environments to archive data without enough redundancy. Most redundancy schemes are not completely effective for providing high availability, durability and integrity in the long-term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large scale storage system. Our motivation is to design flexible and practical erasure codes with high fault-tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault-tolerance even further without the need of additional storage. As a result, an entangled storage system can provide high availability, durability and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical codes. Remarkably, they excel at code locality, hence, they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios.Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN

    On the effectiveness of recoding-based repair in network coded distributed storage

    Get PDF

    Network Coding for Distributed Cloud, Fog and Data Center Storage

    Get PDF

    Adaptive Redundancy Management for Durable P2P Backup

    Get PDF
    We design and analyze the performance of a redundancy management mechanism for Peer-to-Peer backup applications. Armed with the realization that a backup system has peculiar requirements -- namely, data is read over the network only during restore processes caused by data loss -- redundancy management targets data durability rather than attempting to make each piece of information availabile at any time. In our approach each peer determines, in an on-line manner, an amount of redundancy sufficient to counter the effects of peer deaths, while preserving acceptable data restore times. Our experiments, based on trace-driven simulations, indicate that our mechanism can reduce the redundancy by a factor between two and three with respect to redundancy policies aiming for data availability. These results imply an according increase in storage capacity and decrease in time to complete backups, at the expense of longer times required to restore data. We believe this is a very reasonable price to pay, given the nature of the application. We complete our work with a discussion on practical issues, and their solutions, related to which encoding technique is more suitable to support our scheme
    • …
    corecore