
    CRAID: Online RAID upgrades using dynamic hot data reorganization

    Current algorithms used to upgrade RAID arrays typically require large amounts of data to be migrated, even those that move only the minimum amount of data required to keep a balanced data load. This paper presents CRAID, a self-optimizing RAID array that performs an online block reorganization of frequently used, long-term accessed data in order to reduce this migration even further. To achieve this objective, CRAID tracks frequently used, long-term data blocks and copies them to a dedicated partition spread across all the disks in the array. When new disks are added, CRAID only needs to extend this process to the new devices to redistribute this partition, thus greatly reducing the overhead of the upgrade process. In addition, the reorganized access patterns within this partition improve the array’s performance, amortizing the copy overhead and allowing CRAID to offer performance competitive with traditional RAIDs. We describe CRAID’s motivation and design, and we evaluate it by replaying seven real-world workloads, including a file server, a web server and a user share. Our experiments show that CRAID can successfully detect hot data variations and begin using new disks as soon as they are added to the array. Also, the use of a dedicated partition improves the sequentiality of relevant data access, which amortizes the cost of reorganizations. Finally, we prove that a full-HDD CRAID array with a small distributed partition (<1.28% per disk) can compete in performance with an ideally restriped RAID-5 and a hybrid RAID-5 with a small SSD cache. Peer reviewed. Postprint (published version).
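
    The idea in this abstract (track hot blocks, copy them to a partition striped over all disks, and redistribute only that partition on an upgrade) can be sketched roughly as follows. This is a toy model and not CRAID's actual algorithm: the class `HotBlockCache`, its methods, the access-count heuristic and the round-robin placement are all assumptions made for illustration.

```python
from collections import Counter


class HotBlockCache:
    """Toy model of a hot-data partition striped round-robin across disks.

    Hypothetical sketch: names and policies are assumptions, not CRAID's design.
    """

    def __init__(self, num_disks, slots_per_disk):
        self.num_disks = num_disks
        self.slots_per_disk = slots_per_disk
        self.hits = Counter()   # access count per logical block
        self.placement = {}     # hot block -> (disk, slot) inside the partition

    def record_access(self, block):
        self.hits[block] += 1

    def reorganize(self):
        """Place the most frequently accessed blocks into the partition."""
        capacity = self.num_disks * self.slots_per_disk
        hot = [block for block, _ in self.hits.most_common(capacity)]
        # Round-robin striping: consecutive hot blocks land on consecutive disks.
        self.placement = {
            block: (i % self.num_disks, i // self.num_disks)
            for i, block in enumerate(hot)
        }
        return self.placement

    def add_disks(self, extra):
        """Upgrade: only the hot partition is redistributed over all disks."""
        self.num_disks += extra
        return self.reorganize()
```

    In this model an upgrade touches at most `num_disks * slots_per_disk` blocks, regardless of how much cold data the array holds.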

    Architectural Techniques to Enable Reliable and Scalable Memory Systems

    High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is because we rely on technology scaling to improve memory density, and at small feature sizes, memory cells tend to break easily. Today, memory reliability is seen as the key impediment towards using high-density devices, adopting new technologies, and even building the next Exascale supercomputer. To ensure even a bare-minimum level of reliability, present-day solutions tend to have high performance, power and area overheads. Ideally, we would like memory systems to remain robust, scalable, and implementable while keeping the overheads to a minimum. This dissertation describes how simple cross-layer architectural techniques can provide orders of magnitude higher reliability and enable seamless scalability for memory systems while incurring negligible overheads. Comment: PhD thesis, Georgia Institute of Technology (May 2017)

    Elevating commodity storage with the SALSA host translation layer

    To satisfy increasing storage demands in both capacity and performance, industry has turned to multiple storage technologies, including Flash SSDs and SMR disks. These devices employ a translation layer that conceals the idiosyncrasies of their mediums and enables random access. Device translation layers are, however, inherently constrained: resources on the drive are scarce, they cannot be adapted to application requirements, and they lack visibility across multiple devices. As a result, the performance and durability of many storage devices are severely degraded. In this paper, we present SALSA: a translation layer that executes on the host and allows unmodified applications to better utilize commodity storage. SALSA supports a wide range of single- and multi-device optimizations and, because it is implemented in software, can adapt to specific workloads. We describe SALSA's design, and demonstrate its significant benefits using microbenchmarks and case studies based on three applications: MySQL, the Swift object store, and a video server. Comment: Presented at 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)

    SDN Enabled Network Efficient Data Regeneration for Distributed Storage Systems

    Distributed Storage Systems (DSSs) have seen increasing levels of deployment in data centers and in cloud storage networks. DSSs provide efficient and cost-effective ways to store large amounts of data. To ensure reliability and resilience to failures, DSSs employ mirroring and coding schemes at the block and file level. While mirroring techniques provide an efficient way to recover lost data, they do not utilize disk space efficiently, resulting in large overheads in terms of data storage. Coding techniques, on the other hand, provide a better way to recover data as they reduce the amount of storage space required for data recovery purposes. However, the current recovery process for coded data is not efficient due to the need to transfer large amounts of data to regenerate the data lost as a result of a failure. This causes significant delays and excessive network traffic, creating a major performance bottleneck. In this thesis, we propose a new architecture for efficient data regeneration in distributed storage systems. A key idea of our architecture is to enable network switches to perform network coding operations, i.e., combine packets they receive over incoming links and forward the resulting packet towards the destination, and do this in a principled manner. Another key element of our framework is a transport-layer reverse multicast protocol that takes advantage of network coding to minimize the rebuild time required to transmit the data by allowing more efficient utilization of network bandwidth. The new architecture is supported using the principles of Software Defined Networking (SDN) and making extensions where required in a principled manner. To enable the switches to perform network coding operations, we propose an extension of the packet processing pipeline in the dataplane of a software switch. Our testbed experiments show that the proposed architecture results in modest performance gains.
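
    The in-network combining step described above can be illustrated with a minimal sketch. It assumes the simplest possible code, a single XOR parity block, so that a lost block equals the XOR of all surviving blocks; real regeneration schemes may use other codes. The function names `switch_combine` and `regenerate_via_tree` and the tree topology are hypothetical and are not taken from the thesis.

```python
from functools import reduce


def xor_blocks(a, b):
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def switch_combine(incoming):
    """Hypothetical in-switch step: XOR the packets arriving on incoming links."""
    return reduce(xor_blocks, incoming)


def regenerate_via_tree(survivors, fan_in=2):
    """Combine surviving blocks level by level through a tree of switches.

    Each switch forwards one combined packet, so the final hop to the
    replacement node carries a single block instead of len(survivors) blocks.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not survivors:
        raise ValueError("need at least one surviving block")
    level = list(survivors)
    while len(level) > 1:
        level = [
            switch_combine(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]
```

    Without in-network combining, the replacement node would receive every surviving block and compute the XOR itself; here the same result arrives as one packet.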

    RAID-2: Design and implementation of a large scale disk array controller

    We describe the implementation of a large scale disk array controller and subsystem incorporating over 100 high performance 3.5 inch disk drives. It is designed to provide 40 MB/s sustained performance and 40 GB capacity in three 19 inch racks. The array controller forms an integral part of a file server that attaches to a Gb/s local area network. The controller implements a high bandwidth interconnect between an interleaved memory, an XOR calculation engine, the network interface (HIPPI), and the disk interfaces (SCSI). The system is now functionally operational, and we are tuning its performance. We review the design decisions, history, and lessons learned from this three year university implementation effort to construct a truly large scale system assembly.

    SPA: On-Line Availability Upgrades for Parity-based RAIDs through Supplementary Parity Augmentations

    In this paper, we propose a simple but powerful on-line availability upgrade mechanism, Supplementary Parity Augmentations (SPA), to address the availability issue for parity-based RAID systems. The basic idea of SPA is to store and update the supplementary parity units on one or a few newly augmented spare disks for on-line RAID systems in the operational mode, thus achieving the goals of improving the reconstruction performance while tolerating multiple disk failures and latent sector errors simultaneously. By applying the exclusive OR operations appropriately among supplementary parity, full parity and data units, SPA can reconstruct the data on the failed disks with a fraction of the original overhead that is proportional to the supplementary parity coverage, thus significantly reducing the overhead of data regeneration and decreasing recovery time in parity-based RAID systems. In particular, SPA has two supplementary-parity coverage orientations, SPA Vertical and SPA Diagonal, which cater to users' different availability needs. The former, which calculates the supplementary parity of a fixed subset of the disks, can tolerate more disk failures and sector errors, whereas the latter shifts the coverage of supplementary parity by one disk for each stripe to balance the workload and thus maximize the performance of reconstruction during recovery. The SPA with a single supplementary-parity disk can be viewed as a variant of, but significantly different from, the RAID5+0 architecture in that the former can easily and dynamically upgrade a RAID5 system to a RAID5+0-like system without any change to the data layout of the RAID5 system. Our extensive trace-driven simulation study shows that both SPA orientations can significantly improve the reconstruction performance of the RAID5 system while SPA Diagonal significantly improves the reconstruction performance of RAID5+0, at an acceptable performance overhead imposed in the operational mode. Moreover, our reliability analytical modeling and Sequential Monte-Carlo simulation demonstrate that both SPA orientations consistently more than double the MTTDL of the RAID5 system and improve the reliability of the RAID5+0 system noticeably.
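
    A rough sketch of how XOR among supplementary parity, full parity and data units can shorten a rebuild is given below. It models one stripe with a supplementary parity covering a fixed subset of disks. The function `reconstruct`, its parameters and the read-counting are assumptions made for illustration, based only on the description in the abstract, and are not the paper's implementation.

```python
from functools import reduce


def xor_all(units):
    """Bytewise XOR of one or more equal-length units."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def reconstruct(failed, data, full_parity, supp_parity, covered):
    """Rebuild data[failed] for one stripe; return (unit, units_read).

    `covered` is the set of disk indices included in the supplementary parity.
    data[failed] itself is never read.
    """
    others = [i for i in range(len(data)) if i != failed]
    if failed in covered:
        # Inside the coverage: supplementary parity plus the other covered units.
        reads = [data[i] for i in others if i in covered] + [supp_parity]
    else:
        # Outside the coverage: full parity XOR supplementary parity equals the
        # parity of the uncovered units, so covered data units are not read.
        reads = [data[i] for i in others if i not in covered]
        reads += [full_parity, supp_parity]
    return xor_all(reads), len(reads)
```

    For a stripe of four data units with two of them covered, rebuilding a covered unit reads two units here, compared with four (three data units plus full parity) in a plain RAID5 rebuild.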

    Update-Efficiency and Local Repairability Limits for Capacity Approaching Codes

    Motivated by distributed storage applications, we investigate the degree to which capacity achieving encodings can be efficiently updated when a single information bit changes, and the degree to which such encodings can be efficiently (i.e., locally) repaired when a single encoded bit is lost. Specifically, we first develop conditions under which optimum error-correction and update-efficiency are possible, and establish that the number of encoded bits that must change in response to a change in a single information bit must scale logarithmically in the block-length of the code if we are to achieve any nontrivial rate with vanishing probability of error over the binary erasure or binary symmetric channels. Moreover, we show there exist capacity-achieving codes with this scaling. With respect to local repairability, we develop tight upper and lower bounds on the number of remaining encoded bits that are needed to recover a single lost bit of the encoding. In particular, we show that if the code-rate is $\epsilon$ less than the capacity, then for optimal codes, the maximum number of codeword symbols required to recover one lost symbol must scale as $\log 1/\epsilon$. Several variations on, and extensions of, these results are also developed. Comment: Accepted to appear in JSA
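
    To make the notion of update-efficiency concrete, the small sketch below counts how many encoded bits change when one information bit flips in a linear code. The systematic (7,4) Hamming generator matrix is chosen here purely as an example and does not come from the paper, and a fixed block length this small says nothing about the asymptotic scaling results stated above.

```python
# Generator matrix of a systematic (7,4) Hamming code, one row per information bit.
# Chosen only as a small example; it is not taken from the paper.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(msg):
    """Codeword c = msg * G over GF(2)."""
    return [sum(m * g for m, g in zip(msg, column)) % 2 for column in zip(*G)]


def bits_changed(msg, i):
    """Number of encoded bits that differ after flipping information bit i."""
    flipped = list(msg)
    flipped[i] ^= 1
    return sum(a != b for a, b in zip(encode(msg), encode(flipped)))


def update_efficiency():
    """Worst case over all information bits: the maximum row weight of G."""
    return max(sum(row) for row in G)
```

    Because the code is linear, flipping information bit `i` changes exactly the positions where row `i` of `G` is 1, independent of the message.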