160 research outputs found

    Achieving High Reliability and Efficiency in Maintaining Large-Scale Storage Systems through Optimal Resource Provisioning and Data Placement

    Get PDF
    With the explosive increase in the amount of data being generated by various applications, large-scale distributed and parallel storage systems have become common data storage solutions and been widely deployed and utilized in both industry and academia. While these high performance storage systems significantly accelerate the data storage and retrieval, they also bring some critical issues in system maintenance and management. In this dissertation, I propose three methodologies to address three of these critical issues. First, I develop an optimal resource management and spare provisioning model to minimize the impact brought by component failures and ensure a highly operational experience in maintaining large-scale storage systems. Second, in order to cost-effectively integrate solid-state drives (SSD) into large-scale storage systems, I design a holistic algorithm which can adaptively predict the popularity of data objects by leveraging temporal locality in their access pattern and adjust their placement among solid-state drives and regular hard disk drives so that the data access throughput as well as the storage space efficiency of the large-scale heterogeneous storage systems can be improved. Finally, I propose a new checkpoint placement optimization model which can maximize the computation efficiency of large-scale scientific applications while guarantee the endurance requirements of the SSD-based burst buffer in high performance hierarchical storage systems. All these models and algorithms are validated through extensive evaluation using data collected from deployed large-scale storage systems and the evaluation results demonstrate our models and algorithms can significantly improve the reliability and efficiency of large-scale distributed and parallel storage systems

    Bridging the Gap between Application and Solid-State-Drives

    Get PDF
    Data storage is one of the important and often critical parts of the computing system in terms of performance, cost, reliability, and energy. Numerous new memory technologies, such as NAND flash, phase change memory (PCM), magnetic RAM (STT-RAM) and Memristor, have emerged recently. Many of them have already entered the production system. Traditional storage optimization and caching algorithms are far from optimal because storage I/Os do not show simple locality. To provide optimal storage we need accurate predictions of I/O behavior. However, the workloads are increasingly dynamic and diverse, making the long and short time I/O prediction challenge. Because of the evolution of the storage technologies and the increasing diversity of workloads, the storage software is becoming more and more complex. For example, Flash Translation Layer (FTL) is added for NAND-flash based Solid State Disks (NAND-SSDs). However, it introduces overhead such as address translation delay and garbage collection costs. There are many recent studies aim to address the overhead. Unfortunately, there is no one-size-fits-all solution due to the variety of workloads. Despite rapidly evolving in storage technologies, the increasing heterogeneity and diversity in machines and workloads coupled with the continued data explosion exacerbate the gap between computing and storage speeds. In this dissertation, we improve the data storage performance from both top-down and bottom-up approach. First, we will investigate exposing the storage level parallelism so that applications can avoid I/O contentions and workloads skew when scheduling the jobs. Second, we will study how architecture aware task scheduling can improve the performance of the application when PCM based NVRAM are equipped. Third, we will develop an I/O correlation aware flash translation layer for NAND-flash based Solid State Disks. Fourth, we will build a DRAM-based correlation aware FTL emulator and study the performance in various filesystems

    Exploring Scheduling for On-demand File Systems and Data Management within HPC Environments

    Get PDF

    Exploring Scheduling for On-demand File Systems and Data Management within HPC Environments

    Get PDF

    Workload Interleaving with Performance Guarantees in Data Centers

    Get PDF
    In the era of global, large scale data centers residing in clouds, many applications and users share the same pool of resources for the purposes of reducing energy and operating costs, and of improving availability and reliability. Along with the above benefits, resource sharing also introduces performance challenges: when multiple workloads access the same resources concurrently, contention may occur and introduce delays in the performance of individual workloads. Providing performance isolation to individual workloads needs effective management methodologies. The challenges of deriving effective management methodologies lie in finding accurate, robust, compact metrics and models to drive algorithms that can meet different performance objectives while achieving efficient utilization of resources. This dissertation proposes a set of methodologies aiming at solving the challenging performance isolation problem in workload interleaving in data centers, focusing on both storage components and computing components. at the storage node level, we focus on methodologies for better interleaving user traffic with background workloads, such as tasks for improving reliability, availability, and power savings. More specifically, a scheduling policy for background workload based on the statistical characteristics of the system busy periods and a methodology that quantitatively estimates the performance impact of power savings are developed. at the storage cluster level, we consider methodologies on how to efficiently conduct work consolidation and schedule asynchronous updates without violating user performance targets. More specifically, we develop a framework that can estimate beforehand the benefits and overheads of each option in order to automate the process of reaching intelligent consolidation decisions while achieving faster eventual consistency. at the computing node level, we focus on improving workload interleaving at off-the-shelf servers as they are the basic building blocks of large-scale data centers. We develop priority scheduling middleware that employs different policies to schedule background tasks based on the instantaneous resource requirements of the high priority applications running on the server node. Finally, at the computing cluster level, we investigate popular computing frameworks for large-scale data intensive distributed processing, such as MapReduce and its Hadoop implementation. We develop a new Hadoop scheduler called DyScale to exploit capabilities offered by heterogeneous cores in order to achieve a variety of performance objectives

    Study On Endurance Of Flash Memory Ssds

    Get PDF
    Flash memory promises to revolutionize storage systems because of its massive performance gains, ruggedness, large decrease in power usage and physical space requirements, but it is not a direct replacement for magnetic hard disks. Flash memory possesses fundamentally different characteristics and in order to fully utilize the positive aspects of flash memory, we must engineer around its unique limitations. The primary limitations are lack of in-place updates, the asymmetry between the sizes of the write and erase operations, and the limited endurance of flash memory cells. This leads to the need for efficient methods for block cleaning, combating write amplification and performing wear leveling. These are fundamental attributes of flash memory and will always need to be understood and efficiently managed to produce an efficient and high performance storage system. Our goal in this work is to provide analysis and algorithms for efficiently managing data storage for endurance in flash memory. We present update codes, a class of floating codes, which encodes data updates as flash memory cell increments that results in reduced block erases and longer lifespan of flash memory, and provides a new algorithm for constructing optimal floating codes. We also analyze the theoretically possible limits of write amplification reduction and minimization by using offline workloads. We give an estimation of the minimal write amplification by a workload decomposition algorithm and find that write amplification can be pushed to zero with relatively low over-provisioning. Additionally, we give simple, efficient and practical algorithms that are effective in reducing write amplification and performing wear leveling. Finally, we present a quantitative model of wear levels in flash memory by constructing a difference equation that gives erase counts of a block with workload, wear leveling strategy and SSD configuration as parameters

    Simulation-Aided Performance Evaluation of Input/Output Optimizations for Distributed Systems

    Get PDF
    The performance of parallel cluster file systems suffers from many clients executing a large number of operations in parallel, because the I/O subsystem can be easily overwhelmed by the sheer amount of incoming I/O operations. This, in turn, can slow down the whole distributed system. Many optimizations exist that try to alleviate this problem. Client-side optimizations do preprocessing to minimize the amount of work the file servers have to do. Server-side optimizations use server-internal knowledge to improve performance. The PIOsimHD framework contains components to simulate, trace and visualize applications. It is used as a testbed to implement optimizations that could later be implemented in real-life projects. The main focus of this thesis lies on comparing existing client-side optimizations and newly implemented server-side optimizations like Server-Directed I/O, which provides server-side optimizations for both read and write operations. It chooses the order of I/O operations and tries to aggregate as many operations as possible to decrease the load on the I/O subsystem and improve overall performance. The Interleaved Two-Phase protocol is a modification of ROMIO's Two-Phase protocol, which only accesses contiguous file regions. HDSunshot is used to visualize and analyze some of the results. It is also used to evaluate different optimization techniques by analyzing the resulting traces. The results show that client-side optimizations do not necessarily beat server-side optimizations in terms of performance, but suggest that even simple server-side optimizations are good enough for many use cases. Integrating such optimizations into parallel cluster file systems could alleviate the need for sophisticated client-side optimizations. Due to their additional knowledge of internal workflows server-side optimizations may be better suited to provide high performance in general
    • …
    corecore