8 research outputs found

    Integrated prefetching and caching in single and parallel disk systems

    Get PDF

    On I/O Performance and Cost Efficiency of Cloud Storage: A Client\u27s Perspective

    Get PDF
    Cloud storage has gained increasing popularity in the past few years. In cloud storage, data are stored in the service provider’s data centers; users access data via the network and pay the fees based on the service usage. For such a new storage model, our prior wisdom and optimization schemes on conventional storage may not remain valid nor applicable to the emerging cloud storage. In this dissertation, we focus on understanding and optimizing the I/O performance and cost efficiency of cloud storage from a client’s perspective. We first conduct a comprehensive study to gain insight into the I/O performance behaviors of cloud storage from the client side. Through extensive experiments, we have obtained several critical findings and useful implications for system optimization. We then design a client cache framework, called Pacaca, to further improve end-to-end performance of cloud storage. Pacaca seamlessly integrates parallelized prefetching and cost-aware caching by utilizing the parallelism potential and object correlations of cloud storage. In addition to improving system performance, we have also made efforts to reduce the monetary cost of using cloud storage services by proposing a latency- and cost-aware client caching scheme, called GDS-LC, which can achieve two optimization goals for using cloud storage services: low access latency and low monetary cost. Our experimental results show that our proposed client-side solutions significantly outperform traditional methods. Our study contributes to inspiring the community to reconsider system optimization methods in the cloud environment, especially for the purpose of integrating cloud storage into the current storage stack as a primary storage layer

    Effective data parallel computing on multicore processors

    Get PDF
    The rise of chip multiprocessing or the integration of multiple general purpose processing cores on a single chip (multicores), has impacted all computing platforms including high performance, servers, desktops, mobile, and embedded processors. Programmers can no longer expect continued increases in software performance without developing parallel, memory hierarchy friendly software that can effectively exploit the chip level multiprocessing paradigm of multicores. The goal of this dissertation is to demonstrate a design process for data parallel problems that starts with a sequential algorithm and ends with a high performance implementation on a multicore platform. Our design process combines theoretical algorithm analysis with practical optimization techniques. Our target multicores are quad-core processors from Intel and the eight-SPE IBM Cell B.E. Target applications include Matrix Multiplications (MM), Finite Difference Time Domain (FDTD), LU Decomposition (LUD), and Power Flow Solver based on Gauss-Seidel (PFS-GS) algorithms. These applications are popular computation methods in science and engineering problems and are characterized by unit-stride (MM, LUD, and PFS-GS) or 2-point stencil (FDTD) memory access pattern. The main contributions of this dissertation include a cache- and space-efficient algorithm model, integrated data pre-fetching and caching strategies, and in-core optimization techniques. Our multicore efficient implementations of the above described applications outperform nai¨ve parallel implementations by at least 2x and scales well with problem size and with the number of processing cores

    Minimizing stall time in single and parallel disk systems

    No full text
    Abstract. We study integrated prefetching and caching in single and parallel disk systems. A recent approach used linear programming to solve the problem. We show that integrated prefetching and caching can also be formulated as a min-cost multicommodity flow problem and, exploiting special properties of our network, can be solved using combinatorial techniques. Moreover, for parallel disk systems, we develop improved approximation algorithms, trading performance guarantee for running time. If the number of disks is constant, we achieve a 2-approximation.
    corecore