3,404 research outputs found

    SEARS: Space Efficient And Reliable Storage System in the Cloud

    Full text link
    Today's cloud storage services must offer storage reliability and fast data retrieval for large amount of data without sacrificing storage cost. We present SEARS, a cloud-based storage system which integrates erasure coding and data deduplication to support efficient and reliable data storage with fast user response time. With proper association of data to storage server clusters, SEARS provides flexible mixing of different configurations, suitable for real-time and archival applications. Our prototype implementation of SEARS over Amazon EC2 shows that it outperforms existing storage systems in storage efficiency and file retrieval time. For 3 MB files, SEARS delivers retrieval time of 2.52.5 s compared to 77 s with existing systems.Comment: 4 pages, IEEE LCN 201

    Extending DIRAC File Management with Erasure-Coding for efficient storage

    Get PDF
    The state of the art in Grid style data management is to achieve increased resilience of data via multiple complete replicas of data files across multiple storage endpoints. While this is effective, it is not the most space-efficient approach to resilience, especially when the reliability of individual storage endpoints is sufficiently high that only a few will be inactive at any point in time. We report on work performed as part of GridPP\cite{GridPP}, extending the Dirac File Catalogue and file management interface to allow the placement of erasure-coded files: each file distributed as N identically-sized chunks of data striped across a vector of storage endpoints, encoded such that any M chunks can be lost and the original file can be reconstructed. The tools developed are transparent to the user, and, as well as allowing up and downloading of data to Grid storage, also provide the possibility of parallelising access across all of the distributed chunks at once, improving data transfer and IO performance. We expect this approach to be of most interest to smaller VOs, who have tighter bounds on the storage available to them, but larger (WLCG) VOs may be interested as their total data increases during Run 2. We provide an analysis of the costs and benefits of the approach, along with future development and implementation plans in this area. In general, overheads for multiple file transfers provide the largest issue for competitiveness of this approach at present.Comment: 21st International Conference on Computing for High Energy and Nuclear Physics (CHEP2015

    When Queueing Meets Coding: Optimal-Latency Data Retrieving Scheme in Storage Clouds

    Full text link
    In this paper, we study the problem of reducing the delay of downloading data from cloud storage systems by leveraging multiple parallel threads, assuming that the data has been encoded and stored in the clouds using fixed rate forward error correction (FEC) codes with parameters (n, k). That is, each file is divided into k equal-sized chunks, which are then expanded into n chunks such that any k chunks out of the n are sufficient to successfully restore the original file. The model can be depicted as a multiple-server queue with arrivals of data retrieving requests and a server corresponding to a thread. However, this is not a typical queueing model because a server can terminate its operation, depending on when other servers complete their service (due to the redundancy that is spread across the threads). Hence, to the best of our knowledge, the analysis of this queueing model remains quite uncharted. Recent traces from Amazon S3 show that the time to retrieve a fixed size chunk is random and can be approximated as a constant delay plus an i.i.d. exponentially distributed random variable. For the tractability of the theoretical analysis, we assume that the chunk downloading time is i.i.d. exponentially distributed. Under this assumption, we show that any work-conserving scheme is delay-optimal among all on-line scheduling schemes when k = 1. When k > 1, we find that a simple greedy scheme, which allocates all available threads to the head of line request, is delay optimal among all on-line scheduling schemes. We also provide some numerical results that point to the limitations of the exponential assumption, and suggest further research directions.Comment: Original accepted by IEEE Infocom 2014, 9 pages. Some statements in the Infocom paper are correcte
    • …
    corecore