3,404 research outputs found
SEARS: Space Efficient And Reliable Storage System in the Cloud
Today's cloud storage services must offer storage reliability and fast data
retrieval for large amount of data without sacrificing storage cost. We present
SEARS, a cloud-based storage system which integrates erasure coding and data
deduplication to support efficient and reliable data storage with fast user
response time. With proper association of data to storage server clusters,
SEARS provides flexible mixing of different configurations, suitable for
real-time and archival applications.
Our prototype implementation of SEARS over Amazon EC2 shows that it
outperforms existing storage systems in storage efficiency and file retrieval
time. For 3 MB files, SEARS delivers retrieval time of s compared to
s with existing systems.Comment: 4 pages, IEEE LCN 201
Extending DIRAC File Management with Erasure-Coding for efficient storage
The state of the art in Grid style data management is to achieve increased
resilience of data via multiple complete replicas of data files across multiple
storage endpoints. While this is effective, it is not the most space-efficient
approach to resilience, especially when the reliability of individual storage
endpoints is sufficiently high that only a few will be inactive at any point in
time. We report on work performed as part of GridPP\cite{GridPP}, extending the
Dirac File Catalogue and file management interface to allow the placement of
erasure-coded files: each file distributed as N identically-sized chunks of
data striped across a vector of storage endpoints, encoded such that any M
chunks can be lost and the original file can be reconstructed. The tools
developed are transparent to the user, and, as well as allowing up and
downloading of data to Grid storage, also provide the possibility of
parallelising access across all of the distributed chunks at once, improving
data transfer and IO performance. We expect this approach to be of most
interest to smaller VOs, who have tighter bounds on the storage available to
them, but larger (WLCG) VOs may be interested as their total data increases
during Run 2. We provide an analysis of the costs and benefits of the approach,
along with future development and implementation plans in this area. In
general, overheads for multiple file transfers provide the largest issue for
competitiveness of this approach at present.Comment: 21st International Conference on Computing for High Energy and
Nuclear Physics (CHEP2015
When Queueing Meets Coding: Optimal-Latency Data Retrieving Scheme in Storage Clouds
In this paper, we study the problem of reducing the delay of downloading data
from cloud storage systems by leveraging multiple parallel threads, assuming
that the data has been encoded and stored in the clouds using fixed rate
forward error correction (FEC) codes with parameters (n, k). That is, each file
is divided into k equal-sized chunks, which are then expanded into n chunks
such that any k chunks out of the n are sufficient to successfully restore the
original file. The model can be depicted as a multiple-server queue with
arrivals of data retrieving requests and a server corresponding to a thread.
However, this is not a typical queueing model because a server can terminate
its operation, depending on when other servers complete their service (due to
the redundancy that is spread across the threads). Hence, to the best of our
knowledge, the analysis of this queueing model remains quite uncharted.
Recent traces from Amazon S3 show that the time to retrieve a fixed size
chunk is random and can be approximated as a constant delay plus an i.i.d.
exponentially distributed random variable. For the tractability of the
theoretical analysis, we assume that the chunk downloading time is i.i.d.
exponentially distributed. Under this assumption, we show that any
work-conserving scheme is delay-optimal among all on-line scheduling schemes
when k = 1. When k > 1, we find that a simple greedy scheme, which allocates
all available threads to the head of line request, is delay optimal among all
on-line scheduling schemes. We also provide some numerical results that point
to the limitations of the exponential assumption, and suggest further research
directions.Comment: Original accepted by IEEE Infocom 2014, 9 pages. Some statements in
the Infocom paper are correcte
- …