When Do Redundant Requests Reduce Latency?
Several systems possess the flexibility to serve requests in more than one
way. For instance, a distributed storage system storing multiple replicas of
the data can serve a request from any of the multiple servers that store the
requested data, or a computational task may be performed in a compute-cluster
by any one of multiple processors. In such systems, the latency of serving the
requests may potentially be reduced by sending "redundant requests": a request
may be sent to more servers than needed, and it is deemed served when the
requisite number of servers complete service. Such a mechanism trades off the
possibility of faster execution of at least one copy of the request with the
increase in the delay due to an increased load on the system. Due to this
tradeoff, it is unclear when redundant requests may actually help. Several
recent works empirically evaluate the latency performance of redundant requests
in diverse settings.
This work aims at an analytical study of the latency performance of redundant
requests, with the primary goals of characterizing under what scenarios sending
redundant requests will help (and under what scenarios they will not help), as
well as designing optimal redundant-requesting policies. We first present a
model that captures the key features of such systems. We show that when service
times are i.i.d. memoryless or "heavier", and when the additional copies of
already-completed jobs can be removed instantly, redundant requests reduce the
average latency. On the other hand, when service times are "lighter" or when
service times are memoryless and removal of jobs is not instantaneous, then not
having any redundancy in the requests is optimal under high loads. Our results
hold for arbitrary arrival processes.
Comment: Extended version of paper presented at Allerton Conference 201
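As a rough illustration of the memoryless case (the load side of the tradeoff is ignored here, so this is only a sketch of the service-time intuition, not the paper's analysis), a Monte Carlo estimate shows that the fastest of d redundant copies with i.i.d. exponential service completes in expected time 1/(d·mu):

```python
import random

def mean_completion(d, mu=1.0, trials=100_000):
    """Average time until the fastest of d redundant copies finishes,
    assuming i.i.d. exponential (memoryless) service times of rate mu
    and instant cancellation of the remaining copies."""
    total = 0.0
    for _ in range(trials):
        total += min(random.expovariate(mu) for _ in range(d))
    return total / trials

# mean_completion(1) is close to 1/mu, mean_completion(4) close to 1/(4*mu):
# with memoryless service, each extra copy cuts the expected wait proportionally.
```

Under "lighter"-than-exponential service times this benefit shrinks, which is consistent with the paper's finding that no redundancy can be optimal under high loads.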
When Queueing Meets Coding: Optimal-Latency Data Retrieving Scheme in Storage Clouds
In this paper, we study the problem of reducing the delay of downloading data
from cloud storage systems by leveraging multiple parallel threads, assuming
that the data has been encoded and stored in the clouds using fixed rate
forward error correction (FEC) codes with parameters (n, k). That is, each file
is divided into k equal-sized chunks, which are then expanded into n chunks
such that any k chunks out of the n are sufficient to successfully restore the
original file. The model can be depicted as a multiple-server queue in which
arrivals are data-retrieval requests and each server corresponds to a thread.
However, this is not a typical queueing model because a server can terminate
its operation, depending on when other servers complete their service (due to
the redundancy that is spread across the threads). Hence, to the best of our
knowledge, the analysis of this queueing model remains quite uncharted.
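The "any k of n chunks suffice" property can be made concrete with the smallest nontrivial case, (n, k) = (3, 2), using a single XOR parity chunk. This is a toy stand-in for a general (n, k) FEC code such as Reed-Solomon, not the scheme the paper assumes:

```python
def encode(data: bytes):
    """(n, k) = (3, 2) toy erasure code: split data into 2 chunks and
    append an XOR parity chunk, so any 2 of the 3 chunks recover the file."""
    assert len(data) % 2 == 0
    half = len(data) // 2
    c0, c1 = data[:half], data[half:]
    parity = bytes(a ^ b for a, b in zip(c0, c1))
    return [c0, c1, parity]

def decode(chunks):
    """Recover the original data given any 2 available chunks (None = lost)."""
    c0, c1, p = chunks
    if c0 is None:
        c0 = bytes(a ^ b for a, b in zip(c1, p))  # c0 = c1 XOR parity
    elif c1 is None:
        c1 = bytes(a ^ b for a, b in zip(c0, p))  # c1 = c0 XOR parity
    return c0 + c1

blocks = encode(b"filedata")
# losing any single chunk still allows full recovery:
restored = decode([None, blocks[1], blocks[2]])  # restored == b"filedata"
```

Because any k completions suffice, a thread whose chunk becomes unnecessary can be cancelled early, which is exactly what makes the queueing model atypical.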
Recent traces from Amazon S3 show that the time to retrieve a fixed size
chunk is random and can be approximated as a constant delay plus an i.i.d.
exponentially distributed random variable. For the tractability of the
theoretical analysis, we assume that the chunk downloading time is i.i.d.
exponentially distributed. Under this assumption, we show that any
work-conserving scheme is delay-optimal among all on-line scheduling schemes
when k = 1. When k > 1, we find that a simple greedy scheme, which allocates
all available threads to the head-of-line request, is delay-optimal among all
on-line scheduling schemes. We also provide some numerical results that point
to the limitations of the exponential assumption, and suggest further research
directions.
Comment: Original version accepted by IEEE Infocom 2014, 9 pages. Some
statements in the Infocom paper are corrected.
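A rough way to see why giving all threads to the head-of-line request helps: if a request holds t threads, each downloading a distinct coded chunk with i.i.d. exponential time, it completes at the k-th order statistic of t samples, whose mean shrinks as t grows. This sketch simplifies away the arrival process and is only an illustration of that effect, not the paper's optimality proof:

```python
import random

def mean_retrieval_time(t, k, mu=1.0, trials=100_000):
    """Mean time for a request that holds t parallel threads on distinct
    coded chunks and finishes once any k downloads complete, assuming
    i.i.d. exponential chunk-download times of rate mu.
    This is the k-th order statistic of t exponential samples."""
    total = 0.0
    for _ in range(trials):
        times = sorted(random.expovariate(mu) for _ in range(t))
        total += times[k - 1]
    return total / trials

# For k = 2: with t = 2 threads the mean is 1/2 + 1 = 1.5,
# with t = 4 threads it drops to 1/4 + 1/3, so concentrating idle
# threads on the head-of-line request strictly reduces its delay.
```

The constant-plus-exponential shape observed in the Amazon S3 traces breaks the memorylessness this relies on, which is why the paper's numerical results flag the exponential assumption as a limitation.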