When Do Redundant Requests Reduce Latency?
Several systems possess the flexibility to serve requests in more than one
way. For instance, a distributed storage system storing multiple replicas of
the data can serve a request from any of the multiple servers that store the
requested data, or a computational task may be performed in a compute-cluster
by any one of multiple processors. In such systems, the latency of serving the
requests may potentially be reduced by sending "redundant requests": a request
may be sent to more servers than needed, and it is deemed served when the
requisite number of servers complete service. Such a mechanism trades off the
possibility of faster execution of at least one copy of the request with the
increase in the delay due to an increased load on the system. Due to this
tradeoff, it is unclear when redundant requests may actually help. Several
recent works empirically evaluate the latency performance of redundant requests
in diverse settings.
This work aims at an analytical study of the latency performance of redundant
requests, with the primary goals of characterizing under what scenarios sending
redundant requests will help (and under what scenarios they will not help), as
well as designing optimal redundant-requesting policies. We first present a
model that captures the key features of such systems. We show that when service
times are i.i.d. memoryless or "heavier", and when the additional copies of
already-completed jobs can be removed instantly, redundant requests reduce the
average latency. On the other hand, when service times are "lighter" or when
service times are memoryless and removal of jobs is not instantaneous, then not
having any redundancy in the requests is optimal under high loads. Our results
hold for arbitrary arrival processes.
Comment: Extended version of paper presented at Allerton Conference 201
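As a minimal illustration of the mechanism described in this abstract, the following sketch assumes i.i.d. exponential (memoryless) service times and instantaneous removal of extra copies, and ignores queueing load entirely; all parameters (`mu`, the redundancy level `r`) are hypothetical. With these assumptions, the completion time of a request sent to r servers is the minimum of r i.i.d. Exp(mu) draws, which is itself Exp(r*mu):

```python
import random

def service_time(mu, r, rng):
    # With r redundant copies and instantaneous cancellation of the
    # losers, the request completes at the minimum of r i.i.d.
    # Exp(mu) service times -- an Exp(r*mu) random variable.
    return min(rng.expovariate(mu) for _ in range(r))

rng = random.Random(0)
mu = 1.0
n = 100_000

# Average latency with no redundancy vs. 3 redundant copies.
avg1 = sum(service_time(mu, 1, rng) for _ in range(n)) / n
avg3 = sum(service_time(mu, 3, rng) for _ in range(n)) / n
```

Here `avg1` concentrates near 1/mu and `avg3` near 1/(3*mu), matching the abstract's claim that redundancy with free cancellation helps under memoryless service; the sketch deliberately omits the load increase that drives the opposite result under "lighter" service times or costly removal.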
Product-form solutions for integrated services packet networks and cloud computing systems
We iteratively derive the product-form solutions of stationary distributions
of priority multiclass queueing networks with multi-server stations. The
networks are Markovian with exponential interarrival and service time
distributions. These solutions can be used to conduct performance analysis or
as comparison criteria for approximation and simulation studies of large scale
networks with multi-processor shared-memory switches and cloud computing
systems with parallel-server stations. Numerical comparisons with an existing
Brownian approximation model are provided to indicate the effectiveness of our
algorithm.
Comment: 26 pages, 3 figures; a short conference version was reported at MICAI
200
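The abstract's product-form solutions apply to priority multiclass networks; as a much simpler hedged sketch of what "product form" means, consider the classical special case of a two-station tandem Jackson network (not the paper's model, and all rates below are hypothetical). The stationary distribution factors into per-station geometric terms:

```python
# Product-form stationary distribution for a two-station tandem
# Jackson network: Poisson arrivals at rate lam, exponential service
# rates mu1 and mu2 at the two single-server stations.
lam, mu1, mu2 = 0.5, 1.0, 1.5
rho1, rho2 = lam / mu1, lam / mu2  # per-station utilizations

def pi(n1, n2):
    # pi(n1, n2) = (1 - rho1) rho1^n1 * (1 - rho2) rho2^n2:
    # the joint distribution is a product of M/M/1 marginals.
    return (1 - rho1) * rho1**n1 * (1 - rho2) * rho2**n2

# Truncated total mass should be essentially 1, and the mean queue
# length at station 1 should match the M/M/1 formula rho/(1 - rho).
N = 200
total = sum(pi(i, j) for i in range(N) for j in range(N))
mean1 = sum(i * pi(i, j) for i in range(N) for j in range(N))
```

Such closed-form factorizations are what make product-form solutions attractive as comparison criteria for approximation and simulation studies of larger networks.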