955 research outputs found
Randomized Assignment of Jobs to Servers in Heterogeneous Clusters of Shared Servers for Low Delay
We consider the job assignment problem in a multi-server system consisting of
parallel processor sharing servers, categorized into ()
different types according to their processing capacity or speed. Jobs of random
sizes arrive at the system according to a Poisson process with rate . Upon each arrival, a small number of servers from each type is
sampled uniformly at random. The job is then assigned to one of the sampled
servers based on a selection rule. We propose two schemes, each corresponding
to a specific selection rule that aims at reducing the mean sojourn time of
jobs in the system.
We first show that both methods achieve the maximal stability region. We then
analyze the system operating under the proposed schemes as which
corresponds to the mean field. Our results show that asymptotic independence
among servers holds even when is finite and exchangeability holds only
within servers of the same type. We further establish the existence and
uniqueness of stationary solution of the mean field and show that the tail
distribution of server occupancy decays doubly exponentially for each server
type. When the estimates of arrival rates are not available, the proposed
schemes offer simpler alternatives to achieving lower mean sojourn time of
jobs, as shown by our numerical studies
Delay versus Stickiness Violation Trade-offs for Load Balancing in Large-Scale Data Centers
Most load balancing techniques implemented in current data centers tend to
rely on a mapping from packets to server IP addresses through a hash value
calculated from the flow five-tuple. The hash calculation allows extremely fast
packet forwarding and provides flow `stickiness', meaning that all packets
belonging to the same flow get dispatched to the same server. Unfortunately,
such static hashing may not yield an optimal degree of load balancing, e.g.,
due to variations in server processing speeds or traffic patterns. On the other
hand, dynamic schemes, such as the Join-the-Shortest-Queue (JSQ) scheme,
provide a natural way to mitigate load imbalances, but at the expense of
stickiness violation.
In the present paper we examine the fundamental trade-off between stickiness
violation and packet-level latency performance in large-scale data centers. We
establish that stringent flow stickiness carries a significant performance
penalty in terms of packet-level delay. Moreover, relaxing the stickiness
requirement by a minuscule amount is highly effective in clipping the tail of
the latency distribution. We further propose a bin-based load balancing scheme
that achieves a good balance among scalability, stickiness violation and
packet-level delay performance. Extensive simulation experiments corroborate
the analytical results and validate the effectiveness of the bin-based load
balancing scheme
Load Balancing in Large-Scale Systems with Multiple Dispatchers
Load balancing algorithms play a crucial role in delivering robust
application performance in data centers and cloud networks. Recently, strong
interest has emerged in Join-the-Idle-Queue (JIQ) algorithms, which rely on
tokens issued by idle servers in dispatching tasks and outperform power-of-
policies. Specifically, JIQ strategies involve minimal information exchange,
and yet achieve zero blocking and wait in the many-server limit. The latter
property prevails in a multiple-dispatcher scenario when the loads are strictly
equal among dispatchers. For various reasons it is not uncommon however for
skewed load patterns to occur. We leverage product-form representations and
fluid limits to establish that the blocking and wait then no longer vanish,
even for arbitrarily low overall load. Remarkably, it is the least-loaded
dispatcher that throttles tokens and leaves idle servers stranded, thus acting
as bottleneck.
Motivated by the above issues, we introduce two enhancements of the ordinary
JIQ scheme where tokens are either distributed non-uniformly or occasionally
exchanged among the various dispatchers. We prove that these extensions can
achieve zero blocking and wait in the many-server limit, for any subcritical
overall load and arbitrarily skewed load profiles. Extensive simulation
experiments demonstrate that the asymptotic results are highly accurate, even
for moderately sized systems
Adaptive Dispatching of Tasks in the Cloud
The increasingly wide application of Cloud Computing enables the
consolidation of tens of thousands of applications in shared infrastructures.
Thus, meeting the quality of service requirements of so many diverse
applications in such shared resource environments has become a real challenge,
especially since the characteristics and workload of applications differ widely
and may change over time. This paper presents an experimental system that can
exploit a variety of online quality of service aware adaptive task allocation
schemes, and three such schemes are designed and compared. These are a
measurement driven algorithm that uses reinforcement learning, secondly a
"sensible" allocation algorithm that assigns jobs to sub-systems that are
observed to provide a lower response time, and then an algorithm that splits
the job arrival stream into sub-streams at rates computed from the hosts'
processing capabilities. All of these schemes are compared via measurements
among themselves and with a simple round-robin scheduler, on two experimental
test-beds with homogeneous and heterogeneous hosts having different processing
capacities.Comment: 10 pages, 9 figure
Hyper-Scalable JSQ with Sparse Feedback
Load balancing algorithms play a vital role in enhancing performance in data
centers and cloud networks. Due to the massive size of these systems,
scalability challenges, and especially the communication overhead associated
with load balancing mechanisms, have emerged as major concerns. Motivated by
these issues, we introduce and analyze a novel class of load balancing schemes
where the various servers provide occasional queue updates to guide the load
assignment.
We show that the proposed schemes strongly outperform JSQ() strategies
with comparable communication overhead per job, and can achieve a vanishing
waiting time in the many-server limit with just one message per job, just like
the popular JIQ scheme. The proposed schemes are particularly geared however
towards the sparse feedback regime with less than one message per job, where
they outperform corresponding sparsified JIQ versions.
We investigate fluid limits for synchronous updates as well as asynchronous
exponential update intervals. The fixed point of the fluid limit is identified
in the latter case, and used to derive the queue length distribution. We also
demonstrate that in the ultra-low feedback regime the mean stationary waiting
time tends to a constant in the synchronous case, but grows without bound in
the asynchronous case
TailX: Scheduling Heterogeneous Multiget Queries to Improve Tail Latencies in Key-Value Stores
International audienceUsers of interactive services such as e-commerce platforms have high expectations for the performance and responsiveness of these services. Tail latency, denoting the worst service times, contributes greatly to user dissatisfaction and should be minimized. Maintaining low tail latency for interactive services is challenging because a request is not complete until all its operations are completed. The challenge is to identify bottleneck operations and schedule them on uncoordinated backend servers with minimal overhead, when the duration of these operations are heterogeneous and unpredictable. In this paper, we focus on improving the latency of multiget operations in cloud data stores. We present TailX, a task-aware multiget scheduling algorithm that improves tail latencies under heterogeneous workloads. TailX schedules operations according to an estimation of the size of the corresponding data, and allows itself to procrastinate some operations to give way to higher priority ones. We implement TailX in Cassandra, a widely used key-value store. The result is an improved overall performance of the cloud data stores for a wide variety of heterogeneous workloads. Specifically, our experiments under heterogeneous YCSB workloads show that TailX outperforms state-of-the-art solutions and reduces tail latencies by up to 70% and median latencies by up to 75%
Optimal Hyper-Scalable Load Balancing with a Strict Queue Limit
Load balancing plays a critical role in efficiently dispatching jobs in
parallel-server systems such as cloud networks and data centers. A fundamental
challenge in the design of load balancing algorithms is to achieve an optimal
trade-off between delay performance and implementation overhead (e.g.
communication or memory usage). This trade-off has primarily been studied so
far from the angle of the amount of overhead required to achieve asymptotically
optimal performance, particularly vanishing delay in large-scale systems. In
contrast, in the present paper, we focus on an arbitrarily sparse communication
budget, possibly well below the minimum requirement for vanishing delay,
referred to as the hyper-scalable operating region. Furthermore, jobs may only
be admitted when a specific limit on the queue position of the job can be
guaranteed.
The centerpiece of our analysis is a universal upper bound for the achievable
throughput of any dispatcher-driven algorithm for a given communication budget
and queue limit. We also propose a specific hyper-scalable scheme which can
operate at any given message rate and enforce any given queue limit, while
allowing the server states to be captured via a closed product-form network, in
which servers act as customers traversing various nodes. The product-form
distribution is leveraged to prove that the bound is tight and that the
proposed hyper-scalable scheme is throughput-optimal in a many-server regime
given the communication and queue limit constraints. Extensive simulation
experiments are conducted to illustrate the results
- …