15,825 research outputs found
An Algorithm for Dynamic Load Balancing of Synchronous Monte Carlo Simulations on Multiprocessor Systems
We describe an algorithm for dynamic load balancing of geometrically
parallelized synchronous Monte Carlo simulations of physical models. This
algorithm is designed for a (heterogeneous) multiprocessor system of the MIMD
type with distributed memory. The algorithm is based on a dynamic partitioning
of the domain of the algorithm, taking into account the actual processor
resources of the various processors of the multiprocessor system.Comment: 12 pages, uuencoded figures included, 75.93.0
Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock has on average at least an order of magnitude
speedup over Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU high-level
graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the
previous version v5
Hyper-Scalable JSQ with Sparse Feedback
Load balancing algorithms play a vital role in enhancing performance in data
centers and cloud networks. Due to the massive size of these systems,
scalability challenges, and especially the communication overhead associated
with load balancing mechanisms, have emerged as major concerns. Motivated by
these issues, we introduce and analyze a novel class of load balancing schemes
where the various servers provide occasional queue updates to guide the load
assignment.
We show that the proposed schemes strongly outperform JSQ() strategies
with comparable communication overhead per job, and can achieve a vanishing
waiting time in the many-server limit with just one message per job, just like
the popular JIQ scheme. The proposed schemes are particularly geared however
towards the sparse feedback regime with less than one message per job, where
they outperform corresponding sparsified JIQ versions.
We investigate fluid limits for synchronous updates as well as asynchronous
exponential update intervals. The fixed point of the fluid limit is identified
in the latter case, and used to derive the queue length distribution. We also
demonstrate that in the ultra-low feedback regime the mean stationary waiting
time tends to a constant in the synchronous case, but grows without bound in
the asynchronous case
- …