1,215 research outputs found
GLB: Lifeline-based Global Load Balancing library in X10
We present GLB, a programming model and an associated implementation that can
handle a wide range of irregular paral- lel programming problems running over
large-scale distributed systems. GLB is applicable both to problems that are
easily load-balanced via static scheduling and to problems that are hard to
statically load balance. GLB hides the intricate syn- chronizations (e.g.,
inter-node communication, initialization and startup, load balancing,
termination and result collection) from the users. GLB internally uses a
version of the lifeline graph based work-stealing algorithm proposed by
Saraswat et al. Users of GLB are simply required to write several pieces of
sequential code that comply with the GLB interface. GLB then schedules and
orchestrates the parallel execution of the code correctly and efficiently at
scale. We have applied GLB to two representative benchmarks: Betweenness
Centrality (BC) and Unbalanced Tree Search (UTS). Among them, BC can be
statically load-balanced whereas UTS cannot. In either case, GLB scales well--
achieving nearly linear speedup on different computer architectures (Power,
Blue Gene/Q, and K) -- up to 16K cores
SKIRT: hybrid parallelization of radiative transfer simulations
We describe the design, implementation and performance of the new hybrid
parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which
has been used extensively for modeling the continuum radiation of dusty
astrophysical systems including late-type galaxies and dusty tori. The hybrid
scheme combines distributed memory parallelization, using the standard Message
Passing Interface (MPI) to communicate between processes, and shared memory
parallelization, providing multiple execution threads within each process to
avoid duplication of data structures. The synchronization between multiple
threads is accomplished through atomic operations without high-level locking
(also called lock-free programming). This improves the scaling behavior of the
code and substantially simplifies the implementation of the hybrid scheme. The
result is an extremely flexible solution that adjusts to the number of
available nodes, processors and memory, and consequently performs well on a
wide variety of computing architectures.Comment: 21 pages, 20 figure
Quasirandom Load Balancing
We propose a simple distributed algorithm for balancing indivisible tokens on
graphs. The algorithm is completely deterministic, though it tries to imitate
(and enhance) a random algorithm by keeping the accumulated rounding errors as
small as possible.
Our new algorithm surprisingly closely approximates the idealized process
(where the tokens are divisible) on important network topologies. On
d-dimensional torus graphs with n nodes it deviates from the idealized process
only by an additive constant. In contrast to that, the randomized rounding
approach of Friedrich and Sauerwald (2009) can deviate up to Omega(polylog(n))
and the deterministic algorithm of Rabani, Sinclair and Wanka (1998) has a
deviation of Omega(n^{1/d}). This makes our quasirandom algorithm the first
known algorithm for this setting which is optimal both in time and achieved
smoothness. We further show that also on the hypercube our algorithm has a
smaller deviation from the idealized process than the previous algorithms.Comment: 25 page
- …