Distributed Computing on Core-Periphery Networks: Axiom-based Design
Inspired by social networks and complex systems, we propose a core-periphery
network architecture that supports fast computation for many distributed
algorithms and is robust and efficient in the number of links. Rather than
providing a concrete network model, we take an axiom-based design approach. We
provide three intuitive (and independent) algorithmic axioms and prove that any
network that satisfies all axioms enjoys an efficient algorithm for a range of
tasks (e.g., MST and sparse matrix multiplication). We also show the
minimality of our axiom set: for networks that satisfy only a proper subset of
the axioms, the same efficiency cannot be guaranteed for any deterministic
algorithm.
Parametric shortest-path algorithms via tropical geometry
We study parameterized versions of classical algorithms for computing
shortest-path trees. This is most easily expressed in terms of tropical
geometry. Applications include shortest paths in traffic networks with variable
link travel times. Comment: 24 pages and 8 figures
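To illustrate why tropical (min-plus) language fits this problem, here is a minimal sketch, not the paper's algorithm. It assumes a hypothetical 3-node network in which one link's travel time depends linearly on a parameter `t`; the function names and the example weights are illustrative only. Shortest-path distances are entries of a min-plus matrix power, so the distance from node 0 to node 2 is the tropical polynomial min(4, 1 + t), a piecewise-linear function with a breakpoint at t = 3.

```python
import math

INF = math.inf


def min_plus(A, B):
    """Tropical (min-plus) matrix product: 'add' is min, 'multiply' is +."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shortest_paths(W):
    """All-pairs shortest distances as the (n-1)-th min-plus power of W.

    Assumes W has zeros on the diagonal and no negative cycles.
    """
    n = len(W)
    D = W
    for _ in range(max(n - 2, 0)):
        D = min_plus(D, W)
    return D


def weights_at(t):
    """Hypothetical network: only the direct link 0 -> 2 varies with t."""
    return [
        [0,   2,   1 + t],
        [INF, 0,   2],
        [INF, INF, 0],
    ]


for t in (0, 3, 5):
    print(t, shortest_paths(weights_at(t))[0][2])
```

For small `t` the direct link is used; once `t` exceeds 3 the two-hop route through node 1 takes over, which is the kind of change in the shortest-path tree that a parametric analysis tracks.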
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs
This paper reports large-scale direct numerical simulations of
homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08
petaflop/s on GPU hardware using single precision. The simulations use a vortex
particle method to solve the Navier-Stokes equations, with a highly parallel
fast multipole method (FMM) as numerical engine, and match the current record
in mesh size for this application, a cube of 4096^3 computational points solved
with a spectral method. The standard numerical approach used in this field is
the pseudo-spectral method, relying on the FFT algorithm as numerical engine.
The particle-based simulations presented in this paper quantitatively match the
kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted
code. In terms of parallel performance, weak scaling results show the FMM-based
vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per
MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral
method is able to achieve just 14% parallel efficiency on the same number of
MPI processes (using only CPU cores), due to the all-to-all communication
pattern of the FFT algorithm. The calculation time for one time step was 108
seconds for the vortex method and 154 seconds for the spectral method, under
these conditions. Computing with 69 billion particles, this work exceeds by an
order of magnitude the largest vortex method calculations to date.
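The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below uses only numbers stated above; the variable names are illustrative. It confirms that a 4096^3 cube corresponds to roughly 69 billion points, and computes the ratio of the two reported per-step times.

```python
# Points in a 4096^3 cube, matching the "69 billion particles" figure.
points_per_side = 4096
particles = points_per_side ** 3
print(particles)  # 68719476736, i.e. about 6.9e10

# Reported wall-clock time per time step, in seconds.
vortex_step_s = 108
spectral_step_s = 154

# How much longer one spectral-method step took than one vortex-method step.
ratio = spectral_step_s / vortex_step_s
print(round(ratio, 2))
```

Note that the two runs used different hardware (GPUs for the vortex method, CPU cores only for the spectral method), so the ratio compares the reported configurations rather than the algorithms in isolation.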
Parallel Graph Decompositions Using Random Shifts
We show an improved parallel algorithm for decomposing an undirected
unweighted graph into small diameter pieces with a small fraction of the edges
in between. These decompositions form critical subroutines in a number of graph
algorithms. Our algorithm builds upon the shifted shortest path approach
introduced in [Blelloch, Gupta, Koutis, Miller, Peng, Tangwongsan, SPAA 2011].
By combining various stages of the previous algorithm, we obtain a
significantly simpler algorithm with the same asymptotic guarantees as the best
sequential algorithm.
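The shifted shortest path idea can be sketched sequentially as follows. This is a minimal illustration under stated assumptions, not the authors' parallel implementation: each vertex `u` draws a random shift from an exponential distribution with rate `beta`, and every vertex `v` joins the cluster of the `u` that minimizes dist(u, v) minus the shift of `u`. The function name, the adjacency-dict input format, and the `seed` parameter are hypothetical choices made for this example.

```python
import heapq
import random


def random_shift_decomposition(adj, beta, seed=0):
    """Assign each vertex of an unweighted graph to a cluster center.

    adj  : dict mapping each vertex to a list of neighbours (undirected).
    beta : rate of the exponential shifts; larger beta gives smaller pieces.
    Returns a dict vertex -> center of the cluster it belongs to.
    """
    if not adj:
        return {}
    rng = random.Random(seed)
    shift = {u: rng.expovariate(beta) for u in adj}
    max_shift = max(shift.values())

    # Vertex u starts its search at time max_shift - shift[u]; a single
    # multi-source Dijkstra then finds, for every v, the center whose
    # search reaches v first, i.e. the one minimising dist(u, v) - shift[u].
    heap = [(max_shift - shift[u], u, u) for u in adj]
    heapq.heapify(heap)
    center = {}
    while heap:
        time, v, c = heapq.heappop(heap)
        if v in center:
            continue  # already claimed by an earlier-arriving search
        center[v] = c
        for w in adj[v]:
            if w not in center:
                heapq.heappush(heap, (time + 1, w, c))
    return center


# Usage on a 12-vertex cycle (illustrative input).
n = 12
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
clusters = random_shift_decomposition(cycle, beta=0.5)
cut_edges = sum(1 for u in cycle for w in cycle[u]
                if u < w and clusters[u] != clusters[w])
print(len(set(clusters.values())), "clusters,", cut_edges, "cut edges")
```

Smaller values of `beta` produce larger shifts, so fewer and wider clusters with fewer cut edges; larger values trade more cut edges for smaller diameter.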