Towards a Scalable Dynamic Spatial Database System
With the rise of GPS-enabled smartphones and other similar mobile devices,
massive amounts of location data are available. However, no scalable solutions
for soft real-time spatial queries on large sets of moving objects have yet
emerged. In this paper we explore and measure the limits of current algorithms
and implementations across different application scenarios. Finally, we
propose a novel distributed architecture to solve the scalability issues.
Comment: (2012
Algorithmic patterns for $\mathcal{H}$-matrices on many-core processors
In this work, we consider the reformulation of hierarchical ($\mathcal{H}$)
matrix algorithms for many-core processors with a model implementation on
graphics processing units (GPUs). $\mathcal{H}$ matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of $\mathcal{H}$ matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing $\mathcal{H}$ matrix CPU implementations by many-core
processors, we here aim at relying entirely on that processor type. As our main
contribution, we introduce the parallel algorithmic patterns needed
to map the full $\mathcal{H}$ matrix construction and the fast $\mathcal{H}$ matrix-vector
product to many-core hardware. Here, crucial ingredients are space-filling
curves, parallel tree traversal and batching of linear algebra operations. The
resulting model GPU implementation hmglib is, to the best of the authors'
knowledge, the first entirely GPU-based open source $\mathcal{H}$ matrix library of
this kind. We conclude this work with an in-depth performance analysis and a
comparative performance study against a standard $\mathcal{H}$ matrix library,
highlighting profound speedups of our many-core parallel approach.
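The abstract names space-filling curves as one ingredient but does not say which curve is used. As a minimal sketch, assuming a Z-order (Morton) curve in two dimensions, the following shows how points can be sorted so that spatially close points end up close in memory, which is what makes a subsequent parallel tree construction straightforward. The function names `part1by1`, `morton2d`, and `morton_order` are hypothetical and are not taken from hmglib.

```python
def part1by1(x: int) -> int:
    """Spread the low 16 bits of x so that a zero bit follows each original bit."""
    x &= 0x0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def morton2d(ix: int, iy: int) -> int:
    """Interleave the bits of two grid indices into one Z-order key."""
    return part1by1(ix) | (part1by1(iy) << 1)


def morton_order(points, bits=10):
    """Return the indices that sort points in [0, 1)^2 along the Z-order curve.

    Each coordinate is quantized to `bits` bits (assumption: bits <= 16).
    """
    scale = 1 << bits
    keys = [
        morton2d(min(int(x * scale), scale - 1), min(int(y * scale), scale - 1))
        for x, y in points
    ]
    return sorted(range(len(points)), key=keys.__getitem__)
```

For example, `morton_order([(0.9, 0.9), (0.1, 0.1), (0.9, 0.1), (0.1, 0.9)], bits=1)` visits the four quadrants in the Z pattern: lower left, lower right, upper left, upper right. On a GPU, the key computation is independent per point and the sort is a standard parallel primitive.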
GADGET: A code for collisionless and gasdynamical cosmological simulations
We describe the newly written code GADGET which is suitable both for
cosmological simulations of structure formation and for the simulation of
interacting galaxies. GADGET evolves self-gravitating collisionless fluids with
the traditional N-body approach, and a collisional gas by smoothed particle
hydrodynamics. Along with the serial version of the code, we discuss a parallel
version that has been designed to run on massively parallel supercomputers with
distributed memory. While both versions use a tree algorithm to compute
gravitational forces, the serial version of GADGET can optionally employ the
special-purpose hardware GRAPE instead of the tree. Periodic boundary
conditions are supported by means of an Ewald summation technique. The code
uses individual and adaptive timesteps for all particles, and it combines this
with a scheme for dynamic tree updates. Due to its Lagrangian nature, GADGET
thus allows a very large dynamic range to be bridged, both in space and time.
So far, GADGET has been successfully used to run simulations with up to 7.5e7
particles, including cosmological studies of large-scale structure formation,
high-resolution simulations of the formation of clusters of galaxies, as well
as workstation-sized problems of interacting galaxies. In this study, we detail
the numerical algorithms employed, and show various tests of the code. We
publicly release both the serial and the massively parallel version of the
code.
Comment: 32 pages, 14 figures, replaced to match published version in New
Astronomy. For download of the code, see
http://www.mpa-garching.mpg.de/gadget (new version 1.1 available
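To illustrate what a tree algorithm for gravitational forces does, here is a minimal sketch of a Barnes-Hut style monopole tree. It is not GADGET's implementation: it works in two dimensions with a quadtree instead of three dimensions, uses units with G = 1, Plummer softening `eps`, and a simple geometric opening criterion `theta`, all of which are assumptions made for this example. It also assumes positive masses and distinct body positions.

```python
class Cell:
    """Square quadtree cell storing total mass and mass-weighted position sums."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # centre and half-width
        self.mass = self.mx = self.my = 0.0
        self.body = None   # (x, y, m) while the cell is a leaf with one body
        self.kids = None   # list of four children once the cell is split

    def _child(self, x, y):
        i = (1 if x >= self.cx else 0) + (2 if y >= self.cy else 0)
        if self.kids[i] is None:
            h = self.half / 2
            self.kids[i] = Cell(self.cx + (h if i & 1 else -h),
                                self.cy + (h if i & 2 else -h), h)
        return self.kids[i]

    def insert(self, x, y, m):
        if self.kids is None and self.body is None:
            self.body = (x, y, m)              # empty leaf: store the body here
        else:
            if self.kids is None:              # occupied leaf: split it
                self.kids = [None] * 4
                old, self.body = self.body, None
                self._child(old[0], old[1]).insert(*old)
            self._child(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y


def accel(cell, x, y, theta=0.5, eps=1e-3):
    """Softened acceleration at (x, y) from all bodies in `cell` (G = 1)."""
    if cell is None or cell.mass == 0.0:
        return 0.0, 0.0
    if cell.kids is None:
        px, py = cell.body[0], cell.body[1]
    else:
        px, py = cell.mx / cell.mass, cell.my / cell.mass  # centre of mass
    dx, dy = px - x, py - y
    r2 = dx * dx + dy * dy
    # Use the monopole if the cell is a leaf or looks small from (x, y).
    if cell.kids is None or (2 * cell.half) ** 2 < theta * theta * r2:
        f = cell.mass / (r2 + eps * eps) ** 1.5
        return f * dx, f * dy   # a body acting on itself gives dx = dy = 0
    ax = ay = 0.0
    for kid in cell.kids:
        kx, ky = accel(kid, x, y, theta, eps)
        ax += kx
        ay += ky
    return ax, ay
```

With `theta=0` every cell is opened and the result reduces to direct summation, which gives a convenient correctness check. Larger `theta` trades accuracy for speed by replacing distant groups of bodies with a single point mass.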
Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures
A new solver featuring time-space adaptation and error control has been
recently introduced to tackle the numerical solution of stiff
reaction-diffusion systems. Based on operator splitting, finite volume adaptive
multiresolution and high order time integrators with specific stability
properties for each operator, this strategy yields high computational
efficiency for large multidimensional computations on standard architectures
such as powerful workstations. However, the data structure of the original
implementation, based on trees of pointers, provides limited opportunities for
efficiency enhancements, while posing serious challenges in terms of parallel
programming and load balancing. The present contribution proposes a new
implementation of the whole set of numerical methods, including Radau5 and
ROCK4, relying on an entirely different data structure together with the use of a
specific library, TBB, for shared-memory, task-based parallelism with
work-stealing. The performance of our implementation is assessed in a series of
test cases of increasing difficulty in two and three dimensions on multi-core
and many-core architectures, demonstrating high scalability.
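The operator-splitting idea behind this solver can be sketched on a small model problem. The example below is an illustration only, under several assumptions not found in the abstract: it uses second-order Strang splitting, a 1D periodic grid, a logistic reaction term u' = r u (1 - u) integrated exactly, and explicit Euler substeps for diffusion. These simple sub-integrators stand in for Radau5 and ROCK4, and there is no multiresolution adaptation or error control here.

```python
import math


def react(u, dt, r=1.0):
    """Exact flow of the logistic reaction u' = r*u*(1-u) over dt, cell by cell."""
    g = math.exp(r * dt)
    return [v * g / (1.0 + v * (g - 1.0)) for v in u]


def diffuse(u, dt, D, dx):
    """Explicit Euler substeps for u_t = D*u_xx with periodic boundaries."""
    n = len(u)
    # Pick enough substeps that D*h/dx^2 <= 0.25 (stability needs <= 0.5).
    steps = max(1, math.ceil(dt * D / (0.25 * dx * dx)))
    lam = D * (dt / steps) / (dx * dx)
    for _ in range(steps):
        # u[i - 1] wraps to the last cell when i == 0.
        u = [u[i] + lam * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
             for i in range(n)]
    return u


def strang_step(u, dt, D, dx, r=1.0):
    """One Strang step: half reaction, full diffusion, half reaction."""
    u = react(u, 0.5 * dt, r)
    u = diffuse(u, dt, D, dx)
    return react(u, 0.5 * dt, r)
```

Because each operator is advanced by its own integrator, the stiff reaction and the diffusion can use methods with different stability properties, and the cell-by-cell reaction update is independent per cell, which is the kind of work that maps naturally onto task-based parallelism.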