A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters
In this work, we consider the solution of boundary integral equations by
means of a scalable hierarchical matrix approach on clusters equipped with
graphics hardware, i.e. graphics processing units (GPUs). To this end, we
extend our existing single-GPU hierarchical matrix library hmglib such that it
is able to scale on many GPUs and such that it can be coupled to arbitrary
application codes. Using a model GPU implementation of a boundary element
method (BEM) solver, we are able to achieve more than 67 percent relative
parallel speed-up going from 128 to 1024 GPUs for a model geometry test case
with 1.5 million unknowns and a real-world geometry test case with almost 1.2
million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6
minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the
setup phase and 20 seconds for the iterative solver. To the best of the
authors' knowledge, this is the first fully GPU-based, distributed-memory parallel, open-source hierarchical matrix library using the traditional H-matrix format and adaptive cross approximation, applied here to BEM problems.
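Below is a minimal C++ sketch of adaptive cross approximation (ACA) with partial pivoting, the low-rank compression named in the abstract for admissible H-matrix blocks. The kernel function, point sets, and stopping tolerance are illustrative assumptions and do not reflect the hmglib interface.

```cpp
// Minimal ACA sketch with partial pivoting: approximate a kernel block A by a
// sum of rank-1 crosses built from residual rows and columns. All parameters
// below (kernel, point clouds, tolerance) are illustrative assumptions.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical smooth kernel between two well-separated 1D point clouds.
double kernel(double x, double y) { return 1.0 / (std::fabs(x - y) + 1.0); }

int main() {
    const int m = 200, n = 200;       // block dimensions
    const double tol = 1e-6;          // relative stopping tolerance
    std::vector<double> xs(m), ys(n);
    for (int i = 0; i < m; ++i) xs[i] = i;            // source points
    for (int j = 0; j < n; ++j) ys[j] = 1000.0 + j;   // well-separated targets

    std::vector<std::vector<double>> U, V;  // A ~ sum_k U[k] * V[k]^T
    std::vector<char> usedRow(m, 0);
    double normEst = 0.0;
    int i = 0;  // first pivot row

    for (int k = 0; k < std::min(m, n); ++k) {
        // Residual row i: a(i,:) minus the contribution of previous crosses.
        std::vector<double> row(n);
        for (int j = 0; j < n; ++j) {
            row[j] = kernel(xs[i], ys[j]);
            for (size_t t = 0; t < U.size(); ++t) row[j] -= U[t][i] * V[t][j];
        }
        // Column pivot: largest entry of the residual row.
        int jPiv = 0;
        for (int j = 1; j < n; ++j)
            if (std::fabs(row[j]) > std::fabs(row[jPiv])) jPiv = j;
        double pivot = row[jPiv];
        if (std::fabs(pivot) < 1e-14) break;

        // Residual column jPiv, scaled by the pivot.
        std::vector<double> col(m);
        for (int r = 0; r < m; ++r) {
            col[r] = kernel(xs[r], ys[jPiv]);
            for (size_t t = 0; t < U.size(); ++t) col[r] -= U[t][r] * V[t][jPiv];
            col[r] /= pivot;
        }
        U.push_back(col);
        V.push_back(row);
        usedRow[i] = 1;

        // Crude stopping test: size of the new cross vs. accumulated norm estimate.
        double nu = 0.0, nv = 0.0;
        for (double v : col) nu += v * v;
        for (double v : row) nv += v * v;
        normEst += nu * nv;
        if (std::sqrt(nu * nv) < tol * std::sqrt(normEst)) break;

        // Next pivot row: largest entry of the new column among unused rows.
        i = 0;
        for (int r = 0; r < m; ++r)
            if (!usedRow[r] && (usedRow[i] || std::fabs(col[r]) > std::fabs(col[i]))) i = r;
    }
    std::printf("ACA rank: %zu\n", U.size());
}
```

Each iteration adds one rank-1 cross built from a residual row and column, so the cost stays proportional to the achieved rank times the block dimensions rather than to the full dense block.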
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The ever-growing volume of stored data has spurred researchers to seek methods for exploiting it effectively, and most of these methods face a response-time problem caused by the sheer size of the data. Most proposed solutions favour materialization; however, materialization alone cannot deliver real-time answers. In this paper we propose a framework that lays out the barriers, and suggested solutions, on the way to achieving real-time OLAP answers, which are widely used in decision support systems and data warehouses.
A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems
Among the algorithms that are likely to play a major role in future exascale
computing, the fast multipole method (FMM) appears as a rising star. Our
previous recent work showed scaling of an FMM on GPU clusters, with problem
sizes in the order of billions of unknowns. That work led to an extremely
parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This
paper reports on a campaign of performance tuning and scalability studies
using multi-core CPUs, on the Kraken supercomputer. All kernels in the FMM were
parallelized using OpenMP, and a test using 10^7 particles randomly distributed
in a cube showed 78% efficiency on 8 threads. Tuning of the
particle-to-particle kernel using SIMD instructions resulted in 4x speed-up of
the overall algorithm on single-core tests with 10^3 - 10^7 particles. Parallel
scalability was studied in both strong and weak scaling. The strong scaling
test used 10^8 particles and resulted in 93% parallel efficiency on 2048
processes for the non-SIMD code and 54% for the SIMD-optimized code (which was
still 2x faster). The weak scaling test used 10^6 particles per process, and
resulted in 72% efficiency on 32,768 processes, with the largest calculation
taking about 40 seconds to evaluate more than 32 billion unknowns. This work
builds up evidence for our view that FMM is poised to play a leading role in
exascale computing, and we end the paper with a discussion of the features that
make it a particularly favorable algorithm for the emerging heterogeneous and
massively parallel architectural landscape.
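As a point of reference, here is a minimal C++ sketch of the direct particle-to-particle (P2P) evaluation that dominates the near field of an FMM, parallelized over target particles with OpenMP. The particle count, softening, and data layout are assumptions for illustration; the paper's hand-tuned SIMD intrinsics are not reproduced.

```cpp
// Direct P2P sketch: potential phi_i = sum_j q_j / |r_i - r_j|, with the outer
// loop over targets parallelized via OpenMP. Parameters are illustrative.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int n = 10000;            // particles in one near-field interaction
    const double eps2 = 1e-12;      // softening to avoid division by zero at i == j
    std::vector<double> x(n), y(n), z(n), q(n), phi(n, 0.0);

    std::mt19937 gen(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (int i = 0; i < n; ++i) {
        x[i] = dist(gen); y[i] = dist(gen); z[i] = dist(gen); q[i] = 1.0 / n;
    }

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        double acc = 0.0;
        for (int j = 0; j < n; ++j) {   // flat inner loop: target of SIMD tuning
            const double dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
            const double r2 = dx * dx + dy * dy + dz * dz + eps2;
            acc += q[j] / std::sqrt(r2);
        }
        phi[i] = acc;
    }
    std::printf("phi[0] = %.6f\n", phi[0]);
}
```

Compiling with -fopenmp enables the parallel loop; the flat inner loop over sources is the part that benefits most from SIMD vectorization.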
Optimal, scalable forward models for computing gravity anomalies
We describe three approaches for computing a gravity signal from a density
anomaly. The first approach consists of the classical "summation" technique,
whilst the remaining two methods solve the Poisson problem for the
gravitational potential using either a Finite Element (FE) discretization
employing a multilevel preconditioner, or a Green's function evaluated with the
Fast Multipole Method (FMM). The methods utilizing the PDE formulation
described here differ from previously published approaches used in gravity
modeling in that they are optimal, implying that both the memory and
computational time required scale linearly with respect to the number of
unknowns in the potential field. Additionally, all of the implementations
presented here are developed such that the computations can be performed in a
massively parallel, distributed memory computing environment. Through numerical
experiments, we compare the methods on the basis of their discretization error,
CPU time and parallel scalability. We demonstrate the parallel scalability of
all these techniques by running forward models with up to voxels on thousands of cores.
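For orientation, a minimal C++ sketch of the classical summation technique follows: the vertical gravity anomaly at an observation point is accumulated directly over density voxels as g_z = G * sum_i Δρ_i V_i (z_i - z_obs) / |r_i - r_obs|^3. The grid dimensions, density model, and observation point are illustrative assumptions.

```cpp
// Classical "summation" forward model: direct sum of voxel contributions to the
// vertical gravity anomaly at one observation point. Values are illustrative.
#include <cmath>
#include <cstdio>

int main() {
    const double G = 6.674e-11;            // gravitational constant [m^3 kg^-1 s^-2]
    const int nx = 32, ny = 32, nz = 32;   // voxel grid
    const double h = 100.0;                 // voxel edge length [m]
    const double V = h * h * h;             // voxel volume [m^3]
    const double ox = 1600.0, oy = 1600.0, oz = -10.0;  // observation point above the grid

    double gz = 0.0;
    for (int i = 0; i < nx; ++i)
        for (int j = 0; j < ny; ++j)
            for (int k = 0; k < nz; ++k) {
                // Voxel center and its density anomaly (a buried dense block).
                const double cx = (i + 0.5) * h, cy = (j + 0.5) * h, cz = (k + 0.5) * h;
                const double drho = (i > 12 && i < 20 && j > 12 && j < 20 && k > 8 && k < 16)
                                        ? 300.0 : 0.0;   // [kg/m^3]
                const double dx = cx - ox, dy = cy - oy, dz = cz - oz;
                const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
                gz += G * drho * V * dz / (r * r * r);
            }
    std::printf("g_z = %.3e m/s^2 (%.3f mGal)\n", gz, gz * 1e5);
}
```

This direct sum costs O(N_obs x N_voxels) work, which is precisely the scaling the FE and FMM formulations described above are designed to avoid.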
Dual Computations of Non-abelian Yang-Mills on the Lattice
In the past several decades there have been a number of proposals for
computing with dual forms of non-abelian Yang-Mills theories on the lattice.
Motivated by the gauge-invariant, geometric picture offered by dual models and
successful applications of duality in the U(1) case, we revisit the question of
whether it is practical to perform numerical computation using non-abelian dual
models. Specifically, we consider three-dimensional SU(2) pure Yang-Mills as an
accessible yet non-trivial case in which the gauge group is non-abelian. Using
methods developed recently in the context of spin foam quantum gravity, we
derive an algorithm for efficiently computing the dual amplitude and describe
Metropolis moves for sampling the dual ensemble. We relate our algorithms to
prior work in non-abelian dual computations of Hari Dass and his collaborators,
addressing several problems that have been left open. We report results of spin
expectation value computations over a range of lattice sizes and couplings that
are in agreement with our conventional lattice computations. We conclude with
an outlook on further development of dual methods and their application to
problems of current interest.
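To make the sampling step concrete, here is a minimal C++ sketch of a Metropolis accept/reject update of the kind used to sample such ensembles. The single-variable toy action stands in for the SU(2) dual amplitude and is purely illustrative.

```cpp
// Metropolis sketch: propose a local change, compute the change in the action,
// and accept with probability min(1, exp(-dS)). The toy action is illustrative
// and not the dual amplitude of the paper.
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    std::mt19937 gen(7);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::normal_distribution<double> step(0.0, 0.5);

    auto action = [](double x) { return 0.5 * x * x; };  // toy local action

    double x = 0.0, sum = 0.0;
    const int sweeps = 100000;
    int accepted = 0;

    for (int s = 0; s < sweeps; ++s) {
        const double xNew = x + step(gen);              // local proposal
        const double dS = action(xNew) - action(x);     // change in the action
        if (dS <= 0.0 || uni(gen) < std::exp(-dS)) {    // Metropolis test
            x = xNew;
            ++accepted;
        }
        sum += x * x;                                    // accumulate an observable
    }
    std::printf("<x^2> = %.4f, acceptance = %.2f\n",
                sum / sweeps, double(accepted) / sweeps);
}
```

A local proposal is accepted with probability min(1, e^{-ΔS}), so configurations are visited with weight proportional to e^{-S}, which is the property the dual-ensemble Metropolis moves rely on.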