A parallel Heap-Cell Method for Eikonal equations
Numerous applications of Eikonal equations prompted the development of many
efficient numerical algorithms. The Heap-Cell Method (HCM) is a recent serial
two-scale technique that has been shown to have advantages over other serial
state-of-the-art solvers for a wide range of problems. This paper presents a
parallelization of HCM for a shared memory architecture. The numerical
experiments show that the parallel HCM exhibits good algorithmic
behavior and scales well, resulting in a very fast and practical solver.
We further explore the influence on performance and scaling of data
precision, early termination criteria, and the hardware architecture. A shorter
version of this manuscript (omitting these more detailed tests) has been
submitted to SIAM Journal on Scientific Computing in 2012.
Comment: (a minor update to address the reviewers' comments) 31 pages; 15 figures; this is an expanded version of a paper accepted by SIAM Journal on Scientific Computing
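The two-scale heap-cell machinery itself is beyond the scope of an abstract, but the Eikonal discretization underlying it can be illustrated with a much simpler serial scheme. The sketch below is an illustrative stand-in (not the HCM): it solves |∇u| = 1/speed on a 2D grid with the standard first-order Godunov upwind update, iterated by alternating Gauss-Seidel sweeps.

```python
import numpy as np

def solve_eikonal_2d(speed, h, source, n_sweeps=8):
    """Fast-sweeping solver for |grad u| = 1/speed on a 2D grid.

    speed  : 2D array of propagation speeds
    h      : grid spacing
    source : (i, j) index where u = 0
    """
    n, m = speed.shape
    u = np.full((n, m), np.inf)
    u[source] = 0.0
    # Four sweep orderings cover all characteristic directions.
    orders = [(range(n), range(m)),
              (range(n - 1, -1, -1), range(m)),
              (range(n), range(m - 1, -1, -1)),
              (range(n - 1, -1, -1), range(m - 1, -1, -1))]
    for s in range(n_sweeps):
        ii, jj = orders[s % 4]
        for i in ii:
            for j in jj:
                if (i, j) == source:
                    continue
                # Upwind neighbor values in each axis direction
                a = min(u[i - 1, j] if i > 0 else np.inf,
                        u[i + 1, j] if i < n - 1 else np.inf)
                b = min(u[i, j - 1] if j > 0 else np.inf,
                        u[i, j + 1] if j < m - 1 else np.inf)
                f = h / speed[i, j]
                # Godunov update: 1D formula if the gap is large,
                # else the quadratic two-sided solution.
                if abs(a - b) >= f:
                    cand = min(a, b) + f
                else:
                    cand = 0.5 * (a + b + np.sqrt(2.0 * f * f - (a - b) ** 2))
                u[i, j] = min(u[i, j], cand)
    return u
```

For constant speed the computed values along the grid axes reproduce the exact distance from the source; diagonal values carry the usual first-order discretization error.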
Parallel Algorithms for Summing Floating-Point Numbers
The problem of exactly summing n floating-point numbers is a fundamental
problem that has many applications in large-scale simulations and computational
geometry. Unfortunately, due to the round-off error in standard floating-point
operations, this problem becomes very challenging. Moreover, all existing
solutions rely on sequential algorithms which cannot scale to the huge datasets
that need to be processed.
In this paper, we provide several efficient parallel algorithms for summing n
floating point numbers, so as to produce a faithfully rounded floating-point
representation of the sum. We present algorithms in PRAM, external-memory, and
MapReduce models, and we also provide an experimental analysis of our MapReduce
algorithms, due to their simplicity and practical efficiency.
Comment: Conference version appears in SPAA 2016
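The parallel algorithms in the paper build on error-free transformations. The hedged sketch below shows the basic building block (Knuth's two-sum) and a chunked, reduction-style compensated sum that mimics how partial sums could be combined across workers; the function names and chunking scheme are illustrative, not taken from the paper.

```python
import math

def two_sum(a, b):
    # Knuth's error-free transformation: s + e == a + b exactly
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e

def chunk_sum(xs):
    # Compensated (Neumaier-style) summation over one chunk:
    # returns (running sum, accumulated rounding error)
    s, c = 0.0, 0.0
    for x in xs:
        s, e = two_sum(s, x)
        c += e
    return s, c

def parallel_style_sum(xs, n_chunks=4):
    # Each chunk could run on its own worker; here we emulate serially,
    # then combine the (sum, error) pairs in a final reduction.
    k = max(1, len(xs) // n_chunks)
    partials = [chunk_sum(xs[i:i + k]) for i in range(0, len(xs), k)]
    s, c = 0.0, 0.0
    for ps, pc in partials:
        s, e = two_sum(s, ps)
        c += e + pc
    return s + c
```

On an adversarial input such as `[1e16, 1.0, -1e16, 1.0] * 1000`, naive `sum` loses every small term to round-off, while the compensated reduction recovers the exact total of 2000.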
PORTA: A three-dimensional multilevel radiative transfer code for modeling the intensity and polarization of spectral lines with massively parallel computers
The interpretation of the intensity and polarization of the spectral line
radiation produced in the atmosphere of the Sun and of other stars requires
solving a radiative transfer problem that can be very complex, especially when
the main interest lies in modeling the spectral line polarization produced by
scattering processes and the Hanle and Zeeman effects. One of the difficulties
is that the plasma of a stellar atmosphere can be highly inhomogeneous and
dynamic, which implies the need to solve the non-equilibrium problem of the
generation and transfer of polarized radiation in realistic three-dimensional
(3D) stellar atmospheric models. Here we present PORTA, an efficient multilevel
radiative transfer code we have developed for the simulation of the spectral
line polarization caused by scattering processes and the Hanle and Zeeman
effects in 3D models of stellar atmospheres. The numerical method of solution
is based on the non-linear multigrid iterative method and on a novel
short-characteristics formal solver of the Stokes-vector transfer equation
which uses monotonic Bézier interpolation. Therefore, with PORTA the
computing time needed to obtain at each spatial grid point the self-consistent
values of the atomic density matrix (which quantifies the excitation state of
the atomic system) scales linearly with the total number of grid points.
Another crucial feature of PORTA is its parallelization strategy, which allows
us to speed up the numerical solution of complicated 3D problems by several
orders of magnitude with respect to sequential radiative transfer approaches,
given its excellent linear scaling with the number of available processors. The
PORTA code can also be conveniently applied to solve the simpler 3D radiative
transfer problem of unpolarized radiation in multilevel systems.
Comment: 15 pages, 15 figures, to appear in Astronomy and Astrophysics
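PORTA's Stokes-vector Bézier solver is far richer than can be shown here, but the scalar essence of a short-characteristics formal solver is compact. The sketch below is a 1D unpolarized analogue with linear (rather than Bézier) interpolation of the source function; all names are illustrative. It propagates the specific intensity interval by interval, attenuating the incoming intensity and adding the interpolated source contribution.

```python
import numpy as np

def formal_solve_1d(tau, S, I_in=0.0):
    """Scalar short-characteristics formal solver, linear interpolation.

    tau  : increasing optical-depth grid along the ray
    S    : source function sampled on that grid
    I_in : intensity entering the first grid point
    Returns the intensity emerging after the last interval.
    """
    I = I_in
    for i in range(1, len(tau)):
        d = tau[i] - tau[i - 1]
        e = np.exp(-d)
        # Exact integrals of e^{-t} and t*e^{-t} over the interval give
        # the classic linear short-characteristics weights.
        w_up = (1.0 - e * (1.0 + d)) / d   # weight for upwind point S[i-1]
        w_lo = (1.0 - e) - w_up            # weight for local point S[i]
        I = I * e + w_up * S[i - 1] + w_lo * S[i]
    return I
```

A quick sanity check: with a constant source function and large total optical depth, the emergent intensity relaxes to S, as the formal solution requires.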
goSLP: Globally Optimized Superword Level Parallelism Framework
Modern microprocessors are equipped with single instruction multiple data
(SIMD) or vector instruction sets which allow compilers to exploit superword
level parallelism (SLP), a type of fine-grained parallelism. Current SLP
auto-vectorization techniques use heuristics to discover vectorization
opportunities in high-level language code. These heuristics are fragile, local
and typically only present one vectorization strategy that is either accepted
or rejected by a cost model. We present goSLP, a novel SLP auto-vectorization
framework which solves the statement packing problem in a pairwise optimal
manner. Using an integer linear programming (ILP) solver, goSLP searches the
entire space of statement packing opportunities for a whole function at a time,
while limiting total compilation time to a few minutes. Furthermore, goSLP
optimally solves the vector permutation selection problem using dynamic
programming. We implemented goSLP in the LLVM compiler infrastructure,
achieving a geometric mean speedup of 7.58% on SPEC2017fp, 2.42% on SPEC2006fp
and 4.07% on NAS benchmarks compared to LLVM's existing SLP auto-vectorizer.
Comment: Published at OOPSLA 2018
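goSLP formulates statement packing over whole functions as an ILP, but the combinatorial heart of pairwise packing can be shown in miniature. In the toy below (illustrative names; exhaustive search standing in for the ILP solver), each candidate pair of statements has a vectorization saving, and we pick the pairing with maximum total saving.

```python
def best_packing(savings, stmts):
    """Exhaustive max-weight matching over statement pairs.

    savings : dict mapping (stmt_a, stmt_b) -> saving from packing them
              into one vector operation
    stmts   : list of statement ids still unassigned
    Returns (best total saving, list of chosen pairs).
    """
    if not stmts:
        return 0, []
    first, rest = stmts[0], stmts[1:]
    # Option 1: leave `first` as a scalar statement.
    best_val, best_pairs = best_packing(savings, rest)
    # Option 2: pack `first` with every possible partner.
    for k, partner in enumerate(rest):
        gain = savings.get((first, partner),
                           savings.get((partner, first), 0))
        sub_val, sub_pairs = best_packing(savings, rest[:k] + rest[k + 1:])
        if gain + sub_val > best_val:
            best_val = gain + sub_val
            best_pairs = [(first, partner)] + sub_pairs
    return best_val, best_pairs
```

Note how a locally attractive pair can lose globally: if packing statements 0 and 2 saves the most in isolation but blocks two other profitable pairs, the exhaustive search (like the ILP) correctly rejects it, which is exactly the failure mode of greedy heuristics that goSLP targets.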
Parallel computing for the finite element method
A finite element method is presented to compute time harmonic microwave
fields in three dimensional configurations. Nodal-based finite elements have
been coupled with an absorbing boundary condition to solve open boundary
problems. This paper describes how the modeling of large devices has been made
possible using parallel computation. New algorithms are then proposed to
implement this formulation on a cluster of workstations (10 DEC ALPHA 300X) and
on a CRAY C98. Analysis of the computation efficiency is performed using simple
problems. The electromagnetic scattering of a plane wave by a perfect electric
conducting airplane is finally given as an example.
Parallelized and Vectorized Tracking Using Kalman Filters with CMS Detector Geometry and Events
The High-Luminosity Large Hadron Collider at CERN will be characterized by
greater pileup of events and higher occupancy, making the track reconstruction
even more computationally demanding. Existing algorithms at the LHC are based
on Kalman filter techniques with proven excellent physics performance under a
variety of conditions. Starting in 2014, we have been developing
Kalman-filter-based methods for track finding and fitting adapted for many-core
SIMD processors that are becoming dominant in high-performance systems.
This paper summarizes the latest extensions to our software that allow it to
run on the realistic CMS-2017 tracker geometry using CMSSW-generated events,
including pileup. The reconstructed tracks can be validated against either the
CMSSW simulation that generated the hits, or the CMSSW reconstruction of the
tracks. In general, the code's computational performance has continued to
improve while the above capabilities were being added. We demonstrate that the
present Kalman filter implementation is able to reconstruct events with
comparable physics performance to CMSSW, while providing generally better
computational performance. Further plans for advancing the software are
discussed.
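The track-finding machinery described above is specific to the CMS detector, but the Kalman filter recurrence at its core is standard. Below is a minimal constant-velocity filter in NumPy; all matrices and noise levels are illustrative toy choices, not the CMSSW configuration.

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One predict + update step of a linear Kalman filter."""
    # Predict: propagate state and covariance through the motion model
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: fold in measurement z via the Kalman gain
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Toy 1D "track": state [position, velocity], position-only measurements
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity motion model
H = np.array([[1.0, 0.0]])               # measure position only
Q = 1e-6 * np.eye(2)                     # tiny process noise
R = np.array([[0.01]])                   # measurement noise
x, P = np.zeros(2), np.eye(2)
for k in range(1, 11):                    # hits of a straight track: z = k
    x, P = kalman_step(x, P, np.array([float(k)]), F, Q, H, R)
```

After ten hits consistent with unit velocity, the filter's position and velocity estimates converge close to the true values; track fitting in the experiments applies the same predict/update cycle hit by hit, vectorized across many track candidates.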