Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures
Feltor is a modular and free scientific software package. It allows
developing platform-independent code that runs on a variety of parallel
computer architectures ranging from laptop CPUs to multi-GPU distributed memory
systems. Feltor consists of both a numerical library and a collection of
application codes built on top of the library. Its main targets are two- and
three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin
methods as the main numerical discretization technique. We observe that
numerical simulations of a recently developed gyro-fluid model produce
non-deterministic results in parallel computations. First, we show how we
restore accuracy and bitwise reproducibility algorithmically and
programmatically. In particular, we adopt an implementation of the exactly
rounded dot product based on long accumulators, which avoids accuracy losses
especially in parallel applications. However, reproducibility and accuracy
alone fail to indicate correct simulation behaviour. In fact, in the physical
model slightly different initial conditions lead to vastly different end
states. This behaviour translates to its numerical representation. Pointwise
convergence, even in principle, becomes impossible for long simulation times.
In a second part, we explore important performance tuning considerations. We
identify latency and memory bandwidth as the main performance indicators of our
routines. Based on these, we propose a parallel performance model that predicts
the execution time of algorithms implemented in Feltor and test our model on a
selection of parallel hardware architectures. We are able to predict the
execution time with a relative error of less than 25% for problem sizes between
0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth
gives a minimum array size per compute node to achieve a scaling efficiency
above 50% (both strong and weak).
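The reproducibility problem addressed above can be illustrated with a short sketch. This is not Feltor's implementation (which the abstract says uses a long accumulator); here exact rational arithmetic plays the same role, so the result is independent of summation order and therefore bitwise reproducible, with a single rounding at the end.

```python
# Illustrative sketch of an exactly rounded dot product (not Feltor's
# long-accumulator code): accumulate in exact arithmetic, round once.
from fractions import Fraction

def exact_dot(xs, ys):
    """Dot product accumulated exactly, then rounded once to a float."""
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)  # every float is an exact rational
    return float(acc)                     # single rounding at the end

# Naive float accumulation suffers catastrophic cancellation and depends
# on the summation order; the exact version does not.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
assert sum(x * y for x, y in zip(xs, ys)) != 1.0  # cancellation error
assert exact_dot(xs, ys) == 1.0                   # exactly rounded result
```

A hardware long (Kulisch-style) accumulator achieves the same effect in fixed point without arbitrary-precision rationals, which is what makes it practical in parallel reductions.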
A μ-mode integrator for solving evolution equations in Kronecker form
In this paper, we propose a μ-mode integrator for computing the solution
of stiff evolution equations. The integrator is based on a d-dimensional
splitting approach and uses exact (usually precomputed) one-dimensional matrix
exponentials. We show that the action of the exponentials, i.e. the
corresponding batched matrix-vector products, can be implemented efficiently on
modern computer systems. We further explain how μ-mode products can be used
to compute spectral transformations efficiently even if no fast transform is
available. We illustrate the performance of the new integrator by solving
three-dimensional linear and nonlinear Schrödinger equations, and we show
that the μ-mode integrator can significantly outperform numerical methods
well established in the field. We also discuss how to efficiently implement
this integrator on both multi-core CPUs and GPUs. Finally, the numerical
experiments show that using GPUs results in performance improvements between a
factor of 10 and 20, depending on the problem.
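The μ-mode (Tucker) products underlying such an integrator can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: for simplicity the one-dimensional operators are assumed symmetric here so that the matrix exponential can be computed by eigendecomposition, and the function names are hypothetical.

```python
# Sketch of mu-mode products for an operator in Kronecker form
# A = A_1 (+) A_2 (+) ... (Kronecker sum of 1D operators): the action
# of exp(h*A) factorizes into one small matrix exponential per axis.
import numpy as np

def expm_sym(a):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.exp(w)) @ v.T

def mode_product(tensor, matrix, axis):
    """Apply `matrix` along one axis of `tensor` (the mu-mode product)."""
    moved = np.moveaxis(tensor, axis, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, axis)

def kronecker_step(u, mats_1d, h):
    """Apply exp(h*A) to the tensor u, one (usually precomputed)
    one-dimensional exponential per axis."""
    for axis, a in enumerate(mats_1d):
        u = mode_product(u, expm_sym(h * a), axis)
    return u
```

For a d-dimensional tensor this needs only d small matrix exponentials plus batched matrix products, instead of exponentiating the full Kronecker-sum operator, which is what makes the approach efficient on modern hardware.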
Accelerated computational micromechanics
We present an approach to solving problems in micromechanics that is amenable to massively parallel calculations through the use of graphical processing units and other accelerators. The problems lead to nonlinear differential equations that are typically second order in space and first order in time. This combination of nonlinearity and nonlocality makes such problems difficult to solve in parallel. However, this combination is a result of collapsing nonlocal, but linear and universal physical laws (kinematic compatibility, balance laws), and nonlinear but local constitutive relations. We propose an operator-splitting scheme inspired by this structure. The governing equations are formulated as (incremental) variational problems, the differential constraints like compatibility are introduced using an augmented Lagrangian, and the resulting incremental variational principle is solved by the alternating direction method of multipliers. The resulting algorithm has a natural connection to physical principles, and also enables massively parallel implementation on structured grids. We present this method and use it to study two examples. The first concerns the long wavelength instability of finite elasticity, and allows us to verify the approach against previous numerical simulations. We also use this example to study convergence and parallel performance. The second example concerns microstructure evolution in liquid crystal elastomers and provides new insights into some counter-intuitive properties of these materials. We use this example to validate the model and the approach against experimental observations.
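The splitting described above follows the generic ADMM pattern: a global step that handles the linear, nonlocal part and a pointwise step that handles the local nonlinearity. A minimal sketch on a toy problem (lasso regression, not the micromechanics solver; all names are illustrative) shows this division of labour.

```python
# Generic ADMM sketch (toy problem, not the authors' solver):
# minimize 0.5*||A x - b||^2 + lam*||z||_1  subject to  x = z.
# The x-update is a global linear solve (analogous to the nonlocal,
# linear balance laws); the z-update is a pointwise nonlinear step
# (analogous to the local constitutive relations); y is the scaled dual.
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    n = A.shape[1]
    x = z = y = np.zeros(n)
    M = A.T @ A + rho * np.eye(n)  # factorable once, reused every iteration
    for _ in range(iters):
        x = np.linalg.solve(M, A.T @ b + rho * (z - y))              # global, linear
        z = np.sign(x + y) * np.maximum(np.abs(x + y) - lam / rho, 0)  # local, nonlinear
        y = y + x - z                                                  # dual ascent
    return z
```

Because the nonlinear step acts independently at every point, it parallelizes trivially on GPUs and structured grids, which is precisely the property the abstract exploits.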
Real Time Tracking with ATLAS Silicon Detectors and its Applications to Beauty Hadron Physics
The purpose of the work presented here is a complete characterization of a track reconstruction algorithm, SiTrack, based on data coming from the silicon detectors and designed to operate in the on-line event selection system of the ATLAS experiment. The application of the SiTrack algorithm to different physics selections will be discussed and the corresponding results will be provided, both in terms of pure tracking performance and of impact on the physical event selection strategy. After a brief overview of flavour physics, the present status of the Unitarity Triangle determination is presented, along with the expected ATLAS reach in this sector.
Precision analysis for hardware acceleration of numerical algorithms
The precision used in an algorithm affects the error and performance of individual computations, the
memory usage, and the potential parallelism for a fixed hardware budget. However, when migrating
an algorithm onto hardware, the potential improvements that can be obtained by tuning the precision
throughout an algorithm to meet a range or error specification are often overlooked; the major reason
is that it is hard to choose a number system which can guarantee any such specification can be met.
Instead, the problem is mitigated by opting to use IEEE standard double precision arithmetic so as to be
‘no worse’ than a software implementation. However, the flexibility in the number representation is one
of the key factors that can be exploited on reconfigurable hardware such as FPGAs, and hence ignoring
this potential significantly limits the performance achievable.
In order to optimise the performance of hardware reliably, we require a method that can tractably
calculate tight bounds for the error or range of any variable within an algorithm, but currently only a
handful of methods to calculate such bounds exist, and these either sacrifice tightness or tractability,
whilst simulation-based methods cannot guarantee the given error estimate. This thesis presents a new
method to calculate these bounds, taking into account both input ranges and finite precision effects,
which we show to be, in general, tighter in comparison to existing methods; this in turn can be used to
tune the hardware to the algorithm specifications.
We demonstrate the use of this software to optimise hardware for various algorithms to accelerate
the solution of a system of linear equations, which forms the basis of many problems in engineering
and science, and show that significant performance gains can be obtained by using this new approach in
conjunction with more traditional hardware optimisations.
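The kind of range analysis the thesis improves upon can be illustrated with plain interval arithmetic, a deliberately simple sketch rather than the thesis's method: bounds propagate through every operation, but correlated uses of a variable make the result sound yet loose, which is exactly the tightness problem the thesis targets.

```python
# Illustrative interval-arithmetic range analysis (simpler and looser
# than the thesis's method): propagate [lo, hi] bounds per operation.

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# Range of y = x*x + x for x in [-1, 2]. Interval arithmetic ignores the
# correlation between the two uses of x, so it reports [-3, 6] even
# though the true range is [-0.25, 6]: sound, but not tight.
x = Interval(-1.0, 2.0)
y = x * x + x
```

Such over-approximation is what forces conservative (wide) number formats on hardware; tighter bounds allow smaller precisions and hence more parallelism for a fixed FPGA budget.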
Dark matter production in association with a single top-quark at the LHC in a two-Higgs-doublet model with a pseudoscalar mediator
The sensitivity of the LHC experiments to the associated production of dark
matter with a single top is studied in the framework of an extension of the
standard model featuring two Higgs doublets and an additional pseudoscalar
mediator. It is found that the experimental sensitivity is dominated by the
on-shell production of a charged Higgs boson, when this assumes a mass below 1
TeV. Dedicated selections considering one and two lepton final states are
developed to assess the coverage in parameter space for this signature at a
centre-of-mass energy of 14 TeV assuming an integrated luminosity of 300
fb⁻¹. For a pseudoscalar mediator with mass 150 GeV and maximally mixed
with the pseudoscalar of the two Higgs doublets, values of tan β up to 3
and down to 15 can be excluded at 95% CL, if the charged Higgs boson mass is
in the range 300 GeV-1 TeV. This novel signature complements the parameter space coverage of
the mono-Higgs, mono-Z and E_T^miss + tt̄
signatures considered in previous publications for this model.
Comment: 10 pages, 8 figures