Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures
Feltor is a modular and free scientific software package. It allows
developing platform-independent code that runs on a variety of parallel
computer architectures ranging from laptop CPUs to multi-GPU distributed memory
systems. Feltor consists of both a numerical library and a collection of
application codes built on top of the library. Its main targets are two- and
three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin
methods as the main numerical discretization technique. We observe that
numerical simulations of a recently developed gyro-fluid model produce
non-deterministic results in parallel computations. First, we show how we
restore accuracy and bitwise reproducibility algorithmically and
programmatically. In particular, we adopt an implementation of the exactly
rounded dot product based on long accumulators, which avoids accuracy losses
especially in parallel applications. However, reproducibility and accuracy
alone fail to indicate correct simulation behaviour. In fact, in the physical
model slightly different initial conditions lead to vastly different end
states. This behaviour translates to its numerical representation. Pointwise
convergence, even in principle, becomes impossible for long simulation times.
In a second part, we explore important performance tuning considerations. We
identify latency and memory bandwidth as the main performance indicators of our
routines. Based on these, we propose a parallel performance model that predicts
the execution time of algorithms implemented in Feltor and test our model on a
selection of parallel hardware architectures. We are able to predict the
execution time with a relative error of less than 25% for problem sizes between
0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth
gives a minimum array size per compute node to achieve a scaling efficiency
above 50% (both strong and weak).
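The reproducibility problem addressed above can be illustrated with a short sketch. This is not Feltor's implementation (which the abstract says uses a long accumulator); here exact rational arithmetic plays the same role, so the result is independent of summation order and therefore bitwise reproducible, with a single rounding at the end.

```python
# Illustrative sketch of an exactly rounded dot product (not Feltor's
# long-accumulator code): accumulate in exact arithmetic, round once.
from fractions import Fraction

def exact_dot(xs, ys):
    """Dot product accumulated exactly, then rounded once to a float."""
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)  # every float is an exact rational
    return float(acc)                     # single rounding at the end

# Naive float accumulation suffers catastrophic cancellation and depends
# on the summation order; the exact version does not.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
assert sum(x * y for x, y in zip(xs, ys)) != 1.0  # cancellation error
assert exact_dot(xs, ys) == 1.0                   # exactly rounded result
```

A hardware long (Kulisch-style) accumulator achieves the same effect in fixed point without arbitrary-precision rationals, which is what makes it practical in parallel reductions.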
A μ-mode integrator for solving evolution equations in Kronecker form
In this paper, we propose a μ-mode integrator for computing the solution
of stiff evolution equations. The integrator is based on a d-dimensional
splitting approach and uses exact (usually precomputed) one-dimensional matrix
exponentials. We show that the action of the exponentials, i.e. the
corresponding batched matrix-vector products, can be implemented efficiently on
modern computer systems. We further explain how μ-mode products can be used
to compute spectral transformations efficiently even if no fast transform is
available. We illustrate the performance of the new integrator by solving
three-dimensional linear and nonlinear Schrödinger equations, and we show
that the μ-mode integrator can significantly outperform numerical methods
well established in the field. We also discuss how to efficiently implement
this integrator on both multi-core CPUs and GPUs. Finally, the numerical
experiments show that using GPUs results in performance improvements between a
factor of 10 and 20, depending on the problem.
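The μ-mode (Tucker) products underlying such an integrator can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: for simplicity the one-dimensional operators are assumed symmetric here so that the matrix exponential can be computed by eigendecomposition, and the function names are hypothetical.

```python
# Sketch of mu-mode products for an operator in Kronecker form
# A = A_1 (+) A_2 (+) ... (Kronecker sum of 1D operators): the action
# of exp(h*A) factorizes into one small matrix exponential per axis.
import numpy as np

def expm_sym(a):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.exp(w)) @ v.T

def mode_product(tensor, matrix, axis):
    """Apply `matrix` along one axis of `tensor` (the mu-mode product)."""
    moved = np.moveaxis(tensor, axis, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, axis)

def kronecker_step(u, mats_1d, h):
    """Apply exp(h*A) to the tensor u, one (usually precomputed)
    one-dimensional exponential per axis."""
    for axis, a in enumerate(mats_1d):
        u = mode_product(u, expm_sym(h * a), axis)
    return u
```

For a d-dimensional tensor this needs only d small matrix exponentials plus batched matrix products, instead of exponentiating the full Kronecker-sum operator, which is what makes the approach efficient on modern hardware.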
Accelerated computational micromechanics
We present an approach to solving problems in micromechanics that is amenable to massively parallel calculations through the use of graphical processing units and other accelerators. The problems lead to nonlinear differential equations that are typically second order in space and first order in time. This combination of nonlinearity and nonlocality makes such problems difficult to solve in parallel. However, this combination is a result of collapsing nonlocal, but linear and universal physical laws (kinematic compatibility, balance laws), and nonlinear but local constitutive relations. We propose an operator-splitting scheme inspired by this structure. The governing equations are formulated as (incremental) variational problems, the differential constraints like compatibility are introduced using an augmented Lagrangian, and the resulting incremental variational principle is solved by the alternating direction method of multipliers. The resulting algorithm has a natural connection to physical principles, and also enables massively parallel implementation on structured grids. We present this method and use it to study two examples. The first concerns the long wavelength instability of finite elasticity, and allows us to verify the approach against previous numerical simulations. We also use this example to study convergence and parallel performance. The second example concerns microstructure evolution in liquid crystal elastomers and provides new insights into some counter-intuitive properties of these materials. We use this example to validate the model and the approach against experimental observations.
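The splitting described above follows the generic ADMM pattern: a global step that handles the linear, nonlocal part and a pointwise step that handles the local nonlinearity. A minimal sketch on a toy problem (lasso regression, not the micromechanics solver; all names are illustrative) shows this division of labour.

```python
# Generic ADMM sketch (toy problem, not the authors' solver):
# minimize 0.5*||A x - b||^2 + lam*||z||_1  subject to  x = z.
# The x-update is a global linear solve (analogous to the nonlocal,
# linear balance laws); the z-update is a pointwise nonlinear step
# (analogous to the local constitutive relations); y is the scaled dual.
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    n = A.shape[1]
    x = z = y = np.zeros(n)
    M = A.T @ A + rho * np.eye(n)  # factorable once, reused every iteration
    for _ in range(iters):
        x = np.linalg.solve(M, A.T @ b + rho * (z - y))              # global, linear
        z = np.sign(x + y) * np.maximum(np.abs(x + y) - lam / rho, 0)  # local, nonlinear
        y = y + x - z                                                  # dual ascent
    return z
```

Because the nonlinear step acts independently at every point, it parallelizes trivially on GPUs and structured grids, which is precisely the property the abstract exploits.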
Real Time Tracking with ATLAS Silicon Detectors and its Applications to Beauty Hadron Physics
The purpose of the work presented here is a complete characterization of a track reconstruction algorithm, SiTrack, based on data coming from the silicon detectors and designed to operate in the on-line event selection system of the ATLAS experiment. The application of the SiTrack algorithm to different physics selections will be discussed and the corresponding results will be provided, both in terms of pure tracking performance and of impact on the physical event selection strategy. After a brief overview of flavour physics, the present status of the Unitarity Triangle determination is presented, along with the expected ATLAS reach in this sector.
Precision analysis for hardware acceleration of numerical algorithms
The precision used in an algorithm affects the error and performance of individual computations, the
memory usage, and the potential parallelism for a fixed hardware budget. However, when migrating
an algorithm onto hardware, the potential improvements that can be obtained by tuning the precision
throughout an algorithm to meet a range or error specification are often overlooked; the major reason
is that it is hard to choose a number system which can guarantee any such specification can be met.
Instead, the problem is mitigated by opting to use IEEE standard double precision arithmetic so as to be
‘no worse’ than a software implementation. However, the flexibility in the number representation is one
of the key factors that can be exploited on reconfigurable hardware such as FPGAs, and hence ignoring
this potential significantly limits the performance achievable.
In order to optimise the performance of hardware reliably, we require a method that can tractably
calculate tight bounds for the error or range of any variable within an algorithm, but currently only a
handful of methods to calculate such bounds exist, and these either sacrifice tightness or tractability,
whilst simulation-based methods cannot guarantee the given error estimate. This thesis presents a new
method to calculate these bounds, taking into account both input ranges and finite precision effects,
which we show to be, in general, tighter in comparison to existing methods; this in turn can be used to
tune the hardware to the algorithm specifications.
We demonstrate the use of this software to optimise hardware for various algorithms to accelerate
the solution of a system of linear equations, which forms the basis of many problems in engineering
and science, and show that significant performance gains can be obtained by using this new approach in
conjunction with more traditional hardware optimisations.
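The kind of range analysis the thesis improves upon can be illustrated with plain interval arithmetic, a deliberately simple sketch rather than the thesis's method: bounds propagate through every operation, but correlated uses of a variable make the result sound yet loose, which is exactly the tightness problem the thesis targets.

```python
# Illustrative interval-arithmetic range analysis (simpler and looser
# than the thesis's method): propagate [lo, hi] bounds per operation.

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# Range of y = x*x + x for x in [-1, 2]. Interval arithmetic ignores the
# correlation between the two uses of x, so it reports [-3, 6] even
# though the true range is [-0.25, 6]: sound, but not tight.
x = Interval(-1.0, 2.0)
y = x * x + x
```

Such over-approximation is what forces conservative (wide) number formats on hardware; tighter bounds allow smaller precisions and hence more parallelism for a fixed FPGA budget.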
Dark matter production in association with a single top-quark at the LHC in a two-Higgs-doublet model with a pseudoscalar mediator
The sensitivity of the LHC experiments to the associated production of dark
matter with a single top is studied in the framework of an extension of the
standard model featuring two Higgs doublets and an additional pseudoscalar
mediator. It is found that the experimental sensitivity is dominated by the
on-shell production of a charged Higgs boson, when this assumes a mass below 1
TeV. Dedicated selections considering one and two lepton final states are
developed to assess the coverage in parameter space for this signature at a
centre-of-mass energy of 14 TeV assuming an integrated luminosity of 300
fb⁻¹. For a pseudoscalar mediator with mass 150 GeV and maximally mixed
with the pseudoscalar of the two Higgs doublets, values of tan β up to 3
and down to 15 can be excluded at 95% CL, if the charged Higgs boson mass is
in the range 300 GeV-1 TeV. This novel signature complements the parameter space coverage of
the mono-Higgs, mono-Z and E_T^miss + tt̄
signatures considered in previous publications for this model.
Comment: 10 pages, 8 figures