5,386 research outputs found
Expression Templates Revisited: A Performance Analysis of the Current ET Methodology
In the last decade, Expression Templates (ET) have gained a reputation as an
efficient performance optimization tool for C++ codes. This reputation builds
on several ET-based linear algebra frameworks focused on combining both elegant
and high-performance C++ code. However, on closer examination the assumption
that ETs are a performance optimization technique cannot be maintained. In this
paper we demonstrate and explain the inability of current ET-based frameworks
to deliver high performance for dense and sparse linear algebra operations, and
introduce a new "smart" ET implementation that truly allows the combination of
high performance code with the elegance and maintainability of a
domain-specific language.Comment: 16 pages, 7 figure
Best practices for HPM-assisted performance engineering on modern multicore processors
Many tools and libraries employ hardware performance monitoring (HPM) on
modern processors, and using this data for performance assessment and as a
starting point for code optimizations is very popular. However, such data is
only useful if it is interpreted with care, and if the right metrics are chosen
for the right purpose. We demonstrate the sensible use of hardware performance
counters in the context of a structured performance engineering approach for
applications in computational science. Typical performance patterns and their
respective metric signatures are defined, and some of them are illustrated
using case studies. Although these generic concepts do not depend on specific
tools or environments, we restrict ourselves to modern x86-based multicore
processors and use the likwid-perfctr tool under the Linux OS.Comment: 10 pages, 2 figure
Asynchronous Execution of Python Code on Task Based Runtime Systems
Despite advancements in the areas of parallel and distributed computing, the
complexity of programming on High Performance Computing (HPC) resources has
deterred many domain experts, especially in the areas of machine learning and
artificial intelligence (AI), from utilizing performance benefits of such
systems. Researchers and scientists favor high-productivity languages to avoid
the inconvenience of programming in low-level languages and costs of acquiring
the necessary skills required for programming at this level. In recent years,
Python, with the support of linear algebra libraries like NumPy, has gained
popularity despite facing limitations which prevent this code from distributed
runs. Here we present a solution which maintains both high level programming
abstractions as well as parallel and distributed efficiency. Phylanx, is an
asynchronous array processing toolkit which transforms Python and NumPy
operations into code which can be executed in parallel on HPC resources by
mapping Python and NumPy functions and variables into a dependency tree
executed by HPX, a general purpose, parallel, task-based runtime system written
in C++. Phylanx additionally provides introspection and visualization
capabilities for debugging and performance analysis. We have tested the
foundations of our approach by comparing our implementation of widely used
machine learning algorithms to accepted NumPy standards
The LifeV library: engineering mathematics beyond the proof of concept
LifeV is a library for the finite element (FE) solution of partial
differential equations in one, two, and three dimensions. It is written in C++
and designed to run on diverse parallel architectures, including cloud and high
performance computing facilities. In spite of its academic research nature,
meaning a library for the development and testing of new methods, one
distinguishing feature of LifeV is its use on real world problems and it is
intended to provide a tool for many engineering applications. It has been
actually used in computational hemodynamics, including cardiac mechanics and
fluid-structure interaction problems, in porous media, ice sheets dynamics for
both forward and inverse problems. In this paper we give a short overview of
the features of LifeV and its coding paradigms on simple problems. The main
focus is on the parallel environment which is mainly driven by domain
decomposition methods and based on external libraries such as MPI, the Trilinos
project, HDF5 and ParMetis.
Dedicated to the memory of Fausto Saleri.Comment: Review of the LifeV Finite Element librar
Automating embedded analysis capabilities and managing software complexity in multiphysics simulation part I: template-based generic programming
An approach for incorporating embedded simulation and analysis capabilities
in complex simulation codes through template-based generic programming is
presented. This approach relies on templating and operator overloading within
the C++ language to transform a given calculation into one that can compute a
variety of additional quantities that are necessary for many state-of-the-art
simulation and analysis algorithms. An approach for incorporating these ideas
into complex simulation codes through general graph-based assembly is also
presented. These ideas have been implemented within a set of packages in the
Trilinos framework and are demonstrated on a simple problem from chemical
engineering
Revisiting Actor Programming in C++
The actor model of computation has gained significant popularity over the
last decade. Its high level of abstraction makes it appealing for concurrent
applications in parallel and distributed systems. However, designing a
real-world actor framework that subsumes full scalability, strong reliability,
and high resource efficiency requires many conceptual and algorithmic additives
to the original model.
In this paper, we report on designing and building CAF, the "C++ Actor
Framework". CAF targets at providing a concurrent and distributed native
environment for scaling up to very large, high-performance applications, and
equally well down to small constrained systems. We present the key
specifications and design concepts---in particular a message-transparent
architecture, type-safe message interfaces, and pattern matching
facilities---that make native actors a viable approach for many robust,
elastic, and highly distributed developments. We demonstrate the feasibility of
CAF in three scenarios: first for elastic, upscaling environments, second for
including heterogeneous hardware like GPGPUs, and third for distributed runtime
systems. Extensive performance evaluations indicate ideal runtime behaviour for
up to 64 cores at very low memory footprint, or in the presence of GPUs. In
these tests, CAF continuously outperforms the competing actor environments
Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even the OpenMPI.Comment: 33 page
Correcting PCR amplification errors in unique molecular identifiers to generate accurate numbers of sequencing molecules
Unique molecular identifiers are random oligonucleotide sequences that remove PCR amplification biases. However, the impact that PCR associated sequencing errors have on the accuracy of generating absolute counts of RNA molecules is underappreciated. We show that PCR errors are a source of inaccuracy in both bulk and single-cell sequencing data, and synthesizing unique molecular identifiers using homotrimeric nucleotide blocks provides an error-correcting solution that allows absolute counting of sequenced molecules
- …