1,242 research outputs found
A Review of Lightweight Thread Approaches for High Performance Computing
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly-found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns andthat those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.The researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, the Generalitat Valenciana fellowship programme Vali+d 2015, and FEDER. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced
Scientific Computing Research (SC-21), under contract DEAC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.Peer ReviewedPostprint (author's final draft
Accelerating sequential programs using FastFlow and self-offloading
FastFlow is a programming environment specifically targeting cache-coherent
shared-memory multi-cores. FastFlow is implemented as a stack of C++ template
libraries built on top of lock-free (fence-free) synchronization mechanisms. In
this paper we present a further evolution of FastFlow enabling programmers to
offload part of their workload on a dynamically created software accelerator
running on unused CPUs. The offloaded function can be easily derived from
pre-existing sequential code. We emphasize in particular the effective
trade-off between human productivity and execution efficiency of the approach.Comment: 17 pages + cove
Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping
FLUSEPA (Registered trademark in France No. 134009261) is an advanced
simulation tool which performs a large panel of aerodynamic studies. It is the
unstructured finite-volume solver developed by Airbus Safran Launchers company
to calculate compressible, multidimensional, unsteady, viscous and reactive
flows around bodies in relative motion. The time integration in FLUSEPA is done
using an explicit temporal adaptive method. The current production version of
the code is based on MPI and OpenMP. This implementation leads to important
synchronizations that must be reduced. To tackle this problem, we present the
study of a task-based parallelization of the aerodynamic solver of FLUSEPA
using the runtime system StarPU and combining up to three levels of
parallelism. We validate our solution by the simulation (using a finite-volume
mesh with 80 million cells) of a take-off blast wave propagation for Ariane 5
launcher.Comment: Accepted manuscript of a paper in Journal of Computational Scienc
R friendly multi-threading in C++
Calling multi-threaded C++ code from R has its perils. Since the R
interpreter is single-threaded, one must not check for user interruptions or
print to the R console from multiple threads. One can, however, synchronize
with R from the main thread. The R package RcppThread (current version 0.5.3)
contains a header only C++ library for thread safe communication with R that
exploits this fact. It includes C++ classes for threads, a thread pool, and
parallel loops that routinely synchronize with R. This article explains the
package's functionality and gives examples of its usage. The synchronization
mechanism may also apply to other threading frameworks. Benchmarks suggest
that, although synchronization causes overhead, the parallel abstractions of
RcppThread are competitive with other popular libraries in typical scenarios
encountered in statistical computing
Deterministic Consistency: A Programming Model for Shared Memory Parallelism
The difficulty of developing reliable parallel software is generating
interest in deterministic environments, where a given program and input can
yield only one possible result. Languages or type systems can enforce
determinism in new code, and runtime systems can impose synthetic schedules on
legacy parallel code. To parallelize existing serial code, however, we would
like a programming model that is naturally deterministic without language
restrictions or artificial scheduling. We propose "deterministic consistency",
a parallel programming model as easy to understand as the "parallel assignment"
construct in sequential languages such as Perl and JavaScript, where concurrent
threads always read their inputs before writing shared outputs. DC supports
common data- and task-parallel synchronization abstractions such as fork/join
and barriers, as well as non-hierarchical structures such as producer/consumer
pipelines and futures. A preliminary prototype suggests that software-only
implementations of DC can run applications written for popular parallel
environments such as OpenMP with low (<10%) overhead for some applications.Comment: 7 pages, 3 figure
Parallel Astronomical Data Processing with Python: Recipes for multicore machines
High performance computing has been used in various fields of astrophysical
research. But most of it is implemented on massively parallel systems
(supercomputers) or graphical processing unit clusters. With the advent of
multicore processors in the last decade, many serial software codes have been
re-implemented in parallel mode to utilize the full potential of these
processors. In this paper, we propose parallel processing recipes for multicore
machines for astronomical data processing. The target audience are astronomers
who are using Python as their preferred scripting language and who may be using
PyRAF/IRAF for data processing. Three problems of varied complexity were
benchmarked on three different types of multicore processors to demonstrate the
benefits, in terms of execution time, of parallelizing data processing tasks.
The native multiprocessing module available in Python makes it a relatively
trivial task to implement the parallel code. We have also compared the three
multiprocessing approaches - Pool/Map, Process/Queue, and Parallel Python. Our
test codes are freely available and can be downloaded from our website.Comment: 15 pages, 7 figures, 1 table, "for associated test code, see
http://astro.nuigalway.ie/staff/navtejs", Accepted for publication in
Astronomy and Computin
- …