30,899 research outputs found
The Scalability-Efficiency/Maintainability-Portability Trade-off in Simulation Software Engineering: Examples and a Preliminary Systematic Literature Review
Large-scale simulations play a central role in science and the industry.
Several challenges occur when building simulation software, because simulations
require complex software developed in a dynamic construction process. That is
why simulation software engineering (SSE) is emerging lately as a research
focus. The dichotomous trade-off between scalability and efficiency (SE) on the
one hand and maintainability and portability (MP) on the other hand is one of
the core challenges. We report on the SE/MP trade-off in the context of an
ongoing systematic literature review (SLR). After characterizing the issue of
the SE/MP trade-off using two examples from our own research, we (1) review the
33 identified articles that assess the trade-off, (2) summarize the proposed
solutions for the trade-off, and (3) discuss the findings for SSE and future
work. Overall, we see evidence for the SE/MP trade-off and first solution
approaches. However, a strong empirical foundation has yet to be established;
general quantitative metrics and methods supporting software developers in
addressing the trade-off have to be developed. We foresee considerable future
work in SSE across scientific communities.Comment: 9 pages, 2 figures. Accepted for presentation at the Fourth
International Workshop on Software Engineering for High Performance Computing
in Computational Science and Engineering (SEHPCCSE 2016
Developing numerical libraries in Java
The rapid and widespread adoption of Java has created a demand for reliable
and reusable mathematical software components to support the growing number of
compute-intensive applications now under development, particularly in science
and engineering. In this paper we address practical issues of the Java language
and environment which have an effect on numerical library design and
development. Benchmarks which illustrate the current levels of performance of
key numerical kernels on a variety of Java platforms are presented. Finally, a
strategy for the development of a fundamental numerical toolkit for Java is
proposed and its current status is described.Comment: 11 pages. Revised version of paper presented to the 1998 ACM
Conference on Java for High Performance Network Computing. To appear in
Concurrency: Practice and Experienc
Performance Portability Study of Linear Algebra Kernels in OpenCL
The performance portability of OpenCL kernel implementations for common
memory bandwidth limited linear algebra operations across different hardware
generations of the same vendor as well as across vendors is studied. Certain
combinations of kernel implementations and work sizes are found to exhibit good
performance across compute kernels, hardware generations, and, to a lesser
degree, vendors. As a consequence, it is demonstrated that the optimization of
a single kernel is often sufficient to obtain good performance for a large
class of more complicated operations.Comment: 11 pages, 8 figures, 2 tables, International Workshop on OpenCL 201
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
The present panorama of HPC architectures is extremely heterogeneous, ranging
from traditional multi-core CPU processors, supporting a wide class of
applications but delivering moderate computing performance, to many-core GPUs,
exploiting aggressive data-parallelism and delivering higher performances for
streaming computing applications. In this scenario, code portability (and
performance portability) become necessary for easy maintainability of
applications; this is very relevant in scientific computing where code changes
are very frequent, making it tedious and prone to error to keep different code
versions aligned. In this work we present the design and optimization of a
state-of-the-art production-level LQCD Monte Carlo application, using the
directive-based OpenACC programming model. OpenACC abstracts parallel
programming to a descriptive level, relieving programmers from specifying how
codes should be mapped onto the target architecture. We describe the
implementation of a code fully written in OpenACC, and show that we are able to
target several different architectures, including state-of-the-art traditional
CPUs and GPUs, with the same code. We also measure performance, evaluating the
computing efficiency of our OpenACC code on several architectures, comparing
with GPU-specific implementations and showing that a good level of
performance-portability can be reached.Comment: 26 pages, 2 png figures, preprint of an article submitted for
consideration in International Journal of Modern Physics
Heterogeneous Computing on Mixed Unstructured Grids with PyFR
PyFR is an open-source high-order accurate computational fluid dynamics
solver for mixed unstructured grids that can target a range of hardware
platforms from a single codebase. In this paper we demonstrate the ability of
PyFR to perform high-order accurate unsteady simulations of flow on mixed
unstructured grids using heterogeneous multi-node hardware. Specifically, after
benchmarking single-node performance for various platforms, PyFR v0.2.2 is used
to undertake simulations of unsteady flow over a circular cylinder at Reynolds
number 3 900 using a mixed unstructured grid of prismatic and tetrahedral
elements on a desktop workstation containing an Intel Xeon E5-2697 v2 CPU, an
NVIDIA Tesla K40c GPU, and an AMD FirePro W9100 GPU. Both the performance and
accuracy of PyFR are assessed. PyFR v0.2.2 is freely available under a 3-Clause
New Style BSD license (see www.pyfr.org).Comment: 21 pages, 9 figures, 6 table
- …