The DUNE-ALUGrid Module
In this paper we present the new DUNE-ALUGrid module. This module contains a
major overhaul of the sources from the ALUGrid library and the bindings to the
DUNE software framework. The main changes include user-defined load balancing,
parallel grid construction, and a redesign of the 2d grid, which can now also
be used for parallel computations. In addition, many improvements have been
introduced into the code to increase the parallel efficiency and to decrease
the memory footprint.
The original ALUGrid library is widely used within the DUNE community due to
its good parallel performance for problems requiring local adaptivity and
dynamic load balancing. Therefore, this new module will benefit a number of DUNE
users. In addition, we have added features to increase the range of problems for
which the grid manager can be used, for example, introducing a 3d tetrahedral
grid using a parallel newest vertex bisection algorithm for conforming grid
refinement. In this paper we discuss the new features and extensions to the
DUNE interface, and explain with various examples how the code is used in
parallel environments.
Comment: 25 pages, 11 figures
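The newest vertex bisection rule mentioned above can be illustrated in miniature. The following sketch (in Python, purely illustrative and unrelated to the actual C++ sources of DUNE-ALUGrid) bisects one triangle across the edge opposite its newest vertex; applying the rule recursively to non-conforming neighbours is what produces a conforming refinement.

```python
def bisect(tri, points):
    # tri = (a, b, c): vertex indices into `points`; c is the "newest
    # vertex", so the refinement edge is (a, b), the edge opposite c.
    a, b, c = tri
    (ax, ay), (bx, by) = points[a], points[b]
    m = len(points)
    points.append(((ax + bx) / 2, (ay + by) / 2))  # midpoint of the edge
    # both children inherit the midpoint m as their newest vertex
    return (c, a, m), (b, c, m)

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
kids = bisect((0, 1, 2), points)   # two children, one new vertex
```

Repeating the rule on the children alternates the refinement edges, which is what keeps the triangles shape-regular under repeated bisection.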
C Language Extensions for Hybrid CPU/GPU Programming with StarPU
Modern platforms used for high-performance computing (HPC) include machines
with both general-purpose CPUs and "accelerators", often in the form of
graphical processing units (GPUs). StarPU is a C library for exploiting such
platforms. It provides users with ways to define "tasks" to be executed on CPUs
or GPUs, along with the dependencies among them, and it automatically
schedules them over all the available processing units. In doing so, it also
relieves programmers from the need to know the underlying architecture details:
it adapts to the available CPUs and GPUs, and automatically transfers data
between main memory and GPUs as needed. While StarPU's approach is successful
at addressing run-time scheduling issues, being a C library makes for a poor
and error-prone programming interface. This paper presents an effort started in
2011 to promote some of the concepts exported by the library as C language
constructs, by means of an extension of the GCC compiler suite. Our main
contribution is the design and implementation of language extensions that map
to StarPU's task programming paradigm. We argue that the proposed extensions
make it easier to get started with StarPU, eliminate errors that can occur when
using the C library, and help diagnose possible mistakes. We conclude with
future work.
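The task-with-dependencies model described above can be reduced to a toy. The sketch below (Python, not StarPU's C API) executes named tasks in dependency order, a serial stand-in for the scheduling StarPU performs across CPUs and GPUs at run time; all names are hypothetical.

```python
from collections import defaultdict, deque

def run_tasks(tasks, deps):
    # tasks: name -> callable; deps: name -> list of prerequisite names.
    # Kahn-style topological execution: a task runs only after all of
    # its dependencies have completed.
    indeg = {t: len(deps.get(t, ())) for t in tasks}
    succ = defaultdict(list)
    for t, ps in deps.items():
        for p in ps:
            succ[p].append(t)
    ready = deque(t for t, d in indeg.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()                 # "execute" the task
        order.append(t)
        for s in succ[t]:          # release tasks waiting on t
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order

log = []
order = run_tasks(
    {"load": lambda: log.append("load"),
     "fft":  lambda: log.append("fft"),
     "save": lambda: log.append("save")},
    {"fft": ["load"], "save": ["fft"]})
```

A real runtime additionally chooses a CPU or GPU implementation per task and moves the data; the dependency bookkeeping, however, has exactly this shape.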
FluidFFT: common API (C++ and Python) for Fast Fourier Transform HPC libraries
The Python package fluidfft provides a common Python API for performing Fast
Fourier Transforms (FFT) sequentially, in parallel and on GPU with different
FFT libraries (FFTW, P3DFFT, PFFT, cuFFT). fluidfft is a comprehensive FFT
framework which allows Python users to easily and efficiently perform FFT and
associated tasks, such as computing linear operators and energy spectra. We
describe the architecture of the package, composed of C++ and Cython FFT
classes, Python "operator" classes and Pythran functions. The package supplies
utilities to easily test itself and benchmark the different FFT solutions for a
particular case and on a particular machine. We present a performance scaling
analysis on three different computing clusters and a microbenchmark showing
that fluidfft is an interesting solution to write efficient Python applications
using FFT.
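The kind of "associated task" the abstract mentions, an energy spectrum, is easy to show in pure Python. This is a pedagogical sketch of the underlying mathematics, not fluidfft's API, which instead wraps the optimized libraries listed above.

```python
import cmath

def fft(x):
    # recursive radix-2 Cooley-Tukey; len(x) must be a power of two
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])

def energy_spectrum(u):
    # modal energies E_k = |u_hat_k|^2 / n^2; by Parseval's theorem
    # their sum equals the mean energy (1/n) * sum(|u_j|^2)
    n = len(u)
    return [abs(c) ** 2 / n ** 2 for c in fft(u)]

spec = energy_spectrum([1.0, 1.0, 1.0, 1.0])  # all energy in mode 0
```

A library such as FFTW performs the same transform with vectorized, cache-aware kernels; the operator layer of fluidfft exists precisely so that user code can stay at this level of abstraction.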
Elements of Design for Containers and Solutions in the LinBox Library
We describe in this paper new design techniques used in the \cpp exact linear
algebra library \linbox, intended to make the library safer and easier to use
while keeping it generic and efficient. First, we review the new simplified
structure for containers, based on our \emph{founding scope allocation} model.
We explain design choices and their impact on coding: unification of our matrix
classes, a clearer model for matrices and submatrices, \etc Then we present a
variation of the \emph{strategy} design pattern that is comprised of a
controller--plugin system: the controller (solution) chooses among plug-ins
(algorithms) that always call back the controller for subtasks. We give
examples using the solution \mul. Finally, we present a benchmark architecture
that serves two purposes: providing the user with easier ways to produce
graphs, and creating a framework for automatically tuning the library and
supporting regression testing.
Comment: 8 pages, 4th International Congress on Mathematical Software, Seoul,
Korea, Republic of (2014)
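The controller--plugin variation of the strategy pattern can be sketched compactly. The Python below is a toy model of the idea, not LinBox's templated C++ interface, and all class names are hypothetical: the controller picks a plugin for a matrix product, and the block plugin calls the controller back for every sub-product.

```python
def split(M):
    # quadrants of an even-sized square matrix (list of rows)
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def madd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

class NaiveMul:
    # plugin: schoolbook product for small matrices
    def handles(self, A): return len(A) <= 2
    def run(self, ctrl, A, B):
        return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

class BlockMul:
    # plugin: splits into quadrants and calls the CONTROLLER back for
    # each sub-product -- the call-back at the heart of the pattern
    def handles(self, A): return len(A) > 2 and len(A) % 2 == 0
    def run(self, ctrl, A, B):
        A11, A12, A21, A22 = split(A)
        B11, B12, B21, B22 = split(B)
        top = [l + r for l, r in zip(
            madd(ctrl.mul(A11, B11), ctrl.mul(A12, B21)),
            madd(ctrl.mul(A11, B12), ctrl.mul(A12, B22)))]
        bot = [l + r for l, r in zip(
            madd(ctrl.mul(A21, B11), ctrl.mul(A22, B21)),
            madd(ctrl.mul(A21, B12), ctrl.mul(A22, B22)))]
        return top + bot

class MulController:
    # the "solution": selects the first plugin that accepts the input
    def __init__(self, plugins): self.plugins = plugins
    def mul(self, A, B):
        plugin = next(p for p in self.plugins if p.handles(A))
        return plugin.run(self, A, B)

ctrl = MulController([BlockMul(), NaiveMul()])
```

Because sub-products go back through the controller, a new plugin (say, a Strassen variant) benefits every caller without any plugin knowing about the others.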
Runtime-Flexible Multi-dimensional Arrays and Views for C++98 and C++0x
Multi-dimensional arrays are among the most fundamental and most useful data
structures of all. In C++, excellent template libraries exist for arrays whose
dimension is fixed at compile time. Arrays whose dimension can change at runtime
have been implemented in C. However, a generic object-oriented C++
implementation of runtime-flexible arrays has so far been missing. In this
article, we discuss our new implementation called Marray, a package of class
templates that fills this gap. Marray is based on views as an underlying
concept. This concept brings some of the flexibility known from scripting
languages such as R and MATLAB to C++. Marray is free for both commercial and
non-commercial use and is publicly available from www.andres.sc/marray.
Comment: Free source code available
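The view concept behind such a design is compact: an array is a flat buffer plus shape, strides and an offset, and the number of dimensions is ordinary run-time data. The sketch below shows the idea in Python (Marray itself is C++, and this is not its interface); note that transposing produces a new view on the same buffer without copying.

```python
class View:
    # an n-dimensional view over a flat buffer; the dimension is
    # len(self.shape), a run-time value, not a template parameter
    def __init__(self, data, shape, strides=None, offset=0):
        self.data, self.shape, self.offset = data, list(shape), offset
        if strides is None:            # default: C-order (row-major)
            strides, step = [], 1
            for extent in reversed(self.shape):
                strides.append(step)
                step *= extent
            strides.reverse()
        self.strides = list(strides)

    def __getitem__(self, index):
        # map a multi-index to a position in the flat buffer
        flat = self.offset + sum(i * s for i, s in zip(index, self.strides))
        return self.data[flat]

    def transposed(self):
        # new view on the SAME buffer: permuted metadata, no copy
        return View(self.data, self.shape[::-1], self.strides[::-1],
                    self.offset)

v = View(list(range(6)), (2, 3))   # a 2x3 view over [0, 1, 2, 3, 4, 5]
```

Sub-arrays, slices and axis permutations all reduce to arithmetic on shape, strides and offset, which is what makes views cheap.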
Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures
Feltor is a modular and free scientific software package. It allows
developing platform-independent code that runs on a variety of parallel
computer architectures ranging from laptop CPUs to multi-GPU distributed-memory
systems. Feltor consists of both a numerical library and a collection of
application codes built on top of the library. Its main targets are two- and
three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin
methods as the main numerical discretization technique. We observe that
numerical simulations of a recently developed gyro-fluid model produce
non-deterministic results in parallel computations. First, we show how we
restore accuracy and bitwise reproducibility algorithmically and
programmatically. In particular, we adopt an implementation of the exactly
rounded dot product based on long accumulators, which avoids accuracy losses
especially in parallel applications. However, reproducibility and accuracy
alone fail to indicate correct simulation behaviour. In fact, in the physical
model slightly different initial conditions lead to vastly different end
states. This behaviour translates to its numerical representation: pointwise
convergence, even in principle, becomes impossible for long simulation times.
In a second part, we explore important performance tuning considerations. We
identify latency and memory bandwidth as the main performance indicators of our
routines. Based on these, we propose a parallel performance model that predicts
the execution time of algorithms implemented in Feltor and test our model on a
selection of parallel hardware architectures. We are able to predict the
execution time with a relative error of less than 25% for problem sizes between
0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth
gives a minimum array size per compute node needed to achieve a scaling
efficiency above 50% (both strong and weak).
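Why a plain dot product is not reproducible, and what an exactly rounded one buys, can be seen in a few lines. Python's `math.fsum` serves here as a stand-in for an exactly rounded sum; Feltor's actual implementation uses long accumulators, which is a different mechanism with the same order-independent result.

```python
import math

def naive_dot(xs, ys):
    # plain left-to-right accumulation: the result depends on the
    # order in which the partial products are summed
    s = 0.0
    for x, y in zip(xs, ys):
        s += x * y
    return s

# terms of wildly different magnitude, as in large parallel reductions
xs, ys = [1e16, 1.0, -1e16], [1.0, 1.0, 1.0]

a = naive_dot(xs, ys)                  # the 1.0 is absorbed: 0.0
b = naive_dot([1e16, -1e16, 1.0], ys)  # same terms, reordered: 1.0
exact = math.fsum(x * y for x, y in zip(xs, ys))  # exactly rounded: 1.0
```

A parallel reduction changes the summation order between runs and partitionings, so `a` versus `b` is exactly the non-determinism the abstract describes; the exactly rounded sum returns the correctly rounded true result regardless of order.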
A machine vision extension for the Ruby programming language
Dynamically typed scripting languages have become popular in recent years.
Although interpreted languages allow for substantial reduction of software
development time, they are often rejected due to performance concerns. In this
paper we present an extension for the programming language Ruby, called
HornetsEye, which facilitates the development of real-time machine vision
algorithms within Ruby. Apart from providing integration of crucial libraries
for input and output, HornetsEye provides fast native implementations (compiled
code) for a generic set of array operators. Different array operators were
compared with equivalent implementations in C++. Not only was it possible to
achieve comparable real-time performance, but also to exceed the efficiency of
the C++ implementation in several cases. Implementations of several algorithms
are given to demonstrate how the array operators can be used to create concise
implementations.
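The "generic set of array operators" amounts to lifting scalar functions to whole arrays. A minimal interpreted sketch of that lifting is shown below in Python (HornetsEye is a Ruby extension and performs the same lifting by dispatching to compiled native loops, which is where its speed comes from; the names here are hypothetical).

```python
def elementwise(op):
    # lift a scalar function to an operator over whole arrays;
    # an interpreted toy -- a native backend would run this loop
    # as compiled code over a contiguous buffer
    def apply(*arrays):
        return [op(*vals) for vals in zip(*arrays)]
    return apply

add   = elementwise(lambda a, b: a + b)              # binary operator
clamp = elementwise(lambda a: min(max(a, 0), 255))   # unary, 8-bit range
```

Composing such operators lets whole vision pipelines (thresholding, differencing, masking) be written as a few expressions rather than explicit loops.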