10,761 research outputs found
XMDS2: Fast, scalable simulation of coupled stochastic partial differential equations
XMDS2 is a cross-platform, GPL-licensed, open source package for numerically
integrating initial value problems that range from a single ordinary
differential equation up to systems of coupled stochastic partial differential
equations. The equations are described in a high-level XML-based script, and
the package generates low-level optionally parallelised C++ code for the
efficient solution of those equations. It combines the advantages of high-level
simulations, namely fast and low-error development, with the speed, portability
and scalability of hand-written code. XMDS2 is a complete redesign of the XMDS
package, and features support for a much wider problem space while also
producing faster code.Comment: 9 pages, 5 figure
Loo.py: From Fortran to performance via transformation and substitution rules
A large amount of numerically-oriented code is written and is being written
in legacy languages. Much of this code could, in principle, make good use of
data-parallel throughput-oriented computer architectures. Loo.py, a
transformation-based programming system targeted at GPUs and general
data-parallel architectures, provides a mechanism for user-controlled
transformation of array programs. This transformation capability is designed to
not just apply to programs written specifically for Loo.py, but also those
imported from other languages such as Fortran. It eases the trade-off between
achieving high performance, portability, and programmability by allowing the
user to apply a large and growing family of transformations to an input
program. These transformations are expressed in and used from Python and may be
applied from a variety of settings, including a pragma-like manner from other
languages.Comment: ARRAY 2015 - 2nd ACM SIGPLAN International Workshop on Libraries,
Languages and Compilers for Array Programming (ARRAY 2015
Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions
Mathematical operators whose transformation rules constitute the building
blocks of a multi-linear algebra are widely used in physics and engineering
applications where they are very often represented as tensors. In the last
century, thanks to the advances in tensor calculus, it was possible to uncover
new research fields and make remarkable progress in the existing ones, from
electromagnetism to the dynamics of fluids and from the mechanics of rigid
bodies to quantum mechanics of many atoms. By now, the formal mathematical and
geometrical properties of tensors are well defined and understood; conversely,
in the context of scientific and high-performance computing, many tensor-
related problems are still open. In this paper, we address the problem of
efficiently computing contractions among two tensors of arbitrary dimension by
using kernels from the highly optimized BLAS library. In particular, we
establish precise conditions to determine if and when GEMM, the kernel for
matrix products, can be used. Such conditions take into consideration both the
nature of the operation and the storage scheme of the tensors, and induce a
classification of the contractions into three groups. For each group, we
provide a recipe to guide the users towards the most effective use of BLAS.Comment: 27 Pages, 7 figures and additional tikz generated diagrams. Submitted
to Applied Mathematics and Computatio
Overview of Parallel Platforms for Common High Performance Computing
The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, the methods exploiting the multicores central processing units such as message passing interface and OpenMP are taken into account. The properties of the programming methods are experimentally proved in the application of a fast Fourier transform and a discrete cosine transform and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU based computing methods and with possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and new, fast routines' implementation is proposed as well
- …