432 research outputs found
High Performance Composition Operators in Component Models
International audienceScientific numerical applications are always expecting more computing and storage capabilities to compute at finer grain and/or to integrate more phenomena in their computations. Even though, they are getting more complex to develop. However, the continual growth of computing and storage capabilities is achieved with an increase complexity of infrastructures. Thus, there is an important challenge to define programming abstractions able to deal with software and hardware complexity. An interesting approach is represented by software component models. This chapter first analyzes how high performance interactions are only partially supported by specialized component models. Then, it introduces HLCM, a component model that aims at efficiently supporting all kinds of static compositions
f90wrap : an automated tool for constructing deep Python interfaces to modern Fortran codes
Abstract: f90wrap is a tool to automatically generate Python extension modules which interface to Fortran libraries that makes use of derived types. It builds on the capabilities of the popular f2py utility by generating a simpler Fortran 90 interface to the original Fortran code which is then suitable for wrapping with f2py, together with a higher-level Pythonic wrapper that makes the existance of an additional layer transparent to the final user. f90wrap has been used to wrap a number of large software packages of relevance to the condensed matter physics community, including the QUIP molecular dynamics code and the CASTEP density functional theory code
On diagonally structured matrix computation
In this thesis, we have proposed efficient implementations of linear algebra kernels such as matrix-vector and matrix-matrix multiplications by formulating arithmetic calculations in terms of diagonals and thereby giving an orientation-neutral (column-/row-major layout) computational scheme. Matrix elements are accessed with stride-1 and no indirect referencing is involved. Access to the transposed matrix requires no additional effort. The proposed storage scheme handles dense matrices and matrices with special structures such as banded, symmetric in a uniform manner. Test results from numerical experiments with OpenMP implementation are promising. We also show that, using our diagonal framework, Java native arrays can yield superior computational performance. We present two alternative implementations for matrix-matrix multiplication operation in Java. The results from numerical testing demonstrate the advantage of our proposed methods
CoreTSAR: Task Scheduling for Accelerator-aware Runtimes
Heterogeneous supercomputers that incorporate computational accelerators
such as GPUs are increasingly popular due to their high
peak performance, energy efficiency and comparatively low cost.
Unfortunately, the programming models and frameworks designed
to extract performance from all computational units still lack the
flexibility of their CPU-only counterparts. Accelerated OpenMP
improves this situation by supporting natural migration of OpenMP
code from CPUs to a GPU. However, these implementations currently
lose one of OpenMPâs best features, its flexibility: typical
OpenMP applications can run on any number of CPUs. GPU implementations
do not transparently employ multiple GPUs on a node
or a mix of GPUs and CPUs. To address these shortcomings, we
present CoreTSAR, our runtime library for dynamically scheduling
tasks across heterogeneous resources, and propose straightforward
extensions that incorporate this functionality into Accelerated
OpenMP. We show that our approach can provide nearly linear
speedup to four GPUs over only using CPUs or one GPU while
increasing the overall flexibility of Accelerated OpenMP
Automatische Codegenerierung fĂŒr Massiv Parallele Applikationen in der Numerischen Strömungsmechanik
Solving partial differential equations (PDEs) is a fundamental challenge in many application domains in industry and academia alike. With increasingly large problems, efficient and highly scalable implementations become more and more crucial. Today, facing this challenge is more difficult than ever due to the increasingly heterogeneous hardware landscape. One promising approach is developing domainâspecific languages (DSLs) for a set of applications. Using code generation techniques then allows targeting a range of hardware platforms while concurrently applying domainâspecific optimizations in an automated fashion. The present work aims to further the state of the art in this field. As domain, we choose PDE solvers and, in particular, those from the group of geometric multigrid methods. To avoid having a focus too broad, we restrict ourselves to methods working on structured and patchâstructured grids.
We face the challenge of handling a domain as complex as ours, while providing different abstractions for diverse user groups, by splitting our external DSL ExaSlang into multiple layers, each specifying different aspects of the final application. Layer 1 is designed to resemble LaTeX and allows inputting continuous equations and functions. Their discretization is expressed on layer 2. It is complemented by algorithmic components which can be implemented in a Matlabâlike syntax on layer 3. All information provided to this point is summarized on layer 4, enriched with particulars about data structures and the employed parallelization. Additionally, we support automated progression between the different layers. All ExaSlang input is processed by our jointly developed Scala code generation framework to ultimately emit C++ code. We particularly focus on how to generate applications parallelized with, e.g., MPI and OpenMP that are able to run on workstations and largeâscale cluster alike.
We showcase the applicability of our approach by implementing simple test problems, like Poissonâs equation, as well as relevant applications from the field of computational fluid dynamics (CFD). In particular, we implement scalable solvers for the Stokes, NavierâStokes and shallow water equations (SWE) discretized using finite differences (FD) and finite volumes (FV). For the case of NavierâStokes, we also extend our implementation towards nonâuniform grids, thereby enabling static mesh refinement, and advanced effects such as the simulated fluid being nonâNewtonian and nonâisothermal
Tools and models for high level parallel and Grid programming
When algorithmic skeletons were first introduced by Cole in
late 1980 (50) the idea had an almost immediate success. The
skeletal approach has been proved to be effective when application algorithms can be expressed in terms of skeletons composition. However, despite both their effectiveness and the progress made in skeletal systems design and implementation, algorithmic skeletons remain absent from mainstream practice. Cole and other researchers, respectively in (51) and (19), focused the problem. They recognized the issues affecting
skeletal systems and stated a set of principles that have
to be tackled in order to make them more effective and to
take skeletal programming into the parallel mainstream. In
this thesis we propose tools and models for addressing some
among the skeletal programming environments issues. We
describe three novel approaches aimed at enhancing skeletons
based systems from different angles. First, we present a model we conceived that allows algorithmic skeletons customization exploiting the macro data-flow abstraction. Then we present two results about the exploitation of metaprogramming techniques for the run-time generation and optimization of macro data-flow graphs. In particular, we show how to generate and how to optimize macro data-flow graphs accordingly both to programmers provided non-functional requirements and to execution platform features. The last result we present are the Behavioural Skeletons, an approach aimed at addressing the limitations of skeletal programming environments when used for the development of component-based Grid applications.
We validated all the approaches conducting several test,
performed exploiting a set of tools we developed
- âŠ