77 research outputs found

A Survey on Hardware-aware and Heterogeneous Computing on Multicore Processors and Accelerators

Author: Buchty Rainer
Heuveline Vincent
Karl Wolfgang
Weiß Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2009

Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs

Author: Heuveline Vincent
Lukarski Dimitar
Trost Nico
Weiss Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2011

Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities, stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics and to maintain a moderate number of unknowns. However, flexibility in the meshes comes with severe drawbacks for parallel execution, especially with respect to the definition of adequate smoothers. This point becomes particularly pronounced in the framework of fine-grained parallelism on GPUs with hundreds of execution units. We use the approach of matrix-based multigrid, which has high flexibility and adapts well to the requirements of modern computing platforms. In this work we investigate multi-colored Gauß-Seidel type smoothers, the power(q)-pattern enhanced multi-colored ILU(p) smoothers with fill-ins

CiteSeerX

KITopen

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures.

Author: Geveler Markus
Gutwenger Carsten
Göddeke Dominik
Mallach Sven
Ribbrock Dirk
van Dyk Danny
Publication venue: Elsevier BV
Publication date: 01/01/2009

We present HONEI, an open-source collection of libraries offering a hardware-oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI's SSE backend, and an additional 3–4 and 4–16 times faster execution on the Cell and a GPU, respectively. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.

arXiv.org e-Print Archive

computer science publication server

Kölner UniversitätsPublikationsServer

Enhanced Parallel ILU(p)-based Preconditioners for Multi-core CPUs and GPUs - The Power(q)-pattern Method

Author: Heuveline Vincent
Lukarski Dimitar
Weiss Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2011

KITopen

Maximizing Communication Overlap with Dynamic Program Analysis

Author: Iancu Costin
Lavrijsen Wim
Saillard Emmanuelle
Sen Koushik
Publication venue: HAL CCSD
Publication date: 28/01/2018

We present a dynamic program analysis approach to optimize communication overlap in scientific applications. Our tool instruments the code to generate a trace of the application's memory and synchronization behavior. An offline analysis determines the optimal program points for maximal overlap when considering several programming constructs: non-blocking one-sided communication operations, non-blocking collectives, and bespoke synchronization patterns and operations. Feedback about possible transformations is presented to the user, and the tool can perform the directed transformations, which are supported by a lightweight runtime. The value of our approach comes from: 1) the ability to optimize across boundaries of software modules or libraries, while specializing for the intrinsics of the underlying communication runtime; and 2) providing upper bounds on the expected performance improvements after communication optimizations. We have reduced the time spent in communication by as much as 64% for several applications that were already aggressively optimized for overlap; this indicates that manual optimizations leave performance untapped. Although demonstrated mainly for the UPC programming language, the methodology can be easily adapted to any other communication and synchronization API.

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

An autotuning framework for Intel Xeon Phi platforms

Author: Christoforidis Eleftherios - Iordanis
Χριστοφορίδης Ελευθέριος - Ιορδάνης
Publication date: 15/09/2016

DSpace at NTUA