Search CORE

382 research outputs found

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

A Survey on Hardware-aware and Heterogeneous Computing on Multicore Processors and Accelerators

Author: Buchty Rainer
Heuveline Vincent
Karl Wolfgang
Weiß Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2009
Field of study

KITopen

Characterizing the efficiency of multicore and manycore processors for the solution of sparse linear systems

Author: Aliaga Estellés José Ignacio
Barreda Vayá Maria
Dufrechou Ernesto
Ezzatti Pablo
Quintana-Orti Enrique S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

We analyze the efficiency of servers equipped with state-of-the-art general-purpose multicore processors as well as platforms based on accelerators such as graphics processing units (GPUs) and the Intel Xeon Phi. Following the proposal recently advocated in the High Performance Conjugate Gradient (HPCG) benchmark, we leverage for this purpose efficient implementations of ILUPACK, a preconditioned solver for sparse linear systems that comprises numerical kernels and data access patterns analogous to those of HPCG. Our study analyzes the (computational) performance and energy efficiency, with two different metrics for each: time/floating-point throughput for the former; and energy/floating-point throughput-per-Watt for the latter.This work was supported by the CICYT project TIN2011-23283 of MINECO and FEDER, and the EU Project FP7 318793 “EXA2GREEN”

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Repositori Institucional de la Universitat Jaume I

Energy-Aware High Performance Computing

Author: Dolz Manuel F.
Heuveline Vincent
I. Malossi A. Cristiano
Ludwig Thomas
Quintana-Orti Enrique S.
Reza Heidari M.
Wlotzka Martin
Publication venue: 'IntechOpen'
Publication date: 22/03/2017
Field of study

High performance computing centres consume substantial amounts of energy to power large-scale supercomputers and the necessary building and cooling infrastructure. Recently, considerable performance gains resulted predominantly from developments in multi-core, many-core and accelerator technology. Computing centres rapidly adopted this hardware to serve the increasing demand for computational power. However, further performance increases in large-scale computing systems are limited by the aggregate energy budget required to operate them. Power consumption has become a major cost factor for computing centres. Furthermore, energy consumption results in carbon dioxide emissions, a hazard for the environment and public health; and heat, which reduces the reliability and lifetime of hardware components. Energy efficiency is therefore crucial in high performance computing

IntechOpen

Performance and Energy Optimization of the Iterative Solution of Sparse Linear Systems on Multicore Processors

Author: Barreda Vayá María
Publication venue: 'Universitat Jaume I'
Publication date: 01/01/2017
Field of study

En esta tesis doctoral se aborda la solución de sistemas dispersos de ecuaciones lineales utilizando métodos iterativos precondicionados basados en subespacios de Krylov. En concreto, se centra en ILUPACK, una biblioteca que implementa precondicionadores de tipo ILU multinivel para la solución eficiente de sistemas lineales dispersos. El incremento en el número de ecuaciones, y la aparición de nuevas arquitecturas, motiva el desarrollo de una versión paralela de ILUPACK que optimice tanto el tiempo de ejecución como el consumo energético en arquitecturas multinúcleo actuales y en clusters de nodos construidos con esta tecnología. El objetivo principal de la tesis es el diseño, implementación y valuación de resolutores paralelos energéticamente eficientes para sistemas lineales dispersos orientados a procesadores multinúcleo así como aceleradores hardware como el Intel Xeon Phi. Para lograr este objetivo, se aprovecha el paralelismo de tareas mediante OmpSs y MPI, y se desarrolla un entorno automático para detectar ineficiencias energéticas.In this dissertation we target the solution of large sparse systems of linear equations using preconditioned iterative methods based on Krylov subspaces. Specifically, we focus on ILUPACK, a library that offers multi-level ILU preconditioners for the effective solution of sparse linear systems. The increase of the number of equations and the introduction of new HPC architectures motivates us to develop a parallel version of ILUPACK which optimizes both execution time and energy consumption on current multicore architectures and clusters of nodes built from this type of technology. Thus, the main goal of this thesis is the design, implementation and evaluation of parallel and energy-efficient iterative sparse linear system solvers for multicore processors as well as recent manycore accelerators such as the Intel Xeon Phi. To fulfill the general objective, we optimize ILUPACK exploiting task parallelism via OmpSs and MPI, and also develope an automatic framework to detect energy inefficiencies

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Repositori Institucional de la Universitat Jaume I

Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs

Author: Heuveline Vincent
Lukarski Dimitar
Trost Nico
Weiss Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2011
Field of study

Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics and for maintaining a moderate number of unknowns. However, flexibility in the meshes goes along with severe drawbacks with respect to parallel execution – especially with respect to the definition of adequate smoothers. This point becomes in particular pronounced in the framework of fine-grained parallelism on GPUs with hundreds of execution units. We use the approach of matrixbased multigrid that has high flexibility and adapts well to the exigences of modern computing platforms. In this work we investigate multi-colored Gauß-Seidel type smoothers, the power(q)-pattern enhanced multi-colored ILU(p) smoothers with fillins

CiteSeerX

KITopen

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures.

Author: Geveler Markus
Gutwenger Carsten
Göddeke Dominik
Mallach Sven
Ribbrock Dirk
van Dyk Danny
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009
Field of study

We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straight forward C++ code using HONEI's SSE backend, and additional 3--4 and 4--16 times faster execution on the Cell and a GPU. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development

arXiv.org e-Print Archive

computer science publication server

Kölner UniversitätsPublikationsServer

An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

Author: Heinlein Alexander
Rajamanickam Sivasankaran
Yamazaki Ichitaro
Publication venue
Publication date: 10/04/2023
Field of study

The generalized Dryja--Smith--Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial different equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package which implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both solver's computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by factors of about

2\times

using GPUs, while the GPU acceleration of the numerical setup time depend on the solver options and the local matrix sizes.Comment: Accepted for publication in IPDPS'2

arXiv.org e-Print Archive