Search CORE

74 research outputs found

ParIC: A Family of Parallel Incomplete Cholesky Preconditioners

Author: B.F. Smith
G. Haase
G.H. Golub
H.A. van der Vorst
I.S. Duff
J.A. Meijerink
J.J. Dongarra
M. Magolu monga
R. Beauwens
R.F. Barret
S. Doi
Y. Notay
Publication date: 01/05/2000

A class of parallel incomplete factorization preconditioners for the solution of large linear systems is investigated. The approach may be regarded as a generalized domain decomposition method. Adjacent subdomains have to communicate both while the preconditioner is set up and while it is applied. Overlap is not necessary to achieve high performance. Fill-in levels are considered globally. If necessary, the technique may be implemented as a global reordering of the unknowns. Experimental results are reported for two-dimensional problems.
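For orientation, here is a minimal sequential zero-fill incomplete Cholesky, IC(0), the basic building block that incomplete factorization preconditioners start from. It is not the ParIC method: the parallel subdomain splitting and the global treatment of fill-in levels described above are omitted, and the function name `ic0` and the dense list-of-lists input are illustrative assumptions. The factor keeps the nonzero pattern of the lower triangle of A and discards every fill-in entry.

```python
import math


def ic0(a):
    """Hypothetical helper: zero-fill incomplete Cholesky of a dense SPD matrix.

    Returns a lower-triangular L (list of lists) whose nonzeros are restricted
    to the nonzero pattern of the lower triangle of `a`, so L @ L.T ~= a.
    """
    n = len(a)
    # Start from the lower triangle of a; entries outside the pattern stay 0.
    l = [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    pattern = [[a[i][j] != 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        if l[k][k] <= 0.0:
            raise ValueError(f"IC(0) breakdown: non-positive pivot at row {k}")
        l[k][k] = math.sqrt(l[k][k])
        for i in range(k + 1, n):
            if pattern[i][k]:
                l[i][k] /= l[k][k]
        # Right-looking update of the trailing submatrix, restricted to
        # positions already in the pattern (fill-in is dropped).
        for j in range(k + 1, n):
            if not pattern[j][k]:
                continue
            for i in range(j, n):
                if pattern[i][j] and pattern[i][k]:
                    l[i][j] -= l[i][k] * l[j][k]
    return l
```

On a tridiagonal matrix no fill-in arises, so IC(0) coincides with the exact Cholesky factor; on a 2 x 2 grid Laplacian (unknowns 0 to 3 in row-major order) the fill-in entry at position (2, 1) is discarded, which is exactly what makes the factor "incomplete".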

Crossref

Utrecht University Repository

Knowledge-Based Automatic Generation of Linear Algebra Algorithms and Code

Author: Fabregat-Traver Diego
Publication date: 01/01/2013

This dissertation focuses on the design and implementation of domain-specific compilers for linear algebra matrix equations. The development of efficient libraries for such equations, which lie at the heart of most software for scientific computing, is a complex process that requires expertise in a variety of areas, including the application domain, algorithms, numerical analysis, and high-performance computing. Moreover, the process involves the collaboration of several people over a considerable amount of time. With our compilers, we aim to relieve developers of both designing algorithms and writing code, and to generate routines that match or even surpass the performance of those written by human experts.

arXiv.org e-Print Archive

Publikationsserver der RWTH Aachen University

Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor

Author: Broquedis Francois
Ferreira Lima Joao Vicente
Gautier Thierry
Raffin Bruno
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/10/2013

This paper presents preliminary performance comparisons of parallel applications developed natively for the Intel Xeon Phi accelerator using three different parallel programming environments and their associated runtime systems. We compare Intel OpenMP, Intel CilkPlus, and XKaapi on the same benchmark suite, and we compare an Intel Xeon Phi coprocessor with a Sandy Bridge Xeon-based machine. Our benchmark suite is composed of three computing kernels: a Fibonacci computation that makes it possible to study the overhead and scalability of the runtime system, an NQueens application that generates irregular and dynamic tasks, and a Cholesky factorization algorithm. We also compare the Cholesky factorization with the parallel algorithm provided by the Intel MKL library for Intel Xeon Phi. Performance evaluation shows that our XKaapi data-flow parallel programming environment has the lowest overhead of the three and is highly competitive with the native OpenMP and CilkPlus environments on Xeon Phi. Moreover, the efficient handling of data-flow dependencies between tasks lets XKaapi exhibit more parallelism for some applications, such as the Cholesky factorization. In that case, we observe substantial gains over the state-of-the-art MKL with up to 180 hardware threads, including a 47% performance increase for 60 hardware threads.
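A Cholesky factorization is commonly expressed as a tiled algorithm whose tasks (the LAPACK/BLAS-style kernels POTRF, TRSM, SYRK, and GEMM applied to tiles) form a dependency graph that a data-flow runtime can schedule. The Python sketch below is only an illustration under that assumption; it is not XKaapi code and does not use any XKaapi API, and the function names are hypothetical. It enumerates the tasks of a right-looking tiled Cholesky and derives each task's dependencies from the tiles it reads and writes.

```python
def tiled_cholesky_tasks(t):
    """Hypothetical helper: tasks and data-flow dependencies of a right-looking
    tiled Cholesky on a t x t grid of tiles (lower triangle only)."""
    tasks = []  # (name, tiles read, tiles written); a tile is (row, col)
    for k in range(t):
        tasks.append((f"POTRF({k})", [], [(k, k)]))
        for i in range(k + 1, t):
            tasks.append((f"TRSM({i},{k})", [(k, k)], [(i, k)]))
        for i in range(k + 1, t):
            tasks.append((f"SYRK({i},{k})", [(i, k)], [(i, i)]))
            for j in range(k + 1, i):
                tasks.append((f"GEMM({i},{j},{k})", [(i, k), (j, k)], [(i, j)]))
    # Each task depends on the last writer of every tile it reads or updates
    # (read-after-write and write-after-write). Write-after-read is not
    # tracked: here every tile is read only after receiving its final value.
    last_writer = {}
    deps = {}
    for name, reads, writes in tasks:
        deps[name] = {last_writer[tile] for tile in reads + writes if tile in last_writer}
        for tile in writes:
            last_writer[tile] = name
    return [name for name, _, _ in tasks], deps


def critical_path_length(names, deps):
    """Longest dependency chain; `names` is already in a valid sequential order."""
    depth = {}
    for name in names:
        depth[name] = 1 + max((depth[d] for d in deps[name]), default=0)
    return max(depth.values())


names, deps = tiled_cholesky_tasks(4)
print(len(names), critical_path_length(names, deps))
```

For a 4 x 4 tile grid this gives 20 tasks but a critical path of only 10, so many tasks can run concurrently once their inputs are ready; that slack is what a data-flow scheduler can exploit, whereas a fork-join schedule with a barrier after each step k would not overlap consecutive steps.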

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Automatic Structure Detection in Constraints of Tabular Data

Author: A. Hundepool
B.W. Kernighan
G. Karypis
G.B. Dantzig
J. Castro
J. Gondzio
J.F. Benders
J.J. Salazar
L.H. Cox
M. Fischetti
R.E. Bixby
S.J. Wright
S.P. Bradley
Y.F. Hu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Methods for the protection of statistical tabular data, such as controlled tabular adjustment, cell suppression, or controlled rounding, need to solve several linear programming subproblems. For large multi-dimensional linked and hierarchical tables, such subproblems turn out to be computationally challenging. One of the techniques used to reduce the solution time of mathematical programming problems is to exploit the constraint structure with a specialized algorithm. Two of the most common structures are block-angular matrices with either linking rows (primal block-angular structure) or linking columns (dual block-angular structure). Although constraints associated with tabular data intrinsically have a lot of structure, current software for tabular data protection neither details nor exploits it, and simply provides a single matrix, or at most a set of smaller submatrices. In this work we provide an efficient tool for the automatic detection of primal or dual block-angular structure in constraint matrices. We test it on some of the complex CSPLIB instances, showing that when the number of linking rows or columns is small, the computational savings are significant.
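A simple way to check a candidate primal block-angular structure is sketched below, under the assumption that the linking rows are already known; the tool described above detects them automatically, and how it does so is not reproduced here. Once the linking rows are set aside, any two rows that share a column must belong to the same diagonal block, so the blocks are the connected components found with union-find. The function `primal_blocks` and the row-as-set-of-columns input format are hypothetical.

```python
def primal_blocks(rows, linking):
    """Hypothetical helper: group non-linking rows into diagonal blocks.

    rows: list of sets, rows[r] = column indices with a nonzero in row r.
    linking: set of row indices treated as linking rows.
    Returns a list of (row_indices, column_indices) pairs, one per block.
    """
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Columns that appear together in a non-linking row end up in one set.
    for r, cols in enumerate(rows):
        if r in linking:
            continue
        cols = sorted(cols)
        for c in cols:
            parent.setdefault(c, c)
        for c in cols[1:]:
            union(cols[0], c)

    # Each non-linking, non-empty row belongs to the set of its columns.
    blocks = {}
    for r, cols in enumerate(rows):
        if r in linking or not cols:
            continue
        root = find(next(iter(cols)))
        blk_rows, blk_cols = blocks.setdefault(root, (set(), set()))
        blk_rows.add(r)
        blk_cols.update(cols)
    return sorted(((sorted(r), sorted(c)) for r, c in blocks.values()),
                  key=lambda b: b[0][0])


rows = [{0, 1}, {1, 2}, {3, 4}, {4}, {0, 3}]
print(primal_blocks(rows, linking={4}))
```

Here row 4 couples columns 0 and 3; treating it as a linking row exposes two independent diagonal blocks, while without it the whole matrix is one block. Applying the same function to the transpose (columns as sets of rows, with a set of linking columns) would give the dual block-angular case.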

CiteSeerX

Crossref

X-Kaapi: a Multi Paradigm Runtime for Multicore Architectures

Author: Faucher Vincent
Gautier Thierry
Lementec Fabien
Raffin Bruno
Publication venue: HAL CCSD
Publication date: 01/10/2013

The paper presents X-Kaapi, a compact runtime for multicore architectures that brings multiple parallel paradigms (parallel independent loops, fork-join tasks, and dataflow tasks) into a unified framework without a performance penalty. Comparisons with OpenMP on independent loops and with QUARK/PLASMA on dense linear algebra confirm our design decisions. Applied to EUROPLEXUS, an industrial simulation code for fast transient dynamics, we show that X-Kaapi achieves high speedups on multicore architectures by efficiently parallelizing both independent loops and dataflow tasks.

This report presents X-Kaapi, a runtime system for multicore architectures that allows several parallel programming paradigms (independent loops, fork-join, and dataflow) to be used together. Its runtime overheads are low. We present comparisons with OpenMP for programming independent loops, and with QUARK/PLASMA on dense linear algebra problems. Finally, we present the results obtained when parallelizing EUROPLEXUS, a fast-dynamics code that uses several of these paradigms.

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-CEA

A bibliography on parallel and vector numerical algorithms

Author: Ortega J. M.
Voigt R. G.

This is a bibliography of numerical methods. It also includes a number of references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

NASA Technical Reports Server

Multicore Performance of Block Algebraic Iterative Reconstruction Methods

Author: Hansen Per Christian
Sørensen Hans Henrik B.
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2014

Online Research Database In Technology

Graph partitioning using matrix values for preconditioning symmetric positive definite systems

Author: Saad Yousef
Sosonkina Masha
Vecharynski Eugene
Publication date: 17/11/2013

Before a large linear system can be solved in parallel, its equations/unknowns must be partitioned. Standard partitioning algorithms are designed with the efficiency of the parallel matrix-vector multiplication in mind, and typically disregard the information about the coefficients of the matrix. This information, however, may have a significant impact on the quality of the preconditioning procedure used within the chosen iterative scheme. In the present paper, we suggest a spectral partitioning algorithm that takes into account the information about the matrix coefficients and constructs partitions with the objective of enhancing the quality of the nonoverlapping additive Schwarz (block Jacobi) preconditioning for symmetric positive definite linear systems. For a set of test problems with large variations in the magnitudes of the matrix coefficients, our numerical experiments demonstrate a noticeable improvement in the convergence of the resulting solution scheme when the new partitioning approach is used.
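To illustrate why coefficient values can change a partition, here is a plain spectral bisection that weights each graph edge by |a_ij| and splits the unknowns by the sign of an approximate Fiedler vector of the resulting graph Laplacian, computed with shifted power iteration. This is a swapped-in textbook technique, not the algorithm proposed in the paper; the |a_ij| weighting, the iteration count, the deterministic start vector, and the name `fiedler_bisection` are assumptions. It also assumes the graph is connected and that the second-smallest Laplacian eigenvalue is simple.

```python
import math


def fiedler_bisection(a, iters=500):
    """Hypothetical helper: two-way split of the unknowns of a symmetric matrix
    by the sign of an approximate Fiedler vector, with edge weights |a_ij|."""
    n = len(a)
    # Edge weights from coefficient magnitudes; the diagonal is ignored.
    w = [[abs(a[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # L = D - W has eigenvalues in [0, 2 * max degree], so c*I - L is positive
    # semidefinite; restricted to vectors orthogonal to the constant vector,
    # its dominant eigenvector is the Fiedler vector of L.
    c = 2.0 * max(deg)
    x = [i - (n - 1) / 2.0 for i in range(n)]  # zero-mean deterministic start
    for _ in range(iters):
        # y = (c*I - L) x = (c - deg) * x + W x
        y = [(c - deg[i]) * x[i] + sum(w[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # remove the constant (trivial) eigenvector
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    part0 = [i for i in range(n) if x[i] < 0.0]
    part1 = [i for i in range(n) if x[i] >= 0.0]
    return part0, part1


def chain(weights):
    """Symmetric chain matrix with off-diagonal entries -weights[i]."""
    n = len(weights) + 1
    a = [[0.0] * n for _ in range(n)]
    for i, wt in enumerate(weights):
        a[i][i + 1] = a[i + 1][i] = -wt
    return a


print(fiedler_bisection(chain([10, 0.01, 10, 10, 10])))
print(fiedler_bisection(chain([10, 10, 10, 10, 10])))
```

Both chains have the same sparsity pattern, so a pattern-only partitioner cannot tell them apart. With a weak coefficient between unknowns 1 and 2, the split follows the weak coupling ({0, 1} versus {2, 3, 4, 5}), keeping strongly coupled unknowns in the same block; with uniform coefficients it falls in the middle of the chain.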

arXiv.org e-Print Archive

Old Dominion University