74 research outputs found
ParIC : A Family of Parallel Incomplete Cholesky Preconditioners
A class of parallel incomplete factorization preconditionings
for the solution of large linear systems is investigated. The approach may
be regarded as a generalized domain decomposition method. Adjacent
subdomains have to communicate during the setting up of the
preconditioner, and during the application of the preconditioner. Overlap is
not necessary to achieve high performance. Fill-in levels are considered
in a global way. If necessary, the technique may be implemented as a
global re-ordering of the unknowns. Experimental results are reported
for two-dimensional problems.
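As a point of reference for the incomplete Cholesky idea mentioned in this abstract, here is a minimal serial sketch of the zero-fill variant, IC(0). It is not the ParIC method itself: the function name `ic0`, the dense list-of-lists storage, and the choice of zero fill-in are assumptions made for illustration only.

```python
import math

def ic0(a):
    """Zero-fill incomplete Cholesky (IC(0)) of a symmetric positive definite matrix.

    `a` is a dense list of lists (hypothetical storage chosen for brevity).
    Returns a lower-triangular factor L whose nonzeros are confined to the
    nonzero pattern of the lower triangle of `a` (no fill-in is created).
    """
    n = len(a)
    # Record the sparsity pattern once, so exact cancellation cannot change it.
    pattern = [[j <= i and a[i][j] != 0 for j in range(n)] for i in range(n)]
    l = [[float(a[i][j]) if pattern[i][j] else 0.0 for j in range(n)]
         for i in range(n)]
    for k in range(n):
        # Incomplete factorizations can break down on a non-positive pivot;
        # math.sqrt raises ValueError in that case.
        l[k][k] = math.sqrt(l[k][k])
        for i in range(k + 1, n):
            if pattern[i][k]:
                l[i][k] /= l[k][k]
        for j in range(k + 1, n):
            if not pattern[j][k]:
                continue
            for i in range(j, n):
                # Update only entries already present in the pattern.
                if pattern[i][j] and pattern[i][k]:
                    l[i][j] -= l[i][k] * l[j][k]
    return l
```

For a tridiagonal matrix the exact Cholesky factor has no fill-in, so IC(0) reproduces it. For matrices whose exact factor does fill in, the dropped entries are what make the factor only an approximation, which is why it is used as a preconditioner rather than as a direct solver.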
Knowledge-Based Automatic Generation of Linear Algebra Algorithms and Code
This dissertation focuses on the design and the implementation of
domain-specific compilers for linear algebra matrix equations. The development
of efficient libraries for such equations, which lie at the heart of most
software for scientific computing, is a complex process that requires expertise
in a variety of areas, including the application domain, algorithms, numerical
analysis and high-performance computing. Moreover, the process involves the
collaboration of several people for a considerable amount of time. With our
compilers, we aim to relieve the developers from both designing algorithms and
writing code, and to generate routines that match or even surpass the
performance of those written by human experts.
Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor
This paper presents preliminary performance comparisons of parallel applications developed natively for the Intel Xeon Phi accelerator using three different parallel programming environments and their associated runtime systems. We compare Intel OpenMP, Intel CilkPlus and XKaapi on the same benchmark suite, and we provide comparisons between an Intel Xeon Phi coprocessor and a Sandy Bridge Xeon-based machine. Our benchmark suite is composed of three computing kernels: a Fibonacci computation that makes it possible to study the overhead and the scalability of the runtime system, an NQueens application generating irregular and dynamic tasks, and a Cholesky factorization algorithm. We also compare the Cholesky factorization with the parallel algorithm provided by the Intel MKL library for Intel Xeon Phi. Performance evaluation shows that our XKaapi data-flow parallel programming environment has the lowest overhead of all and is highly competitive with the native OpenMP and CilkPlus environments on Xeon Phi. Moreover, the efficient handling of data-flow dependencies between tasks lets our XKaapi environment expose more parallelism for some applications such as the Cholesky factorization. In that case, we observe substantial gains with up to 180 hardware threads over the state-of-the-art MKL, with a 47% performance increase for 60 hardware threads.
Automatic Structure Detection in Constraints of Tabular Data
Methods for the protection of statistical tabular data, such as controlled tabular adjustment, cell suppression, or controlled rounding, need to solve several linear programming subproblems. For large multi-dimensional linked and hierarchical tables, such subproblems turn out to be computationally challenging. One of the techniques used to reduce the solution time of mathematical programming problems is to exploit the constraint structure using some specialized algorithm. Two of the most common structures are block-angular matrices with either linking rows (primal block-angular structure) or linking columns (dual block-angular structure). Although constraints associated with tabular data intrinsically have a lot of structure, current software for tabular data protection neither details nor exploits it, and simply provides a single matrix, or at most a set of smaller submatrices. We provide in this work an efficient tool for the automatic detection of primal or dual block-angular structure in constraint matrices. We test it on some of the complex CSPLIB instances, showing that when the number of linking rows or columns is small, the computational savings are significant.
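To make the primal block-angular structure concrete, here is a small sketch that recovers the diagonal blocks once the linking rows are known. It is not the detection tool described in the abstract, which also has to identify the linking rows: the function name `find_blocks`, the row-as-set-of-columns representation, and the assumption that linking rows are given are all hypothetical choices for illustration.

```python
def find_blocks(rows, linking_rows):
    """Group the non-linking rows of a sparse matrix into independent blocks.

    rows: list of sets, rows[r] = column indices with a nonzero in row r.
    linking_rows: set of row indices assumed to be the linking constraints.
    Returns a sorted list of (row indices, column indices) pairs, one per block.
    """
    parent = {}

    def find(c):
        # Union-find lookup with path halving over column indices.
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Columns sharing a non-linking row must belong to the same block.
    for r, cols in enumerate(rows):
        if r in linking_rows or not cols:
            continue
        ordered = sorted(cols)
        find(ordered[0])
        for c in ordered[1:]:
            union(ordered[0], c)

    # Collect the rows and columns of each connected component.
    blocks = {}
    for r, cols in enumerate(rows):
        if r in linking_rows or not cols:
            continue
        block_rows, block_cols = blocks.setdefault(find(min(cols)), (set(), set()))
        block_rows.add(r)
        block_cols.update(cols)
    return sorted((sorted(rs), sorted(cs)) for rs, cs in blocks.values())


# Row 4 touches columns from both groups, so it is treated as a linking row.
example_rows = [{0, 1}, {1, 2}, {3, 4}, {4, 5}, {0, 3, 5}]
example_blocks = find_blocks(example_rows, {4})
```

In this example the four remaining rows split into two blocks, rows 0 and 1 over columns 0 to 2 and rows 2 and 3 over columns 3 to 5. If row 4 were not excluded, it would merge everything into a single block, which illustrates why a small set of linking rows matters.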
X-Kaapi: a Multi Paradigm Runtime for Multicore Architectures
The paper presents X-Kaapi, a compact runtime for multicore architectures that brings multiple parallel paradigms (parallel independent loops, fork-join tasks and dataflow tasks) into a unified framework without performance penalty. Comparisons on independent loops with OpenMP and on dense linear algebra with QUARK/PLASMA confirm our design decisions. Applied to EUROPLEXUS, an industrial simulation code for fast transient dynamics, we show that X-Kaapi achieves high speedups on multicore architectures by efficiently parallelizing both independent loops and dataflow tasks.
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Graph partitioning using matrix values for preconditioning symmetric positive definite systems
Prior to the parallel solution of a large linear system, it is required to
perform a partitioning of its equations/unknowns. Standard partitioning
algorithms are designed using the considerations of the efficiency of the
parallel matrix-vector multiplication, and typically disregard the information
on the coefficients of the matrix. This information, however, may have a
significant impact on the quality of the preconditioning procedure used within
the chosen iterative scheme. In the present paper, we suggest a spectral
partitioning algorithm, which takes into account the information on the matrix
coefficients and constructs partitions with respect to the objective of
enhancing the quality of the nonoverlapping additive Schwarz (block Jacobi)
preconditioning for symmetric positive definite linear systems. For a set of
test problems with large variations in magnitudes of matrix coefficients, our
numerical experiments demonstrate a noticeable improvement in the convergence
of the resulting solution scheme when using the new partitioning approach.