10 research outputs found

A parallel tiled solver for dense symmetric indefinite systems on multicore

Author: Dulceneia Becker
Jack Dongarra
Marc Baboulin
Publication date: 12/10/2011

architecture

Towards a Parallel Tile LDL Factorization for Multicore Architectures

Author: Dulceneia Becker
Jack Dongarra
Mathieu Faverge
Publication date: 01/01/2011

The increasing number of cores in modern architectures requires the development of new algorithms as a means of achieving concurrency and hence scalability. This paper presents an algorithm to compute the LDL^T factorization of symmetric indefinite matrices without taking pivoting into consideration. The algorithm, based on the factorizations presented by Buttari et al. [11], represents operations as a sequence of small tasks that operate on square blocks of data. These tasks can be scheduled for execution based on the dependencies among them and on the computational resources available. This allows out-of-order execution of tasks, which removes the intrinsically sequential nature of the factorization. Numerical and performance results are presented. Numerical results were limited to matrices for which pivoting is not numerically necessary. A performance comparison between the LDL^T, Cholesky and LU factorizations is presented, along with the performance of the kernels required by LDL^T, which are an extension of level-3 BLAS kernels.
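As an illustration of the factorization this abstract refers to, the following is a minimal sketch of an unpivoted LDL^T factorization in plain Python. It is a scalar, sequential version, not the tiled, task-based algorithm of the paper; the names `ldlt_no_pivot` and `reconstruct` and the test matrix are assumptions made for this example only.

```python
def ldlt_no_pivot(a):
    """Return (L, d) such that A = L * diag(d) * L^T, without pivoting.

    L is unit lower triangular (list of rows); d holds the diagonal of D.
    Raises ZeroDivisionError if a zero pivot appears, which is the case
    where pivoting would be required.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry of D for column j.
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] == 0.0:
            raise ZeroDivisionError("zero pivot: pivoting would be required")
        # Entries of L below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


def reconstruct(L, d):
    """Multiply L * diag(d) * L^T back together to check the factorization."""
    n = len(d)
    return [[sum(L[i][k] * d[k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical symmetric indefinite matrix that needs no pivoting.
A = [[4.0, 2.0, -2.0],
     [2.0, -3.0, 1.0],
     [-2.0, 1.0, 5.0]]
L, d = ldlt_no_pivot(A)
print(d)  # the mixed signs in d show that A is indefinite
```

A tiled variant would apply the same recurrences to square blocks instead of scalars, so that each block update becomes an independently schedulable task.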

CiteSeerX

Hal-Diderot

Communication-Avoiding Symmetric-Indefinite Factorization

Author: Dulceneia Becker
Grey Ballard
Jack Dongarra
James Demmel

We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen’s triangular tridiagonalization. It factors a dense symmetric matrix A as the product A = P L T L^T P^T, where P is a permutation matrix, L is lower triangular, and T is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy for almost any cache-line size. Adaptations of the algorithm to parallel computers are likely to be communication efficient as well; one such adaptation has been recently published. The current paper describes the algorithm, proves that it is numerically stable, and proves that it is communication optimal.

CiteSeerX

An efficient distributed randomized solver with application to large dense linear

Author: Anthony Danalis
Dulceneia Becker
George Bosilca
Jack Dongarra
Marc Baboulin
Publication date: 16/08/2012

system

HAL-CentraleSupelec

CiteSeerX

HAL-Rennes 1

An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems

Author: Marc Baboulin
Dulceneia Becker
George Bosilca
Anthony Danalis
Jack Dongarra
Publication venue: Elsevier BV
Publication date: 01/07/2014

Randomized algorithms are gaining ground in high-performance computing applications as they have the potential to outperform deterministic methods, while still providing accurate results. We propose a randomized solver for distributed multicore architectures to efficiently solve large dense symmetric indefinite linear systems that are encountered, for instance, in parameter estimation problems or electromagnetism simulations. The contribution of this paper is to propose efficient kernels for applying random butterfly transformations and a new distributed implementation combined with a runtime (PaRSEC) that automatically adjusts data structures, data mappings, and the scheduling as systems scale up. Both the parallel distributed solver and the supporting runtime environment are innovative. To our knowledge, the randomization approach associated with this solver has never been used in public domain software for symmetric indefinite systems. The underlying runtime framework allows seamless data mapping and task scheduling, mapping its capabilities to the underlying hardware features of heterogeneous distributed architectures. The performance of our software is similar to that obtained for symmetric positive definite systems, and the solver requires only half the execution time and half the amount of data storage of a general dense solver.
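To make the idea of a random butterfly transformation concrete, here is a small sketch in plain Python. It assumes the definition commonly used in the random butterfly literature, B = (1/sqrt(2)) [[R0, R1], [R0, -R1]] with R0 and R1 random nonsingular diagonal matrices, and applies a single (depth-1) butterfly U symmetrically as U^T A U. The abstract does not specify these details, so the recursion depth, the distribution of the diagonal entries, and all function names are assumptions of this example; the sketch also forms dense matrices, whereas the paper proposes dedicated kernels.

```python
import math
import random


def butterfly(n, rng):
    """Dense n-by-n butterfly (1/sqrt(2)) [[R0, R1], [R0, -R1]], n even.

    R0 and R1 are diagonal with entries exp(r), r uniform in [-0.05, 0.05]
    (an assumed choice that keeps the entries close to 1).
    """
    h = n // 2
    r0 = [math.exp(rng.uniform(-0.05, 0.05)) for _ in range(h)]
    r1 = [math.exp(rng.uniform(-0.05, 0.05)) for _ in range(h)]
    s = 1.0 / math.sqrt(2.0)
    B = [[0.0] * n for _ in range(n)]
    for i in range(h):
        B[i][i] = s * r0[i]
        B[i][h + i] = s * r1[i]
        B[h + i][i] = s * r0[i]
        B[h + i][h + i] = -s * r1[i]
    return B


def matmul(x, y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def randomize(a, rng):
    """Return (U^T A U, U) for a random depth-1 butterfly U."""
    U = butterfly(len(a), rng)
    return matmul(matmul(transpose(U), a), U), U


# Hypothetical symmetric matrix with a zero diagonal: an unpivoted
# factorization of A would fail at the very first pivot.
A = [[0.0, 1.0, 2.0, 3.0],
     [1.0, 0.0, 4.0, 5.0],
     [2.0, 4.0, 0.0, 6.0],
     [3.0, 5.0, 6.0, 0.0]]
A_r, U = randomize(A, random.Random(0))
print(A_r[0][0])  # nonzero leading entry after randomization
```

Under these assumptions, a system A x = b would be solved by factoring A_r without pivoting, solving A_r y = U^T b, and recovering x = U y.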

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

a

Author: Alex Druinsky
Dulceneia Becker
Grey Ballard
Ichitaro Yamazaki
Inon Peled
Jack Dongarra
James Demmel
Oded Schwartz
Sivan Toledo
Publication date: 08/04/2013

a blocked Aasen’s algorithm wit

CiteSeerX

Crossref

Communication-Avoiding Symmetric-Indefinite Factorization

Author: Alex Druinsky
Dulceneia Becker
Grey Ballard
Ichitaro Yamazaki
Inon Peled
Jack Dongarra
James Demmel
Oded Schwartz
Sivan Toledo
Publication venue: Society for Industrial & Applied Mathematics (SIAM)

Crossref