176 research outputs found

The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form

Author: C. Bischof
C. Lawson
David W. Walker
E. Anderson
G.C. Fox
G.H. Golub
J. Dongarra
J.J. Dongarra
J.J. Dongarra
J.J. Dongarra
J.J. Dongarra
J.J. Dongarra
J.J. Dongarra
Jack J. Dongarra
Jaeyoung Choi
R. Schreiber
W. Lichtenstein
Publication venue: 'Springer Science and Business Media LLC'

Crossref

The quest for petascale computing

Author: D.W. Walker
J.J. Dongarra
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Algorithmic redistribution methods for block-cyclic decompositions

Author: A.P. Petitet
J.J. Dongarra
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems.

Author: Charrier Dominic E.
Deelman Ewa
Dongarra J.J.
Karczewski Konrad
Weinzierl Tobias
Wyrzykowski Roman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that make the algorithm expose a high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and consider their findings craftsmanship not worth discussion. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning here is important as the grid and task workload are multifaceted and change frequently during runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms and we clarify that such an approach does not free users completely from grain size awareness.

Durham Research Online

Crossref

Parallel computation of echelon forms

Author: A. Buttari
C.-P. Jeannerod
F. Broquedis
F.G. Gustavson
J. Kurzak
J.-C. Faugère
J.-G. Dumas
J.-G. Dumas
J.J. Dongarra
J.V. Gathen
S. Toledo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. Therefore, it is mandatory to delay these reductions as much as possible while mixing fine-grain parallelizations of tiled iterative and recursive algorithms. Second, fast linear algebra variants, e.g., using the Strassen-Winograd algorithm, never suffer from instability and can thus be widely used in cascade with the classical algorithms. There, trade-offs are to be made between block sizes well suited to those fast variants and block sizes well suited to load and communication balancing. Third, many applications over finite fields require the rank profile of the matrix (quite often rank deficient) rather than the solution to a linear system. It is thus important to design parallel algorithms that preserve and compute this rank profile. Moreover, as the rank profile is only discovered during the algorithm, the block size has to be dynamic. We propose and compare several block decompositions: tile iterative with left-looking, right-looking and Crout variants, slab and tile recursive. Experiments demonstrate that the tile recursive variant performs better and matches the performance of reference numerical software when no rank deficiency occurs. Furthermore, even in the most heterogeneous case, namely when all pivot blocks are rank deficient, we show that it is possible to maintain a high efficiency.

HAL-ENS-LYON

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot

Fast DEM collision checks on multicore nodes.

Author: Deelman Ewa
Dongarra J.J.
Karczewski Konrad
Koziara Tomasz
Krestenitis Konstantinos
Weinzierl Tobias
Wyrzykowski Roman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Many particle simulations today rely on spherical or analytical particle shape descriptions. They find non-spherical, triangulated particle models computationally infeasible due to expensive collision detections. We propose a hybrid collision detection algorithm based upon an iterative solve of a minimisation problem that automatically falls back to a brute-force comparison-based algorithm variant if the problem is ill-posed. Such a hybrid can exploit the vector facilities of modern chips and it is well-prepared for the arising manycore era. Our approach pushes the boundary where non-analytical particle shapes and the aligning of more accurate first principle physics become manageable.

Durham Research Online

Crossref

Materials used in implantology

Author: E. Anderson
E.S. Quintana-Ortí
I. Jonsson
I. Jonsson
J.J. Dongarra
M. Frigo
P. Bientinesi
Publication date: 01/01/2011

Electronic archive of Tomsk Polytechnic University

Crossref

ParIC : A Family of Parallel Incomplete Cholesky Preconditioners

Author: B.F. Smith
G. Haase
G.H. Golub
H.A. Vorst van der
I.S. Duff
I.S. Duff
J.A. Meijerink
J.A. Meijerink
J.J. Dongarra
M. Magolu monga
M. Magolu monga
M. Magolu monga
R. Beauwens
R.F. Barret
S. Doi
S. Doi
Y. Notay
Publication date: 01/05/2000

A class of parallel incomplete factorization preconditionings for the solution of large linear systems is investigated. The approach may be regarded as a generalized domain decomposition method. Adjacent subdomains have to communicate during the setting up of the preconditioner, and during the application of the preconditioner. Overlap is not necessary to achieve high performance. Fill-in levels are considered in a global way. If necessary, the technique may be implemented as a global reordering of the unknowns. Experimental results are reported for two-dimensional problems.

Crossref

Utrecht University Repository

Two-sided Grassmann-Rayleigh quotient iteration

Author: A. Edelman
A.M. Ostrowski
B.N. Parlett
D.S. Watkins
G. Peters
G.H. Golub
G.W. Stewart
G.W. Stewart
G.W. Stewart
I.C.F. Ipsen
J. Brandts
J.D. Gardiner
J.H. Wilkinson
J.J. Dongarra
J.W. Demmel
J.W. Rayleigh
K. Scharnhorst
L. Qiu
P. Benner
P. Van Dooren
P.-A. Absil
P.-A. Absil
P.-A. Absil
P.-A. Absil
P.-A. Absil
R. Lösche
R.H. Bartels
S. Batterson
S. Batterson
S.H. Crandall
V. Simoncini
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/03/2008

The two-sided Rayleigh quotient iteration proposed by Ostrowski computes a pair of corresponding left-right eigenvectors of a matrix C. We propose a Grassmannian version of this iteration, i.e., its iterates are pairs of p-dimensional subspaces instead of one-dimensional subspaces in the classical case. The new iteration generically converges locally cubically to the pairs of left-right p-dimensional invariant subspaces of C. Moreover, Grassmannian versions of the Rayleigh quotient iteration are given for the generalized Hermitian eigenproblem, the Hamiltonian eigenproblem and the skew-Hamiltonian eigenproblem. Comment: The text is identical to a manuscript that was submitted for publication on 19 April 200

arXiv.org e-Print Archive

CiteSeerX

Crossref

DIAL UCLouvain

On the stability and uniqueness of the flow of a fluid through a porous medium

Author: A. A. Hill
A. Oberbeck
B. Straughan
D. Munaf
F. Franchi
H. Darcy
H.C. Brinkman
H.C. Brinkman
I. Samohyl
J. Boussinesq
J. Guo
J. Kampede Feriet
J. Serrin
J.J. Dongarra
K. Kannan
K. R. Rajagopal
K.R. Rajagopal
K.R. Rajagopal
K.R. Rajagopal
K.R. Rajagopal
L. Vergori
L.E. Payne
L.L. Richardson
O. Ladyzhenskaya
O. Reynolds
P. Forchheimer
R. Berker
R.J. Atkin
S. Rionero
S.C. Subramaniam
S.L. Synge
T.Y. Thomas
W.M. Orr
Y. Qin
Y. Qin
Y. Qin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

© 2016, The Author(s). In this short note, we study the stability of flows of a fluid through porous media that satisfies a generalization of Brinkman’s equation to include inertial effects. Such flows could have relevance to enhanced oil recovery and also to the flow of dense liquids through porous media. In any event, one cannot ignore the fact that flows through porous media are inherently unsteady, and thus at least a part of the inertial term needs to be retained in many situations. We study the stability of the rest state and find it to be asymptotically stable. Next, we study the stability of a base flow and find that the flow is asymptotically stable, provided the base flow is sufficiently slow. Finally, we establish results concerning the uniqueness of the flow under appropriate conditions, and present some corresponding numerical results.

Crossref

Springer - Publisher Connector

UWE Bristol Research Repository

Enlighten