Search CORE

2,034 research outputs found

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

A methodology for exploiting parallelism in the finite element process

Author: Adams L. M.
Voigt R. G.
Publication venue
Publication date
Field of study

A methodology is described for developing a parallel system using a top down approach taking into account the requirements of the user. Substructuring, a popular technique in structural analysis, is used to illustrate this approach

NASA Technical Reports Server

Solution of partial differential equations on vector and parallel computers

Author: Ortega J. M.
Voigt R. G.
Publication venue
Publication date
Field of study

The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed

NASA Technical Reports Server

Where are the parallel algorithms?

Author: Voigt R. G.
Publication venue
Publication date
Field of study

Four paradigms that can be useful in developing parallel algorithms are discussed. These include computational complexity analysis, changing the order of computation, asynchronous computation, and divide and conquer. Each is illustrated with an example from scientific computation, and it is shown that computational complexity must be used with great care or an inefficient algorithm may be selected

NASA Technical Reports Server

Partially Ordered Two-way B\"uchi Automata

Author: A.P. Sistla
C. Baier
C.A. Kapoutsis
E.M. Clarke Jr.
J.-P. Pécuchet
J.-É. Pin
J.R. Büchi
K. Lodaya
K. Lodaya
M.Y. Vardi
O. Kupferman
T. Schwentick
W. Thomas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

We introduce partially ordered two-way B\"uchi automata and characterize their expressive power in terms of fragments of first-order logic FO[<]. Partially ordered two-way B\"uchi automata are B\"uchi automata which can change the direction in which the input is processed with the constraint that whenever a state is left, it is never re-entered again. Nondeterministic partially ordered two-way B\"uchi automata coincide with the first-order fragment Sigma2. Our main contribution is that deterministic partially ordered two-way B\"uchi automata are expressively complete for the first-order fragment Delta2. As an intermediate step, we show that deterministic partially ordered two-way B\"uchi automata are effectively closed under Boolean operations. A small model property yields coNP-completeness of the emptiness problem and the inclusion problem for deterministic partially ordered two-way B\"uchi automata.Comment: The results of this paper were presented at CIAA 2010; University of Stuttgart, Computer Scienc

arXiv.org e-Print Archive

Crossref

A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units

Author: Novaković Vedran
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 27/09/2014
Field of study

We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of GPU's memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies on GPU's shared address space, but applicable also to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single GPU setting needs a CPU for the controlling purposes only, while utilizing GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single Fermi card.Comment: Accepted for publication in SIAM Journal on Scientific Computin

arXiv.org e-Print Archive

CiteSeerX