
239 research outputs found

Minimizing Communication for Eigenproblems and the Singular Value Decomposition

Author: Ballard Grey
Demmel James
Dumitriu Ioana
Publication date: 01/01/2010

Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In [BDHS10], lower bounds were presented on the amount of communication required for essentially all O(n^3)-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs. Comment: 43 pages, 11 figures

arXiv.org e-Print Archive

CiteSeerX

Domain decomposition methods for the parallel computation of reacting flows

Author: Keyes David E.

Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.

NASA Technical Reports Server

Preconditioning for Sparse Linear Systems at the Dawn of the 21st Century: History, Current Developments, and Future Perspectives

Author: Massimiliano Ferronato
Publication date: 01/01/2012

Iterative methods are currently the solvers of choice for large sparse linear systems of equations. However, it is well known that the key factor for accelerating, or even allowing for, convergence is the preconditioner. The research on preconditioning techniques has characterized the last two decades. Nowadays, there are a number of different options to be considered when choosing the most appropriate preconditioner for the specific problem at hand. The present work provides an overview of the most popular algorithms available today, emphasizing the respective merits and limitations. The overview is restricted to algebraic preconditioners, that is, general-purpose algorithms requiring the knowledge of the system matrix only, independently of the specific problem it arises from. Along with the traditional distinction between incomplete factorizations and approximate inverses, the most recent developments are considered, including the scalable multigrid and parallel approaches which represent the current frontier of research. A separate section devoted to saddle-point problems, which arise in many different applications, closes the paper.
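As an illustration of what "algebraic" means here, the following is a minimal sketch of conjugate gradients with a diagonal (Jacobi) preconditioner, which is built from the entries of the system matrix alone. It is not taken from the paper: the function names `matvec`, `dot`, and `jacobi_pcg` are hypothetical, the matrix is stored as a dense list of rows for brevity, and Jacobi is chosen only because it is the simplest algebraic preconditioner, not because the paper singles it out.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A.

    Uses conjugate gradients preconditioned with M = diag(A).
    Returns (x, iterations_used).
    """
    n = len(b)
    # The preconditioner needs only the matrix entries (its diagonal).
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                      # residual b - A x with x = 0
    b_norm = dot(b, b) ** 0.5 or 1.0
    if dot(r, r) ** 0.5 <= tol * b_norm:
        return x, 0

    z = [d * r_i for d, r_i in zip(inv_diag, r)]   # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)

    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = [d * r_i for d, r_i in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z_i + beta * p_i for z_i, p_i in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Swapping `inv_diag` for an incomplete factorization or an approximate inverse changes only the two lines that compute `z`, which is why the preconditioner can be studied separately from the Krylov iteration.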

Directory of Open Access Journals

Open Access Repository

Archivio istituzionale della ricerca - Università di Padova

Streamlining of the state-dependent Riccati equation controller algorithm for an embedded implementation

Author: Katsev Sergey
Publication venue: RIT Scholar Works
Publication date: 01/11/2006

In many practical control problems the dynamics of the plant to be controlled are nonlinear. However, in most cases the controller design is based on a linear approximation of the dynamics. One of the reasons for this is that, in general, nonlinear control design methods are difficult to apply to practical problems. The State Dependent Riccati Equation (SDRE) control approach is a relatively new practical approach to nonlinear control that has the simplicity of the classical Linear Quadratic control method. This approach has been recently applied to control experimental autonomous air vehicles with relative success. To make the SDRE approach practical in applications where the computational resources are limited and where the dynamic models are more complex it would be necessary to re-examine and streamline this control algorithm. The main objective of this work is to identify improvements that can be made to the implementation of the SDRE algorithm to improve its performance. This is accomplished by analyzing the structure of the algorithm and the underlying functions used to implement it. At the core of the SDRE algorithm is the solution, in real time, of an Algebraic Riccati Equation. The impact of the selection of a suitable algorithm to solve the Riccati Equation is analyzed. Three different algorithms were studied. Experimental results indicate that the Kleinman algorithm performs better than two other algorithms based on Newton’s method. This work also demonstrates that appropriately setting a maximum number of iterations for the Kleinman approach can improve the overall system performance without degrading accuracy significantly. Finally, a software implementation of the SDRE algorithm was developed and benchmarked to study the potential performance improvements of a hardware implementation. The test plant was an inverted pendulum simulation based on experimental hardware. 
Bottlenecks in the software implementation were identified and a possible hardware design to remove one such bottleneck was developed.
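The Kleinman iteration with an iteration cap, as mentioned in the abstract, can be sketched for the simplest possible case. The code below is an illustrative assumption, not the thesis implementation: it treats a scalar plant dx/dt = a*x + b*u with cost weights q and r, so each Lyapunov solve reduces to a single division. The function name `kleinman_scalar` and all parameter names are hypothetical.

```python
def kleinman_scalar(a, b, q, r, k0, max_iter=20, tol=1e-12):
    """Newton-Kleinman iteration for the scalar algebraic Riccati equation

        2*a*p - (b*b/r)*p*p + q = 0.

    k0 must be a stabilizing gain, i.e. a - b*k0 < 0.
    Returns (p, k, iterations_used).
    """
    k = k0
    p = None
    p_prev = None
    for it in range(1, max_iter + 1):
        a_cl = a - b * k                    # closed-loop dynamics
        if a_cl >= 0.0:
            raise ValueError("gain is not stabilizing")
        # Lyapunov step: 2*a_cl*p + q + r*k*k = 0, solved for p.
        p = (q + r * k * k) / (-2.0 * a_cl)
        k = b * p / r                       # gain update
        if p_prev is not None and abs(p - p_prev) <= tol * max(1.0, abs(p)):
            return p, k, it
        p_prev = p
    # Iteration cap reached: return the current (approximate) solution.
    return p, k, max_iter
```

Lowering `max_iter` trades accuracy for a bounded per-step cost, which is the kind of trade-off the abstract describes for real-time use. In the matrix case each step requires a full Lyapunov solve instead of a division.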

RIT Scholar Works

On large-scale diagonalization techniques for the Anderson model of localization

Author: Brandes T.
Cain P.
Golub Gene
Matthias Bollhöfer
Olaf Schenk
Parlett Beresford
Rudolf A. Römer
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2005

We propose efficient preconditioning algorithms for an eigenvalue problem arising in quantum physics, namely the computation of a few interior eigenvalues and their associated eigenvectors for large-scale sparse real and symmetric indefinite matrices of the Anderson model of localization. We compare the Lanczos algorithm in the 1987 implementation by Cullum and Willoughby with the shift-and-invert techniques in the implicitly restarted Lanczos method and in the Jacobi–Davidson method. Our preconditioning approaches for the shift-and-invert symmetric indefinite linear system are based on maximum weighted matchings and algebraic multilevel incomplete LDL^T factorizations. These techniques can be seen as a complement to the alternative idea of using more complete pivoting techniques for the highly ill-conditioned symmetric indefinite Anderson matrices. We demonstrate the effectiveness and the numerical accuracy of these algorithms. Our numerical examples reveal that recent algebraic multilevel preconditioning solvers can accelerate the computation of a large-scale eigenvalue problem corresponding to the Anderson model of localization by several orders of magnitude.

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors

Author: Dubey Pradeep
Heybrock Simon
Joó Bálint
Kalamkar Dhiraj D.
Smelyanskiy Mikhail
Vaidyanathan Karthikeyan
Wettig Tilo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 08/12/2014

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel Xeon Phi co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5. Comment: 12 pages, 7 figures, presented at Supercomputing 2014, November 16-21, 2014, New Orleans, Louisiana, USA, speaker Simon Heybrock; SC '14 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 69-80, IEEE Press Piscataway, NJ, USA (c)201

arXiv.org e-Print Archive

Crossref

VBARMS: A variable block algebraic recursive multilevel solver for sparse linear systems

Author: Liao Jia
Publication venue: University of Groningen Press
Publication date: 01/01/2015

ARTS repository - University of Groningen

Dissertations of the University of Groningen