A Jacobi-based algorithm for computing symmetric eigenvalues and eigenvectors in a two-dimensional mesh

Author: González Colás Antonio María
Royo Vallés María Dolores
Valero García Miguel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

The paper proposes an algorithm for computing symmetric eigenvalues and eigenvectors that uses a one-sided Jacobi approach and is targeted at a multicomputer whose nodes can be arranged as a two-dimensional mesh with an arbitrary number of rows and columns. The algorithm is analysed through simple analytical models of execution time, which show that an adequate choice of mesh configuration (number of rows and columns) can improve performance significantly compared with a one-dimensional configuration, the scenario most frequently considered in current proposals. This improvement is especially noticeable in large systems. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

A GPU-based hyperbolic SVD algorithm

Author: A.H. Sameh
F.T. Luk
G.S. Sachdev
H. Zha
I. Slapničar
J.R. Bunch
K. Veselić
R. Mathias
R.P. Brent
S. Lahabar
S. Singer
S. Zhang
Sanja Singer
V. Hari
Vedran Novaković
Z. Drmač
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm, using a massively parallel graphics processing unit (GPU), is developed. The algorithm also serves as the final stage of solving a symmetric indefinite eigenvalue problem. Numerical testing demonstrates the gains in speed and accuracy over sequential and MPI-parallelized variants of similar Jacobi-type HSVD algorithms. Finally, possibilities of hybrid CPU-GPU parallelism are discussed. Comment: Accepted for publication in BIT Numerical Mathematics.

arXiv.org e-Print Archive

CiteSeerX

Crossref

FAMENA Repository

Three-Level Parallel J-Jacobi Algorithms for Hermitian Matrices

Author: Aleksandar Ušćumlić
Bečka
Bojanczyk
Brent
Bunch
Davor Davidović
Demmel
Dopico
Drmač
Eberlein
Hansen
Hari
Higham
Krešimir Bokulić
Luk
Okša
Parlett
Royo
Rutishauser
Sanja Singer
Saša Singer
Shroff
Singer
Slapničar
van der Sluis
Vedran Novaković
Veselić
Whiteside
Zha
Zhou
Publication venue: 'Elsevier BV'
Publication date: 24/08/2010

The paper describes several efficient parallel implementations of the one-sided hyperbolic Jacobi-type algorithm for computing eigenvalues and eigenvectors of Hermitian matrices. By appropriate blocking of the algorithms, an almost ideal load balancing between all available processors/cores is obtained. A similar blocking technique can be used to exploit the local cache memory of each processor to further speed up the process. Due to the diversity of modern computer architectures, each of the algorithms described here may be the method of choice for a particular hardware and a given matrix size. All proposed block algorithms compute the eigenvalues with relative accuracy similar to the original non-blocked Jacobi algorithm. Comment: Submitted for publication.

arXiv.org e-Print Archive

CiteSeerX

Crossref

FAMENA Repository

Full-text Institutional Repository of the Ruđer Bošković Institute

A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units

Author: Novaković Vedran
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 27/09/2014

We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of the GPU's memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies on the GPU's shared address space, but applicable also to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single-GPU setting needs a CPU for controlling purposes only, while utilizing the GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single Fermi card. Comment: Accepted for publication in SIAM Journal on Scientific Computing.
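
For orientation, the plain (non-blocked, sequential) one-sided Jacobi iteration that such algorithms build on can be sketched as follows. This is the textbook Hestenes method, not the hierarchically blocked GPU algorithm of the paper. The function name, tolerance, and sweep limit are illustrative assumptions, and the pivot order is a simple row-cyclic sweep rather than any of the paper's parallel pivot strategies.

```python
import math


def one_sided_jacobi_singular_values(a, tol=1e-12, max_sweeps=60):
    """Return the singular values (descending) of a matrix given as a list of rows.

    Illustrative sketch: columns are orthogonalised pairwise by plane
    rotations until every pair is orthogonal to within `tol`.
    """
    m, n = len(a), len(a[0])
    # Store the matrix column by column, since each rotation mixes two columns.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # this pair is already numerically orthogonal
                rotated = True
                # Smaller-magnitude root of t^2 + 2*zeta*t - 1 = 0.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.hypot(1.0, zeta))
                c = 1.0 / math.hypot(1.0, t)
                s = c * t
                # Both new columns are computed from the old ones before assignment.
                cols[p], cols[q] = (
                    [c * x - s * y for x, y in zip(cols[p], cols[q])],
                    [s * x + c * y for x, y in zip(cols[p], cols[q])],
                )
        if not rotated:
            break  # a full sweep without rotations: converged
    # The singular values are the norms of the orthogonalised columns.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

For example, calling `one_sided_jacobi_singular_values([[3, 0], [4, 5]])` should return values close to √45 and √5, the singular values of that matrix. Blocked variants apply the same idea to pairs of block columns instead of single columns.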

arXiv.org e-Print Archive

CiteSeerX

Using reconfigurable computing technology to accelerate matrix decomposition and applications

Author: Wang Xinying
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016

Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive, and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions:
• We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
• We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrarily sized matrices.
• We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
• We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
• By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
• We present a configurable architecture to accelerate Homotopy l1-minimization, in which a modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application- and dimension-dependent speedups over an optimized software implementation that range from 1.5× to 43.6× in terms of computation time.

Digital Repository @ Iowa State University (ISU)

A parallel implementation of Davidson methods for large-scale eigenvalue problems in SLEPc

Author: Balay S.
Campos C.
Eloy Romero
Freitag M. A.
Hochstenbach M. E.
Jose E. Roman
Sleijpen G. L. G.
Stathopoulos A.
van der Vorst H. A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2014

In the context of large-scale eigenvalue problems, methods of Davidson type such as Jacobi-Davidson can be competitive with respect to other types of algorithms, especially in some particularly difficult situations such as computing interior eigenvalues or when matrix factorization is prohibitive or highly inefficient. However, these types of methods are not generally available in the form of high-quality parallel implementations, especially for the case of non-Hermitian eigenproblems. We present our implementation of various Davidson-type methods in SLEPc, the Scalable Library for Eigenvalue Problem Computations. The solvers incorporate many algorithmic variants for subspace expansion and extraction, and cover a wide range of eigenproblems including standard and generalized, Hermitian and non-Hermitian, with either real or complex arithmetic. We provide performance results on a large battery of test problems.

This work was supported by the Spanish Ministerio de Ciencia e Innovacion under project TIN2009-07519.

Authors' addresses: E. Romero, Institut I3M, Universitat Politecnica de Valencia, Cami de Vera s/n, 46022 Valencia, Spain, and J. E. Roman, Departament de Sistemes Informatics i Computacio, Universitat Politecnica de Valencia, Cami de Vera s/n, 46022 Valencia, Spain.

Alcalde, E.; Román Moltó, JE. (2014). A parallel implementation of Davidson methods for large-scale eigenvalue problems in SLEPc. ACM Transactions on Mathematical Software. 40(2):13:01-13:29. https://doi.org/10.1145/2543696

Crossref

RiuNet