3,767 research outputs found

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive, and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions:
    • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
    • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrarily sized matrices.
    • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
    • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
    • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
    • We present a configurable architecture to accelerate Homotopy l1-minimization, in which a modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
    Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application- and dimension-dependent speedups over an optimized software implementation that range from 1.5× to 43.6× in terms of computation time.
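The Hestenes-Jacobi (one-sided Jacobi) method named in the SVD contribution can be illustrated in software. The following is a minimal sequential sketch of the textbook algorithm, not the dissertation's FPGA architecture; the function name `hestenes_jacobi_singular_values` and the parameters `tol` and `max_sweeps` are assumptions made for illustration. It repeatedly rotates pairs of columns until all columns are mutually orthogonal, at which point the column norms are the singular values.

```python
import math

def hestenes_jacobi_singular_values(a, tol=1e-12, max_sweeps=50):
    """Singular values of a (a list of rows) via one-sided Jacobi rotations."""
    m, n = len(a), len(a[0])
    # Store columns, since each rotation mixes exactly two columns.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # this pair is already orthogonal
                rotated = True
                # Rotation angle that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    xp, xq = cols[p][i], cols[q][i]
                    cols[p][i] = c * xp - s * xq
                    cols[q][i] = s * xp + c * xq
        if not rotated:
            break  # a full sweep without rotations means convergence
    # Column norms of the orthogonalized matrix are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)

sigma = hestenes_jacobi_singular_values([[3.0, 0.0], [4.0, 5.0]])
```

Because each rotation touches only two columns, rotations on disjoint column pairs are independent of one another, which is the property that parallel implementations of this method typically exploit.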

    An a posteriori verification method for generalized real-symmetric eigenvalue problems in large-scale electronic state calculations

    An a posteriori verification method is proposed for the generalized real-symmetric eigenvalue problem and is applied to densely clustered eigenvalue problems in large-scale electronic state calculations. The proposed method is realized by a two-stage process in which the approximate solution is computed by existing numerical libraries and is then verified in a moderate computational time. The procedure returns intervals, each containing exactly one exact eigenvalue. Test calculations were carried out for organic device materials, and the verification method confirms that all exact eigenvalues are well separated in the obtained intervals. This verification method will be integrated into EigenKernel (https://github.com/eigenkernel/), which is middleware for various parallel solvers for the generalized eigenvalue problem. Such an a posteriori verification method will be important in future computational science. Comment: 15 pages, 7 figures
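To illustrate the general idea of a two-stage "compute, then verify" process, the sketch below uses the classical residual bound for the standard symmetric problem: if A is symmetric and (lam, x) is an approximate eigenpair, then some exact eigenvalue lies within ||Ax - lam*x|| / ||x|| of lam. This is a simpler, swapped-in technique and not the abstract's method, which treats the generalized problem. The function name `residual_interval` is hypothetical, and the code ignores rounding errors, so it is not rigorous in the way interval arithmetic would be.

```python
import math

def residual_interval(a, lam, x):
    """Interval around lam that contains an eigenvalue of the symmetric matrix a.

    Uses the bound |mu - lam| <= ||a x - lam x||_2 / ||x||_2 for some exact
    eigenvalue mu. Rounding errors are NOT accounted for (illustration only).
    """
    n = len(a)
    norm_x = math.sqrt(sum(v * v for v in x))
    residual = [sum(a[i][j] * x[j] for j in range(n)) - lam * x[i] for i in range(n)]
    radius = math.sqrt(sum(v * v for v in residual)) / norm_x
    return lam - radius, lam + radius

# The exact eigenvalues of this matrix are 1 and 3; (2.9, [1, 0.9]) is a rough guess.
lo, hi = residual_interval([[2.0, 1.0], [1.0, 2.0]], 2.9, [1.0, 0.9])
```

Note that this bound only guarantees at least one eigenvalue in the interval; showing that each interval contains exactly one, as the abstract reports, requires additional separation arguments.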

    Improving the Efficiency of FP-LAPW Calculations

    The full-potential linearized augmented-plane wave (FP-LAPW) method is well known to enable the most accurate calculations of the electronic structure and magnetic properties of crystals and surfaces. The implementation of atomic forces has greatly increased its applicability, but it is still generally believed that FP-LAPW calculations require substantially higher computational effort than pseudopotential plane wave (PPW) based methods. In the present paper we analyse the FP-LAPW method from a computational point of view. Starting from an existing implementation (WIEN95 code), we identify the time-consuming parts and show how some of them can be formulated more efficiently. In this context the hardware architecture also plays a crucial role. The remaining computational effort is mainly determined by the setup and diagonalization of the Hamiltonian matrix. For the latter, two different iterative schemes are compared. The speed-up gained by these optimizations is measured against the runtime of the "original" version of the code and against the PPW approach. We expect that the strategies described here can also be used to speed up other computer codes in which similar tasks must be performed. Comment: 20 pages, 3 figures. Appears in Comp. Phys. Com. Other related publications can be found at http://www.rz-berlin.mpg.de/th/paper.htm

    A GPU-based hyperbolic SVD algorithm

    A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm, using a massively parallel graphics processing unit (GPU), is developed. The algorithm also serves as the final stage of solving a symmetric indefinite eigenvalue problem. Numerical testing demonstrates the gains in speed and accuracy over sequential and MPI-parallelized variants of similar Jacobi-type HSVD algorithms. Finally, possibilities of hybrid CPU-GPU parallelism are discussed. Comment: Accepted for publication in BIT Numerical Mathematics

    Algebraic Multigrid for Disordered Systems and Lattice Gauge Theories

    The construction of multigrid operators for disordered linear lattice operators, in particular the fermion matrix in lattice gauge theories, by means of algebraic multigrid and block LU decomposition is discussed. In this formalism, the effective coarse-grid operator is obtained as the Schur complement of the original matrix. An optimal approximation to it is found by a numerical optimization procedure akin to Monte Carlo renormalization, resulting in a generalized (gauge-path dependent) stencil that is easily evaluated for a given disorder field. Applications to preconditioning and relaxation methods are investigated. Comment: 43 pages, 14 figures, revtex4 style
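The statement that the coarse-grid operator is the Schur complement of the original matrix can be made concrete on a toy problem. Splitting the unknowns into fine (eliminated) and coarse (kept) points, block LU elimination gives S = A_cc - A_cf A_ff^{-1} A_fc. The sketch below evaluates S exactly for a 1D Laplacian without disorder, under the assumption that no two fine points are neighbours, so that A_ff is diagonal. It does not reproduce the optimized approximation described in the abstract, and the function name `schur_complement_diag_ff` is hypothetical.

```python
def schur_complement_diag_ff(a, fine, coarse):
    """S = A_cc - A_cf A_ff^{-1} A_fc, assuming the fine-fine block is diagonal."""
    s = []
    for i in coarse:
        row = []
        for j in coarse:
            # With a diagonal A_ff, its inverse is a division by a[k][k].
            correction = sum(a[i][k] * a[k][j] / a[k][k] for k in fine)
            row.append(a[i][j] - correction)
        s.append(row)
    return s

# 1D Laplacian with stencil [-1, 2, -1] on 5 points; points 0, 2, 4 are eliminated.
n = 5
laplacian = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
              for j in range(n)] for i in range(n)]
coarse_operator = schur_complement_diag_ff(laplacian, fine=[0, 2, 4], coarse=[1, 3])
# coarse_operator is [[1.0, -0.5], [-0.5, 1.0]]
```

In this example the coarse operator is again a Laplacian-type stencil, [-0.5, 1, -0.5], on the remaining points. For a general matrix A_ff is not diagonal and the exact Schur complement becomes dense, which is the usual motivation for approximating it.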