Using reconfigurable computing technology to accelerate matrix decomposition and applications
Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools for Principal Component Analysis, which serves dimensionality reduction and pattern recognition in image processing, text mining and wireless communications. QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive, and their sequential implementations often fail to meet the requirements of many time-sensitive applications.
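QR decomposition is named above as a tool for solving dense linear systems. Writing A = QR with Q orthogonal and R upper triangular turns Ax = b into the triangular system Rx = Q^T b, which back substitution solves. For a tall matrix with full column rank, the same steps give the least-squares solution. The following is a minimal dense sketch in plain Python using Givens rotations, one of the two transformations the dissertation's QRD architecture supports. It is textbook code, not the dissertation's pipelined hardware design, and the function names `givens`, `givens_qr` and `qr_solve` are hypothetical.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr(A):
    """QR of an m x n matrix (m >= n, list of rows) via Givens rotations.

    Returns Q (m x m, orthogonal) and R (m x n, upper triangular) with A = Q R.
    Illustrative sketch only, not the dissertation's FPGA architecture.
    """
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]
    Qt = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates Q^T
    for j in range(n):
        for i in range(m - 1, j, -1):  # zero R[i][j] bottom-up, pairing row i with row i-1
            c, s = givens(R[i - 1][j], R[i][j])
            for M in (R, Qt):  # apply the same 2x2 rotation to rows i-1 and i
                for k in range(len(M[0])):
                    u, v = M[i - 1][k], M[i][k]
                    M[i - 1][k] = c * u + s * v
                    M[i][k] = -s * u + c * v
    Q = [list(col) for col in zip(*Qt)]  # transpose Q^T back to Q
    return Q, R


def qr_solve(A, b):
    """Solve A x = b (square) or min ||A x - b|| (tall, full column rank)."""
    Q, R = givens_qr(A)
    m, n = len(A), len(A[0])
    y = [sum(Q[k][i] * b[k] for k in range(m)) for i in range(n)]  # first n entries of Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution on the top n x n block of R
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x
```

For example, `qr_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])` returns approximately `[0.8, 1.4]`. Each rotation touches only two rows, and rotations on disjoint row pairs are independent. This is one reason Givens rotations map well onto parallel hardware.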
The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions:
• We propose an efficient FPGA-based double-precision floating-point architecture for EVD that can analyze large-scale matrices.
• We implement a floating-point Hestenes-Jacobi architecture for SVD that can analyze matrices of arbitrary size.
• We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
• We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
• By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing (LSI), with an FPGA-based architecture.
• We present a configurable architecture to accelerate Homotopy l1-minimization. At its core, a modified version of the proposed sparse LUD architecture parallelizes both Cholesky decomposition and rank-1 update.
Our experimental results on an FPGA-based acceleration system demonstrate the efficiency of the proposed architectures. Speedups in computation time over an optimized software implementation depend on the application and the matrix dimension, and range from 1.5× to 43.6×.
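The last contribution relies on a Cholesky rank-1 update. Given the factor L of A = L L^T, this operation produces the factor of A + x x^T in O(n^2) operations rather than refactoring in O(n^3). The following is a minimal dense sketch of the standard textbook algorithm, assuming a lower-triangular L stored as a list of rows. It is not the dissertation's sparse FPGA design, the abstract does not say exactly how the Homotopy solver invokes the update, and the name `chol_rank1_update` is hypothetical.

```python
import math


def chol_rank1_update(L, x):
    """Given lower-triangular L with A = L L^T, return L' with L' L'^T = A + x x^T.

    Dense textbook O(n^2) algorithm (illustrative sketch); inputs are copied, not modified.
    """
    L = [row[:] for row in L]
    x = list(x)
    n = len(x)
    for k in range(n):
        r = math.hypot(L[k][k], x[k])  # new diagonal entry sqrt(L_kk^2 + x_k^2)
        c = r / L[k][k]
        s = x[k] / L[k][k]
        L[k][k] = r
        for i in range(k + 1, n):
            L[i][k] = (L[i][k] + s * x[i]) / c  # update column k below the diagonal
            x[i] = c * x[i] - s * L[i][k]  # carry the remaining part of x forward
    return L
```

For instance, starting from the factor `[[2.0, 0.0], [1.0, math.sqrt(2.0)]]` of `[[4, 2], [2, 3]]` and `x = [1.0, 2.0]`, the result is the Cholesky factor of `[[5, 4], [4, 7]]`. Each step applies a rotation-like transformation to one column, so the update has the same two-vector structure as the Givens-style operations used elsewhere in the dissertation.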
An a posteriori verification method for generalized real-symmetric eigenvalue problems in large-scale electronic state calculations
An a posteriori verification method is proposed for the generalized
real-symmetric eigenvalue problem and is applied to densely clustered
eigenvalue problems in large-scale electronic state calculations. The proposed
method is realized by a two-stage process: the approximate solution is first
computed by existing numerical libraries and is then verified at moderate
computational cost. The procedure returns a set of intervals, each containing
one exact eigenvalue. Test calculations were carried out for organic
device materials, and the verification method confirms that all exact
eigenvalues are well separated in the obtained intervals. This verification
method will be integrated into EigenKernel (https://github.com/eigenkernel/),
which is middleware for various parallel solvers for the generalized eigenvalue
problem. Such an a posteriori verification method will be important in future
computational science.
Comment: 15 pages, 7 figures
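The abstract does not describe how the intervals are constructed. The following is a simplified illustration of the underlying idea, not the paper's method. For a symmetric-definite pencil A x = λ B x (A symmetric, B symmetric positive definite), the classical residual bound states that for any nonzero x and any scalar μ, at least one exact eigenvalue lies within ||r||_{B^-1} / ||x||_B of μ, where r = A x - μ B x. The sketch evaluates that bound in ordinary floating point. A genuine verification would also have to bound rounding errors, for example with interval arithmetic, and would need an additional argument to show that each interval holds exactly one eigenvalue. All function names are hypothetical.

```python
import math


def cholesky(B):
    """Lower-triangular L with B = L L^T (B symmetric positive definite, list of rows)."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = B[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("B is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def solve_spd(L, r):
    """Solve (L L^T) y = r by forward substitution, then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (z[i] - sum(L[k][i] * y[k] for k in range(i + 1, n))) / L[i][i]
    return y


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_interval(A, B, mu, x):
    """Return (mu - d, mu + d) with d = ||r||_{B^-1} / ||x||_B, r = A x - mu B x.

    In exact arithmetic the interval contains at least one eigenvalue of A x = lambda B x.
    No rounding-error control: an illustration, not a rigorous verification.
    """
    Bx = matvec(B, x)
    r = [ax - mu * bx for ax, bx in zip(matvec(A, x), Bx)]
    y = solve_spd(cholesky(B), r)  # y = B^{-1} r
    d = math.sqrt(max(0.0, dot(r, y))) / math.sqrt(dot(x, Bx))
    return mu - d, mu + d
```

For example, with `A = [[2.0, 1.0], [1.0, 3.0]]`, `B = [[2.0, 0.0], [0.0, 1.0]]`, `mu = 0.78` and `x = [1.0, -0.45]`, the returned interval has width below 0.01 and contains the exact eigenvalue 2 - sqrt(6)/2 ≈ 0.7753. The bound follows from reducing the pencil with B = L L^T: the reduced residual norm is exactly ||r||_{B^-1}, and the reduced vector norm is ||x||_B.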
Improving the Efficiency of FP-LAPW Calculations
The full-potential linearized augmented-plane wave (FP-LAPW) method is well
known to enable the most accurate calculations of the electronic structure and
magnetic properties of crystals and surfaces. The implementation of atomic
forces has greatly increased its applicability, but it is still generally
believed that FP-LAPW calculations require substantially higher computational
effort than methods based on pseudopotential plane waves (PPW).
In the present paper we analyse the FP-LAPW method from a computational point
of view. Starting from an existing implementation (the WIEN95 code), we identify
the time-consuming parts and show how some of them can be formulated more
efficiently. In this context, the hardware architecture also plays a crucial
role. The remaining computational effort is mainly determined by the setup and
diagonalization of the Hamiltonian matrix. For the latter, two different
iterative schemes are compared. The speed-up gained by these optimizations is
compared with the runtime of the "original" version of the code and with the PPW
approach. We expect that the strategies described here can also be used to
speed up other computer codes in which similar tasks must be performed.
Comment: 20 pages, 3 figures. Appears in Comp. Phys. Com. Other related
publications can be found at http://www.rz-berlin.mpg.de/th/paper.htm
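The abstract refers to the setup and diagonalization of the Hamiltonian matrix without stating the eigenproblem. LAPW basis functions are not mutually orthogonal, so in the standard formulation this step is a generalized Hermitian eigenproblem involving an overlap matrix S. The symbols H, S, L, c_i, y_i and ε_i below are notation introduced here, not taken from the paper. For orientation only, the direct reduction to a standard eigenproblem via a Cholesky factor of S reads:

```latex
H\,\mathbf{c}_i = \varepsilon_i\, S\,\mathbf{c}_i, \qquad S = L L^{\dagger}
\quad\Longrightarrow\quad
\bigl(L^{-1} H L^{-\dagger}\bigr)\,\mathbf{y}_i = \varepsilon_i\,\mathbf{y}_i,
\qquad \mathbf{y}_i = L^{\dagger}\mathbf{c}_i .
```

Iterative eigensolvers are attractive when only the lowest eigenpairs are needed, which is one reason they compete with this dense reduction. The abstract does not say which two iterative schemes the paper compares.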
A GPU-based hyperbolic SVD algorithm
A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm,
using a massively parallel graphics processing unit (GPU), is developed. The
algorithm also serves as the final stage of solving a symmetric indefinite
eigenvalue problem. Numerical testing demonstrates the gains in speed and
accuracy over sequential and MPI-parallelized variants of similar Jacobi-type
HSVD algorithms. Finally, possibilities of hybrid CPU-GPU parallelism are
discussed.
Comment: Accepted for publication in BIT Numerical Mathematics
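The abstract gives no algorithmic detail, so the following is only a sequential sketch of the textbook one-sided hyperbolic Jacobi idea. It assumes that a factorization A = G J G^T with J = diag(±1) is already available. Column pairs with equal signature are orthogonalized by an ordinary Jacobi rotation, and pairs with opposite signature by a hyperbolic rotation. Both kinds of rotation preserve V^T J V = J. Once the columns w_i of W = G V are mutually orthogonal, A = W J W^T, so each w_i is an eigenvector of A with eigenvalue J_i ||w_i||^2. The function name, tolerance and sweep limit are hypothetical choices. The hyperbolic step needs |2g| < a + b for the pair's squared norms a, b and inner product g, which fails only when the two columns are parallel and of equal length.

```python
import math


def hyperbolic_jacobi(G, J, tol=1e-12, max_sweeps=50):
    """One-sided hyperbolic Jacobi on the columns of G (list of rows); J is a list of +1/-1.

    Returns W = G V with mutually orthogonal columns, where V^T diag(J) V = diag(J).
    Sequential illustrative sketch, not the paper's GPU algorithm.
    """
    m, n = len(G), len(G[0])
    W = [[float(v) for v in row] for row in G]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                a = sum(W[k][p] ** 2 for k in range(m))
                b = sum(W[k][q] ** 2 for k in range(m))
                g = sum(W[k][p] * W[k][q] for k in range(m))
                if abs(g) <= tol * math.sqrt(a * b):
                    continue  # columns p and q are already numerically orthogonal
                rotated = True
                if J[p] == J[q]:
                    # Trigonometric rotation: t = tan(theta) solves t^2 + 2*zeta*t - 1 = 0.
                    zeta = (b - a) / (2.0 * g)
                    t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                    c = 1.0 / math.sqrt(1.0 + t * t)
                    s = c * t
                    for k in range(m):
                        u, v = W[k][p], W[k][q]
                        W[k][p], W[k][q] = c * u - s * v, s * u + c * v
                else:
                    # Hyperbolic rotation: tanh(2*theta) = -2g / (a + b).
                    theta = 0.5 * math.atanh(-2.0 * g / (a + b))
                    ch, sh = math.cosh(theta), math.sinh(theta)
                    for k in range(m):
                        u, v = W[k][p], W[k][q]
                        W[k][p], W[k][q] = ch * u + sh * v, sh * u + ch * v
        if not rotated:
            break
    return W
```

For example, with `G = [[2.0, 1.0], [1.0, 3.0]]` and `J = [1, -1]`, the signed squared column norms of the result approximate the eigenvalues (-5 ± sqrt(125))/2 of G J G^T = [[3, -1], [-1, -8]]. In a parallel setting, disjoint column pairs can be rotated simultaneously, which is what makes Jacobi-type methods a natural fit for GPUs.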
Algebraic Multigrid for Disordered Systems and Lattice Gauge Theories
The construction of multigrid operators for disordered linear lattice
operators, in particular the fermion matrix in lattice gauge theories, by means
of algebraic multigrid and block LU decomposition is discussed. In this
formalism, the effective coarse-grid operator is obtained as the Schur
complement of the original matrix. An optimal approximation to it is found by a
numerical optimization procedure akin to Monte Carlo renormalization, resulting
in a generalized (gauge-path dependent) stencil that is easily evaluated for a
given disorder field. Applications to preconditioning and relaxation methods
are investigated.
Comment: 43 pages, 14 figures, revtex4 style
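The abstract states that the effective coarse-grid operator is the Schur complement obtained from a block LU decomposition. Split the unknowns into fine-only points (f) and coarse points (c); this labeling is assumed here for illustration. The block factorization then shows where the Schur complement S_c comes from:

```latex
A = \begin{pmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{pmatrix}
  = \begin{pmatrix} I & 0 \\ A_{cf}A_{ff}^{-1} & I \end{pmatrix}
    \begin{pmatrix} A_{ff} & 0 \\ 0 & S_c \end{pmatrix}
    \begin{pmatrix} I & A_{ff}^{-1}A_{fc} \\ 0 & I \end{pmatrix},
\qquad
S_c = A_{cc} - A_{cf}A_{ff}^{-1}A_{fc}.
```

The exact S_c involves A_ff^{-1} and is in general dense. This is consistent with the abstract's approach of finding an optimal approximation to it in the form of a generalized stencil.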