Search CORE

946 research outputs found

An algorithm for the solution of dynamic linear programs

Author: Psiaki Mark L.
Publication venue
Publication date
Field of study

The algorithm's objective is to efficiently solve Dynamic Linear Programs (DLP) by taking advantage of their special staircase structure. This algorithm constitutes a stepping stone to an improved algorithm for solving Dynamic Quadratic Programs, which, in turn, would make the nonlinear programming method of Successive Quadratic Programs more practical for solving trajectory optimization problems. The ultimate goal is to being trajectory optimization solution speeds into the realm of real-time control. The algorithm exploits the staircase nature of the large constraint matrix of the equality-constrained DLPs encountered when solving inequality-constrained DLPs by an active set approach. A numerically-stable, staircase QL factorization of the staircase constraint matrix is carried out starting from its last rows and columns. The resulting recursion is like the time-varying Riccati equation from multi-stage LQR theory. The resulting factorization increases the efficiency of all of the typical LP solution operations over that of a dense matrix LP code. At the same time numerical stability is ensured. The algorithm also takes advantage of dynamic programming ideas about the cost-to-go by relaxing active pseudo constraints in a backwards sweeping process. This further decreases the cost per update of the LP rank-1 updating procedure, although it may result in more changes of the active set that if pseudo constraints were relaxed in a non-stagewise fashion. The usual stability of closed-loop Linear/Quadratic optimally-controlled systems, if it carries over to strictly linear cost functions, implies that the saving due to reduced factor update effort may outweigh the cost of an increased number of updates. An aerospace example is presented in which a ground-to-ground rocket's distance is maximized. This example demonstrates the applicability of this class of algorithms to aerospace guidance. It also sheds light on the efficacy of the proposed pseudo constraint relaxation scheme

NASA Technical Reports Server

A fast semi-direct least squares algorithm for hierarchically block separable matrices

Author: Greengard Leslie
Ho Kenneth L.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2014
Field of study

We present a fast algorithm for linear least squares problems governed by hierarchically block separable (HBS) matrices. Such matrices are generally dense but data-sparse and can describe many important operators including those derived from asymptotically smooth radial kernels that are not too oscillatory. The algorithm is based on a recursive skeletonization procedure that exposes this sparsity and solves the dense least squares problem as a larger, equality-constrained, sparse one. It relies on a sparse QR factorization coupled with iterative weighted least squares methods. In essence, our scheme consists of a direct component, comprised of matrix compression and factorization, followed by an iterative component to enforce certain equality constraints. At most two iterations are typically required for problems that are not too ill-conditioned. For an

M \times N

HBS matrix with

M \geq N

having bounded off-diagonal block rank, the algorithm has optimal

\mathcal{O} (M + N)

complexity. If the rank increases with the spatial dimension as is common for operators that are singular at the origin, then this becomes

\mathcal{O} (M + N)

in 1D,

\mathcal{O} (M + N^{3/2})

in 2D, and

\mathcal{O} (M + N^{2})

in 3D. We illustrate the performance of the method on both over- and underdetermined systems in a variety of settings, with an emphasis on radial basis function approximation and efficient updating and downdating.Comment: 24 pages, 8 figures, 6 tables; to appear in SIAM J. Matrix Anal. App

arXiv.org e-Print Archive

CiteSeerX

Using reconfigurable computing technology to accelerate matrix decomposition and applications

Author: Wang Xinying
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016
Field of study

Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve the dense or sparse linear system of equations in bioinformatics, power system and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions: • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices. • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices. • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each. • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns. • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool-Latent Semantic Indexing with an FPGA-based architecture. • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update. Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5ÃÂ to 43.6ÃÂ in terms of computation time

Digital Repository @ Iowa State University (ISU)

Efficient Randomized Algorithms for the Fixed-Precision Low-Rank Matrix Approximation

Author: Gu Yu
Li Yaohang
Yu Wenjian
Publication venue
Publication date: 01/01/2018
Field of study

Randomized algorithms for low-rank matrix approximation are investigated, with the emphasis on the fixed-precision problem and computational efficiency for handling large matrices. The algorithms are based on the so-called QB factorization, where Q is an orthonormal matrix. Firstly, a mechanism for calculating the approximation error in Frobenius norm is proposed, which enables efficient adaptive rank determination for large and/or sparse matrix. It can be combined with any QB-form factorization algorithm in which B's rows are incrementally generated. Based on the blocked randQB algorithm by P.-G. Martinsson and S. Voronin, this results in an algorithm called randQB EI. Then, we further revise the algorithm to obtain a pass-efficient algorithm, randQB FP, which is mathematically equivalent to the existing randQB algorithms and also suitable for the fixed-precision problem. Especially, randQB FP can serve as a single-pass algorithm for calculating leading singular values, under certain condition. With large and/or sparse test matrices, we have empirically validated the merits of the proposed techniques, which exhibit remarkable speedup and memory saving over the blocked randQB algorithm. We have also demonstrated that the single-pass algorithm derived by randQB FP is much more accurate than an existing single-pass algorithm. And with data from a scenic image and an information retrieval application, we have shown the advantages of the proposed algorithms over the adaptive range finder algorithm for solving the fixed-precision problem.Comment: 21 pages, 10 figure

arXiv.org e-Print Archive

Old Dominion University