
    A survey of the state of the art and focused research in range systems, task 2

    Contract-generated publications are compiled which describe the research activities for the reporting period. Study topics include: equivalent configurations of systolic arrays; least squares estimation algorithms with systolic array architectures; modeling and equalization of nonlinear bandlimited satellite channels; and least squares estimation and Kalman filtering by systolic arrays.
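    As a point of reference for the estimation problems named above, the following minimal sketch shows one predict/update cycle of a linear Kalman filter in numpy. It is a generic textbook formulation, not code from the report; the matrices F, H, Q, R and the function name kalman_step are illustrative assumptions.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x, P : prior state estimate and covariance
    z    : new measurement
    F, H : state-transition and observation matrices
    Q, R : process and measurement noise covariances
    """
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    y = z - H @ x_pred                   # innovation
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Track a 1-D position/velocity state from noisy position measurements.
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[0.25]])
x, P = np.zeros(2), np.eye(2)
for z in [0.1, 0.22, 0.29, 0.41, 0.52]:
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
print(x)  # estimated position and velocity
```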

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.

    Hardware implementation of multiple-input multiple-output transceiver for wireless communication

    This dissertation proposes an efficient hardware implementation scheme for an iterative multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) transceiver. The transmitter incorporates a linear precoder designed with instantaneous channel state information (CSI). The receiver implements an MMSE-IC (minimum mean square error interference cancellation) detector, a channel estimator, a low-density parity-check (LDPC) decoder, and other supporting modules. The proposed implementation uses QR decomposition (QRD) of complex-valued matrices with four coordinate rotation digital computer (CORDIC) cores and back substitution to achieve the best trade-off between resource usage and throughput. The MIMO system was used in field tests, and the results indicate that instantaneous CSI varies rapidly in practice, so the performance of a linear precoder designed with instantaneous CSI is limited; statistical CSI has to be used instead. This dissertation also proposes a higher-rank principal Kronecker model (PKM) that exploits statistical CSI to simulate the fading channels. The PKM is constructed by decomposing the channel correlation matrices with the higher-order singular value decomposition (HOSVD) method. The proposed PKM-HOSVD model is validated by extensive field experiments conducted for 4-by-4 MIMO systems in both indoor and outdoor environments. The results confirm that statistical CSI varies slowly and that the PKM-HOSVD model will be helpful in the design of linear precoders.
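    To illustrate the detection flow the abstract describes (QRD followed by back substitution), here is a minimal software analogue in numpy. It uses np.linalg.qr in place of the CORDIC-based hardware QRD, and the names back_substitution and qrd_solve are illustrative, not taken from the dissertation.

```python
import numpy as np

def back_substitution(R, b):
    """Solve the upper-triangular system R x = b by back substitution."""
    n = R.shape[0]
    x = np.zeros(n, dtype=R.dtype)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def qrd_solve(H, y):
    """Solve H x = y via QR decomposition followed by back substitution."""
    Q, R = np.linalg.qr(H)   # in hardware, Givens rotations realized by CORDIC
    return back_substitution(R, Q.conj().T @ y)

# 4x4 complex channel, as in a 4-stream MIMO detector.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = H @ x
print(np.allclose(qrd_solve(H, y), x))   # True
```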

    Architectures for block Toeplitz systems

    In this paper, efficient VLSI architectures of highly concurrent algorithms for the solution of block linear systems with Toeplitz or near-to-Toeplitz entries are presented. The main features of the proposed scheme are the use of scalar-only operations (multiplications/divisions and additions) and local communication, which enables the development of wavefront array architectures. Both the mean squared error and the total squared error formulations are described, and a variety of implementations are given.
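    As a scalar (non-block) illustration of the fast Toeplitz solvers such architectures implement, the sketch below solves a symmetric Toeplitz system with SciPy's Levinson-type routine and checks it against a generic dense solve; the sample data are arbitrary and not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

# A symmetric Toeplitz system, e.g. the normal equations of an FIR
# least-squares problem: first column of T and right-hand side b.
c = np.array([4.0, 2.0, 1.0, 0.5, 0.25])
b = np.array([1.0, -1.0, 2.0, 0.0, 1.0])

x_fast = solve_toeplitz(c, b)              # Levinson-type O(n^2) recursion
x_dense = np.linalg.solve(toeplitz(c), b)  # generic O(n^3) dense solve
print(np.allclose(x_fast, x_dense))        # True
```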

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive, and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions:
    • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
    • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrarily sized matrices.
    • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformations or Givens rotations in a manner that takes advantage of the strengths of each.
    • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
    • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
    • We present a configurable architecture to accelerate Homotopy l1-minimization, in which a modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
    Our experimental results using an FPGA-based acceleration system indicate the efficiency of the proposed novel architectures, with application- and dimension-dependent speedups over an optimized software implementation that range from 1.5× to 43.6× in terms of computation time.
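    As a software reference for one of the factorizations above, the following sketch implements the one-sided Hestenes-Jacobi SVD that the FPGA architecture parallelizes. It is a plain numpy version for clarity; the function name, tolerances, and sweep limit are illustrative assumptions rather than details from the dissertation.

```python
import numpy as np

def hestenes_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided (Hestenes) Jacobi SVD: rotate pairs of columns of A
    until all columns are mutually orthogonal."""
    A = A.astype(float).copy()
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        off = 0.0
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = A[:, p] @ A[:, p]
                beta = A[:, q] @ A[:, q]
                gamma = A[:, p] @ A[:, q]
                off = max(off, abs(gamma) / np.sqrt(alpha * beta))
                if abs(gamma) < tol:
                    continue
                zeta = (beta - alpha) / (2.0 * gamma)
                if zeta == 0.0:
                    t = 1.0   # 45-degree rotation when the column norms are equal
                else:
                    t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                G = np.array([[c, s], [-s, c]])   # plane rotation zeroing gamma
                A[:, [p, q]] = A[:, [p, q]] @ G
                V[:, [p, q]] = V[:, [p, q]] @ G
        if off < tol:
            break
    sigma = np.linalg.norm(A, axis=0)   # singular values (unsorted)
    U = A / sigma                       # left singular vectors
    return U, sigma, V

A = np.random.default_rng(1).standard_normal((6, 4))
U, s, V = hestenes_jacobi_svd(A)
print(np.allclose(U * s @ V.T, A))      # True (singular values unsorted)
```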

    Computational structures for robotic computations

    The computational problems of inverse kinematics and inverse dynamics of robot manipulators, addressed by taking advantage of parallelism and pipelined architectures, are discussed. For the computation of the inverse kinematic position solution, a maximally pipelined CORDIC architecture has been designed based on a functional decomposition of the closed-form joint equations. For the inverse dynamics computation, an efficient p-fold parallel algorithm that overcomes the recurrence problem of the Newton-Euler equations of motion to achieve the time lower bound of O(log2 n) has also been developed.
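    The CORDIC recurrence at the heart of such an architecture can be sketched in a few lines of Python. The rotation-mode iteration below computes the sine and cosine of an angle by shift-and-add steps; it is a generic textbook formulation, not code from the paper, and the function name is an illustrative assumption.

```python
import math

def cordic_sin_cos(theta, iterations=32):
    """Compute (sin, cos) of theta (|theta| <= pi/2) with the CORDIC
    shift-and-add recurrence that pipelined hardware implements."""
    # Precompute the elementary rotation angles atan(2^-i) and the gain correction.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    K = 1.0
    for i in range(iterations):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0            # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y * K, x * K                        # (sin, cos)

s, c = cordic_sin_cos(0.7)
print(abs(s - math.sin(0.7)) < 1e-8, abs(c - math.cos(0.7)) < 1e-8)  # True True
```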

    A high-accuracy optical linear algebra processor for finite element applications

    Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuracy to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables), which could exploit the speed of optical processors, require the 32-bit accuracy obtainable from digital machines. To obtain this required 32-bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantifies the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
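    Multiplication by digital convolution can be illustrated with a short sketch: each operand is encoded as a vector of digits, the digit vectors are convolved (the step an optical correlator performs in analog), and the resulting mixed-radix digits are then resolved into a conventional number by carry propagation. The Python below is a hedged software analogue; the function names and the choice of base are illustrative assumptions.

```python
import numpy as np

def digits(n, base=10):
    """Least-significant-first digit vector of a non-negative integer."""
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(r)
    return out or [0]

def multiply_by_convolution(a, b, base=10):
    """Multiply two non-negative integers by convolving their digit vectors
    (the optical/analog step) and then propagating carries digitally."""
    conv = np.convolve(digits(a, base), digits(b, base))
    out, carry = [], 0
    for d in conv:
        carry, digit = divmod(int(d) + carry, base)
        out.append(digit)
    while carry:
        carry, digit = divmod(carry, base)
        out.append(digit)
    # out now holds the product's digits, least significant first.
    return sum(d * base ** i for i, d in enumerate(out))

print(multiply_by_convolution(12345, 6789) == 12345 * 6789)   # True
```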

    Efficient Parallel Algorithms and VLSI Architectures for Manipulator Jacobian Computation

    Real-time computations of the manipulator Jacobian are examined for execution on uniprocessor computers, parallel computers, and VLSI pipelines. The characteristics of the Jacobian equations are found to be in the form of a first-order linear recurrence. The time lower bound of computing the first-order linear recurrence, and hence the Jacobian, is of order O(N) on uniprocessor computers and of order O(log2 N) on parallel SIMD computers, where N is the number of degrees of freedom of the manipulator. The Generalized-k method, which achieves the time lower bound on uniprocessor computers, is derived to compute the Jacobian at any desired reference coordinate frame k, from the base coordinate frame to the end-effector coordinate frame. We find that if the reference coordinate frame k is in the range [3, N-4], then the computational effort is minimum. To reduce the computational complexity from order O(N) to O(log2 N), we derive the parallel forward and backward recursive doubling algorithm to compute the Jacobian on parallel computers. Again, any reference coordinate frame k can be used, and the minimum computation occurs at k = (N-1)/2. To further reduce the Jacobian computation complexity, we design two VLSI systolic pipelined architectures. A linear VLSI pipe, which uses the least number of modular processors, takes 3N floating-point operations to compute the Jacobian, and a parallel VLSI pipe takes 3 floating-point operations. We also show that if the reference coordinate frame is selected at k = (N-1)/2, then the parallel pipe will require the least number of modular processors, and the communication paths are much shorter.
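    The recursive doubling idea for a first-order linear recurrence x[i] = a[i]*x[i-1] + b[i] can be sketched as follows: each term is an affine map of its predecessor, and composing maps pairwise over ceil(log2 N) rounds expresses every x[i] directly in terms of the initial value. The numpy code below simulates those rounds sequentially; it is an illustrative sketch of the generic technique, not the paper's Jacobian algorithm verbatim.

```python
import numpy as np

def recurrence_recursive_doubling(a, b, x0=0.0):
    """Solve x[i] = a[i]*x[i-1] + b[i] in ceil(log2 N) doubling steps.
    Each step composes pairs of affine maps; on a parallel machine every
    composition within a step runs simultaneously."""
    n = len(a)
    A = np.array(a, dtype=float)
    B = np.array(b, dtype=float)
    shift = 1
    while shift < n:
        # Compose map i with the map `shift` positions earlier:
        # (A[i], B[i]) <- (A[i]*A[i-shift], A[i]*B[i-shift] + B[i])
        A_new, B_new = A.copy(), B.copy()
        A_new[shift:] = A[shift:] * A[:-shift]
        B_new[shift:] = A[shift:] * B[:-shift] + B[shift:]
        A, B = A_new, B_new
        shift *= 2
    return A * x0 + B          # x[i] expressed directly from the initial value

a = [0.5, 1.5, -2.0, 0.3, 1.1, 0.9]
b = [1.0, 0.0, 2.0, -1.0, 0.5, 0.2]
x, seq = 0.0, []
for ai, bi in zip(a, b):       # reference: the sequential O(N) recurrence
    x = ai * x + bi
    seq.append(x)
print(np.allclose(recurrence_recursive_doubling(a, b), seq))   # True
```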

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial-boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
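    As a small example of the kind of iterative method that maps well onto vector and parallel machines, the sketch below runs Jacobi iteration for the 2-D Poisson equation with a fully vectorized five-point update; the grid size, source term, and iteration count are illustrative assumptions, not values from the review.

```python
import numpy as np

def jacobi_poisson(f, h, iterations=2000):
    """Jacobi iteration for -laplacian(u) = f on the unit square with zero
    Dirichlet boundary values.  Every interior point is updated from its four
    neighbours simultaneously, so each sweep vectorizes/parallelizes directly."""
    u = np.zeros_like(f)
    for _ in range(iterations):
        u_new = u.copy()
        u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                    u[1:-1, :-2] + u[1:-1, 2:] +
                                    h * h * f[1:-1, 1:-1])
        u = u_new
    return u

n = 33
h = 1.0 / (n - 1)
f = np.ones((n, n))               # constant source term
u = jacobi_poisson(f, h)
print(float(u[n // 2, n // 2]))   # centre value, approaches roughly 0.074
```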