
    A review of parallel processing approaches to robot kinematics and Jacobian

    Continuously increasing demands in advanced robot control make it necessary to speed up kinematic computations. One way to reduce computation time is to distribute the work across several processing units. This survey presents different approaches to the parallel computation of robot kinematics and the Jacobian, covering both the forward and the inverse problem. We introduce a classification scheme and classify the surveyed references according to it.
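
    As a concrete instance of the computations the survey classifies, the sketch below (an illustration added here, not drawn from the survey) gives the forward kinematics and analytic Jacobian of a hypothetical two-link planar arm; the link lengths l1, l2 and joint angles q are arbitrary example values.

```python
import numpy as np

def forward_kinematics(q, l1=1.0, l2=1.0):
    """End-effector position of a 2R planar arm with joint angles q = [q1, q2]."""
    q1, q2 = q
    x = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    y = l1 * np.sin(q1) + l2 * np.sin(q1 + q2)
    return np.array([x, y])

def jacobian(q, l1=1.0, l2=1.0):
    """Analytic 2x2 Jacobian d(x, y)/d(q1, q2) of the same arm."""
    q1, q2 = q
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)],
    ])

q = np.array([0.3, 0.8])
print(forward_kinematics(q))
print(jacobian(q))
```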

    Training a Linear Neural Network with a Stable LSP Solution for Jamming Cancellation

    Two jamming cancellation algorithms are developed based on a stable solution of the least squares problem (LSP) obtained through regularization. They rely on a filtered singular value decomposition (SVD) and on modifications of the Greville formula, and both allow an efficient hardware implementation. Test results on artificial data that model difficult real-world situations are also provided.
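
    The snippet below is a minimal sketch of the general idea of an SVD-based regularized least squares solve; it uses a plain Tikhonov filter and is not the paper's specific filtering rule or its Greville-formula variant. The array A, desired signal d and parameter lam are illustrative placeholders.

```python
import numpy as np

def filtered_svd_solve(A, d, lam=1e-2):
    """Return w minimizing ||A w - d||^2 + lam * ||w||^2 via the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Tikhonov filter factors damp the contribution of small singular values.
    f = s / (s**2 + lam)
    return Vt.T @ (f * (U.T @ d))

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 8))   # array snapshots (often ill-conditioned in practice)
d = rng.standard_normal(200)        # reference/desired signal
w = filtered_svd_solve(A, d)
print(w)
```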

    Application of Gauss-Seidel method and singular value decomposition techniques to recursive least squares algorithm

    Ankara: Department of Electrical and Electronics Engineering and Institute of Engineering and Sciences, Bilkent University, 1991. Thesis (Master's), Bilkent University, 1991. Includes bibliographical references (leaves 42-43). System identification algorithms are used in many practical and theoretical applications such as parameter estimation, adaptive control and signal processing. The least squares algorithm is one of the most popular algorithms in system identification, but it has drawbacks such as high computation time and slow convergence. In this thesis, the Gauss-Seidel method is applied to the recursive least squares algorithm and the convergence behavior of the resulting algorithms is analyzed. In the standard recursive least squares algorithm, the excitation of the modes is also monitored using data matrices, and the algorithm is altered accordingly. A parallel scheme is proposed for efficient computation of the modes. Simulation results are also presented. Malaş, Atilla. M.S.
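
    For orientation only, here is a rough sketch of how a Gauss-Seidel sweep can be embedded in a recursive least squares loop. It is an assumption about the general approach rather than the thesis's algorithm; the forgetting factor lam, the initialization and the streams x_stream, d_stream are invented for illustration.

```python
import numpy as np

def gauss_seidel_sweep(R, p, w, n_sweeps=1):
    """In-place Gauss-Seidel iterations for R w = p (assumes R is square and well conditioned)."""
    n = len(p)
    for _ in range(n_sweeps):
        for i in range(n):
            sigma = R[i] @ w - R[i, i] * w[i]      # off-diagonal contribution of row i
            w[i] = (p[i] - sigma) / R[i, i]
    return w

def rls_gauss_seidel(x_stream, d_stream, n, lam=0.99):
    """Each new sample updates (R, p); one Gauss-Seidel sweep then refreshes the weights."""
    R = np.eye(n) * 1e-3                           # small initial regularization
    p = np.zeros(n)
    w = np.zeros(n)
    for x, d in zip(x_stream, d_stream):
        R = lam * R + np.outer(x, x)
        p = lam * p + d * x
        w = gauss_seidel_sweep(R, p, w)
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))
w_true = np.array([1.0, -0.5, 0.25, 2.0])
d = X @ w_true + 0.01 * rng.standard_normal(500)
print(rls_gauss_seidel(X, d, n=4))
```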

    High Performance, Low Cost Subspace Decomposition and Polynomial Rooting for Real Time Direction of Arrival Estimation: Analysis and Implementation

    This thesis develops high performance real-time signal processing modules for direction of arrival (DOA) estimation in localization systems. It proposes highly parallel algorithms for subspace decomposition and polynomial rooting, which are traditionally implemented with sequential algorithms. The proposed algorithms address the emerging need for real-time localization in a wide range of applications. As the antenna array size increases, the complexity of the signal processing algorithms grows, making it increasingly difficult to satisfy real-time constraints. This thesis addresses real-time implementation by proposing parallel algorithms that maintain considerable improvement over traditional algorithms, especially for systems with a larger number of antenna array elements. Singular value decomposition (SVD) and polynomial rooting are two computationally complex steps and act as the bottleneck to achieving real-time performance. The proposed algorithms are suitable for implementation on field programmable gate arrays (FPGAs), single instruction multiple data (SIMD) hardware or application-specific integrated circuits (ASICs), which offer a large number of processing elements that can be exploited for parallel processing. The designs proposed in this thesis are modular, easily expandable and easy to implement. First, the thesis proposes a fast converging SVD algorithm. The proposed method reduces the number of iterations needed to converge to the correct singular values, bringing the computation closer to real-time performance. A general algorithm and a modular system design are provided, making it easy for designers to replicate and extend the design to larger matrix sizes. Moreover, the method is highly parallel, which can be exploited on the hardware platforms mentioned earlier. A fixed point implementation of the proposed SVD algorithm is presented. The FPGA design is pipelined to the maximum extent to increase the maximum achievable frequency of operation, and the system was developed with the objective of achieving high throughput. Various modern cores available in FPGAs were used to maximize performance, and these modules are described in detail. Finally, a parallel polynomial rooting technique based on Newton's method, applicable exclusively to root-MUSIC polynomials, is proposed. Unique characteristics of the root-MUSIC polynomial's complex dynamics were exploited to derive this rooting method. The technique exhibits parallelism and converges to the desired roots within a fixed number of iterations, making it suitable for rooting polynomials of large degree. We believe this is the first time that the complex dynamics of the root-MUSIC polynomial have been analyzed to propose such an algorithm. In all, the thesis addresses two major bottlenecks in a direction of arrival estimation system by providing simple, high throughput, parallel algorithms.
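
    The fragment below sketches one plausible reading of parallel polynomial rooting with Newton's method: independent Newton iterations launched from points spread around the unit circle, vectorized with NumPy as a stand-in for hardware parallelism. It is not the thesis's algorithm, and the example polynomial is made up; roots found near the unit circle are the ones a root-MUSIC style estimator would keep.

```python
import numpy as np

def newton_roots(coeffs, n_starts=64, n_iters=50):
    """Run independent Newton iterations z <- z - p(z)/p'(z) from unit-circle starting points."""
    p = np.poly1d(coeffs)
    dp = p.deriv()
    z = np.exp(2j * np.pi * np.arange(n_starts) / n_starts)  # vectorized "parallel" starts
    for _ in range(n_iters):
        z = z - p(z) / dp(z)
    return z

# Example polynomial with known roots; each start converges to one of them.
coeffs = np.poly([0.9 * np.exp(1j * 0.5), 0.9 * np.exp(-1j * 0.5), 0.5, -0.3])
roots = newton_roots(coeffs)
print(np.unique(np.round(roots, 4)))
```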

    Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks

    Parameters of recent neural networks require a huge amount of memory. These parameters are used by the network to perform machine learning tasks when processing inputs. To speed up inference, we develop Partition Pruning, an innovative scheme that reduces the number of parameters used while taking parallelization into consideration. We evaluated the performance and energy consumption of parallel inference with partitioned models, which showed a 7.72x speedup and a 2.73x reduction in the energy used for computing the pruned layers of TinyVGG16 compared to running the unpruned model on a single accelerator. In addition, our method showed only a limited reduction in accuracy when partitioning fully connected layers.
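
    A minimal sketch of the general idea of parallelization-aware pruning follows: a fully connected layer's weight matrix is split into per-accelerator partitions and each partition is magnitude-pruned to the same density, so the remaining work stays balanced. The partition count and keep ratio are invented for illustration; this is not the authors' Partition Pruning scheme.

```python
import numpy as np

def partition_prune(W, n_partitions=4, keep_ratio=0.25):
    """Split W column-wise into n_partitions blocks and keep the largest-magnitude weights in each."""
    pruned = W.copy()
    for block in np.array_split(np.arange(W.shape[1]), n_partitions):
        sub = pruned[:, block]
        threshold = np.quantile(np.abs(sub), 1.0 - keep_ratio)
        sub[np.abs(sub) < threshold] = 0.0
        pruned[:, block] = sub
    return pruned

rng = np.random.default_rng(2)
W = rng.standard_normal((128, 256))     # hypothetical fully connected layer
Wp = partition_prune(W)
print("kept fraction:", np.count_nonzero(Wp) / Wp.size)
```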

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive, and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, this dissertation makes the following contributions:
    • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
    • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrarily sized matrices.
    • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
    • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
    • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
    • We present a configurable architecture to accelerate Homotopy l1-minimization, in which a modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
    Our experimental results using an FPGA-based acceleration system indicate the efficiency of the proposed architectures, with application- and dimension-dependent speedups over an optimized software implementation ranging from 1.5× to 43.6× in terms of computation time.
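
    As a software point of reference for the QRD architecture's Givens-rotation mode (not the FPGA design itself), the following sketch factors a small matrix by Givens rotations; the matrix size and data are arbitrary.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

def qr_givens(A):
    m, n = A.shape
    R = A.astype(float)
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):              # zero the below-diagonal entries of column j
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, s], [-s, c]])
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R

A = np.random.default_rng(3).standard_normal((5, 3))
Q, R = qr_givens(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))
```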

    RTL implementation of the one-sided Jacobi algorithm for singular value decomposition

    Multi-dimensional digital signal processing tasks such as image processing and image reconstruction involve manipulating matrix data. Better quality images involve larger amounts of data, which results in unacceptably slow computation; a parallel processing scheme is a possible solution to this problem. This project presents an analysis and comparison of various algorithms for widely used matrix decomposition techniques and various computer architectures. As a result, a parallel implementation of the one-sided Jacobi algorithm for computing the singular value decomposition (SVD) of a 2x2 matrix on a field programmable gate array (FPGA) is developed. The proposed SVD design is based on a pipelined-datapath architecture. The design process starts by evaluating the algorithm in Matlab, then designing the datapath unit and control unit, coding in SystemVerilog HDL, and performing verification and synthesis with Quartus II and simulation on ModelSim-Altera. Original matrices of size 4x4 and 8x8 are used with the SVD processing element (PE), and the results are compared with the Matlab version of the algorithm to evaluate the PE. The computation of the SVD can be sped up by a factor of more than 2 by increasing the number of PEs, at the cost of increased circuit area.
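
    The following is a minimal software reference model of the one-sided Jacobi SVD that such a processing element realizes in hardware; it is a generic textbook-style formulation, not the project's SystemVerilog design, and the tolerance and sweep limit are arbitrary.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Return U, singular values, V^T such that A = U @ diag(s) @ V^T."""
    U = A.astype(float)
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                       # columns p and q already orthogonal
                converged = False
                zeta = (beta - alpha) / (2.0 * gamma)
                # smaller-magnitude root of t^2 + 2*zeta*t - 1 = 0
                t = 1.0 / (zeta + np.sqrt(1.0 + zeta * zeta)) if zeta >= 0 \
                    else 1.0 / (zeta - np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                J = np.array([[c, s], [-s, c]])    # plane rotation orthogonalizing the pair
                U[:, [p, q]] = U[:, [p, q]] @ J
                V[:, [p, q]] = V[:, [p, q]] @ J
        if converged:
            break
    sigma = np.linalg.norm(U, axis=0)
    return U / sigma, sigma, V.T

A = np.random.default_rng(4).standard_normal((4, 4))
U, s, Vt = one_sided_jacobi_svd(A)
print(np.allclose((U * s) @ Vt, A), np.allclose(U.T @ U, np.eye(4)))
```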

    A parallel algorithm for singular value decomposition as applied to failure tolerant manipulators

    Includes bibliographical references (pages [348-349]). The system of equations that governs kinematically redundant manipulators is commonly solved by finding the singular value decomposition (SVD) of the corresponding Jacobian matrix. This can require considerable time to compute, so a parallel SVD algorithm minimizing execution time is sought. The approach employed here lends itself to parallelization by using Givens rotations and information from previous decompositions. The key contributions of this research include the presentation and implementation of a new variation of a parallel SVD algorithm to compute the SVD for a set of post-fault Jacobians. Results from implementations of the algorithm on a MasPar MP-1 and an IBM SP2 are provided. Specific issues considered for each implementation include how data is mapped to the processing elements, the effect that increasing the number of processing elements has on execution time, and the type of parallel architecture used.
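
    The sketch below shows only the outer, embarrassingly parallel structure of the problem: one SVD per hypothetical post-fault Jacobian (the original Jacobian with one joint's column removed), dispatched across worker processes. The Jacobian here is random example data, and the dissertation's actual contribution, a Givens-rotation SVD that reuses information from previous decompositions, is not reproduced.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def post_fault_svd(args):
    """Singular values of the Jacobian after removing the failed joint's column."""
    J, failed_joint = args
    J_fault = np.delete(J, failed_joint, axis=1)
    return failed_joint, np.linalg.svd(J_fault, compute_uv=False)

if __name__ == "__main__":
    J = np.random.default_rng(5).standard_normal((6, 8))   # 6x8 Jacobian of a redundant arm
    tasks = [(J, j) for j in range(J.shape[1])]
    with ProcessPoolExecutor() as pool:
        for joint, sigmas in pool.map(post_fault_svd, tasks):
            # the smallest singular value indicates how close the post-fault arm is to singular
            print(joint, sigmas.min())
```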