
    A review of parallel processing approaches to robot kinematics and Jacobian

    Continuously increasing demands in advanced robot control make it necessary to speed up kinematic computations. One way to reduce computation time is to distribute the work across several processing units. This survey presents different approaches to the parallel computation of robot kinematics and the Jacobian, covering both the forward and the inverse problem. We introduce a classification scheme and classify the surveyed references according to it.
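
    As a concrete instance of the computations the survey classifies, the sketch below (an illustration added here, not drawn from the survey) gives the forward kinematics and analytic Jacobian of a hypothetical two-link planar arm; the link lengths l1, l2 and joint angles q are arbitrary example values.

```python
import numpy as np

def forward_kinematics(q, l1=1.0, l2=1.0):
    """End-effector position of a 2R planar arm with joint angles q = [q1, q2]."""
    q1, q2 = q
    x = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    y = l1 * np.sin(q1) + l2 * np.sin(q1 + q2)
    return np.array([x, y])

def jacobian(q, l1=1.0, l2=1.0):
    """Analytic 2x2 Jacobian d(x, y)/d(q1, q2) of the same arm."""
    q1, q2 = q
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)],
    ])

q = np.array([0.3, 0.8])
print(forward_kinematics(q))
print(jacobian(q))
```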

    Training a Linear Neural Network with a Stable LSP Solution for Jamming Cancellation

    Two jamming cancellation algorithms are developed based on a stable solution of the least squares problem (LSP) obtained through regularization. They rely on a filtered singular value decomposition (SVD) and on modifications of the Greville formula, and both allow an efficient hardware implementation. Test results on artificial data that model difficult real-world situations are also provided.
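
    The snippet below is a minimal sketch of the general idea of an SVD-based regularized least squares solve; it uses a plain Tikhonov filter and is not the paper's specific filtering rule or its Greville-formula variant. The array A, desired signal d and parameter lam are illustrative placeholders.

```python
import numpy as np

def filtered_svd_solve(A, d, lam=1e-2):
    """Return w minimizing ||A w - d||^2 + lam * ||w||^2 via the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Tikhonov filter factors damp the contribution of small singular values.
    f = s / (s**2 + lam)
    return Vt.T @ (f * (U.T @ d))

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 8))   # array snapshots (often ill-conditioned in practice)
d = rng.standard_normal(200)        # reference/desired signal
w = filtered_svd_solve(A, d)
print(w)
```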

    Application of Gauss-Seidel method and singular value decomposition techniques to recursive least squares algorithm

    Ankara: Department of Electrical and Electronics Engineering and Institute of Engineering and Sciences, Bilkent University, 1991. Thesis (Master's), Bilkent University, 1991. Includes bibliographical references (leaves 42-43). System identification algorithms are used in many practical and theoretical applications such as parameter estimation, adaptive control and signal processing. The least squares algorithm is one of the most popular algorithms in system identification, but it has drawbacks such as high computation time and slow convergence. In this thesis, the Gauss-Seidel method is applied to the recursive least squares algorithm and the convergence behavior of the resulting algorithms is analyzed. In the standard recursive least squares algorithm, the excitation of the modes is also monitored using data matrices, and the algorithm is altered accordingly. A parallel scheme is proposed for efficient computation of the modes. Simulation results are also presented. Malaş, Atilla. M.S.
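
    For orientation only, here is a rough sketch of how a Gauss-Seidel sweep can be embedded in a recursive least squares loop. It is an assumption about the general approach rather than the thesis's algorithm; the forgetting factor lam, the initialization and the streams x_stream, d_stream are invented for illustration.

```python
import numpy as np

def gauss_seidel_sweep(R, p, w, n_sweeps=1):
    """In-place Gauss-Seidel iterations for R w = p (assumes R is square and well conditioned)."""
    n = len(p)
    for _ in range(n_sweeps):
        for i in range(n):
            sigma = R[i] @ w - R[i, i] * w[i]      # off-diagonal contribution of row i
            w[i] = (p[i] - sigma) / R[i, i]
    return w

def rls_gauss_seidel(x_stream, d_stream, n, lam=0.99):
    """Each new sample updates (R, p); one Gauss-Seidel sweep then refreshes the weights."""
    R = np.eye(n) * 1e-3                           # small initial regularization
    p = np.zeros(n)
    w = np.zeros(n)
    for x, d in zip(x_stream, d_stream):
        R = lam * R + np.outer(x, x)
        p = lam * p + d * x
        w = gauss_seidel_sweep(R, p, w)
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))
w_true = np.array([1.0, -0.5, 0.25, 2.0])
d = X @ w_true + 0.01 * rng.standard_normal(500)
print(rls_gauss_seidel(X, d, n=4))
```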

    High Performance, Low Cost Subspace Decomposition and Polynomial Rooting for Real Time Direction of Arrival Estimation: Analysis and Implementation

    This thesis develops high performance real-time signal processing modules for direction of arrival (DOA) estimation in localization systems. It proposes highly parallel algorithms for subspace decomposition and polynomial rooting, which are traditionally implemented with sequential algorithms. The proposed algorithms address the emerging need for real-time localization in a wide range of applications. As the antenna array size increases, the complexity of the signal processing algorithms grows, making it increasingly difficult to satisfy real-time constraints. This thesis addresses real-time implementation by proposing parallel algorithms that maintain considerable improvement over traditional algorithms, especially for systems with a larger number of antenna array elements. Singular value decomposition (SVD) and polynomial rooting are two computationally complex steps and act as the bottleneck to achieving real-time performance. The proposed algorithms are suitable for implementation on field programmable gate arrays (FPGAs), single instruction multiple data (SIMD) hardware or application-specific integrated circuits (ASICs), which offer a large number of processing elements that can be exploited for parallel processing. The designs proposed in this thesis are modular, easily expandable and easy to implement. First, the thesis proposes a fast converging SVD algorithm. The proposed method reduces the number of iterations needed to converge to the correct singular values, bringing the computation closer to real-time performance. A general algorithm and a modular system design are provided, making it easy for designers to replicate and extend the design to larger matrix sizes. Moreover, the method is highly parallel, which can be exploited on the hardware platforms mentioned earlier. A fixed point implementation of the proposed SVD algorithm is presented. The FPGA design is pipelined to the maximum extent to increase the maximum achievable frequency of operation, and the system was developed with the objective of achieving high throughput. Various modern cores available in FPGAs were used to maximize performance, and these modules are described in detail. Finally, a parallel polynomial rooting technique based on Newton's method, applicable exclusively to root-MUSIC polynomials, is proposed. Unique characteristics of the root-MUSIC polynomial's complex dynamics were exploited to derive this rooting method. The technique exhibits parallelism and converges to the desired roots within a fixed number of iterations, making it suitable for rooting polynomials of large degree. We believe this is the first time that the complex dynamics of the root-MUSIC polynomial have been analyzed to propose such an algorithm. In all, the thesis addresses two major bottlenecks in a direction of arrival estimation system by providing simple, high throughput, parallel algorithms.
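
    The fragment below sketches one plausible reading of parallel polynomial rooting with Newton's method: independent Newton iterations launched from points spread around the unit circle, vectorized with NumPy as a stand-in for hardware parallelism. It is not the thesis's algorithm, and the example polynomial is made up; roots found near the unit circle are the ones a root-MUSIC style estimator would keep.

```python
import numpy as np

def newton_roots(coeffs, n_starts=64, n_iters=50):
    """Run independent Newton iterations z <- z - p(z)/p'(z) from unit-circle starting points."""
    p = np.poly1d(coeffs)
    dp = p.deriv()
    z = np.exp(2j * np.pi * np.arange(n_starts) / n_starts)  # vectorized "parallel" starts
    for _ in range(n_iters):
        z = z - p(z) / dp(z)
    return z

# Example polynomial with known roots; each start converges to one of them.
coeffs = np.poly([0.9 * np.exp(1j * 0.5), 0.9 * np.exp(-1j * 0.5), 0.5, -0.3])
roots = newton_roots(coeffs)
print(np.unique(np.round(roots, 4)))
```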

    Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks

    Parameters of recent neural networks require a huge amount of memory. These parameters are used by the network to perform machine learning tasks when processing inputs. To speed up inference, we develop Partition Pruning, an innovative scheme that reduces the number of parameters used while taking parallelization into consideration. We evaluated the performance and energy consumption of parallel inference with partitioned models, which showed a 7.72x speedup and a 2.73x reduction in the energy used for computing the pruned layers of TinyVGG16 compared to running the unpruned model on a single accelerator. In addition, our method showed only a limited reduction in accuracy when partitioning fully connected layers.
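
    A minimal sketch of the general idea of parallelization-aware pruning follows: a fully connected layer's weight matrix is split into per-accelerator partitions and each partition is magnitude-pruned to the same density, so the remaining work stays balanced. The partition count and keep ratio are invented for illustration; this is not the authors' Partition Pruning scheme.

```python
import numpy as np

def partition_prune(W, n_partitions=4, keep_ratio=0.25):
    """Split W column-wise into n_partitions blocks and keep the largest-magnitude weights in each."""
    pruned = W.copy()
    for block in np.array_split(np.arange(W.shape[1]), n_partitions):
        sub = pruned[:, block]
        threshold = np.quantile(np.abs(sub), 1.0 - keep_ratio)
        sub[np.abs(sub) < threshold] = 0.0
        pruned[:, block] = sub
    return pruned

rng = np.random.default_rng(2)
W = rng.standard_normal((128, 256))     # hypothetical fully connected layer
Wp = partition_prune(W)
print("kept fraction:", np.count_nonzero(Wp) / Wp.size)
```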

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive, and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, this dissertation makes the following contributions:
    • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
    • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrarily sized matrices.
    • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
    • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
    • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
    • We present a configurable architecture to accelerate Homotopy l1-minimization, in which a modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
    Our experimental results using an FPGA-based acceleration system indicate the efficiency of the proposed architectures, with application- and dimension-dependent speedups over an optimized software implementation ranging from 1.5× to 43.6× in terms of computation time.
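
    As a software point of reference for the QRD architecture's Givens-rotation mode (not the FPGA design itself), the following sketch factors a small matrix by Givens rotations; the matrix size and data are arbitrary.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

def qr_givens(A):
    m, n = A.shape
    R = A.astype(float)
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):              # zero the below-diagonal entries of column j
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, s], [-s, c]])
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R

A = np.random.default_rng(3).standard_normal((5, 3))
Q, R = qr_givens(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))
```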

    RTL implementation of the one-sided Jacobi algorithm for singular value decomposition

    Multi-dimensional digital signal processing tasks such as image processing and image reconstruction involve manipulating matrix data. Better quality images involve larger amounts of data, which results in unacceptably slow computation; a parallel processing scheme is a possible solution to this problem. This project presents an analysis and comparison of various algorithms for widely used matrix decomposition techniques and various computer architectures. As a result, a parallel implementation of the one-sided Jacobi algorithm for computing the singular value decomposition (SVD) of a 2x2 matrix on a field programmable gate array (FPGA) is developed. The proposed SVD design is based on a pipelined-datapath architecture. The design process starts by evaluating the algorithm in Matlab, then designing the datapath unit and control unit, coding in SystemVerilog HDL, and performing verification and synthesis with Quartus II and simulation on ModelSim-Altera. Original matrices of size 4x4 and 8x8 are used with the SVD processing element (PE), and the results are compared with the Matlab version of the algorithm to evaluate the PE. The computation of the SVD can be sped up by a factor of more than 2 by increasing the number of PEs, at the cost of increased circuit area.
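
    The following is a minimal software reference model of the one-sided Jacobi SVD that such a processing element realizes in hardware; it is a generic textbook-style formulation, not the project's SystemVerilog design, and the tolerance and sweep limit are arbitrary.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Return U, singular values, V^T such that A = U @ diag(s) @ V^T."""
    U = A.astype(float)
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                       # columns p and q already orthogonal
                converged = False
                zeta = (beta - alpha) / (2.0 * gamma)
                # smaller-magnitude root of t^2 + 2*zeta*t - 1 = 0
                t = 1.0 / (zeta + np.sqrt(1.0 + zeta * zeta)) if zeta >= 0 \
                    else 1.0 / (zeta - np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                J = np.array([[c, s], [-s, c]])    # plane rotation orthogonalizing the pair
                U[:, [p, q]] = U[:, [p, q]] @ J
                V[:, [p, q]] = V[:, [p, q]] @ J
        if converged:
            break
    sigma = np.linalg.norm(U, axis=0)
    return U / sigma, sigma, V.T

A = np.random.default_rng(4).standard_normal((4, 4))
U, s, Vt = one_sided_jacobi_svd(A)
print(np.allclose((U * s) @ Vt, A), np.allclose(U.T @ U, np.eye(4)))
```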

    A parallel algorithm for singular value decomposition as applied to failure tolerant manipulators

    Includes bibliographical references (pages [348-349]). The system of equations that governs kinematically redundant manipulators is commonly solved by finding the singular value decomposition (SVD) of the corresponding Jacobian matrix. This can require considerable time to compute, so a parallel SVD algorithm minimizing execution time is sought. The approach employed here lends itself to parallelization by using Givens rotations and information from previous decompositions. The key contributions of this research include the presentation and implementation of a new variation of a parallel SVD algorithm to compute the SVD for a set of post-fault Jacobians. Results from implementations of the algorithm on a MasPar MP-1 and an IBM SP2 are provided. Specific issues considered for each implementation include how data is mapped to the processing elements, the effect that increasing the number of processing elements has on execution time, and the type of parallel architecture used.
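
    The sketch below shows only the outer, embarrassingly parallel structure of the problem: one SVD per hypothetical post-fault Jacobian (the original Jacobian with one joint's column removed), dispatched across worker processes. The Jacobian here is random example data, and the dissertation's actual contribution, a Givens-rotation SVD that reuses information from previous decompositions, is not reproduced.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def post_fault_svd(args):
    """Singular values of the Jacobian after removing the failed joint's column."""
    J, failed_joint = args
    J_fault = np.delete(J, failed_joint, axis=1)
    return failed_joint, np.linalg.svd(J_fault, compute_uv=False)

if __name__ == "__main__":
    J = np.random.default_rng(5).standard_normal((6, 8))   # 6x8 Jacobian of a redundant arm
    tasks = [(J, j) for j in range(J.shape[1])]
    with ProcessPoolExecutor() as pool:
        for joint, sigmas in pool.map(post_fault_svd, tasks):
            # the smallest singular value indicates how close the post-fault arm is to singular
            print(joint, sigmas.min())
```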