36 research outputs found

    Efficient floating-point Givens rotation unit

    This is a post-peer-review, pre-copyedit version of an article published in Circuits, Systems, and Signal Processing. High-throughput QR decomposition is a key operation in many advanced signal processing and communication applications. For some of these applications, floating-point computation is becoming almost compulsory. However, few works address hardware implementations of floating-point QR decomposition for embedded systems. In this paper, we propose a very efficient high-throughput floating-point Givens rotation unit for QR decomposition. Moreover, the initial design, proposed for conventional number formats, is enhanced by using the new Half-Unit Biased format. The provided error analysis shows the effectiveness of our proposals and the trade-offs among different implementation parameters. We also present FPGA implementation results and a thorough comparison between both approaches. These implementation results also reveal outstanding improvements over previous similar designs in terms of area, latency, and throughput. This work was supported in part by the following Spanish projects: TIN2016-80920-R and JA2012 P12-TIC-169.
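    For readers unfamiliar with the operation named in this abstract, the following is a minimal software sketch of QR triangularization by Givens rotations. It is the textbook floating-point formulation, not the hardware unit or the Half-Unit Biased variant proposed in the article; the function names `givens`, `apply_givens`, and `qr_givens` are illustrative assumptions.

```python
import math

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0            # nothing to zero out
    r = math.hypot(a, b)           # sqrt(a*a + b*b) without intermediate overflow
    return a / r, b / r

def apply_givens(c, s, x, y):
    """Rotate the pair (x, y) by the rotation defined by (c, s)."""
    return c * x + s * y, -s * x + c * y

def qr_givens(A):
    """Return the upper-triangular factor R of A (a list of rows).

    Hypothetical helper: rotations zero the sub-diagonal entries column by
    column, working from the bottom row upwards. Q is not accumulated here.
    """
    R = [row[:] for row in A]
    m, n = len(R), len(R[0])
    for j in range(n):
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            for k in range(n):
                R[i - 1][k], R[i][k] = apply_givens(c, s, R[i - 1][k], R[i][k])
    return R

R = qr_givens([[3.0, 1.0], [4.0, 2.0]])
```

    In this small case a single rotation with c = 0.6 and s = 0.8 zeroes the entry below the diagonal, and the product of the diagonal of R matches the absolute determinant of the input.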

    Application-specific instruction set processor for SoC implementation of modern signal processing algorithms

    Study of CORDIC based processing element for digital signal processing algorithms

    There is a high demand for the efficient implementation of complex arithmetic operations in many Digital Signal Processing (DSP) algorithms. The COordinate Rotation DIgital Computer (CORDIC) algorithm is well suited to DSP algorithms because its calculation of complex arithmetic is simple and elegant. Moreover, since it avoids multiplications, adopting the CORDIC algorithm can reduce complexity. In this project, a CORDIC-based processing element for the construction of digital signal processing algorithms is implemented. It is a flexible device that can be used to implement functions such as Singular Value Decomposition (SVD) and Discrete Cosine Transform (DCT), as well as many other important functions. It uses a CORDIC module to perform arithmetic operations, and the result is a flexible computational processing element (PE) for digital signal processing algorithms. To implement CORDIC-based architectures for functions like SVD and DCT, their computations must be decomposed in terms of CORDIC operations. SVD is widely used in digital signal processing applications such as direction estimation, recursive least squares (RLS) filtering, and system identification. Two different Jacobi-type methods for parallel SVD computation are usually considered, namely the Kogbetliantz (two-sided rotation) and the Hestenes (one-sided rotation) methods. Kogbetliantz's method has been chosen because it maps well onto a CORDIC array architecture and is highly suitable for parallel computation. In its implementation, the CORDIC algorithm provides the arithmetic units required in the processing elements, as these enable the efficient implementation of plane rotation and phase computation. Many fundamental aspects of linear algebra rely on determining the rank of a matrix, making the SVD an important and widely used technique.
DCT is one of the most widely used transform techniques in digital signal processing, and its computation involves many multiplications and additions. The DCT based on the CORDIC algorithm does not need multipliers. Moreover, it has a regular and simple architecture, and it is used to compress a wide variety of images by transferring data into the frequency domain. These digital signal processing algorithms are used in many applications. The purpose of this thesis is to describe a solution in which a conventional CORDIC system is used to implement SVD and DCT processing elements. The approach presented combines low circuit complexity with high performance.
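    The multiplier-free plane rotation that this abstract relies on can be sketched in software as follows. This is a generic rotation-mode CORDIC written with floats for readability, not the thesis's processing element; the name `cordic_rotate` and the default of 24 iterations are assumptions made for illustration.

```python
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians using rotation-mode CORDIC.

    Converges for |angle| up to about 1.74 rad. In hardware the factor
    2**-i is a right shift and atan(2**-i) comes from a small lookup table;
    floats are used here only to keep the sketch short.
    """
    z = angle                      # residual angle still to be rotated
    gain = 1.0                     # accumulated CORDIC scale factor
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0
        p = 2.0 ** -i
        x, y = x - d * y * p, y + d * x * p   # micro-rotation by +/- atan(2**-i)
        z -= d * math.atan(p)
        gain *= math.sqrt(1.0 + p * p)
    return x / gain, y / gain      # scale-factor correction

cos_a, sin_a = cordic_rotate(1.0, 0.0, math.pi / 6)
```

    Rotating the unit vector (1, 0) yields approximations of the cosine and sine of the angle, which is how the same structure serves both plane rotation and trigonometric evaluation.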

    The implementation and applications of multiple-valued logic

    Multiple-Valued Logic (MVL) takes two major forms. Multiple-valued circuits can implement the logic directly by using multiple-valued signals, or the logic can be implemented indirectly with binary circuits, by using more than one binary signal to represent a single multiple-valued signal. Techniques such as carry-save addition can be viewed as indirectly implemented MVL. Both direct and indirect techniques have been shown in the past to provide advantages over conventional arithmetic and logic techniques in algorithms required widely in computing for applications such as image and signal processing. It is possible to implement basic MVL building blocks at the transistor level. However, these circuits are difficult to design due to their non-binary nature. In the design stage they are more like analogue circuits than binary circuits. Current integrated circuit technologies are biased towards binary circuitry. In spite of this, there is potential for power and area savings from MVL circuits, especially in technologies such as BiCMOS. This thesis shows that the use of voltage-mode MVL will, in general, not provide bandwidth increases on circuit buses, because the buses become slower as the number of signal levels increases. Current-mode MVL circuits, however, do have the potential to reduce the power and area requirements of arithmetic circuitry. The design of transistor-level circuits is investigated in terms of a modern production technology. A novel methodology for the design of current-mode MVL circuits is developed. The methodology is based upon the novel concept of non-linear current encoding of signals, providing the opportunity for the efficient design of many previously unimplemented circuits in current-mode MVL. This methodology is used to design a useful set of basic MVL building blocks, and fabrication results are reported. The creation of libraries of MVL circuits is also discussed.
The CORDIC algorithm for two-dimensional vector rotation is examined in detail as an example of indirect MVL implementation. The algorithm is extended to a set of three-dimensional vector rotators using conventional arithmetic, redundant radix-four arithmetic, and Taylor series expansions. These algorithms can be used for two-dimensional vector rotations in which no scale factor corrections are needed. The new algorithms are compared in terms of basic VLSI criteria against previously reported algorithms. A pipelined version of the redundant arithmetic algorithm is floorplanned and partially laid out to give indications of wiring overheads and layout densities. An indirectly implemented MVL algorithm such as the CORDIC algorithm described in this thesis would clearly benefit from direct implementation in MVL.

    Efficient arithmetic for high speed DSP implementation on FPGAs

    The author was sponsored by EnTegra Ltd, a company that develops hardware and software products and services for the real-time implementation of DSP and RF systems. The field programmable gate array (FPGA) is being used increasingly in the field of DSP, because the parallel computing power of such devices is ideal for today's truly demanding DSP algorithms. Algorithms such as the QR-RLS update are computationally intensive and must be carried out at extremely high speeds (MHz). This means that the DSP processor is simply not an option. ASICs can be used, but the expense of developing custom logic is prohibitive. The increased use of the FPGA in DSP means that there is a significant requirement for efficient arithmetic cores that utilise the resources on such devices. This thesis presents the research and development effort that was carried out to produce fixed-point division and square root cores for use in a new Electronic Design Automation (EDA) tool for EnTegra, which is targeted at FPGA implementation of DSP systems. In addition, a new technique for predicting the accuracy of CORDIC systems computing vector magnitudes and cosines/sines is presented. This work allows the most efficient CORDIC design for a specified level of accuracy to be found quickly and easily, without the need to run lengthy simulations, as was the case before. The CORDIC algorithm is a technique using mainly shifts and additions to compute many arithmetic functions and is thus ideal for FPGA implementation.
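    To make the "shifts and additions" remark concrete, here is a small fixed-point sketch of vectoring-mode CORDIC computing a vector magnitude. It does not reproduce the thesis's accuracy-prediction technique; it only measures the error empirically for one input. The name `cordic_magnitude`, the 16 iterations, and the 16 fractional bits are assumptions chosen for illustration.

```python
import math

def cordic_magnitude(x, y, iterations=16, frac_bits=16):
    """Approximate sqrt(x*x + y*y) with integer shifts and adds only.

    Inputs are quantized to fixed point with `frac_bits` fractional bits.
    Vectoring mode drives y towards zero, leaving gain * magnitude in x.
    """
    xi = round(x * (1 << frac_bits))
    yi = round(y * (1 << frac_bits))
    if xi < 0:                     # reflect into the right half-plane
        xi, yi = -xi, -yi          # (the magnitude is unchanged)
    for i in range(iterations):
        if yi >= 0:
            xi, yi = xi + (yi >> i), yi - (xi >> i)
        else:
            xi, yi = xi - (yi >> i), yi + (xi >> i)
    # The constant gain would be a precomputed value in hardware.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return xi / (gain * (1 << frac_bits))

approx = cordic_magnitude(3.0, 4.0)
error = abs(approx - 5.0)          # empirical error for this single input
```

    Varying `iterations` and `frac_bits` and observing `error` is the kind of simulation sweep that an analytical accuracy prediction is meant to avoid.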

    Design of an embedded architecture for the inverse kinematics calculation of a three-degree-of-freedom hexapod robotic leg using the CORDIC algorithm on FPGA

    This research work presents the design of a CORDIC-based embedded FPGA architecture for the inverse kinematics calculation of a three-degree-of-freedom (3-DOF) hexapod robotic leg. The architecture design is approached first through an analysis of the inverse kinematics equations of a 3-DOF hexapod leg and of how they are adapted to an architecture scheme based on CORDIC operations. After that, the work area of the 3-DOF hexapod leg is analyzed to obtain the CORDIC convergence requirements. On that basis, an iterative, high-accuracy, 32-bit floating-point CORDIC entity was designed that meets the convergence and accuracy requirements. Finally, the results of the proposed design are compared with the same kinematic calculations performed in software, yielding the joint-angle equations and illustrating the FPGA's processing speed, precision, and hardware requirements.

    RTL implementation of one-sided jacobi algorithm for singular value decomposition

    Multi-dimensional digital signal processing, such as image processing and image reconstruction, involves the manipulation of matrix data. Better-quality images involve large amounts of data, which results in unacceptably slow computation. A parallel processing scheme is a possible solution to this problem. This project presents an analysis and comparison of various algorithms for widely used matrix decomposition techniques and of various computer architectures. As a result, a parallel implementation of the one-sided Jacobi algorithm for computing the singular value decomposition (SVD) of a 2x2 matrix on field-programmable gate arrays (FPGAs) is developed. The proposed SVD design is based on a pipelined-datapath architecture. The design process starts by evaluating the algorithm in Matlab, followed by designing the datapath unit and control unit, coding in SystemVerilog HDL, verification and synthesis using Quartus II, and simulation on ModelSim-Altera. Original matrices of size 4x4 and 8x8 are used with the SVD processing element (PE). The results are compared with the Matlab version of the algorithm to evaluate the PE. The SVD computation can be sped up by a factor of more than 2 by increasing the number of PEs, at the cost of increased circuit area.
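    As a software illustration of the 2x2 kernel mentioned in this abstract, the sketch below applies one one-sided (Hestenes) Jacobi rotation to make the two columns orthogonal and reads the singular values off as column norms. It is a floating-point reference in the spirit of the Matlab evaluation step, not the project's RTL design; the function name `jacobi_svd_2x2` is an assumption.

```python
import math

def jacobi_svd_2x2(a):
    """Singular values of a 2x2 matrix via one one-sided Jacobi rotation.

    `a` is ((a11, a12), (a21, a22)). The rotation (c, s) acts on the columns
    so that they become orthogonal; their norms are the singular values.
    """
    (a11, a12), (a21, a22) = a
    alpha = a11 * a11 + a21 * a21      # squared norm of column 1
    beta = a12 * a12 + a22 * a22       # squared norm of column 2
    gamma = a11 * a12 + a21 * a22      # inner product of the two columns
    if gamma == 0.0:
        c, s = 1.0, 0.0                # columns already orthogonal
    else:
        zeta = (beta - alpha) / (2.0 * gamma)
        # Smaller-magnitude root of t*t + 2*zeta*t - 1 = 0.
        t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
        c = 1.0 / math.sqrt(1.0 + t * t)
        s = c * t
    col1 = (c * a11 - s * a12, c * a21 - s * a22)
    col2 = (s * a11 + c * a12, s * a21 + c * a22)
    sigmas = sorted((math.hypot(*col1), math.hypot(*col2)), reverse=True)
    return sigmas, (c, s)

sigmas, (c, s) = jacobi_svd_2x2(((3.0, 0.0), (4.0, 5.0)))
```

    For the example matrix the two columns have equal norm, so the rotation angle is 45 degrees and the singular values come out as the square roots of 45 and 5.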

    A novel implementation of CORDIC algorithm using backward angle recoding (BAR)

