Search CORE

27 research outputs found

Efficient floating-point givens rotation unit

Author: Hormigo-Aguilar Javier
Muñoz Sergio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2021
Field of study

This is a post-peer-review, pre-copyedit version of an article published in Circuits, Systems, and Signal Processing.High-throughput QR decomposition is a key operation in many advanced signal processing and communication applications. For some of these applications, using floating-point computation is becoming almost compulsory. However, there are scarce works in hardware implementations of floating-point QR decomposition for embedded systems. In this paper, we propose a very efficient high-throughput floating-point Givens rotation unit for QR decomposition. Moreover, the initial proposed design for conventional number formats is enhanced by using the new Half-Unit Biased format. The provided error analysis shows the effectiveness of our proposals and the trade-off of different implementation parameters. We also present FPGA implementation results and a thorough comparison between both approaches. These implementation results also reveal outstanding improvements compared to other previous similar designs in terms of area, latency, and throughput.This work was supported in part by following Spanish projects: TIN2016-80920-R, and JA2012 P12-TIC-169

Repositorio Institucional Universidad de Málaga

Low-Complexity Multiplierless Constant Rotators Based on Combined Coefficient Selection and Shift-and-Add Implementation (CCSSI)

Author: Fahad Qureshi
Mario Garrido
Oscar Gustafsson
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Implementing FFT-based digital channelized receivers on FPGA platforms

Author: Garrido Gálvez Mario
Grajal de la Fuente Jesús
López Vallejo Marisa
Sánchez Marcos Miguel Ángel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

This paper presents an in-depth study of the implementation and characterization of fast Fourier transform (FFT) pipelined architectures suitable for broadband digital channelized receivers. When implementing the FFT algorithm on field-programmable gate array (FPGA) platforms, the primary goal is to maximize throughput and minimize area. Feedback and feedforward architectures have been analyzed regarding key design parameters: radix, bitwidth, number of points and stage scaling. Moreover, a simplification of the FFT algorithm, the monobit FFT, has been implemented in order to achieve faster real time performance in broadband digital receivers. The influence of the hardware implementation on the performance of digital channelized receivers has been analyzed in depth, revealing interesting implementation trade-offs which should be taken into account when designing this kind of signal processing systems on FPGA platforms

Publikationer från Linköpings universitet

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Archivo Digital UPM

Reliable Hardware Architectures of CORDIC Algorithm with Fixed Angle of Rotations

Author: Ramadoss Rajkumar
Publication venue: RIT Scholar Works
Publication date: 01/08/2015
Field of study

Fixed-angle rotation operation of vectors is widely used in signal processing, graphics, and robotics. Various optimized coordinate rotation digital computer (CORDIC) designs have been proposed for uniform rotation of vectors through known and specified angles. Nevertheless, in the presence of faults, such hardware architectures are potentially vulnerable. In this thesis, we propose efficient error detection schemes for two fixed-angle rotation designs, i.e., the Interleaved Scaling and Cascaded Single-rotation CORDIC. To the best of our knowledge, this work is the first in providing reliable architectures for these variants of CORDIC. The former is suitable for low-area applications and, hence, we propose recomputing with encoded operands schemes which add negligible area overhead to the designs. Moreover, the proposed error detection schemes for the latter variant are optimized for efficient applications which hamper the performance of the architectures negligibly. We present three variants of recomputing with encoded operands to detect both transient and permanent faults, coupled with signature-based schemes. The overheads of the proposed designs are assessed through Xilinx FPGA implementations and their effectiveness is benchmarked through error simulations. The results give confidence for the proposed efficient architectures which can be tailored based on the reliability requirements and the overhead to be tolerated

RIT Scholar Works

FPGA Implementation of Fast Fourier Transform Core Using NEDA

Author: Mankar Abhishek
Publication venue
Publication date: 01/01/2013
Field of study

Transforms like DFT are a major block in communication systems such as OFDM, etc. This thesis reports architecture of a DFT core using NEDA. The advantage of the proposed architecture is that the entire transform can be implemented using adder/subtractors and shifters only, thus minimising the hardware requirement compared to other architectures. The proposed design is implemented for 16-bit data path (12–bit for comparison) considering both integer representation as well as fixed point representation, thus increasing the scope of usage. The proposed design is mapped on to Xilinx XC2VP30 FPGA, which is fabricated using 130 nm process technology. The maximum on board frequency of operation of the proposed design is 122 MHz. NEDA is one of the techniques to implement many signal processing systems that require multiply and accumulate units. FFT is one of the most employed blocks in many communication and signal processing systems. The FPGA implementation of a 16 point radix-4 complex FFT is proposed. The proposed design has improvement in terms of hardware utilization compared to traditional methods. The design has been implemented on a range of FPGAs to compare the performance. The maximum frequency achieved is 114.27 MHz on XC5VLX330 FPGA and the maximum throughput, 1828.32 Mbit/s and minimum slice delay product, 9.18. The design is also implemented using synopsys DC synthesis in both 65 nm and 180 nm technology libraries. The advantages of multiplier-less architectures are reduced hardware and improved latency. The multiplier-less architectures for the implementation of radix-2^2 folded pipelined complex FFT core are based on NEDA. The number of points considered in the work is sixteen and the folding is done by a factor of four. The proposed designs are implemented on Xilinx XC5VSX240T FPGA. Proposed designs based on NEDA have reduced area over 83%. The observed slice-delay product for NEDA based designs are 2.196 and 5.735

ethesis@nitr

Study of CORDIC based processing element for digital signal processing algorithms

Author: S Syam Babu
Publication venue
Publication date: 01/01/2007
Field of study

There is a high demand for the efficient implementation of complex arithmetic operations in many Digital Signal Processing (DSP) algorithms. The COordinate Rotation DIgital Computer (CORDIC) algorithm is suitable to be implemented in DSP algorithms since its calculation for complex arithmetic is simple and elegant. Besides, since it avoids using multiplications, adopting the CORDIC algorithm can reduce the complexity. Here, in this project CORDIC based processing element for the construction of digital signal processing algorithms is implemented. This is a flexible device that can be used in the implementation of functions such as Singular Value Decomposition (SVD), Discrete Cosine Transform (DCT) as well as many other important functions. It uses a CORDIC module to perform arithmetic operations and the result is a flexible computational processing element (PE) for digital signal processing algorithms. To implement the CORDIC based architectures for functions like SVD and DCT, it is required to decompose their computations in terms of CORDIC operations. SVD is widely used in digital signal processing applications such as direction estimation, recursive least squares (RLS) filtering and system identification. Two different Jacobi-type methods for SVD parallel computation are usually considered, namely the Kogbetliantz (two-sided rotation) and the Hestenes (one- sided rotation) method. Kogbetliantz’s method has been considered, because it is suitable for mapping onto CORDIC array architecture and highly suitable for parallel computation. Here in its implementation, CORDIC algorithm provides the arithmetic units required in the processing elements as these enable the efficient implementation of plane rotation and phase computation. Many fundamental aspects of linear algebra rely on determining the rank of a matrix, making the SVD an important and widely used technique. DCT is one of the most widely used transform techniques in digital signal processing and it computation involves many multiplications and additions. The DCT based on CORDIC algorithm does not need multipliers. Moreover, it has regularity and simple architecture and it is used to compress a wide variety of images by transferring data into frequency domain. These digital signal-processing algorithms are used in many applications. The purpose of this thesis is to describe a solution in which a conventional CORDIC system is used to implement an SVD and DCT processing elements. The approach presented combines the low circuit complexity with high performance

ethesis@nitr

Architectural implementation of cordic unit and its applications

Author: Prasad N
Publication venue
Publication date: 01/01/2013
Field of study

The ubiquity of DSP has made increasing demand to develop area efficient and accurate architectures in carrying out many nonlinear arithmetic operations. One such architecture is CORDIC unit which has many applications in the field of DSP including implementing transforms based on Fourier basis. This report presents architecture of CORDIC, embedded with a scaling unit that has only minimal number of adders and shifters. It can be implemented in rotation mode as well as vectoring mode. The purpose of the design is to get a scaling free CORDIC unit preserving the design of original algorithm. The proposed design has a considerable reduction in hardware when compared with other scaling free architectures. The analysis of error for different word lengths and different input ranges for fixed word length gives a better choice to choose the parameters. The error in rotation mode for 16 bit data path, obtained for Y equivalent input is 0.073% and for X equivalent input is 0.067%. We also report architecture of a DFT core that is implemented using low latency CORDIC. A scaling unit has been included to get scaled outputs. The reported DFT core architecture has 22 adders in total, in addition to 2 CORDIC units. DDS or NCO are nowadays prominently used in the applications of RF signal processing, satellite communications, etc. This report also brings out the FPGA implementation of one such DDS which has quadrature outputs. The proposed DDS design, which is based on pipelined CORDIC, has considerable improvement in terms of SFDR compared to other existing designs at reduced hardware. This report also proposes multiplier-less architecture for the implementation of radix-2^2 folded pipelined complex FFT core based on CORDIC technique. The number of points considered in the work is sixteen and the folding is done by a factor of four

ethesis@nitr

Systolic VLSI chip for implementing orthogonal transforms, A

Author: Burleson Wayne P.
Endsley Neil H.
Gabriel Arthur R.
Scharf Louis L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1989
Field of study

Includes bibliographical references.This paper describes the design of a systolic VLSI chip for the implementation of signal processing algorithms that may be decomposed into a product of simple real rotations. These include orthogonal transformations. Applications of this chip include projections, discrete Fourier and cosine transforms, and geometrical transformations. Large transforms may be computed by "tiling" together many chips for increased throughput. A CMOS VLSI chip containing 138 000 transistors in a 5x3 array of rotators has been designed, fabricated, and tested. The chip has a 32-MHz clock and performs real rotations at a rate of 30 MHz. The systolic nature of the chip makes use of fully synchronous bit-serial interconnect and a very regular structure at the rotator and bit levels. A distributed arithmetic scheme is used to implement the matrix-vector multiplication of the rotation.This work was supported by Ball Aerospace, Boulder, CO, and by the Office of Naval Research, Electronics Branch, Arlington, VA, under Contract ONR 85-K-0693

Mountain Scholar (Digital Collections of Colorado and Wyoming)

The implementation and applications of multiple-valued logic

Author: Clarke Christopher Turrall
Publication venue
Publication date
Field of study

Multiple-Valued Logic (MVL) takes two major forms. Multiple-valued circuits can implement the logic directly by using multiple-valued signals, or the logic can be implemented indirectly with binary circuits, by using more than one binary signal to represent a single multiple-valued signal. Techniques such as carry-save addition can be viewed as indirectly implemented MVL. Both direct and indirect techniques have been shown in the past to provide advantages over conventional arithmetic and logic techniques in algorithms required widely in computing for applications such as image and signal processing. It is possible to implement basic MVL building blocks at the transistor level. However, these circuits are difficult to design due to their non binary nature. In the design stage they are more like analogue circuits than binary circuits. Current integrated circuit technologies are biased towards binary circuitry. However, in spite of this, there is potential for power and area savings from MVL circuits, especially in technologies such as BiCMOS. This thesis shows that the use of voltage mode MVL will, in general not provide bandwidth increases on circuit buses because the buses become slower as the number of signal levels increases. Current mode MVL circuits however do have potential to reduce power and area requirements of arithmetic circuitry. The design of transistor level circuits is investigated in terms of a modern production technology. A novel methodology for the design of current mode MVL circuits is developed. The methodology is based upon the novel concept of the use of non-linear current encoding of signals, providing the opportunity for the efficient design of many previously unimplemented circuits in current mode MVL. This methodology is used to design a useful set of basic MVL building blocks, and fabrication results are reported. The creation of libraries of MVL circuits is also discussed. The CORDIC algorithm for two dimensional vector rotation is examined in detail as an example for indirect MVL implementation. The algorithm is extended to a set of three dimensional vector rotators using conventional arithmetic, redundant radix four arithmetic, and Taylor's series expansions. These algorithms can be used for two dimensional vector rotations in which no scale factor corrections are needed. The new algorithms are compared in terms of basic VLSI criteria against previously reported algorithms. A pipelined version of the redundant arithmetic algorithm is floorplanned and partially laid out to give indications of wiring overheads, and layout densities. An indirectly implemented MVL algorithm such as the CORDIC algorithm described in this thesis would clearly benefit from direct implementation in MVL

Warwick Research Archives Portal Repository