170 research outputs found

    A study and comparison of COordinate Rotation DIgital Computer (CORDIC) architectures

    Most digital signal processing applications perform operations such as multiplication, addition, square-root calculation, and solving linear equations. Hardware implementations of these operations consume considerable hardware, and software implementations consume considerable memory. Even when implemented in hardware, they do not provide high speed, which is why software implementations still dominate. For realizing operations ranging from basic to very complex with less hardware, the COordinate Rotation DIgital Computer (CORDIC) proves beneficial. It can perform mathematical operations from addition to highly complex functions using only an arithmetic unit and shifters. This paper gives a brief overview of existing CORDIC architectures, their working principles and application domains, and compares them. Different designs target different goals, e.g. high accuracy and precision, low area, low latency, hardware efficiency, low power, or reconfigurability, and can be chosen according to the application in which the architecture is to be employed.
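
    The claim in the abstract above, that CORDIC needs only adders and shifters, can be illustrated with a minimal rotation-mode sketch. This is not the code of any architecture surveyed in the paper; the function name `cordic_rotate`, the 16-bit fractional format, and the iteration count are assumptions chosen for illustration.

```python
import math

def cordic_rotate(angle, iterations=16, frac_bits=16):
    """Approximate (cos(angle), sin(angle)) for |angle| < ~1.74 rad.

    Illustrative sketch: the loop body uses only integer adds, subtracts
    and right shifts, as a hardware CORDIC datapath would.
    """
    one = 1 << frac_bits
    # Lookup table of arctan(2^-i) in fixed point (a small ROM in hardware).
    atan_table = [round(math.atan(2.0 ** -i) * one) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # starting from x = 1/K cancels the accumulated gain K.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = round(one / gain), 0, round(angle * one)
    for i in range(iterations):
        if z >= 0:  # residual angle positive: rotate counter-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:       # residual angle negative: rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return x / one, y / one
```

    Calling `cordic_rotate(0.5)` returns values close to `math.cos(0.5)` and `math.sin(0.5)`; the accuracy is limited by the word length and the number of iterations.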

    FPGA Implementation of Fast Fourier Transform Core Using NEDA

    Transforms such as the DFT are a major block in communication systems such as OFDM. This thesis reports the architecture of a DFT core using NEDA. The advantage of the proposed architecture is that the entire transform can be implemented using only adder/subtractors and shifters, minimising the hardware requirement compared to other architectures. The proposed design is implemented for a 16-bit data path (12-bit for comparison) in both integer and fixed-point representation, which broadens its scope of use. The design is mapped onto a Xilinx XC2VP30 FPGA, fabricated in 130 nm process technology. Its maximum on-board operating frequency is 122 MHz. NEDA is one of the techniques for implementing signal processing systems that require multiply-and-accumulate units. The FFT is one of the most widely used blocks in communication and signal processing systems. An FPGA implementation of a 16-point radix-4 complex FFT is proposed. The proposed design improves hardware utilization compared to traditional methods. The design has been implemented on a range of FPGAs to compare performance. The maximum frequency achieved is 114.27 MHz on an XC5VLX330 FPGA, with a maximum throughput of 1828.32 Mbit/s and a minimum slice-delay product of 9.18. The design is also synthesized with Synopsys DC in both 65 nm and 180 nm technology libraries. The advantages of multiplier-less architectures are reduced hardware and improved latency. Multiplier-less architectures for a radix-2^2 folded pipelined complex FFT core are presented, based on NEDA. The number of points considered is sixteen and the folding factor is four. The proposed designs are implemented on a Xilinx XC5VSX240T FPGA. The NEDA-based designs reduce area by over 83%. The observed slice-delay products for the NEDA-based designs are 2.196 and 5.735.

    Architectural implementation of cordic unit and its applications

    The ubiquity of DSP has increased the demand for area-efficient and accurate architectures for carrying out nonlinear arithmetic operations. One such architecture is the CORDIC unit, which has many applications in DSP, including the implementation of Fourier-basis transforms. This report presents a CORDIC architecture with an embedded scaling unit that uses only a minimal number of adders and shifters. It can be operated in rotation mode as well as vectoring mode. The purpose of the design is to obtain a scaling-free CORDIC unit while preserving the structure of the original algorithm. The proposed design achieves a considerable reduction in hardware compared with other scaling-free architectures. An error analysis for different word lengths, and for different input ranges at a fixed word length, guides the choice of parameters. The error in rotation mode for a 16-bit data path is 0.073% for the Y-equivalent input and 0.067% for the X-equivalent input. We also report the architecture of a DFT core implemented using low-latency CORDIC. A scaling unit is included to obtain scaled outputs. The reported DFT core architecture has 22 adders in total, in addition to 2 CORDIC units. DDS or NCO blocks are now prominently used in RF signal processing, satellite communications, and similar applications. This report also presents the FPGA implementation of one such DDS with quadrature outputs. The proposed DDS design, based on pipelined CORDIC, considerably improves SFDR compared to existing designs while using less hardware. This report also proposes a multiplier-less architecture for a radix-2^2 folded pipelined complex FFT core based on the CORDIC technique. The number of points considered is sixteen and the folding factor is four.

    An approach to the application of shift-and-add algorithms on engineering and industrial processes

    Different kinds of algorithms can be chosen to compute elementary functions. Among them, shift-and-add algorithms are worth mentioning because they have been specifically designed to be very simple and to save computing resources. In fact, almost the only operations involved in these methods are additions and shifts, which a digital processor can perform easily and efficiently. Shift-and-add algorithms achieve fairly good precision with low-cost iterations. The best-known algorithm of this type is CORDIC. CORDIC can approximate a wide variety of functions with only a slight change in its iterations. In this paper, we analyze the requirements of some engineering and industrial problems in terms of the type of operands and the functions to approximate. We then propose applying CORDIC-based shift-and-add algorithms to these problems. We compare the different methods in terms of the precision of the results and the number of iterations required. This research was supported by the Conselleria de Educacion of the Valencia Region Government under grant number GV/2011/043.
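
    As a hedged illustration of how "a slight change in the iterations" yields a different function, the sketch below shows CORDIC in vectoring mode: the same micro-rotations as in rotation mode, but the direction is chosen from the sign of y instead of the residual angle, which produces a magnitude and an angle. The name `cordic_vector` and the use of floating point (with `2.0 ** -i` standing in for a right shift) are assumptions for readability, not taken from the paper.

```python
import math

def cordic_vector(x, y, iterations=24):
    """Approximate (sqrt(x^2 + y^2), atan2(y, x)) for x > 0.

    Illustrative sketch: each step rotates the vector toward the x axis
    and accumulates the applied angle in z.
    """
    z = 0.0
    gain = 1.0
    for i in range(iterations):
        t = 2.0 ** -i  # multiplying by t is a right shift by i bits in fixed point
        if y >= 0:     # vector above the axis: rotate clockwise
            x, y, z = x + y * t, y - x * t, z + math.atan(t)
        else:          # vector below the axis: rotate counter-clockwise
            x, y, z = x - y * t, y + x * t, z - math.atan(t)
        gain *= math.sqrt(1.0 + t * t)  # growth introduced by this micro-rotation
    # x now holds the magnitude multiplied by the accumulated CORDIC gain.
    return x / gain, z
```

    For the input `(3.0, 4.0)` the result is close to a magnitude of 5 and an angle of `math.atan2(4.0, 3.0)`. In hardware the `math.atan(t)` values would come from a small lookup table and the final gain correction would be a constant scaling.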

    Digital Fixed-Point Low Powered Area Efficient Function Estimation for Implantable Devices

    This article introduces a new multiplier-less 32-bit fixed-point architecture for estimating complex non-linear functions based on adapted shift-only series expansions. This novel hardware structure is proposed for use as a dedicated core unit in implantable medical devices. Its FPGA implementation produces a mean squared error of 0.23% over the functions sin(x), cos(x), e^(ix) and tan⁻¹(x) when compared to unrestricted CPU implementations. These results are achieved using only 133 slice registers and 399 look-up tables (LUTs). Furthermore, the hardware performs extremely well in our hardware-in-the-loop real use case for the detection of epilepsy, correctly detecting true positive seizures. When implemented in 130 nm technology via the Google Sky130 PDK and OpenLane EDA tools, the ASIC occupies 0.0625 mm², a 47% reduction compared to competitors. In addition, its power consumption is reduced to 6.46 mW at fo = 100 MHz and just 0.4 μW at fo = 1 kHz. Universidad Loyola Andaluci

    A 16-bit CORDIC rotator for high-performance wireless LAN

    In this paper we propose a novel 16-bit low-power CORDIC rotator for high-speed wireless LAN. The algorithm converges to the final target angle by adaptively selecting appropriate iteration steps while keeping the scale factor virtually constant. The VLSI architecture of the proposed design eliminates all arithmetic hardware in the angle approximation datapath and reduces the number of iterations by 50% on average. The cell area of the processor is 0.7 mm² and it dissipates 7 mW at 20 MHz.

    Studio e realizzazione di un'architettura VLSI di un processore per l'implementazione dell'algoritmo FFT (Study and implementation of a VLSI architecture for a processor implementing the FFT algorithm)

    Since the 5G connection standard is used by a rising number of devices and is evolving to meet new needs and requirements, it has become crucial to study and design new, faster, and more efficient transmitters and receivers. A fundamental role in the 5G connection is played by orthogonal frequency-division multiplexing (OFDM), a modulation scheme. Since the demodulation is based on the Fourier transform, the purpose of this thesis is to realize a processor capable of implementing FFT and DFT algorithms on variable-length sequences that complies with the 5G standard criteria. To do so, the International Telecommunication Union report ITU-R M.2410-0 was first analyzed to define the minimum requirements for the processor. Then, a study of the state of the art for similar devices led to the development of a VLSI architecture suitable for the application. An RTL version of the architecture has been implemented in VHDL and tested.

    High sample-rate Givens rotations for recursive least squares

    The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using standard arithmetic operators, add, multiply and divide, is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic considered. New redundant multiplier architectures are presented enabling reductions in area of up to 25%, whilst maintaining low delay. A technique is presented enabling the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and 0.35 µm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application specific chip could offer up to 1,500 times the computation obtained from a single GP DSP chip.
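
    For readers unfamiliar with the QR update that the abstract above builds on, a minimal sketch follows. It uses the standard Givens rotation (with a square root and divisions), not the Squared Givens Rotation variant developed in the thesis, and it omits the forgetting factor commonly used in recursive least squares. The name `givens_update` and the list-of-lists matrix layout are assumptions for illustration.

```python
import math

def givens_update(R, row):
    """Fold one new data row into the upper-triangular factor R, in place.

    Illustrative sketch: afterwards R^T R equals the previous R^T R plus
    the outer product of `row` with itself, which is the QR update step
    behind recursive least squares.
    """
    row = list(row)  # work on a copy so the caller's data is untouched
    n = len(R)
    for i in range(n):
        a, b = R[i][i], row[i]
        r = math.hypot(a, b)
        if r == 0.0:
            continue  # nothing to annihilate in this column
        c, s = a / r, b / r  # cosine and sine of the rotation that zeroes row[i]
        for j in range(i, n):
            rij, xj = R[i][j], row[j]
            R[i][j] = c * rij + s * xj
            row[j] = -s * rij + c * xj
    return R
```

    Starting from an all-zero R and feeding in the rows of a data matrix A one at a time leaves R with the property that R^T R equals A^T A, without ever forming A^T A explicitly.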

    VLSI architectures for high speed Fourier transform processing
