39 research outputs found

    A novel implementation of CORDIC algorithm using backward angle recoding (BAR)

    Full text link

    High sample-rate Givens rotations for recursive least squares

    Get PDF
    The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using standard arithmetic operators, add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic considered. New redundant multiplier architectures are presented enabling reductions in area of up to 25%, whilst maintaining low delay. A technique is presented enabling the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and O.35pm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application specific chip could offer up to 1,500 times the computation obtained from a single OP DSP chip

    Reliable Hardware Architectures of CORDIC Algorithm with Fixed Angle of Rotations

    Get PDF
    Fixed-angle rotation operation of vectors is widely used in signal processing, graphics, and robotics. Various optimized coordinate rotation digital computer (CORDIC) designs have been proposed for uniform rotation of vectors through known and specified angles. Nevertheless, in the presence of faults, such hardware architectures are potentially vulnerable. In this thesis, we propose efficient error detection schemes for two fixed-angle rotation designs, i.e., the Interleaved Scaling and Cascaded Single-rotation CORDIC. To the best of our knowledge, this work is the first in providing reliable architectures for these variants of CORDIC. The former is suitable for low-area applications and, hence, we propose recomputing with encoded operands schemes which add negligible area overhead to the designs. Moreover, the proposed error detection schemes for the latter variant are optimized for efficient applications which hamper the performance of the architectures negligibly. We present three variants of recomputing with encoded operands to detect both transient and permanent faults, coupled with signature-based schemes. The overheads of the proposed designs are assessed through Xilinx FPGA implementations and their effectiveness is benchmarked through error simulations. The results give confidence for the proposed efficient architectures which can be tailored based on the reliability requirements and the overhead to be tolerated

    Super - cordic: Low delay cordic architectures for computing complex functions

    Get PDF
    This thesis proposes an optimized Co-ordinate Rotation Digital Computer (CORDIC) algorithm in the rotation and extended vectoring mode of the circular co-ordinate system. The CORDIC algorithm computes the values of trigonometric functions and their inverses. The proposed algorithm provides the result with a lower overall latency than existing systems. This is done by using redundant representations and approximations of the required direction and angle of each rotation. The algorithm has been designed to provide the result in a fixed number of iterations nn for the rotation mode and 3n/2+n/23\lceil n/2 \rceil + \lfloor n/2 \rfloor for the extended vectoring mode; where, nn is a design parameter. In each iteration, the algorithm performs between 0 and p/np/n parallel rotations, where, pp is the number of precision bits and nn is the selected number of iterations. A technique to handle the scaling factor compensation for such an algorithm is proposed. The results of the functional verification for different values of nn and an estimation of the overall latency are presented. Based on the results, guidelines to choosing a value of nn to meet the required performance have also been presented.M.S

    Design of 2D discrete cosine transform using CORDIC architectures in VHDL

    Get PDF
    The Discrete Cosine Transform is one of the most widely transform techniques in digital signal processing. In addition, this is also most computationally intensive transforms which require many multiplications and additions. Real time data processing necessitates the use of special purpose hardware which involves hardware efficiency as well as high throughput. Many DCT algorithms were proposed in order to achieve high speed DCT. Those architectures which involves multipliers, for example Chen’s algorithm has less regular architecture due to complex routing and requires large silicon area. On the other hand, the DCT architecture based on distributed arithmetic (DA) which is also a multiplier less architecture has the inherent disadvantage of less throughputs because of the ROM access time and the need of accumulator. Also this DA algorithm requires large silicon area if it requires large ROM size. Systolic array architecture for the real-time DCT computation may have the large number of gates and clock skew problem. The other ways of implementation of DCT which involves in multiplierless, thus power efficient and which results in regular architecture and less complicated routing, consequently less area, simultaneously lead to high throughput. So for that purpose CORDIC seems to be a best solution. CORDIC offers a unified iterative formulation to efficiently evaluate the rotation operation. This thesis presents the implementation of 2D Discrete Cosine Transform (DCT) using the Angle Recoded (AR) Cordic algorithm, the new scaling less CORDIC algorithm and the conventional Chen’s algorithm which is multiplier dependant algorithm. The 2D DCT is implemented by exploiting the Separability property of 2D Discrete Cosine Transform. Here first one dimensional DCT is designed first and later a transpose buffer which consists of 64 memory elements, fully pipelined is designed. Later all these blocks are joined with the help of a controller element which is a mealy type FSM which produces some status signals also. The three resulting architectures are all well synthesized in Xilinx 9.1ise, simulated in Modelsim 5.6f and the power is calculated in Xilinx Xpower. Results prove that AR Cordic algorithm is better than Chen’s algorithm, even the new scaling less CORDIC algorithm

    Modified virtually scaling-free adaptive CORDIC rotator algorithm and architecture

    Full text link

    An Efficient Power Reduction in Multiplexer Based On Cordic Using Cadence-Digital IC Design

    Get PDF
    ABSTRACT: CORDIC is an iterative Algorithm to perform a wide range of functions including vector rotations, certain trigonometric, hyperbolic, linear and logarithmic functions. Both non pipelined and 2 level pipelined CORDIC with 8 stages, using two schemes was performed. First scheme was original unrolled CORDIC and second scheme was MUX based pipelined unrolled CORDIC. Compared to first scheme, the second scheme is more reliable, since the second scheme uses multiplexer and registers. By adding multiplexer the area is reduced comparatively to the first architecture, since the first scheme uses only addition, subtraction and shifting operation in all the 8 stages.8 iterations are performed and it is implemented on QUARTUS II software. The same is implemented in cadence tool and the results was compared with QUARTUS II and CADENCE TOOL. An efficient power reduction is obtained in CADENCE (Digital) implementation
    corecore