29 research outputs found

    An Architecture for Improving Variable Radix Real and Complex Division Using Recurrence Division

    Get PDF
    International audienceThis paper shows the details of an implementation of variable radix floating-point complex division based on previous implementations of the algorithm. This implementation takes advantage of the easier prescaling offered by low-radix division and recodes it as necessary for higher radix iterations throughout the design. This, along with proper use of redundant digit sets, allows us to significantly altar performance characteristics relative to exclusively high-radix division implementations. Comparisons to existing architectures are shown, as well as common implementation optimizations for future iterations. Results are given in cmos32soi 32nm MTCMOS technology using ARMbased standard-cells and commercial EDA toolsets

    Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider

    Get PDF
    [[abstract]]A carry-free subtractive division algorithm is proposed in this paper. In the conventional subtractive divider, adders are used to find both quotient bit and partial remainder. Carries are usually generated in the addition operation, and it may take time to finish the operation, therefore, the carry propagation delay usually is a bottleneck of the conventional subtractive divider. In this paper, a carry-free scheme is proposed by using signed bit representation to represent both quotient and partial remainder. During the arithmetic operation, a special technique is used to decide the quotient bit, and the new partial remainder can be found further by a table lookup-like method. The signed bit format of the quotient can be converted by on-the-fly conversion to the binary representation. Based on this algorithm a 32-b/32-b divider is designed and implemented, and the simulation shows that the divider works well.[[notice]]補正完畢[[incitationindex]]E

    Design and Implementation of a Radix-4 Complex Division Unit with Prescaling

    Get PDF
    International audienceWe present a design and implementation of a radix-4 complex division unit with prescaling of the operands. Specifically, we extend the treatment of the residual bound and errors due to the use of truncated redundant representation. The requirements for prescaling tables are simplified and a detailed specification of the table design is given. All principal components used in the design are described and the proposed optimizations are explained. The target platform for implementation was an Altera Stratix II FPGA [15] for which we report timing and area requirements. For a precision of 36 bits, the implementation uses 1093 ALUTs, achieving a latency of 97ns. The maximum clock frequency is 268.53 MHz

    A multi-radix approach to asynchronous division

    Get PDF
    The speed of high-radix digit-recurrence dividers is mainly determined by the hardware complexity of the quotient-digit selection function. In this paper we present a scheme that combines the area efficiency of bundled data with data-dependent computation time. In this scheme the selection function is very simple and may be implemented using a fast adder This function speculates the result digit and, when the speculation is incorrect, a correction of the quotient and of the residual must be performed. When the residual satisfies some constraints it is also possible to switch to a higher radix, computing a fraction of the next digit in advance. This results in a division scheme with a variable iteration time and a variable number of iterations and hence with an asynchronous behaviour Several designs were realized and compared both in terms of execution time and area. The fastest unit considered is a radix-64 divider that may switch to radix 128 or 256. Our evaluations show that area /spl times/ delay savings from 25% to 65%, compared to equivalent synchronous designs, may be achieved.Peer ReviewedPostprint (published version

    Reliable and Fault-Resilient Schemes for Efficient Radix-4 Complex Division

    Get PDF
    Complex division is commonly used in various applications in signal processing and control theory including astronomy and nonlinear RF measurements. Nevertheless, unless reliability and assurance are embedded into the architectures of such structures, the suboptimal (and thus erroneous) results could undermine the objectives of such applications. As such, in this thesis, we present schemes to provide complex number division architectures based on (Sweeney, Robertson, and Tocher) SRT-division with fault diagnosis mechanisms. Different fault resilient architectures are proposed in this thesis which can be tailored based on the eventual objectives of the designs in terms of area and time requirements, among which we pinpoint carefully the schemes based on recomputing with shifted operands (RESO) to be able to detect both natural and malicious faults and with proper modification achieve high throughputs. The design also implements a minimized look up table approach which favors in error detection based designs and provides high fault coverage with relatively-low overhead. Additionally, to benchmark the effectiveness of the proposed schemes, extensive fault diagnosis assessments are performed for the proposed designs through fault simulations and FPGA implementations; the design is implemented on Xilinx Spartan-VI and Xilinx Virtex-VI FPGA families

    Solving Systems of Linear Equations in Complex Domain : Complex E-Method

    Get PDF
    The E-method, introduced by Ercegovac, allows efficient parallel solution of diagonally dominant systems of linear equations in real domain using simple and highly regular hardware. Since the evaluation of polynomials and certain rational functions can be achieved by solving the corresponding linear systems, the E-method is an attractive general approach for function evaluation. We generalize the E-method to complex linear systems, and show some potential applications such as the evaluation of complex polynomials and rational functions

    Study of Recursive Divide Architectures and Implementation for Division and Multiplication

    Get PDF
    Multipliers have been key and critical components for most application-specific and general-purpose computer architectures. However, these architectures have been transitioning towards multiple cores that can process large amounts of data through parallel approaches to computation. Unfortunately, traditional arithmetic functional units that worked well for single-core architectures have the side effect of incurring large amounts of area and power. Consequently, multi-core architecture need new ways of thinking about increased throughput to handle large amounts of data. This work discusses implementation of different divider algorithms and presents a recursive high radix divide unit that is modified to handle both multiplication and division targeted at multi-core architectures. Results are obtained with a 65nm technology and show a significant decrease in area and power while still maintaining a low total latency by utilizing high radix encoding within the functional unit.School of Electrical & Computer Engineerin

    Complex Multiply-Add and Other Related Operators

    Get PDF
    International audienceIn this work, we present algorithms and schemes for computing common arithmetic expressions defined in the complex domain as hardware-implemented operators.The operators include Complex Multiply-Add (CMA: ab+c), Complex Sum of Producrs (CSP: ab+ce+f), Complex Sum of Squares (CSS: a^2+b^2) and complex Integer Powers. The proposed approach is to map the expression to a system of linear equations, apply a complex-to-real transform, and compute the solutions to the linear system using a digit-by-digit, the most significant digit first, recurrence method. The components of the solution vector corresponds to the expressions being evaluated. The number of digit cycles is about m for m-digit precision. The basic modules are similar to left-to-right multipliers. The interconnections between the modules are digit-wide

    High sample-rate Givens rotations for recursive least squares

    Get PDF
    The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using standard arithmetic operators, add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic considered. New redundant multiplier architectures are presented enabling reductions in area of up to 25%, whilst maintaining low delay. A technique is presented enabling the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and O.35pm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application specific chip could offer up to 1,500 times the computation obtained from a single OP DSP chip
    corecore