29 research outputs found
An Architecture for Improving Variable Radix Real and Complex Division Using Recurrence Division
This paper shows the details of an implementation of variable radix floating-point complex division based on previous implementations of the algorithm. This implementation takes advantage of the easier prescaling offered by low-radix division and recodes it as necessary for higher radix iterations throughout the design. This, along with proper use of redundant digit sets, allows us to significantly alter performance characteristics relative to exclusively high-radix division implementations. Comparisons to existing architectures are shown, as well as common implementation optimizations for future iterations. Results are given in cmos32soi 32nm MTCMOS technology using ARM-based standard cells and commercial EDA toolsets.
Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider
A carry-free subtractive division algorithm is proposed in this paper. In the conventional subtractive divider, adders are used to find both the quotient bit and the partial remainder. Carries generated in the addition take time to propagate, so the carry-propagation delay is usually the bottleneck of the conventional subtractive divider. In this paper, a carry-free scheme is proposed that uses a signed-bit representation for both the quotient and the partial remainder. During the arithmetic operation, a special technique is used to decide the quotient bit, and the new partial remainder is then found by a table-lookup-like method. The signed-bit format of the quotient can be converted to the binary representation by on-the-fly conversion. Based on this algorithm, a 32-b/32-b divider is designed and implemented, and the simulation shows that the divider works well.
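The signed-digit quotient and on-the-fly conversion mentioned in this abstract can be illustrated with a minimal sketch. It uses the textbook radix-2 SRT recurrence with an exact (non-redundant) residual, so it is not the carry-free scheme the paper proposes; the function name `srt2_divide` and the use of `Fraction` for exact arithmetic are assumptions made for illustration only.

```python
from fractions import Fraction

def srt2_divide(x, d, n):
    """Radix-2 SRT sketch: return (Q, w) with x/2 == (Q*d + w) / 2**n and 0 <= w < d.

    Assumes 1/2 <= d < 1 and 0 <= x < 1 (normalized operands).
    """
    half = Fraction(1, 2)
    assert half <= d < 1 and 0 <= x < 1
    w = Fraction(x) / 2          # initial residual, |w| < d
    q_reg, qm_reg = 0, -1        # on-the-fly registers: Q and Q - 1
    for _ in range(n):
        t = 2 * w
        # Quotient digit from the set {-1, 0, 1}, chosen by comparing with +-1/2.
        if t >= half:
            digit = 1
        elif t < -half:
            digit = -1
        else:
            digit = 0
        w = t - digit * d
        # On-the-fly conversion: each update is a shift plus an appended bit,
        # so no carry ever propagates through the quotient.
        if digit == 1:
            q_reg, qm_reg = 2 * q_reg + 1, 2 * q_reg
        elif digit == 0:
            q_reg, qm_reg = 2 * q_reg, 2 * qm_reg + 1
        else:
            q_reg, qm_reg = 2 * qm_reg + 1, 2 * qm_reg
    if w < 0:
        # Final correction: Q - 1 is already available in qm_reg.
        q_reg, w = qm_reg, w + d
    return q_reg, w
```

With `x = 3/4`, `d = 5/8` and `n = 16`, the returned integer `Q` is the 16-bit truncation of `(x/2)/d = 0.6`.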
Design and Implementation of a Radix-4 Complex Division Unit with Prescaling
We present a design and implementation of a radix-4 complex division unit with prescaling of the operands. Specifically, we extend the treatment of the residual bound and errors due to the use of truncated redundant representation. The requirements for prescaling tables are simplified and a detailed specification of the table design is given. All principal components used in the design are described and the proposed optimizations are explained. The target platform for implementation was an Altera Stratix II FPGA [15], for which we report timing and area requirements. For a precision of 36 bits, the implementation uses 1093 ALUTs, achieving a latency of 97 ns. The maximum clock frequency is 268.53 MHz.
A multi-radix approach to asynchronous division
The speed of high-radix digit-recurrence dividers is mainly determined by the hardware complexity of the quotient-digit selection function. In this paper we present a scheme that combines the area efficiency of bundled data with data-dependent computation time. In this scheme the selection function is very simple and may be implemented using a fast adder. This function speculates the result digit and, when the speculation is incorrect, a correction of the quotient and of the residual must be performed. When the residual satisfies some constraints it is also possible to switch to a higher radix, computing a fraction of the next digit in advance. This results in a division scheme with a variable iteration time and a variable number of iterations and hence with an asynchronous behaviour. Several designs were realized and compared both in terms of execution time and area. The fastest unit considered is a radix-64 divider that may switch to radix 128 or 256. Our evaluations show that area × delay savings from 25% to 65%, compared to equivalent synchronous designs, may be achieved.
Reliable and Fault-Resilient Schemes for Efficient Radix-4 Complex Division
Complex division is commonly used in various applications in signal processing and control theory including astronomy and nonlinear RF measurements. Nevertheless, unless reliability and assurance are embedded into the architectures of such structures, the suboptimal (and thus erroneous) results could undermine the objectives of such applications. As such, in this thesis, we present schemes to provide complex number division architectures based on SRT (Sweeney, Robertson, and Tocher) division with fault diagnosis mechanisms. Different fault-resilient architectures are proposed in this thesis which can be tailored based on the eventual objectives of the designs in terms of area and time requirements, among which we carefully pinpoint the schemes based on recomputing with shifted operands (RESO), which are able to detect both natural and malicious faults and, with proper modification, achieve high throughputs. The design also implements a minimized look-up table approach which favors error-detection-based designs and provides high fault coverage with relatively low overhead. Additionally, to benchmark the effectiveness of the proposed schemes, extensive fault diagnosis assessments are performed for the proposed designs through fault simulations and FPGA implementations; the design is implemented on Xilinx Spartan-VI and Xilinx Virtex-VI FPGA families.
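The idea behind recomputing with shifted operands (RESO) can be sketched in a few lines: the operation is run a second time with both operands shifted left by k bits, so a permanent fault tied to one bit position affects the two runs differently, and a mismatch flags an error. The sketch below applies this to plain integer division, where scaling both operands by 2^k leaves the quotient unchanged. The names `reso_divide`, `good_divider` and `faulty_divider`, and the toy stuck-at fault model, are hypothetical and are not taken from the thesis.

```python
def reso_divide(divider, n, d, k=1):
    """Run `divider` twice, the second time on operands shifted left by k bits.

    Returns (quotient_from_first_run, runs_agree). For integer division the
    quotient of (n << k) by (d << k) equals that of n by d, so a fault-free
    divider always agrees with itself.
    """
    q1 = divider(n, d)
    q2 = divider(n << k, d << k)
    return q1, q1 == q2

def good_divider(n, d):
    # Fault-free reference divider.
    return n // d

def faulty_divider(n, d, stuck_bit=2):
    # Toy fault model (an assumption): one bit of the dividend input
    # is permanently stuck at 0 inside the unit.
    return (n & ~(1 << stuck_bit)) // d
```

For example, `reso_divide(faulty_divider, 100, 7)` reports a mismatch because the stuck bit corrupts the unshifted dividend but not the shifted one. A single shift amount does not catch every fault, which is one reason fault coverage has to be assessed by simulation.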
Solving Systems of Linear Equations in Complex Domain : Complex E-Method
The E-method, introduced by Ercegovac, allows efficient parallel solution of diagonally dominant systems of linear equations in the real domain using simple and highly regular hardware. Since the evaluation of polynomials and certain rational functions can be achieved by solving the corresponding linear systems, the E-method is an attractive general approach for function evaluation. We generalize the E-method to complex linear systems, and show some potential applications such as the evaluation of complex polynomials and rational functions.
Study of Recursive Divide Architectures and Implementation for Division and Multiplication
Multipliers have been key and critical components for most application-specific and general-purpose computer architectures. However, these architectures have been transitioning towards multiple cores that can process large amounts of data through parallel approaches to computation. Unfortunately, traditional arithmetic functional units that worked well for single-core architectures have the side effect of incurring large amounts of area and power. Consequently, multi-core architectures need new ways of thinking about increased throughput to handle large amounts of data. This work discusses implementation of different divider algorithms and presents a recursive high radix divide unit that is modified to handle both multiplication and division targeted at multi-core architectures. Results are obtained with a 65nm technology and show a significant decrease in area and power while still maintaining a low total latency by utilizing high radix encoding within the functional unit.
Complex Multiply-Add and Other Related Operators
In this work, we present algorithms and schemes for computing common arithmetic expressions defined in the complex domain as hardware-implemented operators. The operators include Complex Multiply-Add (CMA: ab+c), Complex Sum of Products (CSP: ab+ce+f), Complex Sum of Squares (CSS: a^2+b^2) and complex Integer Powers. The proposed approach is to map the expression to a system of linear equations, apply a complex-to-real transform, and compute the solutions to the linear system using a digit-by-digit, most-significant-digit-first recurrence method. The components of the solution vector correspond to the expressions being evaluated. The number of digit cycles is about m for m-digit precision. The basic modules are similar to left-to-right multipliers. The interconnections between the modules are digit-wide.
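As a small illustration of what a complex-to-real transform does for the CMA expression ab+c, the sketch below evaluates it using real arithmetic only, treating multiplication by a as the 2x2 real matrix [[Re a, -Im a], [Im a, Re a]]. This is direct evaluation, not the digit-recurrence linear-system method the abstract describes; the function name `cma_real` and the tuple representation are assumptions for illustration.

```python
def cma_real(a, b, c):
    """Complex multiply-add a*b + c on (real, imag) tuples, real arithmetic only.

    Multiplication by a = (ar, ai) is the real 2x2 matrix
        [[ar, -ai],
         [ai,  ar]]
    applied to the vector (br, bi).
    """
    ar, ai = a
    br, bi = b
    cr, ci = c
    real = ar * br - ai * bi + cr
    imag = ai * br + ar * bi + ci
    return real, imag
```

The result can be checked against Python's built-in `complex` type, for instance with a = 1+2i, b = 3-i, c = 4+5i.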
High sample-rate Givens rotations for recursive least squares
The design of an application-specific integrated circuit of a parallel array processor is considered
for recursive least squares by QR decomposition using Givens rotations, applicable
in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation,
which, for this recursive algorithm, means that the time to perform arithmetic operations
is critical. The algorithm, architecture and arithmetic are considered in a single
integrated design procedure to achieve optimum results.
A realisation approach using standard arithmetic operators, add, multiply and divide is
adopted. The design of high-throughput operators with low delay is addressed for fixed- and
floating-point number formats, and the application of redundant arithmetic considered. New
redundant multiplier architectures are presented enabling reductions in area of up to 25%,
whilst maintaining low delay. A technique is presented enabling the use of a conventional
tree multiplier in recursive applications, allowing savings in area and delay. Two new divider
architectures are presented showing benefits compared with the radix-2 modified SRT algorithm.
Givens rotation algorithms are examined to determine their suitability for VLSI implementation.
A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed
enabling the sample-rate to be increased by a factor of approximately 6 and offering
area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of
136 MHz could be achieved using a standard cell approach and 0.35 µm CMOS technology.
The enhanced SGR algorithm has been compared with a CORDIC approach and shown to
benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation
on a parallel array of general purpose (GP) DSP chips, it is estimated that a single
application specific chip could offer up to 1,500 times the computation obtained from a
single GP DSP chip.
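For readers unfamiliar with the operation at the core of this abstract, here is a minimal sketch of a standard Givens rotation, the building block of QR-based recursive least squares. It uses the conventional form with a square root, not the Squared Givens Rotation (SGR) variant the thesis develops; the function names `givens` and `rotate_rows` are assumptions for illustration.

```python
import math

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0          # nothing to annihilate; use the identity
    return a / r, b / r

def rotate_rows(row_i, row_j, c, s):
    """Apply the rotation to two matrix rows, element by element."""
    new_i = [c * x + s * y for x, y in zip(row_i, row_j)]
    new_j = [-s * x + c * y for x, y in zip(row_i, row_j)]
    return new_i, new_j
```

Rotating the rows `[3.0, 1.0]` and `[4.0, 2.0]` with the pair from `givens(3.0, 4.0)` zeroes the leading element of the second row and leaves 5.0 at the head of the first. The square root and the divisions in `givens` lie on the recursive path, which is why the time taken by arithmetic operations is critical at high sample rates.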