{ 1 skhan, 2 ydkim }@hbt.chungbuk.ac.kr, 3 ygyou@chungbuk.ac.kr
Introduction
Modern data processing requires faster arithmetic circuitry to keep pace with increasing system speeds. Signal processing algorithms, for example, rely heavily on efficient addition and multiplication circuits, which have been developed over the last few decades. A substantial amount of research has been devoted to fast addition and multiplication. The resulting latency of addition ranges from 2 to 4 cycles, while multiplication requires 2 to 8 cycles. Double-precision division, however, suffers a much longer latency of 6 to 61 cycles [1].
Even though it is less frequently used, division can be one of the major computation factors degrading overall performance of system applications [1] . An SRT algorithm has been devised to enhance division speed through computing a quotient digit of each iteration step based on a recurrence technique. The speed of an SRT-based divider largely depends on the complexity of quotient selection. It is necessary to reduce the number of iteration steps or employ high speed arithmetic circuitry for faster division. The radix is increased to reduce the number of iteration steps and thereby enhance division speed.
The use of a quotient-digit selection table significantly reduces the complexity of quotient-digit selection. However, the table size increases drastically at high radices [2]. The table size can be significantly reduced by estimating the quotient digit instead of finding the exact one [3]. In this paper, we address dividers that employ a redundant number system. The full addition or subtraction performed in each iteration step, which operates on an n-bit divisor and a partial remainder to determine the quotient digit, is the most significant delay element of high-speed division. A redundant number system and its associated operations are proposed to overcome this carry propagation problem. To study the effectiveness of the redundant number system, several dividers are designed employing a recoded binary signed digit adder (RBSDA) [4] and the redundant binary adders (RBA) of prior work [5], [6] and [7]. In an RBSDA, global carry generation is prevented by eliminating nonzero sequences in augends and addends [4]; recoding is performed before each addition to eliminate these nonzero sequences. In the redundant binary adders, carry propagation is avoided by selecting combinations of intermediate sum and carry digits that do not generate carries [5, 6, 7]: the combinations $(S_i, C_{i-1}) = (1, 1)$ and $(S_i, C_{i-1}) = (\bar{1}, \bar{1})$, where $\bar{1}$ denotes the digit $-1$, are not used. The performance of the dividers is compared for evaluation purposes.
This paper addresses several SRT divider implementations employing different adders and, to illustrate the effectiveness of the proposed scheme, compares a divider design based on non-redundant number adders with designs based on redundant number adders.
Title of Publication (to be inserted by the publisher)

Division Algorithms
Division produces two outputs, a quotient $Q$ and a remainder $R$, from a dividend $X$ and a divisor $D$, such that $X = Q \cdot D + R$ with $R < D$. The basic division algorithm is as follows:
  $r_{-1} = X$;
  for $i = 0$ to $k-1$
    if $2r_{i-1} \ge D$ then $q_i = 1$; $r_i = 2r_{i-1} - D$;
    else $q_i = 0$; $r_i = 2r_{i-1}$;
  return $(Q, R)$;
Here $X$ has $2k$ bits and the other quantities have $k$ bits in their data representations; $r_i$ and $q_i$ are the partial remainder and the quotient bit of the $i$th stage of the algorithm, respectively. The shaded part of the algorithm is to be modified to accommodate faster computation.
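The basic recurrence above can be illustrated with a minimal integer sketch. It assumes an unsigned dividend that satisfies $X < 2^k D$ so that the quotient fits in $k$ bits, and it aligns the divisor with the upper half of the $2k$-bit dividend; the function name `restoring_divide` and this alignment convention are illustrative choices, not taken from the paper.

```python
def restoring_divide(x, d, k):
    """Basic shift-and-subtract division producing a k-bit quotient.

    x: unsigned dividend (up to 2k bits), d: unsigned k-bit divisor.
    Assumes x < d * 2**k so that the quotient fits in k bits.
    """
    if d <= 0 or not (0 <= x < (d << k)):
        raise ValueError("need d > 0 and 0 <= x < d * 2**k")
    d_aligned = d << k          # divisor aligned with the upper half of x
    r = x                       # r_{-1} = X
    q = 0
    for _ in range(k):
        r <<= 1                 # form 2 * r_{i-1}
        if r >= d_aligned:      # quotient bit 1: subtract the divisor
            q = (q << 1) | 1
            r -= d_aligned
        else:                   # quotient bit 0: keep the shifted remainder
            q <<= 1
    return q, r >> k            # undo the alignment of the remainder


print(restoring_divide(100, 9, 4))  # (11, 1), since 100 = 11 * 9 + 1
```

Every iteration performs a full-width comparison and, for a quotient bit of 1, a full-width subtraction; this is the cost that the schemes in the following sections aim to reduce.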
SRT Division.
The SRT division algorithm introduces another quotient value, $q_i = 0$, into the non-restoring algorithm, eliminating an addition or subtraction operation and thereby speeding up computation. In an SRT division, the divisor satisfies $2^{-1} \le D < 1$.
The quotient selection becomes simpler because $2r_{i-1}$ only has to be compared with $-2^{-1}$ or $2^{-1}$. This means that only the two leading bits of the divisor and of the remainder need to be compared; a full comparison is not needed. A binary fraction is greater than or equal to $2^{-1}$ when it begins with 0.1, and smaller than $-2^{-1}$ when it begins with 1.0. A routine determining quotient digits and remainders is as follows:
if 2r i-1  < 2 -1 then q i = 0; else if 2r i-1  ≥ 2 -1 and (sign(2r i-1 ) = sign(D)) then q i = 1; r i = 2r i-1 -D; else q i = -1; r i = 2r i-1 + D;
The routine is repeated $k$ times, but the number of additions or subtractions is smaller than $k$ because of the newly introduced value of 0 for $q_i$. About $k/2$ additions or subtractions are expected on average.
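A minimal sketch of this radix-2 SRT routine is given below, using exact rational arithmetic so that the recurrence can be checked directly. It assumes a positive divisor with $2^{-1} \le D < 1$ and an initial partial remainder in $[-2^{-1}, 2^{-1})$, and it uses the thresholds stated in the text ($\ge 2^{-1}$ and $< -2^{-1}$); the function name `srt_radix2` and the returned tuple are illustrative, not part of the paper.

```python
from fractions import Fraction


def srt_radix2(x, d, k):
    """Radix-2 SRT division with quotient digits in {-1, 0, 1}.

    Assumes 1/2 <= d < 1 (positive divisor) and -1/2 <= x < 1/2.
    Returns (digits, quotient, remainder) with x == quotient * d + remainder.
    """
    half = Fraction(1, 2)
    x, d = Fraction(x), Fraction(d)
    if not (half <= d < 1) or not (-half <= x < half):
        raise ValueError("need 1/2 <= d < 1 and -1/2 <= x < 1/2")
    r = x
    digits = []
    for _ in range(k):
        shifted = 2 * r
        if shifted >= half:        # begins with 0.1: subtract the divisor
            q, r = 1, shifted - d
        elif shifted < -half:      # begins with 1.0: add the divisor
            q, r = -1, shifted + d
        else:                      # small remainder: shift only
            q, r = 0, shifted
        digits.append(q)
    quotient = sum(Fraction(q, 2 ** (i + 1)) for i, q in enumerate(digits))
    return digits, quotient, r / 2 ** k


digits, quotient, remainder = srt_radix2(Fraction(5, 16), Fraction(3, 4), 8)
print(digits, quotient, remainder)
print("additions/subtractions:", sum(1 for q in digits if q != 0))
```

The count of nonzero digits printed at the end is the number of additions or subtractions actually performed for this operand pair. The final remainder may be negative, in which case a correction step (not shown) is needed.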
Carry Save Adder-based Design.
An ordinary carry save adder (CSA) prevents time-consuming carry propagation and speeds up computation; it consists of a simple linear array of full adders. Since all digit positions are processed in parallel, an addition or subtraction completes with a constant delay.
In a CSA-based SRT division algorithm, carry propagation may be required to obtain the correct sign of the result of a subtraction or addition. Partial remainders are represented in stored-carry form; the actual remainder is the sum of the carry and sum vectors.
An approximate comparison is performed using overlapping ranges of $2r_{i-1}$ to obtain the quotient digit $q_i = -1$, 0, or 1 without risking a wrong quotient selection. The overlapping regions are shown in Fig. 1a. Here $u = (u_1 u_0 . u_{-1} u_{-2} \cdots)$ and $v = (v_1 v_0 . v_{-1} v_{-2} \cdots)$ are the carry and sum components of $2r_{i-1}$ in stored-carry representation, respectively. They are 2's complement numbers in the range of
The quotient selection is performed as follows:
On the Convergence of Bio-, Information-, Environmental-, Energy-, Space- and Nano-Technologies
Here the 4-bit number $t = (t_1 t_0 . t_{-1} t_{-2})$ is the sum of the most significant 4 bits of $u$ and $v$. The first three bits, $t_1 t_0 . t_{-1}$, can be compared to the constants $-2^{-1}$ and 0.
Truncation of 2's complement numbers makes their values smaller (never larger), regardless of their signs. The true value of $2r_{i-1}$ is smaller than 0 for $t < -2^{-1}$, since the truncation error of each component is smaller than $2^{-2}$. Similarly, $2r_{i-1} < 2^{-1} \le D$ holds for $t < 0$. This algorithm employs a 4-bit fast adder; the high-order 3 bits of the 4-bit result can be supplied to a logic circuit or an eight-entry table to obtain the next quotient digit. The speed of a CSA-based design is evaluated for comparison purposes. The gate delay of a CSA is 4∆. To compare the partial remainder with the selection constants, the carry-sum pairs must be added; the delay of the CLA adder used for this is 6∆. Fig. 2a includes the Divisor Multiple Generator block, which selects the multiples of the divisor; its delay is 2∆. The total delay of the CSA-based SRT division algorithm is (k/2)×(4+2+6)∆, assuming k/2 additions/subtractions on average.
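The truncation argument can be checked with a small sketch. The selection rule itself is not reproduced in the text above, so the code assumes the commonly used rule that matches the two constants named there: $q_i = -1$ for $t < -2^{-1}$, $q_i = 1$ for $t \ge 0$, and $q_i = 0$ otherwise. The function names are illustrative, and the finite width of the 4-bit adder (wrap-around) is deliberately not modeled.

```python
import math
from fractions import Fraction


def truncate_to_quarter(value):
    """Keep two fractional bits of a 2's complement value (rounds toward -inf)."""
    return Fraction(math.floor(Fraction(value) * 4), 4)


def select_quotient_digit(u, v):
    """Pick q_i from the truncated carry (u) and sum (v) parts of 2*r_{i-1}.

    Assumed rule: t < -1/2 -> -1, t >= 0 -> +1, otherwise 0.
    """
    t = truncate_to_quarter(u) + truncate_to_quarter(v)
    if t < Fraction(-1, 2):
        return -1
    if t >= 0:
        return 1
    return 0


# Example: 2*r = u + v = 3/16 + 5/16 = 1/2, truncated sum t = 0 + 1/4 = 1/4
print(select_quotient_digit(Fraction(3, 16), Fraction(5, 16)))  # 1
```

Because each truncation loses less than $2^{-2}$, the true value of $2r_{i-1}$ lies in $[t, t + 2^{-1})$, which is why the three cases never select a digit that would push the next partial remainder out of range.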
Redundant Number based Design.
The division speed can be further enhanced by employing a redundant number representation for partial remainders. The sign of a partial remainder is readily available without full carry propagation; this is the most powerful result of this scheme. Only the three most significant digits are compared to $2^{-1}$. Table 1 illustrates the quotient selection method of the redundant-number-based SRT division. Here $D_{msb}$ is the most significant digit of the divisor.
Key Engineering Materials Vols. 277-279 339
The redundant number representation yields overlapping regions at the partial remainder ranges corresponding to $q_i = \pm 1$ and $q_i = 0$, as shown in Fig. 1b, since truncation does not necessarily make a number less than or equal to the original number. Note that in CSA-based designs a truncated value is never larger than the original value. In a redundant-number-based design, a truncated value may be larger or smaller than the original value. The three most significant digits must be considered to determine a quotient digit; thus, overlapping regions occur. An RBSDA-based design requires recoding at the end of every addition/subtraction stage. The resulting delay of 4∆ consists of 2∆ from addition and 2∆ from recoding. This delay is independent of the number of digits, since all digit positions operate in parallel.
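The difference between the two kinds of truncation can be made concrete with a short sketch. It assumes fractional signed-digit numbers whose digits are in {-1, 0, 1} with weights $2^{-1}, 2^{-2}, \ldots$; the helper names are illustrative only.

```python
import math
from fractions import Fraction


def signed_digit_value(digits):
    """Value of fractional signed digits; digits[i] has weight 2**-(i + 1)."""
    return sum(Fraction(dig, 2 ** (i + 1)) for i, dig in enumerate(digits))


def signed_digit_truncation_error(digits, keep):
    """Original value minus the value of the first `keep` digits."""
    return signed_digit_value(digits) - signed_digit_value(digits[:keep])


def twos_complement_truncation_error(value, frac_bits):
    """Original value minus its truncation to `frac_bits` fractional bits."""
    scale = 2 ** frac_bits
    return Fraction(value) - Fraction(math.floor(Fraction(value) * scale), scale)


print(signed_digit_truncation_error([1, 0, 0, -1], 3))   # -1/16: truncated value is larger
print(signed_digit_truncation_error([1, 0, 0, 1], 3))    # 1/16: truncated value is smaller
print(twos_complement_truncation_error(Fraction(-3, 16), 2))  # 1/16: never negative
```

The signed-digit error can take either sign, whereas the 2's complement error is never negative; this is why the selection regions for the redundant representation have to overlap on both sides.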
The recoding circuit recodes $x = (x_n, x_p)$ to $z = (z_n, z_p)$. Recoding of $x_n$ and $x_p$ is performed simultaneously. The logical equation is as follows:
The addition logic is as follows:
The total delay of the RBSDA-based SRT divider is (k/2) × (2+2+2)∆. The RBSDA addition and the recoding each take 2∆, and the quotient selection and divisor sign selection take another 2∆. The number of additions/subtractions is assumed to be k/2 on average.
The addition in an RBA is performed in two steps. In the first step, the two digits at the same digit position of the augend and the addend are added. There are six types of combinations of two digits in addition, as shown in Table 2. The results of this step are two signals, the intermediate carry digit $C_i$ and the intermediate sum digit $S_i$, as defined in the table. The second step then obtains the sum digit $Z_i$ at each position by adding $S_i$ and the carry $C_{i-1}$ from the next lower-order position. Because of the choices made in the first step, no carry is generated at any position in the second step.
Therefore, the addition of two k-digit redundant binary numbers is performed in constant time. The RBA proposed in [5, 6, 7] is shown in Fig. 3. The delay of an RBA is 6∆, and the quotient selection and divisor multiple generation take another 2∆. The total delay of an RBA-based k-bit SRT divider is (k/2) × (6+2)∆.
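The two-step redundant binary addition described above can be sketched as follows. Table 2 is not reproduced in this text, so the code assumes the widely used carry-free rule for digits in {-1, 0, 1}, in which the intermediate digits for a digit sum of +1 or -1 are chosen by looking at the signs of the next lower-order digits; the function name `rb_add` and the least-significant-first digit order are illustrative choices.

```python
def rb_add(x, y):
    """Carry-free addition of two redundant binary numbers.

    x, y: equal-length, non-empty lists of digits in {-1, 0, 1},
    least significant digit first. Returns n + 1 digits, same order.
    """
    n = len(x)
    s = [0] * n  # intermediate sum digits S_i
    c = [0] * n  # intermediate carry digits C_i
    for i in range(n):
        # True when neither lower-order digit is negative, so C_{i-1} >= 0.
        lower_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        total = x[i] + y[i]
        if total == 2:
            c[i], s[i] = 1, 0
        elif total == 1:
            c[i], s[i] = (1, -1) if lower_nonneg else (0, 1)
        elif total == 0:
            c[i], s[i] = 0, 0
        elif total == -1:
            c[i], s[i] = (0, -1) if lower_nonneg else (-1, 1)
        else:  # total == -2
            c[i], s[i] = -1, 0
    # Step 2: Z_i = S_i + C_{i-1}; by construction this is never +2 or -2.
    z = [s[i] + (c[i - 1] if i > 0 else 0) for i in range(n)]
    z.append(c[n - 1])
    return z


def rb_value(digits):
    """Integer value of a digit list given least significant digit first."""
    return sum(dig * 2 ** i for i, dig in enumerate(digits))


a = [1, -1, 0, 1]   # 1 - 2 + 8 = 7
b = [1, 1, -1, 0]   # 1 + 2 - 4 = -1
print(rb_value(rb_add(a, b)))  # 6
```

Each output digit depends only on the digits at positions $i$, $i-1$ and $i-2$, so the delay does not grow with the word length; this is the constant-time property stated above.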
SRT dividers based on RBSDA and RBA are shown in Fig. 2b and 2c, respectively. Each structure takes two inputs, a 2k-digit dividend and a k-digit divisor, and yields two k-digit outputs, a quotient vector and a remainder. The structure comprises k stages of addition/subtraction circuitry, in each of which the quotient digit and the sign of the divisor are determined.
The performance of the foregoing three algorithms is compared in Table 3. The numbers in the table represent the number of unit gate delays. Combinations of SRT and the redundant number scheme yield a faster divider design. The number of gates and input lines used for an RBSDA-based design is double that of an RBA-based design due to the recoding block.
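The delay expressions given in the preceding sections can be evaluated side by side with a small sketch. The per-step delays are those stated in the text; the word length k = 64 and all names are illustrative assumptions, and the figures are not taken from Table 3.

```python
# Per-step delays in units of the gate delay, as stated in the text.
PER_STEP_DELAY = {
    "CSA": 4 + 2 + 6,    # CSA + divisor multiple generator + CLA
    "RBSDA": 2 + 2 + 2,  # addition + recoding + quotient/sign selection
    "RBA": 6 + 2,        # RBA + quotient selection and multiple generation
}


def total_delay(design, k):
    """Total delay assuming k/2 additions/subtractions on average."""
    return k * PER_STEP_DELAY[design] / 2


def improvement_over_csa(design, k):
    """Fractional reduction in delay relative to the CSA-based design."""
    return 1 - total_delay(design, k) / total_delay("CSA", k)


k = 64  # illustrative word length
for name in PER_STEP_DELAY:
    print(name, total_delay(name, k), round(improvement_over_csa(name, k), 2))
```

Since every design is assumed to perform k/2 operations, the relative improvement depends only on the per-step delays (12, 6 and 8 unit delays) and not on k.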
Summary
We have presented SRT division schemes aimed at high-speed operation based on redundant signed numbers. Combining SRT division with a redundant number scheme yields a faster divider design. Substantial performance enhancement is expected for SRT dividers when redundant number formats are introduced: compared with the CSA-based design, an SRT divider based on RBSDA yields a 50% improvement in speed, and an RBA-based design shows a 33% improvement in speed. Designs based on RBSDA are faster but
