Radix-8 New Svobota-Tung Divider with Carry Free Characteristic for Precsaling by 余繁
Radix-8 New Svobota-Tung Divider with Carry Free
Characteristic for Precsaling
Fun Ye*, Chih-Hao Kuei and Jen-Shiun Chiang
Department of Electrical Engineering, Tamkang University,
Tamsui, Taiwan 251, R.O.C.
Abstract
In recent years, computer applications have increased in the computational complexity. The
speed requirement forces designers of general-purpose microprocessors to pay particular attention to
implement the floating point unit (FPU). Anew floating-point division architecture that complies with
the IEEE 754-1985 standard is proposed in this paper. This architecture is based on the New
Svoboda-Tung (NST) division algorithm and radix-8 MROR (maximally redundant optimally recoded)
signed digit number system. In NST division, the dividend and divisor must be prescaled. For the
divider implementation, a signed digit adder with carry free characteristic is proposed for addition and
subtraction, and this adder can improve the cycle time significantly. A radix-8 MROR divider by
TSMC 0.25 m technology is thus designed and simulated. The simulation results show that the
performance, hardware cost, and power consumption of the proposed divider is competitive to the
conventional SRT divider.
Key Words: New Svoboda-Tung Division, Floating-Point Division, Prescaling, Radix-8, Signed
Digit Number System
1. Introduction
For computer arithmetic division the most common
format found in modern computers is the IEEE 754-1985
standard for binary floating point arithmetic [1]. This
standard defines single and double precision formats.
Most implementations of the division operation are ba-
sed on digit-recurrence that one digit of the quotient is
produced per iteration. There are two main classes of al-
gorithms for digit-recurrence division, the SRT (Swee-
ney, Robertson, and Tocher) approach [2,3] and the ST
(Svoboda and Tung) approach [4,5].
A lot of efforts have been expended in improving the
design and implementation of SRT based dividers [6
11]. SRT stands for the initials of Sweeney, Robertson,
and Tocher. They developed the algorithm independently
at approximately the same time. The SRT division proce-
dure iteratively employs the P-D plot [12] to determine
the next quotient digit and an adder is used to update the
partial remainder. The P-D plot approach is usually im-
plemented by table look-up. However, the table size in-
creases drastically with high radices. The Pentium FDIV
bug is the most famous story of the Intel microprocessor
bugs. It was caused by an error in a look-up table. Intel
decided to use the SRT algorithm that can generate two
quotient digits per clock cycle. The weakness is that the
SRT algorithm needs to store quotient digit derived from
a P-D plot. This look-up table was incorrectly entered
into the Pentium FPU. Many researchers have presented
several classes of SRT dividers that consider variations
of the P-D plot, but the complexity of the P-D plot makes
it difficultly to implement high-radix SRT dividers [13].
On the other hand, there are no P-D plot problems in
Svoboda-Tung algorithm. However, Svoboda-Tung pre-
sents several drawbacks:
(1) As the Svoboda-Tung division theorem accurately
points out, the radix should be greater than 4.
(2) Without compensation, the quotient digit may ex-
Tamkang Journal of Science and Engineering, Vol. 11, No. 2, pp. 185194 (2008) 185
*Corresponding author. E-mail: fyee@mail.tku.edu.tw
ceed the radix base of the signed digit-set.
(3) Compared to the conventional high radix SRT divi-
sion, the Svoboda-Tung division needs prescaling.
In order to overcome the drawbacks of ST division,
in 1998, Montalvo et al. [14] presented the New Svo-
boda-Tung division algorithm, and it showed a way to
rectify the original defect in the NST algorithm. They de-
fine a new radix to avoid drawbacks (1) and (2) of the ST
division algorithm.
The iterative division can be implemented either us-
ing the NST approach that requires prescaling of the
operands or the SRT approach that does not require pre-
scaling. For NST the prescaling of dividend and divisor
is needed so as to transform the divisor range from [1,2)
to [1,1+), where  is a fraction. In this work we summa-
rize a general systematic method to accomplish the pre-
scaling, and we also propose a hardware scheme such
that the timing complexity is constant regardless of the
bit length of the divisor. The delay of this prescaling ar-
chitecture is constant, i.e. O(1).
Based on [14], we propose a new radix-8 division by
the MROR (maximally redundant optimally recoded) al-
gorithm. Our division approach involves a simple recur-
rence with carry free addition and employs prescaling of
the operands. The input operands comply with the IEEE
754-1985 floating-point standard. The result of the ex-
periment shows that the performance of this divider is
competitive to a conventional SRT divider.
The rest of this paper is organized as follows: Sec-
tion 2 briefly reviews the digit recurrence. The New
Svoboda-Tung division algorithm is described in Sec-
tion 3. The prescaling scheme is described in Section 4.
Section 5 analyzes the detailed architecture of the pro-
posed NST radix-8 MROR divider. The comparison re-
sults of speed, area, and power dissipation are presented
in Section 6. Finally, we make a concluding remark in
Section 7.
2. Digit Recurrence Overview
The New Svoboda-Tung division takes its basis as
the most basic division expression. The basic recursive
division expression is as follows:
X = Q  Y + R
where X, Y, Q and R, represent the dividend, divisor,
quotient, and remainder, respectively. Division instruc-
tions are executed in most of today’s digital computers
via a digit-recurrence procedure. The time required for
the digital division is spent primarily in repeating exe-
cution of this recursive procedure. Dividend X and divi-
sor Y are normalized IEEE numbers with a range of 1 
X, Y < 2. The recursive expression of the NST division
is defined as follows:
R(j+1) = b  R(j)  qj+1  Y (1)
The simplest and most widely implemented class of di-
vision algorithms is digit recurrence. Digit recurrence
algorithms use subtractive methods to calculate quoti-
ents one digit in each iteration. Implementation of digit
recurrence algorithms is typically of low complexity. In
equation (1), R(j) is the current partial remainder; R(j+1)
is the next partial remainder; b is the radix; qj+1 is the
next quotient digit, and Y is the divisor. The dividend
(or initial partial reminder) is R(0). Without loss of gen-
erality, we assume that both the dividend R(0) and divi-
sor Y are fractions and so is the generated quotient Q,
Q = q0q1q2q3…qn-1qn (2)
The division procedure can be verified by applying the
recursive equality (1) repeatedly.
Iteration 1: For j = 0, R(1) = b  R(0)  q1  Y
Iteration 2: For j = 1, R(2) = b  R(1)  q2  Y
= b2  R(0)  (b  q1 + q2)Y
……………………………
Iteration n: For j = n-1, R(n) = bn  R(0)
 (bn-1  q1 +…+ b  qn-1 + qn)  Y
3. NST Division Algorithm
In NST, the signed digit-set D<b,> = {,…, 1, 0,
1,…, } such that
b
b
2
1   , where b is the radix, and
 is the maximum digit in the balanced signed digit-set
D<b,>. It is used to represent the remainder R
(j+1) and the
quotientQ. The conventional digit-set,D<b,> = {0, 1,…,
}, is used to represent the divisor Y.
In 1998, Montalvo [14] derived the range of divisor
186 Fun Ye et al.
Y in the NST algorithm and defined the range of divisor Y
as follows:
(3)
In equation (3), the strictest range is t = , and  is a
positive integer. Hence, the range of divisor Y in the
NST algorithm becomes:
1  Y< 1+, (4)
where 
 
	

b t
, and  < .
The NST algorithm is valid if and only if  < .
Hence, if r1
j and r2
j have different signs, then r2
j must
meet the condition |r2
j| =  < . If this condition does not
meet, r1
j and r2
j must be recoded as r1a
j and r2a
j respec-
tively to fulfill the following expression:
b  r1
j + r2
j = b  r1a
j + r2a
j
The recoded condition resolves into the following two
cases.
Case 1: if r1
j is positive but not zero, then r1a
j is set to
r1a
j = r1
j
 1, and r2a
j = r2
j + b.
Case 2: if r1
j is negative but not zero, then r1a
j is set to
r1a
j = r1
j + 1, and r2a
j = r2
j
 b.
As mentioned above, the purpose of recoding the two
most significant digits of the residual is to ensure r0
j+1 =
0 [14].
In the NST algorithm, the range of the divisor Y is 1 
Y < 1+. However, in the IEEE 754 floating point stan-
dard, the range of the divisor Y is 1  Y < 2. We need a
prescaling procedure to transform the divisor Y and divi-
dend R(0) in the NSTdivision. In other words, the original
division defined by Xstd = Q  Ystd + R, where Xstd and Ystd
are IEEE 754 normalized significant, is replaced by the
equivalent division defined byK  Xstd = Q  (K  Ystd)+
K  R, where K, K  Ystd, and K  Xstd are the scaling fac-
tor, the scaled divisor Y, and the scaled dividend X, re-
spectively.
Montalvo classified the NST algorithm into four ca-
tegories: MRMR (Maximally Redundant Maximally
Recoded) algorithm, MROR (Maximally Redundant Op-
timally Recoded) algorithm, MRmr (Maximally Redun-
dant minimally recoded) algorithm, and mr (minimally
redundant) algorithm. The algorithms of the four catego-
ries are analyzed according to the possible digit-sets and
recoded conditions. According to the range of the nu-
merical values ,  and , we summarize the four algo-
rithms in Table 1.
The NST division algorithm consists of two steps:
Firstly, the input operands of the dividend and the divisor
must be prescaled; secondly, the NST algorithm performs
division recursion with partial reminder and quotient se-
lection computation. In general, the basic procedure of the
NST division algorithm can be summarized as follows:
(1) Prescale input operands of dividend R(0) and divi-
sor Y such that1 1 
 


Y
b t
 
.
(2) Select the most significant digit of the partial re-
mainder R(j) as the (j+1)th quotient digit.
(3) Execute digit-recurrence R(j+1)=b  R(j)  qj+1  Y.
(4) Recode partial reminder R(j+1).
(5) Repeat steps (2)~(4) till the division iteration is
complete.
4. Prescaling Scheme of the NST Division
In the NST algorithm, the range of divisor Y is de-
fined as 1  Y < 1+, where 
 

	

b
. After calculation
 is 1/14. Therefore, for a NST radix-8 division, we have
to define the range of the divisor for the input in between
Radix-8 New Svobota-Tung Divider with Carry Free Characteristic for Precsaling 187
Table 1.  for the special algorithms [14]
 =   b  
Algorithm   Radix-b Radix-2 Radix-4 Radix-8 Radix-16 Radix-32
MRMR b1 0 1/b 1/2 1/4 1/80 1/16 1/32
MROR b1 b/21 1/2(b1) 1/2 1/6 1/14 1/30 1/62
MRmr b1 b2 1/b(b1) 1/2 01/12 1/56 01/240 01/992
mr b/2 b/21 2/b
2
1/2 1/8 1/32 01/128 01/512
1 and 15/14, 1  Y < 15/14. Since the IEEE 754 standard
specifies a normalized range for the operands to be 1  X,
Y < 2, dividend X and divisor Y must be multiplied by a
scaling factor K, such that 1  Y < 2 becomes 1  Y  K <
1+. The range of divisor Y is 1  Y < 2, so the range ofK
is 1/2  K < 1.
The prescaling unit is to find K corresponding to
each segment of the divisor Y, and to multiply each K by
dividend X and divisor Y. The design procedure of the
prescaling unit is as follows:
(1) Set Y = 1+d and K = 1-e, where 0 < d < 1 and 0 < K
< 1/2. Because 1  Y  K < 1+, Y and K are substi-
tuted to find:
1  (1e)  (1+d) < 1+. (5)
(2) Accumulate inequality (5) to obtain
(6)
To simplify the hardware of the prescaling unit, , d,
and e in expression (5) can be replaced by the approxi-
mate values. These approximate values are comprised
by 2-h, where h is an integer. The following procedures
are used to find the values of d, e, and K in each range:
(a) Find the approximate value of , t, where t = 2
-s

 and s = [log2(1/)].
(b) Find ei and Ki by ei = i  t/2 and Ki = 1  i  2
-(s+1)
respectively, where 0  i < 1/t, in different seg-
ments.
(c) Use expression (6) and ei to find the range of di.
The value di found in procedure (c) must satisfy di =m 
2-p, where m and p are integers.
In the NST radix-8 division, the parameters in the
prescaling unit are listed as follows:
(a)  = b – 1 = 8 – 1 = 7
(b)  = (b/2) – 1 = 4 – 1 = 3
(c) 1  Ys < 1+
(d) 
 

	


	


	
b
7 3
8 7
1 14/
(e) s = log2 (1/) = log214 = log 14/log 2 = (log 2 + log
7)/log 2 = 3.8 4
(f) t = 2
-s = 2-4
According to 0  i < 16 and Ki = 1 – i  2
-(s+1), we can
find Ki of different prescales. The scaling factors are
listed in Table 2, and the relationship of range Y, scaling
factor K, and transition point di is graphically shown in
Figure 1. The detailed architecture of our proposed pre-
scaling scheme is discussed in next section.
5. Architecture of the NST Radix-8 MROR
Divider
The proposed divider includes three sub circuits, pre-
scaling, iteration, and registers, as shown in Figure 2.
The digit set of the radix-8 MROR signed digit number
system is D<8,7> = {7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7}.
In order to represent a number in radix-8 signed digit for-
mat, it requires a sign bit and three magnitude bits, and
totally 4 bits are used to represent a digit. The interpreta-
tion of these digits is shown in Table 3. In NST division,
prescaling is to adjust the numerical values of the divi-
dend and the divisor. Only once the adjustment has been
executed that the dividend and the divisor correspond to
the input operands of NST, then the partial remainder
R(j+1) and the divisor Y in the recursive equation can be
selected correctly. Only after selecting the correct partial
remainder and divisor, the divider is able to do the opera-
tion in the iteration unit, R(j+1) = b  R(j)  qj+1  Y.
A. Prescaling
Although the meaning of “prescale” is to multiply a
scaling factor K, in actual hardware we use an addition/
subtraction operation to replace the multiplication op-
188 Fun Ye et al.
Table 2. K scaling factor for radix-8 numbers
Range of Y K K in binary
1  Y < 17/16 1 1/2 + 1/2
17/16  Y < 35/32 0.96875 1/2 + 1/2 - 1/32
32/35  Y < 9/8 0.93750 1/2 + 1/4 + 1/8 + 1/16
9/8  Y < 37/32 0.90625 1/2 + 1/4 + 1/8 + 1/32
37/32  Y < 19/16 0.87500 1/2 + 1/4 + 1/8
19/16  Y < 5/4 0.84375 1/2 + 1/4 + 1/8 - 1/32
5/4  Y < 41/32 0.81250 1/2 + 1/4 + 1/16
41/32  Y < 43/32 0.78125 1/2 + 1/4 + 1/32
43/32  Y < 45/32 0.75000 1/2 + 1/4
45/32  Y < 47/32 0.71875 1/2 + 1/4 - 1/32
47/32  Y < 49/32 0.68750 1/2 + 1/8 + 1/16
49/32  Y < 51/32 0.65625 1/2 + 1/8 + 1/32
51/32  Y < 27/16 0.62500 1/2 + 1/8
27/16  Y < 57/32 0.59375 1/2 + 1/16 + 1/32
57/32  Y < 15/8 0.56250 1/2 + 1/16
15/8  Y < 2 0.53125 1/2 + 1/32
eration. It is to facilitate the addition/subtraction opera-
tion to arrange the scaling factorK into various combina-
tions of 2n in the previous calculation. According to Ta-
ble 2, Y is divided into 16 intervals, and each interval has
the related scaling factor K. The first five most signifi-
cant bits of the divisor Y can decide the interval location
of the divisor Y and the details are shown in Table 4. In
other words it decides the range of the input through
Y(n-1)~Y(n-5), and then decides the scaling factorK, and
finally conducts addition and subtraction operations on
the data selected by the multiplexer. The prescaling ar-
chitecture of the prescaling block is shown in Figure 3.
B. Iteration
The iteration unit must satisfy the recurrence R(j+1)
= b  R(j)  r1a
j
 Y. The iteration unit is composed of
Radix-8 New Svobota-Tung Divider with Carry Free Characteristic for Precsaling 189
Figure 2. The operation flow of the proposed divider.
Table 3. The conversion table of Radix-8 MROR digit set
Digit number Binary coding
7 0111
6 0110
5 0101
4 0100
3 0011
2 0010
1 0001
0 0000
7 1001
6 1010
5 1011
4 1100
3 1101
2 1110
1 1111
Figure 1. The relationship of range Y, factor K, and transition point di.
recoded unit, shifter unit, compensation unit, signed
digit adder, and quotient converter as shown in Figure
4. We can use an on-the-fly converter to convert the re-
dundant quotient to binary form [14]. The detailed de-
scription of the various units is described in the fol-
lowing subsections.
C. Recoded Unit
The recoded unit can be implemented through sim-
ple logic circuits, and the design process is aimed to find
190 Fun Ye et al.
Table 4. Deciding table for the divisor Y intervals
b-1b-2b-3b-4b-5
2
-1
2
-2
2
-3
2
-4
2
-5
Range of Y Transition point di di value di in binary
1  Y < 17/16 0  d0  1/16 d0 = 1/16 00010
17/16  Y < 35/32 1/31  d1  3/31 d1 = 3/32 00011
32/35  Y < 9/8 1/15  d2  2/15 d2 = 1/8 00100
9/8  Y < 37/32 3/29  d3  5/29 d3 = 5/32 00101
37/32  Y < 19/16 1/7  d4  3/14 d4 = 3/16 00111
19/16  Y < 5/4 5/27  d5  7/27 d5 = 8/32 01000
5/4  Y < 41/32 3/13  d6  4/13 d6 = 9/32 01001
41/32  Y < 43/32 7/25  d7  9/25 d7 = 11/32 01011
43/32  Y < 45/32 1/3  d8  5/12 d8 = 13/32 01101
45/32  Y < 47/32 9/23  d9  11/23 d9 = 15/32 01111
47/32  Y < 49/32 5/11  d10  6/11 d10 = 17/32 10001
49/32  Y < 51/32 11/21  d11  13/21 d11 = 19/32 10011
51/32  Y < 27/16 3/5  d12  7/10 d12 = 11/16 10110
27/16  Y < 57/32 13/19  d13  15/19 d13 = 25/321 11001
57/32  Y < 15/81 7/9  d14  8/9 d14 = 7/8 11100
5/8  Y < 2 15/17  d15  1 d15 = 1 11111
Figure 3. Architecture of the prescaling block.
regularity in the recoded conditions. The logic diagram
of the recoded unit is shown in Figure 5.
D. Shifter Unit
This unit implements the b  R(j) operation of the re-
currence, and in a radix-8 system it is to implement 8 
R(j). In the hardware implementation, it can simply be im-
plemented through the shifter circuit.
E. Compensation Unit
This unit mainly implements r1a
j
 Y in the recur-
rence. This unit can be implemented by multiplexers.
F. Signed-Digit Adder
A new radix-8 MROR signed digit adder with carry
free characteristic is proposed in this work. Usually, in
conventional computers the addition is performed by
employing the carry propagation technique, which es-
sentially limits the computing speed. In order to elimi-
nate the carry propagation and thereby speed up the addi-
tion process, the carry out signal should be independent
of the carry in value. The logic diagram of the proposed
signed-digit adder is shown in Figure 6.
The truth table of the final sum is shown in Table 5.
Each sub circuit of the signed-digit adder described in
Figure 6 represents a logic function. This partitioning al-
lows the overall circuit to be easily expressed as follows:
carry1 = ( ), ,a bi s i s
carrys1 = [( ) ] [ ], , , ,a b i a b ii s i s c i s i s c    2 2
     [( ) ( )], , , ,a b i a b ii s i s c i s i s c2 2
its = [( ) ( )] [ ( )], , , ,a b i i i i a bi s i s s s s c i s i s      2 1 0 2
carrys2 = ( )i i i its ts s s  2 1 0
carry2 = ( ) ( )i i i i i i i its s s s ts s s s      2 1 0 2 1 0
	    ( )i i i i carrysts s s s2 1 0 2
When performing the addition operation, the signed di-
git adder in [15], apart from generating the final sum,
Radix-8 New Svobota-Tung Divider with Carry Free Characteristic for Precsaling 191
Figure 4. Architecture of the iteration block.
Figure 5. Recoded unit.
Figure 6. Signed-digit adder diagram.
also generates a control signal for the next level. The
logic circuit for this adder is composed of two parts: a
full-adder circuit and a sign and carry control circuit.
The number of logic gates needed by the full-adder cir-
cuit is 2  (n+1)AFA, where AFA is the area of a full ad-
der. The second part, sign and carry control circuit,
needs 24 logic gates. Therefore, the adder unit needs a
total of 8 full adders and one sign and carry control cir-
cuit for the radix-8 number system. 8 full adders require
40 logic gates; the sign and carry control logic circuit
needs 24 gates, and totally 64 logic gates are needed to
comprise the adder unit in [15]. However, the proposed
signed-digit adder only needs 52 logic gates, and in
comparison with the area of the adder circuits in [15],
ours makes a reduction of about 1/4.
6. The Design and Simulation of a
Radix-8 Divider
Based on the proposed NSTMROR radix-8 division
approach, a NST radix-8 divider was designed and si-
mulated by TSMC 1P4M 0.25 m process. The perfor-
mance of the designed divider is summarized in Table 6.
A comparison of the computation time of several di-
vider designs is presented in [9]. The references do not
consider quotient conversion time, and therefore the fi-
nal quotient conversion time is ignored here. Table 7 is
192 Fun Ye et al.
Table 5. Truth table of the final sum block
Ci-1,s = 0 Ci-1 = 1
tss ts2 ts1 ts0 Si,s Si,2 Si,1 Si,0 tss ts2 ts1 ts0 Si,s Si,2 Si,1 Si,0
1 0 1 0 1 0 1 1 0 0 0 1 0 0 1 0
1 0 1 1 1 1 0 0 0 0 1 0 0 0 1 1
1 1 0 0 1 1 0 1 0 0 1 1 0 1 0 0
1 1 0 1 1 1 1 0 0 1 0 0 0 1 0 1
1 1 1 0 1 1 1 1 0 1 0 1 0 1 1 0
1 1 1 1 0 0 0 0 0 1 1 0 0 1 1 1
0 0 0 0 0 0 0 1
Ci-1,s = 1 Ci-1 = 1
tss ts2 ts1 ts0 Si,s Si,2 Si,1 Si,0 tss ts2 ts1 ts0 Si,s Si,2 Si,1 Si,0
1 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0
1 0 1 1 1 0 1 0 0 0 1 0 0 0 0 1
1 1 0 0 1 0 1 1 0 0 1 1 0 0 1 0
1 1 0 1 1 1 0 0 0 1 0 0 0 0 1 1
1 1 1 0 1 1 0 1 0 1 0 1 0 1 0 0
1 1 1 1 1 1 1 0 0 1 1 0 0 1 0 1
0 0 0 0 1 1 1 1
Table 6. Characteristics of the designed divider
Process Technology TSMC 1P4M 0.25 m
Voltage, Temperature 2.5 V, 25 C
Single precision 22780.6Gate Count
Double precision 30987.3
Single precision 11Number of Cycles
Double precision 21
Single precision 6.01 nsCycle Time (ns)
Double precision 6.12 ns
Single precision 97.4Power Consumption
(mW) Double precision 138
Table 7. Speedup achieved by the proposed design over
other designs (Single precision)
Single precision
Design Radix Speedup
Proposed Divider 8 1
Prescaled NST MROR [17,29] 4 1.5
Prescaled NST MROR [9] 4 1.8
Regular [10] 4 2.1
Prescaled regular [18] 4 2.25
Prescaled over redundant (i) [19] 4 2.16
Prescaled over redundant (ii) [19] 4 2.05
Trade-off SRT [20] 4 1.5
Prescaled regular [21] 2 2.2
Regular (i) [22] 2 2.39
Regular (ii) [23] 2 2.7
the speedup comparison table among other single preci-
sion dividers. The performance of the proposed divider
is better than others. We also compare the operation time
of the proposed divider with other SRT dividers in the
double precision system, and the experimental result is
shown in Table 8. Gate count comparisons are shown is
Table 9. The power consumption comparisons are shown
in Table 10 for implementations of radixes 2, 4, 8, and
16, respectively.
7. Conclusion
ANST radix-8 divider complied with the IEEE 754-
1985 is presented in this paper. The main architecture of
this divider is composed of the prescaling scheme of the
dividend and divisor and the signed-digit adder. For pre-
scaling operation, we generate a table for K prescaling
factors and the prescaling can be easily implemented by
simply shifting and adding of the operands. For speed
and hardware cost consideration we propose a new
signed-digit adder. Compared to the conventional ap-
proach, the proposed signed-digit adder can save 1/4
hardware cost. The presented NST radix-8 divider is de-
signed both for single precision and double precision op-
erations by TSMC 1P4M 0.25 m technology. The simu-
lation results indicate that our divider has the best opera-
tion speed in the single precision operation among the
SRTdividers, and good performance comparedwith other
SRT dividers in double precision operations. The power
consumption is competitive to other dividers.
References
[1] “IEEE Standard for Binary Floating Point Arithme-
tic,” ANSI/IEEE Standard 754-1985, IEEE Computer
Society (1987).
[2] Robertson, J. E., “A New Class of Digital Division
Methods,” IRE Trans. on Electronic Computers, Vol.
7, pp. 218222 (1958).
[3] Tocher, T. D., “Techniques of Multiplication and Divi-
sion for Automatic Binary Computers,” Quarterly J.
Mech. App. Math., Vol. 2, pp. 364384 (1958).
[4] Svoboda, A., “An Algorithm for Division,” Informa-
tion ProcessingMachines,Vol. 9, pp. 183190 (1963).
[5] Tung, C., “ADivision Algorithm for Signed-Digit Ari-
thmetic,” IEEE Trans. on Computers, Vol. 17, pp.
Radix-8 New Svobota-Tung Divider with Carry Free Characteristic for Precsaling 193
Table 10. Power comparison between NST and SRT algorithm
Power Comparison
Design Radix Process Technology Power
Proposed (Single precision) 8 TSMC 1P4M 0.25 m 97.4 mW
Proposed (Double precision) 8 TSMC 1P4M 0.25 m 138 mW
Regular (i) (Double precision) [27] 4 Sun Ultra SPARC processor 108.4 mW
Regular (ii) (Double precision) [27] 8 Sun Ultra SPARC processor 133 mW
Parallel Single Precision [28] 4 TSMC 0.18 m 154 mW
Table 9. Gate count estimation of different NST division designs
Gate Count Estimation
Design Radix Process Technology Gate count
Proposed (Single precision) 8 TSMC 1P4M 0.25 m 22780.6
Proposed (Double precision) 8 TSMC 1P4M 0.25 m 30987.3
Area Efficient FDIV (Single precision) [26] 4 TSMC 0.18 m 12241
Prescaled NST MROR (Single precision) [17] 4 TSMC 1P3M 0.6 m 14494.5
Table 8. Speedup achieved by the proposed design over
other designs (Double precision)
Double precision
Design Radix Speedup
Proposed Divider 8 1
Regular (i) [16] 4 1.63
Regular (ii) [16] 8 1.24
Regular (iii) [16] 160 1.16
SRT Algorithm (i) [24] 4 1.32
SRT Algorithm (ii) [24] 160 0.9375
SRT Parallel Path Algorithm [25] 4 1.32
887889 (1968).
[6] Oberman, S. F. and Flynn, M. J., “Design Issues in Di-
vision and Other Floating-Point Operations,” IEEE
Trans. on Computers, Vol. 46, pp. 154161 (1997).
[7] Srinivas, H. R., Parhi, K. K. and Montalvo, L. A, “Ra-
dix 2 Division with Over-Redundant Quotient Selec-
tion,” IEEE Trans. on Computers, Vol. 46, pp. 8592
(1997).
[8] Burgess, N., “PrescaledMaximally-Redundant Radix-4
SRT Divider,” Electronics Letters, Vol. 30, pp. 1926
1928 (1994).
[9] Srinivas, H. R. and Parhi, K. K., “AFast Radix-4 Divi-
sion Algorithm and Its Architecture,” IEEE Trans. on
Computers, Vol. 44, pp. 826831 (1995).
[10] Montuschi, P. and Ciminiera, L., “Design of a Radix-4
Division Unit with Simple Selection Table,” IEEE
Trans. on Computers, Vol. 41, pp. 16061611 (1992).
[11] Fenwick, P., “High-Radix Division with Approximate
Quotient-digit Estimation,” Journal of Universal Com-
puter Science, Vol. 1, pp. 2-23, http://www.jucs.org/
jucs_1_1/high_radix_division_with/paper.html
(1995).
[12] Kantabutra, V., “A New Algorithm for Division in
Hardware,” in Proceeding of IEEE International Con-
ference on Computer Design (ICCD), pp. 551556
(1996).
[13] Burgess, N. and Williams, T., “Choices of Operand
Truncation in the SRT Division Algorithm,” IEEE
Transactions on Computers, Vol. 44, pp. 933938
(1995).
[14] Montalvo, L. A., Parhi, K. K. and Guyot, A., “New
Svoboda-Tung Division,” IEEE Trans. on Computers,
Vol. 47, pp. 10141020 (1998).
[15] Mekhallalati, M. C. and Ibrahim, M. K., “New High
Radix Maximally-Redundant Signed Digit Adder,”
IEEE International Symp. on Circuit and System, Vol.
1, pp. 459462 (1999).
[16] Nannarelli, A. and Lang, T., “Low-Power Division:
Comparison among Implementation of Radix 4, 8 and
16,” 14th IEEE Symp. on Computer Arithmetic, p. 60
(1999).
[17] Tsai, M.-S., “The Design and Implementation of a
High Speed Radix-4 Carry Free Division Architec-
ture,”Master Thesis,TamkangUniversity, June (2000).
[18] Ercegovac, M. D. and Lang, T., “Simple Radix-4 Divi-
sion with Operands Scaling,” IEEE Trans. on Com-
puters, Vol. 39, pp. 12041208 (1990).
[19] Montuschi, P. and Ciminiera, L., “Over-Redundant Di-
git Sets and the Design of Digit-by-Digit Units,” IEEE
Trans. on Computers, Vol. 43, pp. 269279 (1994).
[20] Wang, X. and Nelson, B. E., “Tradeoffs of Designing
Floating-Point Division and Square Root on Virtex
FPGAs,” Proc. 11th Annual IEEE Symposium on Field-
Programmable Custom Computing Machines, Vol. 2,
pp. 195203 (2003).
[21] Burgess, N., “A Fast Division Algorithm for VLSI,” in
Proc. IEEE Int’l Conf. Computer Design: VLSI in Com-
puters and Processors, Boston, pp. 560563 (1991).
[22] Kuninobu, S., Nishiyama, H. E. T., Tanaguchi, T. and
Takagi, N., “Design of High Speed MOS Multiplier
and Divider Using Redundant Binary Representation,”
Proc. 8th Symp. Computer Arithmetic, Como, Italy, pp.
8086 (1987).
[23] Ercegovax, M. D. and Lang, T., Division and Square
Root. Norwell, Mass.: Kluwer Academic (1994).
[24] Montuschi, P. and Lang, T., “Boosting Very-High Radix
Division with Prescaling and Selection by Rounding,”
IEEE Trans. on Computers, Vol. 50, pp. 1327 (2001).
[25] Rice, E. and Hughey, R., “A New Iterative Structure
for Hardware Division: The Parallel Paths Algorithm,”
16th IEEE Symposium onComputer Arithmetic (ARITH’03),
pp. 5462 (2003).
[26] Moon, J.-S., Kwon, T.-J., Sondeen, J. and Draper, J.,
“An Area-Efficient Standard-Cell Floating-Point Unit
Design for a Processing-In-Memory System,” Proc. of
the Conference on European Solid-State Circuits, pp.
5760 (2003).
[27] Nannarelli, A. and Lang, T., “Low-Power Radix-8 Di-
vider,” Proc. of International Conference on Com-
puter Design, pp. 420426 (1998).
[28] Kwon, T.-J., Moon, J.-S., Sondeen, J. and Draper, J.,
“A 0.18 m Implementation of a Floating-Point Unit
for a Processing-In-Memory System,” IEEE Interna-
tional Symp. on Circuit and System, Vol. 2, pp. 453
456 (2004).
[29] Chiang, J.-S. and Tsai, M.-S., “A Radix-4 New Svo-
bota-Tung Divider with Constant Timing Complexity
for Prescaling,” Kluwer Journal of VLSI Signal Pro-
cessing, Vol. 33, pp. 117124 (2003).
Manuscript Received: Oct. 11, 2006
Accepted: Oct. 22, 2007
194 Fun Ye et al.
