Division with speculation of quotient digits by Cortadella, Jordi & Lang, Tomás
Division with Speculation of Quotient Digits * 
Jordi Cortadella 
Dept. of Computer Architecture 
Polytechnic University of Catalonia 
Barcelona, Spain 08071 
Abstract 
The speed of SRT-type dividers is mainly determined by the complexity
of the quotient-digit selection, so that implementations are limited to
low-radix stages. We present a scheme in which the quotient digit is
speculated and, when this speculation is incorrect, a rollback or a
partial advance is performed. This results in a division operation with
a shorter cycle time and a variable number of cycles. We performed
several designs and report results that show a radix-64 implementation
that is 30% faster than the fastest conventional implementation
(radix-8) at an increase of about 45% in area per quotient bit.
Moreover, we show a radix-16 implementation that is about 10% faster
than the radix-8 conventional one, with the additional advantage of
requiring about 25% less area per quotient bit.
Tomás Lang
Dept. of Electrical and Computer Engineering
University of California at Irvine
Irvine, CA 92717

* Partially supported by the Ministry of Education of Spain
(CICYT, TIC 91-1036)

1 Introduction
Most implementations for the division operation are based on the SRT
algorithm, which involves a recurrence in which one digit of the
quotient is produced per iteration [1]. Consequently, to reduce the
number of iterations it is convenient to use a higher radix for the
quotient digit. However, as the radix increases, the added complexity
of the quotient selection function increases the iteration delay and
eliminates the advantage. Because of this, implementations have used
radix-2 and radix-4 stages, and higher radices are obtained by
unfolding these stages [10, 6]. Moreover, reductions in time have been
obtained by overlapping these unfolded stages [10, 12]. One way that
has been proposed to reduce the complexity of the quotient selection
function is to prescale the divisor (and the dividend) [9, 7, 4] or to
multiply the estimate of the residual by an approximation of the
reciprocal of the divisor [11, 8].
In this paper we present and evaluate the idea of using a simple
function to speculate a probable value of the quotient digit and use
this speculated value to continue with the algorithm. Another function
determines whether the speculation is incorrect and, in that case, the
algorithm rolls back, the digit is corrected and the process continues
from there. In a variation of this scheme, we allow a partial advance
of fewer bits than a full digit when an incorrect speculation is
performed.
Because of the possible rollbacks and partial ad- 
vances, the execution time is variable. Consequently, 
a division unit of this type is suitable when the rest 
of the system can make use of this variable time. In 
particular, if the unit forms part of a general-purpose 
processor with several functional units, it is necessary 
to have hardware support for control of dependencies. 
In the evaluation we determine the average execution 
time assuming a uniform distribution of operands. 
The effectiveness of an implementation depends on 
a variety of factors, such as the time and cost of the 
speculation function, the probability of correct spec- 
ulation, the time and cost of error detection, and the 
time for correction. 
We develop the method and evaluate alternative 
possibilities. To evaluate its effectiveness we present 
some examples of implementations and compare with 
the implementation of conventional algorithms using 
the same technological constraints. 
We now review very briefly the well-known division 
algorithm, mainly to establish the notation used. We 
use the standard recurrence 
$$w[j+1] = r\,w[j] - q_{j+1}\,d, \qquad w[0] = x \qquad (1)$$
where $w[j]$ is the residual after the $j$-th iteration, $r$ is the
radix, $q_{j+1}$ is the new quotient digit, $d$ is the divisor, and $x$
is the dividend. We assume that $x$, $d$, and $q$ are normalized
fractions.
To have a fast iteration a redundant adder is used. 
In this paper we use a carry-save adder, although a 
similar development could be done for other redundant 
representations of $w[j]$.
The quotient digit is a signed digit, $|q_{j+1}| \le a$, with
redundancy factor $\rho = a/(r-1)$. This requires that
$w[0] \le \rho d$, which is obtained by shifting the dividend.
Moreover, for this case, the convergence of the algorithm requires the
residual to be bounded so that
$$|w[j]| \le \rho d \qquad (2)$$
The quotient digit $q_{j+1}$ is determined by a quotient-digit
selection function. This function depends on an estimate of the
residual and on an estimate of the divisor, that is
$$q_{j+1} = \mathrm{sel}(\hat{w}, \hat{d}) \qquad (3)$$
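Recurrence (1) together with the digit bound and the residual bound can be exercised with a minimal Python sketch. The selection function used here is an idealized one that rounds the exact ratio, which is an assumption made for illustration only; a real SRT divider selects from truncated estimates as in (3). The name `srt_divide` and its parameters are hypothetical.

```python
from fractions import Fraction

def srt_divide(x, d, r=4, a=2, digits=8):
    """Radix-r digit recurrence with an idealized (exact) selection function."""
    rho = Fraction(a, r - 1)                    # redundancy factor a/(r-1)
    w, d = Fraction(x), Fraction(d)
    assert abs(w) <= rho * d, "shift the dividend so that |w[0]| <= rho*d"
    q_digits = []
    for _ in range(digits):
        shifted = r * w
        # Idealized sel(): nearest integer to r*w/d, clamped to [-a, a].
        q = max(-a, min(a, round(shifted / d)))
        w = shifted - q * d                     # recurrence (1)
        assert abs(w) <= rho * d                # residual stays bounded
        q_digits.append(q)
    quotient = sum(Fraction(q, r ** (j + 1)) for j, q in enumerate(q_digits))
    return q_digits, quotient, w
```

After n iterations the dividend satisfies x = quotient * d + w[n] / r^n, so the final residual gives the exact remaining error of the quotient.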
1063-6889/93 $03.00 © 1993 IEEE
The estimates are usually obtained by truncating the corresponding
values. The number of bits of these estimates increases with the radix;
examples of the number of bits required are shown in Table 1. This
lengthens the delay of the function, so that practical implementations
are limited to radix-2 and radix-4 stages.
Table 1: Number of bits required for quotient-digit 
selection functions. 
2 Scheme for Division with Speculation of Quotient Digits
We now describe the basic scheme (without partial 
advance) and then introduce the partial advance. 
Figure 1: Basic scheme. (a) Block diagram, (b) Timing diagram
2.1 Basic Scheme 
The basic scheme is shown in Figure 1. As men- 
tioned in the introduction, the quotient digit is specu- 
lated (how this is done is described in the next section) 
and used in the recurrence to produce the speculated 
next residual as follows (speculation cycle):
$$w^s[j+1] = r\,w[j] - q^s_{j+1}\,d \qquad (4)$$
At the end of the iteration, we decide whether the speculated digit is
correct by determining whether the residual is inside the allowed
bounds (see expression 2). If it is inside the bounds,
$$w[j+1] = w^s[j+1] \qquad (5)$$
and the next iteration is performed. On the other hand, if the residual
is out of bounds, the quotient digit and the residual have to be
corrected. A correction cycle is performed as follows:
$$w[j+1] = w^s[j+1] - q^c_{j+1}\,d \qquad (6)$$
In this case, the digit has the same weight as the digit obtained in
the speculation cycle and, thus, the correct digit is
$q_{j+1} = q^s_{j+1} + q^c_{j+1}$. The amount of correction performed,
$q^c_{j+1}$, is discussed in the next section. Consequently, when there
is an error (incorrect speculation), more than one cycle is needed to
obtain the correct quotient digit and the correct next residual.
As indicated in Figure 1(b), the check for the bound can be overlapped
with the speculation of the next quotient digit.

Figure 2: Timing diagram for (a) conventional division (r = 16) and
(b) quotient-digit speculation. D: digit selection, A: addition,
S: speculation cycle, C: correction cycle.

Figure 3: Quotient-digit generation
In Figure 2, we show the timing diagram of an ex- 
ample division and in Figure 3 the quotient digits pro- 
duced. Case a) corresponds to a conventional division 
in which a correct quotient-digit is computed in each 
cycle. The second diagram corresponds to the basic 
scheme with speculation. As can be seen, the cycle 
time for this case is smaller than for the conventional 
case because the computation of the speculated digit 
is faster than the computation of the correct digit. In 
the example, the first two speculated digits are correct; the third
digit is incorrect, producing a $w^s[3]$ which is out of bounds. This
results in a correction of $q_3$ by +1 and the corresponding correction
of $w^s[3]$.
This correction is sufficient, as indicated by the fact 
that the corrected residual is inside the bounds. Then, 
the next speculated digit is correct and after that an- 
other error is detected and corrected. In this case, two 
cycles are required for the correction, as the residual 
obtained after the first cycle of correction is still out 
of bounds. 
2.2 Scheme with Partial Advance 
In the basic scheme presented, two situations can occur: either we
advance one digit of the quotient (correct speculation) or we do not
advance at all (incorrect speculation), even if the error is very
small. The corresponding delay is reduced by the scheme with partial
advance. In this scheme, when the error is small, a third situation is
allowed, namely the advance of a number of bits which is less than a
whole digit; we call this partial advance¹.
Figure 4: Scheme with partial advance
The amount of advance is selected so that the next residual is bounded.
The partial-advance iteration is
$$w[j+1] = p\,w[j] - q^p_{j+1}\,d \qquad (7)$$
where $\log_2 p$ is the number of bits of the partial advance. In this
case, the digit $q^p_{j+1}$ has an overlap of $\log_2(r/p)$ bits with
the previous digit.
A block diagram for this scheme is shown in Figure 4. Note that two
different speculations are performed depending on whether there is a
full advance ($\log_2 r$ bits) or a partial advance ($\log_2 p$ bits).
The timing diagram of Figure 5(a) and the quotient-digit computations
of Figure 5(b) show an example with partial advance. The first two
digits are correct, whereas the third is incorrect. Consequently, it is
not possible to advance a whole digit. Two possibilities exist: no
advance or partial advance. In this case, the error is sufficiently
small so that a partial advance can be made (we consider a partial
advance of two bits, $\log_2 p = 2$). The new speculated digit is 4,
which overlaps with the incorrect digit. This new digit is correct.
Then a new speculation (q = 4) is incorrect and in this case no advance
is possible (because the error is too large). A correction cycle is
required and, after that, with the residual still out of bounds,
another partial advance can be performed (q = -6). Finally, the last
digit is correct.

Figure 5: Quotient-digit generation with partial advance (2 bits)

¹ A variation of this idea was suggested to us by Paolo Montuschi.
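The three kinds of cycle (full advance, partial advance, correction) can be summarized by how each one updates the residual and the accumulated quotient. The following Python sketch is an illustration under assumptions: the names `apply_cycle`, `state`, and the integer accumulator `Q` are hypothetical, and the digits fed to it in any test are arbitrary rather than taken from the figures. It shows that overlapping digits are handled by shifting the accumulator by the same amount as the residual.

```python
from fractions import Fraction

def apply_cycle(state, kind, digit, d, r=16, p=4):
    """Apply one cycle to state = (w, Q, S).

    w: residual, Q: integer quotient accumulator, S: total shift applied.
    The invariant S * x == Q * d + w holds after every cycle.
    """
    w, Q, S = state
    if kind == "full":            # w <- r*w - q*d, advance log2(r) bits
        return (r * w - digit * d, r * Q + digit, r * S)
    if kind == "partial":         # w <- p*w - q*d, advance log2(p) bits
        return (p * w - digit * d, p * Q + digit, p * S)
    if kind == "correction":      # w <- w - qc*d, no advance
        return (w - digit * d, Q + digit, S)
    raise ValueError(f"unknown cycle kind: {kind}")
```

Because a partial advance multiplies the accumulator by p rather than r, the next digit is added on top of the low-order bits of the previous one, which is the overlap described in the text.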
3 Speculation of Quotient Digit and Error Detection/Correction
3.1 Speculation of Quotient Digit
As indicated in the previous section, the idea is to reduce the delay
of the quotient-digit selection by using a simpler function which gives
a correct value with high probability. To achieve this, we use
$$q^s_{j+1} = \mathrm{spec}(\hat{w}^s, \hat{d}^s) \qquad (8)$$
where $\hat{w}^s$ and $\hat{d}^s$ have fewer bits than the $\hat{w}$
and $\hat{d}$ of expression (3), respectively.
The question is how to find such a speculation func- 
tion, combining a low delay and a reasonable probabil- 
ity of speculating the correct value. There are many 
parameters in this choice of function, including characteristics of the
implementation. Moreover, it involves a tradeoff between delay and
cost. Therefore, it is not possible to search for an optimal solution,
and we have to be content with finding one that is satisfactory. To
simplify the process we perform it in two stages: first we determine,
for various numbers of bits of the estimates, the function that gives
the highest probability of success, and then we use this function in
implementations.
For a given number of bits of the estimates, the function that produces
the highest probability of correct speculation can be determined
theoretically or empirically. A possible theoretical approach is as
follows (see Figure 6):
• For a particular value of $\hat{w}^s$, determine the range of values
  of $w[j]$. This defines a vertical strip in Figure 6.
• For a particular value of $\hat{d}^s$, determine the range of values
  of $d$. This defines a horizontal strip in Figure 6.
• Determine the areas corresponding to the intersection of the
  rectangle formed by the two strips above and the selection intervals
  for different values of $q_{j+1}$. Select for $q^s_{j+1}$ the value
  that corresponds to the largest area.

Figure 6: P-D diagram to determine the probability of correct
speculation.

This approach has the disadvantage that it assumes that all values of
the residual are equally likely. To avoid such an assumption, which
might be even less true with speculation than with conventional
division, we determined the speculation function empirically.
A possible empirical determination of the speculation function is to
perform divisions using a conventional SRT algorithm. Then, for each
pair of values $(\hat{w}^s, \hat{d}^s)$, select as $q^s$ the value
which occurs most frequently. However, this process is not accurate
because the residual values obtained when executing a conventional
algorithm are not the same as those produced when speculation is used.
To eliminate the corresponding error, we determine the speculation
function as follows:
• Build a speculation table (Table-spec) which, for each pair of values
  of $(\hat{w}^s, \hat{d}^s)$, has as entry the speculated quotient
  digit. Initialize this table with any value, for example all 0.
• Perform several iterations of the following process until a stable
  speculation table is obtained:
  1. Perform simulations of divisions in which the dividend and the
     divisor are selected at random.
  2. In these divisions, use as quotient digit the value obtained from
     Table-spec.
  3. Build a matrix with a row for each pair $(\hat{w}^s, \hat{d}^s)$
     and a column for each value of $q_{j+1}$.
  4. For each residual and divisor obtained during the simulation,
     determine the possible correct values of the quotient digit and
     increment the corresponding entries in the matrix.
  5. At the end of the iteration, for each row of the matrix select for
     quotient-digit speculation the value that has the maximum entry in
     the row. Update the speculation table.
For each simulation, $10^5$ divisions were executed; for each division,
the dividend and divisor were randomly generated and a 54-bit quotient
was calculated. Only a few iterations (between 3 and 5) were required
to obtain a stable speculation function. The technique of multiple
independent repetitions was used to obtain results with a confidence
level higher than 0.98.
Table 2 shows probabilities of success for the best function for
several cases. Since $w^s$ has a carry-save representation, a different
number of bits can be used from the sum and from the carry. When sum
and carry bits are expressed in parentheses, (s-c), the corresponding
bits are first assimilated and then used by the speculation function.
Since the probability of success is relatively high and the number of
bits is significantly smaller than those of Table 1, the approach looks
promising.

radix                       | 4    | 8    | 16
a                           | 2    | 5    | 12
# bits $\hat{w}^s$ (s-c)    | 4-4  | 5-4  | (6-5)
# bits $\hat{d}^s$          | 0    | 1    | 1
prob. of success            | 0.91 | 0.86 | 0.77

Table 2: Probability of success of the quotient-digit speculation
function.
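The iterative construction of the speculation table in Section 3.1 can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the paper: a small radix (r = 4, a = 2), a non-redundant residual, estimates obtained by truncating the shifted residual and the divisor to two fractional bits, exact bound checks, and a fixed number of rounds instead of a stability test. All names (`train_spec_table`, `estimate`, `correct_digits`) are hypothetical.

```python
import random
from collections import Counter, defaultdict
from fractions import Fraction
from math import floor

R, A = 4, 2                     # small radix chosen only to keep the sketch fast
RHO = Fraction(A, R - 1)
W_BITS, D_BITS = 2, 2           # fractional bits of the two estimates (assumed)

def estimate(value, bits):
    """Truncate toward minus infinity, keeping `bits` fractional bits."""
    return floor(value * 2**bits)

def correct_digits(shifted, d):
    """Every digit q in [-A, A] that keeps |shifted - q*d| <= RHO*d."""
    return [q for q in range(-A, A + 1) if abs(shifted - q * d) <= RHO * d]

def train_spec_table(rounds=4, divisions=200, digits=12, seed=1):
    rng = random.Random(seed)
    table = {}                                   # Table-spec; missing key means 0
    for _ in range(rounds):
        votes = defaultdict(Counter)             # one row per estimate pair
        for _ in range(divisions):
            d = Fraction(rng.randrange(2**15, 2**16), 2**16)    # d in [1/2, 1)
            w = Fraction(rng.randrange(-2**14, 2**14), 2**16)   # |w| <= 1/4
            for _ in range(digits):
                shifted = R * w
                key = (estimate(shifted, W_BITS), estimate(d, D_BITS))
                for q in correct_digits(shifted, d):
                    votes[key][q] += 1           # count every correct digit
                q = table.get(key, 0)            # speculate from the table
                w = shifted - q * d
                while abs(w) > RHO * d:          # +1/-1 correction cycles
                    step = 1 if w > 0 else -1
                    w -= step * d
        # Keep, for each row, the digit with the maximum count.
        table = {key: row.most_common(1)[0][0] for key, row in votes.items()}
    return table
```

The residuals visited in each round are those produced by the current table plus corrections, which is the reason the paper iterates rather than sampling residuals from a conventional division.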
3.2 Error Detection 
Since the speculation is not always correct it is necessary to detect
the error. Two approaches for this seem possible:
• Compute the correct value of the quotient digit and compare with the
  speculation. This scheme is not convenient because, for high radix,
  the exact quotient-digit selection is complex and slow and because
  there are cases in which more than one value is correct.
• Determine whether the next residual is within the allowed bound.
We chose this second approach. We need to assure that the speculation
is accepted only if
$$|w^s[j+1]| \le \rho d \qquad (9)$$
To have a fast comparison, this determination has to be performed using
a truncation of $w^s[j+1]$ and $d$. Let us call $\hat{w}^c$ and
$\hat{d}^c$ these truncated values. We now determine the minimum number
of bits that these estimates require. Let us call $f$ the number of
fractional bits of each estimate. Since a carry-save representation is
used for $w$, $\hat{w}^c$ is calculated as the sum of the most
significant bits of the two bit vectors representing $w$, and therefore
$$w \in [\hat{w}^c,\ \hat{w}^c + 2^{-f+1}) \qquad (10)$$
On the other hand, $d$ is nonredundant so
$$d \in [\hat{d}^c,\ \hat{d}^c + 2^{-f}) \qquad (11)$$
Consequently, relation (9) is satisfied if
$$-\rho\hat{d}^c \le \hat{w}^c \le \rho\hat{d}^c - 2^{-f+1} \qquad (12)$$
Therefore, these are the comparisons that have to be performed².
In addition, the continuity condition has to be preserved. This
requires that the difference between the upper and lower bound of the
residual has to be at least equal to $d$. That is,
$$\rho\hat{d}^c - 2^{-f+1} + \rho\hat{d}^c \ge d \qquad (13)$$
and since $d < \hat{d}^c + 2^{-f}$, it is sufficient that
$$2\rho\hat{d}^c - 2^{-f+1} \ge \hat{d}^c + 2^{-f} \qquad (14)$$
and we get
$$(2\rho - 1)\hat{d}^c \ge 3 \times 2^{-f} \qquad (15)$$
Since $\hat{d}^c \ge 1/2$, it is sufficient that
$$2^{-f} \le \frac{2\rho - 1}{6}$$
We now determine the number of integer bits of the estimates. Since
intermediate residuals produced by speculative division might have
values larger than $\rho d$, additional integer bits are required. Let
us call $e$ the maximum error produced by a speculation, that is, $e$
is the maximum difference between a speculated value and a correct
quotient digit. Moreover, call $\overline{M}$ and $\underline{M}$ the
highest and lowest values of $w^s[j+1]$ obtainable when the recurrence
is performed with a speculation having the maximum error. Since a
correct residual is $|w[j+1]| \le \rho d$, then
$$\overline{M} \le \rho d + e d, \qquad \underline{M} \ge -\rho d - e d$$
and thus, the number of integer bits ($i$) required to represent
wrongly speculated residuals is
$$i = \lceil \log_2(\rho + e) \rceil + 1 \qquad (20)$$
For $\rho \le 1$ this reduces to
$$i = \lceil \log_2(e + 1) \rceil + 1 \qquad (21)$$
The value $e$ is calculated as follows. Given a speculation function to
obtain the quotient digit, for each rectangle $R$ defined by
$\hat{w}^s$ and $\hat{d}^s$ in the P-D diagram (see Figure 6) we
calculate
$$e_R = \max_{q \in \{q_R\}} |q - q^s_R| \qquad (22)$$
where $q^s_R$ is the digit speculated for $R$ and $\{q_R\}$ is the
smallest set of quotient digits that cover the rectangle. Finally,
$e = \max_R e_R$. Table 3 shows several examples of the minimum number
of bits required.

r  | a  | ρ   | f
4  | 2  | 2/3 | 5
4  | 3  | 1   | 3
8  | 5  | 5/7 | 4
16 | 10 | 2/3 | 5

e     | i
2...3 | 3
4...7 | 4

Table 3: Examples of the minimum number of bits required.

² Only the truncated value of $\rho\hat{d}^c$ is required for the
implementation of the comparisons.
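The bit counts can be checked numerically. The sketch below assumes that the fractional-bit requirement is the smallest f with 2^(-f) <= (2ρ - 1)/6, which is what condition (15) gives when the truncated divisor is at least 1/2, and it uses expression (21) for the integer bits. The function names are hypothetical; the values compared against are the (r, a, f) and (e, i) entries that survive in Table 3.

```python
from fractions import Fraction

def fractional_bits(r, a):
    """Smallest f such that 2**-f <= (2*rho - 1) / 6, with rho = a/(r-1)."""
    rho = Fraction(a, r - 1)
    limit = (2 * rho - 1) / 6          # requires rho > 1/2
    f = 0
    while Fraction(1, 2**f) > limit:
        f += 1
    return f

def integer_bits(e):
    """Expression (21): ceil(log2(e + 1)) + 1, valid for rho <= 1.

    For an integer e >= 1, ceil(log2(e + 1)) equals e.bit_length(),
    which avoids floating-point logarithms.
    """
    return e.bit_length() + 1
```

For example, r = 16 with a = 10 gives ρ = 2/3 and a limit of 1/18, so five fractional bits are needed, since 1/16 is still too large and 1/32 is small enough.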
In summary, to determine whether there is an error it is necessary to
have two comparators that perform the comparisons specified in (12).
The estimate $\hat{w}^c$ has $i$ integer bits and $f$ fractional bits,
whereas $\hat{d}^c$ has $f$ fractional bits (the most significant being
always 1).
Since the comparisons are done with truncated versions of $w^s[j+1]$
and $d$, there are cases in which the speculation is correct but the
comparison (to be conservative) fails. Consequently, the probabilities
of successful speculation are somewhat smaller than those presented in
Section 3.1. Now these probabilities depend not only on the number of
bits of $\hat{w}^s$ and $\hat{d}^s$ but also on $f$, and can be
increased somewhat by using a larger value of $f$. An example is shown
in Table 4.

f                | 4    | 5    | 6    | 7    | > …
prob. of success | 0.73 | 0.75 | 0.76 | 0.76 | 0.…

Table 4: Variation of probability of success (r = 16, a = 12, (6-5)
bits for $\hat{w}^s$ and 1 bit for $\hat{d}^s$)
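The conservative nature of the truncated comparison can be illustrated with a short Python sketch. It assumes that each carry-save vector and the divisor are truncated toward minus infinity to f fractional bits, and that acceptance requires the truncated residual to lie between -ρ·d̂ and ρ·d̂ - 2^(-f+1), as in (12). The names `truncate` and `speculation_accepted` are hypothetical.

```python
from fractions import Fraction
from math import floor

def truncate(value, f):
    """Keep f fractional bits, truncating toward minus infinity."""
    return Fraction(floor(value * 2**f), 2**f)

def speculation_accepted(w_sum, w_carry, d, rho, f):
    """Comparison (12) on truncated carry-save residual and divisor."""
    w_hat = truncate(w_sum, f) + truncate(w_carry, f)   # error in [0, 2**(-f+1))
    d_hat = truncate(d, f)                              # error in [0, 2**-f)
    return -rho * d_hat <= w_hat <= rho * d_hat - Fraction(2, 2**f)
```

With ρ = 2/3, f = 5 and d = 3/4, a residual of 0.49 is inside the exact bound ρd = 0.5 but is rejected by the truncated test, which is the kind of case that lowers the success probabilities in Table 4. Acceptance never admits a residual outside the exact bound.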
3.2.1 Error Detection for Partial Advance
We now determine the range of error that permits a partial advance of
$\log_2 p$ bits. As discussed above, we determine the error by looking
at $w^s[j+1]$. In the basic scheme, if $w^s[j+1]$ is out of bounds, the
speculated digit is corrected until a residual inside the bounds is
obtained. However, if the error is small we can accept the speculated
digit and make a partial advance of $\log_2 p$ bits if
$$|p\,w^s[j+1]| \le r\rho d \qquad (23)$$
because this is inside the bounds for the next iteration. Consequently,
we can advance $\log_2 p$ bits if
$$|w^s[j+1]| \le \frac{r}{p}\,\rho d \qquad (24)$$
These comparisons are done in a similar manner as those previously
discussed, that is, using estimates of $w^s[j+1]$ and $d$.

Figure 7: Units for digit speculation. (a) Basic scheme; (b) With
partial advance ($q_{j+1}$ denotes $q^s_{j+1}$, $q^c_{j+1}$, or
$q^p_{j+1}$ depending on the type of cycle).
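The choice between a full advance, a partial advance, and a correction cycle can be written as a small decision function. This Python sketch uses exact values of the speculated residual and the divisor, which is a simplifying assumption; the hardware would use truncated estimates as in the previous comparisons. The function name and the sample values are hypothetical.

```python
from fractions import Fraction

def classify_cycle(w_spec, d, r, p, rho):
    """Decide the cycle type from the speculated residual w_spec."""
    if abs(w_spec) <= rho * d:             # inside the bounds: digit accepted
        return "full"
    if abs(p * w_spec) <= r * rho * d:     # condition (23): small error
        return "partial"
    return "correction"                    # error too large, no advance
```

For r = 16, p = 4, ρ = 4/5 and d = 3/4, the full-advance bound is 0.6 and the partial-advance bound on the residual is 2.4, so a residual of 1 allows a two-bit advance while a residual of 3 forces a correction cycle.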
3.3 Correction of Quotient Digit 
Since with the detection scheme presented in the 
previous section, when there is an error the correct 
digit is not known, it is necessary to perform an incre- 
mental correction and to check again whether a cor- 
rect digit is obtained. We have chosen to correct the 
quotient digit by +1 or -1, depending on the sign 
of the residual estimate. This method requires that 
in some cases more than one correction cycle be per- 
formed. However, this situation is infrequent so that 
the method is suitable. 
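A minimal Python sketch of the basic scheme with +1/-1 corrections follows. It assumes exact bound checks instead of truncated comparisons, and the speculation function `crude_spec` is a hypothetical stand-in (it truncates the shifted residual and ignores the divisor), not the table-based function of Section 3.1.

```python
from fractions import Fraction

def crude_spec(shifted, d):
    """Hypothetical speculation: truncate r*w toward zero, clamp to [-2, 2]."""
    return max(-2, min(2, int(shifted)))

def speculative_divide(x, d, spec, r=4, a=2, digits=8):
    """Basic scheme: one speculation cycle, then +1/-1 correction cycles."""
    rho = Fraction(a, r - 1)
    w, d = Fraction(x), Fraction(d)
    q_digits, cycles = [], 0
    for _ in range(digits):
        q = spec(r * w, d)                  # speculation cycle, equation (4)
        w = r * w - q * d
        cycles += 1
        while abs(w) > rho * d:             # residual out of bounds
            step = 1 if w > 0 else -1       # sign of the residual
            q += step
            w -= step * d                   # correction cycle, equation (6)
            cycles += 1
        q_digits.append(q)
    return q_digits, cycles, w
```

The ratio of `cycles` to `digits` is the average number of cycles per quotient digit, which a better speculation function brings closer to 1.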
Figure 7 depicts a block diagram of the circuit required for digit
speculation, error detection, and digit correction. It consists of a
multiplexor that selects between the speculated digit ($q^s$) and the
correction ($\pm 1$) according to the result of the comparison with the
bounds. In the scheme with partial advance, two speculation tables (for
$q^s$ and $q^p$) and two comparisons with the bounds (for full and
partial advance) are required.
Skipping Quotient Digits 
The correction performed because of incorrect speculation can also be
used to avoid having to produce some divisor multiples that are
difficult to generate. For example, it is possible to implement a
radix-8 divider with a = 5 using only one adder, and therefore only
multiples that are powers of two, by not speculating the quotient-digit
values 3 and 5. When these values are required, they are obtained by
speculating some neighboring value and performing a correction cycle.
We have performed some designs that show that this approach is
beneficial because it reduces the critical path, while not increasing
significantly the number of cycles.
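The radix-8 example can be made concrete with a few lines of Python. The sketch assumes that the digits allowed for speculation are 0 and plus or minus a power of two, and that every other digit is reached through +1/-1 correction cycles from the nearest allowed value; the function names are hypothetical.

```python
def speculable_digits(a):
    """Digits in [-a, a] whose divisor multiple is 0 or a power of two."""
    allowed = {0}
    m = 1
    while m <= a:
        allowed |= {m, -m}
        m *= 2
    return allowed

def corrections_needed(q, allowed):
    """Fewest +1/-1 correction cycles to reach q from an allowed digit."""
    return min(abs(q - s) for s in allowed)
```

For a = 5 the allowed set is {0, ±1, ±2, ±4}, so only ±3 and ±5 are skipped, and each of them is one correction cycle away from an allowed digit.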
4 Evaluation 
To evaluate the speed advantage of the scheme we 
have described, we performed some implementations 
for 54-bit dividers (this value affects only the area es- 
timates). Since there are many parameters that spec- 
ify a particular implementation, we have not done a 
complete analysis of the solution space but performed 
some reasonable designs to evaluate and compare. 
All designs have used the same technology and design tools. In
particular, we have used a 1 μm standard-cell CMOS library [5] (the
size of a 2-input NAND gate is 12.5 × 47.5 μm², and the delay of an
inverter is 0.15 ns). Some simple modules have been designed by hand
(multiplexors and CSAs), while misII [2] has been used for the
synthesis of the quotient-digit selection functions and the
comparators. MisII has always been guided to optimize delay at the
expense of increasing the area. Fan-in and fan-out capacitances (but no
routing) have been considered for delay calculations. We only give the
final results of the designs. More details can be found in [3].
4.1 Basic Scheme 
We now evaluate the execution time of the basic 
scheme and compare with conventional SRT division 
units. 
Since we do not specify the number of bits of the quotient, we compute
the average execution time per quotient bit. This time is given by
$$t_{bit} = \frac{C_d\, t_c}{\log_2 r} \qquad (25)$$
where $C_d$ is the average number of cycles per quotient digit (this
includes the speculation cycles and the correction cycles) and $t_c$ is
the delay of one cycle.
In Table 5 we report the results of some of these designs. The delay
does not include register loading, because the stages can be used in an
implementation with unfolded iterations and even in a self-timed
implementation [12]. Because of this possibility of unfolding, for our
comparisons we normalize to delay and area per quotient bit. For the
speculative approach, we show only cases that produce a faster
implementation than the conventional approach and when increasing the
radix produces an improvement in delay/bit.
We were surprised by the good performance obtained for the conventional
radix-8 implementation; this is achieved by guiding misII to provide a
different delay for both components forming the radix-8 quotient digit.
We conclude that the fastest speculative implementation is about 10%
faster than the fastest conventional one, with the added advantage of
reducing the area per bit by 25%.

Table 5: Characteristics of designs. (Area unit = area of a 2-input
NAND gate; delays are given in ns.)
4.2 Scheme with Partial Advance 
To calculate the average number of cycles per quotient digit, we
perform simulations that are similar to those discussed in Section 3.1.
Now three types of cycles exist, namely, full cycles (with probability
$p_f$), partial cycles (with probability $p_p$), and non-advance cycles
(with probability $p_n$). The average number of cycles per digit is
then calculated as
$$C_d = \frac{\log_2 r}{p_f \log_2 r + p_p \log_2 p} \qquad (26)$$
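The average number of cycles per digit is the digit width in bits divided by the expected number of bits advanced per cycle. The Python sketch below evaluates that ratio and, assuming the time per quotient bit is the cycle count times the cycle delay divided by log2 r, the resulting delay per bit. The function names and the probabilities used in the example are hypothetical, not measured values from the paper.

```python
from math import log2

def cycles_per_digit(p_full, p_partial, r, p):
    """Average cycles per radix-r digit with partial advances of log2(p) bits."""
    bits_per_cycle = p_full * log2(r) + p_partial * log2(p)
    return log2(r) / bits_per_cycle

def delay_per_bit(cycles, cycle_delay, r):
    """Average time per quotient bit for a given cycle count and cycle delay."""
    return cycles * cycle_delay / log2(r)
```

With, say, 80% full cycles and 10% partial cycles of two bits in radix 16, each cycle advances 3.4 bits on average, so a digit takes about 1.18 cycles. Non-advance cycles enter only through the probability mass they take away from the other two.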
Table 6 shows the number of cycles for several radices and several
values of $p$.

Table 6: Speculative division with partial advance (the designs
correspond to the speculative dividers in Tables 5 and 7)

We have performed several designs, choosing from Table 6 the values of
$p$ that produce the lowest average number of cycles per digit. Table 7
shows the characteristics for implementations that are faster than
those described in Section 4.1. The results indicate that this method
produces speedups with an increase in area; this increase is mainly due
to the duplication of the speculation function. The radix-64
implementation is 30% faster than the fastest conventional
implementation (radix 8) with about 44% more cell area per quotient
bit.

cycle delay | 10.7 | 13.4 | 14.3
cell area   | 3120 | 3860 | 5910
delay/bit   | 3.1  | 3.1  | 2.9
area/bit    | 780  | 772  | 985
speedup     | 2.03 | 2.03 | 2.17
area factor | 1.56 | 1.54 | 1.97

Table 7: Designs of speculative dividers with partial advance (speedup
and area factor are given w.r.t. radix 2).
5 Design Example 
We now present the implementation details of one 
design. For simplicity, a scheme without partial ad- 
vance is explained. We have chosen the one that gives 
the smallest delay per quotient bit, namely, the spec- 
ulative radix-16 divider shown in Table 5. The block 
diagram is shown in Figure 8. 
The quotient digit generated by the speculation function is
$q_h + q_l$, where $q_h \in \{-8,-4,0,4,8\}$ and
$q_l \in \{-2,-1,0,1,2\}$. Therefore, the values $\{-12,-11,11,12\}$
are always obtained after corrections of the initially speculated
digit. Although this increases somewhat the number of correction
cycles, the use of $a = 12$ results in a larger overlap than $a = 10$,
which makes the implementation of the speculation function simpler and
faster. Moreover, the limited precision of the comparisons (error
detection) reduces the range of “accepted” residuals and, consequently,
diminishes the probability of requiring quotient digits near $\pm a$.
Exploring several designs, we found $a = 12$ to be the best trade-off
for radix 16.
Table 8 reports area and delay characteristics of this design. As $q_h$
is the highest-weight component of the quotient digit, the speculation
function is simpler and faster than for $q_l$. When synthesizing the
speculation function, misII has been guided to reducing the delay of
$q_h$ which, in this case, is in the critical path.
The CSA has been designed as a radix-2 full adder. Its delay (2.2 ns)
is determined by two cascaded XOR gates (1.1 ns each). However, the
outputs of the $q_i d$ multiplexors have been connected to the last
gate in order to reduce the critical path (the same optimization has
been used for the conventional designs). This approach cannot be used
with the residual, since the redundant representation requires two
signals.
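The set of digits that the radix-16 speculation function can produce directly follows from the two component sets. This short Python check enumerates the sums of the two components given in the text and lists the digits of the set with maximum digit 12 that can only be reached through correction cycles; the variable names are hypothetical.

```python
QH = (-8, -4, 0, 4, 8)      # high-weight component of the speculated digit
QL = (-2, -1, 0, 1, 2)      # low-weight component of the speculated digit

# Every digit the speculation function can output directly.
reachable = sorted({h + l for h in QH for l in QL})

# Digits of the a = 12 digit set that require at least one correction cycle.
missing = [q for q in range(-12, 13) if q not in reachable]
```

The directly reachable digits are exactly the integers from -10 to 10, which leaves -12, -11, 11 and 12 to be produced by +1/-1 corrections, in agreement with the text.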
Figure 8: Block diagram for a speculative radix-16 divider.

Module            | Area    | Delay
digit speculation | 270     | 4.8* (for q_h), 6.3 (for q_l)
error detection   | 210     | 5.3
MUX for q_i d     | 2 × 310 | 0.9*
Iteration (total) | 2…      | 10.7

Table 8: Area and delay for the speculative radix-16 divider
(* indicates delays in the critical path)
6 Summary and Conclusions 
The division method we have presented is based on 
the speculation of the quotient digit and on a roll-back 
when the speculation is incorrect. Because of the re- 
duction in complexity of the quotient-digit selection, 
this can result in faster implementations with higher 
radices than for the conventional approach. Moreover, 
a reduction in the number of adders required is possi- 
ble by speculating only a reduced set of quotient-digit 
values. 
We have discussed the way to determine an opti- 
mal speculation for a given estimate of the residual 
and of the divisor. Moreover, we detect whether the 
speculation is correct by determining whether the next 
remainder is inside the required bound. 
The designs performed show a radix-16 divider that is about 10% faster
than the fastest conventional divider (radix 8). Moreover, the area per
quotient bit is reduced by 25%.
The speed can be further improved by doing partial advance when the
speculation is incorrect. We developed the condition required for
partial advance and described an implementation that replicates the
quotient-digit selection.
We performed designs which show that a radix-64 divider with
speculation and partial advance of four bits is about 30% faster than
the fastest conventional case with an area/bit increase of about 45%.
Acknowledgment 
We thank Paolo Montuschi for his valuable discussions 
and comments. 
References 
[1] D.E. Atkins, “Higher-Radix Division Using Estimates of the Divisor
and Partial Remainder,” IEEE Trans. Computers, Vol. C-17, pp. 925-934,
Oct. 1968.
[2] R.K. Brayton, R. Rudell, A. Sangiovanni-Vincentelli, and A.R. Wang,
“MIS: A multiple-level logic optimization system,” IEEE Trans.
Computer-Aided Design, Vol. CAD-6, pp. 1062-1081, Nov. 1987.
[3] J. Cortadella and T. Lang, “Division with Speculation of Quotient
Digits,” UPC/DAC Tech. Report, 1993.
[4] M. Ercegovac and T. Lang, “Simple Radix-4 Division with Operands
Scaling,” IEEE Trans. on Computers, Vol. C-39, pp. 1204-1208,
September 1990.
[5] European Silicon Structures, ES2 ECPD10 Library Databook, April
1991.
[6] J. Fandrianto, “Algorithm for High Speed Shared Radix 4 Division
and Radix-4 Square Root,” Proc. 8th IEEE Symposium on Computer
Arithmetic, Como, Italy, pp. 73-79, May 1987.
[7] E.V. Krishnamurthy, “On Range-Transformation Techniques for
Division,” IEEE Trans. on Computers, Vol. C-19, pp. 227-231, March
1970.
[8] D.W. Matula, “Design of a Highly Parallel Floating Point Arithmetic
Unit,” Symposium on Combinatorial Optimization Science and Technology
(COST), April 1991.
[9] A. Svoboda, “An Algorithm for Division,” Inf. Proc. Mach., Vol. 9,
pp. 25-32, 1963.
[10] G.S. Taylor, “Radix-16 SRT Dividers with Overlapped Quotient
Selection Stages,” Proc. 7th IEEE Symposium on Computer Arithmetic,
Urbana, IL, pp. 64-71, June 1985.
[11] S. Waser and M.J. Flynn, Introduction to Arithmetic for Digital
System Designers, Holt, Rinehart, and Winston, New York, 1985.
[12] T.E. Williams and M.A. Horowitz, “A 160ns 54bit CMOS Division
Implementation Using Self-Timing and Symmetrically Overlapped SRT
Stages,” Proc. 10th Symposium on Computer Arithmetic, pp. 210-217,
June 1991.
