Area-time optimal division for $T = \Omega((\log n)^{1+\varepsilon})$ by Mehlhorn, K. & Preparata, F.P.
INFORMATION AND COMPUTATION 72, 270-282 (1987)
Area-Time Optimal Division for $T = \Omega((\log n)^{1+\varepsilon})$*
K. MEHLHORN 
Fachbereich 10, Informatik, Universität des Saarlandes,
6600 Saarbrücken, West Germany
AND 
F. P. PREPARATA 
Coordinated Science Laboratory, University of Illinois,
Urbana, Illinois 61801
Area-time optimal VLSI division circuits are described for all computation times
in the range $[\Omega((\log n)^{1+\varepsilon}), O(\sqrt{n})]$ for arbitrary $\varepsilon > 0$. © 1987 Academic Press, Inc.
1. INTRODUCTION 
A simple transformation of right-shift to integer division shows that the
area-time ($AT^2$) complexity of any network for the computation of the
inverse of an $n$-bit number (referred to here as “divider”) is bounded from
below by $\Omega(n^2)$. A trivial fan-in argument also gives $T = \Omega(\log n)$. A family
of $AT^2$-optimal dividers was proposed some time ago by Mehlhorn
(1984). A network of this family can be constructed for each computation
time $T$ in the range $[\Omega(\log^2 n), O(\sqrt{n})]$. Since then considerable progress
has been made in the design of faster dividers (Reif, 1983), culminating in
the result of Beame, Cook, and Hoover (1984) illustrating an $O(\log n)$-time
divider (i.e., a time-optimal network in the hypothesis of bounded-fan-in
components). However, the Beame-Cook-Hoover network (referred to
here as the BCH network) does not achieve area optimality. Thus, it is
natural to ask whether area-time optimal dividers exist
for $T = o(\log^2 n)$. This paper provides an affirmative answer for
$T \in [\Omega((\log n)^{1+\varepsilon}), O(\log^2 n)]$ for any positive constant $\varepsilon \le 1$. It must be
pointed out that the proposed networks are so complicated,
notwithstanding their area-time optimality, that they are exclusively of
theoretical interest.
* This work was supported by the DFG, SFB 124, TP B2, VLSI Entwurf und Parallelität,
and by NSF Grant ECS-8410902.
FIG. 1. Block structure of the divider 
The network (see Fig. 1) consists of $J + 2$ cascaded modules, where
$J \approx 1/\varepsilon$. The first $J$ modules are modified dividers of the BCH type, computing
a sequence of approximations of the inverse with increasing numbers
of bits $l_1 \le l_2 \le \cdots \le l_J < n$.
The last two modules are designed to complete the buildup of the result
size from $l_J$ to $n$ bits by implementing the Newton approximation method,
which, at each iteration, doubles the length of the result. This is carried out
in two phases, respectively executed by the “fast” and “slow” 
approximators. The fast approximator basically consists of a single area- 
time optimal fastest multiplier, used to execute the initial iterations; the 
slow approximator is instead a cascade of affordably slow multipliers, each 
executing one of the final iterations. Note that the cascade of the two New- 
ton approximators structurally coincides with Mehlhorn’s (1984) divider. 
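The doubling behaviour of the Newton approximation mentioned above can be illustrated with a minimal Python sketch. It uses exact rational arithmetic and the standard iteration $v \leftarrow v(2 - xv)$; the function name `newton_inverse` and the starting value are illustrative assumptions, not part of the construction described in this paper.

```python
from fractions import Fraction

def newton_inverse(x, v0, iterations):
    """Run v <- v * (2 - x * v) and record the error |1 - x * v| after each step."""
    v = Fraction(v0)
    errors = []
    for _ in range(iterations):
        v = v * (2 - x * v)            # one Newton step for f(v) = 1/v - x
        errors.append(abs(1 - x * v))  # this error is the square of the previous one
    return v, errors

x = Fraction(3, 4)
v, errors = newton_inverse(x, Fraction(1), 4)
# The starting error is 1 - x = 1/4; squaring it each step doubles the
# number of correct bits: 2^-4, 2^-8, 2^-16, 2^-32.
```

Because $1 - xv(2 - xv) = (1 - xv)^2$, the number of accurate bits doubles per iteration, which is why the early iterations can run on short operands and only the last few need full-length multipliers.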
The paper is organized as follows. In Section 2 we present a more 
efficient implementation of the BCH method leading to a circuit referred to 
as “modified BCH divider.” In Section 3 we discuss an alternative method 
for the computation of the inverse, which uses the modified BCH method 
as a subroutine. Finally, in Section 4 we illustrate the combination of the 
previous techniques with the Newton approximation, to yield our proposed 
network, while Section 5 contains a few closing remarks. 
In this paper we shall frequently refer to under- and overapproximations
of the reciprocal of a number. For brevity, given an $n$-bit number $x$ in the
interval $[1/2, 1)$ (i.e., a normalized fraction with $n$ bits to the right of the
binary point), we say that for $l < n$, $v$ is an $l$-bit underinverse or an $l$-bit
overinverse of $x$ depending upon whether $v = \lfloor 2^l/x \rfloor \cdot 2^{-l}$ or $v = \lceil 2^l/x \rceil \cdot 2^{-l}$.
Equivalently, $v \cdot x = 1 \pm \delta$, with $\delta < 2^{-l}$, and $v$ has two significant bits to the
left and $l$ significant bits to the right of the point.
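As a concrete instance of these definitions, the following Python sketch evaluates both quantities in exact rational arithmetic. The helper names `underinverse` and `overinverse` are illustrative assumptions.

```python
from fractions import Fraction
from math import ceil, floor

def underinverse(x, l):
    """l-bit underinverse: floor(2^l / x) * 2^-l."""
    return Fraction(floor(2**l / x), 2**l)

def overinverse(x, l):
    """l-bit overinverse: ceil(2^l / x) * 2^-l."""
    return Fraction(ceil(2**l / x), 2**l)

x = Fraction(5, 8)           # a normalized fraction in [1/2, 1)
lo = underinverse(x, 4)      # 25/16, slightly below 1/x = 8/5
hi = overinverse(x, 4)       # 13/8, slightly above 1/x
```

For this input, $lo \cdot x = 1 - 3/128$ and $hi \cdot x = 1 + 1/64$, so both errors are below $2^{-4}$.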
2. AN EFFICIENT IMPLEMENTATION OF THE BCH METHOD 
In this section we first describe (a variant of) the BCH method (Beame 
et al., 1984) and then modify it so as to reduce its area requirement. 
The original BCH method computes the $n$-bit underinverse of an $n$-bit
number $x$ by adding the first $n$ powers of $u = 1 - x$ and truncating the $n^2$-bit
result to its leading $n$ bits. Each power of $u$ is computed individually
and the $n$ powers are subsequently added together. A power $u^k$ is computed
by taking the “logarithm” of $u$, multiplying it by $k$, and then taking the
“antilogarithm.”
Since taking logarithms of large numbers is very hard, the method 
resorts to a modular representation and works as follows: 
ALGORITHM INVERSE 1 (x).
Input: an $n$-bit number $x$ in the range $[1/2, 1)$. Given are $m$ (small,
possibly consecutive) primes $p_1, \ldots, p_m$ such that
    $\prod_{i=1}^{m} p_i > 2^{n^2}$  (Note that $m \approx n^2/\log n$)
($n$ is assumed to be a power of two)
Output: an $(n + 2)$-bit number $v$ in the range $(1, 2)$, so that
    $v \times x = 1 + \delta$ with $\delta < 2^{-n}$
(1)  begin $u := (1 - x)\, 2^n$;  (* $u$ is an integer *)
(2)    for $j$, $1 \le j \le m$
(3)    pardo $b_j := u \bmod p_j$;
(4)      compute $r_j$ so that $a_j^{r_j} = b_j$, where
         $a_j$ is a generator of the multiplicative group $Z^*_{p_j}$;
(5)      for $l = 0$ to $\log n - 1$
(6)      do $m_j^{(l)} := a_j^{\,r_j 2^l \bmod (p_j - 1)}$
(7)         (* $m_j^{(l)} = u^{2^l} \bmod p_j$ *)
         od;
(8)      $v_j := \prod_{l=0}^{\log n - 1} (m_j^{(l)} + 2^{n 2^l}) \bmod p_j$
         (* $v_j = 2^{n(n-1)} \cdot \sum_{i=0}^{n-1} (u/2^n)^i \bmod p_j$ *)
(9)      $V_j := v_j M_j \bmod (p_1 \cdots p_m)$
         (* first step of Chinese remaindering *)
(10)   odpar;
(11)   $v := \sum_{j=1}^{m} V_j \bmod (p_1 \cdots p_m)$;  (* second step of Chinese remaindering *)
(12)   $v$ := truncate $v$ to the first $n + 1$ bits, insert one 0 to the left,
         and set point after the second bit from the left
(13) end
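The following Python sketch traces the algorithm in software for small $n$. It is an illustration under stated assumptions, not the circuit itself: the parallel loop runs sequentially, generators and logarithm tables are found by brute force, the case $b_j = 0$ (which has no logarithm) is handled explicitly, and the final truncation simply keeps $n$ bits to the right of the point. The names `small_primes`, `log_tables` and `inverse1` are hypothetical.

```python
from fractions import Fraction
from math import prod

def small_primes(bits):
    """Consecutive odd primes whose product exceeds 2**bits."""
    primes, product, cand = [], 1, 3
    while product <= 2**bits:
        if all(cand % q for q in primes if q * q <= cand):
            primes.append(cand)
            product *= cand
        cand += 2
    return primes

def log_tables(p):
    """Brute-force a generator g of Z_p^*; return (log table, antilog table)."""
    for g in range(2, p):
        antilog = [pow(g, r, p) for r in range(p - 1)]
        if len(set(antilog)) == p - 1:
            return {b: r for r, b in enumerate(antilog)}, antilog
    raise ValueError("no generator found")

def inverse1(x_num, n):
    """x = x_num / 2**n in [1/2, 1), n a power of two. Returns (S, v)."""
    u = 2**n - x_num                                  # line 1: u = (1 - x) 2^n
    primes = small_primes(n * n)
    M = prod(primes)
    total = 0
    for p in primes:                                  # lines 2-10, sequential here
        log, antilog = log_tables(p)
        b = u % p                                     # line 3
        v_j = 1
        for l in range(n.bit_length() - 1):           # l = 0 .. log n - 1
            # lines 4-7; b == 0 has no logarithm, and then u^(2^l) mod p is 0
            m = antilog[(log[b] << l) % (p - 1)] if b else 0
            v_j = v_j * (m + pow(2, n << l, p)) % p   # line 8
        total += v_j * pow(M // p, p - 1, M)          # line 9: multiply by M_j
    S = total % M                                     # line 11
    v = Fraction(S >> (n * (n - 2)), 2**n)            # keep n bits right of the point
    return S, v

S, v = inverse1(11, 4)    # x = 11/16, so u = 5
```

Here `S` equals $\sum_{i=0}^{n-1} u^i 2^{n(n-1-i)}$ exactly, because this sum is smaller than the product of the moduli, so Chinese remaindering recovers it without loss.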
Let us next describe the different steps of this algorithm in more detail. 
In this description we will make frequent use of the following two facts: 
(1) One can multiply two $k$-bit integers in time $T$ and area $A$, where
$AT^2 = O(k^2)$ and $T \in [\Omega(\log k), O(\sqrt{k})]$. This is the result of Mehlhorn
and Preparata (1983).
(2) One can add $m$ $k$-bit integers in time $O(\log m + \log k)$ and area
$O(km \cdot \log m)$. This can be achieved by expressing the $m$ integers in redundant
representation (see, e.g., [4-6]) and then adding them in a tree-like
fashion. The tree has depth $O(\log m)$ and requires area $O(m \log m)$ for
every bit position. Each level of the tree introduces a delay of just $O(1)$
thanks to the redundant number representation.
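Fact (2) can be illustrated with a small Python sketch. It assumes carry-save form as the redundant representation, which is one common choice; the text does not say which representation is meant. The names `carry_save` and `redundant_sum` are illustrative.

```python
def carry_save(a, b, c):
    """Reduce three nonnegative integers to two with the same sum.

    Every bit position is handled independently, so no carry propagates."""
    partial = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial, carry

def redundant_sum(values):
    """Add nonnegative integers with a tree of carry-save steps.

    Returns (sum, number of tree levels); only the last step is a normal add."""
    vals = list(values)
    levels = 0
    while len(vals) > 2:
        nxt = []
        for i in range(0, len(vals) - 2, 3):
            nxt.extend(carry_save(vals[i], vals[i + 1], vals[i + 2]))
        nxt.extend(vals[len(vals) - len(vals) % 3:])   # leftover operands pass through
        vals = nxt
        levels += 1
    return sum(vals), levels

total, levels = redundant_sum([3, 5, 7, 11, 13, 17, 19, 23, 29])
```

Each level shrinks the operand count by roughly a factor of 3/2, so the number of levels grows logarithmically in $m$, and each level costs constant delay.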
We are now ready to describe the circuit in more detail. We start with 
the parallel loop, lines 2-10. 
Line 3. This line is easily executed in time $O(\log n)$ and area
$O(n (\log n)^2)$ for each $p_j$ by expressing $u$ by its binary expansion
$u = \sum_{i=0}^{n-1} u_i 2^i$, $u_i \in \{0, 1\}$, storing the numbers $2^i \bmod p_j$ in a table and
performing the required additions in redundant number representation. We
leave the details of this step to the reader.
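A minimal Python sketch of this step follows, with a plain sum standing in for the redundant-representation adder tree. The function name `mod_by_table` is an illustrative assumption.

```python
def mod_by_table(u, p, n):
    """Compute u mod p for an n-bit u from a table of 2^i mod p."""
    table = [pow(2, i, p) for i in range(n)]        # precomputed once per modulus
    # Add the table entries selected by the 1-bits of u; each entry is below p,
    # so the sum stays small and one final reduction is enough.
    selected = sum(table[i] for i in range(n) if (u >> i) & 1)
    return selected % p

r = mod_by_table(0b1011011, 13, 7)    # 91 mod 13
```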
Line 4. Step 4 is realized by a table look-up, i.e., by a look-up in a table
which gives the value of $r_j$ for each possible value of $b_j$. Since $p_j$ can certainly
be expressed using $2 \log n$ bits this table has $n^2$ entries of $2 \log n$ bits
each. We realize this table by $2 \log n$ H-trees each requiring area $O(n^2)$.
Thus the total area is $O(n^2 \log n)$ for each $p_j$, and a table look-up takes
time $O(\log n)$.
Note that the $2 \log n$ slices of the table are accessed in parallel. Also note
that this circuit can be pipelined (its period is $O(1)$ in technical terms) and
therefore $O(\log n)$ look-ups can also be performed in time $O(\log n)$ using
the same area. This observation is important for step 6.
Lines 5, 6, 7. Consider a fixed $l$ first. We first compute
    $R_j^{(l)} = r_j 2^l \bmod (p_j - 1)$
as outlined in line 3. Note that the $l$-place shift does not have to be
executed explicitly; it only determines which powers of two need to be
looked up. The computation of $R_j^{(l)}$ takes time $O(\log n)$ and area
$O(n(\log n)^2)$. We perform this computation in parallel for all $l$,
$0 \le l \le \log n - 1$.
The integer $m_j^{(l)}$ is computed from $R_j^{(l)}$ by look-up in a table of
“antilogarithms.” The $\log n$ look-ups are pipelined and take time $O(\log n)$
and area $O(n^2 \log n)$ for each $p_j$ (refer to the description of line 4).
Finally note that $m_j^{(l)} = a_j^{\,r_j 2^l \bmod (p_j - 1)} = b_j^{2^l} \bmod p_j = u^{2^l} \bmod p_j$.
Line 8. We use a tree of multipliers. This tree has depth $O(\log \log n)$
and has $\log n$ nodes. Each node contains a circuit multiplying two $2 \log n$
bit numbers and reducing the result mod $p_j$ in time $O(\log \log n)$ and area
$O((\log n)^2)$. This shows that step 8 takes time $O(\log n)$ and area $O(n)$. Both
estimates are very generous.
Finally note that
$$\prod_{l=0}^{\log n - 1} \left(2^{n 2^l} + m_j^{(l)}\right) \equiv \prod_{l=0}^{\log n - 1} \left(2^{n 2^l} + u^{2^l}\right) = 2^{n(n-1)} \sum_{i=0}^{n-1} (u/2^n)^i \pmod{p_j}.$$
Line 9. Let $M_j = [(p_1 \cdots p_m)/p_j]^{\,p_j - 1} \bmod (p_1 \cdots p_m)$. Then $M_j$ is the
coefficient of $v_j$ required for Chinese remaindering (Knuth, 1981). The
number $M_j$ is precomputed and stored in a register of length $O(n^2)$. We
multiply $v_j$ by $M_j$ by dividing $M_j$ into $n^2/\log n$ pieces of length $O(\log n)$,
performing $n^2/\log n$ multiplications in parallel and then summing the
results. This can certainly be done in time $O(\log n)$ and area $O(n^2 \log n)$.
Also the reduction mod $(p_1 \cdots p_m)$ can be done in that area and time.
Indeed, let $q$ be an integer in $[0, 2^{n^2 + \log n})$ and $M = p_1 \cdots p_m$; then
$q \bmod M = q - \lfloor q/M \rfloor \cdot M$. Thus we perform, in time $O(\log n)$ and area
$O(n^2)$, a multiplication of $q$ by an approximation of $1/M$ of precision
$2^{-n^2 - \log n}$ (having only $O(\log n)$ significant bits), followed by a
multiplication of $M$ by $\lfloor q/M \rfloor$.
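Both ingredients of this step can be sketched in Python. The coefficient formula follows the text; the reduction is written in the style of Barrett reduction, which is an assumption made here for illustration, since the text only says that an approximation of $1/M$ is used. The names `crt_coefficients` and `reduce_with_reciprocal` are hypothetical.

```python
from math import prod

def crt_coefficients(primes):
    """M_j = (M / p_j)^(p_j - 1) mod M: 1 modulo p_j and 0 modulo every other prime."""
    M = prod(primes)
    return [pow(M // p, p - 1, M) for p in primes], M

def reduce_with_reciprocal(q, M, k):
    """q mod M for 0 <= q < 2**k, using a precomputed approximation of 1/M."""
    mu = (1 << k) // M        # floor(2^k / M), computed once per modulus
    q_hat = (q * mu) >> k     # underestimates floor(q / M) by at most 1
    r = q - q_hat * M
    while r >= M:             # correction step for the small underestimate
        r -= M
    return r

primes = [3, 5, 7, 11]
coeffs, M = crt_coefficients(primes)
value = 1000
residues = [value % p for p in primes]
rebuilt = sum(r * c for r, c in zip(residues, coeffs)) % M
```

By Fermat's little theorem the power $(M/p_j)^{p_j - 1}$ is congruent to 1 modulo $p_j$, and it is divisible by every other prime, which is exactly the property Chinese remaindering needs.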
Summary. Lines (3) to (9) take time $O(\log n)$ and area $O(n^2 \log n)$ for
each $p_j$. Since $u^n$ has $n^2$ bits we have $m = O(n^2/\log n)$ and each modulus is
representable in $2 \log n$ bits. We realize loop (2) to (10) by having a
module for each modulus and hence the loop takes time $O(\log n)$ and area
$O(n^4)$.
Line 11. In line 11 we add $m$ numbers of $n^2$ bits each and reduce
mod $(p_1 \cdots p_m)$. This takes time $O(\log n)$ and area $O(m \log m \cdot n^2) = O(n^4)$.
LEMMA 1. There exists a circuit which computes the $n$-bit inverse of an
$n$-bit number in time $O(\log n)$ and area $O(n^4)$.
Proof. Immediate from the discussion above. ∎
The enormous space requirement of the method sketched above is essentially
due to the fact that the powers of $u$ are computed with $\Omega(n^2)$ bits of
precision. However, only the leading $n + \log n$ bits are truly needed for the
computation of $v$. This observation is the key to the “modified” BCH
method, to be described next. In the modified method we compute the
powers of an $l$-bit integer $u$ in $m$ rounds (this $m$ has nothing to do with the
$m$ in algorithm INVERSE 1), where $m$ is a design parameter to be selected.
In each round we compute the sum of $s = l^{1/m}$ consecutive powers using
the method of Lemma 1. We call $s$ the depth of the method. This takes time
$O(\log l)$ and area $O((ls)^2)$ and yields a result of $O(ls)$ bits. The space
requirement results from the fact that only $ls/\log(ls)$ different prime moduli,
each of length $2 \log(ls)$ bits, must be used. We truncate this result to
$l + \lceil \log 12m \rceil$ bits and start the next round. The details are as follows.
ALGORITHM INVERSE 2(x)
Input: an $l$-bit number $x \in [1/2, 1)$ and an integer $s = l^{1/m}$.
Output: an $(l + 2)$-bit number $v \in (1, 2)$
begin $u_0 := 1 - x$;
    for $i = 0$ to $m - 1$ do
    begin
        $\sigma_i := \sum_{j=0}^{s-1} u_i^j$;
        $u_{i+1}$ := truncate $u_i^s$ to $q = l + \lceil \log 12m \rceil$ bits right of point;
    end;
    $v$ := truncate $\sigma_0 \sigma_1 \cdots \sigma_{m-1}$ to $l$ bits right of point;
end.
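The rounds of this algorithm can be traced with the following Python sketch. It is a numerical illustration only: it uses exact rationals and sums the powers directly, where the construction uses the circuit of Lemma 1. The names `truncate` and `inverse2` are illustrative.

```python
from fractions import Fraction
from math import ceil, log2

def truncate(a, bits):
    """Drop everything beyond `bits` binary places right of the point (a >= 0)."""
    return Fraction(int(a * 2**bits), 2**bits)

def inverse2(x, l, s, m):
    """Underapproximate 1/x for x in [1/2, 1), assuming s**m >= l."""
    q = l + ceil(log2(12 * m))                  # working precision of the rounds
    u = 1 - x                                   # u_0
    product = Fraction(1)
    for _ in range(m):
        sigma = sum(u**j for j in range(s))     # sigma_i = 1 + u_i + ... + u_i^(s-1)
        product *= sigma
        u = truncate(u**s, q)                   # u_{i+1}
    return truncate(product, l)

l = 16
x = Fraction(45000, 2**l)
v = inverse2(x, l, s=4, m=2)                    # s^m = 16 = l
```

The product $\sigma_0 \cdots \sigma_{m-1}$ equals $\sum_{j < s^m} u^j$ when no truncation occurs, so $m$ rounds of depth $s$ cover the first $s^m$ powers of $u$.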
To prove the correctness of this algorithm we must show that $v$ gives the
$(l + 2)$ leading bits of $1/(1 - u)$ (of which the rightmost $l$ bits represent the
fractional part). To this end, we must show that the error of the
underapproximation is $< 2^{-l}$.
For any variable $a$ used by the above algorithm let $\bar{a}$ denote the
corresponding exact value (note that, since all numbers are nonnegative,
the truncation mechanism gives $\bar{a} \ge a$), and $\delta(a)$ the absolute error on $a$,
such that $a = \bar{a} - \delta(a)$. Recall also that $\delta(a \cdot b) \le \delta(a)\,\bar{b} + \delta(b)\,\bar{a}$ and that
$\delta(a + b) = \delta(a) + \delta(b)$. Using these relationships, we readily have
$$\delta(\sigma_0 \cdots \sigma_{m-1}) \le \sum_{i=0}^{m-1} \delta(\sigma_i) \prod_{j \ne i} \bar{\sigma}_j.$$
Since $\bar{\sigma}_0 \cdots \bar{\sigma}_{m-1} < 3$ and $\bar{\sigma}_i \ge 1$ ($i = 0, \ldots, m - 1$), we obtain
$$\delta(\sigma_0 \cdots \sigma_{m-1}) \le 3\,(\delta(\sigma_0) + \cdots + \delta(\sigma_{m-1})).$$
From $\bar{\sigma}_i = \sum_{j=0}^{s-1} \bar{u}_i^{\,j}$ we have
$$\delta(\sigma_i) = \sum_{j=0}^{s-1} \delta(u_i^j) \le \sum_{j=0}^{s-1} j\,\bar{u}_i^{\,j-1}\,\delta(u_i) \le \frac{\delta(u_i)}{(1 - \bar{u}_i)^2} \le 4\,\delta(u_i),$$
since $\bar{u}_i < 1/2$ for $i = 1, \ldots, m - 1$. (Obviously $\delta(u_0) = 0$.)
Thus $\delta(\sigma_0 \cdots \sigma_{m-1}) \le 12m \max_i \delta(u_i)$ and the condition
$$12m \max_i \delta(u_i) < 2^{-l}$$
ensures the correctness of the method. We claim that $\delta(u_i) < 2^{-q}$ as a result
of truncating to $q$ bits right of the point. Indeed $\delta(u_1) < 2^{-q}$, trivially. For
$i \ge 1$, assuming $\delta(u_i) < 2^{-q}$, let $u^*_{i+1} = u_i^s$ (before the truncation). Then
$$\delta(u^*_{i+1}) \le s\,\bar{u}_i^{\,s-1}\,\delta(u_i) \le s\,2^{-(s-1)}\,\delta(u_i),$$
since $\bar{u}_i < 1/2$ for $i \ge 1$. If we assume $s \ge 2$, then $\delta(u^*_{i+1}) < 2^{-q}$, which shows
that its $q$ bits to the right of the point are correct. Thus, the prescribed
truncation yields $\delta(u_{i+1}) < 2^{-q}$, and the induction step is complete. In
conclusion, we choose
$$q \ge l + \log 12m.$$
(Note that for any choice of $s$, $\lceil \log 12m \rceil < 4 + \log \log l$ by the definition of
$m$.)
Noting that $m \cdot O(\log l) = O(\log^2 l/\log s)$, we have:
LEMMA 2. For any $2 \le s \le l$ there exists a circuit computing the $l$-bit
inverse of an $l$-bit number in time $O(\log^2 l/\log s)$ and with area $O((ls)^2)$.
The $AT^2$-performance of the above circuit is given by
$$AT^2 = O\!\left((ls)^2 \cdot \frac{\log^4 l}{\log^2 s}\right).$$
By choosing the depth $s$ as $s = l^{\varepsilon}$ ($\varepsilon > 0$), the resulting circuit achieves
$T = O((1/\varepsilon) \log l)$ and $AT^2 = O(l^{2(1+\varepsilon)} \log^2 l)$, i.e., it is a moderately
$AT^2$-suboptimal divider still achieving $T = O(\log l)$, for fixed $\varepsilon$. We are aware
that this result had been previously obtained by Leighton (1985),
presumably by a similar argument.
We close this section by noting that if $v$ is an $l$-bit underinverse of
$x \in [1/2, 1)$ then $v + 2^{-l}$ is an $l$-bit overinverse of $x$.
3. A TECHNIQUE OF SUCCESSIVE REFINEMENTS 
We now describe an alternative approach to the computation of the 
inverse of an $l$-bit number, which uses the BCH method as a subroutine.
Informally, this approach begins by computing a (small length) coarse 
overapproximation of the inverse of x, and subsequently refines it by mul- 
tiplicative factors, which are all inverses of numbers very close to 1 (from 
above). Therefore, the first approximation takes advantage of the small 
operand length, whereas the subsequent refinements exploit the presence of 
leading zeros in the representation of the number to be inverted. This 
method is best described for an $l$-bit integer $x \in [1, 2)$. (Note the modified
range of normalization.)
The number $x \in [1, 2)$ can be written as
$$x = x_1 + 2^{-l_1-2} \cdot w,$$
where $x_1$ is an $(l_1 + 2)$-bit number (the leading $l_1 + 2$ bits of $x$) and $w$ is an
$(l - l_1 - 2)$-bit number (the trailing $l - l_1 - 2$ bits of $x$). Then $x_1 \in [1, 2)$ and
$w \in [0, 2)$. Let $v_1 \cdot 2$ be an $(l_1 + 1)$-bit overinverse to $x_1 \cdot 2^{-1}$ (i.e.,
$x_1 v_1 = 1 + \eta$, $\eta < 2^{-l_1-1}$). Then
$$v_1 x = v_1 x_1 + v_1 w\,2^{-l_1-2} = 1 + \eta + v_1 w\,2^{-l_1-2} < 1 + 4 \cdot 2^{-l_1-2} = 1 + 2^{-l_1},$$
since $\eta < 2^{-l_1-1}$, $v_1 \le 1$, and $w < 2$. This means that $v_1 x$ has at least $l_1 - 1$
consecutive 0's immediately to the right of the point. Define
$$z_2 = v_1 x.$$
Then, if $v_2$ denotes an approximation of $1/z_2$, we have $v_1 v_2 \approx 1/x$. Also, if
$v_2 z_2 = 1 + \eta'$ then $v_1 v_2 x = 1 + \eta'$, i.e., $v_1 v_2$ is an overapproximation of $1/x$
of precision $\eta'$. The process can be iterated thereby obtaining
$$1/x \approx v_1 v_2 \cdots v_k.$$
This leads to the following algorithm:
ALGORITHM INVERSE 3(x)
Input: an $l$-bit number $x \in [1, 2)$ and an integer sequence
    $l_1 < l_2 < \cdots < l_k = l$.
Output: an $l$-bit number $v \in (1/2, 1]$, such that $vx = 1 + \varepsilon$, $\varepsilon < 2^{-l}$
(1) begin $v := 1$
(2)   for $i = 1$ to $k$ do
(3)   begin $t_i$ := leftmost $(l_i + 2)$ bits of $x$;
(4)     $z_i := v\,t_i$;
(5)     $x_i$ := leftmost $(l_i + 1)$ bits of $z_i$;
(6)     $v_i := 2^{-1} \cdot$ ($(l_i + 1)$-bit overinverse of $x_i 2^{-1}$);
(7)     $v := v\,v_i$
      end;
    end
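The refinement loop can be traced with the following Python sketch in exact rational arithmetic. It rests on one reading of the bit counts in the pseudocode: both truncations keep $l_i + 1$ bits to the right of the point, so that the truncation error is below $2^{-l_i-1}$ as used in the correctness argument. The overinverse is computed directly rather than by the BCH method, and the names `leftmost`, `overinverse` and `inverse3` are illustrative.

```python
from fractions import Fraction
from math import ceil, floor

def leftmost(a, bits_right):
    """Keep `bits_right` binary places right of the point (a >= 0)."""
    return Fraction(floor(a * 2**bits_right), 2**bits_right)

def overinverse(y, l):
    """l-bit overinverse of y in [1/2, 1): ceil(2^l / y) * 2^-l."""
    return Fraction(ceil(2**l / y), 2**l)

def inverse3(x, lengths):
    """Overapproximate 1/x for x in [1, 2) by successive refinement."""
    v = Fraction(1)                                # line 1
    for li in lengths:                             # line 2
        t = leftmost(x, li + 1)                    # line 3: leading bits of x
        z = v * t                                  # line 4: close to 1 from above
        xi = leftmost(z, li + 1)                   # line 5
        vi = overinverse(xi / 2, li + 1) / 2       # line 6
        v *= vi                                    # line 7
    return v

l = 16
x = Fraction(92682, 2**l)                          # about 1.41421
v = inverse3(x, [2, 4, 8, 16])
```

After the first pass every `z` lies just above 1, so each later inversion works on a number with many leading zeros after the point, which is the situation the next lemma exploits.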
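The correctness of the method is established by showing that the error is
bounded from above by $2^{-l}$. Indeed, note that $t_k = x$, so that (line (4))
$z_k = v_{k-1} \cdots v_1 \cdot x$, and $z_k v_k = (v_k \cdots v_1)\,x$. But (line (5)) $x_k = z_k - \eta_k$,
$\eta_k < 2^{-l-1}$ and (line (6)) $x_k v_k = 1 + \delta_k$, $\delta_k < 2^{-l-1}$. We conclude
$$(v_k \cdots v_1)\,x = (x_k + \eta_k)\,v_k = 1 + \delta_k + \eta_k v_k < 1 + 2^{-l},$$
since $\delta_k + \eta_k v_k < 2^{-l-1} + 2^{-l-1} = 2^{-l}$. This shows that $v_k \cdots v_1$ is the
desired overapproximation of the inverse of $x$.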
Step 6 is the crucial action in the above algorithm; we realize it by 
making use of the BCH method. To analyze its performance, we need 
LEMMA 3. If an $l$-bit number $x \in [1/2, 1)$ has $l' - 1$ zeros immediately to the
right of the leading 1, the $l$-bit inverse of $x$ can be computed in time
$T = O(\log(l/l') \cdot \log l/\log s)$ and area $A = O((ls)^2)$, for any $2 \le s \le l/l'$. (Note
that this result subsumes Lemma 2 for $l' = 1$.)
Proof. Indeed $u = 1 - 2x$ is a (nonpositive) proper fraction whose
absolute value has $l'$ zeros immediately to the right of the point. This
implies that $|u^{\lceil l/l' \rceil}| < 2^{-l}$, so that only the first $\lceil l/l' \rceil$ consecutive powers of
$u$ need to be computed. ∎
The numbers $x_i$, $i = 1, \ldots, k$, used in Step 6 meet the conditions of
Lemma 3, since $v t_i - 1$ is a (nonnegative) number with $l_{i-1}$ leading zeros
($l_0 = 0$, by convention). Step 6 is therefore carried out by applying
Algorithm INVERSE 2 so that the $i$th iteration is characterized by length $l_i$
and depth $s_i$. An implementation of this technique is therefore completely
specified by the two sequences:
$$l_1, l_2, \ldots, l_k$$
and
$$s_1, s_2, \ldots, s_k.$$
Before closing this section we note that Step 7 involves a multiplication 
of $O(l_i)$-bit numbers at the $i$th iteration; thus this operation is no more
complex than the execution of the homologous Step 6, and will not be 
further mentioned in this discussion. 
4. THE DIVIDER NETWORK 
We have all the premises to illustrate in detail the structure of the divider 
sketched in Fig. 1. 
The first J stages are collectively designed to implement the successive 
refinement technique; each module implements the modified BCH 
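algorithm. For $i = 1, 2, \ldots, J$, let $l_i$ be the (output) operand length, $s_i$ the
depth, $A_{1,i}$ the area, and $T_{1,i}$ the time of the $i$th module. We seek a solution
where all such modules have identical area (i.e., $A_{1,i} = A_1$ for $i = 1, \ldots, J$)
and identical computation time, equal to the target time (i.e.,
$T_{1,i} = \Theta((\log n)^{1+\varepsilon})$, $i = 1, \ldots, J$). By the requirement of optimality, we have
$$A_1 = O\!\left(\left(\frac{n}{(\log n)^{1+\varepsilon}}\right)^{2}\right). \qquad (2)$$
We also choose
$$l_i s_i = C = \frac{n}{(\log n)^{1+\varepsilon}}, \qquad (3)$$
$$s_i = C^{\,1/(2(\log n)^{\varepsilon(i-1)})}. \qquad (4)$$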
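The parameter $J$ is chosen as the largest value of $i$ for which $s_i \ge 2$, and is
readily found to be $O(1/\varepsilon)$. Also note that $2 \le s_J = s_1^{(\log n)^{-\varepsilon(J-1)}} < 2^{(\log n)^{\varepsilon}}$. Since the
area of the $i$th module is $O((l_i s_i)^2)$, condition (2) is obviously verified.
From (3) and (4) we obtain $l_1 = (n/(\log n)^{1+\varepsilon})^{1/2}$, and from Lemma 2:
$$T_{1,1} = O\!\left(\frac{\log^2 l_1}{\log s_1}\right) = O(\log n).$$
From Lemma 3, for $i = 2, \ldots, J$,
$$\begin{aligned}
T_{1,i} &= O\!\left(\log\frac{l_i}{l_{i-1}} \cdot \frac{\log l_i}{\log s_i}\right) \\
&= O\!\left(\log\frac{s_{i-1}}{s_i} \cdot \frac{\log n}{\log s_i}\right) && \text{since } l_i s_i = l_{i-1} s_{i-1} \\
&= O\!\left(\log n \cdot \frac{\log s_{i-1}}{\log s_i}\right) && \text{since } s_i \ge 2 \\
&= O\!\left(\log n \cdot \frac{2(\log n)^{\varepsilon(i-1)}}{2(\log n)^{\varepsilon(i-2)}}\right) \\
&= O((\log n)^{1+\varepsilon}),
\end{aligned}$$
thus verifying the objective for the computation time.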
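With these choices, each module of the chain is $AT^2$-optimal, and the
global computation time is $c_1 (1/\varepsilon)(\log n)^{1+\varepsilon} = \Theta((\log n)^{1+\varepsilon})$, for some constant
$c_1$. The value of $l_J$, the number of bits of the result, is approximately
$$l_J = \frac{C}{s_J} = \frac{n}{s_J\,(\log n)^{1+\varepsilon}}.$$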
This value $l_J$ represents the length of the operand supplied to the cascade
of the two Newton approximators, to be described next. Notice that, since
each Newton iteration doubles the number of accurate bits, if we start with
$l_J$ accurate bits, only $(1 + \varepsilon) \log \log n$ Newton iterations are needed to
complete the task.
Starting with the downstream approximator, we recall (see Fig. 2) that
this module is in turn the cascade of $p$ submodules ($p$ is an integer to be
defined shortly), where the $i$th submodule has area and time $A_{3,i}$ and $T_{3,i}$,
respectively, and
$$A_{3,i} = 2A_{3,i-1}, \qquad T_{3,i} = \sqrt{2}\,T_{3,i-1}, \qquad i = 2, 3, \ldots, p.$$
FIG. 2. The module structure of the slow “Newton approximator.”
With this choice (originally proposed in Mehlhorn (1984)), the global
area and time of the slow approximator are respectively proportional to
the area $A_{3,p}$ and time $T_{3,p}$ of the $p$th (last) submodule. Since we are aiming
for an $AT^2$-optimal network with computation time $O(T)$, we must have
$$A_{3,p}\,T_{3,p}^2 = O(n^2)$$
and
$$T_{3,p} = O(T).$$
This condition enables us to specify the parameter $p$. Indeed, the speed of
the submodules increases as we proceed upstream (by decreasing submodule
index), and each submodule must satisfy the condition that its
multiplication time is at least logarithmic in the operand length. Since the
operand length is halved in going from index $i$ to index $i - 1$ (due to the
mechanism of the Newton approximation), and the most stringent
condition occurs for $i = 1$, we have
$$\frac{T}{(\sqrt{2})^{p-1}} \ge \log \frac{n}{2^{p-1}},$$
which is certainly satisfied if we select $p$ as
$$p - 1 = 2 \log\left(\frac{T}{\log n}\right) = 2\varepsilon \log \log n, \qquad (5)$$
or $p = 1 + 2\varepsilon \log \log n$.
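The proportionality claim and the choice (5) can be checked with a short calculation. It assumes, as the text does not state explicitly, that the area and time of the cascade are the sums of the areas and times of its submodules, and it takes $T_{3,p} = T = (\log n)^{1+\varepsilon}$.

```latex
\begin{align*}
\sum_{i=1}^{p} A_{3,i} &= A_{3,p} \sum_{k=0}^{p-1} 2^{-k} < 2\,A_{3,p}, \\
\sum_{i=1}^{p} T_{3,i} &= T_{3,p} \sum_{k=0}^{p-1} 2^{-k/2}
  < \frac{T_{3,p}}{1 - 2^{-1/2}} = (2 + \sqrt{2})\,T_{3,p}, \\
(\sqrt{2})^{\,p-1} &= 2^{\varepsilon \log\log n} = (\log n)^{\varepsilon} = \frac{T}{\log n}
  \quad\text{for } p - 1 = 2\varepsilon \log\log n .
\end{align*}
```

With this $p$ the first submodule runs in time $T/(\sqrt{2})^{p-1} = \log n \ge \log(n/2^{p-1})$, so the condition above holds.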
Finally we turn our attention to the “fast approximator.” This module
receives an approximation of length $l_J > n/((\log n)^{1+\varepsilon} \cdot 2^{(\log n)^{\varepsilon}})$ and delivers
an approximation of length $n/(2(\log n)^{2\varepsilon})$. (Note that this is exactly the
input operand length of the first module of the slow Newton approximator
discussed earlier.) Thus, this module must execute at most
$(\log n)^{\varepsilon} + (1 - \varepsilon) \log \log n$ iteration steps, each of them within time $\Theta(\log n)$.
The module essentially consists of a “fastest” multiplier, i.e., time $O(\log n)$,
of numbers of length $n/(\log n)^{2\varepsilon}$, and can be realized with area $A_2$ such that
$A_2 (\log n)^2 = O((n/(\log n)^{2\varepsilon})^2)$ and hence $A_2 = O((n/(\log n)^{1+2\varepsilon})^2)$. Thus,
the resulting $AT^2$-measure for this module is
$$\left(\frac{n}{(\log n)^{1+2\varepsilon}}\right)^{2} \cdot \left((\log n) \cdot (\log n)^{\varepsilon}\right)^{2} = O(n^2)$$
and the optimality condition is clearly satisfied.
Since each of the three major units of our divider (the chain of modified
BCH dividers, the fast Newton approximator, and the slow Newton
approximator) has area $O((n/(\log n)^{1+\varepsilon})^2)$ and time $O((\log n)^{1+\varepsilon})$, we
conclude with
THEOREM 1. For any fixed $1 > \varepsilon > 0$, the $n$-bit inverse of an $n$-bit number
can be calculated with optimal $AT^2$-performance, for any $T \in [\Omega((\log n)^{1+\varepsilon}),
O((\log n)^2)]$.
5. CONCLUSION 
We constructed an $AT^2$-optimal divider with computation time
$(\log n)^{1+\varepsilon}$ for any $\varepsilon > 0$. The reader may wonder whether one can choose $\varepsilon$
as a decreasing function of $n$ (tending to zero as $n$ goes to infinity). This is
indeed the case if the construction is slightly modified. In the construction
as it is now we use a chain of modified BCH dividers each with the same
area and speed. Thus both area and time grow as $1/\varepsilon$ and hence $AT^2$ grows
(at least) as $(1/\varepsilon)^3$.
If $\varepsilon$ is chosen as a function of $n$, then this simple chain of equally sized
modules does not suffice. Rather one has to use a chain of increasingly
larger (and slower) modules as we did for the Newton iteration. Omitting
the tedious and not particularly illuminating details we have
THEOREM 2. There is an $AT^2$-optimal divider for $n$-bit integers for any
$T \in [\Omega(\log n \cdot 2^{(\log \log n)^{3/4}}), O((\log n)^2)]$.
Note that $2^{(\log \log n)^{3/4}} = O((\log n)^{\varepsilon})$ for any $\varepsilon > 0$.
RECEIVED August 1985; ACCEPTED October 1986 
REFERENCES 
MEHLHORN, K. (1984), $AT^2$-optimal VLSI integer division and integer square rooting,
Integration 2, 163-167.
REIF, J. (1983), Logarithmic depth circuits for algebraic functions, in “Proceedings, IEEE 24th
Found. of Comput. Sci.,” pp. 138-145.
BEAME, P. W., COOK, S. A., AND HOOVER, H. J. (1984), Log depth circuits for division and
related problems, in “Proceedings, IEEE 25th Found. of Comput. Sci.,” pp. 1-6.
LUK, W. K., AND VUILLEMIN, J. (1983), Recursive implementation of optimal time VLSI
integer multipliers, in “VLSI 83,” Trondheim, Norway.
SPANIOL, O. (1976), “Arithmetik in Rechenanlagen,” Teubner, Stuttgart.
MEHLHORN, K., AND PREPARATA, F. P. (1983), $AT^2$-optimal VLSI integer multiplier with
minimum computation time, Inform. and Control 58, Vol. 1-3, 137-156.
KNUTH, D. E. (1981), “The Art of Computer Programming, Vol. 2, Seminumerical
Algorithms,” 2nd ed., Addison-Wesley, Reading, MA.
LEIGHTON, F. T. (1985), personal communication, May.
