A Gate-Array Realization of an Algorithm for Division by Abbasi, Salman Y.
University of Central Florida 
STARS 
Retrospective Theses and Dissertations 
1984 
This Masters Thesis (Open Access) is brought to you for free and open access by STARS. It has been accepted for 
inclusion in Retrospective Theses and Dissertations by an authorized administrator of STARS. For more information, 
please contact STARS@ucf.edu. 
STARS Citation 
Abbasi, Salman Y., "A Gate-Array Realization of an Algorithm for Division" (1984). Retrospective Theses 
and Dissertations. 4679. 
https://stars.library.ucf.edu/rtd/4679 
A GATE-ARRAY REALIZATION 
OF AN ALGORITHM FOR DIVISION 
BY 
SALMAN YOUSEF ABBASI
B.S.E., Fairleigh Dickinson University, 1978
RESEARCH REPORT
Submitted in partial fulfillment of the requirements
for the degree of Master of Science in Engineering
in the Graduate Studies Program of the College of Engineering 
University of Central Florida 
Orlando, Florida 
Spring Term 
1984 
ABSTRACT 
A realization of a division algorithm suitable for high speed
pipeline and real-time processors is presented. The divide algorithm
can be implemented using LSI/VLSI gate-array technology. The divider
performs precise, high speed 9 bit sign magnitude division. The design
consists of combinational logic, with input and output data latched
into input and output registers. Data propagates through 16 divide
stages. The n'th stage generates the n'th quotient bit upon receiving
the updated dividend and control signals from the previous stage. A
simulation program is developed to verify the algorithm, and an
analysis of speed performance and cost is provided. Other division
algorithms are discussed.
TABLE OF CONTENTS
LIST OF FIGURES
I. INTRODUCTION
Important Parameters
Impact of New Technology
II. DIVIDER GATE-ARRAY REALIZATION
Performance
Algorithm
Divider Block Diagram
Input/Output Registers
Normalization Block
Scaling Control
Sign Generator
Divide Block
First Type Divide Cell
Second Type Divide Cell
Example
Propagation Delay
LSI Implementation
Software Simulation
Other Divide Algorithms
Algorithm 1
Algorithm 2
III. CONCLUSION
REFERENCES
LIST OF FIGURES
1. Input/Output Data Format
2. Divide Block Diagram
3. Normalized Logic Block Diagram
4. Scaling Control Diagram
5. First Type Divide Cell Diagram
6. 8 Bit Subtracter Block Diagram
7. 3:1 Mux Logic Diagram
8. Second Type Divide Cell Diagram
9. 2:1 Mux Logic Diagram
10. Software Simulation Program
11. Algorithm 1 Block Diagram
12. Algorithm 2 Pipeline Diagram
13. Pipeline Stage Divider Block Diagram
I. INTRODUCTION 
Many digital systems perform one or more of the binary operations 
of addition, subtraction, multiplication, and division. All arithmetic 
functions can be related to addition; subtraction is performed by 
adding complemented numbers, multiplication is repeated addition, and 
division is repeated subtraction. Thus, all these arithmetic opera-
tions could be performed in a central processor unit consisting of a 
binary adder and control logic. 
These operations could be performed with either a serial or par-
allel circuit arrangement. A serial arrangement requires considerably 
less hardware than a parallel arrangement. However, if extremely fast 
arithmetic is required, parallel methods must be used. Multiplication 
and division, when performed by repeated additions and subtractions, 
consume several processing cycles. For fast systems, such as real-time
and pipeline processors, special hardware units are used to perform
multiplication and division.
Arithmetic operations vary in the complexity of their implementation.
Addition and subtraction are fairly simple operations and are easy to
implement. Multiplication and division are complicated functions and
are difficult to implement. Multiplication and division are basically
iterative processes. The first involves inspecting the multiplier digits
one at a time, adding a copy of the multiplicand shifted to the position
of each inspected digit that equals one, and accumulating these partial
products. The second involves a quotient digit selection, a multiplication,
and a subtraction during each stage.
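The two iterative processes can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular machine: `shift_add_multiply` adds a shifted copy of the multiplicand for each multiplier bit that equals one, and `digit_recurrence_divide` (a hypothetical helper name) performs a quotient digit selection, a multiplication, and a subtraction at every stage, for any radix.

```python
def shift_add_multiply(multiplicand: int, multiplier: int) -> int:
    """Multiply non-negative integers by inspecting multiplier bits one at a time."""
    product = 0
    shift = 0
    while multiplier:
        if multiplier & 1:                       # inspected digit equals one
            product += multiplicand << shift     # add the shifted partial product
        multiplier >>= 1
        shift += 1
    return product


def digit_recurrence_divide(dividend: int, divisor: int,
                            radix: int = 2, digits: int = 8) -> tuple[int, int]:
    """Long division producing one radix-`radix` quotient digit per stage.

    The dividend must fit in `digits` digits; returns (quotient, remainder).
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    remainder = 0
    quotient = 0
    for i in range(digits - 1, -1, -1):
        # Bring down the next dividend digit.
        remainder = remainder * radix + (dividend // radix**i) % radix
        # Quotient digit selection: largest q with q * divisor <= remainder.
        q = 0
        while (q + 1) * divisor <= remainder:
            q += 1
        remainder -= q * divisor                 # multiplication, then subtraction
        quotient = quotient * radix + q
    return quotient, remainder


print(shift_add_multiply(13, 11))                          # 143
print(digit_recurrence_divide(157, 9))                     # (17, 4)
print(digit_recurrence_divide(157, 9, radix=10, digits=3)) # (17, 4)
```

In radix 2 the digit selection reduces to a single compare, which is the case the divider in this report implements in hardware.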
Computers generally break multiplication and division operations 
down into simpler functions such as add, subtract, or shift. Existing 
arithmetic logic units (ALUs) can perform these operations under 
program control. To accomplish division in minimum time and without 
a sizeable increase in hardware, the IBM 360/91 uses its fast pipeline 
multiply hardware for iterations in the division process. The Cray-1
computer performs division by forming the reciprocal of the divisor
using its reciprocal pipe and then passing the reciprocal along with
the dividend through the multiply pipe. Most logic designers use a
method similar to the one used in the Cray-1 computer: a PROM lookup
table is used to find the reciprocal of the divisor, and the
reciprocal is then multiplied by the dividend.
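A minimal sketch of the reciprocal approach, assuming a hypothetical PROM that stores truncated 16-bit reciprocals 2^16/D for every nonzero 8-bit divisor. The table width and the names `RECIP_PROM` and `divide_by_reciprocal` are illustrative assumptions, not details of the Cray-1 or of any specific design. The quotient is formed as N x (1/D) in 8.8 fixed point with a single multiplication.

```python
RECIP_BITS = 16  # assumed PROM word width; not specified in the report

# Hypothetical PROM contents: truncated 2**16 / D for every nonzero 8-bit divisor.
RECIP_PROM = [0] + [(1 << RECIP_BITS) // d for d in range(1, 256)]


def divide_by_reciprocal(n: int, d: int) -> int:
    """Approximate N/D as an 8.8 fixed-point number: (N * PROM[D]) >> 8."""
    if not 0 <= n <= 0xFF or not 0 < d <= 0xFF:
        raise ValueError("need an 8-bit dividend and a nonzero 8-bit divisor")
    return (n * RECIP_PROM[d]) >> (RECIP_BITS - 8)


print(hex(divide_by_reciprocal(157, 9)))  # 0x1171, i.e. 17 + 113/256
```

Because the PROM entries are truncated, the result can fall one least significant bit below the exactly truncated quotient (for example N = FF, D = 03 in hex), so a design that needs exact truncation would require extra reciprocal precision or a correction step.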
Important Parameters 
Some of the important parameters to be considered in implementing 
arithmetic algorithms are speed, cost, and accuracy. The IBM 360/91 and
Cray-1 computers share major components (pipelines and multipliers)
to perform division. Other computers utilize ALUs. Computers and
pipeline processors that multiply the reciprocal of the divisor by the
dividend need special hardware to compute the reciprocal, and thus
share one pipe between multiplication and division. The trade-off
between speed and cost varies from one system to another and can be
optimized to meet system specifications. The throughput for a computer
with a special division pipe is higher than the throughput for a
computer which shares its multiply pipe. However, the hardware cost
is higher when a special division pipe is added.
Impact of New Technology 
Technological improvements in fabricating integrated circuits 
from silicon are being made every day. Today's technology produces 
smaller and smaller electronic components performing increasingly 
complex electronic functions at ever higher speed and lower cost. 
Circuit designers are leaving the macroscopic world of PC-board-mounted
MSI-based logic and entering the microscopic realm of the system on
a chip. Gate-array technology has been fully opened to designers. 
Manufacturers now offer comprehensive computer aided design (CAD) 
packages to help designers realize their gate-array requirements in 
any technology: ECL for speed, CMOS for efficiency, NMOS or TTL for
anything in between. Gate-arrays exist in three primary configura-
tions: 1) basic gate-array only, 2) semicustom and 3) custom arrays. 
Basic-gate-only populated chips are the easiest to employ, incorporating
nothing more than NAND and NOR gates. Semicustom gate-arrays are
for more complex designs, incorporating medium complexity modules such 
as JK and/or D flip flops and multiple input AND, OR, and XOR gates. 
Full-custom arrays can accomplish virtually any logic realization. 
II. DIVIDER GATE-ARRAY REALIZATION 
Performance 
The gate-array design presented in this report emphasizes high 
speed performance and accurate division. The divider circuit accepts 
a sign magnitude dividend and divisor and produces a sign magnitude 
quotient. The design is configured for an 8 bit dividend plus 1 
bit dividend sign, and an 8 bit divisor plus 1 bit divisor sign. The 
output quotient is 16 bits (8 integer bits and 8 fraction bits) plus a
sign bit. Figure 1 shows the input
and output data formats. 
Algorithm 
The divider design is based on the hand calculation division 
algorithm. Given a 4 bit dividend and a 4 bit divisor, the
quotient can be calculated as follows:
dividend = 1001 (4 bit binary number)
divisor = 0010 (4 bit binary number)
        100.1
       ------
0010 ) 1001.0
       10
       --
        00
        00
        --
         01
         00
         --
          10
          10
          --
          00
quotient = 100.1 (4 bit binary number, 4.5 in decimal)
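The hand procedure above can be mechanized directly: bring down one dividend bit at a time, compare the partial dividend with the divisor, and subtract when the divisor fits. The Python sketch below (the function name is illustrative) reproduces 1001 / 0010 = 100.1 and truncates, rather than rounds, any remaining fraction bits.

```python
def binary_long_division(dividend: int, divisor: int,
                         int_bits: int, frac_bits: int) -> str:
    """Hand-style binary long division: bring down one bit, compare, subtract.

    Returns the quotient as a binary string with `frac_bits` fraction bits.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    remainder = 0
    bits = []
    for i in range(int_bits + frac_bits):
        pos = int_bits - 1 - i
        # Bring down the next dividend bit; past the binary point bring down 0s.
        next_bit = (dividend >> pos) & 1 if pos >= 0 else 0
        remainder = (remainder << 1) | next_bit
        if remainder >= divisor:      # divisor "goes into" the partial dividend
            remainder -= divisor
            bits.append("1")
        else:
            bits.append("0")
    whole = "".join(bits[:int_bits]).lstrip("0") or "0"
    frac = "".join(bits[int_bits:])
    return f"{whole}.{frac}" if frac else whole


print(binary_long_division(0b1001, 0b0010, int_bits=4, frac_bits=1))  # 100.1
```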
Input Data Formats
Dividend:  S  B7 B6 B5 B4 B3 B2 B1 B0   (S = dividend sign; B7 = MSB, B0 = LSB)
Divisor:   S  B7 B6 B5 B4 B3 B2 B1 B0   (S = divisor sign; B7 = MSB, B0 = LSB)
Output Data Format
Quotient:  S  B15 B14 B13 B12 B11 B10 B9 B8 B7 B6 B5 B4 B3 B2 B1 B0   (S = quotient sign; B15 = MSB, B0 = LSB)
Figure 1. Input/Output Data Format
One important step in realizing the basic algorithm for division 
is to normalize the divisor, that is, to shift the divisor left until 
the MSB bit becomes 1. For example, the 8 bit binary number 00011010 
becomes 11010000. Since the number of shifts depends on the divisor and is
not known in advance, the quotient scaling cannot be fixed beforehand. The quotient
scaling problem is corrected by monitoring all leading zero bits of 
the divisor before normalization and generating internal control 
signals to force quotient MSB bits to zero as explained later in this 
section. 
Divider Block Diagram 
The divider block diagram is shown in Figure 2 and consists of 
the following major blocks: 
1. Input/output registers -- latch the input and output data.
2. Normalization block -- shifts the divisor left until the MSB bit
becomes 1.
3. Scaling control -- generates internal control signals to adjust
for scaling problems resulting from divisor normalization.
4. Sign generator -- generates the quotient sign.
5. Divide block -- 16 divide stages that carry out the division, one
quotient bit per stage.
Input/Output Registers 
Input and output registers are used as storage devices. Input
and output data are latched simultaneously. The input/output clock
period must be greater than the maximum propagation delay through the
divider; equivalently, the maximum clock rate is the reciprocal of that delay.
[Figure 2. Divide Block Diagram: dividend and divisor input registers, divisor sign, normalize block, scale control block, divide block, and clocked output registers for the quotient sign and the 16 bit quotient]
Normalization Block 
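The divisor is normalized by shifting it left until the MSB bit
becomes 1. Consider an 8 bit divisor D0 through D7, where D0 is the
LSB bit and D7 is the MSB bit. The 8 bit normalized divisor D0'
through D7', where D0' is the LSB bit and D7' is the MSB bit, is
determined by the following logic equations (juxtaposition denotes AND,
+ denotes OR, and an overbar denotes the complement):
$$D_0' = D_0 D_7$$
$$D_1' = D_1 D_7 + D_0 D_6 \overline{D_7}$$
$$D_2' = D_2 D_7 + D_1 D_6 \overline{D_7} + D_0 D_5 \overline{D_6}\,\overline{D_7}$$
$$D_3' = D_3 D_7 + D_2 D_6 \overline{D_7} + D_1 D_5 \overline{D_6}\,\overline{D_7} + D_0 D_4 \overline{D_5}\,\overline{D_6}\,\overline{D_7}$$
and, following the same pattern through D7',
$$D_i' = \sum_{k=0}^{i} D_{i-k}\, D_{7-k} \prod_{j=8-k}^{7} \overline{D_j}, \qquad i = 0, \dots, 7,$$
where the sum is a logical OR and the product is empty (equal to 1) for
k = 0. The k'th term passes D(i-k) to D(i)' when the divisor has exactly
k leading zeros.
Figure 3 shows a diagram of the normalization logic.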
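The normalization equations can be checked exhaustively in software. The sketch below (illustrative Python, not part of the report) evaluates the sum-of-products form for each output bit and compares it with the behavioral definition, a left shift until the MSB is 1, for all 256 divisor values; a zero divisor yields zero in both forms.

```python
def normalize_by_equations(d: int) -> int:
    """Evaluate D_i' = OR_k D_(i-k) D_(7-k) NOT D_(8-k) ... NOT D_7 for i = 0..7."""
    def bit(j: int) -> int:
        return (d >> j) & 1

    result = 0
    for i in range(8):
        d_i = 0
        for k in range(i + 1):                 # k = number of leading zeros
            term = bit(i - k) & bit(7 - k)     # D_(i-k) AND D_(7-k)
            for j in range(8 - k, 8):          # AND NOT D_j for j = 8-k .. 7
                term &= 1 - bit(j)
            d_i |= term
        result |= d_i << i
    return result


def normalize_by_shifting(d: int) -> int:
    """Reference behavior: shift left until the MSB (bit 7) is 1."""
    while d and not d & 0x80:
        d = (d << 1) & 0xFF
    return d


print(f"{normalize_by_equations(0b00011010):08b}")  # 11010000
print(f"{normalize_by_equations(0b00001001):08b}")  # 10010000
```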
[Figure 3. Normalized Logic Block Diagram]
Scaling Control
Scaling control signals are generated by monitoring the leading
zero bits of the divisor and are used to force the associated quotient
bits to zero. Also, when the current scaling control signal is 0, the
dividend for the following stage is selected to be the same as the
dividend for the current stage. Consider a 16 bit quotient Q7
through Q-8, with Q7 as the MSB bit and Q-8 as the LSB bit, and let F7
through F1 be the associated controls for quotient bits Q7 through
Q1. No control bits are needed for quotient bits Q0 through Q-8.
F1 through F7 are generated using the following logic equations:
$$F_1 = \overline{D_7}$$
$$F_2 = \overline{D_7}\,\overline{D_6}$$
$$F_3 = \overline{D_7}\,\overline{D_6}\,\overline{D_5}$$
$$F_4 = \overline{D_7}\,\overline{D_6}\,\overline{D_5}\,\overline{D_4}$$
$$F_5 = \overline{D_7}\,\overline{D_6}\,\overline{D_5}\,\overline{D_4}\,\overline{D_3}$$
$$F_6 = \overline{D_7}\,\overline{D_6}\,\overline{D_5}\,\overline{D_4}\,\overline{D_3}\,\overline{D_2}$$
$$F_7 = \overline{D_7}\,\overline{D_6}\,\overline{D_5}\,\overline{D_4}\,\overline{D_3}\,\overline{D_2}\,\overline{D_1}$$
Thus Fn is 1 exactly when the divisor has at least n leading zeros. If
the divisor has k leading zeros, quotient bits Q7 through Qk+1 are
forced to zero and the division effectively starts at stage n = k.
Figure 4 shows a logic diagram of the scaling controls F1 through F7.
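A small Python sketch of the scaling-control equations, with hypothetical helper names, confirms that Fn is 1 exactly when the divisor has at least n leading zeros. For the divisor 00001001 used in the example later in this section, it gives F1 through F4 = 1 and F5 through F7 = 0.

```python
def scaling_controls(d: int) -> dict[int, int]:
    """F_n = NOT D_7 AND NOT D_6 ... AND NOT D_(8-n), for n = 1..7."""
    controls = {}
    for n in range(1, 8):
        f_n = 1
        for j in range(8 - n, 8):
            f_n &= 1 - ((d >> j) & 1)
        controls[n] = f_n
    return controls


def leading_zeros(d: int) -> int:
    """Number of leading zero bits in an 8-bit value."""
    return 8 - d.bit_length()


print(scaling_controls(0b00001001))
# {1: 1, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0}
```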
Sign Generator 
Quotient sign is derived from the dividend and divisor sign bits as
shown in the following table:
Dividend Sign   Divisor Sign   Quotient Sign
0               0              0
0               1              1
1               0              1
1               1              0
where 0 is positive and 1 is negative; that is, the quotient sign is the
exclusive OR of the dividend and divisor signs.
Note: The dividend, divisor, and quotient are sign magnitude numbers.
[Figure 4. Scaling Control Diagram: gates forming F1 through F7 from divisor bits D7 through D1]
Divide Block 
Inputs to this block are an 8 bit dividend, N0 through N7, and
an 8 bit normalized divisor, D0' through D7'. The output is a 16
bit quotient, Q-8 through Q7. The divide block contains 16 divide
cells, one for every divide stage. The first 7 divide cells are
associated with the 7 MSB bits of the quotient; the remaining 9
cells are slightly different from the first 7 and are associated with
the 9 LSB bits of the quotient.
First Type Divide Cell. Figure 5 shows a block diagram of the first 
type divide cell. This type is used in the first seven divide stages. 
The terms shown on the block diagram are defined as follows: 
n - Stage identifier, where n goes from 7 to -8. 
Nn - Dividend for the n'th stage. 
Fn - Scaling control for the n'th stage. 
D' - Normalized divisor. 
OVn - Overflow bit for the n'th stage. 
Bn - Borrow bit resulting from the n'th stage.
Qn - Quotient output from the n'th stage. 
The following steps are performed in every stage: 
Step 1: D' is subtracted from Nn; the result is multiplied by 2
by shifting it left 1 bit and is forwarded to the mux logic, where it
is used to select Nn-1 (the dividend for the next stage). Also, a
borrow bit is generated if the result of the subtraction is negative.
Figure 6 shows an 8 bit subtractor with borrow in and borrow out bits.
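The internal structure of the subtractor in Figure 6 is not recoverable here, so the sketch below assumes the simplest organization: eight full-subtractor cells with the borrow rippling from bit 0 to bit 7. It models only the function C = A - B - BRI with borrow out BRO, not the gate-level timing used in the propagation delay estimate; the function names are illustrative.

```python
def full_subtractor(a: int, b: int, borrow_in: int) -> tuple[int, int]:
    """One bit of A - B - borrow_in; returns (difference bit, borrow out)."""
    diff = a ^ b ^ borrow_in
    borrow_out = ((1 - a) & b) | ((1 - (a ^ b)) & borrow_in)
    return diff, borrow_out


def subtract8(a: int, b: int, borrow_in: int = 0) -> tuple[int, int]:
    """C = A - B - BRI on 8 bits, rippling the borrow from bit 0 to bit 7."""
    c = 0
    borrow = borrow_in
    for i in range(8):
        bit, borrow = full_subtractor((a >> i) & 1, (b >> i) & 1, borrow)
        c |= bit << i
    return c, borrow  # (C0..C7, BRO)


print(subtract8(0b10011101, 0b10010000))  # (13, 0)
print(subtract8(0b00011010, 0b10010000))  # (138, 1)
```

The second call matches the fifth stage of the example later in this section: 00011010 - 10010000 gives 10001010 with a borrow.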
[Figure 5. First Type Divide Cell Diagram: signals Nn, D', Fn, OVn, Bn, and Qn]
[Figure 6. 8 Bit Subtractor Block Diagram: C = A - B for inputs A0-A7 and B0-B7 with outputs C0-C7; BRI is borrow in and BRO is borrow out]
Step 2: A logical AND is performed between Bn and the complement of
OVn, and the result is used along with Fn as a mux control to select
Nn-1 as follows:
Fn (S1)   $B_n\,\overline{OV_n}$ (S0)   Output
1         0                             2(Nn - D')
1         1                             2Nn
0         1 or 0                        Nn
The OVn bit in the above table is the overflow produced at the previous
stage's mux output by the 1 bit left shift; it acts as a ninth dividend
bit. If the OVn bit is set, the true dividend is at least 256 and
therefore larger than D', so if a Bn bit is generated, OVn forces Bn to
zero since it is not a true borrow. Whenever Fn is logic 0, the mux
selected output is Nn. Thus, as long as the Fn flag is 0, the dividend
is passed to the next stage without any modifications, and the output
Qn is forced to zero. If Bn is 0, OVn is 0, and Fn is 1, then
2(Nn - D') is selected and Qn is set to 1. If Bn is 1, OVn is 0, and
Fn is 1, then 2Nn is selected and Qn is set to 0. In logic form,
$$Q_n = F_n\,\overline{B_n\,\overline{OV_n}}.$$
Figure 7 presents a logic diagram of the 3:1 mux along with the select
table.
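Steps 1 and 2 together define one first type divide cell. The behavioral sketch below (the function name and the tuple return are illustrative choices) computes Nn - D', the borrow Bn, the corrected borrow Bn AND NOT OVn, and the 3:1 mux selection, returning Qn together with the next stage's dividend and overflow bit. How OVn is passed on when Fn = 0 is not stated in the report; the sketch simply forwards it, which has no effect because Fn can only be 0 in the leading stages, where OVn is 0.

```python
def first_type_cell(n_in: int, ov_in: int, d_norm: int, f_n: int) -> tuple[int, int, int]:
    """One first type divide stage; returns (Qn, N_(n-1), OV_(n-1))."""
    diff = (n_in - d_norm) & 0xFF          # 8-bit subtracter output Nn - D'
    b_n = 1 if n_in < d_norm else 0        # borrow out of the subtracter
    s0 = b_n & (1 - ov_in)                 # true borrow: Bn AND NOT OVn
    if f_n == 0:
        # S1 = 0: pass the dividend through unchanged and force Qn = 0.
        # Forwarding OV here is an assumption (it is 0 in these stages).
        return 0, n_in, ov_in
    if s0:
        # S1 = 1, S0 = 1: select 2Nn; the bit shifted out becomes the next OV.
        return 0, (n_in << 1) & 0xFF, n_in >> 7
    # S1 = 1, S0 = 0: select 2(Nn - D') and set Qn = 1.
    return 1, (diff << 1) & 0xFF, diff >> 7


print(first_type_cell(0b10011101, 0, 0b10010000, 1))  # (1, 26, 0)
```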
Second Type Divide Cell. Figure 8 presents a block diagram of the
second type of divide cell. This cell is used in the remaining nine
divide stages and functions like the first type cell, except that it
has no Fn control bit.
[Figure 7. 3:1 Mux Logic Diagram]
[Figure 8. Second Type Divide Cell Diagram]
The mux selects one of the two values 2Nn or 2(Nn - D') as follows:
$B_n\,\overline{OV_n}$ (S)   Output
0                            2(Nn - D')
1                            2Nn
If OVn (received from the previous stage) is set and the subtraction in
the current stage generates a Bn bit, then the OVn bit forces Bn to
zero, since it is not a true borrow. Qn is determined as follows:
$B_n\,\overline{OV_n}$   Qn
0                        1
1                        0
that is, $Q_n = \overline{B_n\,\overline{OV_n}}$.
Figure 9 shows a logic diagram of a 2:1 mux along with the select
table.
[Figure 9. 2:1 Mux Logic Diagram]
Example
An example is helpful at this point to clarify the divide
algorithm by going through it step by step.
Input:
Dividend (N) = 10011101 = (157)10
Dividend Sign = 1 (negative)
Divisor (D) = 00001001 = (9)10
Divisor Sign = 0 (positive)
Output:
Quotient (Q) = N/D
Q sign = (Dividend Sign) XOR (Divisor Sign)
Step 1:
Determine Q sign = (1) XOR (0) = 1 (negative)
Step 2:
Normalize D to get D' according to the logic equations given
previously (see "Normalization Block").
D' = 10010000
Step 3:
Generate scaling controls F1 through F7 according to the logic
equations (see "Scaling Control").
F1 = 1
F2 = 1
F3 = 1
F4 = 1
F5 = 0
F6 = 0
F7 = 0
Step 4:
Pass through 16 divide stages to compute Q.
1st Stage, n = 7
$Q_7 = F_7\,\overline{B_7\,\overline{OV_7}} = 0$ (F7 = 0), MSB bit; the dividend is passed on unchanged.
2nd Stage, n = 6
$Q_6 = F_6\,\overline{B_6\,\overline{OV_6}} = 0$ (F6 = 0); the dividend is passed on unchanged.
3rd Stage, n = 5
$Q_5 = F_5\,\overline{B_5\,\overline{OV_5}} = 0$ (F5 = 0); the dividend is passed on unchanged.
4th Stage, n = 4
N4 = 10011101
D' = 10010000 Subtract
     00001101
B4 = 0, F4 = 1, OV4 = 0
$Q_4 = F_4\,\overline{B_4\,\overline{OV_4}} = 1$
N3 = 00011010, OV3 = 0
5th Stage, n = 3
N3 = 00011010
D' = 10010000 Subtract
     10001010
B3 = 1, F3 = 1, OV3 = 0
$Q_3 = F_3\,\overline{B_3\,\overline{OV_3}} = 0$
N2 = 00110100, OV2 = 0
6th Stage, n = 2
N2 = 00110100
D' = 10010000 Subtract
     10100100
B2 = 1, F2 = 1, OV2 = 0
$Q_2 = F_2\,\overline{B_2\,\overline{OV_2}} = 0$
N1 = 01101000, OV1 = 0
7th Stage, n = 1
N1 = 01101000
D' = 10010000 Subtract
     11011000
B1 = 1, F1 = 1, OV1 = 0
$Q_1 = F_1\,\overline{B_1\,\overline{OV_1}} = 0$
N0 = 11010000, OV0 = 0
8th Stage, n = 0
N0 = 11010000
D' = 10010000 Subtract
     01000000
B0 = 0, OV0 = 0
$Q_0 = \overline{B_0\,\overline{OV_0}} = 1$
N-1 = 10000000, OV-1 = 0
9th Stage, n = -1
N-1 = 10000000
D' = 10010000 Subtract
     11110000
B-1 = 1, OV-1 = 0
$Q_{-1} = \overline{B_{-1}\,\overline{OV_{-1}}} = 0$
N-2 = 00000000, OV-2 = 1
10th Stage, n = -2
N-2 = 00000000
D' = 10010000 Subtract
     01110000
B-2 = 1, OV-2 = 1
$Q_{-2} = \overline{B_{-2}\,\overline{OV_{-2}}} = 1$
N-3 = 11100000, OV-3 = 0
11th Stage, n = -3
N-3 = 11100000
D' = 10010000 Subtract
     01010000
B-3 = 0, OV-3 = 0
$Q_{-3} = \overline{B_{-3}\,\overline{OV_{-3}}} = 1$
N-4 = 10100000, OV-4 = 0
12th Stage, n = -4
N-4 = 10100000
D' = 10010000 Subtract
     00010000
B-4 = 0, OV-4 = 0
$Q_{-4} = \overline{B_{-4}\,\overline{OV_{-4}}} = 1$
N-5 = 00100000, OV-5 = 0
13th Stage, n = -5
N-5 = 00100000
D' = 10010000 Subtract
     10010000
B-5 = 1, OV-5 = 0
$Q_{-5} = \overline{B_{-5}\,\overline{OV_{-5}}} = 0$
N-6 = 01000000, OV-6 = 0
14th Stage, n = -6
N-6 = 01000000
D' = 10010000 Subtract
     10110000
B-6 = 1, OV-6 = 0
$Q_{-6} = \overline{B_{-6}\,\overline{OV_{-6}}} = 0$
N-7 = 10000000, OV-7 = 0
15th Stage, n = -7
N-7 = 10000000
D' = 10010000 Subtract
     11110000
B-7 = 1, OV-7 = 0
$Q_{-7} = \overline{B_{-7}\,\overline{OV_{-7}}} = 0$
N-8 = 00000000, OV-8 = 1
16th Stage, n = -8
N-8 = 00000000
D' = 10010000 Subtract
     01110000
B-8 = 1, OV-8 = 1
$Q_{-8} = \overline{B_{-8}\,\overline{OV_{-8}}} = 1$ (LSB bit)
Result
    Q7 ............ Q0   Q-1 ........... Q-8
Q = 0 0 0 1 0 0 0 1  .  0 1 1 1 0 0 0 1
Q = 17.44 (the quotient sign is 1, so the signed result is -17.44)
For comparison
Q = N/D = 157/9 = 17.44
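The whole example can be reproduced with a behavioral model that chains the blocks described in this section: sign generator, normalization, scaling controls, seven first type stages (n = 7 to 1) and nine second type stages (n = 0 to -8), the latter modeled as first type stages with Fn fixed at 1. This is a sketch under those assumptions, not a gate-level description, and the function name is hypothetical. For every 8-bit dividend and nonzero divisor the returned magnitude should equal floor(256N/D), the quotient truncated to 8 fraction bits, which can be checked exhaustively over all 65,280 input pairs.

```python
def gate_array_divide(n: int, n_sign: int, d: int, d_sign: int) -> tuple[int, int]:
    """Behavioral model of the 16-stage divider.

    Returns (quotient sign, 16-bit magnitude Q7..Q0 . Q-1..Q-8).
    """
    if not 0 <= n <= 0xFF or not 0 < d <= 0xFF:
        raise ValueError("need an 8-bit dividend and a nonzero 8-bit divisor")
    q_sign = n_sign ^ d_sign                     # sign generator
    d_norm = d
    while not d_norm & 0x80:                     # normalization block
        d_norm <<= 1
    # Scaling controls: F_m = 1 iff D7..D(8-m) are all zero (m = 1..7).
    f = {m: int(d >> (8 - m) == 0) for m in range(1, 8)}

    remainder, ov, q = n, 0, 0
    for stage in range(7, -9, -1):               # n = 7 down to -8
        f_n = f.get(stage, 1)                    # second type stages: no F input
        diff = (remainder - d_norm) & 0xFF       # 8-bit subtracter
        true_borrow = int(remainder < d_norm) & (1 - ov)
        if not f_n:
            q_bit = 0                            # dividend passes unchanged
        elif true_borrow:
            q_bit = 0                            # select 2Nn
            remainder, ov = (remainder << 1) & 0xFF, remainder >> 7
        else:
            q_bit = 1                            # select 2(Nn - D')
            remainder, ov = (diff << 1) & 0xFF, diff >> 7
        q = (q << 1) | q_bit
    return q_sign, q


sign, magnitude = gate_array_divide(0b10011101, 1, 0b00001001, 0)
print(sign, hex(magnitude))  # 1 0x1171
```

The result 0x1171 is the binary pattern 0001 0001 . 0111 0001 obtained above, that is, 17 + 113/256.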
Propagation Delay 
The propagation delay in every stage is equal to the sum of the
delays through an 8-bit subtracter, a mux (3:1 for the first seven
stages and 2:1 for the last nine stages), and an AND gate. The total
propagation delay through the gate-array divider can be calculated
as follows:
1. Delay through the 8-bit subtracter (see Figure 6, 8 Bit Subtractor
Block Diagram):
5 gate levels (XOR gate = 2 levels) x 1 nsec x 15 stages = 75 nsec.
2. Delay through the 3:1 mux (see Figure 7, 3:1 Mux Logic Diagram):
3 gate levels x 1 nsec x 7 stages = 21 nsec.
3. Delay through the 2:1 mux (see Figure 9, 2:1 Mux Logic Diagram):
3 gate levels x 1 nsec x 9 stages = 27 nsec.
4. Delay through the last stage:
7 gate levels (subtracter and gates) x 1 nsec = 7 nsec.
5. Delay through the input latches:
2 gate levels x 1 nsec = 2 nsec.
Total delay = 75 + 21 + 27 + 7 + 2 = 132 nsec.
Note: A 1 nsec internal gate propagation delay was used in the
calculation. Applied Micro Circuits Corporation (AMCC) and Texas
Instruments produce gate-arrays using ECL technology, which
achieves such speed. The propagation delay may increase by a
few nsec if internal drivers are added to buffer heavily loaded signals.
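The delay budget above is simple arithmetic, but writing it as a parameterized sketch makes it easy to rerun with a different gate delay. The terms are taken from the list above; the assumption that the clock period only has to exceed the total delay (register setup and clock-to-output times ignored) gives a maximum clock rate of about 7.6 MHz for 132 nsec.

```python
GATE_DELAY_NS = 1.0  # internal gate delay assumed in the report (ECL gate arrays)

# (path, gate levels, number of stages), as listed in the delay calculation
DELAY_TERMS = [
    ("8-bit subtracter", 5, 15),
    ("3:1 mux", 3, 7),
    ("2:1 mux", 3, 9),
    ("last stage (subtracter and gates)", 7, 1),
    ("input latches", 2, 1),
]


def total_delay_ns(gate_delay_ns: float = GATE_DELAY_NS) -> float:
    """Sum gate levels x gate delay x stages over every term."""
    return sum(levels * stages * gate_delay_ns for _, levels, stages in DELAY_TERMS)


def max_clock_mhz(delay_ns: float) -> float:
    """Clock period must exceed the delay, so f_max = 1 / delay (setup time ignored)."""
    return 1000.0 / delay_ns


delay = total_delay_ns()
print(delay, round(max_clock_mhz(delay), 2))  # 132.0 7.58
```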
LSI Implementation 
To implement this algorithm for an 8 bit dividend, an 8 bit divisor,
and a 16 bit quotient, a gate array of 2000 to 3000 gates is needed.
Approximate gate counts are listed in Table 1.
TABLE 1
GATE-ARRAY SIZE CALCULATION
Function            Gates per function   No. of functions   Total
8 bit subtract      76                   16                 1216
9 bits (3:1 MUX)    54                   7                  378
9 bits (2:1 MUX)    36                   8                  288
D normalizer        42                   1                  42
Registers I/O       160                  1                  160
Other gates         27                   1                  27
Total                                                       2111
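Table 1 can likewise be kept as data so the estimate can be recomputed if a block changes. This sketch only reproduces the table's arithmetic; the per-function gate counts are the report's estimates. It also shows that the sixteen subtracters account for more than half of the gates (about 58 percent).

```python
# Rows of Table 1: (function, gates per function, number of functions)
TABLE_1 = [
    ("8 bit subtract", 76, 16),
    ("9 bits (3:1 MUX)", 54, 7),
    ("9 bits (2:1 MUX)", 36, 8),
    ("D normalizer", 42, 1),
    ("Registers I/O", 160, 1),
    ("Other gates", 27, 1),
]


def total_gates(rows: list[tuple[str, int, int]]) -> int:
    """Sum gates-per-function x number-of-functions over all rows."""
    return sum(gates * count for _, gates, count in rows)


total = total_gates(TABLE_1)
subtracter_share = 76 * 16 / total
print(total, round(subtracter_share, 2))  # 2111 0.58
```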
         INSTRUCTION          SOURCE CODE
ADDRESS  BYTE1 BYTE2 BYTE3    LABEL    OP CODE  OPERAND  COMMENTS
2000     3A    B0    20                LDA      20B0     GET DIVISOR
2003     47                   LOOP1    MOV      B,A      MOVE A TO B
2004     37                            STC               SET CARRY BIT = 1
2005     3F                            CMC               COMPLEMENT CARRY BIT
2006     17                            RAL               SHIFT DIVISOR LEFT
2007     D2    03    20                JNC      LOOP1    JUMP IF NO CARRY
200A     78                            MOV      A,B      MOVE B TO A
200B     32    B1    20                STA      20B1     STORE A
200E     21    B8    20                LXI      H,20B8   LOAD F1
2011     16    07                      MVI      D,07     INITIALIZE COUNTER
2013     06    01                      MVI      B,01     MOVE 01 TO B
2015     3A    B0    20                LDA      20B0     LOAD DIVISOR
2018     2F                            CMA               COMPLEMENT A
2019     4F                            MOV      C,A      MOVE A TO C
201A     79                   LOOP2    MOV      A,C      MOVE C TO A
201B     07                            RLC               GENERATE F CONTROLS
201C     4F                            MOV      C,A      TO NORMALIZE QUOTIENT
201D     A0                            ANA      B        MASK B WITH A
201E     A6                            ANA      M        MASK M WITH A
201F     23                            INX      H        INCREMENT HL
2020     77                            MOV      M,A      MOVE A TO M
2021     15                            DCR      D        IS COUNT = 0?
2022     C2    1A    20                JNZ      LOOP2    NO, GO TO LOOP 2
2025     00                            NOP               YES
2026     0E    08                      MVI      C,08     MOVE 8 TO C
2028     16    00                      MVI      D,00     CLEAR D
202A     3A    B1    20                LDA      20B1     LOAD NORMALIZED DIVISOR
202D     47                            MOV      B,A      MOVE A TO B
202E     21    BF    20                LXI      H,20BF   LOAD F CONTROL
2031     37                   LOOP3    STC               SET CARRY
2032     3F                            CMC               COMPLEMENT CARRY (CY = 0)
2033     7E                            MOV      A,M      MOVE M TO A
2034     DE    00                      SBI      00       IS F CONTROL = 0?
2036     CA    66    20                JZ       LOOP6    YES, GO TO LOOP 6
2039     3A    B2    20                LDA      20B2     NO, LOAD DIVIDEND
203C     90                            SUB      B        IS N - D NEGATIVE?
203D     DA    59    20                JC       LOOP5    YES, GO TO LOOP 5
2040     1E    01                      MVI      E,01     NO, MOVE 01 TO E
2042     57                   LOOP4    MOV      D,A      MOVE A TO D
2043     37                            STC               SET CARRY
2044     3F                            CMC               COMPLEMENT CARRY (CY = 0)
2045     17                            RAL               MULTIPLY D BY 2
2046     32    B2    20                STA      20B2     STORE DIVIDEND
2049     7A                            MOV      A,D      MOVE D TO A
204A     07                            RLC               ROTATE LEFT
204B     E6    01                      ANI      01       MASK LSB BIT
204D     57                            MOV      D,A      MOVE A TO D
204E     3A    B3    20                LDA      20B3     LOAD Q MSB
2051     17                            RAL               ROTATE LEFT
2052     83                            ADD      E        ADD E TO A
2053     32    B3    20                STA      20B3     STORE Q MSB
2056     C3    6D    20                JMP      LOOP7    GO TO LOOP 7
2059     5A                   LOOP5    MOV      E,D      MOVE D TO E
205A     15                            DCR      D        IS D = 0?
205B     CA    42    20                JZ       LOOP4    YES, GO TO LOOP 4
205E     3A    B2    20                LDA      20B2     NO, LOAD DIVIDEND
2061     1E    00                      MVI      E,00     MOVE 0 TO E
2063     C3    42    20                JMP      LOOP4    GO TO LOOP 4
2066     3A    B3    20       LOOP6    LDA      20B3     LOAD Q MSB
2069     17                            RAL               MULTIPLY BY 2
206A     32    B3    20                STA      20B3     STORE Q MSB
206D     2B                   LOOP7    DCX      H        MOVE POINTER
206E     0D                            DCR      C        IS COUNT = 0?
206F     C2    31    20                JNZ      LOOP3    NO, GO TO LOOP 3
2072     00                            NOP               NO OPERATION
2073     00                            NOP               NO OPERATION
2074     00                            NOP               NO OPERATION
2075     0E    08                      MVI      C,08     YES, MOVE 8 TO C
2077     37                   LOOP8    STC               SET CARRY
2078     3F                            CMC               COMPLEMENT CARRY (CY = 0)
2079     3A    B2    20                LDA      20B2     LOAD DIVIDEND
207C     90                            SUB      B        SUBTRACT DIVISOR
207D     DA    99    20                JC       LOOP10   BORROW = 1, GO TO LOOP 10
2080     1E    01                      MVI      E,01     BORROW = 0, MOVE 1 TO E
2082     57                   LOOP9    MOV      D,A      MOVE A TO D
2083     37                            STC               SET CARRY
2084     3F                            CMC               COMPLEMENT CARRY (CY = 0)
2085     17                            RAL               MULTIPLY DIVIDEND BY 2
2086     32    B2    20                STA      20B2     STORE DIVIDEND
2089     7A                            MOV      A,D      MOVE D TO A
208A     07                            RLC               ROTATE DIVIDEND LEFT
208B     E6    01                      ANI      01       MASK THE LSB BIT
208D     57                            MOV      D,A      MOVE A TO D
208E     3A    B4    20                LDA      20B4     LOAD Q LSB
2091     17                            RAL               ROTATE LEFT
2092     83                            ADD      E        ADD E TO A
2093     32    B4    20                STA      20B4     STORE Q LSB
2096     C3    A6    20                JMP      LOOP11   GO TO LOOP 11
2099     5A                   LOOP10   MOV      E,D      MOVE D TO E
209A     15                            DCR      D        IS D = 0?
209B     CA    82    20                JZ       LOOP9    YES, GO TO LOOP 9
209E     3A    B2    20                LDA      20B2     NO, LOAD DIVIDEND
20A1     1E    00                      MVI      E,00     MOVE 0 TO E
20A3     C3    82    20                JMP      LOOP9    GO TO LOOP 9
20A6     0D                   LOOP11   DCR      C        IS COUNTER = 0?
20A7     C2    77    20                JNZ      LOOP8    NO, GO TO LOOP 8
20AA     CF                            RST      1        YES, STOP

DATA
20B0           DIVISOR
20B1           NDIV (NORMALIZED DIVISOR)
20B2           DIVID (DIVIDEND)
20B3           QMSB
20B4           QLSB
20B5
20B6
20B7
20B8     01    F1
20B9           F2
20BA           F3
20BB           F4
20BC           F5
20BD           F6
20BE           F7
20BF           F8
Figure 10. Software Simulation Program
Software Simulation 
The computer program in Figure 10 provides a software simulation
to verify the design of the divide algorithm. The program was written
for the SDK-85 microprocessor board made by Intel. Several combinations
of the dividend (N) and the divisor (D) were used, and the results were
recorded in Table 2.
Other Divide Algorithms 
There are a number of interesting division algorithms presented
in the referenced literature. These algorithms perform division in
sequential or combinational logic. Two of these divide algorithms
are highlighted below.
Algorithm 1
In an article titled "Analysis of Speed of a Binary Divider Using
a Variable Number of Shifts per Cycle" by M. R. Patel and K. H.
Bennett (1978), an analysis of a divider with a variable number of
shifts per cycle is presented. The article indicates that the speed may
be increased further by providing multiples of the divisor which are
negative integral powers of two (±0.5 x divisor, ±0.25 x divisor).
A dividend prefix number is used in this algorithm. The prefix number
is inspected every cycle to decide whether to enter the ADD.SHIFT mode
or the $\overline{\mathrm{ADD}}$.SHIFT mode. The algorithm is implemented
by means of sequential logic. A block diagram of the algorithm
implementation is shown in Figure 11.
TABLE 2
RESULTS OF THE SOFTWARE SIMULATION
N (HEX)   D (HEX)   Q (HEX)
0F        03        05.00
08        03        02.AA
09        02        04.80
AA        05        22.00
AB        05        22.33
05        45        00.12
05        0A        00.80
05        14        00.40
15        14        01.0C
7F        FF        00.7F
80        FF        00.80
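Each entry in Table 2 equals the quotient truncated to 8 integer and 8 fraction bits, floor(256N/D), printed in hexadecimal. The short check below (the helper name is hypothetical) recomputes every row from N and D and collects any mismatches; for the values in the table the list is empty.

```python
# Table 2 rows: (N, D, Q), all hexadecimal, as printed in the report.
TABLE_2 = [
    ("0F", "03", "05.00"),
    ("08", "03", "02.AA"),
    ("09", "02", "04.80"),
    ("AA", "05", "22.00"),
    ("AB", "05", "22.33"),
    ("05", "45", "00.12"),
    ("05", "0A", "00.80"),
    ("05", "14", "00.40"),
    ("15", "14", "01.0C"),
    ("7F", "FF", "00.7F"),
    ("80", "FF", "00.80"),
]


def truncated_quotient_hex(n_hex: str, d_hex: str) -> str:
    """floor(N * 256 / D) printed as II.FF (8 integer bits, 8 fraction bits)."""
    q = (int(n_hex, 16) << 8) // int(d_hex, 16)
    return f"{q >> 8:02X}.{q & 0xFF:02X}"


mismatches = [row for row in TABLE_2 if truncated_quotient_hex(row[0], row[1]) != row[2]]
print(mismatches)  # []
```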
[Figure 11. Algorithm 1 Block Diagram: divisor and dividend (with prefix and sign), adder, buffers, quotient, and final quotient]
Algorithm 2 
In an article titled "A Pipelined Processing Unit for On-Line
Division" by M. J. Irwin (1978), a processing unit which could be used
as one stage of a pipeline performing fractional division is
presented. The unit is capable of operating in a serial, stand alone
manner, where the operands are supplied one digit per cycle and the
result is generated one digit per cycle. The unit could be linked
with other like units for high speed processing as shown in Figure 12.
The unit block diagram is shown in Figure 13 and consists of the
following major blocks:
1. Input/output registers -- latch input and output data.
2. Multi-input adder -- processes full precision operands.
3. Result digit selector -- selects quotient digits.
4. Residual adder -- calculates the residual for the next processing unit.
5. Selection network 1 -- generates the required multiple qj·Dj-1.
6. Selection network 2 -- generates $n_{j+\delta} - Q_j d_{j+\delta}$.
7. Carry generation -- needed if radix complement representation
of negative numbers is used.
The dividend and divisor are assumed to be in normalized form
upon input to the unit. The first quotient digit, q1, can be properly
selected after delta leading digits each of the dividend and divisor
are known. Thereafter, one new digit of the quotient can be determined
upon the receipt of one new digit each from the dividend and divisor.
The quotient digit selector is a table look-up device which implements
the SELECT function. It examines the alpha most significant digits of rPj
and the beta most significant digits of Dj to select the appropriate
quotient digit, qj+1. Alpha and beta are chosen to give sufficient
precision to the divide algorithm. The complexity of the selection
network and adders increases for higher radices.
[Figure 12. Algorithm 2 Pipelined Diagram: on-line fractional units linked from the MSB to the LSB operand and result digits, with serial data passed between units]
[Figure 13. Pipeline Stage Divider Block Diagram: input and output registers, selection networks 1 and 2, result digit selection, multi-input adder, carry generation, and residual adder]
There are differences and similarities between Irwin's pipeline
stage divider and the proposed gate-array divider. Some of these
differences and similarities are listed below.
1. Both units utilize input and output registers.
2. Both units could be used on-line or as a stand alone divider.
3. Inputs to the pipeline stage divider are assumed to be in
normalized form upon input to the unit. Inputs to the gate-array
divider are not required to be in normalized form; normalization
of the divisor is done internally.
4. The pipeline stage divider requires delta operand digits to be
present in the first divide cycle and an additional digit for each
following cycle. The gate-array divider requires all operand digits
to be present at the same time.
5. The pipeline stage divider selects quotient digits by utilizing a
table look-up. The gate-array divider selects quotient digits by
performing a full comparison between the dividend and divisor (the
comparison is done by subtracting the operands and by generating
control bits which determine the quotient digits).
6. The pipeline stage divider could be implemented for different radix
values. The gate-array divider could be implemented for different
numbers of bits.
A pipeline stage divider with radix r = 2 is functionally equivalent
to one gate-array divide cell, and 8 of these pipeline stage
dividers are functionally equivalent to an 8 bit gate-array divider.
However, a pipeline divider with r = 8 is not equivalent to an 8 bit
gate-array divider, because one radix-8 digit consists of 3 bits. For
a pipeline stage divider with r = 2, delta = 5, alpha = 4, beta = 0,
and a precision of 8 bits, each stage divider will consist of the
following:
1. P0 input register (8 bits).
2. Dj input register (8 bits).
3. Selection network to provide qjDj-1; since qj can only be 1 or 0,
an 8 bit 2:1 mux circuit could be used to perform this selection.
4. Selection network to provide rPj-1 - qjDj-1 for the next stage.
5. Multi-input adder to compute Pj, where
$$P_j \leftarrow r^{-\delta}\left(n_{j+\delta} - Q_j d_{j+\delta}\right) + r P_{j-1} - q_j D_{j-1}.$$
6. Residual adder to calculate rPj-1 - qjDj-1 for the next stage.
7. Result digit selector, which consists of a 32x1 table look-up.
8. Output registers for the full precision quotient and residual.
9. Carry generation hardware.
In comparison with a gate-array divide cell, this pipeline stage
divider requires extensive hardware. The selection network gets
more complicated for higher radices, and the hardware requirement
increases as well.
III. CONCLUSION
A 9 bit sign magnitude divider design has been presented. The
divider is suitable for high speed pipeline and stand alone processing.
Size analysis indicates that the divider could be implemented on an
LSI gate-array circuit consisting of approximately 2111 gates. The
divider design consists of combinational logic, where input data is
latched into input registers and output data is latched into output
registers. The divisor is normalized and then passed to all the
divide stages, while the dividend is passed only to the MSB divide
stage. The division process starts in the MSB stage and propagates
through to the LSB stage, with the shifted remainder from each stage
becoming the dividend for the next stage. Speed analysis indicates
that the maximum propagation delay through the divider stages is
approximately 132 nsec. The divider is capable of handling any 9 bit
sign magnitude numbers (integers and fractions) with a nonzero divisor.
The design could be expanded to handle more bits (larger numbers).
However, the number of gates, I/O pins, and the propagation delay will
increase when more bits are added.
The Algorithm 1 divider is a sequential logic implementation suitable
for stand alone division. The mean division speed of the Algorithm 1
divider is 2.3 bits/cycle. Algorithm 2 is a combinational logic
implementation suitable for pipeline and stand alone processing.
There are some similarities and differences between the gate-array
divider and the pipeline stage divider (see Algorithm 2 for details).
REFERENCES
Huffman, G. D. "Gate-array logic." Engineering Design News,
September 1981, pp. 86-96.
Irwin, M. J. "A Pipelined Processor Unit for On-line Division." In
Conference Proceedings--the 5th Annual Symposium on Computer
Architecture, pp. 24-30. New York: IEEE/ACM, 1978.
Patel, M. R., and Bennett, K. H. "Analysis of Speed of a Binary
Divider Using a Variable Number of Shifts per Cycle." Computer
Journal 21 (August 1978): 246-52.
Trivedi, K. S., and Ercegovac, M. D. "On-line Algorithms for Division
and Multiplication." IEEE Transactions on Computers 26
(July 1977): 681-87.
