Performance evaluation of FPGA implementations of high-speed addition algorithms by Yu, WWH & Xing, S
Title Performance evaluation of FPGA implementations of high-speed addition algorithms
Author(s) Yu, WWH; Xing, S
Citation
High-Speed Computing, Digital Signal Processing, and Filtering
Using Reconfigurable Logic, Boston, Massachusetts, USA, 20-21
November 1996, v. 2914, p. 26-33
Issued Date 1996
URL http://hdl.handle.net/10722/46578
Rights Creative Commons: Attribution 3.0 Hong Kong License
Performance Evaluation of FPGA Implementations of
High-Speed Addition Algorithms
William W.H. Yu and Shanzhen Xing
Department of Industrial & Manufacturing Systems Engineering
The University of Hong Kong, Pokfulam Road, Hong Kong
ABSTRACT
Driven by the excellent properties of FPGA's and the need for high-performance and flexible computing machines,
interest in FPGA-based computing machines has increased dramatically. Fixed-point adders are essential building blocks of
any computing system. In this work, various high-speed addition algorithms are implemented in FPGA devices, and their
performance is evaluated with the objective of finding and developing the most appropriate addition algorithms for
implementation in FPGA's, and of laying the groundwork for evaluating and constructing FPGA-based computing machines.
The results demonstrate that the performance of adders built with the FPGA's dedicated carry logic, combined with some
other addition algorithms, can be greatly improved, especially for larger adders.
Keywords: FPGA, addition, performance evaluation, carry-ripple adder, carry-completion adder, carry-lookahead adder,
carry-skip adder, carry-select adder
1. INTRODUCTION
Recent studies [1-9] have demonstrated that reconfigurable computing systems have the feasibility and potential
for improving the performance of a system by modifying its hardware or architecture through software in real time to match
the computational characteristics of the individual application. As the densities and speeds of SRAM-based FPGA's
(Field Programmable Gate Arrays) continue to increase, FPGA-based reconfigurable and custom computing systems have
become one of the hottest research topics in computer science and computer engineering. Fixed-point adders are essential
building blocks of any arithmetic unit in a computing system. Their speed of operation depends on the
carry propagation delay. In order to reduce the worst-case carry propagation time, various high-speed fixed-point addition
algorithms have been studied extensively in the area of fixed VLSI processor design [11-23]. However, an addition
algorithm that is optimal in one technology is not necessarily optimal in a different technology. The elemental
building blocks are gates in fixed VLSI technology and CLB's (Configurable Logic Blocks) in FPGA devices. The
implementation techniques of addition algorithms in the two technologies are different, and so are their performance and cost
parameters. Therefore, this work undertakes the performance evaluation of various available addition algorithms
implemented in Xilinx 4000 series devices in an effort to determine their suitability for FPGA's. The paper aims to lay some
groundwork for evaluating and constructing FPGA-based reconfigurable computing systems.
2. BASIS OF PERFORMANCE EVALUATION
In fixed VLSI technology, the assessment of different addition algorithms is usually based on the conventional analytic
approach. The gate-count model is used for area/cost evaluation and the gate-delay units model for operational time
evaluation. However, with FPGA technology, gate counts and gate-delay units serve no useful purpose in
performance and cost evaluations, because the basic functional units in FPGA's are CLB's rather than the basic logic gates of
fixed VLSI.
In order to evaluate FPGA implementations of different addition techniques, the evaluation should more appropriately
be based on the features of FPGA's. Each FPGA device [10] includes three major reconfigurable elements: configurable logic
blocks (CLB's), I/O blocks (IOB's), and interconnections. These elements all contribute to the propagation delay of the
processing unit implemented in the FPGA device. The IOB's provide the interface between the internal and external
signals. The programmable interconnect resources connect the inputs and outputs of the CLB's and IOB's into an appropriate
network. Any XC4000 series CLB is capable of implementing up to two four-variable logic functions or certain logic
functions of up to nine variables. The logic functions in a CLB are implemented by table look-up. Any logic function of more
than nine variables requires two or more CLB's. Generous on-chip buffering makes block delays insensitive to loading by the
interconnect structure, though all interconnect delays are layout-dependent. The total propagation delay depends on the
amount of resources used.
26 / SPIE Vol. 2914 0-8194-2316-5/96/$6.00
To evaluate the performance of the different addition algorithms, three parameters are examined: operational time (T),
cost/area (C), and performance:cost ratio (η). The cost/area (C) of a design implemented in FPGA's is
calculated in terms of the number of required CLB's. The operational time (T) is obtained from the timing simulation results
of the Xilinx software, which uses the actual block and routing delay times from the routed design. The simulation results
allow a much more accurate assessment of the behavior of the implementations under worst-case conditions.
The performance:cost ratio (η) is defined in this work as the reciprocal of the product of the cost and the operational
time, η = 1/(C·T). Therefore, the comparison of FPGA implementations of different addition techniques can be based on
the value of η. If one technique has a larger value of η, it is considered to be better than another.
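As a small illustration of this definition, the following Python sketch computes the performance:cost ratio from a CLB count and an operational time. The function names are hypothetical, and the scale factor of 10^5 is an assumption inferred from the tabulated values (for example, C = 6 and T = 19.4 for the 8-bit carry-ripple adder in the 4003 row of Table 1 correspond to a listed ratio of 859); it is not stated explicitly in the text.

```python
def performance_cost_ratio(cost_clbs: float, time: float) -> float:
    """Return eta = 1 / (C * T), the performance:cost ratio of Section 2."""
    if cost_clbs <= 0 or time <= 0:
        raise ValueError("cost and time must be positive")
    return 1.0 / (cost_clbs * time)


def scaled_ratio(cost_clbs: float, time: float, scale: float = 1e5) -> int:
    """Return eta multiplied by an assumed table scale factor and rounded."""
    return round(performance_cost_ratio(cost_clbs, time) * scale)


# 8-bit and 16-bit carry-ripple adders, 4003 row of Table 1
print(scaled_ratio(6, 19.4))    # 859
print(scaled_ratio(10, 27.7))   # 361
```

A larger value indicates a better trade-off between CLB usage and delay.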
3. EXISTING ADDITION ALGORITHMS AND THEIR FPGA IMPLEMENTATIONS
Based on the way carries are propagated, the classical high-speed fixed-point addition algorithms mainly include the
carry-ripple, carry-completion, carry-skip, carry-lookahead, and carry-select addition algorithms [11-14]. In order to
reduce the cost/area, the carry-propagation time, or both, numerous variations of these classical approaches have been
studied [15-23], and newer ones are still being developed and implemented in fixed VLSI technology.
The different addition algorithms have been implemented with different part-types of the widely used Xilinx 4000
series devices. This paper reports the performance evaluation for the carry-ripple, carry-completion, carry-lookahead,
carry-skip and carry-select adders.
3.1 Carry-ripple adder
The carry-ripple adder is one of the oldest and simplest adder designs. The n-bit carry-ripple adder is easily
implemented using the dedicated carry logic (Figure 1), which is one of the excellent features of Xilinx 4000 series devices.
The carry logic circuit is independent of the function generators, but shares some of the same inputs with the function
generators. Each CLB can implement approximately 40 different functions and carry modes.
Table 1 shows the performance parameters of the carry-ripple adders of sizes from 8 to 80 bits, which are implemented
in different Xilinx 4000 series part-types.
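The behavior of a carry-ripple adder can be sketched at the bit level as follows. This is only a functional software model under the usual full-adder equations; it says nothing about how the Xilinx dedicated carry logic is configured, and the function name is hypothetical.

```python
def ripple_carry_add(a: int, b: int, n: int, carry_in: int = 0):
    """Add two n-bit unsigned integers one bit position at a time.

    Returns (sum, carry_out), where sum holds the n low-order result bits.
    """
    carry = carry_in
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry                       # sum bit of a full adder
        carry = (ai & bi) | (carry & (ai ^ bi))   # carry-out of a full adder
        result |= s << i
    return result, carry


print(ripple_carry_add(11, 6, 4))   # (1, 1), since 11 + 6 = 17 = 0b1_0001
```

Each carry depends on the previous one, which is why the worst-case delay grows linearly with the adder width.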
Figure 1. The dedicated carry logic in each CLB
Table 1. Implementation data of carry-ripple adders
Adder Width       8     16     24     32     40     48     56     64     72     80
4003    C         6     10     14     18     22     26     30     34     38      *
        T      19.4   27.7   29.1   35.1   41.4   49.3   59.6   61.1   67.3      *
        η       859    361    245    158    110     78     56     48     39      *
4005    C         6     10     14     18     22     26     30     34     38     42
        T      20.4     25   28.7     35   41.1   47.9   55.2   61.2   71.5   80.1
        η       817    400    249    159    111     80     60     48     37     23
4008    C         6     10     14     18     22     26     30     34     38     42
        T      21.7   25.5   30.8   34.4   40.7   49.7   52.2   61.7   67.7   76.1
        η       768    392    232    162    112     77     64     48     39     31
4010    C         6     10     14     18     22     26     30     34     38     42
        T      22.6   27.3   30.4   35.2     41   49.9   55.9   61.9   68.5   79.3
        η       738    366    235    158    111     77     60     48     38     30
4013    C         6     10     14     18     22     26     30     34     38     42
        T        25   26.8   31.5   37.8   41.7   52.4   58.4   64.4   71.3   79.7
        η       667    373    227    147    109     73     57     46     37     30
Note: * cannot be implemented in one device.
3.2 Carry-completion adder
The carry-completion adder is obtained by modifying the carry-ripple adder to include carry-completion detection
logic. Because this adder is asynchronous, its operational time varies with the operands, although its
worst-case operational time is still linearly proportional to the length n of the adder. Table 2 shows the
performance parameters for carry-completion adders. In order to compare this algorithm with the others, the average
operational times are taken rather than the worst-case operational times of the adders.
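The operand dependence described above can be illustrated with a short Python sketch that measures the longest carry chain for a given pair of operands. It is a software model only: the function names, the random-operand experiment, and the seed are assumptions made for illustration, not the authors' measurement procedure. For uniformly random operands, the average longest chain is classically expected to grow roughly like log2 n rather than n.

```python
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Return the number of stages crossed by the longest carry in a + b."""
    longest = 0
    run = 0                       # length of the carry chain currently in flight
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:               # generate: a new carry starts at this stage
            run = 1
        elif (ai ^ bi) and run:   # propagate: the in-flight carry moves one stage on
            run += 1
        else:                     # kill, or nothing to propagate: the chain ends
            run = 0
        longest = max(longest, run)
    return longest


def average_longest_chain(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Estimate the mean longest carry chain over random n-bit operands."""
    rng = random.Random(seed)
    total = sum(
        longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
        for _ in range(trials)
    )
    return total / trials


print(longest_carry_chain(0b0001, 0b1111, 4))   # 4: the carry ripples through every stage
print(average_longest_chain(64))                # far below the worst case of 64
```

The gap between the worst case and the average is what a completion-detecting adder tries to exploit.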
Table 2. Implementation data of carry-completion adders
Adder Width       8     16     24     32     40     48     56     64     72     80
4003    C        20     37     56     71     90      *      *      *      *      *
        T      33.1   48.8   59.4   60.2     70      *      *      *      *      *
        η       151     55     30     23     16      *      *      *      *      *
4005    C        20     37     56     74     91    110    127    144    161    180
        T        37   49.6     57   61.7   74.2   71.9   80.1   84.6   87.8  104.1
        η       135     54     31     22     15     13     10      8      7      5
4008    C        20     38     56     74     93    111    130    145    165    183
        T      34.1   49.5   57.2   60.5   75.5   80.6     99   86.6  100.4  103.2
        η       147     53     31     22     14     11      8      8      6      5
4010    C        20     38     53     73     91    111    126    146    163    182
        T      36.5   49.6   60.8     57   70.7   71.5  100.4     96  101.4  110.5
        η       137     53     31     24     16     13      8      7      6      5
4013    C        20     38     53     74     90    104    129    147    165    185
        T      36.9   52.2   63.4   67.7   70.6   82.7     89   96.2   98.3  125.6
        η       136     50     30     20     16     12      9      7      6      4
Note: * cannot be implemented in one device.
3.3 Carry-lookahead (CLA) adder
Theoretically, fundamental CLA adders can be constructed with a constant addition time independent
of the width of the adder if the CLA unit can be freely expanded. However, because the fan-out and fan-in required
to implement the carry generation and the carry propagation functions increase rapidly with the adder size, such designs
are not practical except for the smallest adders. Therefore, large adders are generally implemented by modifying the
fundamental approach with a multilevel structure or by combining the CLA algorithm with some others to reduce the fan-in
and fan-out difficulties. These approaches usually result in additional delay, and the operational time of the adder is no
longer constant. The cost and performance data of FPGA implementations of multilevel CLA adders are shown in Table 3.
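The fan-in growth mentioned above can be seen in a small Python sketch of the fundamental (single-level) lookahead equations. It is a functional model with hypothetical function names, not the multilevel structure implemented in this work: every carry is written directly in terms of the generate and propagate signals, so the widest product term for the carry out of an n-bit adder has n + 1 inputs.

```python
def cla_carries(a: int, b: int, n: int, c0: int = 0):
    """Return (carries, p), where carries[i] is the carry into bit i (carries[n] is carry-out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]       # generate:  a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]     # propagate: a_i XOR b_i
    carries = [c0]
    for i in range(n):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]..p[1]g[0] | p[i]..p[0]c0
        carry = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            carry |= term
        term = c0
        for k in range(i + 1):
            term &= p[k]
        carries.append(carry | term)
    return carries, p


def cla_add(a: int, b: int, n: int, c0: int = 0):
    """Add two n-bit integers using the flat lookahead carries."""
    carries, p = cla_carries(a, b, n, c0)
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[n]


def widest_term_inputs(n: int) -> int:
    """Inputs of the widest AND term, p[n-1]..p[0]c0, in the carry-out equation."""
    return n + 1


print(cla_add(11, 6, 4))                               # (1, 1)
print([widest_term_inputs(n) for n in (4, 8, 16)])     # [5, 9, 17]
```

Because no carry equation depends on another carry, the delay is constant in principle, at the price of gates whose size grows with the width.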
Table 3. Implementation data of multilevel CLA adders
Adder Width       8     16     24     32     40     48     56     64     72     80
4003    C        30     63     90    100      *      *      *      *      *      *
        T      36.8   44.1   52.4   59.1      *      *      *      *      *      *
        η        91     36     21     17      *      *      *      *      *      *
4005    C        30     64     89    121    144    182      *      *      *      *
        T      37.5     41   55.2   60.6   61.2   63.8      *      *      *      *
        η        89     38     20     14     11      9      *      *      *      *
4008    C        30     64     90    121    144    180    209    239    270    306
        T      36.4   46.6   53.4   61.5     62   66.9   66.9     68   70.8   75.1
        η        92     34     21     13     11      8      7      6      5      4
4010    C        30     64     90    121    144    182    210    240    266    303
        T      37.6   44.5   57.6     58   65.7   64.2   67.6   66.8     79   74.9
        η        89     35     19     14     11      9      7      6      5      4
4013    C        30     64     90    120    144    180    210    235    256    306
        T      39.4   44.3   60.1   61.2     66   71.8   71.5   70.9   80.6   78.1
        η        85     35     18     14     11      8      7      6      5      4
Note: * cannot be implemented in one device.
Table 4. Implementation data of carry-skip adders
Adder Width       8     16     24     32     40     48     56     64     72     80
4003    C        11     17     26     33     40     49     54     60     68      *
        T      27.7   29.8   45.9   55.4   60.6   67.3   66.7   69.1   68.2      *
        η       328    197     84     55     41     30     28     24     22      *
4005    C        11     18     27     34     40     49     52     60     67     74
        T      27.3     31   49.7   53.4   59.2   63.1   64.5   69.5   68.7   78.4
        η       333    179     75     55     42     32     30     24     22     17
4008    C        11     18     28     34     39     47     53     60     68     74
        T      27.5   33.4   51.4   51.3   54.9   53.4   64.8   67.9   71.2   69.9
        η       331    166     69     57     47     40     29     25     21     19
4010    C        10     18     27     34     40     48     53     60     68     74
        T      27.7   32.6   48.6   51.4   60.1   55.3   67.8   69.5   70.8     75
        η       361    170     76     57     42     38     28     24     21     18
4013    C        10     18     27     35     40     49     53     60     68     74
        T        27   32.1   54.7   55.2     62   59.9   69.8     70   73.6   78.9
        η       370    173     68     52     40     34     27     24     20     17
Note: * cannot be implemented in one device.
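As a point of reference for the carry-skip data in Table 4, the following Python sketch models a carry-skip adder functionally: a ripple adder is partitioned into blocks, and each block forwards its incoming carry directly when every bit position in the block propagates. The block size of 4 and the function name are assumptions chosen for illustration; they are not the configurations used for the table.

```python
def carry_skip_add(a: int, b: int, n: int, block: int = 4, carry_in: int = 0):
    """Add two n-bit integers with one level of carry-skip logic per block."""
    result = 0
    carry = carry_in
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        block_in = carry
        all_propagate = 1
        c = block_in
        for i in range(lo, hi):             # ripple inside the block
            ai = (a >> i) & 1
            bi = (b >> i) & 1
            p = ai ^ bi
            result |= (p ^ c) << i
            c = (ai & bi) | (p & c)
            all_propagate &= p
        # Skip multiplexer: when every position propagates, the block's
        # carry-out equals its carry-in, so it can bypass the ripple path.
        carry = block_in if all_propagate else c
    return result, carry


print(carry_skip_add(0b11110000, 0b00001111, 8, carry_in=1))   # (0, 1)
```

The skip path does not change the result; in hardware it only shortens the time the carry needs to cross a block, which is why the worst-case delay depends on how the blocks are sized.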
3.4 Carry-skip adder
The carry-skip adder is built from the carry-ripple adder. An n-bit ripple adder is partitioned into blocks, and carry-skip
logic is added to each block. The worst-case carry propagation delay in carry-skip adders depends highly on the
configuration of such adders. In this work, only one-level carry-skip adders are implemented. This is because the dedicated
carry logic is so efficient that there is likely to be little value in two or more skip levels, as the carry-skip
delay will increase quickly with the number of skip levels. Table 4 shows the best implementation data of several
implementations with different configurations for each given adder width in terms of time and performance:cost ratio.
However, these results do not necessarily represent the operational times of optimally designed adders. It is believed that
the performance of such adders should be much better if configuration optimization is considered.
3.5 Carry-select adder
The carry-select adder is considered a compromise between the carry-ripple and the CLA adders. There can
be many variations of carry-select adders, owing to the addition algorithms used to generate the conditional sums within
blocks and to determine the carry-out of blocks. In this work, the carry-ripple technique is used for both the carry
determination logic and all generations of conditional sums. Like those of the carry-skip adders, the implementation data
used for comparison with the others are also chosen from several implementations with different configurations for each
given adder width. In order to obtain the optimal solution for FPGA implementations, the partitioning of the carry-select
adders and the algorithms used within the blocks and for the carry determination logic should be further examined. Table 5
gives the performance parameters for carry-select adders of different sizes.
Table 5. Implementation data of carry-select adders
[Table 5 data are illegible in the source scan.]
4. COMPARISONS
It is noted from Tables 1 to 5 that the performance parameters of a specific addition technique implemented in different
part-types of FPGA devices are slightly different. In general, the operational times resulting from small part-types are
slightly shorter than those resulting from large part-types. In order to compare FPGA implementations of different
addition algorithms in a general context, the empirical formulas are derived from the worst implementation of the various
addition algorithms in terms of cost, time, and performance:cost ratio. These are collectively shown in Table 6. These
empirical formulas can be used to quickly evaluate the performance of the five addition algorithms. The graphs of the worst
implementation data and the corresponding empirical formulas are shown in Figures 2 to 4.
On the basis of the above tables and graphs, one can conclude the following.
The carry-ripple adder has the lowest cost and the highest performance:cost ratio. Its highly regular structure and effective
use of the CLB's dedicated carry logic are the major reasons for this. Therefore, it should be the choice when simplicity and
cost are critical factors.
The carry-completion adder has the worst operational time, the second highest cost, and the second worst performance:cost
ratio. Moreover, its asynchronous feature makes it extremely unsuitable for the design of synchronous processors, because
the time lost in synchronization may overwhelm the gains from using the algorithm.
Table 6. The empirical evaluation formulas of different adders
Adder            Cost         Time                   Performance/Cost
Carry-Ripple     0.5n + 2     0.77n + 19             1 / (0.4n^2 + 11n + 38)
Carry-Complete   2.3n + 1.7   (0.1n + 12) log2 n     1 / ((0.23n^2 + 27.8n + 20.4) log2 n)
CLA              3.8n + 2     13.2 log2 n            1 / ((26.4 + 50.2n) log2 n)
Carry-Skip       0.9n + 5     (0.02n + 11) log2 n    1 / ((0.02n^2 + 10n + 55) log2 n)
Carry-Select     1.5n + 2.1   (0.03n + 9.4) log2 n   1 / ((0.05n^2 + 14.2n + 19.7) log2 n)
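The empirical formulas of Table 6 lend themselves to a quick numerical comparison. The Python sketch below evaluates cost, time, and performance:cost ratio for a chosen width n. The coefficients are transcribed from Table 6 as understood here (for example, a carry-completion cost of 2.3n + 1.7 and a CLA time of 13.2 log2 n); the dictionary layout and function name are assumptions, and the ratio is computed as 1/(cost x time) rather than from the rounded polynomial in the last column.

```python
from math import log2

# (cost formula, time formula) per adder, with n the adder width in bits
FORMULAS = {
    "Carry-Ripple":   (lambda n: 0.5 * n + 2,   lambda n: 0.77 * n + 19),
    "Carry-Complete": (lambda n: 2.3 * n + 1.7, lambda n: (0.1 * n + 12) * log2(n)),
    "CLA":            (lambda n: 3.8 * n + 2,   lambda n: 13.2 * log2(n)),
    "Carry-Skip":     (lambda n: 0.9 * n + 5,   lambda n: (0.02 * n + 11) * log2(n)),
    "Carry-Select":   (lambda n: 1.5 * n + 2.1, lambda n: (0.03 * n + 9.4) * log2(n)),
}


def evaluate(adder: str, n: int):
    """Return (cost, time, performance:cost ratio) predicted for an n-bit adder."""
    cost_fn, time_fn = FORMULAS[adder]
    cost, time = cost_fn(n), time_fn(n)
    return cost, time, 1.0 / (cost * time)


for name in FORMULAS:
    cost, time, eta = evaluate(name, 32)
    print(f"{name:15s} C={cost:6.1f}  T={time:6.1f}  eta*1e5={eta * 1e5:6.1f}")
```

Ranking the adders by the last column for several widths gives a rough picture of how the trade-offs shift as n grows.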
Figure 3. The operational time graphs for different adders: (a) the time graphs of implementation data; (b) the time graphs
of empirical formulas. Curves: Ripple, Com, CLA, Skip, Select.
The fundamental CLA adder is theoretically the fastest adder. Because this adder has a high fan-in requirement, FPGA
implementations of such adders larger than 9 bits must be multilevel. This leads to a longer operational time
than is expected. The implementation results show that it has the highest cost and the worst performance:cost ratio. The
major reasons for this are its irregular structure and its inability to take advantage of the dedicated carry logic.
Consequently, the pure CLA algorithm is impractical for FPGA applications unless it is combined with some other
algorithms which would give reasonable performance.
Figure 2. The cost graphs for different adders: (a) the cost graphs of implementation data; (b) the cost graphs of
empirical formulas. Curves: Ripple, Com, CLA, Skip, Select.
Figure 4. The performance:cost ratio graphs for different adders: (a) the performance:cost ratio graphs of
implementation data; (b) the performance:cost ratio graphs of empirical formulas. Curves: Ripple, Com, CLA, Skip, Select.
The carry-skip adder is the next cheapest in cost and the next best in performance:cost ratio. However, the
operational time of this adder compares less favorably with that of the carry-ripple adder. This suggests that the carry-skip
adder is not the best choice of addition algorithm for implementation in FPGA's. At the writing of this paper, however, this
is not yet conclusive, because the configuration optimization of this adder has yet to be examined. When the adder has been
optimally designed, it could be a candidate for the best performing addition algorithm to be implemented in FPGA's.
From the tables and graphs above, the carry-select adder appears to be the most appropriate choice for FPGA
implementations. This adder has the best operational time when the adder width is larger than 56 bits, at a medium cost.
Although the cost does not appear to be very good, the algorithm does have the advantage of a regular structure and almost
the same performance:cost ratio as that of the carry-skip adder. Moreover, other algorithms can be easily applied in this
adder. When combined with other algorithms and after further examination of the partitioning of the adder, the performance
parameters for this adder could still be significantly improved.
5. CONCLUSIONS
In this work, we have implemented the five classical fixed-point addition algorithms in the widely used XC4000 devices.
An attempt has been made to model the performance of the adders with empirical formulas derived from the data obtained
from their implementations. The following conclusions can be drawn:
(1) For fast applications, the carry-skip adder and the carry-select adder appear to be the most appropriate solutions due
to their excellent performance:cost ratio and reasonable cost.
(2) For low cost applications, the carry-ripple adder is the most appropriate solution. Although its operational time is not
as good as that of some others for larger adders, it does have a very simple and regular structure and the highest
performance:cost ratio, which make it attractive for FPGA applications and especially for parallel applications.
(3) The CLA and carry-completion adders seem to be the worst performers because of their high cost and low
performance:cost ratio.
In general, those algorithms which have regular structures and can take advantage of the dedicated carry logic feature are
suitable for FPGA implementations.
6. LIMITATIONS OF THE PRESENT EVALUATION
In practice, a successful evaluation is affected by a large number of factors. In order to make a more accurate evaluation,
the following factors should be considered.
(1) The two-operand addition is the most fundamental operation in computers. Therefore, the adders can be easily
evaluated by isolating the hardware associated with two-operand adders from the rest of the computer. However, the adders
are indeed tied to the rest of the computer. The compatibility of an adder with other components of the computer, such as
the ALU, memory, control circuitry, etc., should be carefully considered. Consequently, more work has to be done to link
these results to the rest of a computing system.
(2) The results are obtained by implementing adders in a single chip. When all components of a computer are considered
and multiple chips are used, the I/O resources should be taken into account in the evaluation. Moreover, the present
evaluation is based on the XC4000 families; therefore, it is difficult to say that the results are equally valid for other
FPGA devices.
(3) The regularity of an adder should be taken into account in the evaluation because it is important to both the cost and
the design. However, it is very difficult to quantify the regularity in the evaluation.
REFERENCES
1. T.G.Rauscher and A.K.Agrawala, "Dynamic problem-oriented redefinition of computer architecture via
microprogramming," IEEE Transactions on Computers, Vol.C-27, No.11, Nov. 1978, pp.33-40.
2. P.M.Athanas and H.F.Silverman, "Processor reconfiguration through instruction-set metamorphosis," Computer, March
1993, pp.11-19.
3. J.Davidson, "FPGA implementation of a reconfigurable microprocessor," Proc. 1993 IEEE Custom Integrated Circuits
Conference, 1993, pp.3.2.1-3.2.4.
4. C.Iseli and E.Sanchez, "Beyond superscalar using FPGAs," Proc. 1993 IEEE International Conference on Computer
Design, 1993, pp.486-490.
5. N.W.Bergmann, J.C.Mudge, and L.R.Cirroco, "FPGA-based custom computers," Proc. 11th Australian Microelectronic
Conference. Microelectronics, Meeting the Needs of Modern Technology, 1993, pp.197-202.
6. N.W.Bergmann and J.C.Mudge, "An analysis of FPGA-based custom computers for DSP applications," Proc. 1994
IEEE International Conference on Acoustics, Speech & Signal Processing, 1994, pp.II-513-516.
7. J.Schewel, M.Thornburg, and S.Casselman, "Transformable computers & hardware object technology," Proc. 9th
International Parallel Processing Symposium, 1995, pp.518-522.
8. Proceedings of IEEE Workshop on FPGA's for Custom Computing Machines, California, April, 1993.
9. Proceedings of IEEE Workshop on FPGA's for Custom Computing Machines, California, April, 1994.
10. The Programmable Gate Array Data Book, Xilinx, San Jose, Calif., 1994.
11. Kai Hwang, Computer Arithmetic-Principles, Architecture, and Design, John Wiley & Sons, New York, 1979.
12. Joseph J.F. Cavanagh, Digital Computer Arithmetic-Design and Implementation, McGraw-Hill, New York, 1984.
13. Israel Koren, Computer Arithmetic Algorithms, Prentice Hall, New Jersey, 1993.
14. Amos R. Omondi, Computer Arithmetic Systems-Algorithms, Architecture and Implementations, Prentice Hall,
Hertfordshire, 1994.
15. D.Salomon, "A design for an efficient NOR-gate only, binary-ripple adder with carry-completion-detection logic," The
Computer Journal, Vol.30, No.3, 1987, pp.283-285.
16. R.W.Doran, "Variants of an improved carry-lookahead adder," IEEE Transactions on Computers, Vol.C-37, No.9,
Sept. 1988, pp.1110-1113.
17. B.W.Y.Wei and C.D.Thompson, "Area-time optimal adder design," IEEE Transactions on Computers, Vol.39, No.5,
May 1990, pp.666-675.
18. T.Lynch and E.E.Swartzlander, "A spanning tree carry-lookahead adder," IEEE Transactions on Computers, Vol.41,
No.8, Aug. 1992, pp.931-939.
19. B.S.Fagin, "Fast addition of large integers," IEEE Transactions on Computers, Vol.41, No.9, Sept. 1992, pp.1069-
1077.
20. A.Guyot, B.Hochet, and J.M.Muller, "A way to build efficient carry-skip adders," IEEE Transactions on Computers,
Vol.C-36, No.10, Oct. 1987, pp.1144-1152.
21. P.K.Chan and M.D.F.Schlag, "Analysis and design of CMOS Manchester adders with variable carry-skip," IEEE
Transactions on Computers, Vol.39, No.8, Aug. 1990, pp.983-992.
22. P.K.Chan, M.D.F.Schlag, C.D.Thomborson, and V.G.Oklobdzija, "Delay optimization of carry-skip adders and block
carry-lookahead adders using multidimensional dynamic programming," IEEE Transactions on Computers, Vol.41,
No.8, Aug. 1992, pp.920-930.
23. A.Tyagi, "A reduced-area scheme for carry-select adders," IEEE Transactions on Computers, Vol.42, No.10, Oct.
1993, pp.1163-1170.
