FPGA adders: performance evaluation and optimal design by Xing, S & Yu, WWH
Title FPGA adders: performance evaluation and optimal design
Author(s) Xing, S; Yu, WWH
Citation IEEE Design & Test of Computers, 1998, v. 15 n. 1, p. 24-29
Issued Date 1998
URL http://hdl.handle.net/10722/42983
Rights Creative Commons: Attribution 3.0 Hong Kong License
Knowing the cost and delay functions of fundamental building blocks enables designers to optimize costs and propagation delays of the larger units built from them. A fundamental building block of an arithmetic logic unit (ALU) is the binary adder. In this article, we examine the implementation of fixed-point adders on Xilinx 4000 series FPGA chips and the cost and delay functions of various addition algorithms. On the basis of this study, we propose optimization schemes for the design of FPGA carry-skip and carry-select adders.
Adder cost and performance
Although many writers have discussed VLSI fixed-point addition techniques [1-9], the gate-count and gate-delay unit models in their studies are not useful for evaluating costs and performance of FPGA adders. In our study, we obtain operational times from Xilinx timing-simulation software instead of from the gate-delay models used for fixed VLSI designs. Instead of gate counts, we measure cost as the number of configurable logic blocks (CLBs) used. The performance-to-cost ratio is cost divided by operational time. In making comparisons, we rank techniques with larger performance-cost ratios as having better performance.
Carry-ripple adder. Adders differ in the ways their carries propagate. The most basic is the carry-ripple adder. The Xilinx 4000 series’ dedicated carry logic, designed for sequential carry propagation, makes implementing n-bit carry-ripple adders easy. We have implemented carry-ripple adders ranging in length from 8 to 80 bits on different part types.
We have also implemented carry-complete and carry-look-ahead adders. By comparison with the ripple adder, their high costs, complexities, and high fan-in and fan-out requirements [3,4] make them unsuitable for implementation on FPGA devices.
The carry-ripple adder is a basic building block of other adders. The timing models we use in our optimization analyses of carry-skip and carry-select adders are functions of the carry-ripple adder’s worst-case operational time.
Timing models. We partition an n-stage adder into x blocks. Each block has n/x stages. We define the timing models of block k, where $1 \le k \le x$, as follows:
• Carry-ripple delay, $R(y_k)$. The total delay of a carry entering block k, rippling through subsequent stages, and emerging from the block is
$$R(y_k) = \lambda_1 + \delta y_k \tag{1}$$
where $y_k$ is the number of stages in block k, $\delta$ is the incremental delay of a single stage, and $\lambda_1$ is a constant.
FPGA ADDERS
24 0740-7475/98/$10.00 © 1998 IEEE IEEE DESIGN & TEST OF COMPUTERS
Delay models and cost analyses developed for ASIC technology are not useful in designing and implementing FPGA devices. The authors discuss costs and operational delays of fixed-point adders on Xilinx 4000 series devices and propose timing models and optimization schemes for carry-skip and carry-select adders.
SHANZHEN XING
WILLIAM W.H. YU
University of Hong Kong
JANUARY–MARCH 1998 25
• Carry-generate delay, $G(y_k)$. The total delay of a carry generated at the first stage of the block, rippling through subsequent stages, and emerging from the block is
$$G(y_k) = \lambda_2 + \delta (y_k - 1) \tag{2}$$
where $\lambda_2$ is a constant.
• Carry-terminate delay, $T(y_k)$. The total delay of a carry entering the block, rippling through subsequent stages, and terminating at the last stage is the same as the carry-generate delay: $T(y_k) = G(y_k)$.
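As a minimal sketch, the three timing models can be written as small Python functions. The numeric values are the ones the article quotes later for the XC4010PQ160-5 part; the function and constant names are illustrative and do not come from the original.

```python
# Parameter values quoted later in the article for the XC4010PQ160-5 (delays in ns).
LAMBDA_1 = 13.5  # constant term of the carry-ripple delay (Equation 1)
LAMBDA_2 = 12.5  # constant term of the carry-generate delay (Equation 2)
DELTA = 0.8      # incremental delay of a single stage


def ripple_delay(y: float) -> float:
    """R(y): a carry enters a y-stage block, ripples through it, and emerges."""
    return LAMBDA_1 + DELTA * y


def generate_delay(y: float) -> float:
    """G(y): a carry generated at the first stage emerges from the block."""
    return LAMBDA_2 + DELTA * (y - 1)


def terminate_delay(y: float) -> float:
    """T(y): a carry enters the block and terminates at the last stage; T(y) = G(y)."""
    return generate_delay(y)
```

For example, `ripple_delay(8)` evaluates Equation 1 for an 8-stage block, and `terminate_delay(8)` returns the same value as `generate_delay(8)` by definition.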
Carry-skip adders
Observing that a carry may skip addition stages for certain addend and augend bit values, researchers developed the carry-skip technique to speed up addition in the carry-ripple adder. One can construct a carry-skip adder by partitioning a carry-ripple adder into blocks of the same or various sizes and adding carry-skip logic to each block. Carry-skip logic determines when a carry entering the block may skip directly to the next block. Using a multilevel structure, carry-skip logic determines whether a carry entering one block may skip the next group of blocks. Because multilevel skip logic introduces longer delays, it may be of little value beyond three or four levels even with fixed VLSI technology. Implemented on Xilinx 4000 devices, a carry takes much longer to propagate through multilevel carry-skip logic than through efficient, dedicated carry logic. Therefore, here we examine only single-level FPGA carry-skip adders.
Implementation of nonoptimized adders. The operational time of carry-skip adders greatly depends on their configurations. We investigated the implementation of various configurations of each adder of a given length and selected those that gave the best performance data. Figure 1 shows performance parameters of nonoptimized carry-skip adders of sizes from 8 to 80 bits. Our results show that the nonoptimized carry-skip adder performs no better than the carry-ripple adder, with a small increase in cost. However, it was worth investigating whether the optimized carry-skip adder would perform better than the carry-ripple adder.
Optimization analysis. Many researchers have extensively studied optimization of carry-skip adders and have suggested many timing models for fixed VLSI technology [2,6-8]. As we have said, these models cannot be used for FPGA circuit analysis, which is based on CLB number and route delay. Therefore, we have developed the following formulation of a carry-skip timing model for optimization analysis.
Carry-skip delay. Factors such as carry-skip logic structure, carry-skip logic mapping, and CLB placement and routing at the implementation stage contribute to carry-skip delay. We assumed that the carry-skip logic structure is the dominant factor, and our implementation results later corroborated that assumption.
To use the CLB array structure effectively, we propose the general tree structure for a carry-skip logic block shown in Figure 2. Each rectangle represents a function generator. $C_{k-1}$ and $C_k$ are the carry-in and carry-out of block k, and $C_k^{y_k-1}$ is the carry-out produced by the block. The figure shows that the multilevel carry-skip logic has $2y_k + 2$ inputs. The first level has approximately $N_1 = (2y_k)/8$ CLBs, and level i has $N_i = N_{i-1}/8$ CLBs. Thus, the total number of CLBs needed to implement a $y_k$-bit carry-skip logic structure is $N = N_1 + N_2 + N_3 + \dots \cong \lceil 1 + (5/16)y_k \rceil$.
Figure 1. Performance parameters of nonoptimized FPGA adders in comparison with carry-ripple adders. (Three panels plot performance-cost ratio, operational time in ns, and cost in CLBs against adder length from 8 to 80 bits for the ripple, complete, look-ahead, skip, and select adders.)
Figure 2. Carry-skip logic structure. (Labeled signals: operands A and B, $C_{k-1}$, $C_k^{y_k-1}$, and $C_k$.)
Carry-skip delay includes constant look-up table delay, interconnect delay between function generators within CLBs, and interconnect delay between CLBs. Of these, interconnect delay between CLBs is dominant. Therefore, in calculating carry-skip delay, we consider only inter-CLB delay. In CMOS technologies, interconnect delay is linearly proportional to the square of the length of interconnect lines [10]. Therefore, the carry-skip delay expression is
$$S(y_k) = \lambda_3 + \beta l^2 \tag{3}$$
where $\lambda_3$ is a constant, $\beta$ the coefficient of linearity, and $l$ the effective length of the interconnect lines.
From Figure 2, we can assume $l$ is approximately proportional to the number of carry-skip logic layers, and we express it as
$$l = \gamma \log_4(1 + 3N) = \gamma \log_4[4 + (15/16)y_k] \cong \gamma \log_4(4 + y_k) \tag{4}$$
Substituting Equation 4 into Equation 3 gives
$$S(y_k) = \lambda_3 + \alpha \log_4^2(4 + y_k) \tag{5}$$
where $\alpha = \beta\gamma^2$ is a constant coefficient, and $\lambda_3$ is the delay of carry-in and carry-out logic. Equation 5 shows that the delay of carry-skip logic implemented on an FPGA device is neither constant nor linear in the block size, contrary to the models reported by other researchers [2,6,7].
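The CLB estimate and Equation 5 can be sketched in Python as follows. The constant 11 for the carry-in and carry-out logic delay is the value the article quotes later; using the quoted coefficient 1.3 directly as the coefficient of Equation 5 is an assumption of this sketch, as are all function names.

```python
import math

LAMBDA_3 = 11.0  # delay of carry-in and carry-out logic (ns), quoted later in the article
ALPHA = 1.3      # ASSUMPTION: the article's quoted coefficient 1.3 is used as alpha


def skip_logic_clbs(y: int) -> int:
    """Approximate CLB count of a y-bit carry-skip logic block: ceil(1 + (5/16) y)."""
    return math.ceil(1 + 5 * y / 16)


def skip_delay(y: float) -> float:
    """S(y) = lambda_3 + alpha * (log4(4 + y))**2, as in Equation 5."""
    return LAMBDA_3 + ALPHA * math.log(4 + y, 4) ** 2
```

Because the block size enters only through a squared base-4 logarithm, doubling the block size adds far less skip delay than it adds ripple delay under Equation 1.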
Configuration optimization. An n-bit carry-skip adder partitioned into x blocks has a configuration $Y = \{y_1, y_2, \dots, y_{x-1}, y_x\}$, and $n = \sum_{k=1}^{x} y_k$. The optimization problem is to determine a configuration that gives the adder the minimum worst-case carry propagation (operational) time. Figure 3 shows the worst-case carry propagation path, along which the carry generated at the adder’s first stage takes the longest time to reach the final stage. The carry generated at the first stage ripples through the first block, skips the subsequent blocks to the last block, and ripples through to the last stage. This worst-case propagation delay occurs when the operand pair is 010101…101 and 001010…011, and $C_{in} = 0$. The worst-case propagation time is the sum of the carry-generate delay of the first block, the skip-logic delays of the subsequent (x − 2) blocks, and the carry-terminate delay of the final block:
$$D_w = G(y_1) + \sum_{k=2}^{x-1} S(y_k) + T(y_x) \tag{6}$$
Equation 6 implies the following criteria:
1. $x > 2$. Otherwise, the carry-skip adder has no advantage over the carry-ripple adder because there are no carry-skip operations.
2. $R(y_k) \ge S(y_k)$. The carry-ripple delay of any block is greater than or equal to the carry-skip delay.
3. $R(y_i) \le \sum_{k=i}^{x-1} S(y_k) + T(y_x)$. The carry-ripple delay of any block is less than or equal to the delay of the carry to skip the block and subsequent blocks and ripple through the last block, terminating at the last stage.
Minimizing Equation 6 is equivalent to minimizing the sizes of the first and last blocks and the number of blocks. There are two simple steps to obtaining the optimal configuration. The first is to use criterion 2 to determine the last block’s minimum size. The second is to use criterion 3 to determine the length of the other blocks recursively, starting from the second-to-last block until all n bits have been assigned.
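The worst-case delay of Equation 6 and the three criteria can be evaluated for any candidate configuration with a short Python sketch. It does not implement the recursive sizing procedure itself. The parameter values are those the article quotes for the XC4010PQ160-5; using 1.3 as the coefficient of Equation 5 and applying criterion 3 to blocks 1 through x − 1 only are assumptions of this sketch, and all names are illustrative.

```python
import math

# Values quoted in the article for the XC4010PQ160-5 (delays in ns).
LAMBDA_1, LAMBDA_2, LAMBDA_3 = 13.5, 12.5, 11.0
DELTA = 0.8
ALPHA = 1.3  # ASSUMPTION: quoted coefficient 1.3 used as alpha in Equation 5


def R(y):
    """Carry-ripple delay, Equation 1."""
    return LAMBDA_1 + DELTA * y


def G(y):
    """Carry-generate delay, Equation 2 (also the carry-terminate delay T)."""
    return LAMBDA_2 + DELTA * (y - 1)


def S(y):
    """Carry-skip delay, Equation 5."""
    return LAMBDA_3 + ALPHA * math.log(4 + y, 4) ** 2


def worst_case_delay(config):
    """Equation 6: G(y_1) + sum of S(y_k) for k = 2..x-1 + T(y_x)."""
    if len(config) <= 2:
        raise ValueError("a carry-skip adder needs more than two blocks")
    first, *middle, last = config
    return G(first) + sum(S(y) for y in middle) + G(last)


def check_criteria(config):
    """Return (criterion 1, criterion 2, criterion 3) as booleans."""
    x = len(config)
    crit1 = x > 2
    crit2 = all(R(y) >= S(y) for y in config)
    t_last = G(config[-1])
    # ASSUMPTION: criterion 3 is checked for blocks 1..x-1 (0-based i = 0..x-2).
    crit3 = all(
        R(config[i]) <= sum(S(y) for y in config[i:-1]) + t_last
        for i in range(x - 1)
    )
    return crit1, crit2, crit3
```

For instance, `worst_case_delay([4, 12, 12, 4])` evaluates a 32-bit configuration. That length is below the break-even size reported in the article, so the example illustrates the mechanics only, not a configuration that beats the carry-ripple adder.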
Comparisons. Here we compare analytical and implemented performance improvements of the carry-skip adder using the proposed optimization scheme. Constants and coefficients in Equations 1, 2, and 5 differ slightly among different types of parts. We base our results on the Xilinx XC4010PQ160-5 chip.
Analytical comparisons. We derive the timing model parameters from the first-order approximation of implemented operational times shown in Figure 1. The parameters of Equations 1, 2, and 5 are $\lambda_1 = 13.5$, $\lambda_2 = 12.5$, $\lambda_3 = 11$, $\delta = 0.8$, and $\beta = 1.3$. Analytically, if the proposed adder is smaller than 54 bits, it will have no speed advantage over the carry-ripple adder. We obtained optimized configurations of adders from 56 to 112 bits. The theoretical adders are 1% to 24% faster than carry-ripple adders.
Implementation comparisons. Figure 4 shows implementation results for optimized FPGA adders. The results show an improvement of 0.6% to 16% for optimized adders of sizes from 64 to 112 bits. The implementation results show slightly less improvement than the analytical results but accurately reflect theoretical predictions. The 56-bit adder has no speed advantage over the carry-ripple adder. Compared with the nonoptimized adders in Figure 1, the proposed adders have similar costs but shorter operational times.
Figure 3. Worst-case carry propagation path of carry-skip adders. (Blocks are labeled 1, 2, …, x−1, x.)
Carry-select adders
Each block of a carry-select adder generates two sets of sums, one for a carry-in of 0 and the other for a carry-in of 1. The carry-select logic selects the appropriate set of sums upon arrival of the carry bit.
Implementation of nonoptimized adders. Various block schemes and addition techniques can implement a carry-select adder. We investigated different combinations of block schemes and addition techniques before undertaking optimization analyses. Again see Figure 1 for costs, operational times, and performance-cost ratios of the best-performing set of nonoptimized adders. Propagation delays of nonoptimized carry-select adders are comparable to those of carry-ripple adders. Next we investigated whether optimized carry-select adders would have better operational times.
Optimization analysis. Our optimization studies of carry-skip adders led us to expect that configuration optimization would result in speed improvements. We propose three carry-select adder configurations and optimization schemes. The three configurations are the select-ripple-ripple (S-R-R), select-skip-ripple (S-S-R), and select-skip-skip (S-S-S) adders. The optimization schemes proposed here use the timing models given earlier for carry-ripple and carry-skip adders.
S-R-R adder. In each S-R-R adder block, two carry-ripple chains produce conditional sums and carry-outs. Each block’s carry-select logic selects the appropriate sum and carry-out. Carry-select operation delay is a constant $\mu$ independent of block size. The problem of optimization is to determine the adder configuration $Y = \{y_1, y_2, \dots, y_{x-1}, y_x\}$ for the minimum worst-case propagation time. The criterion for determining block sizes is that the carry-select signal must synchronize with the conditional sums and carry-outs. This criterion is
$$R(y_k) = (k - 2)\mu + R(y_1) \quad (\text{for } k \ge 2) \tag{7}$$
Substituting Equation 1 into Equation 7 with $n = y_1 + \sum_{k=2}^{x} y_k$ gives
$$y_1 = \frac{1}{x}\left[n - \frac{\mu}{\delta}\right] - \frac{\mu}{2\delta}x + \frac{3\mu}{2\delta} \tag{8}$$
The worst-case propagation time is the ripple delay of block 1 plus the propagation time of the carry-select signal:
$$T = (x - 1)\mu + R(y_1) = \frac{1}{x}(\delta n - \mu) + \frac{\mu}{2}x + \left[\lambda_1 + \frac{\mu}{2}\right]$$
Equating the derivative dT/dx to 0 gives $x = [2(\delta n - \mu)/\mu]^{1/2}$, and the nearest integer to this value is the near-optimal number of adder blocks. For this solution to have $x \ge 2$, the adder length must satisfy $n \ge 3\mu/\delta$.
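The closed-form S-R-R results can be checked numerically with a brief Python sketch. The constants are the values the article quotes for the XC4010PQ160-5, including the carry-select delay of 12 ns. Block sizes are left real-valued, because rounding them to whole stages is not specified in the text; the function names are illustrative.

```python
import math

# Values quoted in the article for the XC4010PQ160-5 (delays in ns).
LAMBDA_1 = 13.5  # constant term of the carry-ripple delay
DELTA = 0.8      # incremental delay of a single stage
MU = 12.0        # carry-select logic delay


def srr_first_block(n, x):
    """Equation 8: size of block 1 (real-valued) for an n-bit adder with x blocks."""
    return (n - MU / DELTA) / x - (MU / (2 * DELTA)) * x + 3 * MU / (2 * DELTA)


def srr_block_sizes(n, x):
    """Equations 1 and 7 together give y_k = y_1 + (k - 2) * mu / delta for k >= 2."""
    y1 = srr_first_block(n, x)
    return [y1] + [y1 + (k - 2) * MU / DELTA for k in range(2, x + 1)]


def srr_delay(n, x):
    """Worst-case time T = (x - 1) * mu + R(y_1)."""
    return (x - 1) * MU + LAMBDA_1 + DELTA * srr_first_block(n, x)


def srr_near_optimal_blocks(n):
    """Nearest integer to sqrt(2 * (delta * n - mu) / mu), valid for n >= 3 * mu / delta."""
    if n < 3 * MU / DELTA:
        raise ValueError("adder too short for an S-R-R solution with x >= 2")
    return round(math.sqrt(2 * (DELTA * n - MU) / MU))
```

With these constants, `srr_near_optimal_blocks(64)` selects three blocks, and `srr_block_sizes(64, 3)` returns sizes that sum to 64 bits.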
S-S-R adder. We construct an S-S-R adder by adding carry-skip logic to each S-R-R block. Carry-select logic selects the conditional sums and carry-outs generated by the ripple chains within each block. Carry-skip logic determines when the carry entering the block may skip directly to the next block. The worst-case carry propagation path is the same as that shown in Figure 3. To benefit from the carry-skip technique and synchronize the arrivals of block carry-ins and conditional sums, we must meet the following criteria:
1. $x > 2$. This criterion is the same as for the carry-skip adder.
2. $S(y_k) \le R(y_k) + \mu$. Skip delay must be less than or equal to the total of ripple delay and carry-select delay.
3. $T(y_x) \le \sum_{k=2}^{x-1} S(y_k) + G(y_1)$. The last block’s conditional sum generation time must synchronize with arrival of the carry-in.
4. $R(y_k) \le T(y_x)$. Any block’s ripple delay must be less than or equal to the last block’s conditional sum generation time.
Figure 4. Performance parameters of optimized FPGA adders in comparison with carry-ripple adders. (Three panels plot cost in CLBs, operational time in ns, and performance-cost ratio against adder length from 48 to 112 bits for the ripple, skip, and select adders.)
Substituting $R(y_k)$ and $T(y_x)$ into criterion 4 gives $y_k \le y_x - \theta$, where $\theta = 1 + [(\lambda_1 - \lambda_2)/\delta]$. This means that the last block is the largest. The worst-case propagation time is
$$T = \sum_{k=2}^{x-1} S(y_k) + G(y_1) + \mu \tag{9}$$
Criterion 2 determines the first block size. Setting $y_k = y_x - \theta$ for k = 2, 3, …, x − 1, Equation 9 becomes
$$T = (x - 2) S(y_k) + G(y_1) + \mu \tag{10}$$
where $y_k = (n - y_1 - \theta)/(x - 1)$. Minimizing Equation 10 gives the optimal number of blocks and their sizes. (We use the quasi-Newton search method to find the minima.)
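The minimization of Equation 10 can be sketched with a plain exhaustive search in place of the quasi-Newton method used in the article. In this Python sketch the offset between the last block and the middle blocks is computed directly from criterion 4 with Equations 1 and 2, which give $y_k \le y_x - 1 - (\lambda_1 - \lambda_2)/\delta$. Other assumptions: 1.3 is used as the coefficient of Equation 5, only integer sizes are tried for the first block, the remaining block sizes stay real-valued, and criteria 2 and 4 are applied to every block. All names are illustrative.

```python
import math

# Values quoted in the article for the XC4010PQ160-5 (delays in ns).
LAMBDA_1, LAMBDA_2, LAMBDA_3 = 13.5, 12.5, 11.0
DELTA, MU = 0.8, 12.0
ALPHA = 1.3  # ASSUMPTION: quoted coefficient 1.3 used as alpha in Equation 5
# Criterion 4 with Equations 1 and 2: y_k <= y_x - 1 - (lambda_1 - lambda_2) / delta
THETA = 1 + (LAMBDA_1 - LAMBDA_2) / DELTA
EPS = 1e-9  # tolerance for floating-point comparisons


def R(y):
    return LAMBDA_1 + DELTA * y


def G(y):
    return LAMBDA_2 + DELTA * (y - 1)


def S(y):
    return LAMBDA_3 + ALPHA * math.log(4 + y, 4) ** 2


def ssr_blocks(n, x, y1):
    """Return (y_1, y_k, y_x) with y_k = (n - y_1 - theta) / (x - 1), y_x = y_k + theta."""
    yk = (n - y1 - THETA) / (x - 1)
    return y1, yk, yk + THETA


def ssr_delay(n, x, y1):
    """Equation 10: T = (x - 2) * S(y_k) + G(y_1) + mu."""
    _, yk, _ = ssr_blocks(n, x, y1)
    return (x - 2) * S(yk) + G(y1) + MU


def ssr_feasible(n, x, y1):
    """Check criteria 1 to 4 for the configuration defined by n, x, and y_1."""
    if x <= 2:
        return False
    _, yk, yx = ssr_blocks(n, x, y1)
    if yk < 1:
        return False
    crit2 = all(S(y) <= R(y) + MU + EPS for y in (y1, yk, yx))
    crit3 = G(yx) <= (x - 2) * S(yk) + G(y1) + EPS
    crit4 = all(R(y) <= G(yx) + EPS for y in (y1, yk))
    return crit2 and crit3 and crit4


def ssr_search(n):
    """Exhaustive search over x and integer y_1; returns (T, x, y_1) or None."""
    best = None
    for x in range(3, n + 1):
        for y1 in range(1, n):
            if ssr_feasible(n, x, y1):
                t = ssr_delay(n, x, y1)
                if best is None or t < best[0]:
                    best = (t, x, y1)
    return best
```

Calling `ssr_search(64)` returns the smallest modeled delay found together with the block count and first-block size. The result depends on the assumptions listed above and is not claimed to reproduce the article's figures.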
S-S-S adder. The S-S-S adder block configuration is the same as that of the S-S-R adder. We use an additional carry-skip network to examine the adder carry-in and block carry-outs and to determine all block carry-ins. The critical path is the same for both adders. The criteria for the S-S-S adder are the same as those for the S-S-R adder except for criterion 3. To synchronize the conditional sums generated by the last block and the carry-in produced by the carry-skip network, criterion 3 for the S-S-S adder becomes
$$T(y_x) \le S(n - y_1 - y_x) + G(y_1) \tag{13}$$
and the worst-case carry propagation time is
$$T = S(n - y_1 - y_x) + G(y_1) + \mu \tag{14}$$
The optimization process is the same as for the S-S-R adder.
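Equations 13 and 14 differ from the S-S-R case only in that a single skip network spans all the middle bits. A minimal Python sketch of the two expressions follows, using the article's quoted constants and, as an assumption, 1.3 as the coefficient of Equation 5; the function names are illustrative.

```python
import math

# Values quoted in the article for the XC4010PQ160-5 (delays in ns).
LAMBDA_2, LAMBDA_3 = 12.5, 11.0
DELTA, MU = 0.8, 12.0
ALPHA = 1.3  # ASSUMPTION: quoted coefficient 1.3 used as alpha in Equation 5


def G(y):
    """Carry-generate delay, Equation 2 (also the carry-terminate delay T)."""
    return LAMBDA_2 + DELTA * (y - 1)


def S(y):
    """Carry-skip delay, Equation 5."""
    return LAMBDA_3 + ALPHA * math.log(4 + y, 4) ** 2


def sss_criterion_3(n, y1, yx):
    """Equation 13: T(y_x) <= S(n - y_1 - y_x) + G(y_1)."""
    return G(yx) <= S(n - y1 - yx) + G(y1)


def sss_delay(n, y1, yx):
    """Equation 14: T = S(n - y_1 - y_x) + G(y_1) + mu."""
    return S(n - y1 - yx) + G(y1) + MU
```

For an 80-bit adder with an 8-bit first block and a 12-bit last block, `sss_delay(80, 8, 12)` evaluates the skip network over the remaining 60 bits.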
Comparisons. Now we summarize and compare theoretical and implementation results for carry-select adders. For theoretical comparisons, parameters for Equations 1, 2, and 5 are the same as those given for carry-skip adders: $\lambda_1 = 13.5$, $\lambda_2 = 12.5$, $\lambda_3 = 11$, $\delta = 0.8$, and $\beta = 1.3$. The carry-select logic delay $\mu$ is 12 ns for the XC4010PQ160-5 device.
Analytical comparisons. Optimization analyses show that S-R-R and S-S-R adders smaller than 48 bits, as well as S-S-S adders smaller than 56 bits, have no speed advantage over carry-ripple adders. Optimized theoretical S-R-R, S-S-R, and S-S-S adders are 13% to 39%, 15% to 36%, and 7% to 43% faster than carry-ripple adders, respectively.
Implementation comparisons. Operation speeds of the three optimized adders larger than 48 bits are very similar. The S-R-R adder is the most economical to implement, at a cost about 50% less than the other two adders. The speed improvement of the S-R-R adder over the carry-ripple adder is 7% to 36%.
Our study reveals that performance parameters of a specific addition technique implemented in different FPGA part types differ slightly. In general, adders implemented on lower-density parts have slightly shorter operational times than adders on higher-density parts, but their costs are almost the same.
The results in Figures 1 and 4 show that the carry-ripple adder has the lowest cost and highest performance-cost ratio because of its highly regular structure and its effective use of the CLB’s dedicated carry logic. Therefore, it is preferable where simplicity and cost are critical factors. The carry-complete and carry-look-ahead adders are the least suitable for implementation on FPGA devices due to their high costs, irregular structures, and inability to use the dedicated carry logic.
The optimized carry-skip adder is second lowest in cost and second best in performance-cost ratio. However, the operational time of an optimized carry-skip adder smaller than 56 bits compares less favorably to that of the carry-ripple adder. Thus, the carry-skip adder is not the best choice for designs using smaller adder units.
The optimized S-R-R adder has the lowest cost of the three carry-select adders and hence the best performance-cost ratio. For implementation of adders larger than 48 bits, the optimized S-R-R adder is the most appropriate choice. When it is longer than 48 bits, it has the best operational time at a reasonable cost increase over carry-ripple adders. Although it is not cheaper to implement than the carry-skip adder, the technique does have the advantages of regular structures and almost the same performance-cost ratio as the carry-skip adder.
Our results also show that the timing models proposed here are valid and the optimization schemes are effective. This article can serve as a useful reference for designing FPGA adders. Designers can easily extend these schemes to FPGA devices other than the Xilinx 4000s, provided the devices have similar dedicated carry logic and structure.
References
1. A.R. Omondi, Computer Arithmetic Systems: Algorithms, Architecture and Implementations, Prentice-Hall, Hertfordshire, UK, 1994.
2. S. Majerski, “On Determination of Optimal Distributions of Carry Skip in Adders,” IEEE Trans. Electronic Computers, Vol. EC-16, No. 1, 1967, pp. 45-58.
3. D. Salomon, “A Design for an Efficient NOR-Gate-Only, Binary-Ripple Adder with Carry-Completion-Detection Logic,” Computer J., Vol. 30, No. 3, 1987, pp. 283-285.
4. R.W. Doran, “Variants of an Improved Carry-Lookahead Adder,” IEEE Trans. Computers, Vol. C-37, No. 9, Sept. 1988, pp. 1110-1113.
5. B.W.Y. Wei and C.D. Thompson, “Area-Time Optimal Adder Design,” IEEE Trans. Computers, Vol. 39, No. 5, May 1990, pp. 666-675.
6. A. Guyot, B. Hochet, and J.M. Muller, “A Way to Build Efficient Carry-Skip Adders,” IEEE Trans. Computers, Vol. C-36, No. 10, Oct. 1987, pp. 1144-1152.
7. P.K. Chan and M.D.F. Schlag, “Analysis and Design of CMOS Manchester Adders with Variable Carry-Skip,” IEEE Trans. Computers, Vol. 39, No. 8, Aug. 1990, pp. 983-992.
8. P.K. Chan et al., “Delay Optimization of Carry-Skip Adders and Block Carry-Lookahead Adders Using Multidimensional Dynamic Programming,” IEEE Trans. Computers, Vol. 41, No. 8, Aug. 1992, pp. 920-930.
9. A. Tyagi, “A Reduced-Area Scheme for Carry-Select Adders,” IEEE Trans. Computers, Vol. 42, No. 10, Oct. 1993, pp. 1163-1170.
10. N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, A Systems Perspective, Addison-Wesley, Reading, Mass., 1985.
Shanzhen Xing is working toward his PhD degree in the Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong. His research involves computer arithmetic and reconfigurable computing systems.
William W.H. Yu is a lecturer in the Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong. Previously, he was an associate professor in the Department of Electronic Engineering, Chung Yuan University, Taiwan, where he taught for 12 years. His research interests include artificial neural networks and reconfigurable computing systems. Yu is a member of the IEEE.
Send questions or comments about this article to William W.H. Yu, Dept. of Industrial and Manufacturing Systems Engineering, University of Hong Kong, Pokfulam Road, Hong Kong; wwhyu@hkucc.hku.hk.
