An Accurate and Efficient Method to Calculate the Error Statistics of
  Block-based Approximate Adders by Wu, Yi et al.
An Accurate and Efficient Method to Calculate
the Error Statistics of Block-based
Approximate Adders
Yi Wu†, You Li†, Xiangxuan Ge†, and Weikang Qian, Member, IEEE
Abstract—Adders are key building blocks of many error-tolerant applications. Leveraging the application-level error tolerance, a
number of approximate adders were proposed recently. Many of them belong to the category of block-based approximate adders. For
approximate circuits, besides normal metrics such as area and delay, another important metric is the error measurement. Given the
popularity of block-based approximate adders, in this work, we propose an accurate and efficient method to obtain the error statistics of
these adders. We first show how to calculate the error rates. Then, we demonstrate an approach to get the exact error distribution,
which can be used to calculate other error characteristics, such as mean error distance and mean square error.
Index Terms—Approximate computing, Approximate adders, Error rate, Error distribution
1 INTRODUCTION
APPROXIMATE circuits implement an approximate version of the target function. They are very attractive
for error-tolerant applications, such as image processing,
multimedia, and machine learning, since they can trade off
accuracy for improvement in circuit area, delay, and power
consumption [1].
Given the importance of adders in building many error-
tolerant applications, approximate adders have attracted a
lot of research effort recently. A number of approximate
adders were proposed in literature [2], [3], [4], [5], [6], [7],
[8], [9]. Generally speaking, there are two design types. The
first type replaces the 1-bit full adders at less significant bit
positions by a simpler but inaccurate module. For example,
an OR gate is used in the Low-Part-OR adder [2] and an
approximate mirror adder is used in [3] to substitute the
accurate 1-bit full adder at the lower bit positions. The more
significant part is intact. As a result, the reduction in delay
and power consumption is limited. Furthermore, this kind
of design can have a high error rate.
The second design type is known as block-based approxi-
mate adder [10]. It divides the entire adder into a number of
blocks. The calculation of the sum in each block exploits the
carry speculation mechanism. It is based on the observation
that long carry chains rarely occur in the addition of random
inputs. Therefore, the carry chain for calculating each
sum bit can be truncated at a middle bit position. Although
the carry-in signal for calculating each sum bit could be
wrong, the critical path delay and the power consumption
are reduced. The majority of the available approximate
adders fall into this category. Examples include the Almost
Correct Adder [4], the Error Tolerant Adder Type II [5],
and the Carry-Skip Approximate Adder [6]. (More details
of these approximate adders will be discussed in Section 2).
This type of adder generally has a low error rate. A previous
work [10] also showed that it can achieve minimum mean
error distance under some conditions. Given its popular-
ity and optimality, we focus on block-based approximate
adders in our work.
• Yi Wu, You Li, Xiangxuan Ge, and Weikang Qian are with the University
of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao
Tong University, Shanghai, China, 200240.
• E-mail: {eejessie, you.li, gxx, qianwk}@sjtu.edu.cn
• †These authors contributed equally.
To measure the performance of an approximate adder,
besides the normal metrics such as area, delay, and power
consumption, we also need error statistics, including error
rate, mean error distance, and mean square error. Although
the error statistics of each proposed approximate adder were
analyzed by its authors, the method is ad hoc, depending on
the structure of the proposed adder. Furthermore, not every
error metric is given. For example, for some approximate
adders, only error rates were studied, but neither mean error
distance nor mean square error was reported.
To address the above problems, three recent works [10],
[11], [12] proposed general methods for obtaining important
error metrics of block-based approximate adders. [11] pro-
posed an analytical framework to evaluate the error statis-
tics of three types of adders: the Almost Correct Adder, the
Equal Segmentation Adder, and the Error Tolerant Adder
Type II. However, its results are just estimates and different
approaches are applied to evaluate different types of adders.
A more general framework was proposed in [10], which
can be applied to a wider range of approximate adders.
It gives an accurate analysis on the mean error distance,
but the error rate and the mean square error are still
estimates, not exact results. In some cases, the estimates
could be more than 7% away from the accurate values.
Also, neither [10] nor [11] showed how to obtain the exact
error distributions. A recent work [12] provided an accurate
method to obtain the exact error distributions. However, its
aim is to provide an approach to analyze a more general
type of approximate adder, which also includes block-based
approximate adder. As a result of making the approach
more general, the method sacrifices efficiency and hence
needs a long runtime in analyzing adders of large sizes.
Given that block-based approximate adders are among
the most common designs and have good performance,
in this work, we specifically target this type of adder
and propose an efficient method to obtain their exact error
distributions.
Our method works under the assumption that the inputs
to the approximate adder are uniformly distributed. We
make this assumption because:
1) Many approximate adders are not just designed for
a specific application. To estimate the overall performance
of an approximate adder over a range of applications,
it is reasonable to assume the inputs are
uniformly distributed.
2) For many specific applications, the inputs are more or
less close to uniform distribution.
3) A number of previous works ([2], [5], [6], [7], [10], [11],
[13], [14]) on analyzing the error statistics of approximate
adders also make this assumption.
arXiv:1703.03522v1 [cs.ET] 10 Mar 2017
Under the assumption of uniform distribution, we first
show an accurate and efficient method to calculate the
error rate. Using this technique, we further demonstrate
an approach to calculate the exact error distribution, by
which we can easily obtain other error metrics of interest,
such as mean error distance and mean square error. Com-
pared to the previous analytical approaches [10], [11], our
method is able to generate the exact error distributions and
gives exact error characteristics. Compared to the previous
work [12], our method is much faster because it exploits
the specific properties of the error patterns of block-based
approximate adders. Indeed, our method achieves the
theoretical lower bound on the asymptotic runtime. The
proposed method provides an important aid to designers
in choosing a proper approximate adder.
In summary, the main contributions of our work are as
follows:
1) We propose an accurate and efficient method to obtain
the error rate of the block-based approximate adders.
2) We propose an efficient method to obtain the exact error
distribution of the block-based approximate adders,
which can be used to get other error characteristics
accurately. As we will show, the asymptotic runtime
of our method reaches the theoretical lower bound.
3) We apply our method to obtain the error distributions
of several previously proposed approximate adders.
The results demonstrate the existence of special pat-
terns on the error distributions of these approximate
adders. We explain these special patterns.
4) We demonstrate experimentally that the proposed method
is much faster and more accurate than the Monte Carlo
sampling method to obtain error statistics, especially
when the error probability is very small.
The remainder of the paper is organized as follows.
Section 2 introduces the general model of the block-based
approximate adder and links it to some previously proposed
approximate adders. Section 3 discusses some preliminaries.
Section 4 and Section 5 show our method to calculate error
rate and error distribution, respectively. Section 6 presents
the experimental results. Finally, Section 7 concludes the
paper.
2 BLOCK-BASED APPROXIMATE ADDERS
In this section, we first review some previously proposed
approximate adders. Then we demonstrate that all of
them can be viewed as block-based approximate adders.
The adder proposed in [15] and the Almost Correct
Adder (ACA) [4] have the structure shown in Fig. 1. Each sum
bit is produced by a full adder, which takes a carry-in as
input. However, the carry-in is obtained from only the l bits
immediately below the sum bit instead of all the lower bits. The
Error Tolerant Adder Type II (ETA-II), as shown in Fig. 2,
divides the entire n-bit adder into m sub-adders of equal bit
length of k = n/m [5]. The carry-in signal to each sub-adder
is produced from the previous k bits by a carry generator,
while the carry-in to each carry generator is a logic 0,
which essentially truncates the carry chain.
Fig. 1: Almost Correct Adder proposed in [4] (drawn for k = 1, l = 3; CG: Carry Generator, FA: Full Adder).
Fig. 2: Error Tolerant Adder Type II proposed in [5] (l = k; CG: Carry Generator, SA: Sub-Adder).
Other block-based approximate adders include Speculative Carry Select
Adder (SCSA) [13], Error Tolerant Adder Type IV (ETA-IV) [16],
and Carry-Skip Approximate Adder (CSAA) [6]. At
the behavior level, given the same sub-adder length, SCSA
is the same as ETA-II. ETA-IV is similar to ETA-II except that
the length of its sub-adder is twice that of its carry generator.
CSAA is also similar to ETA-II except that the length of its
carry generator is twice that of its sub-adder.
All of the above-mentioned approximate adders can
be viewed as block-based approximate adders [10], with a
general model shown in Fig. 3. Notice that a similar model
was also proposed in [17]. In the model, the number of
bits is $n$. Assume the two addends of an $n$-bit adder are
$A = a_{n-1} \ldots a_0$ and $B = b_{n-1} \ldots b_0$. The approximate
sum is denoted as $S^* = s^*_{n-1} \ldots s^*_0$. The carry-out of the
approximate adder is denoted as $c^*_o$. In the model, the sum
is divided into a number of blocks which are calculated
separately. All the blocks have the same bit length
$k$, where $k$ is a divisor of $n$. Let $m = n/k$, which represents
the number of blocks. In the model, the sum bits in the $i$-th
($0 \le i \le m-1$) block, $s^*_{(i+1)k-1} \ldots s^*_{ik}$, are generated by a
sub-adder, which takes a speculated carry-in $c^*_i$. In the ideal
case, the carry-in should be produced by all the input bits
lower than the position $ik$. However, for the block-based
approximate adder, $c^*_i$ is produced by a truncated carry
generator of length $l$, as shown in Fig. 3. The carry generator
also takes a speculated carry-in $c^*_{carry,i}$. For most of the
approximate adders, $c^*_{carry,i} = 0$. Thus, in the following
analysis, we will assume that $c^*_{carry,i} = 0$, although our
analysis is equally applicable to the case where $c^*_{carry,i} = 1$.
Note that for all $0 \le i \le \lfloor l/k \rfloor$, the speculated carry-in
$c^*_i$ is produced by all the lower input bits and hence,
it is always correct. The carry-out $c^*_o$ of the entire adder is
produced by the leftmost sub-adder.
In summary, a block-based approximate adder is char-
acterized by 3 parameters, n, k, and l, where n is the
adder size, k is the block size, and l is the number of bits
used in the carry generator. All of the above-mentioned
approximate adders are just special cases of this model.
Fig. 3: General model of a block-based approximate adder (CG: Carry Generator, SA: Sub-Adder).
For example, the adder proposed in [15] and ACA correspond
to the case where k = 1. ETA-II and SCSA correspond to
the case where k = l. ETA-IV corresponds to the case where
k = 2l. CSAA corresponds to the case where l = 2k.
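The three-parameter model can be made concrete with a small behavioral simulation. The following is a minimal sketch, not the authors' implementation: the function names `approx_add` and `brute_force_error_rate` are hypothetical, the speculated carry-in to every carry generator is assumed to be 0 as in the text, and the brute-force enumeration is only practical for small n.

```python
from fractions import Fraction


def approx_add(a, b, n, k, l):
    """Behavioral model of an (n, k, l) block-based approximate adder.

    Returns the approximate sum including the carry-out of the leftmost
    sub-adder. Assumes k divides n and every carry generator has carry-in 0.
    """
    m = n // k
    block_mask = (1 << k) - 1
    result = 0
    for i in range(m):
        # Truncated carry generator: bits ik-1 down to ik-l, clipped at bit 0.
        low = max(0, i * k - l)
        width = i * k - low
        seg_mask = (1 << width) - 1
        carry_in = (((a >> low) & seg_mask) + ((b >> low) & seg_mask)) >> width
        block_sum = ((a >> (i * k)) & block_mask) + ((b >> (i * k)) & block_mask) + carry_in
        if i < m - 1:
            # Only the leftmost sub-adder contributes a carry-out to the result.
            block_sum &= block_mask
        result |= block_sum << (i * k)
    return result


def brute_force_error_rate(n, k, l):
    """Exact error rate by enumerating all 2^(2n) uniformly distributed input pairs."""
    wrong = sum(
        approx_add(a, b, n, k, l) != a + b
        for a in range(1 << n)
        for b in range(1 << n)
    )
    return Fraction(wrong, 1 << (2 * n))
```

For instance, with n = 6 and k = l = 2, the inputs 7 and 9 make block 1 propagate while block 0 generates a carry, so `approx_add(7, 9, 6, 2, 2)` returns 0 although the exact sum is 16.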
Generally speaking, the block-based approximate adder can
be extended to one with different sub-adder lengths and
carry generator lengths for different blocks, which is the
model considered in [12]. However, many existing approximate
adders have the same sub-adder length and the same
carry-generator length for all the blocks, since for this kind
of design, no particular block dominates the critical path
length. Given this fact, in this work, we target block-based
approximate adders with uniform sub-adder and carry generator
lengths. For these adders, as we will
show in Section 5.3, our method to obtain the exact error
distribution has the lowest asymptotic runtime. However,
it should be pointed out that our proposed method can be
easily extended to handle the more general situation, at the
cost of increasing the asymptotic runtime.
3 PRELIMINARIES
In this section, we show some preliminaries that will be
used in our later analysis. We assume the inputs A and B
are uniformly distributed in $[0, 2^n - 1]$ and the carry-in to
the entire adder is 0.
3.1 Propagate, Generate, and Kill Signals
For each bit $i$ ($0 \le i \le n-1$) in the adder, the propagate,
generate, and kill signals of that bit are defined as
$$p_i = a_i \oplus b_i, \quad g_i = a_i \cdot b_i, \quad k_i = \bar{a}_i \cdot \bar{b}_i.$$
If $g_i = 1$, the carry-out of bit $i$ is 1 regardless of what the
carry-in to bit $i$ is. Similarly, if $k_i = 1$, the carry-out of bit $i$
is always 0. If $p_i = 1$, then the carry-in of bit $i$ propagates to
the carry-out of that bit.
For the $i$-th ($0 \le i \le m-1$) block of the adder, we define
the group propagate, generate, and kill signals as
$$P_i = \prod_{j=ik}^{(i+1)k-1} p_j,$$
$$G_i = \sum_{j=ik}^{(i+1)k-1} g_j \prod_{d=j+1}^{(i+1)k-1} p_d,$$
$$K_i = \sum_{j=ik}^{(i+1)k-1} k_j \prod_{d=j+1}^{(i+1)k-1} p_d.$$
If $G_i = 1$, the carry-out of the $i$-th block will always be
the correct value of 1 regardless of whether its carry-in is correct.
Similarly, if $K_i = 1$, the carry-out will always be the correct
value of 0. Only when $P_i = 1$ does the carry-out depend on
the carry-in signal, which could be wrong. The probabilities
of the above signals being one are
$$P(P_i) \triangleq P(P_i = 1) = \frac{1}{2^k}, \tag{1}$$
$$P(G_i) \triangleq P(G_i = 1) = \frac{1}{2} - \frac{1}{2^{k+1}}, \tag{2}$$
$$P(K_i) \triangleq P(K_i = 1) = \frac{1}{2} - \frac{1}{2^{k+1}}. \tag{3}$$
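Eq. (1)-(3) can be checked by exhaustive enumeration for small block sizes. The sketch below is illustrative only; the helper names `classify_block` and `group_signal_probabilities` are hypothetical. It scans a block from its most significant bit downward, so the first non-propagating bit decides whether the block generates or kills, which mirrors the sum-of-products definitions above.

```python
from fractions import Fraction
from itertools import product


def classify_block(a, b, k):
    """Return 'P', 'G', or 'K' for one k-bit block with operand slices a and b."""
    for j in reversed(range(k)):
        aj, bj = (a >> j) & 1, (b >> j) & 1
        if aj and bj:
            return "G"  # generate at bit j, all higher bits propagate
        if not aj and not bj:
            return "K"  # kill at bit j, all higher bits propagate
    return "P"  # every bit propagates


def group_signal_probabilities(k):
    """Exact probabilities of P, G, K for uniformly distributed k-bit operands."""
    counts = {"P": 0, "G": 0, "K": 0}
    for a, b in product(range(1 << k), repeat=2):
        counts[classify_block(a, b, k)] += 1
    total = 1 << (2 * k)
    return {signal: Fraction(c, total) for signal, c in counts.items()}
```

Since every input pair falls into exactly one of the three classes, the three probabilities sum to 1, which is also visible from the closed forms.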
3.2 Typical Error Measurement
Typical error measurements of an approximate arithmetic
circuit include error rate, mean error distance, and mean
square error.
First, we define the error distance (ED) as the absolute difference
between the approximate sum $S^*$ and the accurate sum $S$, i.e.,
$$ED = |S^* - S|.$$
The error rate (ER) is defined as the fraction of input
combinations for which the approximate adder produces a
wrong result, i.e., a non-zero error distance. Mathematically,
it is calculated as
$$ER = P(ED \ne 0).$$
Mean error distance (MED) is the mean value of all the
error distances. Mean square error (MSE) is the mean value
of the squares of all the error distances. Mathematically, they
are calculated as
$$MED = E[ED] = \sum_{ED_i \in \Omega} ED_i \, P(ED_i),$$
$$MSE = E[ED^2] = \sum_{ED_i \in \Omega} ED_i^2 \, P(ED_i),$$
where $\Omega$ is the set of all error distances.
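Once an error distribution is available, the three metrics follow directly from these definitions. The following sketch assumes the distribution is given as a mapping from error distance to probability; the function name `error_metrics` and the example distribution are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def error_metrics(distribution):
    """Compute (ER, MED, MSE) from a mapping {error distance: probability}."""
    er = sum(p for ed, p in distribution.items() if ed != 0)
    med = sum(ed * p for ed, p in distribution.items())
    mse = sum(ed * ed * p for ed, p in distribution.items())
    return er, med, mse


# Illustrative distribution: the only non-zero error distance is 16,
# occurring with probability 3/32.
example = {0: Fraction(29, 32), 16: Fraction(3, 32)}
er, med, mse = error_metrics(example)
```

Using exact fractions rather than floating point keeps the results exact, which matters when the error probability is very small.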
4 CALCULATING ERROR RATE
In this section, we show the method to calculate the
error rate. It will be used later to obtain the exact error
distribution.
As can be seen in Fig. 3, the result of the approximate
adder is correct if and only if all the speculated carry-ins
$c^*_i$ ($0 \le i \le m-1$) are correct. To calculate the error
rate, we define the event $D_i$ as the event in which all the
speculated carry-ins $c^*_i, c^*_{i-1}, \ldots, c^*_0$ are correct. We denote
the probability of the event $D_i$ as $d_i$. Thus, the
error rate equals $1 - d_{m-1}$. In the following, we will derive
a recursive formula to calculate $d_i$. We denote the correct
carry-in to the $i$-th sub-adder as $c_i$. Since the recursive
formula differs based on whether or not the carry generator
length $l$ is a multiple of the block size $k$, we will distinguish
these two cases.
4.1 Carry Generator Length l is a Multiple of Block Size
k
Define $t = l/k$. Then $t$ represents the number of blocks
covered by each carry generator. To illustrate our proposed
method, we use $t = 2$ as an example. In this case, the
carry generator includes 2 blocks of inputs. For $0 \le i \le 2$,
since all the lower input bits are used to generate the
speculated carry-ins $c^*_i, c^*_{i-1}, \ldots, c^*_0$, the event $D_i$ always
happens. Thus, $d_i = 1$ for all $0 \le i \le 2$.
Next we consider di for i > 2. The event Di depends on
the inputs from block i−1 to 0. Our idea to calculate di is to
consider the inputs block by block from block i− 1 to block
0.
Fig. 4: The speculated carry-ins and the correct carry-ins for
different input cases under the situation that $l = 2k$. (a) $G_{i-1} = 1$;
(b) $K_{i-1} = 1$; (c) $P_{i-1} = G_{i-2} = 1$; (d) $P_{i-1} = K_{i-2} = 1$;
(e) $P_{i-1} = P_{i-2} = G_{i-3} = 1$; (f) $P_{i-1} = P_{i-2} = K_{i-3} = 1$.
First consider the inputs at block $i-1$. They satisfy either
$G_{i-1} = 1$, $K_{i-1} = 1$, or $P_{i-1} = 1$. If the inputs satisfy
$G_{i-1} = 1$, as shown in Fig. 4(a), the speculated carry-in
$c^*_i = 1$, which is equal to the correct carry-in $c_i$. Thus,
the event $D_i$ happens if and only if $c^*_{i-1}, \ldots, c^*_0$ are correct,
which means the inputs from block $i-2$ to 0 make the event
$D_{i-1}$ happen. Therefore, we have
$$P(D_i, G_{i-1} = 1) = P(G_{i-1}) P(D_{i-1}). \tag{4}$$
The same conclusion holds if the inputs at block $i-1$
satisfy $K_{i-1} = 1$, as shown in Fig. 4(b). Therefore, we
have
$$P(D_i, K_{i-1} = 1) = P(K_{i-1}) P(D_{i-1}). \tag{5}$$
If the inputs at block $i-1$ satisfy $P_{i-1} = 1$, then we
further consider the inputs at block $i-2$. We again distinguish
three cases: $G_{i-2} = 1$, $K_{i-2} = 1$, and $P_{i-2} = 1$.
In the case where $G_{i-2} = 1$ (shown in Fig. 4(c)) and
the case where $K_{i-2} = 1$ (shown in Fig. 4(d)), since the
carry generator covers 2 blocks of inputs, the speculated
carry-ins $c^*_i$ and $c^*_{i-1}$ are equal to the correct carry-ins $c_i$
and $c_{i-1}$, respectively. Thus, the event $D_i$ happens if and
only if $c^*_{i-2}, \ldots, c^*_0$ are correct, which means the inputs from
block $i-3$ to 0 make the event $D_{i-2}$ happen. Therefore, we
have
$$P(D_i, P_{i-1} = G_{i-2} = 1) = P(P_{i-1}) P(G_{i-2}) P(D_{i-2}), \tag{6}$$
$$P(D_i, P_{i-1} = K_{i-2} = 1) = P(P_{i-1}) P(K_{i-2}) P(D_{i-2}). \tag{7}$$
If the inputs at blocks $i-1$ and $i-2$ satisfy none of the
above cases, then we must have $P_{i-1} = P_{i-2} = 1$. Now we
further consider the inputs at block $i-3$. We distinguish the
following three cases:
1) The inputs satisfy $G_{i-3} = 1$, as shown in Fig. 4(e).
In this case, the correct carry-ins $c_i = c_{i-1} = c_{i-2} = 1$.
However, the speculated carry-in $c^*_i$ is 0, since it is
produced by a carry generator that covers inputs at
blocks $i-1$ and $i-2$, and that carry generator simply propagates
its own speculated carry-in $c^*_{carry,i}$, which is
assumed to be 0. Since $c^*_i \ne c_i$, the event $D_i$ cannot
happen in this case. Therefore, we have
$$P(D_i, P_{i-1} = P_{i-2} = G_{i-3} = 1) = 0. \tag{8}$$
2) The inputs satisfy $K_{i-3} = 1$, as shown in Fig. 4(f).
In this case, the correct carry-ins $c_i = c_{i-1} = c_{i-2} = 0$.
By the same argument used in Case 1, the speculated
carry-in $c^*_i$ must be 0. Since each carry generator covers
two blocks of inputs, the speculated carry-ins $c^*_{i-1} = c^*_{i-2} = 0$.
Therefore, $c^*_j = c_j$ for $j = i, i-1, i-2$.
Thus, the event $D_i$ happens if and only if $c^*_{i-3}, \ldots, c^*_0$
are correct, which means the inputs from block $i-4$ to
0 make the event $D_{i-3}$ happen. Therefore, we have
$$P(D_i, P_{i-1} = P_{i-2} = K_{i-3} = 1) = P(P_{i-1}) P(P_{i-2}) P(K_{i-3}) P(D_{i-3}). \tag{9}$$
3) The inputs satisfy $P_{i-3} = 1$. In this case, we
continue to look at the inputs at block $i-4$.
We continue the above analysis. By the same reasoning
used for the case where $P_{i-1} = P_{i-2} = G_{i-3} = 1$, we have
that for any $3 < j \le i$, if the inputs from block $i-1$ to block
$i-j$ satisfy $P_{i-1} = \cdots = P_{i-j+1} = G_{i-j} = 1$, the event
$D_i$ cannot happen, since $c^*_i = 0 \ne c_i = 1$, i.e.,
$$P(D_i, P_{i-1} = P_{i-2} = \cdots = P_{i-j+1} = G_{i-j} = 1) = 0. \tag{10}$$
On the other hand, if the inputs satisfy $P_{i-1} = \cdots = P_{i-j+1} = K_{i-j} = 1$,
the event $D_i$ happens if and only if the
inputs from block $i-j-1$ to 0 make the event $D_{i-j}$ happen.
Therefore, we have
$$P(D_i, P_{i-1} = P_{i-2} = \cdots = P_{i-j+1} = K_{i-j} = 1) = P(P_{i-1}) P(P_{i-2}) \cdots P(P_{i-j+1}) P(K_{i-j}) P(D_{i-j}). \tag{11}$$
Finally, there is a remaining input case which satisfies
$P_{i-1} = \cdots = P_0 = 1$. In this case, the speculated carry-ins
are $c^*_i = \cdots = c^*_0 = 0$ and the correct carry-ins are
$c_i = \cdots = c_0 = 0$. Thus, the event $D_i$ happens. Therefore,
we have
$$P(D_i, P_{i-1} = P_{i-2} = \cdots = P_0 = 1) = P(P_{i-1}) P(P_{i-2}) \cdots P(P_0). \tag{12}$$
Notice that the probability that the event $D_i$ occurs can be
calculated as
$$\begin{aligned}
P(D_i) ={}& P(D_i, G_{i-1} = 1) + P(D_i, K_{i-1} = 1) \\
&+ P(D_i, P_{i-1} = G_{i-2} = 1) + P(D_i, P_{i-1} = K_{i-2} = 1) \\
&+ \cdots + P(D_i, P_{i-1} = \cdots = P_1 = G_0 = 1) \\
&+ P(D_i, P_{i-1} = \cdots = P_1 = K_0 = 1) \\
&+ P(D_i, P_{i-1} = \cdots = P_1 = P_0 = 1).
\end{aligned}$$
Given Eq. (4)–(12), we can calculate $d_i$ as follows:
$$\begin{aligned}
d_i = P(D_i) ={}& P(G_{i-1}) P(D_{i-1}) + P(K_{i-1}) P(D_{i-1}) \\
&+ P(P_{i-1}) P(G_{i-2}) P(D_{i-2}) + P(P_{i-1}) P(K_{i-2}) P(D_{i-2}) \\
&+ P(P_{i-1}) P(P_{i-2}) P(K_{i-3}) P(D_{i-3}) \\
&+ P(P_{i-1}) P(P_{i-2}) P(P_{i-3}) P(K_{i-4}) P(D_{i-4}) \\
&+ \cdots + P(P_{i-1}) P(P_{i-2}) \cdots P(P_1) P(K_0) P(D_0) \\
&+ P(P_{i-1}) P(P_{i-2}) \cdots P(P_0) \\
={}& P(P_{i-1}) \cdots P(P_0) + \sum_{j=1}^{2} P(P_{i-1}) \cdots P(P_{i-j+1}) P(G_{i-j}) d_{i-j} \\
&+ \sum_{j=1}^{i} P(P_{i-1}) \cdots P(P_{i-j+1}) P(K_{i-j}) d_{i-j}.
\end{aligned}$$
For an arbitrary $t$, we can generalize the above analysis
and obtain that for $0 \le i \le t$, $d_i = 1$ and for $i > t$,
$$\begin{aligned}
d_i ={}& P(P_{i-1}) \cdots P(P_0) \\
&+ \sum_{j=1}^{t} P(P_{i-1}) \cdots P(P_{i-j+1}) P(G_{i-j}) d_{i-j} \\
&+ \sum_{j=1}^{i} P(P_{i-1}) \cdots P(P_{i-j+1}) P(K_{i-j}) d_{i-j}.
\end{aligned}$$
The above equation gives a recursive way to calculate $d_i$.
The values of $P(P_i)$, $P(G_i)$, $P(K_i)$ are calculated by Eq. (1),
(2), and (3), respectively. The time complexity to obtain
$d_{m-1}$ and the resultant error rate is $O(m^2)$.
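The recursion above translates almost line by line into code. The sketch below is one possible rendering under the stated assumptions (uniform inputs, $l$ a multiple of $k$, carry-generator carry-in 0); the function name `error_rate_multiple` is hypothetical, and exact rational arithmetic is a choice made here so that the result is not affected by rounding.

```python
from fractions import Fraction


def error_rate_multiple(n, k, l):
    """Exact error rate of an (n, k, l) block-based adder when l is a multiple of k."""
    if n % k != 0 or l % k != 0:
        raise ValueError("this sketch assumes k divides both n and l")
    m, t = n // k, l // k
    p = Fraction(1, 2 ** k)                        # Eq. (1)
    g = Fraction(1, 2) - Fraction(1, 2 ** (k + 1))  # Eq. (2)
    kill = g                                        # Eq. (3), equal to Eq. (2)
    d = []
    for i in range(m):
        if i <= t:
            d.append(Fraction(1))  # carry generator sees all lower bits
            continue
        total = p ** i             # all blocks i-1..0 propagate
        for j in range(1, i + 1):
            prefix = p ** (j - 1)  # P(P_{i-1}) ... P(P_{i-j+1})
            if j <= t:
                total += prefix * g * d[i - j]     # generate within generator reach
            total += prefix * kill * d[i - j]      # kill at any distance
        d.append(total)
    return 1 - d[m - 1]
```

The double loop over $i$ and $j$ makes the $O(m^2)$ cost visible. As a sanity check, for n = 6 and k = l = 2 the only possible error is a wrong carry-in to block 2, which requires $P_1 = G_0 = 1$, so the error rate is $\frac{1}{4} \cdot \frac{3}{8} = \frac{3}{32}$.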
4.2 Carry Generator Length l is not a Multiple of Block
Size k
Define $t = \lfloor l/k \rfloor$. Each carry speculative chain is composed
of $t$ blocks of bit length $k$ and a remaining block of
bit length $k' = l - tk$. Note that $0 < k' < k$.
For each block of $k$ bits, we divide it into the left group
of $k'$ bits and the right group of $k - k'$ bits. The major
difference in the analysis here is that we need to consider
the propagate/generate/kill state of both the left group and
the right group in a block. For the left group of $k'$ bits in the
$i$-th ($0 \le i \le m-1$) block of the adder, we define its group
propagate, generate, and kill signals as
$$PL_i = \prod_{j=(i+1)k-k'}^{(i+1)k-1} p_j,$$
$$GL_i = \sum_{j=(i+1)k-k'}^{(i+1)k-1} g_j \prod_{d=j+1}^{(i+1)k-1} p_d,$$
$$KL_i = \sum_{j=(i+1)k-k'}^{(i+1)k-1} k_j \prod_{d=j+1}^{(i+1)k-1} p_d.$$
The probabilities of the above signals being one are
$$P(PL_i) \triangleq P(PL_i = 1) = \frac{1}{2^{k'}},$$
$$P(GL_i) \triangleq P(GL_i = 1) = \frac{1}{2} - \frac{1}{2^{k'+1}},$$
$$P(KL_i) \triangleq P(KL_i = 1) = \frac{1}{2} - \frac{1}{2^{k'+1}}.$$
Similarly, we define the group propagate, generate, and
kill signal of the right group of k − k′ bits of the i-th
block. These signals are denoted as PRi, GRi, and KRi,
respectively. Their probabilities of being one are calculated
similarly as above.
To illustrate the method to calculate di, we also use t = 2
as an example. For 0 ≤ i ≤ 2, it is not hard to see that di = 1.
Thus, we only focus on i > 2.
The basic idea to obtain the probability di is the same as
what we use to handle the case where l is a multiple of k,
i.e., examining inputs block by block from block i − 1 to
block 0. For the inputs satisfying either (1) Gi−1 = 1, (2)
Ki−1 = 1, (3) Pi−1 = Gi−2 = 1, or (4) Pi−1 = Ki−2 = 1,
the conclusions are the same as what we have when l = 2k.
Fig. 4(a)-(d) show the correct carry-ins and the speculated
carry-ins for these four cases, respectively.
If the inputs at blocks i− 1 and i− 2 satisfy none of the
above cases, then we have Pi−1 = Pi−2 = 1. We further
consider the inputs at block i− 3. The difference compared
to the situation where l is a multiple of k is that we need to
distinguish the following five cases:
1) The inputs satisfy $GL_{i-3} = 1$, as shown in Fig. 5(a).
Since each carry generator covers two blocks of inputs
plus the left group of the third block, the speculated
carry-ins $c^*_i = c^*_{i-1} = c^*_{i-2} = 1$. In the correct adder,
the carry-ins $c_i = c_{i-1} = c_{i-2} = 1$. Thus, the event $D_i$
happens if and only if $c^*_{i-3}, \ldots, c^*_0$ are correct, which
means the inputs from block $i-4$ to 0 make the event
$D_{i-3}$ happen. Therefore, we have
$$P(D_i, P_{i-1} = P_{i-2} = GL_{i-3} = 1) = P(P_{i-1}) P(P_{i-2}) P(GL_{i-3}) P(D_{i-3}). \tag{13}$$
2) The inputs satisfy $KL_{i-3} = 1$, as shown in
Fig. 5(b). The analysis is the same as in Case 1 and we have
the same conclusion: the event $D_i$ happens if and only
if the inputs from block $i-4$ to 0 make the event $D_{i-3}$
happen. Therefore, we have
$$P(D_i, P_{i-1} = P_{i-2} = KL_{i-3} = 1) = P(P_{i-1}) P(P_{i-2}) P(KL_{i-3}) P(D_{i-3}). \tag{14}$$
3) The inputs satisfy $PL_{i-3} = GR_{i-3} = 1$, as shown
in Fig. 5(c). In this case, the correct carry-ins $c_i = c_{i-1} = c_{i-2} = 1$.
However, the speculated carry-in $c^*_i$ is 0, since
it is produced by a carry generator that covers inputs at
block $i-1$, block $i-2$, and the left group of block $i-3$
(see Fig. 5(c)), and that carry generator propagates a 0.
Since $c^*_i \ne c_i$, the event $D_i$ cannot happen. Therefore,
we have
$$P(D_i, P_{i-1} = P_{i-2} = PL_{i-3} = GR_{i-3} = 1) = 0. \tag{15}$$
4) The inputs satisfy $PL_{i-3} = KR_{i-3} = 1$. In this
case, as shown in Fig. 5(d), $c^*_j = c_j$ for $j = i, i-1, i-2$.
Thus, the event $D_i$ happens if and only if the inputs
from block $i-4$ to 0 make the event $D_{i-3}$ happen.
Therefore, we have
$$P(D_i, P_{i-1} = P_{i-2} = PL_{i-3} = KR_{i-3} = 1) = P(P_{i-1}) P(P_{i-2}) P(PL_{i-3}) P(KR_{i-3}) P(D_{i-3}). \tag{16}$$
5) The inputs satisfy $P_{i-3} = 1$. In this case, we need
to continue checking the inputs at block $i-4$.
Fig. 5: The speculated carry-ins and the correct carry-ins for
different input cases under the situation that $l = 2k + k'$ where
$0 < k' < k$. (a) $P_{i-1} = P_{i-2} = GL_{i-3} = 1$; (b) $P_{i-1} = P_{i-2} = KL_{i-3} = 1$;
(c) $P_{i-1} = P_{i-2} = PL_{i-3} = GR_{i-3} = 1$; (d) $P_{i-1} = P_{i-2} = PL_{i-3} = KR_{i-3} = 1$.
Now we consider the remaining case where $P_{i-1} = P_{i-2} = P_{i-3} = 1$.
We further check the inputs at block
$i-4$. Similarly, they can be divided into the five cases
shown above. The situations corresponding to the first
four cases are shown in Fig. 5(e)-(h), respectively. Since
$P_{i-1} = P_{i-2} = P_{i-3} = 1$ and each carry generator covers
two blocks of inputs plus the left group of the third block,
the speculated carry-in $c^*_i = 0$. In Case 1 (i.e., $GL_{i-4} = 1$)
and Case 3 (i.e., $PL_{i-4} = GR_{i-4} = 1$), since the correct
carry-in $c_i = 1 \ne c^*_i$, the event $D_i$ cannot happen. Therefore
we have
$$P(D_i, P_{i-1} = P_{i-2} = P_{i-3} = GL_{i-4} = 1) = 0, \tag{17}$$
$$P(D_i, P_{i-1} = P_{i-2} = P_{i-3} = PL_{i-4} = GR_{i-4} = 1) = 0. \tag{18}$$
In Case 2 (i.e., $KL_{i-4} = 1$) and Case 4 (i.e., $PL_{i-4} = KR_{i-4} = 1$),
$c^*_j = c_j$ for $j = i, \ldots, i-3$. Thus, the event
$D_i$ happens if and only if the inputs from block $i-5$ to 0
make the event $D_{i-4}$ happen. Therefore we have
$$P(D_i, P_{i-1} = P_{i-2} = P_{i-3} = KL_{i-4} = 1) = P(P_{i-1}) P(P_{i-2}) P(P_{i-3}) P(KL_{i-4}) P(D_{i-4}), \tag{19}$$
$$P(D_i, P_{i-1} = P_{i-2} = P_{i-3} = PL_{i-4} = KR_{i-4} = 1) = P(P_{i-1}) P(P_{i-2}) P(P_{i-3}) P(PL_{i-4}) P(KR_{i-4}) P(D_{i-4}). \tag{20}$$
In Case 5, the inputs at block $i-4$ satisfy $P_{i-4} = 1$; we
continue analyzing the inputs of the next block in the same
way.
By the same reasoning used for the case where Pi−1 =
Pi−2 = Pi−3 = 1, we have that for any 4 < j ≤ i, if the
inputs from block i − 1 to block i − j satisfy either Pi−1 =
· · · = Pi−j+1 = GLi−j = 1 or Pi−1 = · · · = Pi−j+1 =
PLi−j = GRi−j = 1, the event Di cannot happen. If the
inputs satisfy either Pi−1 = · · · = Pi−j+1 = KLi−j = 1 or
Pi−1 = · · · = Pi−j+1 = PLi−j = KRi−j = 1, the event Di
happens if and only if the inputs from block i − j − 1 to 0
make the event Di−j happen. The equations to calculate the
probabilities are similar to Eq. (19) and (20). Finally, for the
remaining input case in which Pi−1 = · · · = P0 = 1, the
event Di happens.
By the above discussion, for the example in which $t = 2$,
we can calculate $d_i$ as follows:
$$\begin{aligned}
d_i = P(D_i) ={}& P(G_{i-1}) P(D_{i-1}) + P(K_{i-1}) P(D_{i-1}) \\
&+ P(P_{i-1}) P(G_{i-2}) P(D_{i-2}) + P(P_{i-1}) P(K_{i-2}) P(D_{i-2}) \\
&+ P(P_{i-1}) P(P_{i-2}) P(GL_{i-3}) P(D_{i-3}) \\
&+ \sum_{j=3}^{i} \big[ P(P_{i-1}) \cdots P(P_{i-j+1}) P(KL_{i-j}) P(D_{i-j}) \\
&\qquad + P(P_{i-1}) \cdots P(P_{i-j+1}) P(PL_{i-j}) P(KR_{i-j}) P(D_{i-j}) \big] \\
&+ P(P_{i-1}) P(P_{i-2}) \cdots P(P_0).
\end{aligned}$$
Noticing that $P(KL_j) + P(PL_j) P(KR_j) = P(K_j)$, we
can further simplify the above equation as
$$\begin{aligned}
d_i ={}& \sum_{j=1}^{2} P(P_{i-1}) \cdots P(P_{i-j+1}) P(G_{i-j}) d_{i-j} \\
&+ \sum_{j=1}^{i} P(P_{i-1}) \cdots P(P_{i-j+1}) P(K_{i-j}) d_{i-j} \\
&+ P(P_{i-1}) P(P_{i-2}) P(GL_{i-3}) d_{i-3} \\
&+ P(P_{i-1}) \cdots P(P_0).
\end{aligned}$$
For an arbitrary $t$, we can generalize the above analysis
and obtain that for $0 \le i \le t$, $d_i = 1$ and for $i > t$,
$$\begin{aligned}
d_i ={}& \sum_{j=1}^{t} P(P_{i-1}) \cdots P(P_{i-j+1}) P(G_{i-j}) d_{i-j} \\
&+ \sum_{j=1}^{i} P(P_{i-1}) \cdots P(P_{i-j+1}) P(K_{i-j}) d_{i-j} \\
&+ P(P_{i-1}) \cdots P(P_{i-t}) P(GL_{i-t-1}) d_{i-t-1} \\
&+ P(P_{i-1}) \cdots P(P_0).
\end{aligned}$$
The above equation gives a recursive way to calculate $d_i$.
The time complexity to obtain $d_{m-1}$ and the resultant error
rate is $O(m^2)$.
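The recursion for general $l$ differs from the multiple-of-$k$ case only by the extra $GL$ term. The following sketch is an illustrative rendering under the same assumptions (uniform inputs, carry-generator carry-in 0); the function name `error_rate_general` is hypothetical. It also accepts $k' = 0$, in which case $P(GL) = \frac{1}{2} - \frac{1}{2} = 0$ and the extra term vanishes, so the formula reduces to the one in Section 4.1.

```python
from fractions import Fraction


def error_rate_general(n, k, l):
    """Exact error rate of an (n, k, l) block-based adder; l need not be a multiple of k."""
    if n % k != 0:
        raise ValueError("this sketch assumes k divides n")
    m, t = n // k, l // k
    k_left = l - t * k                                   # k' in the text
    p = Fraction(1, 2 ** k)
    g = Fraction(1, 2) - Fraction(1, 2 ** (k + 1))
    kill = g
    g_left = Fraction(1, 2) - Fraction(1, 2 ** (k_left + 1))  # P(GL); zero when k' = 0
    d = []
    for i in range(m):
        if i <= t:
            d.append(Fraction(1))
            continue
        total = p ** i                                   # all lower blocks propagate
        for j in range(1, i + 1):
            prefix = p ** (j - 1)
            if j <= t:
                total += prefix * g * d[i - j]
            total += prefix * kill * d[i - j]
        # Generate in the left group of block i-t-1, still inside the carry generator.
        total += p ** t * g_left * d[i - t - 1]
        d.append(total)
    return 1 - d[m - 1]
```

As a check, for n = 6, k = 2, l = 3 the only speculated carry-in is the one into block 2, produced from bits 3 down to 1; it is wrong exactly when those three bits all propagate and bit 0 generates, giving an error rate of $\frac{1}{8} \cdot \frac{1}{4} = \frac{1}{32}$.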
5 OBTAINING ERROR DISTRIBUTION
In this section, we show our method to obtain the accurate
error distribution. For a given input combination, the
output error is defined as $err = c^*_o s^*_{n-1} \ldots s^*_0 - c_o s_{n-1} \ldots s_0$,
where $c_o, s_{n-1}, \ldots, s_0$ are the correct outputs of the adder.
By definition, the error distance is equal
to the absolute value of the error. The error distribution
is the probability distribution of the error distance. As with
the error rate, we treat separately the cases in which
the carry generator length $l$ is and is not a multiple
of the block size $k$, in Sections 5.1 and 5.2, respectively.
Finally, we will discuss the time complexity of our method
in Section 5.3.
5.1 Carry Generator Length l is a Multiple of Block Size
k
5.1.1 Error Pattern and Probability
We define t = l/k. In this section, we first analyze the
error pattern, which is the binary representation of the error
distance. The questions we ask are: 1) when does an error
happen? 2) if an error happens, what does its error pattern
look like?
To analyze the error pattern, we divide the m blocks into
a number of partitions. Each partition is composed of two
sequences of blocks, where all the blocks in the left sequence
have their group propagate signals as 0 and all the blocks in
the right sequence have their group propagate signals as 1.
For the leftmost (rightmost) partition, it is possible that all of
its blocks have their group propagate signals as 1 (0). Fig. 6
gives an example of 10 blocks divided into 4 partitions.
Block:      9   8   7   6   5    4    3    2    1    0
P:          1   0   0   1   0    1    1    1    1    0
Partition:  I   II  II  II  III  III  III  III  III  IV
Fig. 6: An example of 10 blocks divided into 4 partitions.
We now study the approximate sum in each partition.
Suppose the partition starts at block ib and ends at block
ie (ib > ie). The blocks ib, ib − 1, . . . , im (ie < im ≤ ib)
have their group propagate signals as 0 and the blocks im−
1, . . . , ie have their group propagate signals as 1. We have
the following two claims on errors in the partition.
Lemma 1
There exists an error in the approximate sum of a partition if
and only if Gie−1 = 1 and im − ie ≥ t.
Proof: “if” part: The carry speculative chain for the
carry-in c∗im is composed of the bits in blocks im−1, . . . , im−
t. Since Pim−1 = · · · = Pim−t = 1, we have c∗im = 0.
However, because Pim−1 = · · · = Pie = 1 and Gie−1 = 1
, the correct carry-in cim = 1. Thus, the sum at block im is
incorrect.
“only if” part: we prove by contradiction. If the condition is not satisfied, then there are two cases: (1) $K_{i_e-1} = 1$ and (2) $G_{i_e-1} = 1$ and $i_m - i_e < t$. For Case 1, the speculated carry-ins $c^*_{i_m}, \ldots, c^*_{i_e}$ are all equal to the correct value of 0. For Case 2, the speculated carry-ins $c^*_{i_m}, \ldots, c^*_{i_e}$ are all equal to the correct value of 1. For both cases, since $P_{i_b-1} = \cdots = P_{i_m} = 0$, the speculated carry-ins $c^*_{i_b}, \ldots, c^*_{i_m+1}$ are all correct. Therefore, for both cases, the speculated carry-ins $c^*_{i_b}, \ldots, c^*_{i_e}$ are all correct and hence, the approximate sum of the partition is correct.
Fig. 7: Illustration of the error pattern |err| in a partition.
Lemma 2
If there exists an error in the approximate sum of a partition, the approximate sum from block $(i_e + t - 1)$ to block $i_e$, i.e., $s^*_{(i_e+t)k-1} \ldots s^*_{i_e k}$, is correct and the approximate sum from block $i_b$ to block $(i_e + t)$, i.e., $s^*_{(i_b+1)k-1} \ldots s^*_{(i_e+t)k}$, is smaller than the correct sum by 1.
Proof: If there exists an error in the approximate sum of a partition, by Lemma 1, $P_{i_e+t-1} = \cdots = P_{i_e} = 1$, and $G_{i_e-1} = 1$. Since each carry speculative chain is composed of $t$ blocks, the speculated carry-ins $c^*_{i_e+t-1}, \ldots, c^*_{i_e}$ equal the correct value of 1. Therefore, the approximate sum from block $(i_e + t - 1)$ to block $i_e$ is correct.
Since $P_{i_m-1} = \cdots = P_{i_e+t-1} = 1$, the correct carry-ins $c_{i_m} = \cdots = c_{i_e+t} = 1$. However, due to the truncation of the carry speculative chain, the speculated carry-ins $c^*_{i_m} = \cdots = c^*_{i_e+t} = 0$. Furthermore, considering the fact that $P_{i_m-1} = \cdots = P_{i_e+t} = 1$, the correct sums of blocks $i_m - 1, \ldots, i_e + t$ are $k$ 0's in the binary representation, while the approximate sums of these blocks are $k$ 1's in the binary representation. Since $P_{i_m} = 0$, $c_{i_m} = 1$, and $c^*_{i_m} = 0$, the approximate sum of block $i_m$ is smaller than its correct sum by 1. Finally, since $P_{i_b-1} = \cdots = P_{i_m} = 0$, the speculated carry-ins $c^*_{i_b}, \ldots, c^*_{i_m+1}$ are all correct and hence the sums of blocks $i_b, \ldots, i_m + 1$ are all correct. Given the above characterization of the sums of blocks $i_b, \ldots, i_e + t$, we can see that the approximate sum from block $i_b$ to block $(i_e + t)$ is smaller than the correct sum by 1.
An illustration of Lemma 2 is shown in Fig. 7. In this example, we assume the block size is $k = 4$, the number of blocks covered by each carry generator is $t = 2$, $i_m = i_e + 3$, and $i_b = i_m + 1$. Then, the correct carry-ins $c_{i_e+3} = c_{i_e+2} = c_{i_e+1} = c_{i_e} = 1$, while the speculated carry-ins $c^*_{i_e+3} = c^*_{i_e+2} = 0$ and $c^*_{i_e+1} = c^*_{i_e} = 1$. For block $i_m$, since its group propagate signal $P_{i_m} \neq 1$, we have $c^*_{i_m+1} = c_{i_m+1}$. Therefore, the approximate sums at blocks $i_m + 1$, $i_e + 1$, and $i_e$ are correct. The approximate sum at block $i_m$ is smaller than the correct sum by 1. The approximate sum at block $i_e + 2$ is $(1111)_2$, while the correct sum at that block is $(0000)_2$. Overall, the approximate sum from block $i_m + 1$ to block $i_e + 2$ is smaller than the correct value by 1, while the approximate sum from block $i_e + 1$ to block $i_e$ is correct.
Fig. 8: Illustration of the situation in which a 1 appears at bit position jk of the error pattern |err| and the next 1 appears at bit position ik (i > j). Here we assume k = 4 and t = 2.
By Lemma 2, a nonzero error of an approximate adder is always negative and the 1's in an error pattern can only occur at the bit position $ik$, where $i$ is an integer. Furthermore, Lemma 2 implies that if a 1 appears at a position $ik$, then the block $i$ must be inside a partition whose rightmost block is block $i - t$. By Lemma 1, we should have $P_{i-1} = \cdots = P_{i-t} = 1$ and $G_{i-t-1} = 1$. Therefore, we have the following claim.
Theorem 1
A 1 in an error pattern can only occur at the bit position ik,
where i is an integer. It occurs at the bit position ik if and
only if Pi−1 = · · · = Pi−t = 1 and Gi−t−1 = 1.
Based on the above theorem, we also have the following
corollary.
Corollary 1
If there are two adjacent 1’s in an error pattern, where the left
one is at position ik and the right one is at position jk, then
j < i− t.
Proof: Since a 1 occurs at the bit position $ik$ of the error pattern, by Theorem 1, we have $G_{i-t-1} = 1$. As a result, a 1 cannot occur at bit positions $(i-1)k, \ldots, (i-t)k$, because otherwise, by Theorem 1, we would have $P_{i-t-1} = 1$, which contradicts $G_{i-t-1} = 1$. Therefore, the next 1 in the error pattern must be at the bit position $jk$, where $j < i - t$.
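The structural claims above can be checked exhaustively on a small adder. The following Python sketch is not part of the proposed method; it simulates a block-based approximate adder under an assumed model in which the carry-in of block $i$ is the carry-out of the $l$ input bits directly below the block, computed with a carry-in of 0, and the output carry is produced by the leftmost block. The function names are hypothetical.

```python
def approx_add(a, b, m, k, l):
    """Approximate sum of two mk-bit numbers under the assumed block-based model."""
    mask = (1 << k) - 1
    result = carry_out = 0
    for i in range(m):
        low = max(0, i * k - l)              # lowest bit seen by the carry generator
        width = i * k - low
        window = (1 << width) - 1
        # Speculated carry-in: carry-out of the window with its own carry-in forced to 0.
        c_spec = (((a >> low) & window) + ((b >> low) & window)) >> width
        s = ((a >> (i * k)) & mask) + ((b >> (i * k)) & mask) + c_spec
        result |= (s & mask) << (i * k)
        carry_out = s >> k                   # after the loop: carry-out of block m-1
    return result | (carry_out << (m * k))


def error_blocks(a, b, m, k, l):
    """Indices i such that bit ik of the error distance is 1, highest first."""
    dist = (a + b) - approx_add(a, b, m, k, l)
    assert dist >= 0                          # the approximate result is never too large
    ones = [pos for pos in range(m * k + 1) if (dist >> pos) & 1]
    assert all(pos % k == 0 for pos in ones)  # 1s occur only at positions ik
    return [pos // k for pos in reversed(ones)]


def check_structure(m, k, l):
    """Check, for every input pair, the lowest position and the gap between adjacent 1s."""
    t = l // k
    for a in range(1 << (m * k)):
        for b in range(1 << (m * k)):
            blocks = error_blocks(a, b, m, k, l)
            assert all(t + 1 <= i <= m - 1 for i in blocks)
            assert all(left - right > t for left, right in zip(blocks, blocks[1:]))
    return True
```

For example, with 1-bit blocks and $t = 1$, the inputs $a = (0011)_2$ and $b = (0001)_2$ give $G_0 = 1$ and $P_1 = 1$, and the sketch reports a single 1 at block position 2, as Theorem 1 predicts. The exhaustive check is only practical for a few blocks, since it visits all $2^{2mk}$ input pairs.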
Now we study the following question: if there is a 1 at
position jk of an error pattern, under what situation does
the next 1 on the left of jk appear at a position ik where
i > j + t? Fig. 8 illustrates this problem using an example
with k = 4 and t = 2. As shown in the figure, in the error
pattern, there is a 1 at bit position jk and the next 1 is at bit
position ik. Since the correct carry-in to the bit jk is 1, the
correct sum from block i− 3 to block j is
$$s_{(i-2)k-1} \ldots s_{jk} = a_{(i-2)k-1} \ldots a_{jk} + b_{(i-2)k-1} \ldots b_{jk} + 1, \tag{21}$$
where $a_{(i-2)k-1} \ldots a_{jk}$ and $b_{(i-2)k-1} \ldots b_{jk}$ denote the inputs from block $i - 3$ to block $j$.
On the other hand, as shown in Fig. 8, except for the last bit of block $j$, all the other bits in blocks $i - 3, \ldots, j$ of the error pattern |err| are 0. Therefore, the correct sum from block $i - 3$ to $j$ is larger than the approximate sum by 1. Given Eq. (21), the approximate sum is
$$s^*_{(i-2)k-1} \ldots s^*_{jk} = a_{(i-2)k-1} \ldots a_{jk} + b_{(i-2)k-1} \ldots b_{jk}. \tag{22}$$
Furthermore, the approximate sum from block $i - 3$ to $j$ is equal to the sum produced by an approximate adder of the same type with $(i - j - 2)$ blocks, where the inputs are $a_{(i-2)k-1} \ldots a_{jk}$ and $b_{(i-2)k-1} \ldots b_{jk}$. For clarity, we call this approximate adder the imaginary approximate adder. Given Eq. (22), the inputs $a_{(i-2)k-1} \ldots a_{jk}$ and $b_{(i-2)k-1} \ldots b_{jk}$ should make the imaginary approximate adder produce the correct sum. This means all the speculated carry-ins $c^*_{i-j-3}, \ldots, c^*_0$ of the imaginary adder should be correct. In other words, the inputs at blocks $i - 4, \ldots, j$ of the adder in Fig. 8 should make the Event $D_{i-j-3}$ defined in Section 4 happen. For a general $t$, the inputs at blocks $i - t - 2, \ldots, j$ should make the Event $D_{i-t-j-1}$ happen.
Therefore, given that there is a 1 at position $jk$ of an error pattern, the next 1 appears at a position $ik$ where $i > j + t$ if and only if $P_{i-t} = \cdots = P_{i-1} = 1$, $G_{i-t-1} = 1$, and the inputs at blocks $i - t - 2, \ldots, j$ make the event $D_{i-t-j-1}$ happen. The probability is
$$e_{i,j} \triangleq P(P_{i-1}) \cdots P(P_{i-t}) P(G_{i-t-1})\, d_{i-t-j-1}. \tag{23}$$
The rightmost position in the error pattern where a 1 can occur is the position $(t + 1)k$. The probability that the rightmost 1 is at a bit position $ik$ ($i \ge t + 1$) is
$$e_{i,0} \triangleq P(P_{i-1}) \cdots P(P_{i-t}) P(G_{i-t-1})\, d_{i-t-1}.$$
The leftmost position in the error pattern where a 1 can
occur is the position (m − 1)k. Given that there is a 1 at
position ik (i ≤ m − 1), there is no 1 on the left of position
ik if and only if the inputs at blocks m − 2, . . . , i make the
event Dm−i−1 happen. The probability is dm−i−1.
Given the above discussion, we finally get the following
claim on error pattern and probability.
Theorem 2
An $mk$-bit binary representation is a possible error pattern if and only if there exist numbers $t + 1 \le i_1 < i_2 < \cdots < i_r \le m - 1$ such that for all $1 \le j \le r - 1$, we have $i_{j+1} - i_j > t$ and the ones in the binary representation appear at bit positions $i_1 k, \ldots, i_r k$. The probability of that error pattern is
$$d_{m-i_r-1} \cdot \prod_{j=1}^{r} e_{i_j, i_{j-1}},$$
where $i_0 = 0$.
5.1.2 Algorithm to Obtain Error Distribution
Using Theorem 2, we can enumerate all possible error
patterns and calculate their probabilities. This gives us a
method to obtain the error distribution.
However, the enumeration-based method can be further
optimized by sharing the multiplications of probabilities
and the additions of error components that are common to
several error patterns. We propose a divide-and-conquer method
to do this. The idea is to grow a partial error pattern
and its probability into the complete error pattern and the
probability. The procedure uses a recursive helper function
ED(i, j, ePar, pPar) shown in Algorithm 1. The argument
i refers to the bit position ik, which is under check and can
potentially have a 1. j refers to the bit position jk, which is
the nearest bit position on the right of ik in the error pattern
that has a 1. ePar is the partial error distance and pPar is
the partial probability.
When checking the block i, we have the following two
cases:
1) There is a 1 at the bit position $ik$ in the error pattern.
Then, the nearest bit position on the left of $ik$ that
could have a 1 is $(i + t + 1)k$. We continue to check
that bit position. The error magnitude $2^{ik}$ is added to
the partial error distance and the partial probability is
multiplied by the value $e_{i,j}$. This is shown in Line 6
of Algorithm 1.
2) There is no 1 at the bit position ik in the error pattern.
Then, we further check the bit position (i + 1)k. The
partial error distance and probability do not change.
This is shown in Line 7 of Algorithm 1.
Algorithm 1 ED(i, j, ePar, pPar): a recursive helper function to obtain the error distribution.
1: if i ≥ m then
2:     pPar = pPar · d_{m−j−1};
3:     Print out ePar and pPar;
4:     return;
5: end if
6: ED(i + t + 1, i, ePar + 2^{ik}, pPar · e_{i,j});
7: ED(i + 1, j, ePar, pPar);
8: return;
When i ≥ m, ePar becomes the complete error distance.
The probability of ePar should be pPar multiplied by
dm−j−1, which accounts for the probability that there is no
1 on the left of position jk in the error pattern. The initial
function call is ED(t + 1, 0, 0, 1), because the rightmost
position in an error pattern where a 1 can occur is the
position (t + 1)k, the partial error distance is 0, and the
partial probability is 1.
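Algorithm 1 can be transcribed almost line by line. The following Python sketch is an illustration only: it covers the case where $l$ is a multiple of $k$, assumes uniformly distributed inputs (so that $P(P_i) = 2^{-k}$ and $P(G_i) = P(K_i) = (1 - 2^{-k})/2$), and collects the results in a dictionary instead of printing them. The names `d_values` and `error_distribution` are hypothetical.

```python
from fractions import Fraction


def d_values(m, k, t):
    """[d_0, ..., d_{m-1}] for l = t*k under uniform inputs (assumption of this sketch)."""
    p = Fraction(1, 2 ** k)
    g = (1 - p) / 2                      # P(G_i) = P(K_i)
    d = []
    for i in range(m):
        if i <= t:
            d.append(Fraction(1))
            continue
        total, prop = p ** i, Fraction(1)
        for j in range(1, i + 1):
            # generate and kill terms for j <= t, kill term only for j > t
            total += prop * g * d[i - j] * (2 if j <= t else 1)
            prop *= p
        d.append(total)
    return d


def error_distribution(m, k, t):
    """Map each error distance to its probability, following Algorithm 1."""
    p = Fraction(1, 2 ** k)
    d = d_values(m, k, t)
    c = p ** t * (1 - p) / 2             # P(P_{i-1}) ... P(P_{i-t}) P(G_{i-t-1})
    dist = {}

    def ed(i, j, e_par, p_par):
        if i >= m:                       # Lines 1-5: complete pattern reached
            dist[e_par] = dist.get(e_par, 0) + p_par * d[m - j - 1]
            return
        e_ij = c * d[i - t - j - 1]      # Eq. (23)
        ed(i + t + 1, i, e_par + (1 << (i * k)), p_par * e_ij)   # Line 6
        ed(i + 1, j, e_par, p_par)                               # Line 7

    ed(t + 1, 0, 0, Fraction(1))
    return dist
```

As a usage example, `error_distribution(4, 1, 1)` describes a 4-bit adder with 1-bit blocks: the error distance is 0 with probability 3/4, and 4 or 8 with probability 1/8 each. The recursion depth is at most $m$, so Python's default recursion limit is not a concern for realistic adder sizes, although the number of leaves still grows as analyzed in Section 5.3.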
5.2 Carry Generator Length l is not a Multiple of Block Size k
This situation is slightly more complicated. However, the overall analysis flow is similar to the case where $l$ is a multiple of $k$. As in Section 4.2, we define $t = \lfloor l/k \rfloor$ and $k' = l - tk$. In this situation, Lemma 1 is changed to the following one.
Lemma 3
There exists an error in the approximate sum of a partition if
and only if either of the following two events happens:
1) PLie−1 = GRie−1 = 1 and im− ie ≥ t, where PLi and
GRi are defined in Section 4.2.
2) GLie−1 = 1 and im − ie ≥ t + 1.
Proof: “if” part: If the event 1 happens, then the
speculated carry-in c∗ie+t = 0, because the carry generator
for the carry-in c∗ie+t covers blocks ie + t− 1, . . . , ie plus the
left group of block ie− 1, and their group propagate signals
are all 1. On the other hand, the correct carry-in cie+t = 1.
Thus, the sum at block (ie + t) is incorrect. By a similar
argument, if the event 2 happens, then the speculated carry-
in c∗ie+t+1 = 0, while the correct carry-in cie+t+1 = 1. Thus,
the sum at block (ie + t + 1) is incorrect.
“only if” part: we prove by contradiction. If neither of the two events happens, then there are four cases: (1) $KL_{i_e-1} = 1$, (2) $PL_{i_e-1} = KR_{i_e-1} = 1$, (3) $PL_{i_e-1} = GR_{i_e-1} = 1$ and $i_m - i_e \le t - 1$, and (4) $GL_{i_e-1} = 1$ and $i_m - i_e \le t$. For all cases, the speculated carry-ins $c^*_{i_b}, \ldots, c^*_{i_e}$ are all correct and hence, the approximate sum of the partition is correct.
As a result, Theorem 1 is changed to the following one.
Theorem 3
A 1 in an error pattern can only occur at the bit position ik,
where i is an integer. It occurs at the bit position ik if and
only if either of the following two events happens:
1) Pi−1 = · · · = Pi−t = 1, PLi−t−1 = 1, and GRi−t−1 =
1.
2) Pi−1 = · · · = Pi−t−1 = 1 and GLi−t−2 = 1.
Given the above theorem, we can see that if two adjacent
1’s in an error pattern are at positions ik and jk (i > j),
respectively, then i − j > t. For the case where l is not a
multiple of k, we still use ei,j to denote the probability that
given there is a 1 at position jk in an error pattern, the next
1 appears at a position ik, where i > j + t. However, to
obtain $e_{i,j}$, we need to distinguish between $i = j + t + 1$ and
$i > j + t + 1$.
Given that there is a 1 at position jk in an error pattern,
the next 1 appears at a position ik where i = j + t + 1 if
and only if Pi−1 = · · · = Pi−t = 1, PLi−t−1 = 1, and
GRi−t−1 = 1. The probability is
$$e_{i,i-t-1} \triangleq P(P_{i-1}) \cdots P(P_{i-t}) P(PL_{i-t-1}) P(GR_{i-t-1}).$$
Given that there is a 1 at position jk in an error pattern,
the next 1 appears at a position ik where i > j+ t+ 1 if and
only if either of the following two events happens:
1) Pi−1 = · · · = Pi−t = 1, PLi−t−1 = 1, GRi−t−1 = 1,
and the inputs at blocks i− t− 2, . . . , j make the event
Di−t−j−1 happen.
2) Pi−1 = · · · = Pi−t−1 = 1, GLi−t−2 = 1, and the
inputs at blocks i−t−3, . . . , j make the event Di−t−j−2
happen.
The probability that one of the above two events happens is
$$
\begin{aligned}
e_{i,j} \triangleq{}& P(P_{i-1}) \cdots P(P_{i-t}) P(PL_{i-t-1}) P(GR_{i-t-1})\, d_{i-t-j-1} \\
&+ P(P_{i-1}) \cdots P(P_{i-t-1}) P(GL_{i-t-2})\, d_{i-t-j-2}.
\end{aligned}
$$
The rightmost position in an error pattern where a 1 can occur is the position $(t + 1)k$. It can be shown that the probability that the rightmost 1 is at the position $(t + 1)k$ is
$$e_{t+1,0} \triangleq P(P_t) \cdots P(P_1) P(PL_0) P(GR_0),$$
while the probability that the rightmost 1 is at the position $ik$ ($i > t + 1$) is
$$
\begin{aligned}
e_{i,0} \triangleq{}& P(P_{i-1}) \cdots P(P_{i-t}) P(PL_{i-t-1}) P(GR_{i-t-1})\, d_{i-t-1} \\
&+ P(P_{i-1}) \cdots P(P_{i-t-1}) P(GL_{i-t-2})\, d_{i-t-2}.
\end{aligned}
$$
The probability that given there is a 1 at position ik (i ≤
m−1) in the error pattern, there is no 1 on the left of position
ik is still dm−i−1, the same value as we have for the case
where l is a multiple of k.
For the case where l is not a multiple of k, Theorem 2 still
holds, which characterizes the error pattern and probability.
The only difference is that the values ei,j are replaced with
the new ei,j obtained in this section. Finally, the procedure
to obtain the error distribution is similar to that used for the
case where l is a multiple of k. Again, the only change is to
use the new values of ei,j obtained in this section.
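The two cases for the new $e_{i,j}$ can be folded into one small function. The Python sketch below is a hedged illustration, not the implementation used in the experiments. It assumes uniformly distributed inputs and assumes that the left group of a block consists of its $k' = l - tk$ most significant bits and the right group of the remaining $k - k'$ bits, which gives $P(PL) = 2^{-k'}$, $P(GL) = (1 - 2^{-k'})/2$, and $P(GR) = (1 - 2^{-(k-k')})/2$. The name `transition_prob` is hypothetical, and the list `d` of $d$ values is supplied by the caller.

```python
from fractions import Fraction


def transition_prob(i, j, k, l, d):
    """e_{i,j}: probability that the next 1 on the left of position jk is at position ik.

    d is the list [d_0, d_1, ...] obtained from the recursion of Section 4.
    Use j = 0 for the rightmost 1. Uniform inputs are assumed.
    """
    t, k_left = divmod(l, k)
    if i < j + t + 1:
        raise ValueError("adjacent 1s must be more than t blocks apart")
    p = Fraction(1, 2 ** k)                          # P(P) of a whole block
    pl = Fraction(1, 2 ** k_left)                    # P(PL): left group propagates
    gl = (1 - pl) / 2                                # P(GL): left group generates
    gr = (1 - Fraction(1, 2 ** (k - k_left))) / 2    # P(GR): right group generates
    e = p ** t * pl * gr * d[i - t - j - 1]          # first event
    if i > j + t + 1:                                # second event needs one more block
        e += p ** (t + 1) * gl * d[i - t - j - 2]
    return e
```

When $l$ is a multiple of $k$, we have $k' = 0$, so `pl` is 1, `gl` is 0, and `gr` equals $P(G)$ of a whole block; the function then reduces to Eq. (23). This is why a single implementation of Algorithm 1 can serve both cases once the $e_{i,j}$ values are computed this way.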
5.3 Time Complexity Analysis
Fig. 9: Function calls of Algorithm 1 for a 32-bit CSAA with k = 4 and l = 8. (Leaf vectors, from left to right: [6, 3], [7, 3], [3], [7, 4], [4], [5], [6], [7], ∅.)
As shown in Algorithm 1, the time complexity of the
proposed method is proportional to the number of calls of
the function ED. Since ED is a recursive function and it
calls itself twice in each function call, the process of the
function calls can be modeled as a binary tree. For example,
the binary tree representing the function calls for a 32-bit
CSAA with k = 4 and l = 8 is shown in Fig. 9. Each pair
(x, y) in this figure represents one function call of the form
ED(i, j, ePar, pPar), where x is equal to i and y is equal to
j. The internal nodes in this binary tree are represented by
square nodes, while the leaf nodes are represented by round
nodes. At the leaf, there is no further recursive function call,
since i ≥ m = n/k = 32/4 = 8. The vector under each
leaf node represents the error pattern produced at that leaf.
A number j in the vector indicates that there is a 1 at the
bit position jk of the error pattern. For example, the vector
[6, 3] represents an error pattern with 1s at bit positions 24
and 12. Note that the rightmost leaf corresponds to an all-
zero error pattern. Thus, the vector is ∅. We can see that the
number of error patterns is equal to the number of leaves
in the binary tree. Denote the number of error patterns as N.
Then the total number of nodes in the binary tree is 2N − 1.
Since each call of ED corresponds to a node in the binary
tree, the runtime of the proposed method is proportional to
2N − 1, which implies that the time complexity is O(N). On
the other hand, any algorithm for obtaining the error distribution
must have a runtime of Ω(N), since
it must produce all the error patterns and the associated
probabilities. Therefore, the proposed algorithm achieves
the theoretical lower bound on the asymptotic runtime.
To further quantify the runtime, we analyze the number
of error patterns for a given block-based approximate adder.
We show how we derive the number for a block-based
approximate adder using an example. Consider an adder
with n = 32, k = 4, and l = 4. The number of blocks
m = n/k = 8 and t = l/k = 1. The following is a list of
all the non-zero error patterns, represented using the vector
representation shown in Fig. 9:
[7],
[6],
[5], [7, 5],
[4], [6, 4], [7, 4],
[3], [5, 3], [7, 5, 3], [6, 3], [7, 3],
[2], [4, 2], [6, 4, 2], [7, 4, 2], [5, 2], [7, 5, 2], [6, 2], [7, 2].
Note that the error patterns in the same row above have
their lowest 1 at the same bit position. For example, in the
fourth row, all three error patterns have their lowest 1 at
bit position 4k = 16.
If we denote the number of error patterns with lowest 1 at bit position $ik$ as $x_{m-i}$, we have $x_1 = x_2 = 1$, $x_3 = 2$, $x_4 = 3$, $x_5 = 5$, and $x_6 = 8$ for this example. As we showed in Section 5.1, the lowest bit position in an error pattern where a 1 can occur is $(t + 1)k$. Therefore, the maximal index of the sequence $x_i$ is $m - t - 1 = 6$. Based on the numerical values of the $x_i$'s, we find that they satisfy the following pattern:
$$x_i = 1, \tag{24}$$
for all $1 \le i \le 2$, and
$$x_i = x_{i-2} + x_{i-3} + \cdots + x_1 + 1, \tag{25}$$
for all $2 < i \le 6$. Indeed, the above two equations make
sense. For any 1 ≤ i ≤ t + 1 = 2, if an error pattern has
its lowest 1 at bit position (m − i)k, then by Theorem 2, it
cannot have any 1s on the left of that bit position. Therefore,
the number of error patterns with the lowest 1 at bit position
(m−i)k is just 1, and hence Eq. (24) holds. For any 2 < i ≤ 6,
the set of error patterns with the lowest 1 at bit position
$(m - i)k$ can be partitioned into $(i - 1)$ subsets. Subset $j$ ($1 \le j \le i - 2$) contains all the error patterns with their second lowest 1 at bit position $(m - i + 1 + j)k$. Subset $(i - 1)$ contains all the error patterns with no 1 on the left of bit position $(m - i)k$. The number of error patterns in subset $j$ ($1 \le j \le i - 2$) is equal to the number of error patterns with the lowest 1 at bit position $(m - i + 1 + j)k$. According to the definition, this number is just $x_{i-j-1}$. The number of error patterns in subset $(i - 1)$ is just 1. Therefore, we have
$$x_i = \sum_{j=1}^{i-2} x_{i-j-1} + 1 = x_{i-2} + x_{i-3} + \cdots + x_1 + 1.$$
Thus, Eq. (25) also holds.
Indeed, in the general case, no matter whether $l$ is a multiple of $k$ or not, we have
$$x_i = 1, \tag{26}$$
for all $1 \le i \le t + 1$, and
$$x_i = x_{i-t-1} + x_{i-t-2} + \cdots + x_1 + 1, \tag{27}$$
for all $t + 1 < i \le m - t - 1$. Note that the total number of non-zero error patterns is given by $x_{m-t-1} + x_{m-t-2} + \cdots + x_1$. To count the total number of error patterns, we should also include the zero error pattern. Therefore, the total number of error patterns is
$$N = x_{m-t-1} + x_{m-t-2} + \cdots + x_1 + 1.$$
If we extend the recursive definition of $x_i$ shown in Eq. (27) also to $i > m - t - 1$, then the number of error patterns is just $x_m$.
Note that by Eq. (27), we also have
$$x_i = x_{i-t-1} + x_{i-1}, \tag{28}$$
for all $i > t + 1$.
In summary, the number of error patterns is given by $x_m$, where $x_i$ ($i \ge 1$) is recursively defined by Eq. (26) and (28). When $t = 0$, we have $x_m = 2^{m-1}$. Therefore, the runtime of the proposed algorithm is $O(2^m)$. When $t = 1$, the sequence $x_i$ is the Fibonacci sequence and $x_m$ is the $m$-th value in the sequence, which is equal to $\frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^m - \left(\frac{1-\sqrt{5}}{2}\right)^m\right)$. Therefore, the runtime of the proposed algorithm is $O\left(\left(\frac{\sqrt{5}+1}{2}\right)^m\right)$.
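The count $x_m$ is easy to tabulate and to cross-check against a direct enumeration. The following Python sketch evaluates Eq. (26) and Eq. (28) and, independently, lists every set of block positions in $\{t+1, \ldots, m-1\}$ whose adjacent elements differ by more than $t$, as required by Theorem 2. The function names are hypothetical and the enumeration is meant only for small $m$.

```python
from itertools import combinations


def count_patterns(m, t):
    """x_m from Eq. (26) and Eq. (28): number of error patterns, including the zero pattern."""
    x = [0] * (m + 1)                    # x[0] is unused
    for i in range(1, m + 1):
        x[i] = 1 if i <= t + 1 else x[i - t - 1] + x[i - 1]
    return x[m]


def enumerate_patterns(m, t):
    """All valid error patterns as ascending tuples of block indices (brute force)."""
    candidates = range(t + 1, m)
    return [
        combo
        for size in range(len(candidates) + 1)
        for combo in combinations(candidates, size)
        if all(high - low > t for low, high in zip(combo, combo[1:]))
    ]
```

For the two examples in this section, `count_patterns(8, 1)` gives 21 (the 20 non-zero patterns listed above plus the zero pattern) and `count_patterns(8, 2)` gives 9, the number of leaves in Fig. 9.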
6 EXPERIMENTAL RESULTS
We implemented our methods for calculating the error rate and the error distribution in C++. Section 6.1 shows the error distributions of several block-based approximate adders obtained by our method and analyzes the special pattern of these distributions. We compare the runtime and accuracy of the proposed method with those of other existing methods in Sections 6.2 and 6.3, respectively.
6.1 Error Distribution Study
Exact error distributions were generated for four block-
based approximate adders using our method, which are
ETA-IV with k = 4 and l = 2, ETA-II with k = 4 and l = 4,
CSAA with k = 4 and l = 8, and a block-based approximate
adder with k = 4 and l = 10. All these adders are 64-bit in
size. The scatter diagrams of the error distributions of the
four adders are shown in Fig. 10.
In Fig. 10, each circle represents an error distance and its associated probability. In each plot, the x-axis is the error distance in the log2 scale and the y-axis is the probability, also in the log2 scale. In each figure, the scatter points approximately form a triangle, inside which the points are located in a regular way. However, if we zoom in on the figures, we find that some circles overlap with others. The circles with a thicker outline and the solid rounded rectangles in the figures actually result from the clustering of a number of circles. The reason for the clustering along the x-axis is that the error distance of an error pattern is dominated by the highest 1 in the error pattern. For example, if the highest 1 of an error pattern appears at bit position 36, then the 1s on the right of bit position 36 in the error pattern can only appear at bit positions lower than or equal to 32. Thus, the error distance is close to $2^{36}$ (indeed, the error distance is larger than or equal to $2^{36}$), which is approximately 36 in the log2 scale. Thus, all error patterns that have their leading 1s at the same location are located very closely in the x dimension. Since the leading 1 in an error pattern can only occur at bit positions $ik$, where $k = 4$ and $i$ is an integer, the clusters are located near $x = 4i$ in the log2 scale, as shown in the figures.
Moreover, we can also observe from the figures that for
points with close error distances, they cluster into a number
of groups along the y-axis. The reason behind this is that
among the error patterns with the leading 1 at the same
location, the patterns with the same number of 1s have
similar probabilities. We use the 64-bit CSAA with k = 4
and l = 8 to illustrate this. Consider all error patterns
with their error distances around 36 in the log2 scale. Their
leading 1s are all at bit position 36. Since t = l/k = 2, two
adjacent 1s in any error pattern must be at least 12 bits (or 3
blocks) away and the lowest bit position that can have a 1 is
12. Thus, there are at most three 1s in these error patterns.
We divide them into three sets based on the number of 1s in
the error pattern. The first set consists of the error patterns
with a single 1. Using the vector representation shown in
Fig. 9, the set includes only one error pattern of the form
[9]. By Theorem 2, its probability is d6 · e9,0. The second set
is composed of error patterns with two 1s. They are [9, 6],
[9, 5], [9, 4], and [9, 3]. Their probabilities are d6 · (e9,6e6,0),
d6 · (e9,5e5,0), d6 · (e9,4e4,0), and d6 · (e9,3e3,0), respectively.
The third set consists of error patterns with three 1s. The
set only has one element, which is [9, 6, 3]. Its probability
is d6 · (e9,6e6,3e3,0). It can be seen that the probability of
an error pattern with q 1s has q ei,j terms in its product
expression.
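The three sets in this example can be reproduced mechanically. The Python sketch below lists all valid error patterns with a given leading 1, using the vector representation of Fig. 9 and the constraints of Theorem 2 (lowest position $t + 1$, adjacent 1s more than $t$ blocks apart), and groups them by their number of 1s. It is an illustration with hypothetical function names, not part of the proposed method.

```python
from collections import Counter


def patterns_with_leading_one(lead, t):
    """All valid error patterns, as descending lists of block indices, whose highest 1 is at `lead`."""
    def extend(prefix):
        yield prefix
        # The next lower 1 must be more than t blocks away and not below block t + 1.
        for nxt in range(prefix[-1] - t - 1, t, -1):
            yield from extend(prefix + [nxt])
    return list(extend([lead]))


def count_by_number_of_ones(lead, t):
    """Number of patterns with leading 1 at `lead`, keyed by how many 1s they contain."""
    return Counter(len(pattern) for pattern in patterns_with_leading_one(lead, t))
```

For the CSAA example ($t = 2$, leading 1 at block 9), the sketch yields the six patterns [9], [9, 6], [9, 6, 3], [9, 5], [9, 4], and [9, 3], that is, one pattern with a single 1, four with two 1s, and one with three 1s.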
Now consider ei,j . Since for the example approx-
imate adder we consider, its l is a multiple of k,
the value ei,j is given by Eq. (23), i.e., ei,j =
P (Pi−1) · · ·P (Pi−t)P (Gi−t−1)di−t−j−1. Under the as-
sumption that the inputs are uniformly distributed, the
value P (Pi−1) · · ·P (Pi−t)P (Gi−t−1) is a constant for a
fixed t. We denote this constant as c. Then, we can represent
the value of ei,j as c · di−t−j−1. Therefore, the value of ei,j
is proportional to di−t−j−1. The sequence of black dots in
Fig. 11 shows the values of di for 0 ≤ i ≤ m−1 for the CSAA
with k = 4 and l = 8. From the figure, we can see that all
di’s are very close. As a result, all ei,j ’s for 0 ≤ i, j ≤ m− 1
and i − j > t are also very close. Consequently, we can
see that the error patterns in the same set have very close
probabilities, since their probability expressions contain the
same number of ei,j terms. Therefore, the error patterns in
the same set also cluster in the y dimension. Since there
are three sets, the number of clusters is three, as shown in
Fig. 10c. Furthermore, since the number of $e_{i,j}$ terms in the probability expression of an error pattern is equal to the number of 1s in the error pattern, the probability of an error pattern in the log2 scale is approximately linear in the number of 1s in the error pattern. This explains why the distances along the y dimension between any two adjacent clusters are the same. Similar analysis can be made for points with other error distances.
Fig. 10: Scatter diagrams for error distribution produced by our method for: (a) ETA-IV with k = 4 and l = 2; (b) ETA-II with k = 4 and l = 4; (c) CSAA with k = 4 and l = 8; (d) a block-based approximate adder with k = 4 and l = 10. (x-axis: error distance in the log2 scale; y-axis: probability in the log2 scale.)
Fig. 11: d values for ETA-IV with k = 4 and l = 2, ETA-II with k = 4 and l = 4, CSAA with k = 4 and l = 8, and a block-based approximate adder with k = 4 and l = 10. (x-axis: i; y-axis: d_i.)
However, we found that the clustering of the points
along the y-axis is not so strong for ETA-IV, as shown in
Fig. 10a. The reason lies in that di’s for i = 0, 1, · · · ,m − 1
of this adder are quite different from each other, as shown
by the sequence of red dots in Fig. 11, making the values of
ei,j also vary greatly for different (i− j) (i− j > t). Hence,
for error patterns with close error distances, although some
of them have the same number of 1s, their probabilities are
not very close.
Based on the above analysis, we can see that the number of clusters over the error patterns with close error distance (i.e., error patterns with the leading 1 at the same bit position) is equal to the maximal number of 1s in such error patterns. Suppose the leading 1 is at bit position $ik$. By Theorem 2, the maximal number of 1s in such error patterns is $\lfloor i/(t+1) \rfloor$. Therefore, the number of clusters on the error patterns with error distance close to $ik$ (in the log2 scale) is equal to $\lfloor i/(t+1) \rfloor$. This explains the trend of how the number of clusters increases with the error distance $ik$ as shown in the figures. For example, for the ETA-II, as shown in Fig. 10b, starting from the first error distance $4 \cdot 2$, two adjacent error distances $4i$ and $4(i+1)$ have the same number of clusters. Then, the next two adjacent error distances $4(i+2)$ and $4(i+3)$ have one more cluster than the previous two. This pattern continues until the last error distance. This is because for ETA-II, $t = 1$ and hence the number of clusters for an error distance $ik$ is equal to $\lfloor i/2 \rfloor$. From the figures, we can also see that the minimal non-zero error distances for the four approximate adders are different, while the maximal error distances are close. The minimal non-zero error distances for ETA-IV, ETA-II, CSAA, and the block-based approximate adder with $k = 4$ and $l = 10$ are 4, 8, 12, and 12 in the log2 scale, respectively, while the maximal error distances are all close to 60. This is reasonable, since by Theorem 2, for a non-zero error pattern, the lowest and the highest bit positions where a 1 can occur are $(t+1)k$ and $(m-1)k$, respectively.
Fig. 12: Distributions on the sum of probabilities of all the error patterns with close error distance for: (a) ETA-IV with k = 4 and l = 2; (b) ETA-II with k = 4 and l = 4; (c) CSAA with k = 4 and l = 8; (d) a block-based approximate adder with k = 4 and l = 10. (x-axis: error distance in the log2 scale; y-axis: probability.)
Given that the error patterns are clustered around error distance $ik$ in the log2 scale, we are also interested in studying how the sum of the probabilities of all the error patterns with distance around $ik$ changes with the value $ik$. For the four approximate adders above, we obtained the sums and plotted the bar graphs shown in Fig. 12. The height of each bar located at error distance $ik$ (in the log2 scale) also represents the probability of an error pattern with the leading 1 at the bit position $ik$, or equivalently the probability of an error pattern with error distance in the range $[2^{ik}, 2^{(i+1)k})$. We denote such a probability as $P(LO = ik)$. From the figures, we can see that as $i$ increases, the probability $P(LO = ik)$ either increases or remains unchanged. However, for different adders, the trends are different. To understand the reason, we analyze the probability $P(LO = ik)$. Indeed, we have
$$P(LO = ik) = P(e[> ik] = 0 \mid e[ik] = 1)\, P(e[ik] = 1),$$
where P (e[> ik] = 0|e[ik] = 1) is the probability that there
is no 1 on the left of the bit position ik under the condition
that there is a 1 at bit position ik, and P (e[ik] = 1) is the
probability that there is a 1 at the bit position ik. From
Section 5, we know that P (e[> ik] = 0|e[ik] = 1) = dm−i−1.
Now we only need to obtain P (e[ik] = 1).
If $l$ is a multiple of $k$, then by Theorem 1,
$P(e[ik] = 1) = P(P_{i-1}) \cdots P(P_{i-t}) P(G_{i-t-1})$. Under the
assumption that the inputs are uniformly distributed, the product is a
constant for a fixed $t$ and hence $P(LO = ik)$ is proportional
to $d_{m-i-1}$. This can be verified by Fig. 12b and 12c, which
show the bar graphs for ETA-II and CSAA, respectively.
Indeed, as the error distance $ik$ increases from $(t + 1)k$ to
$(m - 1)k$, the bar heights follow the trend of the sequence
$d_{m-t-2}, d_{m-t-3}, \ldots, d_0$, which can be
observed from Fig. 11, in which the blue dots correspond
to the sequence $d_0, d_1, \ldots, d_{m-1}$ for ETA-II
and the black dots correspond to the sequence
for CSAA. As shown in Fig. 11, for CSAA, the sequence
$d_0, \ldots, d_{m-1}$ decreases very slowly. This explains why the
bars in the graph for CSAA increase very slowly.
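To make the "constant for a fixed t" statement concrete, the sketch below evaluates the product for uniformly distributed inputs. It assumes the usual block-level signal definitions, which are not restated in this section: a block propagates when every one of its k bit pairs differs, and generates when the k-bit block addition with carry-in 0 produces a carry-out. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def block_signal_probs(k: int) -> tuple[Fraction, Fraction]:
    """Return (P(propagate), P(generate)) of one k-bit block for uniform inputs."""
    full = (1 << k) - 1
    prop = gen = 0
    for a, b in product(range(1 << k), repeat=2):
        prop += (a ^ b) == full   # every bit position propagates
        gen += (a + b) > full     # carry-out is 1 with carry-in 0
    total = 1 << (2 * k)
    return Fraction(prop, total), Fraction(gen, total)


def prob_error_bit(k: int, t: int) -> Fraction:
    """P(e[ik] = 1) = P(P)^t * P(G) when l = t*k; the value does not depend on i."""
    p, g = block_signal_probs(k)
    return p ** t * g


print(float(prob_error_bit(4, 1)))  # ETA-II with k = l = 4
print(float(prob_error_bit(4, 2)))  # CSAA with k = 4, l = 8
```

Under these assumptions the two values are 15/512 (about 0.029) and 15/8192 (about 0.0018), which is in line with the magnitudes plotted in Fig. 12b and 12c. Exact fractions are used so that the result is free of rounding.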
If $l$ is not a multiple of $k$, then $P(e[ik] = 1)$ can be
obtained by Theorem 3. However, in this case, the value of
$P(e[ik] = 1)$ for $i = t + 1$ differs from its value for
$i > t + 1$. Indeed, we have
\[
P(e[ik] = 1) =
\begin{cases}
A(i), & i = t + 1, \\
A(i) + B(i), & i > t + 1,
\end{cases}
\]
where $A(i) = P(P_{i-1}) \cdots P(P_{i-t}) P(P^{L}_{i-t-1}) P(G^{R}_{i-t-1})$
and $B(i) = P(P_{i-1}) \cdots P(P_{i-t-1}) P(G^{L}_{i-t-2})$. Under the
assumption that the inputs are uniformly distributed, both
$A(i)$ and $B(i)$ are positive constants independent of $i$. This
explains the special pattern shown in Fig. 12d for the block-based
approximate adder with k = 4 and l = 10. From the
figure, we can see that the heights of the second bar up to the
last bar are almost the same. This is because $P(e[ik] = 1)$ is
the same for all $i > t + 1$ and the values $d_0, d_1, \ldots, d_{m-1}$
are very close, as shown by the sequence of cyan dots in Fig. 11.
However, the first bar is clearly shorter than the other bars.
This is because $P(e[(t + 1)k] = 1)$ is smaller than any other
$P(e[ik] = 1)$ ($i > t + 1$) by the positive constant $B(i)$. The
same reasoning also explains the pattern shown in Fig. 12a for
ETA-IV. Specifically, the trend from the second bar to the last bar
follows the trend of the sequence $d_{m-3}, d_{m-4}, \ldots, d_0$, which
is shown by the sequence of red dots in Fig. 11. The first bar
is shorter than the second bar because $P(e[(t + 1)k] = 1)$ is
smaller than $P(e[(t + 2)k] = 1)$ and $d_{m-2} < d_{m-3}$.
6.2 Runtime Study
In this section, we compare the runtime of our method
for obtaining the error distribution with those of the Monte
Carlo sampling method, which randomly chooses a subset of input
combinations for simulation, and the exhaustive method,
which enumerates all the input combinations. All the methods
were implemented in C++. All the experiments were
conducted on a virtual machine running Linux
with 1 GB of memory. The host machine is a 3.1 GHz
desktop. We also compare the asymptotic runtime of our
method with that of the exact method proposed in [12].
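The two simulation baselines described above can be sketched as follows. This is an illustration in Python, not the C++ implementation used in the experiments, and it rests on an assumed adder model: the n-bit operands are split into m = n/k blocks, the carry-in of each block is the carry-out of an l-bit carry generator fed with the bits right below the block and a carry-in of 0, and the carry-out of the top block is kept as bit n of the sum. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def approx_add(a: int, b: int, n: int, k: int, l: int) -> int:
    """Block-based approximate sum of two n-bit operands (assumed model)."""
    m, mask = n // k, (1 << k) - 1
    total = 0
    for i in range(m):
        lo = max(0, i * k - l)            # lowest bit seen by the carry generator
        width = i * k - lo                # number of bits in the generator window
        wa = (a >> lo) & ((1 << width) - 1)
        wb = (b >> lo) & ((1 << width) - 1)
        cin = (wa + wb) >> width          # speculated carry; generator carry-in is 0
        s = ((a >> (i * k)) & mask) + ((b >> (i * k)) & mask) + cin
        if i < m - 1:
            s &= mask                     # inner blocks drop their carry-out
        total += s << (i * k)
    return total


def exhaustive_distribution(n: int, k: int, l: int) -> dict[int, float]:
    """Error distribution obtained by enumerating all 4**n input pairs."""
    counts = Counter((a + b) - approx_add(a, b, n, k, l)
                     for a, b in product(range(1 << n), repeat=2))
    return {e: c / 4 ** n for e, c in counts.items()}


def monte_carlo_distribution(n: int, k: int, l: int,
                             samples: int, seed: int = 0) -> dict[int, float]:
    """Error distribution estimated from uniformly sampled input pairs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        counts[(a + b) - approx_add(a, b, n, k, l)] += 1
    return {e: c / samples for e, c in counts.items()}


exact = exhaustive_distribution(8, 2, 2)            # small enough to enumerate
estimate = monte_carlo_distribution(8, 2, 2, 2000)
print(sorted(exact), sorted(estimate))
```

The exhaustive loop visits every input pair, so its cost grows as 4^n, while the cost of the sampling loop grows linearly with the number of samples.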
Fig. 13 shows the runtime of the proposed method to
obtain the error distribution and that of the Monte Carlo method
with 10K, 100K, and 1M samples on four approximate adders:
ETA-IV with k = 4 and l = 2, ETA-II with k = l = 4, CSAA
with k = 4 and l = 8, and a block-based approximate adder
with k = 4 and l = 10. For each type of adder, we ran
experiments with four different block numbers, m = 4, 8, 12, and 16.
Thus, the operand size n varies from 16 to 64.
From the four plots in Fig. 13, we can see that the
proposed method consumes much less time than the Monte
Carlo method. For all the experiments, the proposed method
takes less than 0.02 seconds. For the three 64-bit approximate
adders with l = 4, l = 8, and l = 10, the proposed method
needs less than 0.002 seconds. In contrast, for the Monte
Carlo method with 10K samples, the runtime ranges from 0.05 to 0.9
seconds. The runtime of the Monte Carlo method increases
linearly with the sample size. When the sample size reaches 1M,
its runtime ranges from 5 to 90 seconds.
Fig. 13: Runtime comparison between the proposed method to obtain the error distribution and the Monte Carlo (MC) sampling
method for: (a) ETA-IV with k = 4 and l = 2; (b) ETA-II with k = 4 and l = 4; (c) CSAA with k = 4 and l = 8; (d) a block-based
approximate adder with k = 4 and l = 10. In each panel, the horizontal axis is the number of blocks and the vertical axis is the
runtime in seconds (log scale); the curves are the proposed method, MC (10K), MC (100K), and MC (1M).
TABLE 1: Runtime of the exhaustive method in seconds.
Adder                         n = 8    n = 12   n = 16
ETA-IV, k = 4, l = 2          0.09     30       7949
ETA-II, k = 4, l = 4          0.148    54       18877
CSAA, k = 4, l = 8            0.209    83       31340
Block-based, k = 4, l = 10    0.148    97       36668
Comparing the four plots in Fig. 13, we can also see
that the runtime of the Monte Carlo method increases with
the carry generator size l. This is reasonable since a larger
carry generator implies a longer simulation time. In
contrast, the runtime of the proposed method decreases
with l. In particular, our method takes much more time for
ETA-IV with l = 2 than for the other three adders when the
number of blocks is 12 or 16. This is also reasonable.
As we discussed in Section 5.3, the runtime of the proposed
method is proportional to N, where N is the total number
of error patterns of a block-based approximate adder. For
fixed n and k, as l decreases, the value $t = \lfloor l/k \rfloor$ decreases.
Consequently, the number of error patterns increases.
According to our experimental results, a 64-bit ETA-IV with
k = 4 and l = 2 has 32768 error patterns, while a 64-bit
ETA-II with k = 4 and l = 4 has 987 error patterns. The
numbers of error patterns for a 64-bit CSAA with k = 4
and l = 8 and a 64-bit block-based approximate adder with
k = 4 and l = 10 are both 189, since both adders have t = 2.
Therefore, the 64-bit ETA-IV
consumes much more time than the other three adders.
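The three counts quoted above can be reproduced with a short recurrence. The sketch below is not the procedure of Section 5.3; it rests on an assumption of ours, namely that an error pattern may hold a 1 only at bit positions ik with t + 1 ≤ i ≤ m − 1 and that any two 1s are at least t + 1 blocks apart, with the all-zero pattern counted as well. The function name is hypothetical.

```python
def count_error_patterns(m: int, t: int) -> int:
    """Count 0/1 patterns over block positions t+1..m-1 whose 1s are
    at least t+1 positions apart (the all-zero pattern is included)."""
    length = max(0, m - 1 - t)
    # counts[j] = number of valid patterns on the first j positions
    counts = [1] * (length + 1)
    for j in range(1, length + 1):
        skip = counts[j - 1]               # position j holds a 0
        take = counts[max(0, j - 1 - t)]   # position j holds a 1, previous t hold 0
        counts[j] = skip + take
    return counts[length]


for t in (0, 1, 2):
    print(t, count_error_patterns(16, t))
```

For m = 16 this gives 32768, 987, and 189 for t = 0, 1, and 2, matching the reported numbers; the agreement supports the assumed spacing rule but does not prove it in general.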
To compare our method with the exhaustive method, we
obtained the runtime of the exhaustive method for the four
types of approximate adders with size n = 8, 12, and 16. The
results are shown in Table 1. We can see that when n = 16, the
runtime of the exhaustive method already exceeds 7000
seconds, which is far longer than that of our method. Indeed, the
exhaustive method has a time complexity of $O(4^n)$, where
n is the size of the adder, since it enumerates all $2^n \times 2^n$
input pairs. As a result, its runtime explodes
very quickly.
Finally, we also compared the asymptotic runtime of our
method with that of the method proposed in [12]. The asymptotic
runtime of the method in [12] is $O(2^m)$ and that of our
method is $O(N(m, t))$, where $N(m, t)$ is the total number
of error patterns of a block-based approximate adder,
determined by the number of blocks m and the value $t = \lfloor l/k \rfloor$.
The value $N(m, t)$ can be obtained by the method shown
in Section 5.3. Ignoring the constant factors, we used the ratio
$2^m / N(m, t)$ to measure the asymptotic runtime speed-up
of our method over the method in [12]. In Fig. 14, for each
t = 0, 1, 2, 3, we show the asymptotic runtime speed-up
versus m for m ranging from 1 to 16. We can see that
for block-based approximate adders with t = 0, such as
ETA-IV, both methods have the same asymptotic runtime.
However, for adders with t > 0, such as ETA-II and CSAA,
our method is much more efficient than the method in [12].
On the one hand, for any fixed t > 0, the speed-up grows
exponentially with m. On the other hand, for any fixed m,
the speed-up also increases with t, because the number of
error patterns drops as t increases. From our experimental
results, when m = 16, the speed-ups for t = 1, 2, and 3 are
approximately 66, 347, and 950, respectively, demonstrating
that our method is much more efficient than the method in [12]
in providing the exact error distributions for block-based
approximate adders.
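As a worked instance of this ratio, take m = 16 and the error-pattern counts reported earlier in this section for the 64-bit adders with k = 4, namely N(16, 0) = 32768, N(16, 1) = 987, and N(16, 2) = 189:

```latex
\[
\frac{2^{16}}{N(16,0)} = \frac{65536}{32768} = 2, \qquad
\frac{2^{16}}{N(16,1)} = \frac{65536}{987} \approx 66.4, \qquad
\frac{2^{16}}{N(16,2)} = \frac{65536}{189} \approx 346.8 .
\]
```

The last two values agree with the reported speed-ups of about 66 and 347, and the first one is a constant factor, which is consistent with both methods having the same asymptotic runtime for t = 0.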
6.3 Accuracy Study
Since the error distribution produced by our method is
exact, we can further obtain the exact error statistics, such
as error rate (ER), mean error distance (MED), and mean
square error (MSE). In this section, we demonstrate the
advantage of our method in accuracy by comparing it with
the method in [10] and the Monte-Carlo sampling method.
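Once an exact distribution is available, these statistics follow from a single pass over it. The sketch below assumes that the distribution is stored as a mapping from each error value e (exact sum minus approximate sum) to its probability, and it uses the common definitions ER = P(e ≠ 0), MED = E[|e|], and MSE = E[e²]. The container layout and the function name are our own choices, and the numbers in the usage lines are made up for illustration.

```python
def error_statistics(dist: dict[int, float]) -> tuple[float, float, float]:
    """Return (ER, MED, MSE) of an error distribution {error value: probability}."""
    er = sum(p for e, p in dist.items() if e != 0)     # error rate
    med = sum(abs(e) * p for e, p in dist.items())     # mean error distance
    mse = sum(e * e * p for e, p in dist.items())      # mean square error
    return er, med, mse


toy = {0: 0.5, 16: 0.25, 64: 0.25}   # illustrative values only
print(error_statistics(toy))
```

Because every term is weighted by an exact probability, the resulting ER, MED, and MSE carry no sampling error.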
 
Fig. 14: Asymptotic runtime speed-up of the proposed method
over the method in [12]. The horizontal axis is the number of blocks and the vertical axis is the
runtime speed-up (log scale), with one curve for each of t = 0, 1, 2, and 3.
To show the advantage in accuracy, we used the results
of the proposed method as the reference and computed the
relative errors of the ERs and MSEs obtained by the method
in [10] and by the Monte Carlo sampling method with sample
sizes of 10K, 100K, and 1M. Note that since the work [10]
has given an exact analytical expression for the mean error
distance (MED), we did not consider the relative errors of
MED in our experiment. The relative errors of ERs and
MSEs in percentage are shown in Table 2. The experiments
were performed on eight different 32-bit block-based
approximate adders. The types of the adders and their k and
l values are listed in the first column of the table. For each
sample size M in the Monte Carlo simulation, we ran the
entire M-sample simulation 100 times and obtained the
relative error for each simulation run. The final relative error
listed in the table is the average relative error over the 100 runs.
The results in the table show that the relative errors of the ERs
and the MSEs computed by the method in [10] can be up
to 4.7% and 8.6%, respectively. For the Monte Carlo method
with a sample size of 10K, the relative errors of the ERs and
MSEs can be up to 15.0% and 32.8%, respectively. When the
sample size increases, the relative errors decrease. However,
simulations with a larger sample size take longer.
TABLE 2: Relative errors in percentage of the method proposed
in [10] and the Monte Carlo method.
                                              Monte Carlo simulation
Adder                         Metric  [10]    10K      100K     1M
ETA-IV, k = 2, l = 1          ER      0.812   0.188    0.064    0.055
                              MSE     8.573   1.705    0.521    0.161
ETA-II, k = 2, l = 2          ER      4.658   0.436    0.133    0.041
                              MSE     1.541   2.439    0.669    0.222
CSAA, k = 2, l = 4            ER      3.713   1.374    0.367    0.144
                              MSE     0.098   4.531    1.652    0.502
Block-based, k = 2, l = 5     ER      0.142   1.843    0.582    0.211
                              MSE     0.037   6.629    2.049    0.813
ETA-IV, k = 4, l = 2          ER      0.381   0.676    0.189    0.064
                              MSE     1.254   2.236    0.706    0.231
ETA-II, k = 4, l = 4          ER      2.331   1.687    0.608    0.195
                              MSE     0.025   4.771    1.355    0.472
CSAA, k = 4, l = 8            ER      0.256   8.544    2.719    0.831
                              MSE     0       17.284   5.130    1.669
Block-based, k = 4, l = 10    ER      4.093   14.991   4.634    1.619
                              MSE     0       32.770   11.240   3.922
Fig. 15: Comparison between our method and the Monte Carlo sampling method with 100K samples in producing error
distribution for: (a) ETA-IV with k = 4 and l = 2; (b) ETA-II with k = 4 and l = 4; (c) CSAA with k = 4 and l = 8; (d) a
block-based approximate adder with k = 4 and l = 10. In each panel, the horizontal axis is the error distance (in $\log_2$ scale)
and the vertical axis is the probability.
It should be noted that the method in [10] can only
estimate the values of ER and MSE; it cannot provide error
distributions. The Monte Carlo method, in contrast, can
generate an error distribution through random simulation,
although the result is not exact. Next, we compare the error
distributions given by the Monte Carlo method with the exact
error distributions produced by the proposed method. We
applied both methods to generate the type of distribution
shown in Fig. 12. The results on 64-bit ETA-IV with k = 4
and l = 2, ETA-II with k = 4 and l = 4, CSAA with k = 4
and l = 8, and a block-based approximate adder with k = 4
and l = 10 are shown in Fig. 15. The white bars show the
exact error distribution produced by our method, while
the red lines on each bar indicate the range of probabilities
obtained in 20 Monte Carlo simulation runs, each with 100K samples.
From the figure, we can see that when l is small, the Monte
Carlo simulation produces a result close to the exact
distribution, but as l increases, its accuracy degrades.
The reason is that when
l increases, the error probability decreases. For example, as
shown in Fig. 15, the error probability of ETA-IV is on the
scale of 0.01, while that of the block-based
approximate adder with k = 4 and l = 10 is on the scale
of 0.0001. As the probability becomes smaller, the relative
variation of the Monte Carlo result becomes larger. This
clearly indicates that the accuracy of the Monte Carlo method
is reduced when the error probability is small, which is not
uncommon for approximate adders. In contrast, our method
is guaranteed to obtain the exact distribution.
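The growth of the relative variation can be quantified with the standard binomial argument: if an event of probability p is estimated as the fraction of M independent samples in which it occurs, the estimate has a relative standard deviation of sqrt((1 − p)/(pM)). The sketch below evaluates this expression for the two probability scales mentioned above with M = 100K. It is a textbook property of sampling and not a result taken from the experiments; the function name is illustrative.

```python
import math


def relative_std(p: float, samples: int) -> float:
    """Relative standard deviation of a sampled estimate of a probability p."""
    return math.sqrt((1.0 - p) / (p * samples))


for p in (1e-2, 1e-4):
    print(f"p = {p:g}: relative std = {relative_std(p, 100_000):.1%}")
```

With 100K samples the relative standard deviation is roughly 3% at p = 0.01 but roughly 32% at p = 0.0001, so a tenfold gain in relative precision at the smaller probability would need about a hundred times more samples.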
7 CONCLUSION
In this paper, we proposed an accurate and efficient
method to obtain the error rate and error distribution of
block-based approximate adders. The method to obtain the
error distribution achieves the theoretical lower bound
on the asymptotic runtime. Once the distribution is known,
other error metrics of interest, such as mean error
distance and mean square error, can be easily obtained.
Experimental results demonstrated that the proposed methods
are accurate and efficient. Compared to the Monte Carlo
sampling method, our method is faster and gives exact results,
and its advantage is most evident when the error probability is small.
In this work, we considered block-based approximate
adders in which the speculated carry-in to the carry generator
is 0. It is also possible to use the input bit at one bit position
lower as the speculated carry-in [18]. In our future work,
we will develop techniques to analyze the error statistics of
this type of approximate adder.
ACKNOWLEDGMENTS
This work is supported by the National Natural Science
Foundation of China (NSFC) under Grants No. 61574089 and
No. 61472243.
REFERENCES
[1] J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energy-efficient design,” in European Test Symposium, 2013, pp. 1–6.
[2] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, “Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications,” IEEE Transactions on Circuits and Systems I, vol. 57, no. 4, pp. 850–862, 2010.
[3] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, “Low-power digital signal processing using approximate adders,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 1, pp. 124–137, 2013.
[4] A. K. Verma, P. Brisk, and P. Ienne, “Variable latency speculative addition: A new paradigm for arithmetic circuit design,” in Design, Automation and Test in Europe, 2008, pp. 1250–1255.
[5] N. Zhu, W. L. Goh, and K. S. Yeo, “An enhanced low-power high-speed adder for error-tolerant application,” in International Symposium on Integrated Circuits, 2009, pp. 69–72.
[6] Y. Kim, Y. Zhang, and P. Li, “An energy efficient approximate adder with carry skip for error resilient neuromorphic VLSI systems,” in International Conference on Computer-Aided Design, 2013, pp. 130–137.
[7] A. B. Kahng and S. Kang, “Accuracy-configurable adder for approximate arithmetic designs,” in Design Automation Conference, 2012, pp. 820–825.
[8] J. Hu and W. Qian, “A new approximate adder with low relative error and correct sign calculation,” in Design, Automation and Test in Europe, 2015, pp. 1449–1454.
[9] R. Ye, T. Wang, F. Yuan, R. Kumar, and Q. Xu, “On reconfiguration-oriented approximate adder design and its application,” in International Conference on Computer-Aided Design, 2013, pp. 48–54.
[10] L. Li and H. Zhou, “On error modeling and analysis of approximate adders,” in International Conference on Computer-Aided Design, 2014, pp. 511–518.
[11] C. Liu, J. Han, and F. Lombardi, “An analytical framework for evaluating the error characteristics of approximate adders,” IEEE Transactions on Computers, vol. 64, no. 5, pp. 1268–1281, 2015.
[12] S. Mazahir, O. Hasan, R. Hafiz, M. Shafique, and J. Henkel, “Probabilistic error modeling for approximate adders,” IEEE Transactions on Computers, vol. PP, no. 99, pp. 1–14, 2016.
[13] K. Du, P. Varman, and K. Mohanram, “High performance reliable variable latency carry select addition,” in Design, Automation and Test in Europe, 2012, pp. 1257–1262.
[14] I. C. Lin, Y. M. Yang, and C. C. Lin, “High-performance low-power carry speculative addition with variable latency,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 9, pp. 1591–1603, 2015.
[15] S.-L. Lu, “Speeding up processing with approximation circuits,” Computer, vol. 37, no. 3, pp. 67–73, 2004.
[16] N. Zhu, W. L. Goh, G. Wang, and K. S. Yeo, “Enhanced low-power high-speed adder for error-tolerant application,” in International SoC Design Conference, 2010, pp. 323–327.
[17] M. Shafique, W. Ahmad, R. Hafiz, and J. Henkel, “A low latency generic accuracy configurable adder,” in Design Automation Conference, 2015, pp. 86:1–86:6.
[18] J. Huang, J. Lach, and G. Robins, “A methodology for energy-quality tradeoff using imprecise hardware,” in Design Automation Conference, 2012, pp. 504–509.
