Optimization of DSP Applications Using Parameterized Error Models for
  Low Power Approximate Adders by Dharmaraj, Celia et al.
arXiv:2001.02102v1 [eess.SP] 6 Jan 2020
Optimization of DSP Applications Using
Parameterized Error Models for Low Power
Approximate Adders
Celia Dharmaraj, Student Member, IEEE, Vinita Vasudevan, Member, IEEE,
and Nitin Chandrachoodan, Member, IEEE
Abstract—Approximate circuit design has gained significance
in recent years targeting error tolerant applications. In this paper,
we first demonstrate that the commonly used assumption that
the inputs to the adder are uniformly distributed results in an
inaccurate prediction of error statistics for multi-level circuits.
To overcome this problem, we derive parameterized error models
that can be used within any optimization framework in order
to optimize the number of approximate bits. We also show
that in order to accurately compute the MSE, the optimization
framework needs to take into account not just the functionality
of the adder, but also its position in the circuit, functionality of
its parents and the number of approximate bits in the parent
blocks. We demonstrate a significant improvement of accuracy
in the prediction of the noise power of DSP systems containing
approximate adders.
Index Terms—Approximate computing, error model, static
probability, optimization, noise power.
I. INTRODUCTION
Approximate computing is widely used in signal and im-
age processing applications to obtain improvements in power
and/or speed while maintaining the required accuracy. Adders
are the basic building blocks in these applications and a typical
implementation has a large number of adders. A variety of
approximate adders have been proposed in the literature, with
different levels of trade-offs between accuracy and perfor-
mance. These adders can be classified as low-latency [1] (and
references therein) and low-power approximate adders (LPAA)
[2]–[7]. Low power implementations of DSP systems using
LPAAs optimize the power consumption by maximizing the
number of approximate bits in each adder for a given accuracy.
In the literature, multiple approaches have been proposed to
find optimal approximation levels for adders used in low power
implementations. An approximate Finite Impulse Response
(FIR) filter is designed by fixing the level of approximation
of the adders using Monte-Carlo (MC) simulations in [8].
Approximate mirror adder-5 (AMA-5) [4] modeled assum-
ing uniformly distributed inputs are used in a 2D Discrete
Cosine Transform (DCT) module constructed using 1D DCT
blocks in [9] and the optimization problem is solved using
a mixed integer non-linear problem solver. Cartesian Genetic
Programming (CGP) is used to design various approximate
implementations of four point 1D DCT in [10]. An expression for variance of error of AMA 1-5 adders [4] and Lower-part-OR adder (LOA) [5] is obtained in [11] empirically by regression assuming uniform inputs, and heuristics are used to solve the approximation-level optimization problem. In [12], AMA 1-5 adders and Transmission Gate based Approximate adders TGA I-II [13] are considered. An expression for mean square error (MSE) is obtained assuming that the distributions of the inputs and the error are uniform. This is then used in a Lagrange multiplier based optimization approach.

The authors are with the Department of Electrical Engineering, Indian Institute of Technology Madras, India.
E-mail: ee13d003,vinita,nitin@ee.iitm.ac.in.

TABLE I: Noise power (in dB) of an 8 × 8 DCT module that uses Lower-part OR adders. NP1: noise power assuming uniform distribution; NPsim: noise power using simulations; e = |NPsim − NP1|.

NP1      NPsim    e       No. of approximate bits
-53.45   -51.20    2.25   L1-L4: 5; L5-L6: 6
-51.76   -44.67    7.09   L1-L2: 5; L3-L6: 6
-51.32   -43.00    8.32   L1-L2: 5; L3-L5: 6; L6: 7
-50.55   -41.21    9.34   L1-L2: 5; L3-L4: 6; L5-L6: 7
-47.59   -36.50   11.09   L1: 5; L2-L3: 6; L4-L5: 7; L6: 8

All of the above previous works on optimization use error
metrics based on uniformly distributed inputs. Moreover, the
same error model is used for all adders in the circuit. To verify
the validity of these assumptions, we found the MSE in an
approximate 8 × 8 DCT module [14], that has 288 adders
in a 6-level adder tree. We evaluated the noise power in dB ($10\log_{10}\mathrm{MSE}$), assuming an error model for LOA based on uniformly distributed inputs [5] and compared it with values obtained using MC simulations, with $10^5$ uniformly distributed random inputs (NP1 and NPsim respectively in Table I).
The numbers after levels (L1 − L6) indicate the number of
approximate bits. As seen from Table I, the analytical noise
power differs from the simulated value by as much as 11 dB,
though the model worked well for an individual adder.
While the assumption of uniformly distributed lower order
bits may be justified for the primary inputs, neither the output
of LPAAs nor the error is uniformly distributed. A more
accurate method of obtaining the probability mass function
(PMF) of error is proposed in [15]. However, including
this method within an optimization routine would require
extensive computations. Moreover, in most applications, an
accurate estimate of the mean error and MSE is sufficient
and we do not need the PMF of the error.
In this paper, we show that the mean error and MSE can be computed accurately without computing the PMF of the error if the number of approximate bits and the static probabilities of the inputs are taken into account. To this end, we derive parameterized error models that can be used within any optimization
framework in order to optimize the number of approximate
bits. We also show that in order to accurately compute the
MSE, the optimization framework needs to take into account
not just the functionality of the adder, but also its position
in the circuit, functionality of its parents and the number
of approximate bits in the parent blocks. We demonstrate a
significant improvement of accuracy in the prediction of MSE
in applications such as FIR filter, IIR filter and DCT using a
variety of LPAAs.
Each approximate adder requires the static probabilities of
its input bits, which eventually traces back to the primary
inputs. Even though the PMF of the input signal is usually
very non-uniform, the PMF of the lower order bits is often
close to uniform, so that simple expressions for the error can
be used. Some justification for this assumption is included in
[7], but it is heuristic and there is no condition that can be
checked to test for uniformity. In this paper, we show that the
discrete Fourier transform (DFT) of the input signal has to
satisfy certain constraints if the PMF is uniform. This can be
used to easily find the maximum number of bits that can be
considered to be uniformly distributed.
II. PARAMETERIZED ERROR MODELS FOR LOW POWER
APPROXIMATE ADDERS
As mentioned, the simple error models fail to give accurate
estimates of the mean error and MSE since they typically
assume that the inputs or the error is uniformly distributed.
The more complex models that compute the PMF of the error
are more accurate, but unsuitable for use in an optimizer.
We propose parameterized error models with the input static
probability as parameters, that can take into account the input
distribution without evaluating the full PMF. The models are
more complex than the expressions in [12], but they are
analytical expressions, suitable for use in an optimizer.
Assume that an $(N, k)$-bit LPAA with inputs $a$ and $b$ has $k$ approximate bits and $N - k$ accurate bits, with $\hat{s}_i = f(a_i, b_i, \hat{c}_{i-1})$ and $\hat{c}_i = g(a_i, b_i, \hat{c}_{i-1})$ denoting the $i$th bit of the approximate sum and carry. Let $P_{x_i}$ denote the static probability of a signal $x_i$, i.e. the probability that $x_i$ is 1. As with all models in the literature,
we continue to assume that a and b are independent. This
assumption is an approximation when the circuit has recon-
vergent fanouts. However, it is a reasonable approximation
in many cases as correlations are diluted as the logic depth
increases, as argued in [15].
The error in the output is due to the approximate lower part sum and the approximate carry into the accurate adder. This can be written as
\[
e = \sum_{i=0}^{k-1} (a_i + b_i)\, 2^i - \sum_{i=0}^{k-1} \hat{s}_i\, 2^i - \hat{c}_{k-1}\, 2^k. \tag{1}
\]
The mean error of the approximate adder is given by
\[
E\{e\} = \sum_{i=0}^{k-1} (P_{a_i} + P_{b_i})\, 2^i - \sum_{i=0}^{k-1} P_{\hat{s}_i}\, 2^i - P_{\hat{c}_{k-1}}\, 2^k. \tag{2}
\]
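As an illustration of (2), the following Python sketch evaluates the mean error of a lower-part OR adder from its input static probabilities and checks the result by exhaustive enumeration. The LOA behaviour assumed here (approximate sum bits $a_i \lor b_i$, carry into the accurate part $a_{k-1} \land b_{k-1}$) follows the usual description of the adder in [5] and is not restated in this paper. The function names are hypothetical, and the bits of each input are assumed to be mutually independent.

```python
from itertools import product


def loa_mean_error(pa, pb):
    """Mean error of an LOA from equation (2).

    pa, pb: static probabilities of the k approximate input bits, LSB first.
    Assumes (not from this paper) s_i = a_i OR b_i and c_{k-1} = a_{k-1} AND b_{k-1}.
    """
    k = len(pa)
    # P(a OR b) for independent bits.
    ps = [x + y - x * y for x, y in zip(pa, pb)]
    # P(a AND b) for the most significant approximate bit.
    pc = pa[-1] * pb[-1]
    exact = sum((x + y) * 2 ** i for i, (x, y) in enumerate(zip(pa, pb)))
    approx = sum(s * 2 ** i for i, s in enumerate(ps))
    return exact - approx - pc * 2 ** k


def loa_mean_error_exhaustive(k):
    """Mean of e in equation (1) over all pairs of uniformly distributed k-bit inputs."""
    total = 0
    for a, b in product(range(2 ** k), repeat=2):
        s_hat = a | b
        c_hat = (a >> (k - 1)) & (b >> (k - 1)) & 1
        total += (a + b) - s_hat - (c_hat << k)
    return total / 4 ** k


print(loa_mean_error([0.5] * 4, [0.5] * 4))    # uniform inputs
print(loa_mean_error_exhaustive(4))            # same case, by enumeration
print(loa_mean_error([0.75] * 4, [0.75] * 4))  # inputs with static probability 0.75
```

With uniform inputs both functions agree. The last call uses a static probability of 0.75 for every lower bit, which is what the OR outputs of a preceding LOA with at least as many approximate bits would present under the same assumptions, and it yields a different mean error from the uniform case.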
To derive $P_{\hat{s}_i}$, we need $P_{\hat{c}_{i-1}}$. One way to do this is to start with the LSB, for which $P_{\hat{c}_{-1}} = 0$, and find the static probability of all the other carry bits based on the truth table. Alternatively, we can assume that the static probability of the carry is independent of the position, i.e., $P_{\hat{c}_i} = P_{\hat{c}_{i-1}}$. This is a reasonable assumption as the input static probability is not expected to vary with location. For example, for AMA-1, if we assume $P_{\hat{c}_i} = P_{\hat{c}_{i-1}}$, we get 0.67 as the static probability, whereas if we use the first method, we get [0.5, 0.625, 0.656, 0.67, 0.67] for the five LSBs as in [4]. Using 0.67 as the static probability of the carry, $P_{\hat{s}_i} = 0.33$, which matches well with actual values for $k > 3$. Note that we have not assumed anything about the PMF of the inputs. If it happens to be uniform, we can use $P_{a_i} = 0.5$. However, if the inputs come from another approximate adder, we just need to use the right value of the static probability.
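The two ways of obtaining the carry static probability can be sketched in Python as follows. The AMA-1 truth table used here is an assumption taken from the description of the adder in [4] and is not given in this paper; the function names are hypothetical. The carry-in of a cell is treated as independent of the input bits at that position.

```python
# Assumed AMA-1 cell truth table: (a, b, c_in) -> (sum, c_out).
AMA1 = {
    (0, 0, 0): (0, 0), (0, 0, 1): (1, 0),
    (0, 1, 0): (0, 1), (0, 1, 1): (0, 1),
    (1, 0, 0): (0, 0), (1, 0, 1): (0, 1),
    (1, 1, 0): (0, 1), (1, 1, 1): (1, 1),
}


def output_probs(table, pa, pb, pc):
    """Static probabilities (P_sum, P_cout) of one cell with independent inputs."""
    p_sum = p_cout = 0.0
    for (a, b, c), (s, cout) in table.items():
        weight = ((pa if a else 1 - pa)
                  * (pb if b else 1 - pb)
                  * (pc if c else 1 - pc))
        p_sum += weight * s
        p_cout += weight * cout
    return p_sum, p_cout


def carry_chain(table, pa, pb, k):
    """First method: propagate the carry probability from the LSB (carry-in 0)."""
    pc, carries = 0.0, []
    for _ in range(k):
        _, pc = output_probs(table, pa, pb, pc)
        carries.append(pc)
    return carries


def steady_state_carry(table, pa, pb):
    """Second method: solve P_c = g(P_c). g is affine in P_c for fixed pa, pb."""
    _, g0 = output_probs(table, pa, pb, 0.0)
    _, g1 = output_probs(table, pa, pb, 1.0)
    slope = g1 - g0
    if slope == 1.0:
        raise ValueError("carry probability has no unique fixed point")
    return g0 / (1.0 - slope)


print(carry_chain(AMA1, 0.5, 0.5, 5))
pc = steady_state_carry(AMA1, 0.5, 0.5)
print(pc, output_probs(AMA1, 0.5, 0.5, pc)[0])
```

Under these assumptions the propagated carry probabilities approach the steady-state value within a few bit positions, which is consistent with the observation that the position-independent approximation works well for $k > 3$.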
The MSE ($E\{e^2\}$) can be derived in a similar fashion. The expression for MSE computation requires evaluation of correlated terms such as $E\{\hat{s}_i \hat{s}_j\}$, $E\{a_i \hat{s}_j\}$, $E\{b_i \hat{s}_j\}$, $E\{a_i \hat{c}_{k-1}\}$, $E\{b_i \hat{c}_{k-1}\}$ and $E\{\hat{s}_i \hat{c}_{k-1}\}$ in terms of the static probabilities. Each of these can be derived using the truth table of the approximate adder. The correlation between bits $E\{\hat{s}_i \hat{s}_j\}$ can be neglected in most adders, i.e., the individual bits of the sum are independent. However, it is significant in a few adders such as AMA2 ($\hat{s}_i = \hat{c}_i$), where there is a close relationship between the sum and carry. An exception to this method for deriving error models is ETA-I [6], where the lower part sum cannot be written as a truth table. Its error metrics are derived in our earlier work [16].
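For a concrete case, the sketch below works out the MSE of a lower-part OR adder from static probabilities and compares it with exhaustive enumeration. It is an illustrative derivation rather than the authors' model: it assumes the LOA behaviour described in [5] (sum bits $a_i \lor b_i$, carry $a_{k-1} \land b_{k-1}$), for which (1) reduces to $e = \sum_{i=0}^{k-2} a_i b_i 2^i - a_{k-1} b_{k-1} 2^{k-1}$, and it assumes that all input bits are independent. The function names are hypothetical, and the error is measured in units of the LSB.

```python
import math
from itertools import product


def loa_mean_and_mse(pa, pb):
    """Mean error and MSE of an LOA with independent input bits (LSB first)."""
    k = len(pa)
    # z_i = a_i AND b_i is Bernoulli(q_i); e is a weighted sum of the z_i.
    q = [x * y for x, y in zip(pa, pb)]
    weights = [2 ** i for i in range(k - 1)] + [-(2 ** (k - 1))]
    mean = sum(w * qi for w, qi in zip(weights, q))
    variance = sum(w * w * qi * (1 - qi) for w, qi in zip(weights, q))
    return mean, variance + mean ** 2


def loa_mse_exhaustive(k):
    """E{e^2} from equation (1) over all pairs of uniform k-bit inputs."""
    total = 0
    for a, b in product(range(2 ** k), repeat=2):
        c_hat = (a >> (k - 1)) & (b >> (k - 1)) & 1
        e = (a + b) - (a | b) - (c_hat << k)
        total += e * e
    return total / 4 ** k


def noise_power_db(mse):
    """Noise power in dB, 10 log10(MSE)."""
    return 10 * math.log10(mse)


mean, mse = loa_mean_and_mse([0.5] * 4, [0.5] * 4)
print(mean, mse, loa_mse_exhaustive(4), noise_power_db(mse))
```

Because the terms $a_i b_i$ at different bit positions are independent under the stated assumptions, no cross-correlation terms are needed for this adder; adders with carry propagation in the lower part would require the additional correlation terms listed above.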
III. MULTI-LEVEL SYSTEMS
In DSP systems, the inputs to the adder are either the primary inputs or the outputs of another approximate adder, subtractor or (in our case, accurate) multiplier. In each case, the static probabilities of the inputs need to be evaluated correctly. We consider each case in the following subsections.
A. Static Probabilities: Primary inputs
The PMF of a typical primary input, such as an image, is not uniform. However, since the error expression of an LPAA involves only the static probabilities of the lower $k$ bits, we only need to check whether the PMF of these $k$ bits is uniform. A brute-force check would compute the PMF of the lower order bits for each value of $k$ and test it for uniformity. In this section, we show that it is possible to check if the distribution is uniform by computing the DFT of the PMF of the input. A single DFT is sufficient to find the values of $k$ for which this assumption is reasonable.
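Let $F_A$ be the $2^N$-point DFT of the PMF of the $N$-bit signal $A$ and $F_{A_L}$ be the $2^k$-point DFT of the PMF of $A_L$ (the $k$ LSBs of $A$). We have,
\begin{align}
F_{A_L}[m] &= \sum_{n=0}^{2^k-1} P(A_L = n)\, e^{-jmn2\pi/2^k} \tag{3} \\
&= \sum_{n=0}^{2^k-1} \sum_{l=0}^{2^{N-k}-1} P(A = l2^k + n)\, e^{-jmn2\pi/2^k} \notag \\
&= \sum_{l=0}^{2^{N-k}-1} \sum_{n'=l2^k}^{l2^k+2^k-1} P(A = n')\, e^{-jmn'2\pi/2^k}\, e^{jml2^k 2\pi/2^k} \notag \\
&= \sum_{n'=0}^{2^N-1} P(A = n')\, e^{-jmn'2\pi/2^k} \notag \\
&= F_A[m \cdot 2^{N-k}], \quad 0 \le m < 2^k. \tag{4}
\end{align}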
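If $A_L$ is uniform, $P(A_L = n) = \frac{1}{2^k}$, $0 \le n < 2^k$. Hence from (3), if $A_L$ is uniform, we have
\[
F_{A_L}[m] = \sum_{n=0}^{2^k-1} \frac{1}{2^k}\, e^{-jmn2\pi/2^k} =
\begin{cases}
1, & \text{if } m = 0 \\
0, & \text{if } 0 < m < 2^k.
\end{cases} \tag{5}
\]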
Since the DFT is invertible, the converse is also true. Therefore, using (4), we have the following condition to be satisfied for $A_L$ to be uniformly distributed.
\[
F_A[m \cdot 2^{N-k}] =
\begin{cases}
1, & \text{if } m = 0 \\
0, & \text{if } 0 < m < 2^k.
\end{cases} \tag{6}
\]
A similar condition is given in [17] for continuous signals that are quantized, although the derivation is a little more involved.
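Condition (6) can be checked numerically with a single FFT of the PMF. The following Python sketch uses NumPy; the function names and the tolerance `tol` are assumptions introduced for illustration, since the text does not specify a threshold for "close to zero".

```python
import numpy as np


def lsb_dft_samples(pmf, k):
    """Samples F_A[m * 2^(N-k)], 0 <= m < 2^k, of the 2^N-point DFT of the PMF."""
    pmf = np.asarray(pmf, dtype=float)
    n_bits = pmf.size.bit_length() - 1      # pmf is assumed to have 2^N entries
    fa = np.fft.fft(pmf)                    # uses the e^{-j 2 pi m n / 2^N} convention
    return fa[:: 2 ** (n_bits - k)]


def max_uniform_bits(pmf, tol=1e-3):
    """Largest k such that condition (6) holds within tol for all k' <= k."""
    pmf = np.asarray(pmf, dtype=float)
    n_bits = pmf.size.bit_length() - 1
    best = 0
    for k in range(1, n_bits + 1):
        samples = lsb_dft_samples(pmf, k)
        if np.max(np.abs(samples[1:])) > tol:
            break
        best = k
    return best


# Synthetic 5-bit PMF: the 3 LSBs are uniform, the 2 MSBs are not.
upper = np.array([0.7, 0.1, 0.1, 0.1])
pmf = np.repeat(upper, 8) / 8

# Equation (4): the decimated DFT equals the DFT of the PMF of the k LSBs.
k = 3
folded = pmf.reshape(-1, 2 ** k).sum(axis=0)   # P(A_L = n)
print(np.allclose(np.fft.fft(folded), lsb_dft_samples(pmf, k)))
print(max_uniform_bits(pmf, tol=1e-9))
```

For an image, `pmf` would be the normalized histogram of the pixel values. The choice of `tol` controls how much deviation from uniformity is tolerated and would need to be set for the application.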
To illustrate condition (6), let us consider the Cameraman image with $N = 8$. For the image pixel distribution, $F_A[m \cdot 2^{N-k}]$ for different values of $k$, with $m$ varying from 0 to $2^k - 1$, is plotted in Fig. 1. It is seen that for lower values of $k$, the value of the transform is very close to zero for $0 < m < 2^k$. As $k$ increases, the value of the transform also increases and for $k = 5$, the values are high. This is confirmed by the actual PMF of the lower order bits of the image shown in Fig. 2. From the figure, it is seen that the distribution can be considered uniform even if half the bits are approximated. This turns out to be true for all the standard images we have looked at (for example, for the Lena image, k = 6, for Rice,
k = 5). Hence, we assume that the PMF of the lower order bits of primary inputs to the approximate adder is uniform, i.e., the static probability is 0.5.

[Fig. 1: Illustration of condition for k lower-order bits of Cameraman image to be uniform. Plot of $F_A[m \cdot 2^{N-k}]$ versus $m$ for k = 1 to k = 5.]

[Fig. 2: (a) PMF of Cameraman image (N = 8); (b)-(f) PMF of the lower k bits of the image, for k = 1 to k = 5.]

[Fig. 3: Adder tree in a circuit with N bits at primary inputs. (a) Adder1 (m1, k1) and Adder2 (m2, k2) feed Adder3 (m3, k3). (b) As in (a), with the output of Adder2 multiplied by a constant c before it enters Adder3; the signals are labelled with the static probabilities $P_{a_i}$, $P_i$ and $P_{b_i}$.]

B. Static Probabilities: Adders in the higher levels
If the inputs to the adder are the outputs of another adder as in Fig. 3a, the mean and MSE are derived using $P_{\hat{s}_i}$ and $P_{\hat{c}_{k-1}}$ as discussed previously.
The other possibility is that the input is the output of a multiplier. In this work, we are only optimizing adders and all multipliers are accurate, with the output truncated to the standard precision used in the circuit. Also, we only consider linear systems, so that one of the inputs to the multiplier is a constant coefficient. In Fig. 3b, consider Adder3, which has an input from the output of the multiplier. Depending on the value of the constant coefficient $c$, the static probability of the LSBs at the output of the multiplier ($P_{b_i}$) will vary. Let $P_i$ denote the probability that the $i$th bit of the $k_2$ LSBs of the multiplicand (output of Adder2) is 1. Consider the following cases.
1) When $c = 2^l$ and $l \ge 0$, the product is the logical left shift of the multiplicand. So $P_{b_i} = 0$ for the first $l$ LSBs and $P_{b_i} = P_{i-l}$ for the next $k_2 - l$ LSBs.
2) When $c = -2^l$ and $l \ge 0$, $P_{b_i} = 0$ for the first $l$ LSBs and $P_{b_i} = 1 - P_{i-l}$ for the next $k_2 - l$ LSBs is a good approximation, accounting for the bit flipping involved in the two's complement representation of negative numbers.
3) When $c = 2^l$ and $l < 0$, the product is the right shift of the multiplicand. So $P_{b_i} = P_{i+|l|}$ for $k_2 - l$ LSBs.
4) When $c = -2^l$ and $l < 0$, $P_{b_i} = 1 - P_{i+|l|}$ for $k_2 - l$ LSBs.
5) If $c$ is an arbitrary constant that is not a power of 2, MC simulations indicate that the average static probability of the output bits is $0.5 \pm 0.03$. This is also intuitively correct, since the product is the sum of several partial products and the LSBs are truncated to the precision maintained in the system. Hence, the overall PMF is likely to be symmetric (moving towards Gaussian), which means that the static probability is 0.5. So we assume that $P_{b_i} = 0.5$ for $k_2$ LSBs.
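The five cases above can be collected into one helper, sketched below in Python. The function name and interface are hypothetical. The sketch takes the known static probabilities of the multiplicand LSBs (LSB first) and returns the probabilities for the product LSBs; for a right shift it returns only the bits whose source probability is known, which is one possible reading of cases 3 and 4.

```python
import math


def multiplier_output_probs(p, c):
    """Static probabilities of the product LSBs for a constant coefficient c.

    p: static probabilities P_i of the multiplicand LSBs, LSB first.
    """
    if c == 0:
        raise ValueError("coefficient must be non-zero")
    mantissa, exponent = math.frexp(abs(c))   # abs(c) = mantissa * 2**exponent
    if mantissa != 0.5:
        # Case 5: not a power of two, assume static probability 0.5.
        return [0.5] * len(p)
    l = exponent - 1                          # abs(c) = 2**l
    if l >= 0:
        # Cases 1 and 2: left shift, the l lowest bits become 0.
        kept = list(p[: max(len(p) - l, 0)])
    else:
        # Cases 3 and 4: right shift by |l|, bit i takes the place of bit i + |l|.
        kept = list(p[-l:])
    if c < 0:
        # Approximation for two's complement negation: shifted bits are flipped.
        kept = [1.0 - x for x in kept]
    zeros = [0.0] * min(max(l, 0), len(p))
    return zeros + kept


p = [0.6, 0.7, 0.8, 0.9]
print(multiplier_output_probs(p, 4))      # case 1, l = 2
print(multiplier_output_probs(p, -2))     # case 2, l = 1
print(multiplier_output_probs(p, 0.5))    # case 3, l = -1
print(multiplier_output_probs(p, 3))      # case 5
```

Using `math.frexp` to detect powers of two avoids rounding issues with logarithms; a coefficient is a power of two exactly when its binary mantissa is 0.5.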
C. Truncation and Median adder (MA) in higher levels
Both Truncation and MA [7] have their lower part sum bits fixed to constants, all 0's and all 1's respectively. In these adders, since the lower part sum is known, the lower part sum of the adders in higher levels can be fixed more accurately so that the accuracy of the approximate circuit is improved. In the case of the Truncation adder, the approximate sum is obviously zero. In Fig. 3a with MA, Adder1 and Adder2 will have their lower part sum as $2^{k_1} - 1$ and $2^{k_2} - 1$, respectively. For Adder3, we lower the MSE of the circuit by setting the sum to $2^{k_3+1} - 1$ instead of $2^{k_3} - 1$ for the following cases:
1) When $k_3 \le k_1, k_2$, the lower part sum is known exactly and is equal to $2^{k_3+1} - 2$.
2) When $k_1 \ge k_3 > k_2$, the mean of the sum is $(3 \times 2^{k_3} + 2^{k_2} - 4)/2$, which is closer to $2^{k_3+1} - 1$ than $2^{k_3} - 1$.
Using this setting, we obtain up to 6 dB improvement in MSE for the adder tree in Fig. 3a. Beyond the first level, the input static probability for Median adders is set to 1.
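The two cases can be verified by enumeration, as in the Python sketch below. It assumes that the output bits of a parent MA above its $k$ fixed bits are uniformly distributed and independent, which is the assumption under which the mean in case 2 is reproduced; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def ma_lower_part_values(k_parent, k):
    """Equally likely values of the k LSBs of an MA output with k_parent fixed bits."""
    fixed = min(k_parent, k)
    ones = (1 << fixed) - 1                 # the fixed bits are all 1
    free = k - fixed                        # remaining bits assumed uniform
    return [ones | (u << fixed) for u in range(1 << free)]


def mean_lower_sum(k1, k2, k3):
    """Mean of the exact lower-part sum seen by Adder3."""
    va = ma_lower_part_values(k1, k3)
    vb = ma_lower_part_values(k2, k3)
    total = sum(a + b for a, b in product(va, vb))
    return Fraction(total, len(va) * len(vb))


def mse_about(k1, k2, k3, const):
    """Mean square deviation of the exact lower-part sum from a constant setting."""
    va = ma_lower_part_values(k1, k3)
    vb = ma_lower_part_values(k2, k3)
    total = sum((a + b - const) ** 2 for a, b in product(va, vb))
    return Fraction(total, len(va) * len(vb))


# Case 1: k3 <= k1, k2.
print(mean_lower_sum(5, 5, 4), 2 ** 5 - 2)
# Case 2: k1 >= k3 > k2.
print(mean_lower_sum(5, 2, 4), Fraction(3 * 2 ** 4 + 2 ** 2 - 4, 2))
print(mse_about(5, 2, 4, 2 ** 5 - 1), mse_about(5, 2, 4, 2 ** 4 - 1))
```

In this example the mean square deviation from $2^{k_3+1} - 1$ is smaller than that from $2^{k_3} - 1$, in line with the choice described above.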
IV. OPTIMIZATION FRAMEWORK
Classical wordlength optimization uses a simple model for
the quantization error and the same model is used for all
nodes in the circuit. In the literature, a similar framework
with a single expression for error for all the adders has
been used to find the optimal number of approximate bits.
However, the discussion in the previous section shows that this
is inadequate to get accurate numbers. Hence, we made several
modifications to the framework, which are detailed below.
The input to the optimizer is the circuit implemented using
adders, multipliers and registers and the corresponding signal
flow graph. The primary inputs to the system are normalized
to 1.N fixed point numbers. For each functional unit in the system, we use the required number of integer bits while maintaining the number of fractional bits as N. The goal of the
optimizer is to maximize the number of approximate fractional
bits of the adders in the circuit for a given MSE at the
output. We use the three-step procedure discussed in [18] and adapt it for approximate computing. It uses the Minimum Width algorithm, the Mildest Greedy Ascent algorithm and the Tabu search algorithm. The main steps in our optimization framework are as follows:
• A pre-processing step assigns each adder the static probabilities of its inputs based on its parent nodes. If the parent is a register, the adder takes the static probabilities of the inputs to the register. The transfer function from each adder node to the output is also computed.
• Next, we run a minimum width algorithm (MWA) that gives, for each adder, the maximum number of approximate fractional bits for which the required MSE constraint is satisfied when all the other adders are accurate.
• Starting with the number of approximate bits from the MWA, a greedy descent procedure is used to decrease the number of approximate bits in the adder for which this causes the maximum improvement in MSE. An important difference from the quantization noise optimization in [18] is that the approximation noise can worsen even if the number of approximate bits is decreased. For circuits with multiple outputs, we find the fan-in cone of each output in sequence. The adders in the fan-in cone are optimized, while those adders in the cone that would increase the MSE of previously targeted outputs are left untouched.
• Finally, we run a tabu search algorithm targeting signals with the minimum number of approximate bits (instead of the most sensitive signal) and keep increasing the number of approximate bits as long as the MSE constraint is met. We found that this heuristic provides better optimization of approximate bits.

[Fig. 4: Error in noise power computation of Adder3 in Fig. 3a when the tree is implemented using (a) AMA-1 and (b) AMA-2 adders. Axes: number of approximate bits (k) versus error in noise power (dB). Curves: (1) $P_{a,i} = P_{b,i} = 0.5$; (2) $P_{a_i a_j} = P_{a,i} P_{a,j}$; (3) $P_{c,i} = P_{c,i-1}$; (4) parameterized model.]

In each of these algorithms, the overall MSE at the output is
computed using the transfer functions from the adder nodes
and the parameterized error model for the adder.
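A minimal Python sketch of how the output MSE and the greedy descent step could be organized is given below. It is not the authors' implementation. It assumes that the errors of different adders are white and mutually uncorrelated, so that the standard noise propagation expression for linear systems applies, and that the path from each adder to the output is described by an impulse response. The names `output_mse`, `greedy_descent` and `mse_of` are hypothetical; `mse_of` stands for any routine that maps a list of approximate-bit counts to the predicted output MSE.

```python
import numpy as np


def output_mse(means, variances, impulse_responses):
    """Output MSE from per-adder error mean and variance.

    Assumes white, mutually uncorrelated error sources; impulse_responses[i]
    is the impulse response from adder i to the output.
    """
    variance_out = 0.0
    mean_out = 0.0
    for mu, var, h in zip(means, variances, impulse_responses):
        h = np.asarray(h, dtype=float)
        variance_out += var * np.sum(h ** 2)   # variance scales with the energy of h
        mean_out += mu * np.sum(h)             # mean scales with the DC gain of h
    return float(variance_out + mean_out ** 2)


def greedy_descent(k, mse_of, mse_max):
    """Decrease approximate bits one at a time until mse_of(k) <= mse_max.

    At each step the adder whose decrement gives the lowest MSE is chosen.
    """
    k = list(k)
    while mse_of(k) > mse_max:
        candidates = []
        for i in range(len(k)):
            if k[i] > 0:
                trial = k.copy()
                trial[i] -= 1
                candidates.append((mse_of(trial), i))
        if not candidates:
            break                              # all adders are already accurate
        _, best = min(candidates)
        k[best] -= 1
    return k


print(output_mse([1.0, -1.0], [2.0, 3.0], [[1.0, 0.5], [1.0]]))

# Toy error model for illustration only: MSE grows as 4^k, with different gains.
toy = lambda k: 1 * 4 ** k[0] + 4 * 4 ** k[1]
print(greedy_descent([3, 3], toy, mse_max=100))
```

In the framework described above, `mse_of` would evaluate the parameterized error model of every adder with the static probabilities implied by the current bit assignment of its parents, since changing one adder can alter the error statistics of the adders it feeds.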
V. EXPERIMENTAL RESULTS
In this section, we first validate our error model. Each
assumption is tested against simulation and verified for cor-
rectness. We then obtain the optimum number of approximate
bits for a given MSE using our optimization framework. This
is done for FIR and IIR filters and an 8 × 8 DCT module.
These results are used to show that the optimizer requires the
parameterized error models with the correct values of the static
probability for accurate prediction of the MSE.
In order to validate our error model, we used the simple
adder tree depicted in Fig. 3a. Some of the LPAAs like AMA-1
and AMA-2 adders involve carry propagation in the approxi-
mate lower part sum. Therefore, in addition to the correct value
of the static probability, evaluation of the MSE also requires
correlations between bits to be taken into account. Fig. 4 shows a comparison of the error in noise power computation of Adder3 in Fig. 3a (1) assuming $P_{a,i} = P_{b,i} = 0.5$, (2) assuming that the individual bits of each input are independent (i.e. $P_{a_i a_j} = P_{a,i} P_{a,j}$), (3) assuming that $P_{c,i} = P_{c,i-1}$ and (4) using the parameterized error model including all the correlations. It can be seen that $P_{c,i} = P_{c,i-1}$ is a good approximation for $k > 3$. The correlations between the bits in each input can be ignored in AMA-1, whereas AMA-2 requires all correlations to be taken into account. From the discussion in Section II, both these results are as expected.
We validated our optimization framework using an 18-tap
FIR filter (direct form I realization), IIR filter (direct form II
realization of a 4th order low pass Butterworth filter) and
DCT [14] consisting of 17, 8 and 288 adders respectively.
These are typically the benchmarks that have been used in
the literature. For each of these systems, we obtained the
optimum number of approximate bits for each adder in the
system, given an overall MSE specification. This was done
for the Truncation, MA, AMA-5, LOA and ETA-I adders.
These adders have minimal hardware for evaluation of the
approximate sum and are energy-efficient. Using the optimal
configuration of approximate bits for each adder obtained from
the optimizer, the circuits are implemented and simulated with $10^5$ uniform random inputs to obtain the actual MSE of the circuits. The actual MSE is compared with the analytical value obtained using the parameterized error model.

[Fig. 5: Comparison of simulated noise power and analytically computed noise power for FIR filter when input static probability is (a) assumed to be 0.5 (b) computed as in Section III-B for various approximate adders. Axes: actual noise power (dB) versus analytical noise power (dB). Adders: Trunc, AMA5, LOA, ETA-I, MA.]

[Fig. 6: Comparison of simulated noise power and analytically computed noise power for IIR filter when input static probability is (a) assumed to be 0.5 (b) computed as in Section III-B for various approximate adders. Axes: actual noise power (dB) versus analytical noise power (dB). Adders: Trunc, AMA5, LOA, ETA-I, MA.]

Figs. 5 and 6 show the comparison of the simulated noise power and the analytically computed noise power in dB for the FIR filter and IIR filter respectively, using various approximate adders, for two cases: (a) assuming that the input static probabilities are 0.5 for all the adders and (b) computing the input static probabilities for adders as described in Section III-B. It can be seen that the analytical values obtained by using the parameterized error model match very well with the actual values, while those obtained using uniform probabilities deviate significantly (up to 6 dB and 10 dB error in case of FIR and IIR filters respectively).
The DCT module [14] is a multiplierless implementation
with adders and subtractors. Fig. 7a and Fig. 7b show the
comparison of simulated value of noise power and analytically
computed noise power using the parameterized error model for
various approximate adders, for two different images namely
Rice image and Lena image respectively. It can be seen that
the analytical values match reasonably well with the actual
values (with a maximum of 2.5 dB and 3 dB error for Rice
and Lena images respectively).
VI. CONCLUSION
We have proposed parameterized error models for approxi-
mate adders using input static probabilities as parameters and
incorporated these error models in an optimization framework.
We have shown that the parameterized error models provide
better noise power prediction than the typical error models that
assume uniform input distribution. The results of FIR and IIR filters and DCT module show that the use of the parameterized error model in the optimization framework improves the accuracy of the prediction of the overall MSE.

[Fig. 7: Comparison of simulated noise power and analytically computed noise power for 8 × 8 DCT using parameterized error model, for (a) Rice image input and (b) Lena image input. Axes: simulated noise power (dB) versus analytical noise power (dB). Adders: Trunc, AMA5, LOA, MA; reference line NPanal = NPsim.]

REFERENCES
[1] M. Shafique, W. Ahmad, R. Hafiz, and J. Henkel, “A low latency generic
accuracy configurable adder,” in Proceedings of the 52nd Annual Design
Automation Conference, DAC ’15, (New York, NY, USA), ACM, 2015.
[2] H. A. F. Almurib, T. N. Kumar, and F. Lombardi, “Inexact designs
for approximate low power addition by cell replacement,” in Design,
Automation and Test in Europe (DATE), 2016.
[3] Z. Yang, A. Jain, J. Liang, J. Han, and F. Lombardi, “Approximate
xor/xnor-based adders for inexact computing,” 2013 13th IEEE Interna-
tional Conference on Nanotechnology (IEEE-NANO 2013), 2013.
[4] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, “Low-power
digital signal processing using approximate adders,” IEEE Trans. on
Comp.-Aided Design of Integrated Circuits and Systems, vol. 32, 1 2013.
[5] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, “Bio-inspired
Imprecise computational blocks for efficient VLSI implementation of
soft-computing applications,” IEEE Trans. on Circuits and Systems I:
Regular Papers, vol. 57, 4 2010.
[6] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, “Design of
low-power high-speed truncation-error-tolerant adder and its application
in digital signal processing,” IEEE TVLSI, vol. 18, no. 8, 2010.
[7] D. Celia, V. Vasudevan, and N. Chandrachoodan, “Optimizing power-
accuracy trade-off in approximate adders,” in 2018 Design, Automation
Test in Europe Conference Exhibition (DATE), March 2018.
[8] L. B. Soares, S. Bampi, and E. Costa, “Approximate adder synthesis for
area- and energy-efficient fir filters in cmos vlsi,” in 2015 IEEE 13th
International New Circuits and Systems Conference, June 2015.
[9] F. S. Snigdha, D. Sengupta, J. Hu, and S. S. Sapatnekar, “Optimal design
of JPEG hardware under the approximate computing paradigm,” in 2016
53rd ACM/EDAC/IEEE DAC, June 2016.
[10] Z. Vasicek, V. Mrazek, and L. S. Brno, “Towards low power approximate
dct architecture for hevc standard,” in Design, Automation Test in Europe
Conference Exhibition (DATE), 2017, March 2017.
[11] D. Sengupta, F. S. Snigdha, Jiang Hu, and S. S. Sapatnekar, “Saber:
Selection of approximate bits for the design of error tolerant circuits,”
in 2017 54th ACM/EDAC/IEEE DAC, June 2017.
[12] M. Pashaeifar, M. Kamal, A. Afzali-Kusha, and M. Pedram, “A the-
oretical framework for quality estimation and optimization of dsp
applications using low-power approximate adders,” IEEE Trans. on
Circuits and Systems I: Regular Papers, vol. 66, Jan 2019.
[13] Z. Yang, J. Han, and F. Lombardi, “Transmission gate-based approx-
imate adders for inexact computing,” in Proceedings of the 2015
IEEE/ACM NANOARCH15, July 2015.
[14] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A fast 8 × 8
transform for image compression,” in 2009 International Conference on
Microelectronics - ICM, Dec 2009.
[15] D. Sengupta, F. S. Snigdha, J. Hu, and S. S. Sapatnekar, “An analytical
approach for error pmf characterization in approximate circuits,” IEEE
Trans. on CAD of Integrated Circuits and Systems, vol. 38, Jan 2019.
[16] D. Celia, V. Vasudevan, and N. Chandrachoodan, “Probabilistic error
modeling for two-part segmented approximate adders,” in 2018 IEEE
International Symposium on Circuits and Systems (ISCAS), May 2018.
[17] A. Sripad and D. Snyder, “A necessary and sufficient condition for
quantization errors to be uniform and white,” IEEE Trans. on Acoustics,
Speech, and Signal Processing, vol. 25, October 1977.
[18] D. Menard, N. Herve, O. Sentieys, and H.-N. Nguyen, “High-level
synthesis under fixed-point accuracy constraint,” Journal of Electrical
and Computer Engineering, vol. 2012, Jan. 2012.
