Energy-Efficient Digital Design Through Inexact and Approximate Arithmetic Circuits by Camus, Vincent et al.
Energy-Efficient Digital Design through
Inexact and Approximate Arithmetic Circuits
Vincent Camus†, Jeremy Schlachter†, Christian Enz
Integrated Circuits Laboratory (ICLAB)
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
vincent.camus@epfl.ch, jeremy.schlachter@epfl.ch
†these authors contributed equally to this work
Abstract—Inexact and approximate circuit design is a promising
approach to improve performance and energy efficiency in
technology-scaled and low-power digital systems. Such a strategy
is suitable for error-tolerant applications involving perceptive or
statistical outputs. This paper reviews two established techniques
applicable to arithmetic units: circuit pruning and carry
speculation. A critical comparative study is carried out considering
several error metrics.
I. INTRODUCTION
Approximate computing has become a major field of research
because it could significantly improve the energy efficiency and
performance of modern digital circuits. It is also a potential
solution to overcome the limitations of technology scaling and the
forecasted end of Moore’s law. To that end, many inexact or
approximate circuits have been presented in the literature, but
most of them rely on manual design or tweaks that are hard to
integrate in a standard digital flow.
This article reviews two techniques that can easily be
implemented in a standard digital flow: probabilistic pruning and
carry speculation. The remainder of this paper is organized as
follows: Section II reviews the state of the art for the two design
techniques, Section III provides a functional description of the
pruning and speculation techniques, and finally Section IV defines
the error metrics and carries out a comparative study based on them.
II. STATE-OF-THE-ART
A. Probabilistic Pruning
Probabilistic pruning is a design technique that consists of
removing circuit blocks or elements such as full adder cells,
gate clusters or single gates in order to trade exactness of
computation for power, area and delay savings without any
overhead. The decision to prune one of these elements is
based on two parameters: the significance, which is a structural
parameter, and the activity, which is determined by hardware
simulations. The amount of error increases with the number
of pruned elements. This technique was first introduced in
[1], where full adder cells were removed from various adder
architectures, resulting in gains of up to 7.5X in Energy-Delay-
Area Product (EDAP) for a 10% relative error magnitude.
It was later improved by integrating it into the standard digital
flow using existing tools, and by applying pruning at the gate
level [2]. This finer granularity enables an order of magnitude of
area and power savings for a 64-bit adder with a 10% relative error
magnitude. It has also been shown that a 25% power and area
reduction can be achieved for 16-bit multipliers.
This work was supported by the NanoTera “IcySoC” project and the Swiss
National Science Foundation grant No 200021-144418.
978-1-4799-8893-8/15/$31.00 ©2015 IEEE
B. Speculative Adders
Speculative adders [3] exploit the fact that carry-propagate
sequences in additions are typically short, making it possible
to estimate intermediate carries using a limited number of
preceding stages. They split the binary addition into several
sub-paths executed concurrently for higher execution speed and
energy efficiency, at the risk of occasionally generating
incorrect results. The critical path of the adder can thus be
divided into two or more shorter paths, relaxing constraints over
the entire design and improving speed, area and power
beyond the theoretical bounds of exact adders.
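As a rough illustration of why a short speculation window is often
sufficient, the following Python sketch estimates the average length
of the longest propagate run (consecutive bit positions where the two
operand bits differ) for uniformly distributed random operands. This
run length bounds how far a carry can ripple. The function names, the
sample count and the seed are assumptions made for this illustration
and are not taken from [3].

```python
import random


def longest_propagate_chain(a: int, b: int, width: int) -> int:
    """Longest run of consecutive propagate bits (a_i XOR b_i == 1)."""
    propagate = (a ^ b) & ((1 << width) - 1)
    longest = current = 0
    for i in range(width):
        if (propagate >> i) & 1:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def mean_longest_chain(width: int = 32, samples: int = 20000, seed: int = 0) -> float:
    """Average longest propagate run over uniformly random operand pairs."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    total = 0
    for _ in range(samples):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        total += longest_propagate_chain(a, b, width)
    return total / samples


if __name__ == "__main__":
    print(f"mean longest propagate run (32 bits): {mean_longest_chain():.2f}")
```

For 32-bit operands the average stays far below the operand width,
which is the property that carry speculation relies on.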
A number of speculative adders have been proposed in the
literature, with different approaches to reduce the error
frequency or magnitude. The ETAII adder [4] consists of regular
sub-adder blocks with input carries speculated from Carry
Look-Ahead (CLA) blocks of the same length. In the ETAIIM
version, several of the most significant CLA blocks are chained
to increase accuracy. The ETBA adder [5], a direct descendant
of the ETAIIM, adds variable speculation signs and sub-adder
sum-balancing multiplexed blocks to mitigate relative errors.
The ETAIV [6] and CSA [7] adders enhance accuracy
by considering two prior carry speculation blocks instead of
one, coupled respectively with a carry-select or a carry-skip
technique, with the latter also using sum balancing over several
sub-adder blocks. On the other hand, the ISA [8] and CSC [9] adders
have recently improved circuit performance and efficiency
by introducing off-critical-path error reduction techniques.
The ISA concept [8] also provides an optimal and generalized
approach to speculative compensated adders, encompassing the
aforementioned adders, together with a simple methodology that
allows designers to generate efficient architectures from a
delay-accuracy specification.
III. INEXACT ARITHMETIC CIRCUIT DESIGN
A. Gate-Level Pruning
Gate-Level Pruning is a CAD technique that automatically
generates inexact circuits from a conventional design
by adding only one small step to the digital design flow. The
CAD framework is presented in Fig. 1.
Any exact circuit can be represented by a directed acyclic
graph, as depicted in Fig. 2, whose nodes are components
such as gates and whose edges are wires. The decision to
prune a node is based on two criteria: the significance, which
is a structural parameter, and the activity or toggle count. The
nodes with the lowest Significance-Activity Product (SAP) are
pruned first, so that the error magnitude grows with the
amount of pruning. Alternatively, depending on the application’s
requirements, the designer may choose to prune nodes according
to the activity only, in order to minimize the error rate.
Fig. 1. CAD framework for Gate-Level Pruning.
Fig. 2. Directed acyclic graph representation of a gate-level netlist and the
associated significance attribution.
The activity of each wire is extracted from the SAIF
(Switching Activity Interchange Format) file obtained through
gate-level hardware simulations. This file contains the toggle count
of each wire, as well as the time spent at logic levels
0 and 1. To obtain an accurate activity estimate, the system
should be simulated with an input stimulus representative of
the real operation of the circuit. The more realistic the
simulation, the more accurate the toggle count and the more
efficient the resulting pruning.
The significance of each primary output is set by the
designer depending on the application’s requirements. In this
paper, pruning is applied to several arithmetic circuits in which
each primary output is weighted by a power of two. It is
therefore appropriate to apply a weighted significance attribution,
where each output bit position has a significance twice that of
the previous one when moving from the LSB to the MSB. A reverse
topological graph traversal is then performed to compute the
significance of each node as follows:

$$\sigma_i = \sum \sigma_{\mathrm{desc}(i)} \tag{1}$$

where $\sigma_i$ is the significance of node $i$ and
$\sigma_{\mathrm{desc}(i)}$ are the significances of the direct
descendants of node $i$. An example of weighted significance
attribution is shown in Fig. 2.
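The traversal of Eq. (1) can be sketched in a few lines of Python.
This is only a minimal illustration under stated assumptions: the
netlist is given as a hypothetical fanout dictionary (node to direct
descendants), primary outputs are sink nodes carrying the
designer-assigned weights, and the standard-library `graphlib` module
(Python 3.9 or later) provides the ordering. None of these names come
from the actual CAD framework.

```python
from graphlib import TopologicalSorter


def node_significance(fanout, output_weights):
    """Propagate significance from primary outputs back towards the inputs.

    fanout:         dict mapping each node to its direct descendants
                    (hypothetical netlist representation).
    output_weights: dict mapping each primary output to its weight.
    """
    # TopologicalSorter expects node -> predecessors. Feeding it the fanout
    # map therefore yields every descendant before the node driving it,
    # i.e. a reverse topological order of the netlist.
    order = TopologicalSorter(fanout).static_order()
    sigma = {}
    for node in order:
        # Eq. (1): sum of the significances of the direct descendants.
        # Primary outputs additionally contribute their assigned weight
        # (assumption: outputs are usually sinks, so the sum is then zero).
        sigma[node] = output_weights.get(node, 0) + sum(
            sigma[d] for d in fanout.get(node, ())
        )
    return sigma


if __name__ == "__main__":
    # Toy netlist: g1 drives out0 and g2, g2 drives out1.
    fanout = {"g1": ["out0", "g2"], "g2": ["out1"], "out0": [], "out1": []}
    weights = {"out0": 2**0, "out1": 2**1}  # weighted attribution, LSB first
    print(node_significance(fanout, weights))
```

In this toy netlist, g2 inherits the weight of out1, and g1 accumulates
the weights of both outputs it can influence.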
Once the significance and activity are determined, the nodes,
i.e. gates and their corresponding wires, are ranked according
to their Significance-Activity Product (SAP). The ones with
the lowest SAP are disconnected from the Verilog netlist, and
a re-synthesis is performed in order to remove or replace the
unconnected gates.
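The ranking step can be illustrated with the short Python sketch below.
It only selects which nodes to disconnect. Netlist editing and
re-synthesis are left to the existing tools. The function name, the
dictionary-based inputs and the `use_significance` switch (which models
the activity-only alternative mentioned earlier) are assumptions made
for this example.

```python
def select_nodes_to_prune(significance, activity, count, use_significance=True):
    """Return the `count` nodes with the lowest pruning score.

    significance:     dict node -> significance (structural parameter)
    activity:         dict node -> toggle count from simulation
    use_significance: True  -> rank by Significance-Activity Product (SAP)
                      False -> rank by activity only (targets the error rate)
    """
    def score(node):
        if use_significance:
            return significance[node] * activity[node]
        return activity[node]

    # sorted() is stable, so nodes with equal scores keep their input order.
    ranked = sorted(significance, key=score)
    return ranked[:count]


if __name__ == "__main__":
    significance = {"g1": 3, "g2": 2, "g3": 8}
    activity = {"g1": 10, "g2": 4, "g3": 5}
    print(select_nodes_to_prune(significance, activity, 2))
    print(select_nodes_to_prune(significance, activity, 2, use_significance=False))
```

With these toy values the two criteria already select different
candidates, which reflects the choice between limiting the error
magnitude and limiting the error rate.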
Fig. 3. General block diagram of an Inexact Speculative Adder (ISA).
Each speculative segment consists of a carry speculator (SPEC), a regular
adder (ADD) and an error compensation block (COMP).
B. Inexact Speculative Adders
1) General Concept
The general block diagram of an Inexact Speculative Adder
(ISA) is depicted in Fig. 3. An ISA splits the carry
propagation chain into multiple paths executed concurrently. Each
path consists of a carry speculator block (SPEC), a sub-adder
block (ADD) and an error compensation block (COMP). For
each of these SPEC-ADD-COMP paths, the blocks
have the following functions:
• SPEC – The speculator block generates a partial carry
signal from a limited number of operand bits using a carry
look-ahead approach, sourced by either a static or a
dynamic input. When a propagate chain covers the full
SPEC block, the exact carry cannot be determined from
these operand bits alone and the output carry is guessed
to be the input value. As long propagate sequences are
uncommon for uniformly distributed inputs [4], the
probability of a fault decreases as the size of this block
increases.
• ADD – The sub-adder block calculates local sums from
the speculated carry of the SPEC block.
• COMP – Without compensation, an internal overflow
caused by an inconsistent carry could lead to a massive
error. Therefore, the COMP block detects speculation
faults by comparing the carry generated by the SPEC block
with the carry-out of the prior ADD block.
It then compensates faulty sums either by attempting to
correct a few bits of the local sum or by reducing the relative
error over a few bits of the preceding sum.
The first speculative path, operating on the LSBs of the
adder, has neither a SPEC nor a COMP block since it directly
uses the adder carry-in. The resulting addition arithmetic is
illustrated in Fig. 4.
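The SPEC and ADD functions described above can be mimicked by a small
behavioural model. The Python sketch below is a simplification made for
illustration only: it assumes equally sized blocks, a static speculator
input of 0, and no COMP block, and the function and parameter names are
hypothetical rather than taken from [8].

```python
def speculative_add(a, b, width=32, block=8, spec=2, cin=0):
    """Behavioural model of SPEC + ADD paths without compensation.

    Each `block`-bit sub-adder receives a carry speculated from the `spec`
    operand bits just below it, with a static guess of 0 entering the
    speculator (assumption). Returns the width-bit sum, carry-out dropped.
    """
    assert width % block == 0 and 0 < spec <= block
    blk_mask = (1 << block) - 1
    spec_mask = (1 << spec) - 1
    result = 0
    for lo in range(0, width, block):
        a_blk = (a >> lo) & blk_mask
        b_blk = (b >> lo) & blk_mask
        if lo == 0:
            carry = cin  # the first path uses the adder carry-in directly
        else:
            # SPEC: carry-out of the window below the block, input guessed 0.
            a_win = (a >> (lo - spec)) & spec_mask
            b_win = (b >> (lo - spec)) & spec_mask
            carry = (a_win + b_win) >> spec
        # ADD: local sum using the speculated carry.
        result |= ((a_blk + b_blk + carry) & blk_mask) << lo
    return result


if __name__ == "__main__":
    # A carry generated inside the window is speculated correctly.
    print(hex(speculative_add(0x00C0, 0x0040, width=16)))
    # A propagate chain covering the whole window hides the incoming carry.
    print(hex(speculative_add(0x00FF, 0x0001, width=16)))
```

In the second call the true carry into the upper block is lost, which is
the kind of speculation fault that the COMP block is meant to detect and
compensate.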
Fig. 4. Example of ISA addition arithmetic with 2-bit speculation, 1-bit
correction and 1-bit error reduction. Faults only occur in the two right-hand
paths. The 1st LSB of the central path can be corrected. The 1st LSB of the
right path cannot be corrected, so the 1st MSB of the preceding sum is flipped.
2) Error Compensation
The COMP’s error correction technique, introduced in [8],
consists of incrementing or decrementing only a small group
of LSBs of the local sum in order to compensate for the erroneous
speculated carry. In most cases, this fully resolves carry
errors. When those stages are all in propagate mode,
correction is impossible as it would lead to an internal overflow.
In that case, the uncorrected bits, having a higher significance
than the error bit, ensure a low relative error of the result. Using
the COMP’s error correction technique thus reduces both the error
rate and the relative error. The correction hardware operates
concurrently with the local addition, so this technique has a
minimal impact on the critical path of the adder.
The COMP’s error reduction technique consists of balancing
a group of MSBs of the preceding sub-adder in the direction
opposite to the error. This technique, similar to the one in [5],
has been widely used in the literature. However, to avoid high
relative errors and better control the worst-case error (REMAX),
it relies on a large SPEC block lying directly in the critical path
of the adder.
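To make the two compensation techniques more concrete, the following
Python sketch extends a simple block-wise speculative adder with a
correction step and a balancing step. It is a simplified behavioural
model under several assumptions that are not taken from [8]: blocks are
equally sized, the speculator input is a static 0 (so the only possible
fault is a missed carry), the adder carry-in is 0, and the names
`isa_add`, `corr` and `red` are hypothetical.

```python
def isa_add(a, b, width=32, block=8, spec=2, corr=1, red=1):
    """Simplified ISA model: speculation, correction and balancing.

    spec: number of operand bits seen by each speculator
    corr: number of local-sum LSBs the correction may increment
    red:  number of preceding-sum MSBs used for balancing
    Returns the width-bit sum (final carry-out dropped).
    """
    assert width % block == 0 and red <= spec <= block and corr + red <= block
    blk_mask = (1 << block) - 1
    spec_mask = (1 << spec) - 1
    corr_mask = (1 << corr) - 1
    sums = []      # per-block sums, least significant block first
    prev_cout = 0  # carry-out of the previous ADD block
    for idx, lo in enumerate(range(0, width, block)):
        a_blk = (a >> lo) & blk_mask
        b_blk = (b >> lo) & blk_mask
        if idx == 0:
            carry = 0  # first path: adder carry-in, assumed to be 0 here
        else:
            a_win = (a >> (lo - spec)) & spec_mask
            b_win = (b >> (lo - spec)) & spec_mask
            carry = (a_win + b_win) >> spec  # SPEC with static input 0
        total = a_blk + b_blk + carry        # ADD
        local = total & blk_mask
        # COMP: with a static-0 speculator, a mismatch always means that
        # the speculated carry is 0 while the prior ADD carry-out is 1.
        if idx > 0 and carry != prev_cout:
            if local & corr_mask != corr_mask:
                local += 1  # correction: no overflow out of the corr LSBs
            else:
                # Balancing: raise the MSBs of the preceding sum to reduce
                # the magnitude of the (negative) error.
                sums[-1] |= ((1 << red) - 1) << (block - red)
        sums.append(local)
        prev_cout = total >> block
    return sum(s << (i * block) for i, s in enumerate(sums))


if __name__ == "__main__":
    print(hex(isa_add(0x00FF, 0x0001, width=16)))  # fault fully corrected
    print(hex(isa_add(0x01FF, 0x0001, width=16)))  # fault only balanced
```

In the second call the local LSB is already 1, so correction is
impossible and the MSB of the preceding sum is set instead, which leaves
a smaller error than the uncompensated result would have.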
3) Design Strategy
The ISA offers a general topology of speculative compensated
addition that encompasses the state of the art and allows
an optimal balance between circuit and accuracy specifications.
A design methodology based on a delay-accuracy approach
is presented in Fig. 5. The appropriate delay trade-off is mainly
obtained by sizing the SPEC and ADD blocks, the principal slack
elements of the ISA. Then, the COMP’s error correction and
error reduction techniques make it possible to tune the design
to the accuracy requirements, at the cost of hardware overhead
and with a minimal delay penalty for multiplexing the result on
a few compensated bits.
Fig. 5. CAD framework for ISA design.
Adders in the literature describe particular implementations
that focus excessively on either performance or errors. In
the ISA architecture, the speculation overhead can be traded for
longer sub-adders while meeting the same delay requirement. It
is then possible to use fewer speculative paths and to limit the
in-critical-path speculation-compensation overhead to a few bits of
each path while meeting the accuracy requirement. This approach
allows notable improvements in circuit performance [8].
IV. RESULTS AND COMPARATIVE STUDY
A. Accuracy Metrics
In order to quantify the error produced by an inexact circuit,
one has to choose one or more error metrics depending on the
application’s requirements. The metrics used to characterize
approximate adders are based on the relative error (RE) of a
sum, defined as:

$$\mathrm{RE} = \left| \frac{S_{\mathrm{approx}} - S_{\mathrm{correct}}}{S_{\mathrm{correct}}} \right| \tag{2}$$

where $S_{\mathrm{approx}}$ and $S_{\mathrm{correct}}$ are respectively
the approximate and correct sums of an addition. In [1], three
interesting metrics are defined:
• Error Rate – The error rate corresponds to the ratio of
erroneous computations over the entire set of computations
and is defined as follows:

$$\text{Error Rate} = \frac{\text{Number of erroneous computations}}{\text{Total number of computations}} \tag{3}$$
• Relative Error RMS (RERMS) – The root mean square
(RMS) of RE is a good estimator of accuracy and is
interesting for many applications, particularly in image
and video processing. It is defined as:

$$\mathrm{RE}_{\mathrm{RMS}} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \mathrm{RE}_n^2} \tag{4}$$
• Maximum Relative Error (REMAX) – REMAX represents the
largest relative error of an adder and deﬁnes its worst case
accuracy. It is here obtained over a set of computations.
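The three metrics can be computed from paired lists of approximate and
correct sums, as in the Python sketch below. The function name is
hypothetical, and skipping samples whose correct sum is zero (for which
Eq. (2) is undefined) is an assumption of this illustration rather than
a rule stated in [1].

```python
import math


def error_metrics(approx_sums, correct_sums):
    """Return (error rate, RE RMS, RE MAX) following Eqs. (2) to (4).

    Assumption: samples with a correct sum of zero are excluded from the
    relative-error statistics, but still count towards the error rate.
    At least one sample must have a non-zero correct sum.
    """
    pairs = list(zip(approx_sums, correct_sums))
    # Eq. (3): erroneous computations over all computations.
    error_rate = sum(1 for s_a, s_c in pairs if s_a != s_c) / len(pairs)
    # Eq. (2): relative error of each sum.
    rel = [abs(s_a - s_c) / abs(s_c) for s_a, s_c in pairs if s_c != 0]
    # Eq. (4): root mean square of the relative errors.
    re_rms = math.sqrt(sum(r * r for r in rel) / len(rel))
    # Worst case observed over the simulated set.
    re_max = max(rel)
    return error_rate, re_rms, re_max


if __name__ == "__main__":
    rate, rms, worst = error_metrics([10, 18, 30], [10, 20, 30])
    print(f"error rate = {rate:.3f}, RE_RMS = {rms:.4f}, RE_MAX = {worst:.4f}")
```

As in the text, the maximum obtained this way is the worst case over the
simulated set, not a guaranteed bound over all possible inputs.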
B. Results
In order to perform a comparative study, both techniques
have been applied to 32-bit adders synthesized in a 65 nm UMC
technology library, and all the designs have been simulated
with a set of five million uniformly distributed random inputs.
Fig. 6 shows the error characteristics and the normalized costs
in terms of energy and Power-Delay-Area Product (PDAP) for
selected pruned and speculative adders synthesized
at 3.3 GHz. As the timing constraint has been shown to strongly
affect the gains of these techniques, the same work has
been reproduced for adders running at 1.6 GHz in Fig. 7.
Only speculative adders with regular structures (i.e. 2x16, 4x8,
8x4 and 16x2 bit concurrent paths) and diverse error
characteristics have been synthesized. For this reason, the displayed
characteristics present steps corresponding to changes of structure
size. Generally, ISA adders built out of small sub-adders show
high errors and high savings (on the left of the figures), whereas
ISAs with large sub-adders are preferred for low errors and lower
savings (on the right of the figures).
Figs. 6 and 7 clearly show that the two techniques have a
different impact on the output quality. The error rate of pruned
adders rapidly reaches 100%, the reason being that in the
first steps of the pruning process, some of the least significant
outputs are removed. On the other hand, in speculative adders,
a small speculation-correction overhead decreases the error rate
at the expense of circuit efficiency.
For both techniques and frequencies, the RERMS and the
REMAX grow exponentially with circuit savings.
The ISA adders on the left of the figures have a low REMAX, but
REMAX does not follow the same exponential trend because it is
expensive to control. Thus, a gap appears between REMAX and
RERMS when the constraints on the circuit become too high (at
1% for 3.3 GHz and $10^{-3}$% for 1.6 GHz).
The timing constraint has a significant influence on the results
obtained with the two techniques. Fig. 6 shows that at a high
frequency of 3.3 GHz, and for a relative PDAP cost of 0.42,
the REMAX and the RERMS of the pruned adder are equal to 4%
and 0.008%, respectively. In comparison, the speculative adder
with a similar PDAP has a REMAX of $10^{-1}$% and a RERMS of
$10^{-4}$%. This could lead to the conclusion that the speculation
technique achieves energy savings similar to those of the pruning
technique at a much higher accuracy level. However, Fig. 7
actually depicts the opposite trend at a lower
frequency of 1.6 GHz. Hence, a more extensive comparative
study might show that the two design techniques
produce uncorrelated errors, and could therefore be combined
to obtain additive savings.
Fig. 6. Error characteristics and normalized cost of 32-bit pruned (a) and
speculative (b) adders synthesized at 3.3 GHz. Each panel plots the error
rate, REMAX and RERMS (in %, logarithmic axis) together with the normalized
PDAP and energy costs.
V. CONCLUSION
This paper reviewed and compared two well-established
techniques for generating approximate hardware: carry
speculation and gate-level pruning. It has been shown that both can
achieve up to 85% PDAP reduction for an RMS relative error
of 1%. However, the two techniques clearly have a different
impact on the accuracy of the generated adders. Additionally,
the timing constraint significantly impacts the efficiency of such
techniques: in the conducted experiments, speculative adders
present lower errors than pruned adders for an equivalent PDAP at
a 3.3 GHz frequency. On the other hand, gate-level pruning
is more efficient than carry speculation at 1.6 GHz. A more
extensive study could establish whether the two techniques
produce uncorrelated errors and can thus be combined to
further reduce power consumption, silicon area and critical
path delay.
Fig. 7. Error characteristics and normalized cost of 32-bit pruned (a) and
speculative (b) adders synthesized at 1.6 GHz. Each panel plots the error
rate, REMAX and RERMS (in %, logarithmic axis) together with the normalized
PDAP and energy costs.
REFERENCES
[1] A. Lingamneni, C. Enz, J. L. Nagel, K. Palem, and C. Piguet, “Energy
parsimonious circuit design through probabilistic pruning,” in Design,
Automation Test in Europe Conference Exhibition (DATE), 2011, March
2011, pp. 1–6.
[2] J. Schlachter, V. Camus, C. Enz, and K. Palem, “Automatic Generation of
Inexact Digital Circuits by Gate-level Pruning,” in Circuits and Systems
(ISCAS), 2015 IEEE International Symposium on, May 2015.
[3] T. Liu and S.-L. Lu, “Performance Improvement with Circuit-level
Speculation,” in Microarchitecture, 2000. MICRO-33. Proceedings. 33rd
Annual IEEE/ACM International Symposium on, 2000, pp. 348–355.
[4] N. Zhu, W.-L. Goh, and K.-S. Yeo, “An Enhanced Low-power High-speed
Adder For Error-tolerant Application,” in Integrated Circuits (ISIC), Proc.
of the 2009 12th International Symposium on, Dec 2009, pp. 69–72.
[5] M. Weber, M. Putic, H. Zhang, J. Lach, and J. Huang, “Balancing Adder
for Error Tolerant Applications,” in Circuits and Systems (ISCAS), 2013
IEEE International Symposium on, May 2013, pp. 3038–3041.
[6] N. Zhu, W.-L. Goh, G. Wang, and K.-S. Yeo, “Enhanced Low-power High-
speed Adder for Error-tolerant Application,” in SoC Design Conference
(ISOCC), 2010 International, Nov 2010, pp. 323–327.
[7] Y. Kim, Y. Zhang, and P. Li, “An Energy Efficient Approximate Adder
with Carry Skip for Error Resilient Neuromorphic VLSI Systems,”
in Computer-Aided Design (ICCAD), 2013 IEEE/ACM International
Conference on, Nov 2013, pp. 130–137.
[8] V. Camus, J. Schlachter, and C. Enz, “Energy-Efficient Inexact Speculative
Adder with High Performance and Accuracy Control,” in Circuits and
Systems (ISCAS), 2015 IEEE International Symposium on, May 2015.
[9] J. Hu and W. Qian, “A New Approximate Adder with Low Relative Error
and Correct Sign Calculation,” in Design, Automation and Test in Europe
(DATE), 2015 IEEE Conference and Exhibition on, March 2015.
