Cross-Layer Inexact Design for Low-Power Applications by Camus, Vincent et al.
Example K-maps of (a) the initial correct function (carry of a full adder),
(b) the function with a favorable 0-to-1 bit flip, (c) the function with a favorable
1-to-0 bit flip, and (d) the function with a non-favorable 0-to-1 bit flip [2]
IcySoC RTD 2013
V. Camus1, G. Karakonstantis2, J. Schlachter1, A. Burg2, C. Enz1
1 Ecole polytechnique fédérale de Lausanne (EPFL), Integrated Circuits Laboratory (ICLAB)
2 Ecole polytechnique fédérale de Lausanne (EPFL), Telecommunications Circuits Lab (TCL)
Inexact characterization and co-design framework
Circuit pruning and minimization
Algorithmic pruning
Circuit level implementations
Abstract
Achieving miniaturization, low power and real-time data processing requires ever more costly design constraints and margins.
Guard bands and worst-case safety margins, coupled with error-correction units that ensure exact calculations and robustness
against Process-Voltage-Temperature (PVT) variations, strongly degrade chip performance, power and cost efficiency.
Approximate and error-tolerant circuits are a radically new approach that trades calculation accuracy for better speed, power, area and yield.
The IcySoC project platform revisits low-power and low-voltage VLSI design through a combined cross-layer inexact design framework.
[Figure: design layers (Algorithm, Circuit, Device) with the labels "Algorithmic pruning" and "Voltage-frequency scaling"]
The main idea of working at the circuit level is to conceive
fundamentally different circuit schemes that give better control over
the calculation error characteristics and allow more efficient pruning
and minimization of these circuits.
Pruning consists of removing the gates or cells of a circuit that have
the lowest Significance-Activity Product (SAP).
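The ranking step behind SAP-based pruning can be sketched as follows. This is a minimal illustration, not the flow of [1]: it assumes that a significance value and an activity value per gate have already been obtained elsewhere, and that the SAP is their product. The `Gate` class, the `prune_lowest_sap` function and the numbers in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """Hypothetical gate record; values are assumed to come from prior analysis."""
    name: str
    significance: float  # assumed weight of the outputs this gate influences
    activity: float      # assumed toggle rate of the gate

    @property
    def sap(self) -> float:
        # Significance-Activity Product, taken here as a plain product
        return self.significance * self.activity


def prune_lowest_sap(gates, n_remove):
    """Return (kept, removed), removing the n_remove gates with the lowest SAP."""
    ranked = sorted(gates, key=lambda g: g.sap)
    removed = ranked[:n_remove]
    removed_names = {g.name for g in removed}
    kept = [g for g in gates if g.name not in removed_names]
    return kept, removed


# Usage with made-up values: the gate feeding only the LSB is pruned first.
gates = [Gate("g_msb", 2 ** 15, 0.5), Gate("g_mid", 2 ** 8, 0.4), Gate("g_lsb", 1, 0.5)]
kept, removed = prune_lowest_sap(gates, 1)
print([g.name for g in removed])
```

In a real flow the pruned netlist would then be re-simulated to check that the resulting error stays within the application's tolerance.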
Ranking scheme for a 16-bit Kogge-Stone adder [1]
Gains up to 8X in energy-delay-area product [1]
Minimization consists of introducing bit flips in the Karnaugh maps of
logic functions in order to reduce circuit complexity.
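As a small worked example of such bit flips, the sketch below takes the carry of a full adder and compares it with two simpler functions that each differ from it in exactly one K-map cell. The particular cells chosen here are an assumption made for illustration and need not match the ones shown in the figure from [2]; the function names are hypothetical.

```python
from itertools import product


def carry(a, b, c):
    """Exact full-adder carry: ab + bc + ac (6 literals)."""
    return (a & b) | (b & c) | (a & c)


def carry_flip_0_to_1(a, b, c):
    """Carry with cell (a, b, c) = (0, 1, 0) flipped from 0 to 1: b + ac (3 literals)."""
    return b | (a & c)


def carry_flip_1_to_0(a, b, c):
    """Carry with cell (a, b, c) = (1, 0, 1) flipped from 1 to 0: b(a + c) (3 literals)."""
    return b & (a | c)


def mismatches(f, g):
    """List the input combinations on which f and g disagree."""
    return [bits for bits in product((0, 1), repeat=3) if f(*bits) != g(*bits)]


print(mismatches(carry, carry_flip_0_to_1))  # a single differing cell
print(mismatches(carry, carry_flip_1_to_0))  # a single differing cell
```

Each simplified function is wrong for one of the eight input combinations, so under uniformly distributed inputs (an assumption) the error rate would be 1/8, in exchange for halving the literal count.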
Gains up to 8X in energy-delay-area product [2]
[1] A. Lingamneni et al., Algorithmic methodologies for ultra-efficient inexact architectures for sustaining technology scaling, CF, 2012.
[2] A. Lingamneni et al., Parsimonious circuits for error-tolerant applications through probabilistic logic minimization, PATMOS, 2011.
[3] M. Weber et al., Balancing Adder for error tolerant applications, ISCAS, 2013.
[4] N. Zhu et al., An enhanced low-power high-speed Adder For Error-Tolerant application, ISIC, 2009.
16-bit Ripple Carry Adder
Approximate speculative adder ETBA: carry speculation at
different levels prevents long carry-propagation chains [3]
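The effect of carry speculation can be illustrated with a generic block-based adder. The sketch below is not the ETBA of [3] nor the ETAIIM of [4]: it assumes a simple scheme in which the carry into each block is speculated from the previous block alone, so no carry chain spans more than two blocks. The function name and the `width` and `block` parameters are hypothetical.

```python
def speculative_add(a, b, width=16, block=4):
    """Add a and b (modulo 2**width) with block-level carry speculation.

    The carry into each block is computed from the previous block only,
    assuming a zero carry into that previous block.
    """
    mask = (1 << block) - 1
    result = 0
    for shift in range(0, width, block):
        xa = (a >> shift) & mask
        xb = (b >> shift) & mask
        if shift == 0:
            cin = 0
        else:
            pa = (a >> (shift - block)) & mask
            pb = (b >> (shift - block)) & mask
            cin = (pa + pb) >> block  # speculated carry, ignores older blocks
        result |= ((xa + xb + cin) & mask) << shift
    return result


# Exact when no carry has to cross more than one block boundary.
print(hex(speculative_add(0x1234, 0x1111)))
# Inexact when a carry would have to ripple through a full block.
print(hex(speculative_add(0x00FF, 0x0001)), hex((0x00FF + 0x0001) & 0xFFFF))
```

The second call shows the kind of error such a scheme introduces: the carry generated in the lowest block is not seen by the third block, so the sum differs from the exact result.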
Quality and Energy-Delay product of JPEG decompression using 
different speculative adders (ETBA and ETAIIM [4]) versus Kogge-Stone
Adder (KSA) and reduced bandwidth Ripple Carry Adder (RBD) [3] 
Inexact design techniques used independently at any
level of the system have been shown to lead to
significant gains in energy efficiency. However, fully
exploiting the potential of these techniques requires
cross-level work to adapt them to the desired application.
By joining forces in a bottom-up approach, two labs at
EPFL collaborate in this project, combining their
respective inexact design methodologies into a holistic
framework for low-power design.
[Figure: inexact characterization and co-design framework. Block labels: Circuit level implementations; Circuit pruning and minimization (ICLAB); Variable robustness memory (TCL); Approximate algorithm; Application; Inexact building & characterization platform (ICLAB & TCL). Exchanged quantities: energy, delay, area; error characteristics. Output: optimized application.]
Pruning at the algorithm level consists of classifying computations by
significance and skipping the less-significant ones, both to adapt to
dynamically changing hardware capabilities and to save power.
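A minimal sketch of this idea on a weighted sum follows. It assumes that the magnitude of each weight is an adequate proxy for the significance of the corresponding multiply-accumulate, which is an illustrative choice and not the significance characterization used in [5]. The function name and the `keep_fraction` parameter are hypothetical.

```python
def pruned_dot(weights, inputs, keep_fraction):
    """Weighted sum that skips the least-significant terms.

    Significance proxy (assumption): the magnitude of the weight.
    keep_fraction is the share of terms that is actually computed.
    """
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    n_keep = max(1, round(keep_fraction * len(weights)))
    return sum(weights[i] * inputs[i] for i in order[:n_keep])


weights = [8.0, 4.0, 0.1, 0.05]
inputs = [1.0, 1.0, 1.0, 1.0]
exact = pruned_dot(weights, inputs, 1.0)
approx = pruned_dot(weights, inputs, 0.5)  # half of the operations skipped
print(exact, approx)
```

Lowering `keep_fraction` at run time is one way such a kernel could adapt to reduced hardware capability, at the cost of a bounded loss in output quality.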
- Quality loss does not affect the ability to detect
a sinus arrhythmia condition [5]
- Many other applications [6, 7]
Design for Graceful Quality Degradation
Guaranteed Correctness
- HIGH priority
- Time redundancy
- Might incur timing penalty
Best-Effort Computing
- LOW priority
- Can be pruned to compensate for any penalty/facilitate VS
[Flowchart: Significance Characterization. Inputs: input data, quality metric. Outputs: significant computations, less-significant computations.
Boxes: "Classify computations into groups based on their significance in obtaining adequate quality"; decision "Sufficient?" (Yes/No); "Improve characteristics: find a domain in which the input signal is approximately sparse"; "Express the signal in such a domain"; "Ensure adequate quality".
Algorithm transformation: y = F(x) becomes y = G(W(x)), with x' = W(x), at a minor quality loss.]
FFT variant                               Total ULFP   Total LFP   Total HFP   LFP/HFP
DWT-based FFT (60% of operations dropped)    54.1        148.5       319.2      0.4652
Conventional FFT (split-radix)               54.9        151.9       336.8      0.451
(Sinus arrhythmia: dominant HFP in 0.15-0.4 Hz; plotted bands: LF, HF, ULF)
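The reported LFP/HFP ratios follow directly from the band-power totals above, as this short check shows. It pairs the first set of totals with the DWT-based FFT and the second with the conventional FFT, a pairing that is inferred from the order of the values and confirmed by the ratios. The function name is hypothetical.

```python
def lf_hf_ratio(lfp, hfp):
    """Ratio of low-frequency power to high-frequency power."""
    return lfp / hfp


dwt_ratio = lf_hf_ratio(148.5, 319.2)   # DWT-based FFT, 60% of operations dropped
conv_ratio = lf_hf_ratio(151.9, 336.8)  # conventional split-radix FFT

# Relative deviation of the pruned result from the conventional one
relative_deviation = abs(dwt_ratio - conv_ratio) / conv_ratio

print(round(dwt_ratio, 4), round(conv_ratio, 3))
print(f"{relative_deviation:.1%}")
```

The two ratios differ by only a few percent, which is consistent with the statement that the quality loss does not affect the detection of sinus arrhythmia [5].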
 Application to FFT and Spectral Analysis of Heart Rate
[5] G. Karakonstantis et al., A Quality-Scalable Spectral Analysis System for Energy-Efficient Health Monitoring, DATE, 2014.
[6] G. Karakonstantis et al., Process-Variation Resilient & Voltage-Scalable DCT Architecture for Robust Low-Power Computing, IEEE Trans. on VLSI Systems, 2010.
[7] G. Karakonstantis et al., On the Exploitation of the Error Resilience of Wireless Systems under Unreliable Silicon, DAC, 2012.
[Figure: DWT-based FFT structure. Block labels: 8-point DWT; 4-point DWT (x2); 2-point DWT (x4); DWT stage (HF band pruned); twiddle factors (small factors pruned).]
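Pruning the HF band of a DWT stage can be illustrated on a single-level Haar transform. This sketch is not the DWT-based FFT of [5]: it assumes a Haar wavelet and an even-length input, and it only shows what discarding the high-frequency (detail) coefficients does to a signal. All function names are hypothetical.

```python
import math

_S = 1 / math.sqrt(2)  # orthonormal Haar scaling


def haar_forward(x):
    """One-level Haar DWT: returns (low-frequency, high-frequency) bands."""
    approx = [(x[i] + x[i + 1]) * _S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * _S for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    x = []
    for a, d in zip(approx, detail):
        x.extend(((a + d) * _S, (a - d) * _S))
    return x


def prune_hf(x):
    """Reconstruct x after zeroing the whole high-frequency band."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, [0.0] * len(detail))


smooth = [1.0, 1.0, 3.0, 3.0]  # no HF content within each pair: unchanged
rough = [1.0, 3.0, 5.0, 1.0]   # HF content is lost: each pair is averaged
print(prune_hf(smooth))
print(prune_hf(rough))
```

For signals whose energy is concentrated in the low-frequency band, dropping the detail coefficients removes half of the downstream work while changing the signal very little, which is the property that approximately-sparse domains are meant to expose.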
