Bio-inspired error-tolerant and energy-efficient signal processing by Zhang, Gong
© 2012 by Gong Zhang.
BIO-INSPIRED ERROR-TOLERANT AND ENERGY-EFFICIENT
SIGNAL PROCESSING
BY
GONG ZHANG
THESIS
Submitted in partial fulfillment of the requirements
for the degree of Master of Science in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2012
Urbana, Illinois
Master's Committee:
Professor Naresh R. Shanbhag, Chair
Professor Douglas L. Jones
Abstract
In many applications at the sensory edge, such as security and
environmental sensing, sensor nodes must operate reliably for extended
periods on battery supplies. To meet this constraint, energy-efficient
systems have been developed using different technologies. The primary and
most effective approach has been technology scaling. Another emerging
technique is to operate circuits in the subthreshold region, since
applications such as environmental sensing do not require high throughput.
However, both techniques lead to large process, voltage and temperature
variations and therefore jeopardize system reliability.
In order to achieve both energy savings and system reliability, we take
inspiration from biological methods, such as population-coding, and apply
them to a canonical problem of non-coherent (unknown phase) frequency
estimation. Energy efficiency is achieved using low-cost, overlapping
band-pass filters rather than conventional non-overlapping band-pass
filters. Energy savings are also achieved by operating the hardware at a
voltage lower than the nominal voltage (voltage overscaling), which leads
to hardware timing errors. In the presence of these hardware errors,
signal statistics are generated from overlapping band-pass filters with
frequency redundancy. Robust techniques, such as median estimation and
algorithmic noise-tolerance, are applied to the filter outputs to achieve
error-tolerance. Energy/performance trade-offs are further explored by
altering the supply voltage. Simulation results show that the
root-mean-squared-error of the bio-inspired method can be reduced by an
order of magnitude relative to that of the conventional architecture,
while energy consumption is reduced by 78% relative to the conventional
method operating without hardware errors at the nominal supply voltage.
To my family, for their love and support.
Table of Contents
Chapter 1 Introduction
1.1 Future Computer Workloads and Perceptual Sensor Network
1.2 Energy-Efficient Digital Signal Processing
1.3 Error-Tolerant Digital Signal Processing
1.4 Problem Definition
1.5 Timing Errors and Voltage Overscaling
1.6 Thesis Organization
Chapter 2 Bio-inspired Signal Processing and System Description
2.1 The Redundant Sensor Network
2.2 The Low-Cost Sensor
2.3 Signal Statistics in Time Domain under Voltage Overscaling
2.4 The Nonoverlapping Estimator
2.5 The Bio-inspired Estimator
Chapter 3 Simulation Setup and Results Comparison
3.1 Simulation Setup
3.2 Performance Comparison
3.2.1 Performance comparison
3.2.2 Energy consumption comparison
Chapter 4 Conclusion
4.1 Summary
4.2 Future Work
References
Chapter 1
Introduction
1.1 Future Computer Workloads and
Perceptual Sensor Network
The past few decades have witnessed a dramatic change in computational
workloads, which used to be CPU-centric and based on high-end servers and
personal computers. Today's computational workload features mobile devices,
which have been the fastest-growing sector of the consumer electronics
industry over the past ten years. Future workloads can be roughly
classified into four categories: high-performance computing tasks, complex
distributed systems, personalized services, and surrounding perceptual
processing sensor networks. The last category is evolving towards more
advanced user interfaces [1].
Such emerging applications pose great challenges because they call for new
design metrics. For example, sensor networks running on batteries must be
energy efficient, whereas low power consumption has traditionally been
just one design constraint among several. Furthermore, sensor networks
place more stringent requirements on traditional design metrics such as
reliability, as these networks often operate in highly dynamic and noisy
environments.
1.2 Energy-Ecient Digital Signal
Processing
Computational platforms on the sensory edge comprise computational cores
and the input/output components such as sensors and analog-to-digital con-
verters. The computational core is usually implemented using digital circuits
and consumes energy as given by [2]
$$E = C V_{dd}^{2}, \qquad (1.1)$$
where $C$ is the average switching capacitance and $V_{dd}$ is the supply
voltage. Energy consumption therefore scaled down significantly with
technology scaling, as both $C$ and $V_{dd}$ scaled down with transistor
size until the 130 nm technology node. Beyond the 130 nm node, the supply
voltage is kept at 1.2 V to keep the leakage current down, although new
device technologies such as FinFET [3] could reduce the supply voltage
further. Transistor feature size scaling has thus been the major driving
force in reducing the energy consumption of digital circuits. However, it
also magnifies variations caused by the manufacturing process, supply
voltage fluctuations, temperature hot spots on the chip, and circuit aging
effects [4].
Another recent trend in digital circuit design, which achieves low-energy
operation at the cost of throughput, is to operate the circuit in the
sub-threshold regime, where the gate-to-source voltage ($V_{gs}$) is below
the threshold voltage. As the current scales exponentially with $V_{gs}$
in this regime, the throughput of the system is greatly reduced [5].
Hence, sub-threshold design is suitable for applications such as
bio-medical applications, in which the sampling rate is usually in the
kilohertz range. Like feature size scaling, sub-threshold design achieves
energy efficiency at the price of greater susceptibility to process,
voltage and temperature (PVT) variations [6].
In summary, technology scaling and subthreshold circuit design can reduce
circuit energy consumption effectively. However, these trends make digital
circuits exhibit statistical rather than deterministic behavior, which
leads to the challenge of error-resilient or error-tolerant digital circuit
design.
1.3 Error-Tolerant Digital Signal Processing
In general, redundancy can be used to detect and/or correct hardware
errors, including errors caused by timing violations in digital circuits.
Redundancy can be spatial, temporal or spatio-temporal. Spatial redundancy
in hardware is achieved by adding hardware replicas.
Figure 1.1: N-modular redundancy (NMR).
N-modular redundancy (NMR) consists of N replicas of the original system
and a voter, as depicted in Figure 1.1. The most commonly used NMR is
triple modular redundancy (TMR), which has three replicas and a majority
voter. If a single hardware error occurs in any of the three replicas, the
other two replicas still agree, so majority voting detects and corrects
the error. However, the implementation requires the voter to be error-free
[7].
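The majority voting used in TMR can be illustrated with a minimal Python
sketch. The function name `majority_vote` and the choice to raise an error
when no strict majority exists are assumptions made for illustration; they
are not taken from [7].

```python
from collections import Counter


def majority_vote(replica_outputs):
    """Return the value produced by a strict majority of the replicas."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    if count * 2 <= len(replica_outputs):
        # Assumed behavior: without a strict majority the error is only
        # detected, not corrected.
        raise ValueError("no majority: too many replicas disagree")
    return value


# One faulty replica out of three is outvoted by the other two.
print(majority_vote([42, 42, 17]))  # 42
```

The sketch treats the voter itself as error-free, which is the same
requirement stated above for a hardware implementation.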
The check-point technique (temporal redundancy) uses re-computation, i.e.,
redundancy in time, and divides the datapath into stages. At the end of
each stage the state of the computation is stored so that computations can
be rolled back and recomputed to guarantee a consistent outcome [8]. The
check-point technique is an effective way to correct transient errors.
A more recent technique, RAZOR, can be viewed as an example of
spatio-temporal redundancy. It aims to correct timing errors caused by PVT
variations. Each pipeline stage incorporates two flip-flops in parallel.
One is driven by the normal clock while the other (the RAZOR flip-flop) is
driven by a clock with a delayed edge. The clock frequency and the supply
voltage are set to guarantee that the RAZOR flip-flop meets the setup and
hold time constraints in the worst case. At each check-point, the two
latched data outputs are compared against each other. Inconsistent results
cause the computation to fall back to the previous check-point and run
again while the supply voltage is scaled to meet the timing constraints
[9].
Figure 1.2: Stochastic sensor network on a chip (SSNOC) [10],[11].
Redundancy can achieve error-tolerance at the expense of energy
consumption and therefore counters the energy benefits due to transistor
feature size scaling and subthreshold circuit design.
In some applications, it is possible to introduce redundancy into the
system without significantly increasing the energy consumption. As shown
in Figure 1.2, this technique has been applied to the PN-code acquisition
problem. The original main architecture is decomposed into sub-systems
(sensors) with similar output statistics. Results from individual sensors
are fed into the fusion block, which is a median filter, to achieve
error-tolerance. Redundancy is achieved with only the small energy
overhead associated with the final fusion block [10][11]. However, this
technique is application specific.
In this work, bio-inspired signal processing concepts such as
population-coding are explored to develop a robust and energy-efficient
bio-inspired computational system.
1.4 Problem Denition
Future computing applications on the sensory edge will be characterized by
very different requirements both for energy consumption and noise
tolerance, with an emphasis on tasks such as sensory data processing,
mining and fusion, detection and recognition, and scene and situation
analysis. Neither conventional computing systems nor signal-processing
algorithms perform well for these anticipated future workloads. In
contrast, biological systems perform extraordinarily well in such
situations. Biological systems display great robustness to variation and
uncertainty, remarkable abilities to fuse data from different senses, and
incredibly low energy consumption in performing these tasks.
To demonstrate bio-inspired signal processing concepts, an
audio-frequency-band application is chosen. This application reflects many
characteristics of sensory-edge applications. The system design principles
obtained for this application are expected to apply equally to many other
sensory-data applications, such as acoustically steered cameras,
multi-modal automated light switches, acoustic omnipresence with anyone
else in the same or a similarly equipped space, and interactive toys and
devices.
The problem chosen in this thesis is single-tone sinusoid frequency
estimation in the audio frequency range (2–14 kHz). When the signal to be
estimated is corrupted only by additive white Gaussian noise, the optimal
estimator in terms of mean-square-error (MSE) is a non-overlapping filter
bank, in which the frequency estimate is given by the filter with the
maximum energy response [12].
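As a rough illustration of maximum-energy frequency estimation with
unknown phase, the following Python sketch replaces each band-pass filter
with a pair of quadrature correlators. This is a stand-in technique and
not the FIR filter bank used in this thesis. The 32 kHz sampling rate, the
128-sample record and the 250 Hz candidate spacing are assumptions chosen
so that the candidate frequencies are mutually orthogonal.

```python
import math


def tone_energy(samples, freq_hz, fs_hz):
    """Phase-independent energy of `samples` at one candidate frequency."""
    i_sum = 0.0
    q_sum = 0.0
    for n, s in enumerate(samples):
        angle = 2.0 * math.pi * freq_hz * n / fs_hz
        i_sum += s * math.cos(angle)  # in-phase correlation
        q_sum += s * math.sin(angle)  # quadrature correlation
    return i_sum * i_sum + q_sum * q_sum


def estimate_frequency(samples, centers_hz, fs_hz):
    """Return the candidate frequency with the maximum energy response."""
    return max(centers_hz, key=lambda f: tone_energy(samples, f, fs_hz))


FS_HZ = 32000.0  # assumed sampling rate, not given in the thesis
N = 128          # assumed record length, so candidates sit 250 Hz apart
CENTERS_HZ = [2000.0 + 250.0 * k for k in range(49)]  # 2 kHz to 14 kHz

# Noise-free tone at 5 kHz with an arbitrary, unknown phase of 0.7 rad.
tone = [math.sin(2.0 * math.pi * 5000.0 * n / FS_HZ + 0.7) for n in range(N)]
print(estimate_frequency(tone, CENTERS_HZ, FS_HZ))  # 5000.0
```

Summing the squared in-phase and quadrature correlations removes the
dependence on the unknown phase, which is what makes the estimate
non-coherent.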
For the problem formulated above, a certain number of time-domain samples
must be accumulated to record the filter energy response. In this work, 64
samples are recorded, as shown in Figure 1.3. The solid line plots the
time-domain filter response for an input frequency within the passband of
the filter. When the input signal matches the filter, the signal-to-noise
ratio is maximized and the output is taken as the energy response. At the
next time step, as the input signal has shifted phase and no longer
matches the filter, the output has a smaller amplitude. The dashed line
shows that the filter output has a much smaller amplitude when the input
frequency is outside the passband of the filter.
[Plot: normalized amplitude (−1 to 1) versus samples (10 to 60); legend:
signal not present, signal present]
Figure 1.3: Narrowband bandpass filter response with and without the
signal.
1.5 Timing Errors and Voltage Overscaling
This work focuses on timing errors caused by PVT variation. Timing errors
are introduced into the system by deliberately lowering the supply voltage to
reduce the system energy consumption. Doing so increases the critical path
delay. For a given clock frequency this could cause set-up time violations
and hence timing errors. Voltage overscaling is applied to the combinational
logic part of the digital circuits while the registers are operated under the
nominal supply voltage [6].
1.6 Thesis Organization
The remainder of the thesis is organized as follows. Chapter 2 presents
bio-inspired signal processing concepts and an algorithmic noise-tolerance
(ANT) filter based on signal statistics in the time domain, which are
applied to the canonical problem introduced in Chapter 1. Chapter 2 also
formalizes the algorithms explored and the system architecture. Chapter 3
presents the simulation methodology and compares the different methods in
terms of root-mean-squared-error (RMSE) and energy consumption. Chapter 4
gives the conclusion and potential directions for further research.
Chapter 2
Bio-inspired Signal Processing
and System Description
After the specic problem of estimating a single-tone sinusoid frequency has
been introduced in the last chapter, this chapter elaborates the bio-inspired
signal processing concepts such as population-coding in the context of this
canonical problem.
2.1 The Redundant Sensor Network
[Diagram: filter outputs versus input frequency f; labels: output
contains signal + noise, output contains noise, output contains signal +
hardware error; arrows `a' and `B']
Figure 2.1: Conventional nonoverlapping filter approach. The arrow with
`a' indicates the filter output with hardware errors. The arrow with `B'
indicates the filter output with the signal in the filter's passband.
In the presence of hardware errors caused by the voltage overscaling
described in Section 1.5, the conventional estimator is no longer optimal
in terms of MSE. As illustrated in Figure 2.1, a hardware error can cause
a false peak, which leads to a discrepancy between the estimated and the
correct value of the input frequency.
In biological systems, it is widely observed that the population activity
of a group of neurons provides more accurate information than any
individual neuron. This mechanism has been observed in particular in the
control of eye and arm movements [13]. Based on these observations, a
population-coding concept can be proposed with two key features: 1)
correlated outputs of neighboring processing units are combined/fused to
provide noise tolerance; 2) computations are low-precision and low-cost.
[Diagram: filter outputs versus input frequency f; labels: output
contains signal + noise, output contains noise, output contains signal +
hardware error; arrows `a' and `B']
Figure 2.2: Overlapping filter with hardware redundancy. The arrow with
`a' indicates the filter output with hardware errors. The arrow with `B'
indicates a group of correlated filter outputs with the signal in the
passband.
Inspired by this population-coding concept, a more error-tolerant solution
is proposed to address the problem with the conventional non-overlapping
filter approach. As shown in Figure 2.2, overlapping filters with wider
bandwidth are used in combination with robust techniques (e.g., the
median) applied to bundles of neighboring filters to achieve higher error
tolerance. Instead of searching for the maximum of the filter outputs
directly, the estimator searches for the maximum of the median values of
neighboring filter outputs. For instance, an isolated filter output with
hardware errors (denoted by the arrow with `a') whose two adjacent filters
contain noise only will be ignored when the median estimator is applied to
these three neighboring filter outputs.
2.2 The Low-Cost Sensor
Another benet of using overlapping lters is that fewer taps are required
to build FIR lters with wider bandwidth. In this work, coecients for
lters are obtained by using windowed linear-phase FIR digital lter design
function (FIR1). MATLAB simulations show that 16 instead of 64 taps are
needed to implement a lter with 1 kHz bandwidth instead of 0.25 kHz in
order to keep the 20 dB reduction from mainlobe to sidelobe. This indicates
signicant hardware cost reduction and energy consumption savings, which
can be related to the second feature of population-coding.
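A windowed linear-phase design of this kind can be sketched in Python as a
Hamming-windowed ideal band-pass response. This is an illustrative
approximation of what a windowed design function computes, not the MATLAB
routine itself; in particular, it omits the passband gain normalization
that MATLAB's `fir1` applies by default. The function names, the 32 kHz
sampling rate and the band edges are assumptions.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with the x = 0 limit handled."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def hamming_bandpass(num_taps, f_low_hz, f_high_hz, fs_hz):
    """Hamming-windowed band-pass FIR coefficients (linear phase)."""
    center = (num_taps - 1) / 2.0
    lo = 2.0 * f_low_hz / fs_hz   # band edges relative to Nyquist
    hi = 2.0 * f_high_hz / fs_hz
    taps = []
    for n in range(num_taps):
        m = n - center
        # Ideal band-pass = difference of two ideal low-pass responses.
        ideal = hi * sinc(hi * m) - lo * sinc(lo * m)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


# Assumed example: a 1 kHz-wide and a 0.25 kHz-wide filter around 5 kHz.
wide = hamming_bandpass(16, 4500.0, 5500.0, 32000.0)
narrow = hamming_bandpass(64, 4875.0, 5125.0, 32000.0)
print(len(wide), len(narrow))  # 16 64
```

The coefficients are symmetric about the center tap, which is what gives
the filter its linear phase.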
2.3 Signal Statistics in Time Domain under
Voltage Overscaling
Besides the frequency redundancy described earlier in this chapter, signal
statistics in the time domain can also be used to filter out the hardware
timing errors caused by voltage overscaling [14]. As shown in the top
panel of Figure 2.3, when the input frequency is not in the filter
passband, the filter output under voltage overscaling occasionally
contains errors, which appear as isolated abrupt changes in signal
amplitude. In contrast, when the input frequency is in the filter
passband, the change in filter output amplitude over one time step can
also be large. However, because the filter output oscillates in this case,
as shown in the bottom panel of Figure 2.3, a filter output sample with
similar amplitude can usually be found a few time steps earlier. Based on
this property, an ANT filter can be developed to filter out hardware
errors. The details are given later in this chapter.
[Two plots (top and bottom panels): normalized amplitude (−1 to 1) versus
samples (10 to 60)]
Figure 2.3: Filter outputs in the time domain with hardware errors.
2.4 The Nonoverlapping Estimator
As described earlier, the nonoverlapping estimator is optimal in terms of
MSE only in the presence of additive white Gaussian noise and in the
absence of hardware errors. Using the Hamming window design method, 64-tap
FIR filters with a 0.25 kHz bandwidth are constructed. The filter with the
maximum energy response indicates the value of the input signal frequency.
The block diagram of the conventional estimator is shown in Figure 2.4.
The major components are the FIR filter bank (annotated by $F_i$), the
energy estimators (annotated by $E_i$) and the frequency estimator
(annotated by MAX). Each FIR filter is implemented in the direct-form
architecture with 8-bit inputs and 8-bit filter coefficients. The adders
in the multiply/accumulate (MAC) units have 22-bit outputs to avoid
potential overflow. However, the final outputs passed to the energy
estimators in the next stage have only 8-bit precision (the high 8 bits of
the FIR filter output), as this is sufficient for the subsequent
computation. Each energy estimator consists of a chain of shift registers
and records 64 time-domain samples of a particular filter output. The
maximum response within these samples is taken as the energy response.
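The truncation to the high 8 bits and the energy estimator can be sketched
in Python as follows. The function names are hypothetical, and taking the
peak magnitude (rather than the peak signed value) as the "maximum
response" is an assumption, since the text does not say which is meant.

```python
def high_bits(acc, acc_width=22, out_width=8):
    """Keep the top `out_width` bits of a signed `acc_width`-bit value."""
    # Python's >> on ints is an arithmetic shift, so the sign is kept.
    return acc >> (acc_width - out_width)


def energy_response(filter_outputs, window=64):
    """Peak magnitude over the most recent `window` filter outputs."""
    recent = filter_outputs[-window:]
    return max(abs(v) for v in recent)


print(high_bits(2**21 - 1))           # 127
print(high_bits(-(2**21)))            # -128
print(energy_response([3, -7, 5]))    # 7
```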
Figure 2.4: Conventional estimator overview.
Figure 2.5 shows the block diagram of the final frequency estimator, which
operates on the filter energy responses from the previous stage. The
elementary block in this diagram takes four sets of inputs, each
consisting of a filter energy response and the index of the corresponding
filter, which indicates the passband range of that filter. Its outputs are
the maximum of the four energy responses and the corresponding filter
index. Using this block, the estimator is implemented as a three-stage
hierarchy [15]. The first stage has 16 basic blocks (annotated by
M-A...M-R). Each takes four energy responses from the energy estimators
and passes the largest of the four, together with the corresponding filter
index, to the next stage. The second stage uses four basic blocks
(annotated by M1...M4) and generates four intermediate outputs in the same
fashion. The final stage uses one basic block (annotated by M) and chooses
the largest energy response and its filter index, from which the input
frequency range can be derived (annotated by $\hat{f}$ in Figure 2.5).
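The three-stage hierarchy can be sketched in Python as a tree of 4-input
maximum blocks, which reduces 64 energy responses to 16, then 4, then 1.
The function names and the tie-breaking rule (the lowest filter index
wins) are assumptions for illustration.

```python
def max4(block):
    """One basic block: pick the largest (energy, index) pair of up to four."""
    best = block[0]
    for item in block[1:]:
        if item[0] > best[0]:  # strict '>' keeps the lowest index on ties
            best = item
    return best


def hierarchical_argmax(energies):
    """Reduce energy responses four at a time until one pair remains."""
    level = [(energy, index) for index, energy in enumerate(energies)]
    while len(level) > 1:
        level = [max4(level[k:k + 4]) for k in range(0, len(level), 4)]
    return level[0]


energies = [0] * 64
energies[37] = 9
print(hierarchical_argmax(energies))  # (9, 37)
```

With 64 inputs the loop runs exactly three times, matching the 16, 4 and 1
basic blocks of the three stages.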
Figure 2.5: High level frequency estimator diagram.
2.5 The Bio-inspired Estimator
This estimator uses overlapping filters with wider bandwidth to achieve
frequency redundancy. Using the Hamming window design method, 16-tap FIR
filters with 1 kHz bandwidth are constructed, with the passbands of
adjacent filters overlapping by 0.75 kHz.
The robust estimator is shown in more detail in Figure 2.6. It contains
four major sub-blocks: the prediction-based ANT filter bank (annotated by
$\text{P-ANT}_i$), the energy estimator, the sliding median filter and the
frequency estimator. The FIR filters $F_i$ are similar to those in the
conventional estimator, except that 16-tap filters are used instead of
64-tap filters. The inputs and the filter coefficients are both 8-bit.
The adders in the MAC have 20-bit outputs instead of the 22-bit outputs of
the conventional estimator. The final outputs to the energy estimators in
the next stage still have only 8-bit precision (the high 8 bits of the FIR
filter output).
Figure 2.6: Robust estimator overview.
Figure 2.7 describes the structure of the ANT-based error compensator
(EC), which takes the output of one of the upstream FIR filters and
compensates for the hardware noise caused by voltage overscaling. A
shift-register chain stores the filter outputs of the last four time
steps. If the change in signal magnitude is larger than a quarter of the
output dynamic range and none of the last four samples has a magnitude
similar to that of the current sample, then the current sample is replaced
by the sample from the previous time step. In other words, the EC block
employs a 1-step predictor to compensate for hardware errors [16]. The
functionality of the EC block in Figure 2.7 is:
• Error detection: If $|y[n] - y[n-1]| > E_{th}$ and
$|y[n] - y[n-2]| > E_{th}/2$ and $|y[n] - y[n-3]| > E_{th}/2$ and
$|y[n] - y[n-4]| > E_{th}/2$, an error is declared. $E_{th}$ is set to a
quarter of the filter output dynamic range.
• Error correction: If an error is detected, $\hat{y}[n] = y[n-1]$;
otherwise, $\hat{y}[n] = y[n]$.
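The detection and correction rules above can be sketched in Python as
follows. The function name, the default 8-bit dynamic range of 256 levels,
the pass-through of the first four samples, and the choice to keep raw
(uncorrected) samples in the history are assumptions; the rules themselves
follow the two bullets.

```python
def ant_error_compensate(y, dynamic_range=256):
    """Apply the EC detection/correction rule to a list of filter outputs."""
    e_th = dynamic_range / 4.0  # a quarter of the output dynamic range
    corrected = []
    for n, current in enumerate(y):
        is_error = False
        if n >= 4:  # assumed: no detection until four past samples exist
            diffs = [abs(current - y[n - k]) for k in (1, 2, 3, 4)]
            is_error = (diffs[0] > e_th
                        and all(d > e_th / 2.0 for d in diffs[1:]))
        # 1-step predictor: an erroneous sample is replaced by y[n-1].
        corrected.append(y[n - 1] if is_error else current)
    return corrected


# An isolated spike on a quiet output is removed.
print(ant_error_compensate([0, 2, -1, 1, 0, 120, 1, -2]))
# [0, 2, -1, 1, 0, 0, 1, -2]
```

An oscillating in-band output such as `[100, 0, -100, 0, 100, 0, -100, 0]`
passes through unchanged, because each sample has a similar value within
the previous four time steps.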
The second sub-block in Figure 2.6 is the filter energy estimator, which
is the same as in the conventional estimator described in Section 2.4.
The third sub-block is the sliding median filter, which takes neighboring
filter energy responses as inputs and passes the median energy response to
the next stage. For example, the $i$th median filter takes the energy
responses $E_{i-1}$, $E_i$ and $E_{i+1}$ as inputs and chooses the median
value as the energy response for the $i$th FIR filter. The next median
filter takes the energy responses $E_i$, $E_{i+1}$ and $E_{i+2}$ as inputs
and chooses the median value as the energy response for the next FIR
filter.
The fourth sub-block is the frequency estimator, which takes the outputs
of the sliding median block rather than the raw energy responses from the
energy estimator, searches for the largest response and decides the input
frequency range. Its architecture is the same as that of the conventional
estimator shown in Figure 2.5.
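The sliding median followed by the maximum search can be sketched in
Python as follows. The function names are hypothetical, and repeating the
edge value for the first and last filter is an assumption, since the
treatment of the band edges is not specified. The example energies are
made-up values chosen to show an isolated hardware error next to a group
of correlated responses.

```python
def sliding_median3(energies):
    """Median of each energy response and its two neighbors."""
    last = len(energies) - 1
    fused = []
    for i, center in enumerate(energies):
        left = energies[max(i - 1, 0)]      # assumed: repeat the edge value
        right = energies[min(i + 1, last)]
        fused.append(sorted((left, center, right))[1])
    return fused


def estimate_filter_index(energies):
    """Index of the largest median-filtered energy response."""
    fused = sliding_median3(energies)
    return max(range(len(fused)), key=fused.__getitem__)


# Index 2 holds an isolated hardware error; indices 5 to 7 hold the signal.
energies = [1, 2, 90, 1, 2, 40, 55, 45, 2, 1]
print(sliding_median3(energies))        # [1, 2, 2, 2, 2, 40, 45, 45, 2, 1]
print(estimate_filter_index(energies))  # 6
```

A direct maximum search on the raw energies would select index 2, the
false peak, whereas the median-filtered search selects index 6 inside the
group of overlapping filters that contain the signal.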
Figure 2.7: ANT-based EC block.
Chapter 3
Simulation Setup and Results
Comparison
In this chapter, the performance in terms of RMSE is simulated using the
RTL model of the algorithms. The energy consumption is based on estimates
from the synthesis tool (Synopsys Design Compiler) and is normalized to
the total energy consumption of the conventional method at the nominal
supply voltage.
3.1 Simulation Setup
In order to introduce timing errors into the FIR filter under different
degrees of voltage overscaling, a structural Verilog model of the FIR
filter is developed. As shown in Table 3.1, the sum and carry bit delays
of a one-bit adder are simulated using SPICE in a 45 nm process at
different supply voltages. An 8-bit ripple-carry adder (RCA) and an
8-by-8-bit signed multiplier are then designed using the one-bit adder as
the primitive block. Finally, multiply/accumulate (MAC) units and FIR
filters are constructed using the RCA and the multiplier.
The delay parameter in the adder's Verilog model can be modified according
to the supply voltage. After the RTL model is constructed, hardware errors
are introduced into each filter output by changing the delay parameters
for a specific $K_{vos} = V_{dd}/V_{dd\text{-}crit}$, where
$V_{dd\text{-}crit}$ is the critical supply voltage. Voltage overscaling
is applied only to the combinational logic of the FIR filter banks. The
erroneous filter outputs are then fed into the different estimation
blocks, which operate at the nominal supply voltage, to estimate the input
frequency.
Table 3.1: Delay parameters for a one-bit adder driving a load of an
identical adder under voltage overscaling
Voltage (V)   Sum bit delay (ps)   Carry bit delay (ps)
1.20          45                   41
1.15          48                   43
1.10          51                   46
1.05          55                   48
1.00          61                   52
0.95          68                   57
0.90          77                   63
0.85          90                   70
0.80          100                  80
0.75          121                  95
0.70          151                  115
0.65          230                  150
0.60          381                  221
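To give a feel for how the delays in Table 3.1 turn into timing errors,
the following Python sketch estimates the worst-case delay of an 8-bit
ripple-carry adder at each voltage and lists the voltages that would miss
a given clock period. The worst-case model (the carry ripples through
seven stages and is followed by one sum delay) and the 400 ps clock period
are assumptions for illustration; the thesis obtains errors from the
structural Verilog model rather than from this formula.

```python
# (sum delay, carry delay) in ps, keyed by supply voltage in V (Table 3.1).
ADDER_DELAY_PS = {
    1.20: (45, 41), 1.15: (48, 43), 1.10: (51, 46), 1.05: (55, 48),
    1.00: (61, 52), 0.95: (68, 57), 0.90: (77, 63), 0.85: (90, 70),
    0.80: (100, 80), 0.75: (121, 95), 0.70: (151, 115), 0.65: (230, 150),
    0.60: (381, 221),
}


def rca_worst_case_delay_ps(vdd, bits=8):
    """Assumed worst case: bits-1 carry delays followed by one sum delay."""
    sum_ps, carry_ps = ADDER_DELAY_PS[vdd]
    return (bits - 1) * carry_ps + sum_ps


def violating_voltages(clock_period_ps, bits=8):
    """Voltages whose worst-case adder delay exceeds the clock period."""
    return sorted(v for v in ADDER_DELAY_PS
                  if rca_worst_case_delay_ps(v, bits) > clock_period_ps)


print(rca_worst_case_delay_ps(1.20))   # 332
print(violating_voltages(400.0)[-1])   # 1.0
```

Under these assumptions, a clock period of 400 ps is met down to 1.05 V,
and the adder's worst-case path first fails at 1.00 V.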
3.2 Performance Comparison
3.2.1 Performance comparison
The performance of dierent estimators in terms of RMSE is summarized in
Figure 3.1. RMSE is dened as:
RMSE =
vuut 1
N
NX
i=1
(fi   f^i)2 (3.1)
where N is the number of trials, fi and f^i are input frequency and its es-
timated value, respectively, for a particular trial. The conventional method
shows negligible but non-zero RMSE without voltage-overscaling because of
the background additive white Gaussian noise. However, it exhibits no toler-
ance to hardware errors and its RMSE increases by two orders of magnitude
under slight voltage overscaling.
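The RMSE definition in (3.1) translates directly into a few lines of
Python. The function name and the example frequencies are illustrative
and are not taken from the simulations.

```python
import math


def rmse(true_freqs_hz, estimated_freqs_hz):
    """Root-mean-squared error over N trials, as in (3.1)."""
    if len(true_freqs_hz) != len(estimated_freqs_hz) or not true_freqs_hz:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - f_hat) ** 2
               for f, f_hat in zip(true_freqs_hz, estimated_freqs_hz)]
    return math.sqrt(sum(squared) / len(squared))


# Two trials: one exact estimate and one that is off by 500 Hz.
print(round(rmse([5000.0, 8000.0], [5000.0, 8500.0]), 1))  # 353.6
```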
At the nominal voltage and without hardware timing errors, the RMSE of the
bio-inspired method is greater than that of the conventional method, as
the filter bandwidth is four times wider and more background Gaussian
noise power leaks into the passband. Unlike the conventional method, the
bio-inspired method shows moderate tolerance to hardware errors introduced
by voltage overscaling, and its RMSE increases only slightly as the supply
voltage is reduced further. When $K_{vos}$ reaches 0.75, the performance
of the bio-inspired method deteriorates greatly and its RMSE is of the
same order of magnitude as that of the conventional method.
[Plot: RMSE (Hz), logarithmic axis from 10^1 to 10^4, versus $K_{vos}$
from 0.5 to 1; curves: Conventional, Bio-inspired]
Figure 3.1: Performance comparison.
3.2.2 Energy consumption comparison
It is necessary to assess the performance of the different estimators in
the context of energy consumption, which is estimated from the power and
the operation time of each estimator. As no time redundancy is explored in
this work, the operation time of the different methods is the same, so
comparing energy consumption is equivalent to comparing power consumption.
The power consumption is estimated by the Synopsys synthesis tool, Design
Compiler. The system is divided into the FIR filter bank and the
estimator. The power of each part is estimated individually and the total
power is taken as the sum of the two parts. Under voltage overscaling, the
power consumption of the filter banks is calculated by [14]
$$P = P_{V_{dd\text{-}crit}} K_{vos}^{2} \qquad (3.2)$$
[Plot: normalized power (0 to 1) versus RMSE (Hz), logarithmic axis from
10^1 to 10^4; curves: Conventional, Bio-inspired]
Figure 3.2: Power consumption at different $K_{vos}$.
Figure 3.2 summarizes the energy-consumption and performance trade-offs.
According to (3.2), the conventional method's power consumption scales
down approximately with the square of the supply voltage, but the method
exhibits no error tolerance, as its RMSE increases by two orders of
magnitude under a slightly reduced supply voltage.
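Because voltage overscaling is applied only to the filter banks while the
estimator stays at the nominal supply, the total power can be sketched as
the scaled filter-bank power from (3.2) plus an unscaled estimator term.
The function name and the 80%/20% split between filter-bank and estimator
power used below are hypothetical; they are not values reported in this
thesis.

```python
def total_power(p_filter_crit, p_estimator, k_vos):
    """Filter-bank power scaled by K_vos^2, plus unscaled estimator power."""
    return p_filter_crit * k_vos ** 2 + p_estimator


# Hypothetical split: 80% filter bank, 20% estimator, at K_vos = 0.79.
print(round(total_power(0.8, 0.2, 0.79), 3))  # 0.699
```

The estimator term sets a floor on the total power, so the saving from
overscaling is always smaller than the factor $K_{vos}^2$ alone suggests.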
At the critical supply voltage, the total power of the bio-inspired method
is 31% of that of the conventional method. The majority of the power
saving can be attributed to the reduced hardware cost of the FIR filter
banks. In the bio-inspired method, each FIR filter has 16 taps, which is
only 25% of the number of taps in the conventional method. The more
complicated estimator, on the other hand, introduces a power overhead.
When voltage overscaling is applied to the overlapping filter banks, more
power can be saved. However, this degrades the system performance in terms
of RMSE. At $K_{vos} = 0.79$, the bio-inspired method can reduce the power
consumption by up to 78% while keeping the RMSE under 500 Hz.
Chapter 4
Conclusion
4.1 Summary
Transistor feature size scaling along with subthreshold design provides
significant energy benefits for many sensor network applications, such as
security and environmental sensing, which often take place in highly
dynamic and variable environments. However, implementations that use
smaller transistors or operate at subthreshold voltages often introduce
variations that can cause timing errors.
In this thesis, a novel algorithm for non-coherent frequency estimation of
a sinusoid in noise, inspired by biological signal processing systems, is
developed. This method is shown to be energy-efficient and error-tolerant
for the canonical problem addressed.
The bio-inspired method can reduce the RMSE to 500 Hz, compared with
5000 Hz for the conventional method. Moreover, the energy consumption of
the bio-inspired method is only 22% of that of the conventional method.
In this work, hardware redundancy is mainly introduced by using low-cost
`sensors', which have overlapping passbands and lower energy consumption
than the conventional design.
This method is effective mainly under weak or moderate voltage
overscaling, where errors happen only occasionally.
4.2 Future Work
This work has mainly explored frequency redundancy applied to the
canonical single-tone audio frequency estimation problem. Further research
can extend it in several directions. First, we could explore redundancy in
time, which uses more time-domain samples for the same input frequency.
This introduces extra energy consumption but could improve system
performance in terms of RMSE. Energy/performance trade-offs could then be
explored further by altering the number of iterations.
In this work, hardware errors are introduced by deliberately overscaling
the supply voltage of the digital filter part of the system. Hardware
errors could similarly be introduced by other sources of variation such as
process, temperature and aging. Other hardware error models, such as
stuck-at faults, can also be explored. For the fusion part of the system,
alternative architectures for the ANT filters can be explored and the
length of the ANT filters can be optimized for different input
frequencies.
Further research can extend the bio-inspired method, with features such as
low-cost rough detectors/estimators with redundancy, to the other
detection and estimation problems mentioned in Section 1.4 in order to
design energy-efficient and error-tolerant algorithms for those
applications. The key is to design detectors/estimators that are low-cost
and energy-efficient compared to the conventional design. Hardware
redundancy can then be introduced once such low-cost detectors/estimators
are available.
References
[1] J. M. Rabaey, D. Burke, K. Lutz, and J. Wawrzynek, "Workloads of
the future," IEEE Design and Test of Computers, July-August 2008,
vol. 25, pp. 358–365.
[2] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated
Circuits, A Design Perspective, Englewood Cliffs, NJ, 2002,
Prentice-Hall.
[3] C. Hu, "IC reliability simulation," IEEE Journal of Solid-State
Circuits, March 1992, vol. 27, pp. 241–246.
[4] K.-K. Kim and Y.-B. Kim, "A novel adaptive design methodology for
minimum leakage power considering PVT variations on nanoscale VLSI
systems," IEEE Transactions on Very Large Scale Integration Systems,
April 2009, vol. 17, pp. 517–527.
[5] J. M. Rabaey, Low Power Design Essentials, Integrated Circuits and
Systems, New York, NY, 2009, Springer.
[6] N. Verma, J. Kwong, and A. Chandrakasan, "Nanometer MOSFET variation
in minimum energy subthreshold circuits," IEEE Transactions on Electron
Devices, January 2008, vol. 55, pp. 163–174.
[7] B. W. Johnson, Design and Analysis of Fault Tolerant Digital Systems,
Boston, MA, 1988, Addison-Wesley Longman Publishing Co.
[8] J. S. Plank, K. Li, and M. A. Puening, "Diskless checkpointing,"
IEEE Transactions on Parallel and Distributed Systems, October 1998,
pp. 972–986.
[9] D. Ernst, N. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler,
D. Blaauw, T. Austin, K. Flautner, and T. Mudge, "Razor: A low-power
pipeline based on circuit-level timing speculation," in Proceedings of
the 36th International Symposium on Microarchitecture, November 2003,
pp. 7–18.
[10] S. Narayanan, G. Varatkar, D. L. Jones, and N. R. Shanbhag,
"Computation as estimation: Estimation-theoretic IC design improves
robustness and reduces power consumption," in Proc. IEEE Int. Conf.
Acoust., Speech, Signal Processing (ICASSP), April 2008, pp. 1421–1424.
[11] G. Varatkar, S. Narayanan, N. R. Shanbhag, and D. L. Jones,
"Variation-tolerant, low-power pn-code acquisition using stochastic
sensor NOC," IEEE International Symposium on Circuits and Systems
(ISCAS), May 2008, pp. 380–383.
[12] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume 2:
Detection Theory, Englewood Cliffs, NJ, 1998, Prentice-Hall.
[13] W. H. Rohrer, C. Lee, and D. L. Sparks, "Population coding of
saccadic eye movements by neurons in the superior colliculus," Nature,
1988, vol. 332, pp. 357–360.
[14] R. Hegde and N. R. Shanbhag, "A voltage overscaled low-power digital
filter IC," IEEE Journal of Solid-State Circuits, February 2004,
vol. 39, pp. 388–391.
[15] E. P. Kim, D. J. Baker, S. Narayanan, D. L. Jones, and
N. R. Shanbhag, "Low power and error resilient pn code acquisition
filter via statistical error compensation," IEEE Custom Integrated
Circuits Conference (CICC), October 2011, pp. 1–4.
[16] R. Hegde and N. R. Shanbhag, "Soft digital signal processing," IEEE
Transactions on Very Large Scale Integration Systems, December 2001,
vol. 9, pp. 813–823.
