Intelligent Wireless Sensor Nodes for Human Footstep Sound
  Classification for Security Application by Mukhopadhyay, Anand Kumar et al.
Intelligent Wireless Sensor Nodes for Human 
Footstep Sound Classification for Security 
Application 
Anand Kumar Mukhopadhyay, Naligala Moses Prabhakar, Divya Lakshmi Duggisetty, Indrajit Chakrabarti, and 
Mrigank Sharad  
Abstract—Sensor nodes present in a wireless sensor 
network (WSN) for security surveillance applications should 
preferably be small, energy efficient and inexpensive with on-
sensor computational abilities. An appropriate data processing 
scheme in the sensor node can help in reducing the power 
dissipation of the transceiver through compression of 
information to be communicated. In this paper, authors have 
attempted a simulation-based study of human footstep sound 
classification in natural surroundings using simple time-
domain features. We used a spiking neural network (SNN), a 
computationally low weight classifier, derived from an 
artificial neural network (ANN), for classification. A 
classification accuracy greater than 85% is achieved using an 
SNN, degradation of ~5% as compared to ANN. The SNN 
scheme, along with the required feature extraction scheme, can 
be amenable to low power sub-threshold analog 
implementation. Results show that all analog implementation 
of the proposed SNN scheme can achieve significant power 
savings over the digital implementation of the same computing 
scheme and also over other conventional digital architectures 
using frequency domain feature extraction and ANN based 
classification. 
Keywords— wireless sensor node, artificial neural network, 
spiking neural network, security surveillance 
I. INTRODUCTION 
Wireless sensor networks (WSNs) are used in a wide 
range of applications in the field of health care, environment, 
agriculture, public safety, military, and industry and 
transportation systems [1]. The wireless sensor nodes in a 
WSN are designed according to the target application. 
Sensor nodes for security surveillance application for 
monitoring the presence of human beings in restricted zone 
fall under the category of passive supervision in which small-
sized static sensor nodes are to be deployed in large numbers 
with continuous monitoring. These resource constrained 
sensor nodes should be inexpensive and power efficient (few 
μW) for serving the purpose [2]. 
One of the constraint that the wireless sensor nodes suffer 
from, is the fact that they have limited battery life.  Often the 
most power-hungry component in a WSN is the transceiver 
unit. Hence, the concept of on-sensor computing is applied in 
the sensor nodes for minimizing  the communication power 
due to the transmission of large chunks of data. However, the 
intelligent sensor nodes having on-sensor computational 
abilities should consume less power, preferably much less 
than the transceiver unit. Therefore, there is a need for cross-
hierarchy low power system  design approach that spans 
from algorithm and architectural level, going all the way 
down to circuit level implementation of the application 
specific computing device.. 
With the advancement in the field of energy efficient 
brain-inspired neuromorphic computing algorithms, several 
low power computing systems have been envisaged and 
implemented, suitable for on-sensor computing. 
[4][5][6][7][8]. Spiking neural network (SNN) are inspired 
from biological neurons in which communication between 
neurons take place in the form of spikes (events). The 
membrane potential (vmem) of a neuron gets affected based on 
the presynaptic events associated with it. Whenever vmem 
exceeds the threshold potential (vth) of the neuron, a post-
synaptic spike is generated implying that the neuron is active 
for passing information to the next layer neurons connected 
to it. The learning rules for SNN algorithms can be 
supervised or unsupervised, rate based or spike-time based 
which can be chosen based on the application [9]. These 
concepts can be applied to real-time applications involving 
energy efficient neuromorphic hardware aiming for a 
reduction in the computational complexity with acceptable 
performance. Generally, the computations during the training 
of neural network frameworks are much more intensive 
compared to the feedforward classification process during 
testing. Hence, performing offline training with online 
classification is an encouraging approach. A method for 
training an ANN with the conventional back-propagation 
algorithm, and then using the trained parameters using a rate 
based SNN for classification is presented in [6]. This method 
can help in ensuring a balance between better performance 
and less computational complexity when compared to 
conventional ANNs. 
In this work, we present an integrated system design for a 
low power sensor node targeted for a surveillance 
application.   The node is supposed to  spot human presence 
in restricted areas by analysing the generated acoustic audio 
signals using simple low complexity time domain features 
followed by analog domain neuromorphic implementation 
[11][12]. We compare the algorithm level performance of the 
proposed scheme with some standard techniques to establish 
in terms of accuracy. We also provide architecture and 
circuit level design for the proposed scheme and compare it 
with some conventional schemes. The major contributions of 
the work are: 
 Design and optimization of time domain feature 
extraction from acoustic data for SNN based processing. 
 Design and optimization of SNN parameters for 
best accuracy 
 Comparison of proposed scheme with conventional 
approach like those involving frequency domain feature 
extraction followed by ANN classifier in terms of 
accuracy. 
 Analog implementation of proposed scheme and it’s 
power comparison with digital implementation of the 
proposed scheme as well as that of conventional 
approaches. 
The rest of the paper is organized as follows: In section 
II, a brief description of the application is given along with a 
 Fig. 1. The major blocks involved in a wireless sensor node. 
 
(a) 
 
    (b) 
Fig. 2 (a) Small portion of the sampled audio signal, (b) 2-D matrix 
representation of D and S.  
review of some  conventional approaches addressing similar 
applications. Section III presents the proposed scheme based 
on time domain feature extraction and SNN based 
classification along with its architectural implementation. 
Algorithm level optimization of the feature extraction 
scheme is presented in section IV. SNN based classifier and 
its algorithmic optimization is presented in section V. 
Section VI describes digital and analog implementation of 
the proposed scheme. . In section VII  simulation results are 
presented for the proposed solution along with its 
comparison with conventional approaches. Section VIII 
concludes the paper.  
II. APPLICATION OVERVIEW 
 We present the design of  a low power sensor node  
with on-sensor computing for surveillance application. The 
objective is to detect the presence of humans in a restricted 
zone. This kind of intelligent sensor has importance in 
military surveillance applications for monitoring human 
presence in sensitive  areas. A block level diagram of the 
neural network based sensor node is shown in Fig. 1 which 
consist of units for power management, sensor interface, 
filtering and analog to digital conversion (ADC), data 
processing, memory, and wireless communication. The 
sensor is supposed to capture the surrounding sound signals 
within its range, which would be processed by other blocks 
to finally get the result of classification, used by the 
transceiver to communicate to the base station.  
Conventional methods for audio signal classification 
usually rely on spectral and wavelet based features from the 
frequency domain in addition to time domain features such 
as mean, standard deviation, maximum value of the 
amplitude followed by a classifier [23-27]. 
A detailed review of acoustic signal processing along 
with the different stages involved (sound rejection, 
detection, localization, classification and cancellation) is 
presented in [23]. In [24] FFT features from the acoustic and 
seismic signals were considered along with baseline 
classifiers such as kNN (k nearest neighbor), maximum 
likelihood (ML) and support vector machine (SVM) for 
vehicle classification. In [25] Mel-Frequency cepstral 
coefficient (MFCC) features were used to classify different 
types of vehicle (heavy, medium, light), and it was found 
that artificial neural network (ANN) classifier achieves a 
better (>20%) classification accuracy compared to kNN. In 
[26] authors consider spectral features using linear 
prediction coding (LPC) along with a neural network for 
classifying seismic signals. In [27] both time and frequency 
domain features were considered and assessed using neural 
networks and genetic algorithms for classifying seismic 
signals. In order to avoid the computational energy and the 
difficulty in implementing in low-cost sensor nodes with 
constrained resources, authors in [12] presented a time 
domain signal processing for feature extraction followed by 
neural networks for classifying acoustic/seismic signals of 
vehicle types. 
These conventional methods suffer from  computational 
complexity involved in the feature extraction and 
classification process and hence, lead to relatively power 
hungry solutions. Therefore, energy efficient techniques for 
real-time feature extraction and classification is necessary. 
We approached the problem considering low weight time 
domain feature extraction and SNN. This is an approximate 
computing methodology amenable to analog 
implementation. 
III. PROPOSED SCHEME:TIME DOMAIN APPROXIMATE 
FEATURE EXTRACTION AND SNN BASED CLASSIFICATION 
SCHEME 
We propose a scheme involving low complexity time 
domain features followed by a SNN classifier as an alternate 
option for low power sensor node design for acoustic 
applications. The simple time domain features will reduce 
the complex computations involved in calculating the 
frequency domain features. In addition to that the SNN 
classifier would be a viable alternative to its counterpart 
ANN due to the spike (event) based processing of 
information. In a subsequent section we show the 
amenability of the proposed scheme to low power analog 
implementation. 
     (a) 
Figure 3: Network structure of the neural network (ANN/SNN) model. 
 
    (a) 
 
    (b) 
Figure 4 (a) Illustration of spike generation process (b): Schematic 
diagram of an integrate and fire neuron model. 
A. Low complexity time domain feature extraction 
Features derived directly from the time domain would be 
preferable in terms of lesser computation complexity when 
compared to frequency domain features. Authors in [12] 
have earlier shown that time encoded signal processing and 
recognition (TESPAR) has been successful in obtaining 
essential features for seismic signals and would be feasible 
for real-time processing. 
TESPAR is inspired by infinite clipping coding of a 
waveform which represents the duration between the zero 
crossings of the waveform [12]. The segment between two 
consecutive real zeroes is termed as an epoch. The 
intelligibility of the waveform can be enhanced by 
introducing the concept of complex zeroes in addition to the 
real zeroes. The complex zeroes represent the shape of the 
signal waveform in the form of local maxima and minima of 
the signal waveform. Therefore, a band limited signal can be 
approximated by computing the following parameters in 
each epoch as shown in Fig. 2(a): 
 
(a) Duration (D): It is the number of samples between 
real zeroes (within an epoch). 
(b) Shape (S): It is the number of local minima and local 
maxima for a positive and negative epoch respectively (S=0 
if no maxima/minima present). 
The D and S contain information of the fundamental 
frequency and harmonics of the signal respectively [12]. 
 
The 2-D D and S matrix representing a signal waveform 
is shown in Fig. 2 (b). Each location in the 2-D matrix will 
have the number of occurrences of the (D, S) combination 
present in the signal. The 2-D matrix is further expanded to 
a 1-D vector to form the final feature vector (FV) to be fed 
to the classifier. Optimization of different parameters related 
to it are discussed in Section IV. 
B. SNN based classification scheme 
The final feature vector is to be fed to a SNN classifier. 
The general network structure of a SNN is similar to that of 
the ANN as shown in Fig. 3. The difference lies in the 
functionality of the neurons present in the network. 
An illustration of the spike generation from feature 
vector within a specified time window is shown in Fig. 4(a). 
The instances of spike generation are of random nature. The 
number of spikes generated in the interval is directly 
proportional to the (FV) magnitude and the maximum input 
firing rate (r), which is a network parameter. This way the 
input layer neurons process spikes equivalent to the FV 
value. 
 For, the subsequent layers, integrate and fire (I&F) 
neurons are used to implement the spiking operation as 
shown in Fig. 4 (b). The I&F neuron receives input from the 
previous layer neurons whenever there is a pre-synaptic 
spike event. The neurons between two consecutive layers 
are connected via synapses of different strength representing 
the weights. Hence, the input current magnitude is scaled 
down according to the synaptic strength. After receiving the 
input, the membrane potential (vmem) of the neuron increases. 
Whenever vmem exceeds the threshold potential (vth) of the 
neuron, (i.e., vmem>vth) the neuron is activated and generates 
a post synaptic spike, after which vmem falls down to the 
neurons resting potential (vres). The equation depicting the 
vmem of a simple I&F neuron is in (1) 
( )
. ( )
i
mem
i res
i a S
dv t
W δ t a v
dt

    (1) 
Here, Wi and δ(.) represent the weight of the ith incoming 
synapse and delta function respectively. Si = [ti0, ti1, …..] 
stores the spike times of the ith presynaptic neuron. 
 
The optimal choice of network parameters for the ANN 
and the corresponding SNN network are discussed in 
Section V. 
The architectural diagram of the SNN classifier can be 
constructed using digital blocks as well as in analog fashion 
(refer Section VI).  
IV. OPTIMIZATION OF FEATURE EXTRACTION SCHEME 
We considered a window size of 5s to be qualified 
as a data sample from which essential information is to be 
identified. This essential information is known as the FV, 
which represents the data sample. FV can be extracted from 
the time/frequency domain of the signal. Selecting proper 
feature aids in better performance on the subsequent stages 
(classification). The complexity in realizing the FV in real 
time scenario is a critical factor to be considered. The target 
is to achieve an acceptable classification accuracy with 
                             (a)                                 (b) 
 
  (c)                       (d) 
 
  (e)                                   (f) 
Fig. 5. DS matrix 2-D representations (a) chirp of birds (D=50,S=20), (b) 
single person walking on gravel with chirp as background noise 
(D=50,S=20) (c) chirp of birds (D=20,S=5), (d)single person walking on 
gravel with chirp as background noise (D=20,S=5), (e) surface view of (c), 
(f) surface view of (d) 
minimal computational operations by the classifier on the 
sensor node. Hence, offline training with online 
classification becomes a feasible option. 
A. Dataset creation 
The surface where the sensor nodes are placed is 
considered to be a gravel surface. Due to unavailability of a 
significant number of data needed for addressing the 
problem, we created a suitable dataset using data 
augmentation methods such as rolling the files circularly and 
adding random noise to it [14]. The audio data of the footstep 
sound on gravel and different background noise were taken 
from different sources1 sampled at 44.1 KHz. From the 
frequency spectrum of the audio files, it was understood that 
the major portion of the power content lies in the frequency 
range of (0-5)KHz both for the background noise as well as 
footstep with background noise. Therefore, a cut-off 
frequency of fc=5KHz is chosen for filtering out the high-
frequency signals. The magnitude of power content depends 
on the intensity of background noise or footstep, which may 
vary in different instances. However, the frequency content 
of both footstep and background noise lie in a similar range 
(0-5KHz). Therefore, the originally recorded audio files were 
downsampled to fs=11025Hz (fs>2*fm), for further analysis.  
The task is to detect human presence on a fixed surface 
(gravel considered) in the presence of natural environmental 
sounds, four different background noises were considered, 
namely, 1) crickets, 2) birds chirping, 3) rain and 4) wind. 
For the human presence, audio files of a single person 
walking on gravel were acquired. The file was divided into 
5s segments, considering it as a suitable window length for 
real-time analysis. For emulating multiple people (3 
considered) walking on gravel, a different combination of 
files was mixed from the single person dataset with a 
random scaling factor between 0 to 1 for incorporating the 
effect of distance or intensity from the sensor node location. 
The final dataset created, consisted of 12352 data samples 
consisting of two classes, 1) background noises (6176 
samples), 2) single person and multiple people walking on 
gravel in the presence of background noises (6176 samples). 
The total dataset was divided into train set (9024 samples), 
validation set (1664 samples), and test set (1664 samples) 
for further analysis. The number of a single person and 
multiple people and the four different background noises 
were selected in a balanced manner for incorporating their 
effect in equal proportion during the training of the network. 
B. Choice of sampling frequency 
The footstep audio signal is initially filtered with a low 
pass filter with pass band frequency 5KHz and stop band 
frequency 6KHz as the range of footstep audio signal lies 
within the frequency range (0-5)KHz. Then, different 
sampling frequencies (fs) were chosen and its effect on 
classification accuracy was observed as shown in Table. I 
while setting the (D, S) parameters to (20, 5). It is to be noted 
that the test data samples for different frequencies are the 
same files but resampled to different frequencies. It is 
observed that the change in test accuracy is insignificant 
even at lower fs. Hence, fs = 11025Hz has been chosen to 
reduce the computational complexity for further analysis. 
C. Choice of D and S matrix dimension 
The two-dimensional matrix of the (D, S) combination for 
(50, 20) and (20, 5) as shown in Fig. 5 indicates that most of 
the essential features are concentrated in lower values of (D, 
S) combinations. The test accuracy observed for different 
(D, S) combinations at fs = 11025Hz is shown in Table. II 
from which it is evident that an acceptable accuracy can be 
obtained even at lower (Dmax, Smax) combination. The FV 
length is equal to D × S which determines the number of 
neurons in the input layer of the neural network. Lesser 
TABLE I 
EFFECT ON TEST ACCURACY FOR DIFFERENT SAMPLING 
FREQUENCIES WITH (D, S) = (20, 5) 
Sampling frequency 
(fs), Hz 
Test accuracy (%) 
11025 93.44 
12500 93.32 
13500 92.73 
14000 93.17 
15500 93.23 
44100 93.27 
 
 TABLE II 
EFFECT ON TEST ACCURACY FOR DIFFERENT (DMAX, SMAX) 
COMBINATIONS AT FS = 11025HZ 
(Dmax, Smax) 
FV dimension 
(number of input 
layer neurons) 
Test accuracy (%) 
(5,5) 25 92.30 
(10,5) 50 93.32 
(20,5) 100 93.44 
(30,5) 150 93.32 
(50,5) 250 91.40 
 
 
 (a) 
 
(b) 
Fig. 6. Variation in test accuracy for the trained ANN model by varying 
the number of neurons in (a) hidden layer 1, (b) hidden layer 1 and 2 
 
                         (a)                                                (b) 
 
           (c)                                                 (d) 
Fig. 6. Test accuracy over time for the SNN classifier (r = 1000Hz and 
Vt = 1mV(b)average number of spikes (spikeavg ) occurring at the 
input layer neurons, (c) hidden layer neurons, (d) output layer 
neurons. 
number of input neurons is preferable due to the reduction in 
the architectural complexity of the classifier. Therefore, a 
(D, S) combination of (10, 5), implying a feature vector of 
dimension 50 is considered which can provide an accuracy 
of ≈ 93.32%. 
  
V. OPTIMAL CONVERSION OF ANN TO SNN 
 
After choosing the optimum FV parameters (Section 
IV), viz., fs = 11025Hz, D = 10 and S = 5 we train an ANN 
with different network configurations. The following 
parameters were chosen during training of the ANN model, 
learning rate, α = 0.001 with a categorical cross entropy loss 
function suitable for classification problem, Adam 
optimizer, a dropout (d = 0.5) is considered for avoiding 
over fitting. Training was done using back propagation 
algorithm for minimizing the cross entropy cost function as 
given in (2) 
( )
1
log( )
N
L
n n
n
E y y

                                     (2) 
Here, {0,1}
n
ny  is the labels in one hot coding and 
( )Ly  is the output of the model. The loss function and 
accuracy for the training and validation set is shown in Fig. 
15. The variation in test accuracy for the trained ANN 
model by varying the network parameters, viz., 1) number 
of hidden layers, 2) number of neurons in the hidden layer, 
as shown in Fig 6. Henceforth, we choose the optimum 
network configuration, i.e., ([50(input)-80(hidden)- 
2(output)] neurons for further analysis. 
For the proper conversion of ANN to its corresponding 
SNN network, certain conditions, namely, 1) setting the bias 
to zero, 2) using ReLU activation functions were considered 
[6]. ReLU provides a better approximation to relate the 
firing rate of the integrate and fire (IF) neurons in the SNN 
model whereas a zero bias will eliminate the effect of 
external influence to the IF neurons, i.e. only the weights 
and the neuron threshold potential (Vt) will matter.  
Therefore, the rate based SNN classifier requires the 
trained weights obtained by the ANN network, which needs 
to be further optimized with the SNN parameters namely, 
maximum firing rate (r) and neuron threshold potential (Vt). 
A higher Vt implies that a neuron is less likely to spike as 
the neuron membrane potential will need more time to reach 
Vt. Hence, a proper combination of r and Vt will ensure 
higher accuracy, which has been determined by considering 
different combinations, as shown in Table. III. It can be 
observed that r = 1000Hz and Vt = 1mV acquires an 
accuracy of 86.75% within 0.1s duration with a step size of 
1ms as shown in Fig. 6 (a). The average number of spike 
occurrences for the IF neurons in the input, hidden and 
TABLE III 
EFFECT OF INPUT FIRING RATE (R) AND THRESHOLD 
POTENTIAL (VT) ON TEST ACCURACY FOR THE RATE BASED 
SNN CLASSIFIER 
rate, r  
(Hz) 
/ 
threshold, 
 Vt, (mV) 
200 500 1000 2000 3000 4000 
0.25 71 80.75 83.25 76 72.75 69.5 
0.5 69.5 80.25 85.25 77.25 71.25 68.25 
1 67.75 80.5 86.75 77.25 70.75 68.25 
2 68.5 77.5 86 76.25 69.75 68.75 
4 56.25 75.5 84.25 76 70.25 68.75 
10 50 52.5 65.75 61.75 61.25 60.75 
20 50 50 50.25 51 52.25 52.75 
 
 
 Fig. 7. Effect on test accuracy with respect to the number of fractional 
bits in the weight matrix coefficient 
 
Fig. 8. Top-level block diagram of the hardware architecture 
 
Figure 9: Hardware Implementation of D,S Value generator output layer are shown in Fig. 6 (b), (c) and (d) respectively. 
From the statistics of spike generation, the number of spikes 
generated in the input, hidden and output layer are 343, 330 
and 31 which sums up to ~ 700 spikes for the SNN 
operation, considering 100 steps. It is to be noted that the 
neurons which do not produce a spike during the 
classification process are in an OFF state and help in the 
reduction of power consumption which is not possible using 
ANN classifier. 
For a network architecture of [50-80-2], sequential 
operation of neurons will need ~ [(50x80) + (80x2) = 4160] 
MAC operations for computations. Considering the worst 
case (assuming all the neurons are spiking at every iteration) 
for SNN, we require ~ [{(50x80) + (80x2)}x Niter] addition 
operations. Considering the above statistics of average spike 
generation for Niter = 100 (i.e., ~6.86% (343/5000) for 
input layer neurons and ~4.125% (330/8000) for hidden 
layer neurons), the number of addition operations ~[(343x80 
+ 330x2) = 28100] for the duration of 100 iterations. 
To reduce the memory footprint due to the storage of 
weight coefficients, the effect on test accuracy is observed 
while truncating the floating point weight coefficients. From 
Fig. 7 it can be seen that the test accuracy falls from 93.26% 
to 60.87% when the number of fractional bits is reduced 
from 2 to 1. Apart from the fractional bits, 4 bits are 
considered for allocating the signed integer part. There is a 
drop in accuracy ~(3-6) % for the SNN classifier depending 
on the total number of time steps considered as shown in 
Fig. 7. Hence, a significant amount of computation and 
power consumption involving the memory can be avoided 
by restricting the number of bits used in the weights 
coefficients. 
VI. DIGITAL AND ANALOG IMPLEMENTATION OF THE 
PROPOSED SCHEME 
 
A. Digital implementation  
I) Architectural schemes for DS FV generation  
The top-level block diagram of the DS generator with 
SNN inference unit shown in Fig. 8 consists of sub-blocks, 
viz., i) DS FV generator, ii) slice storage, iii) spike 
generator, iv) spike address generator, v) weight memory, 
vi) IF neuron array. 
Two different schemes for generating the DS FV were 
analyzed. The DS FV is generated using registers, 
comparators, counters, and basic gates. 
In scheme I, sequential implementation of DS FV 
generator followed by ANN was analyzed. Fig. 9 shows the 
RTL diagram of the DS values generated every epoch. A 
zero crossing is detected by comparing the sign bit of the 
current and previous sample using XOR gate. For detecting 
S, three consecutive samples are compared. For calculating 
the number of D and S present in an epoch, counters are 
used. The Dmax and Smax values are fixed to 10 and 5 (refer to 
Section IV(c)), therefore the counter freezes after reaching 
the maximum D and S values. Upon detection of zero 
crossings (new epoch), the counters are reset. The D and S 
     (a) 
 
Figure 10: (a) Memory interface to store the DS values in memory 
(BRAM) 
 
    (a) 
 
    (b) 
Figure 11 (a): Feature vector generator top level (b) slice storage 
scheme 
 
 
 
 
Figure 12: (a) Spike generator by comparing 50 bit FV with 50 bit random 
number generated using LFSR, (b) pseudo-random number generator 
values calculated at every epoch are latched to registers 
which map to the address of the Block RAM used for 
storing the count of the number of D and S occurrences as 
shown in Fig. 10. In this scheme 11 BRAMs are used, each 
of which stores the DS FV for the 5s duration while having 
a throughput of 0.5s. 
In scheme II, a parallel DS FV followed by ANN/SNN 
was analyzed. The DS FV generator along with the slice 
storage scheme is shown in Fig. 11 (a) and (b) respectively. 
The DS generator computes and stores the DS FV every 
0.5s in a register file. The bit width of the register file is 
chosen to be 13 bits which can accommodate up to 8192 
occurrences of a (D, S) combination pair. The slice storage 
consists of 10 slices of 0.5s FV unit which are summed up 
every 0.5s to generate the FV of 5s duration which is the 
input of the inference unit. 
The spike generator (refer Fig. 8) is supposed to convert 
the input feature vector (FVI) to spike patterns according to 
the equation, c* FVI >rand() where c=dt*fmaxrate. Here, 
rand() is a random number between 0 to 1, fmaxrate is directly 
proportional to the number of spike occurrences and dt=1ms 
empirically [5]. A pseudo-random number generator is 
designed using an 11-bit linear feedback shift register 
(LFSR) for emulating the random number generation. Each 
feature value in the FV must be compared with a different 
random number. This is achieved by comparing the outputs 
of 50 such pseudo-random number generators with FV of 
dimension 50x1 to get the 50-bit spike vector in one clock 
cycle as shown in Fig. 12. 
There are two main advantages of the spike based 
operation, 1) It eliminates the multiplication operation 
between the input FV and weights as required in ANN, 
instead the synaptic weights are added if a spike is generated 
in the pre-synaptic neuron, 2) If the previous layer neuron 
does not produce spike, then the addition of the synaptic 
weight between the pre and postsynaptic neurons has to be 
skipped. Hence, the address location of the weights has to be 
accessed according to the spike pattern in the 50-bit FV for 
which a leading zero counter (LZC) [10] is used. The 
  
(a) 
 
    (b) 
Figure 13: (a) Illustration of a leading zero counter (LZC) and highest bit 
reset logic (b) Spike address generator with LZC and highest bit reset logic 
 
(a)                                                   (b) 
 
    (c) 
Figure 14: (a) IF Neuron (b) MAC neuron (c) IF Neuron array 
 
 
    (a) 
   
    (b) 
 
(c) 
Fig. xx8. (a) Histogram of % deviation of weights with standard 
deviation (σ=36.68%) for incorporating random Gaussian variation, 
(b)  Effect on test accuracy versus % standard deviation for ANN and 
SNN, (c) A representation of resistive crossbar network for computing 
output current in terms of input voltages and programmable weights. 
functioning of an LZC with the highest bit reset logic for an 
8-bit spike pattern is shown in Fig. 13 (a) which produces 
the output as the location of a spike in the FV and then 
resets it and scans the next location. The output of LZC is 
invalid when all the spikes present in the FV are reset 
according to the highest bit reset logic. The spike address 
generator as shown in Fig. 13 (b) uses the LZC for 
generating the spike address. 
The structure of an IF and MAC neuron is shown in Fig. 
14 (a) and (b) respectively. The weight coefficients between 
the IF neuron and the neurons in the previous layer 
producing spikes are added and stored in an accumulator. If 
the accumulated value is greater than the firing threshold 
(Vt), the IF neuron generates spike which is used by the 
address generator of the next layer. The structure of the IF 
neuron array for a parallel SNN is shown in Fig. 14 (c). 
 
B. Analog implementation 
i) Analog implementation of feature extraction 
 
ii) Variation analysis for analog design 
 
The neural network can be implemented using neural 
network in analog fashion. The transistors are used for 
representing the weights. We did a variation analysis of the 
trained weights (6 bits considered excluding sign bit) for 
assessing its feasibility for analog implementation. The 
variation in the trained weights in the form of random 
 Fig. 15. Effect on test accuracy with respect to the number of digits 
considered after decimal point in the weight matrix. 
 
Figure 17: ANN classifier architecture (Scheme I) 
Gaussian distribution is shown in Fig. xx8 (a) which 
corresponds to a percentage standard deviation of 
σ=36.68%. We further analyzed the effect on test accuracy 
versus the percentage standard deviation and find that both 
the models can tolerate up to σ=30% as shown in Fig. xx8 
(b). 
 
iii) Cross bar scheme for SNN, using multi-bit 
memristor weights 
The crossbar SNN circuit can be implemented using 
multi-bit memristor weights in an efficient manner [23, 24, 
27, 28, and 29]. A simple crossbar circuit which used 
programmable weights in the form of memristor is shown in 
Fig. xx8(c). From [26] it can be understood that memristor 
are compatible with CMOS systems, hence will be a viable 
option for compact implementation of the SNN classifier 
using resistive crossbar memory (RCM). For, further energy 
benefits, nanoscaled spintronic devices as mentioned in [25] 
can be used for representing the synaptic weights. 
 
VII. RESULTS AND PERFORMANCE COMPARISON 
A. Optimization of conventional digital scheme for best 
results 
Conventional methods involves the selection of 
frequency domain features. In Fig. 15 we have made a 
comparison between features derived from FFT, DS, and 
DWT. FV derived from FFT by three different approaches, 
viz., 1) by computing an N=65536 point FFT over the 5s 
duration and then averaging the bins for feature dimension 
reduction (Case I), 2) by considering a window size (200ms) 
with no overlap and computing N=4096 point FFT over the 
segment at  each window and selecting the maximum value 
of PSD among all windows followed by FV dimension 
reduction by bin averaging (Case II), 3) by considering 
window size (50 ms) throughout the 5s segment on which an 
N point FFT is performed (N=512, 256, 128, 64, 32, 16), 
only the max value of PSD among all windows is 
considered in the FV (Case III). The FV derived from the 
FFT (Case III) is convenient compared to the former two 
cases because N<512 point FFT is computed which in turn 
eliminates the FV dimension reduction process. 
  Apart from FV derived from FFT, FV is also derived 
from DS and discrete wavelet transform (DWT) which can 
be derived from the time domain. In Fig. 15 the comparison 
is done between the different FVs while keeping the FV 
dimension similar. The size of the FV decides the number of 
nodes at the input layer of the neural network. It is apparent 
that the computational complexity of FVs derived from 
frequency domain will be more as compared to the ones 
derived from the time domain. 
B. Comparison between 128 point FFT FV (optimized) 
and DS FV 
The performance of FV DS is more suitable in terms of 
performance and computational complexity. Comparison of 
power and resource utilization using D and S and 128-pt 
FFT feature vector (FV) targeting Zynq AP SoC XC7Z020 
CLG484-1 FPGA is shown in Table. IV from which it can 
be inferred that the FF FV takes ~3x more power compared 
to DS FV and also consumes more resources. Hence, low 
complexity time domain DS FV can be considered as a 
viable alternative to spectral based features derived from the 
frequency domain. Hence DS FV is considered for further 
analysis. 
C. Digital schemes for SD-ANN and SD-SNN  
System-level architectures of the DS FV generator along 
with ANN/SNN inference unit have been designed while 
targeting Zedboard with Zynq AP SoC XC7Z020-CLG484-
1 FPGA. We discuss some schemes for implementing them 
in this section. Two different schemes were considered to 
implement the design: (1) Scheme I: Sequential operation, 
(2) Scheme II: Parallel operation. 
Scheme I: The block level diagram of the system 
consisting of the DS FV generator and ANN is shown in 
Fig. 16. It consists of a DS generator, an address generator, 
memory banks, FIFO, ANN classifier and control circuit 
modules. The RTL of the ANN inference unit of dimension 
[50-80-2] (Section VI) is shown in Fig. 17. The operation 
frequency of the ANN block was chosen to be 8.32 KHz 
which will be able to complete 4160 MAC operations in 
TABLE IV 
COMPARISON OF POWER AND RESOURCE UTILIZATION BETWEEN D AND S AND 
128-PT FFT FEATURE VECTOR (FV) TARGETING ZYNQ AP SOC XC7Z020 
CLG484-1 FPGA 
FV type 
Power (mW) Resource utilization (%) 
Static Dynamic LUT FF BRAM DSP 
D and S 108 116 0.8 0.14 3.93 0 
FFT 115 531 6.38 1.88 6.07 6.36 
 
 
 Fig. 16.  System level diagram of FV (DS) generator unit and the 
fully sequential ANN (Scheme I). 
500ms time. The input FV from the FIFO unit is stored in a 
register array. The hidden layer MAC unit accesses the FV 
from the register array sequentially and multiplies it with the 
corresponding weights stored in memory for NI=50 MAC 
operations, after which the bias corresponding to the neuron 
is added. This process is repeated NH=80 times for 
computing the results of all the NH hidden neurons. The 
hidden layer output is passed through a ReLU activation 
block and then stored in a register array which acts as the 
input to the output layer MAC unit. The similar process is 
repeated to get the final output result through a comparator. 
The utilization report and the power consumption are 
mentioned in Table V and VI. 
Scheme II:A parallel implementation of DS FV followed 
by ANN/SNN was done (Section VI (A)) . The format of the 
trained weight coefficient consists of 10 bits out of which 6 
bits are for fractional, 1 bit for the sign and 3 bits for integer. 
The weights are stored in distributed ROM memory. The 
weights of the 1st layer are stored in 50 address locations, 
where each location corresponds to the neuron number in the 
input layer, i.e., the 1st location contains all the weight 
coefficients between the 1st input neuron with the 80 hidden 
neurons, resulting in 800 bits per address location. Similarly, 
the weight memory between the output layer neuron and the 
hidden layer consists of 80 locations with 20 bits per address 
location. The utilization report and power consumption 
report for scheme II are mentioned in Table VII and VIII 
respectively. The trained parameters are stored in distributed 
memory which will infer LUTs and FF. It is to be noted that 
the for similar network architecture, the ANN occupies 82 
DSP blocks due to multiplication operation, whereas in SNN, 
DSP blocks are not inferred. It can be understood that the 
major portion of the power is consumed by the Mixed Mode 
Clock Manager (MMCM) which converts the inbuilt 
100MHz clock to the desired clock frequency of operation. 
D. Synthesis results of DS-SNN (digital) 
The estimate of different modules in 65nm technology node 
is shown in Table IX. The computational block (refer Fig. 
TABLE V 
RESOURCE UTILIZATION FOR A SEQUENTIAL IMPLEMENTATION OF DS 
FV GENERATOR FOLLOWED BY ANN ARCHITECTURE (SCHEME I) 
Resource 
Utilization Available % utilized 
LUT 719 53200 1.35 
FF 485 106400 0.46 
BRAM 8.5 140 6.07 
DSP 3 220 1.36 
IO 19 200 9.50 
BUFG 5 32 15.63 
MMCM 1 4 25 
 
TABLE VI 
POWER CONSUMPTION DISTRIBUTION FOR SCHEME I. (SCHEME I) 
Power (mW) 
Type % consumed 
Static 107 (44%) 
Dynamic 
clocks <1 (<1%) 
Signals 9 (7%) 
Logic 6 (<5%) 
BRAM <1(<1%) 
DSP 3 (2%) 
MMCM 116 (85%) 
I/O <1 (<0%) 
Total 242 
 
TABLE VII 
RESOURCE UTILIZATION FOR A PARALLEL IMPLEMENTATION OF DS FV 
GENERATOR FOLLOWED BY ANN/SNN ARCHITECTURE (SCHEME II) 
Resource 
Utilization Available % utilized 
ANN SNN ANN SNN 
LUT 6784 6572 53200 12.75 12.35 
FF 8061 10811 106400 7.58 10.16 
DSP 82 0 220 37.27 0 
IO 23 23 200 11.50 11.50 
BUFG 3 3 32 9.38 9.38 
MMCM 1 1 4 25 25 
 
TABLE VIII 
POWER CONSUMPTION DISTRIBUTION FOR SCHEME II 
Power (mW) 
Type 
% consumed 
ANN SNN 
Static 125 (44%) 107 (55%) 
Dynamic 
clocks <1 (<1%) <1 (<1%) 
Signals <1 (1%) <1 (<1%) 
Logic <1 (<1%) <1 (<1%) 
DSP <1 (<1%) - 
MMCM 101 (99%) 101 (99%) 
I/O <1 (<0%) <1 (0%) 
Total 242 208 
 
 
11) updates the DS feature vector every 0.5s and produces 
an output DS vector of 5s which is the feature vector to the 
classifiers. It consumes ~6.8 µW. The input spike generator 
(refer Fig. 12) consisting of a PN generator for converting 
the input feature vector of size 50 to a corresponding 50 bit 
spike pattern is consume ~0.78 µW. 
The IF neuron consumes 0.26nW with an area of ~297µm2 
which is ~4.58x area efficient and ~5x power efficient than 
the MAC neuron followed by ReLU activation function for 
the 1st hidden layer. It is to be noted that the MAC neuron 
for the subsequent layers will consume more area and power 
due to increase in the size of the input bit width whereas the 
same IF neuron can be used in subsequent layers as the 
input bit width depend on the bit width of the trained 
parameters and the average number of strong connections in 
the previous layers which is found to be similar for our 
network configuration (50-80-2). The number of weights 
required for our network is equal to 4160. An SRAM 
memory for storing the weights will consume ~2.3 µW with 
an area of ~1915 µm2. The logic for selecting the address 
corresponding to the stored weights based on the spike 
patterns (refer Fig. 13) for the hidden and output layers 
consume 1864 µm2 area and ~150nW power. It can be 
concluded that there lies savings in terms of area and power 
for the SNN inference unit compared to the ANN inference 
unit when operated at the same clock frequency. 
E. Estimation of power consumption of the SNN classifier 
The SNN architecture used in this work can be mapped 
to low power neuromorphic processors having spike-based 
processing facilities. An approximate analysis can be done 
based on the average number of spike occurrences (Nspk) 
during the entire duration of the inference operation. 
Considering that a spike activity consumes β Joules of 
energy. In section 1V.e (Fig 6) we estimated the average 
number of spike generation to be Nspk = 700 for 100-time 
steps. Therefore, approximate energy consumption for the 
SNN inference based on different spike based processors in 
literature is listed in Table X. 
VIII. CONCLUSION 
In this work we have studied the classification of human 
footsteps on gravel in the presence of four different 
background noises consisting of wind, rain, birds chirp and 
crickets, using ANN/SNN model. An appropriate choice of fs 
and input feature vector, i.e., (D, S) combination was 
determined, such as to achieve an acceptable accuracy 
(>90%) with fewer computations involved in the feature 
extraction process. A time domain based feature (DS FV) 
showed ~3x power benefits in terms of power compared to 
frequency domain FFT FV. The training of the network was 
done using ANN with a proper choice of parameters to get 
an accuracy of ≈ 93.32 % by the ANN classifier and ≈ 
86.41 % by the SNN classifier. Also, the effect on accuracy 
due to approximating the weight coefficients was observed 
for both the ANN and SNN classifiers. Hardware-level 
assessment of the classifiers (ANN/SNN) used along with 
the DS FV generator were carried out targeting Zedboard 
with Zynq AP SoC XC7Z020-CLG484-1 FPGA. The later 
uses fewer resources compared to the former and dissipates 
~8% less power.  
As a future work the implementation of FV generation 
and SNN can be explored in analog fashion for better energy 
benefits.  
REFERENCES 
[1] F. Karray, M. W. Jmal, M. Abid, M. S. BenSaleh, and A. M. Obeid, 
“A review on wireless sensor node architectures,” in Reconfigurable and 
Communication-Centric Systems-on-Chip (ReCoSoC), 2014 9th International 
Symposium on. IEEE, 2014, pp. 1–8. 
[2] T. Rault, A. Bouabdallah, and Y. Challal, “Energy efficiency in 
wireless sensor networks: A top-down survey,” Computer Networks, vol. 67, 
pp. 104–122, 2014. 
[3] E. Popovici, M. Magno, and S. Marinkovic, “Power management 
techniques for wireless sensor networks: a review,” in Advances in Sensors 
and Interfaces (IWASI), 2013 5th IEEE International Workshop on. IEEE, 
2013, pp. 194–198. 
[4] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 
521, no. 7553, p. 436, 2015. 
[5] Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural 
networks for energy-efficient object recognition,” International Journal of 
Computer Vision, vol. 113, no. 1, pp. 54–66, 2015. 
[6] P. U. Diehl, D. Neil, J. Binas, M. Cook, S.-C. Liu, and M. Pfeiffer, 
“Fast classifying, high-accuracy spiking deep networks through weight and 
threshold balancing,” in Neural Networks (IJCNN), 2015 International Joint 
Conference on. IEEE, 2015, pp. 1–8. 
[7] J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking 
neural networks using backpropagation,” Frontiers in neuroscience, vol. 10, 
p. 508, 2016. 
[8] B. Han, A. Sengupta, and K. Roy, “On the energy benefits of 
spiking deep neural networks: A case study,” in Neural Networks (IJCNN), 
2016 International Joint Conference on. IEEE, 2016, pp. 971–976. 
[9] S. Ghosh-Dastidar and H. Adeli, “Spiking neural networks,” 
International journal of neural systems, vol. 19, no. 04, pp. 295–308, 2009. 
[10] S. Sharma, A. Mukherjee, A. Dongre, and M. Sharad, “Ultra low 
power sensor node for security applications, facilitated by algorithm 
architecture co-design,” in VLSI Design and 2017 16th International 
TABLE X 
ESTIMATION OF ENERGY CONSUMPTION DURING SNN INFERENCE 
BASED ON DIFFERENT SPIKE PROCESSORS IN LITERATURE 
Reference 
Energy/spike, 
(E) 
pJ/spike 
Energy consumed 
during SNN inference (E 
× Nspk), pJ 
2003([15] 2850 19,95,000 
2004([16]) 10.9 7630 
2011([17]) 45 31,500 
2012([18]) 0.4 280 
2015([19]) 9.3 6510 
2017([20]) 0.004 2.8 
2018([21]) 0.014-1.4 9.8-980 
2018([22]) 4.3 3010 
 
 
TABLE IX 
ESTIMATION OF POWER/AREA OF THE DS-SNN SYSTEM BASED ON ASIC 
SYNTHESIS RESULTS 
Module 
Area 
(µm2) 
Power 
(µW) 
Computational 
block (DS FV 
generator) 
89066 6.89 
IF neuron 298 0.026 
PN comparator 8155 0.78 
Memory 1915 2.3 
Spike address 
generator 
1864 0.15 
Control FSM 1050 0.063 
 
 
 
Conference on Embedded Systems (VLSID), 2017 30th International 
Conference on. IEEE, 2017, pp. 101–106. 
[11] M. George and R. King, “Time encoded signal processing and 
recognition for reduced data, high performance speaker verification 
architectures,” in International Conference on Audio-and Video-Based 
Biometric Person Authentication. Springer, 1997, pp. 377–384. 
[12] G. P. Mazarakis and J. N. Avaritsiotis, “Vehicle classification in 
sensor networks using time-domain signal processing and neural networks,” 
Microprocessors and Microsystems, vol. 31, no. 6, pp. 381–392, 2007. 
[13] Y. Guo and M. Hazas, “Localising speech, footsteps and other 
sounds using resource-constrained devices,” in Information Processing in 
Sensor Networks (IPSN), 2011 10th International Conference on. IEEE, 
2011, pp. 330–341. 
[14] J. Salamon and J. P. Bello, “Deep convolutional neural networks 
and data augmentation for environmental sound classification,” IEEE Signal 
Processing Letters, vol. 24, no. 3, pp. 279–283, 2017. 
[15] G. Indiveri, “A low-power adaptive integrate-and fire neuron 
circuit," in Proceedings of the 2003 International Symposium on Circuits 
and Systems, 2003.ISCAS’03. vol. 4. IEEE, 2003, pp. IV-IV. 
[16] Y. J. Lee, J. Lee, Y.-B. Kim, J. Ayers, A. Volkovskii, A. 
Selverston, H. Abarbanel, and M. Rabinovich, \Low power real time 
electronic neuron vlsi design using subthreshold technique," in 2004 IEEE 
International Symposium on Circuits and Systems (IEEE Cat. No. 
04CH37512), vol. 4. IEEE, 2004, pp.IV-744. 
[17] P. Merolla, J. Arthur, F. Akopyan, N. Imam, R. Manohar, and D. 
S. Modha,”A digital neurosynaptic core using embedded crossbar memory 
with 45pj per spike in 45nm," in 2011 IEEE custom integrated circuits 
conference (CICC).IEEE, 2011, pp. 1-4. 
[18] J. M. Cruz-Albrecht, M. W. Yung, and N. Srinivasa, “Energy-
efficient neuron, synapse and stdp integrated circuits," IEEE transactions 
on biomedical circuits and systems, vol. 6, no. 3, pp. 246-256, 2012. 
[19] X. Wu, V. Saxena, K. Zhu, and S. Balagopal, \A cmos spiking 
neuron for brain-inspired neural networks with resistive synapses andin situ 
learning,"IEEE Transactions on Circuits and Systems II: Express Briefs, 
vol. 62, no. 11, pp. 1088-1092, 2015. 
[20] I. Sourikopoulos, S. Hedayat, C. Loyez, F. Danneville, V. Hoel, E. 
Mercier, and A. Cappy, \A 4-fj/spike artificial neuron in 65 nm cmos 
technology," Frontiers in Neuroscience, vol. 11, p. 123, 2017. 
[21] V. Saxena, X. Wu, and K. Zhu, \Energy-efficient cmos 
memristive synapses for mixed-signal neuromorphic system-on-a-chip," in 
2018 IEEE International Symposium on Circuits and Systems (ISCAS). 
IEEE, 2018, pp. 1-5. 
[22] J. Shamsi, K. Mohammadi, and S. B. Shokouhi, \A hardware 
architecture for columnar-organized memory based on cmos neuron and 
memristor crossbar arrays," IEEE Transactions on Very Large Scale 
Integration (VLSI) Systems, no. 99, pp. 1-11, 2018. 
 
[23] Shin, Sangho, Kyungmin Kim, and Sung-Mo Kang. "Memristor-
based fine resolution programmable resistance and its applications." 2009 
International Conference on Communications, Circuits and Systems. IEEE, 
2009. 
[24] Berdan, R., T. Prodromakis, and C. Toumazou. "High precision 
analogue memristor state tuning." Electronics letters 48.18 (2012): 1105-
1107. 
[25] Sharad, Mrigank, Deliang Fan, and Kaushik Roy. "Ultra low 
power associative computing with spin neurons and resistive crossbar 
memory." Proceedings of the 50th Annual Design Automation Conference. 
ACM, 2013. 
[26] Kim, Kuk-Hwan, et al. "A functional hybrid memristor crossbar-
array/CMOS system for data storage and neuromorphic applications." Nano 
letters 12.1 (2011): 389-395. 
[27] Pershin, Yuriy V., and Massimiliano Di Ventra. "Practical 
approach to programmable analog circuits with memristors." IEEE 
Transactions on Circuits and Systems I: Regular Papers 57.8 (2010): 1857-
1864. 
[28] Hu, Miao, et al. "Dot-product engine for neuromorphic 
computing: Programming 1T1M crossbar to accelerate matrix-vector 
multiplication." Proceedings of the 53rd annual design automation 
conference. ACM, 2016. 
[29] Adhikari, Shyam Prasad, et al. "A circuit-based learning 
architecture for multilayer neural networks with memristor bridge 
synapses." IEEE Transactions on Circuits and Systems I: Regular Papers 
62.1 (2014): 215-223.
 
