Syracuse University

SURFACE at Syracuse University
Dissertations - ALL

SURFACE at Syracuse University

Winter 12-22-2021

Algorithm Hardware Codesign for High Performance
Neuromorphic Computing
Haowen Fang
Syracuse University

Follow this and additional works at: https://surface.syr.edu/etd
Part of the Computer Engineering Commons

Recommended Citation
Fang, Haowen, "Algorithm Hardware Codesign for High Performance Neuromorphic Computing" (2021).
Dissertations - ALL. 1381.
https://surface.syr.edu/etd/1381

This Dissertation is brought to you for free and open access by the SURFACE at Syracuse University at SURFACE at
Syracuse University. It has been accepted for inclusion in Dissertations - ALL by an authorized administrator of
SURFACE at Syracuse University. For more information, please contact surface@syr.edu.

Abstract
Driven by the massive application of Internet of Things (IoT), embedded system and
Cyber Physical System (CPS) etc., there is an increasing demand to apply machine intelligence on these power limited scenarios. Though deep learning has achieved impressive
performance on various realistic and practical tasks such as anomaly detection, pattern
recognition, machine vision etc., the ever-increasing computational complexity and model
size of Deep Neural Networks (DNN) make it challenging to deploy them onto aforementioned scenarios where computation, memory and energy resource are all limited. Early
studies show that biological systems’ energy efficiency can be orders of magnitude higher
than that of digital systems. Hence taking inspiration from biological systems, neuromorphic computing and Spiking Neural Network (SNN) have drawn attention as alternative solutions for energy-efficient machine intelligence.
Though believed promising, neuromorphic computing are hardly used for real world
applications.

A major problem is that the performance of SNN is limited compared

with DNNs due to the lack of efficient training algorithm.

In SNN, neuron’s output

is spike, which is represented by Dirac Delta function mathematically.

Becauase of

the non-differentiable nature of spike, gradient descent cannot be directly used to train
SNN. Hence algorithm level innovation is desirable.

Next, as an emerging computing

paradigm, hardware and architecture level innovation is also required to support new
algorithms and to explore the potential of neuromorphic computing.
In this work, we present a comprehensive algorithm-hardware codesign for neuromorphic
computing. On the algorithm side, we address the training difficulty. We first derive a
flexible SNN model that retains critical neural dynamics, and then develop algorithm to
train SNN to learn temporal patterns. Next, we apply proposed algorithm to multivariate

time series classification tasks to demonstrate its advantages. On hardware level, we develop
a systematic solution on FPGA that is optimized for proposed SNN model to enable high
performance inference. In addition, we also explore emerging devices, a memristor-based
neuromorphic design is proposed. We carry out a neuron and synapse circuit which can
replicate the important neural dynamics such as filter effect and adaptive threshold.

To my parents Minfang Zhao and Songshan Fang.

algorithm hardware codesign for
high performance neuromorphic
computing
By

Haowen Fang
B.S., Heilongjiang University, 2013
M.S., Syracuse University, 2018

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Electrical and Computer Engineering

Syracuse University
December 2021

Copyright © 2021 Haowen Fang
All Rights Reserved

Acknowledgements
Finally this unexpected journey is over. I cannot forget the afternoon in November
2016.

After the final exam of VLSI design, I was asked if I were interested in pur-

suing a PhD. That moment has changed my life path.

I never thought it could be

this hard, however I am fortunate that I get generous support from so many people
along this tough journey.

This thesis could not have been possible without them.

Here I would like to express my most sincere gratitude.
Firstly, I would like to sincerely appreciate my advisor Dr. Qinru Qiu. She offered
me this opportunity and provided guidance throughout my research. Her questions always inspire me to explore new directions and ideas. And her enthusiasm constantly motivates me. I had a tough time as a novice, and was not able to deliver anything for a
long time, I appreciate her patience and encouragement. I remember that after I finished my first paper draft, she carefully revised it and provided lots of detailed and incisive feedback, almost everywhere of the paper was edited and marked with numerous
comments. That moment showed me what a true scholar is.
I also would like to express my appreciation to defense committee: Dr. Biao Chen, Dr.
C.Y. Roger Chen, Dr. Mustafa Cenk Gursory, Dr. Fanxin Kong, Dr. Lixin Shen, Dr.
Sucheta Soundarajan for their valuable feedback. I am also grateful to Dr. Yanzhi Wang,
who recognized me. His recommendation has changed my career path. I also learned a
lot from Dr. Wang’s inspirational and insightful talks.
Then, I would like to extend my sincere thanks to all collaborators, labmates and friends
I met in past few years: Dr. Khadeer Ahmed, Wentian Bai, Wentan Bai, Dr. Ruizhe Cai, Dr.
vi

Caiwen Ding, Zhao Jin, Dr. Hongjia Li, Mingyang Li, Dr. Yilan Li, Dr. Zhe Li, Ziru Li, Jiayang Liu, Dr. Yantao Lu, Chen Luo, Xiaolong Ma, Zaidao Mei, Dr. Krittaphat Pugdeethosapol, Dr. Ao Ren, Danial Patrick Rider, Dr. Amar Shrestha, Brady Taylor, Jindi Wu, Geng
Yuan, Dr. Tianyun Zhang, Dr. Ziyi Zhao. I feel lucky because of your support and company.
The precious memories and friendship during the years I spend in Syracuse will not perish.
My gratitude also goes to many friends on the other side of the earth, who have been
dumpsters of my negative emotion for years. Thank you for listening, encouragement and
tolerance, and also thank you for not having blocked me yet.
Lastly but most importantly, I would like to dedicate this thesis to my family, the
most important people in my life, my parents Songshan Fang and Minfang Zhao. Thank
you for enduring love, unconditional support and sacrifices.

Thank you for encourag-

ing me to make such a bold decision, to experience, to try and to explore. I owe you
too much. Thank you for everything I can think of.
This is the third winter since COVID-19 outbreak.
work, and research rather difficult.

I hope family members and friends can reunite,

and hope everyone I care stay safe, healthy and happy.

vii

The pandemic has made life,

Table of Contents

List of Figures

xi

List of Tables

xiii

1 Introduction

1

1.1

Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2

1.2

Outline and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3

2 Background

5

2.1

Spiking Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

5

2.2

Neural Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

8

3 Exploiting Neuron and Synapse Filter Dynamics for High Performance
SNN Training

13

3.1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

13

3.2

Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

15

3.3

Neuron and Synapse as IIR Filters . . . . . . . . . . . . . . . . . . . . . . .

17

3.4

Spatial Temporal Error Propagation

. . . . . . . . . . . . . . . . . . . . . .

21

3.5

Spike Function Gradient Approximation . . . . . . . . . . . . . . . . . . . .

23

3.6

Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

23

3.6.1

Associative Memory . . . . . . . . . . . . . . . . . . . . . . . . . . .

24

3.6.2

Vision Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

26

viii

3.6.3

Time Series Classification . . . . . . . . . . . . . . . . . . . . . . . .

29

Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

30

4 Multivariate Time Series Classification Using Spiking Neural Networks

31

3.7

4.1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

31

4.2

SNN Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

34

4.3

Spatial Temporal Population Encoding . . . . . . . . . . . . . . . . . . . . .

37

4.4

Training Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

40

4.5

Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

42

4.5.1

Coding Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

42

4.5.2

Time Series Classification . . . . . . . . . . . . . . . . . . . . . . . .

44

Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

45

4.6

5 Systematic Optimization for Spiking Neural Network in FPGAs

47

5.1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

47

5.2

Neural Coding Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . .

49

5.2.1

Time Coding by LIF Neuron . . . . . . . . . . . . . . . . . . . . . . .

49

5.2.2

Optimizing Population-temporal Neural Encoding by Gradient Descent 51

5.3

Hardware-friendly and Unified Representation of SNN . . . . . . . . . . . . .

52

5.4

Hardware Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

54

5.4.1

Hardware Implementation . . . . . . . . . . . . . . . . . . . . . . . .

55

5.4.2

Performance Analysis and Optimization . . . . . . . . . . . . . . . .

57

5.4.3

End-to-end Framework . . . . . . . . . . . . . . . . . . . . . . . . . .

58

Experiment and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . .

59

5.5.1

Impact of Encoding Window . . . . . . . . . . . . . . . . . . . . . . .

59

5.5.2

Neural Coding Evaluation . . . . . . . . . . . . . . . . . . . . . . . .

61

5.5.3

Hardware Performance . . . . . . . . . . . . . . . . . . . . . . . . . .

64

Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

66

5.5

5.6

ix

6 Neuromorphic Algorithm-hardware Codesign for Temporal Pattern Learning

68

6.1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

68

6.2

Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

71

6.3

Training Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

74

6.4

Hardware Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

76

6.5

Experiments and Evaluations . . . . . . . . . . . . . . . . . . . . . . . . . .

79

6.5.1

Spiking Dataset Classification . . . . . . . . . . . . . . . . . . . . . .

79

6.5.2

Spatial Temporal Pattern Association . . . . . . . . . . . . . . . . . .

81

6.5.3

Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

82

Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

85

6.6

7 Conclusion

86

7.1

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

86

7.2

Future Research Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . .

87

x

List of Figures
2.1

Spike Response Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

6

2.2

Shape of PSP kernels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

8

2.3

Spike train examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

10

2.4

Australian Sign Language dataset sample. . . . . . . . . . . . . . . . . . . .

10

2.5

Rate coding and temporal coding comparison. . . . . . . . . . . . . . . . . .

12

3.1

General neuron model as IIR filters. . . . . . . . . . . . . . . . . . . . . . . .

18

3.2

Spatial temporal data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

20

3.3

Spatial temporal input and output spike patterns of associative memory network. 24

3.4

Intermediate layer output spike rate and Van Rossum distance. . . . . . . . .

25

3.5

Learned synapse impulse response . . . . . . . . . . . . . . . . . . . . . . . .

27

3.6

Training performance comparison. . . . . . . . . . . . . . . . . . . . . . . . .

28

4.1

Spiking neuron model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

34

4.2

PSP kernel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

35

4.3

Coding method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

39

4.4

Articulary Word Recognition dataset sample.

. . . . . . . . . . . . . . . . .

43

4.5

Coding comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

46

5.1

Time encoding by LIF neuron. . . . . . . . . . . . . . . . . . . . . . . . . . .

50

5.2

Overall Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

55

5.3

PE design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

56

xi

5.4

Filter bank. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

57

5.5

Framework. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

60

5.6

Accuracy with different encoding window. . . . . . . . . . . . . . . . . . . .

62

6.1

Synapse and adaptive threshold dynamic. . . . . . . . . . . . . . . . . . . . .

72

6.2

Unfolded network.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

75

6.3

System structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

77

6.4

N-MNIST and SHD dataset samples. . . . . . . . . . . . . . . . . . . . . . .

79

6.5

Pattern association input and output samples. . . . . . . . . . . . . . . . . .

82

6.6

Diagram of synapse filter, RRAM crossbar, and neuron circuit. . . . . . . . .

83

6.7

Circuit simulation results. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

83

6.8

Accuracy under different quantization level and process variation. . . . . . .

84

xii

List of Tables
3.1

Results on vision datasets. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

28

3.2

Results on temporal datasets. . . . . . . . . . . . . . . . . . . . . . . . . . .

29

4.1

Coding efficiency of rate coding and temporal coding. . . . . . . . . . . . . .

44

4.2

Accuracy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

44

5.1

Dataset statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

61

5.2

Accuracy of different coding method. . . . . . . . . . . . . . . . . . . . . . .

63

5.3

Synthesis result. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

64

5.4

Platform configurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

65

5.5

Cross-platform Comparison on MNIST . . . . . . . . . . . . . . . . . . . . .

65

5.6

Cross-platform Comparison on MTSC. . . . . . . . . . . . . . . . . . . . . .

65

6.1

Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

78

6.2

Classification Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

80

xiii

Chapter 1
Introduction
Deep Neural Networks (DNNs) have shown superior performance in various applications
including computer vision, natural language processing, anomaly detection, speech recognition etc. However the ever-increasing scale and complexity of DNN also pose challenges to
conventional Von Neuman architecture. In addition, with the massive application of Internetof-Things (IOT), Cyber Physical Systems (CPS) and edge computing techniques, there is an
increasing demand to perform cognitive tasks such as objection detection, signal classification
etc. in these energy-limited scenarios. However the model size and computation complexity
of DNN make it difficult to deploy on energy-limited devices. Meanwhile, biological systems
have shown energy efficiency many orders of magnitude higher than digital systems on certain
cognitive tasks [1]. Such efficiency motivates the research of neuromorphic computing.
Inspired by human brain, neuromorphic computing covers a wide range of research topics, ranging from low level device technology to high level cognitive science. By studying
the biological and physical characteristics of neural system, neuromorphic computing aims
at developing novel algorithm, computing model, cognitive model, non-conventional architecture and devices for machine intelligence applications [2–13].
As the underlying computational model, Spiking Neural Network (SNN) is an important branch of neuromorphic computing. Compared with DNN, SNN incorporates more

1

biological characteristics. For example, DNN can generally be formulated as matrix-vector
multiplication with non-linear activation functions. Neuron in DNN is a highly abstract
mathematical model, essentially the dot product operation, and the connections (synapses)
are represented by weight matrix. In contrast, the neuron in SNN is closer to physical neuron and behaves like a dynamical system. In other words, a spiking neuron is stateful, its
states and output are dependent on both current input and previous states. The synapses
in SNN are not simply represented by weights matrices, they act as filters. [14,15] show that
an individual spiking neuron can classify tens of temporal patterns, and credit such ability
to the synapse filter effect. Another distinct feature of SNN is that the neuron’s output
is spike, information in SNN is represented and transmitted as discrete spike events [16].
Physically, a spike is an electrical pulse in biological systems, and it is modeled by the
Dirac Delta function in SNN mathematical model [17]. Spike enables low power information
transmission and simplified hardware design as it is just a one bit event.

1.1

Motivation

Spike, the unique feature of SNN, is non-differentiable, posing challenges to gradient descentbased training [5,18,19]. The performance of SNN in terms of accuracy is generally degraded
compared with DNN on various classification tasks due to lack of efficient training algorithms.
And how brain encodes and processes information is not well understood. As an emerging
research topic, there are open questions in SNN research, for example:
• Image, speech and physical stimulus are all real numbers, how can we accurately and
efficiently map values in the real number domain to the spike domain?
• The non-differentiable spike activation and model complexity pose challenges to training, how can a SNN model be trained efficiently?
• To utilize the feature of SNN to benefit hardware design, how can we jointly optimize
SNN model and hardware design?
2

In this work, we tackle above questions and provide comprehensive solutions, which
include efficient neural coding, novel SNN model, training algorithm and algorithm hardware
codesign.

1.2

Outline and Contributions

The organization and contributions are summarize as the follows:
1. In Chapter 2, we briefly review fundamentals of SNN, including basic concepts, various
SNN models, and neural coding.
2. In Chapter 3, we derive a general representation of SNN as a network of filters with
neuron non-linearity. This model captures neuron’s filter effect, which is critical for
temporal pattern learning. Next, a training algorithm is developed to train such SNN
model to learn both rate-coded and temporal coded patterns. The model and algorithm
are evaluated on various datasets including MNIST, neuromorphic MNIST, DVS128
gesture, and outperform state of the art approaches.
3. In Chapter 4, we propose a coding scheme to convert multivariate time series to sparse
spatial-temporal time spike trains, and apply the coding scheme and training algorithm
on time series classification applications. Experiments show that accuracies of the
resulting SNN are comparable to DNNs.
4. In Chapter 5, we propose a systematic optimization methodology including various
aspects of SNNs for FPGA: 1) A neuron coding scheme and an algorithm to optimize
neural encoder parameters; 2) A flexible model, which reduces model complexity and
retains critical SNN dynamics; 3) A novel hardware and an end-to-end framework to
optimize and deploy software model to FPGAs.
5. In Chapter 6, we present an algorithm-hardware codesign framework for memristor
based neuronmorphic system. A training algorithm for hardware friendly Leaky Inte3

grate and Fire (LIF) spiking neuron model is derived. The model and algorithm process
temporal patterns without recurrent connections, hence are friendly for crossbar-based
hardware designs. The algorithm achieves competitive accuracy on common SNN
datasets. We also design a neuron circuit for memristor-based architectures, it replicates the dynamics of proposed model.

4

Chapter 2
Background
In this chapter, we first briefly review basic concepts of SNN and different SNN models.
Next we discuss different neural coding strategies and their trade-offs.

2.1

Spiking Neural Network

Although SNN models are highly diverse by reflecting different levels of biological
details and incorporating different biological aspects, most SNN models share common principles:

1) Information is represented and processed as spikes; 2) Neurons

output spiking activities; 3) Neurons are stateful, the model contains at least one
state variable to characterize the temporal behavior.
A common practice to describe spiking neuron is to model it as a hybrid system, consisting
of Ordinary Differential Equations (ODE) and event-triggered update. The system contains
some state variables that evolve continuously according to ODEs. Incoming spike triggers
update of state variables. And when some state variables such as membrane potential exceeds
threshold, it triggers spike generation and update of state variables [20].
A typical Leaky Integrate and Fire (LIF) neuron defined in hybrid form is shown below:

5

Synapses

Membrane potential

PSP
Input
spikes

Output spikes

...

∑
LIF Neuron

Figure 2.1: Spike Response Model.

N

X
dv(t)
τ
= −v(t) +
wi xi (t)
dt
i
v(t) ← vrest

(2.1a)

if v(t) = ϑ

(2.1b)

where v(t) is neuron’s membrane potential, τ is a time constant, which controls the decay
speed of membrane potential, xi (t) is input of the ith synapse, wi is synaptic weight, M is
the synapse number, ϑ is threshold. When v(t) exceeds ϑ, v(t) is set to 0.
LIF model is widely used for neurmorphic computing research.

Though highly

simplified compared with biological neuron, it provides good approximation to biological neurons.

More importantly, it incorporates the most critical property,

the neuron nonlinearity i.e.

the spike [21].

Another way to describe the spiking neuron is Spike Response Model (SRM). SRM describes spiking neuron from the perspective of filters [17, 22]. In SRM, each input spike
induces a charge, which is called postsynaptic potential (PSP) on synapse, defined as:

6

P SP (t) =

ti <t
X

k(t − ti )

(2.2)

ti

where ti is the time of ith spike, K(t) is the PSP kernel. The neuron’s potential is defined
as [23]:

v(t) =

M
X

ti <t

wi

j
X

i

tij

k(t − tij ) − ϑ

X

h(t − tjs )

(2.3)

tjs <t

where v(t) is membrane potential, wi is the weight ith synapse, tij is the time of jth
input spike at ith synapse, ϑ is threshold. tjs ∈ {tjs : v(tjs ) = ϑ} is the time when the
neuron generates jth spike. k(t) is synapse kernel, h(t) is reset kernel. Usually, h(t) =
t

e− τs [22]. Some commonly used synapse kernel k(t) are [24]:

t

Exponential kernel: k(t) = e− τs
Alpha kernel: k(t) =

(2.4)

t − τt
·e s
τs
t

(2.5)
−t

Dual exponential kernel: k(t) = e− τm − e τs

(2.6)

where τm and τs are two time constants. The shape of these kernels are shown in Figure
2.2.
There are works utilizing SRM and showing that spiking neuron has the capability to
classify temporal patterns [14, 15, 25–28]. Equation 2.2 and 2.3 intuitively explain the neuron’s capability to recognize temporal pattern. Neuron and synapse act as low-pass filters,
v(t) and P SP (t) are explicitly expressed as function of entire previous input and output history, as if the historical information is stored in the PSP and membrane potential. [14] [15]
credit the neuron’s capability of recognizing temporal pattern to the filter nature.
In fact, SRM can be derived from ODE model. In Section 3.3, we discuss the connections
between SRM and the ODE model, and derive an flexible and hardware-friendly model from
7

1.0

Exponential kernel
Alpha exponential kernel
Dual exponential kernel

0.8

PSP

0.6
0.4
0.2
0.0
0

20

40

Time

60

80

100

Figure 2.2: Shape of PSP kernels.
SRM.

2.2

Neural Coding

Neural coding refers to how information is represented by spikes. In DNN, neurons’ input and output are continuous real values. Data such as image, speech, sensor readings
are all in real domain, hence can be easily handled by DNN. However, a distinct property of SNN is that it accepts and generates spikes, therefore information in SNN has to
be represented and processed as spikes. As the bridge between real domain and spike
domain, neural coding is critical for downstream SNN processing. A nature question is
how to efficiently and accurately encode information by spikes.
How human brain and sensory system encode information is not well understood yet,
neural coding is still an open question. There are various hypothesis, generally can be
divided in to two types, rate coding and temporal coding.
Spike has no unit, sign or magnitude, therefore an individual spike can hardly carry
information. Information has to be represented by a group of spikes in a time period. The

8

idea of rate coding is straightforward. It treats spikes statistically, the number of spikes in
a time window resembles a numerical value in DNN. Spike rate in the encoding window is
proportional to the numerical value to be encoded. Rate coding has been widely adopted for
simplicity. Many SNN models use rate coding for image classification. Using spike count to
represent pixel value is a common practice [18,29–34]. However, it has significant drawbacks:
• It introduces latency. The information is only fully represented at the end of the
window. The spike rate cannot be accurately estimated until sufficient number of
spikes have been received. Particularly for weak stimulus, large window is required to
accumulate enough spike to get accurate estimation [21].
• Rate coding represents large values by dense spike activities, which impairs energy
efficiency of computing hardware, and may even poses challenge to neuromorphic chip
design [35–38].
• Rate coding may suffer from quantization error. The precision of rate coding is proportional to window size. With window size T , rate coding can only represent T different
magnitudes. Thought increase window size leads to higher precision, it is at the cost
of increased latency [39].
• Rate coding is inefficient to encode time-varying input as spike count in an encoding
window only represents one numerical value. This limits the application of SNN in real
time applications where inputs are continuous data streams.
Though rate coding agrees with the observation that the sensory system’s spike rate increases as the input stimuli intensity increases, biological studies also have shown that spike
rate does not account for all information, spike timing also conveys information, enabling
reliably encoding with low latency [17,40–42]. Temporal coding takes spike timing and spike
train temporal structure into account. Figure 2.3 illustrates the difference between rate coding and temporal coding. In Figure 2.3, an arrow indicates a spike event. In window of size 10,
9

0.5 ≈

0.5 ≈

0

10

0

10
Figure 2.3: Spike train examples.

0.6

Value

0.4
0.2
0.0
0

100

200

Time

300

400

Figure 2.4: Australian Sign Language dataset sample.
the two spike train have same number of spikes. From the perspective of rate coding, the two
spike train both represent value 0.5. However, from the perspective of temporal coding, these
two spike trains may represent different things as they have different temporal structure.
Temporal coding has some advantages over rate coding. It allows designer to accurately encode information with sparse spikes as information capacity of a spike train is
significantly increased when timing is taken into account [43]. Temporal coding also reduces latency as short window can still faithfully encode input [21]. In addition, temporal coding is naturally suitable for time-varying input.

10

We use Australian Sign Language dataset [44] as an example to demonstrate the advantage of temporal coding. This dataset consists of multivariate sensor data recorded by data
glove, which is a device to track hand movements.It records 22 attributes including X, Y,
Z positions, roll, pitch and yaw etc. An example of this dataset is shown in Figure 2.4,
each curve represents a time varying input. We use both rate coding and temporal coding
method in [39] to convert the time series to spikes. The spike patterns are shown in Figure
2.5. Each dot in figure 2.5 represents a spike. Due to the incapability of rate coding to
represent temporal information, the time series have to be flattened, resulting in 990 spike
trains. For clarity, only the first 100 spike trains are shown. In temporal coding, for each
variate, we use 5 neurons to encode. It can clearly be seen that the temporal population
coding is sparser. In addition, it encodes the input with 110 spike trains, which is significantly smaller than the number of spike trains obtained by rate coding. This is beneficial
to reduce the SNN model size. However, utilizing spike timing information is not trivial. It
requires novel training algorithm, we will address this challenge in Chapter 3.

11

(a) Rate-based input.

100

Neuron Index

80
60
40
20
0

0

100

200

300

400

Time
(b) Timing-based input

Figure 2.5: Rate coding and temporal coding comparison.

12

500

Chapter 3
Exploiting Neuron and Synapse Filter
Dynamics for High Performance SNN
Training
3.1

Introduction

Spiking neural networks have demonstrated their capability in signal processing and pattern detection by mimicking the behavior of biological neural systems. As mentioned in
Section 2.1, Neuron is a dynamic system, which is capable of spatial temporal information
processing. The network can memorize and detect spatial temporal patterns with an ability
superior to conventional artificial neural network (ANN) [18].
However, The potential of SNNs has not been fully explored. First of all, due to the lack of
unified and robust training algorithms, the performance of SNNs is still not comparable with
DNNs. Directly adapting backpropagation is not feasible because their output is a sequence
of Dirac delta functions, hence is non-differentiable. Secondly, most SNN models and training
algorithms use rate coding, representing a numerical value in DNN by spike counts in a time
window, and consider only the statistics of spike activities. Temporal structure of spike

13

train and spike timing also convey information [27]. Spike trains with similar rates may
have distinct temporal patterns representing different information. To detect the temporal
pattern in the spike train, novel synapse and neuron models with temporal dynamics are
needed. However, synapse dynamics are often ignored in the computational models of SNNs.
To address the problem with non-differentiable neuron output, one approach is to train an
ANN such as a multi-layer perceptron (MLP) and convert the model to an SNN. This method
is straightforward, but it requires additional fine-tuning of weights and thresholds [31]. There
are also works that directly apply backpropagation to SNN training by approximating the
gradient of the spiking function [35, 38, 45], or utilizing gradient surrogates [18, 46]. Other
approaches include using derivatives of soft spike [47] or membrane potential [48].
The ability of capturing temporal patterns relies on neuron and synapse dynamics [14].
Synapse function can be modeled as filters, whose states preserve rich temporal information.
The challenge is how to capture the dependencies between the current SNN states and
previous input spikes. This challenge has been addressed by some existing works. [14] and [15]
train individual neuron to classify different temporal spike patterns. [27] is capable to train
neurons to associate an input spatial temporal pattern with a specific output spike pattern.
However, the aforementioned works cannot be extended to multiple layers and therefore are
not scalable. Some recent works utilize backpropagation through time (BPTT) to address
the temporal dependency problems. [18] proposed simplified iterative LIF neuron model. [49]
derived an iterative model from a current based LIF neuron. Based on the iterative model,
network can be unrolled hence BPTT is possible. However, these works only consider the
temporal dynamics of membrane potential, the synapse dynamics and the filter effect of SNN
are ignored. There are also works that introduced the concept of filters into Multi Layer
Perceptron (MLP) [50, 51], which enabled MLP to model time series.
1. The dynamic behavior of LIF neuron is formulated by Infinite Impulse Response (IIR)
filters. We exploit the synapse and neuron filter effect, derive a general representation
of SNN as a network of IIR filters with neuron non-linearity.
14

2. A general algorithm is proposed to train such SNN to learn both rate-based and spatial
temporal patterns. The algorithm does not only learn the synaptic weight, but is
also capable to optimize the impulse response kernel of synapse filters to improve
convergence. The similar learning behavior has been discovered in biological systems
[52]. Our training algorithm can be applied to train simple LIF, and neurons with
more complex synapses such as alpha synapse, dual-exponential synapse etc.
3. Our algorithm is tested on various datasets including MNIST, neuromorphic MNIST,
DVS128 gesture, TIDIGITS and Australian Sign Language dataset, and outperform
state of the art approaches.

3.2

Neuron Model

Without loss of generality, we consider a LIF neuron with dual exponential synapse for
its biological plausibility, this model has been proved successful in various temporal pattern
learning tasks [14,15,49]. The neuron can be described as a hybrid system, i.e. the membrane
potential and synapse status evolve continuously over time, depicted by ODEs, while a spike
event triggers the update of the state variables as the following [20]:

M
η X
dv(t)
η−1
= −(v(t) − vrest ) + η
wi xi (t)
τm
dt
i

τs

dxi (t)
= −xi (t)
dt

(3.1a)
(3.1b)

xi (t) ← xi (t) + 1, upon receiving spike

(3.1c)

v(t) ← vrest , if v(t) = Vth

(3.1d)

Where xi is the state variable of the ith synapse, wi is the associated weight, and M is the
total number of synapses. τm and τs are time constants, and η = τm /τs . v and vrest are the
neuron membrane potential and rest potential. For simplicity, we set vrest = 0. Every synapse
15

has its own potential, which is called postsynaptic potential (PSP). Neuron accumulates PSP
of all input synapses. The membrane potential resets when an output spike is generated.
The ODE system is linear time-invariant (LTI). It can also be interpreted as the convolution of an impulse response of a filter with the input spike train, which leads to the
spike response model [17]. The relation between the v(t), O(t) and the historical spike
input can clearly be seen in the spike response model. We denote the input spike trains
P
as a sequence of time-shifted Dirac delta functions, Si (t) = j δ(t − tji ), where tji denotes
the jth spike arrival time from the ith input synapse. Similarly, output spike train can
P
be defined as O(t) =
δ(t − tf ), tf ∈ {tf : v(tf ) = Vth }. To simplify the discussion,
we consider only one synapse. The impulse response kernel k(t) of a neuron described
by above ODE system is obtained by passing a single spike at time 0 at the input, such
that the initial conditions are x(0) = 1 and v(0) = 0. By solving equation 3.1a and 3.1b,
η

−t

−t

we have k(t) = η η−1 (e τm − e τs ). Given the general input S(t), PSP is the convolution
of k(t) and S(t). For a neuron with M synapses, without reset, the sub-threshold membrane potential is the summation of all PSPs, such that:

vsub (t) =

M
X
i

Z

∞

k(s)Si (t − s)ds

wi

(3.2)

0

In hybrid model, the reset is modeled by simply setting v to vrest , and regarding the reset
as the start of the next evaluation and discarding the neuron’s history information. A more
biological way is to treat reset as a negative current impulse applied to the neuron itself [17].
−t

The reset impulse response is h(t) = −Vth e τr , where τr controls the decay speed of reset
impulse. Such that the membrane potential is the summation of all PSPs and reset voltage:
Z
v(t) = −

∞

h(s)O(t− s)ds+
0

M
X
i

Z
wi

∞

k(s)Si (t− s)ds

(3.3)

0

Treating reset as a negative impulse enables adaptive threshold, which is observed in
biological neurons. Neuron’s threshold depends on its prior spike activity. With adap-

16

tation, frequent spike activity increases the reset voltage, which inhibits the neuron activity, preventing SNNs from over-activation. Such that additional tuning methods such
as weight-threshold balancing [31] is not necessary.
Above equations reveal the filter nature of the biologically realistic neuron model. Each
synapse act like a low pass filter. Synapse filter is causal, and the kernel is defined to decay
over time, hence the current state of the PSP is determined by all previous input spikes up to
current time. The temporal dependency calls for temporal error propagation in the training.

3.3

Neuron and Synapse as IIR Filters

In practice, for computational efficiency, spiking neural network are usually simulated
in discrete time domain and network states are evaluated for every unit time.

The

discrete time version of equation 3.3 can be written as:

v[t] =

X

h[s]O[t − s] +

s

M
X

wi

X

k[s]Si [t − s]

(3.4)

s

i

where t ∈ Z≥0 . It is clear that v[t] is a combination of a reset filter and multiple
synapse filters.

However, the above form is not practical for implementation be-

cause of infinite convolution coefficients.

We express the above system using Linear

Constant-Coefficient Difference Equations (LCCD):

v[t] = −Vth r[t] +

M
X

wi fi [t]

(3.5a)

r[t] = e τr r[t − 1] + O[t − 1]

(3.5b)

fi [t] = α1 fi [t − 1] + α2 fi [t − 2] + βx[t − 1]

(3.5c)

i
−1

where fi [t] denotes the ith synapse filter, which is a second order IIR filter, r[t] is the
−1

−1

reset filter, α1 = e τm + e τs , α2 = −e−
17

τm +τs
τm τs

−1

−1

, β = e τm − e τs .

Synapse filter 1

S1[t]
Z-1

×

F1[t]

×
βq

×
···

-ϑr

Z-1

β1

w1

...

Neuron filter

Synapse filter j

Sj[t]
Z-1

×

···

βq

λ

Fj[t]

Z-1
Z-1

wj

×
Z-1

×

× αp

×

α0

×θ

Z-1

×
Z-1

-R[t]

× αp

···

α0

Reset filter

×
···

O[t]

-1

Z

β1

V[t]

U[t]

Figure 3.1: General neuron model as IIR filters.
Introducing synapse dynamics could cause significantly large computation overhead because the number of synapses is quadratic to the number of neurons. Maintaining such
large number of synaptic states is infeasible. In a biological system, spikes are transmitted
through axons, an axon connects to multiple destination neurons through synapses. Therefore, the synapses that connect to the same axon have identical spike history hence same
states. Based on this observation, tracking the states of synapses that have the same fan-in
neuron is unnecessary as these synapses can share the same state and computation.
Neuron itself can also be a filter and v[t] may also rely on its previous states. We
can extend equation 3.5a - 3.5b to a more general form, such that the SNN can be interpreted as a network of IIR filters with non-linear neurons:

18

Vil [t] = λVil [t − 1] + Iil [t] − Vth Ril [t]

(3.6a)

Nl−1

Iil [t]

=

X

l
wi,j
Fjl [t]

(3.6b)

j

Ril [t] = θRil [t − 1] + Oil [t − 1]
Fjl [t]

=

P
X

l
αj,p
Fjl [t

− p] +

Q
X

l
βj,q
Ojl−1 [t − q]

(3.6c)
(3.6d)

q=0

p=1

Oil [t] = U (Vil [t] − Vth )

(3.6e)

U (x) = 0, x < 0 otherwise 1

(3.6f)

Where l and i denote the index of layer and neuron respectively, and j denotes input
index and t is the time, Nl is number of neurons in lth layer. Vil [t] is neuron membrane
potential. Iil [t] is weighted input. Ril [t] is reset voltage, Flj [t] is PSP. OiL [t] is spike function,
and U (x) is a Heaviside step function. P and Q denote the feedback and feed forward
l
l
are coefficients of neuron filter, reset filter and synapse filter
and βj,q
orders. λ, θ, αj,p

respectively. 3.6d is a general form of IIR filters, it allows PSP to be arbitrary shapes. The
above formulation is not specific to neuron models and it provides a flexible and universal
representation, it is capable of describing more complex spiking neuron models than LIF
neuron. For example, by setting α1 = 2e

−1
τ

, α2 = −e

−2
τ

1

, αp = 0, p ∈ {2, 3, ..., P }, β1 = τ1 e− τ

and βq = 0, q ∈ {0, 2, 3, ..., Q}, it models neuron with alpha synapse. By setting αp = 0, p ∈
{1, 2, ..., P }, β0 = 1, βq = 0, q ∈ {1, 2, ...Q}, the synapse filter is removed, the model becomes
simple LIF neuron as in [31, 49]. Based on 3.6a – 3.6f, a general model of spiking neuron
can be represented as a network of IIR as shown in Figure 3.1. Axonal delay is explicitly
l
modeled in equation 3.6d by delayed input βj,q
Ojl−1 [t − q], hence it enables more complex

and biologically plausible temporal behavior. Neurons can also have heterogeneous synapses,
i.e. the synapses’ feed forward order and feedback order can vary across layers. To avoid

19

...

...

...

F[t]

F[t+1]

O[t-1]

O[t]

O[t+1]

U[t-1]

U[t]

U[t+1]

V[t-1]

V[t]

V[t+1]

R[t-1]

R[t]

R[t+1]
I[t+1]

F[t-1]

F[t]

F[t+1]

O[t-1]

O[t]

O[t+1]

Time t-1

Spatio path

...

I[t]

...

I[t-1]

...

Layer n+1
Layer n
Layer n-1

F[t-1]

Time t

Time t+1

Temporal path
Figure 3.2: Spatial temporal data flow.

notation clutter, we assume that all neurons in this paper have homogeneous synapse types.
Equation 3.6a to 3.6f provide an explicitly iterative way to model synapse and
neuron dynamics, hence it is possible to unfold the network over time and apply
BPTT. The spatial and temporal data flow and unfolded network with second order
synapse filter are shown in Figure 3.2.

Similar formulations can be found in [18, 49].

However they are aimed at specific neuron models.

20

3.4

Spatial Temporal Error Propagation

We discuss the spatial temporal backpropagation in the context of two learning tasks.
In the first, the neuron that fires most represent the correct result.

Since this is a

classification task, we use cross-entropy loss and spike count of the output neuron
represents the probability.

Loss is defined as:

Erate = −

NL
X

yi log(pi )

(3.7)

i

pi is given by:
P
exp ( Tt OiL [t])
pi = PNL
PT L
t Oj [t])
j=1 exp (

(3.8)

where yi is the label, L is number of layers, OiL [t] denotes output of last layer.
In the second learning task, the goal is to train SNN to generate spikes at specified times
such that the output spike pattern O[t] is spatially and temporally similar as the target spike
pattern Starget [t]. We refer to it as temporal learning. The loss function of the learning is
the distance between the actual output spike trains and the target spike trains. Inspired
by Van Rossum distance, we pass the actual and target spike train through a synapse filter
k[t], to convert them to continuous traces. The loss is defined as:

Edist

NL X
T
1 X
i
=
(k[t] ∗ OiL [t] − k[t] ∗ Starget
[t])2
2T i=1 t=1

(3.9)

i
where Starget
[t] is the ith spike train of target spike patterns.

For both tasks, we define: δil [t] =

∂E
,
∂Oil [t]

li [t] =

∂U (Vil [t]−Vth )
,
∂Vil [t]

κli [t] =

∂Vil [t+1]
.
∂(Vil [t])

Please

note that the spike activation function U (x) is not differentiable. Its approximation will be
discussed in section 3.5. By unfolding the model into spatial path and temporal path as
shown in Figure 3.2, BPTT can be applied to train the network. κli [t] can be computed as:

21

κli [t] =

∂Vil [t + 1]
= λ − Vth li [t]
l
∂(Vi [t])

(3.10)

δil [t] an be computed recursively as follows:

δil [t]

=

Q Nl+1
X
X
q=0

j

∂Ojl+1 [t + q]
∂E
∂E
∂Oil [t + 1]
+
∂Ojl [t + q] ∂Oil [t]
∂Oil [t + 1] ∂Oil [t]

(3.11)

where
∂Oil [t + 1]
∂Oil [t + 1] ∂Vil [t + 1] ∂Ril [t + 1]
=
∂Oil [t]
∂Vil [t + 1] ∂Ril [t + 1] ∂Oil [t]

(3.12)

= −Vth δil [t + 1]li [t + 1]

(3.13)

∂Ojl+1 [t + q] ∂Vjl+1 [t + q] ∂Ijl+1 [t + q]
∂Ojl+1 [t + q]
=
∂Oil [t]
∂Vjl+1 [t + q] ∂Ijl+1 [t + q] ∂Oil [t]
l+1 l+1
l+1
= βj,q
δj [t + q]l+1
j [t + q]wj,i

(3.14)

Where δil [t + q] = 0 for t + q > T . Unlike LSTM/RNN, or SNN such as [18, 49], there
may be dependency from layer l + 1 to layer l at multiple time steps due to axonal delay.
Based on above equations, error can propagate recursively.
In real biological system, synapses may respond to spike differently. The PSP kernels
can be modulated by input as part of the synaptic plasticity [52]. It is possible to employ
gradient descent to optimize the filter kernels in equation 3.6d [51].
Above learning rule assumes the SNN to be an LTI system. The loss calculation, error
propagation, filter coefficients and synaptic weights update are performed at the end of each
training iteration. Therefore, within one iteration, the SNN is still linear time-invariant.

22

3.5

Spike Function Gradient Approximation

The non-differentiable spike activation is a major road-block for applying backpropagation. One solution is to use a gradient surrogate [47]. In the forward path, a spike is
still generated by a hard threshold function, while in the backward path, the gradient of
the hard threshold function is replaced by a smooth function. One of such surrogates can
be spike probability [45, 47]. Although the LIF neuron is deterministic, stochasticity can
be obtained from noise [53]. Under Gaussian noise of mean 0 and variance σ, in a short
interval, LIF neuron can behave like a Poisson neuron such that the spike probability is
a function of the membrane potential v as follows:
Vth − v
1
)
P (v) = erfc( √
2
2σ
where erfc(x) represents a complementary error function.

(3.15)
With this replacement,

the gradient of U (x) can be approximated as:
(Vth −v)2

∂P (v)
e− 2σ2
∂U (v)
≈
= √
∂v
∂v
2πσ

3.6

(3.16)

Experiments

Proposed model and algorithm are implemented in PyTorch. We demonstrate the effectiveness using three experiments; the first experiment is a non-trivial generative task using associative memory; the second is vision classification, and the third is to classify
temporal patterns.

In following experiments, we use Adam optimizer, learning rate is

set to 0.0001, batch size is 64. We employ synapse model depicted by equation 3.5c, in
−1

which τm = 4, τs = 1, λ = 0, θ = e τm , Vth = 1.

23

Time

Neuron index

Time

Neuron index

Input spike train index

Time

Output (epoch = 50)

Neuron index

Noise

Output (epoch = 5)

Neuron index

Input spike train index

Input (jitter + noise + deletion)

Deletion

Time

(a) Original patterns

Time

Time

(b) Input and output spike trains

Figure 3.3: Spatial temporal input and output spike patterns of associative memory network.

3.6.1

Associative Memory

An associative memory network retrieves stored patterns that most closely resembles the
one presented to it. To demonstrate the capability of our approach to learn complex spatial
temporal spike patterns, we train a network of structure 300-500-200-500-300. We generate 10 spatial temporal spike train patterns, each contains 300 spike trains of length 300,
samples of these patterns are shown in Figure 3.3a. Each dot corresponds to a spike event,
the x-axis represents the time, and the y-axis represents the spike train index. The SNN
is trained to reconstruct the pattern. First column of 3.3b shows two noisy sample inputs.
Noisy samples are formed by adding random noise, which includes obfuscation and deletion
of some part of the patterns, jitter in input spikes’ timing following a Gaussian distribution and random background spikes. After 50 epochs of training, the network is able to
reconstruct the original patterns and remove background noise. Corresponding outputs at
epoch 5 and 50 are shown in 3.3b. Such a task is difficult for rate-based training methods as they are not capable of capturing temporal dependencies. It is noteworthy that the
intermediate layer has 200 neurons, which is smaller than the input layer. And the intermediate layer is learning the spatial and temporal representation of the input patterns.

24

Input sample index

Label

Neuron index

(a) Intermediate layer spike rate
Input sample index

Label

Input sample index

(b) van Rossum distance
Figure 3.4: Intermediate layer output spike rate and Van Rossum distance.
Thus, this network also acts like a spatial temporal auto-encoder.
We drove the input of the network with 64 different testing samples and record output
of 200 neurons in the intermediate layer. Figure 3.4a color codes the spiking rate of those
neurons. The x-axis gives the index of the neurons, and y-axis gives the index of different
testing samples. Those samples belong to 10 different classes, and are sorted so that data
of the same class are placed close to each other vertically. The 10 different colors on the
left side bar indicate each of the 10 classes. The pixel (x, y) represents the spiking rate of
neuron x given testing sample y. Spiking rate of any neuron is almost a constant regardless
25

of which class the testing sample belongs to. Figure3.4b shows the Van Rossum distances
between the 200 neurons’ output spike train. x-axis and y-axis give the input sample index.
The color intensity of pixel (x, y) is proportional to the Van Rossum distance between the
200 neurons’ output when given input sample x and y respectively. Similar as 3.4a, the
color bar on left side indicates the class of each sample. It can clearly be seen from figure
3.4b that the temporal structure of these 200 neurons’ outputs are significantly different.
The fact that our model is able to take those 64 sets of spike trains with almost the same
firing rate and generate 10 different classes indicates that it is capable of utilizing features
in the temporal distribution of the spikes in addition to the spike rates.

3.6.2

Vision Tasks

We evaluated our method on three vision datasets. Results and comparisons with state-ofthe-art works in the SNN domain are shown in Table 3.1. For MNIST, we utilize rate-based
encoding to convert the input image into 784 spike trains where number of spikes in the spike
train is proportional to the pixel value. With a convolutional SNN with the structure 32C332C3-64C3-P2-64C3-P2-512-10, our model achieves state-of-the-art accuracy in the SNN
domain. The work next in terms of accuracy (99.42 %) [5] employs ensemble learning of
64 spiking CNNs. Compared to conversion-based approaches that require hyper-parameter
search and fine tuning [31], our approach does not require post-training processing. It directly
trains SNN using BPTT and obtains models with comparable quality as DNN.
Unlike MNIST, which consists of static images, Neuromorphic MNIST (N-MNIST) is a
dynamic dataset which consists of spike events captured by DVS camera and is a popular
dataset for SNN evaluation. An N-MNIST sample is obtained by mounting the DVS camera
on a moving platform to record MNIST image on the screen. The pixel change triggers spike
event. Thus, this dataset contains more temporal information. With a convolutional network
of size 32C3-32C3-64C3-P2-64C3-P2-256-10, our model outperforms the current state-of-theart. The results are shown in Table 3.1. [35] introduced additional winner-take-all (WTA)
26

1.0

PSP

0.8
0.6
0.4
0.2
0.0

0

20

40

Time

60

80

100

Figure 3.5: Learned synapse impulse response
circuit to improve performance. [30] gets 99.35% accuracy with a very large network, the
structure is 128C3-256C3-AP2-512C3-AP2-1024C3-512C3-1024FC-512FC-Voting. There is
also additional voting circuit at output layer. We use a significantly smaller network to
achieve the same accuracy, and no additional voting layer or WTA circuits are required.
DVS128 Gesture Dataset contains 10 hand gestures such as hand clapping, arm rolling
etc. collected from 29 individuals under 3 illumination conditions using DVS camera. The
network is trained to classify these hand gestures. This dataset contains rich temporal
information. For this task, we utilize a network with 64C7-32C3-32C3-256-10 structure.
The advantage of our work is clearly seen in the third column of Table 3.1. We achieved
96.09 % accuracy, which is state-of-the-art in the spiking domain, while other works, such
as [54], requires additional filters for data preprocessing and WTA circuit at the output
layer. Our model and learning algorithm doesn’t need specialized neuron circuits or any
data preprocessing techniques as the spike streams are directly fed into the network.
We also studied the effect of training the synapse response kernels. The learned synapse
kernels are shown in Figure 3.5. The solid red line represents the original kernel. The decay
speed of synapse response of the learned kernel diverges from original kernel. Slower decay
speed indicates the synapses are capable of remembering information for a longer time. Such
behavior is similar to the gates in an LSTM. The accuracy with and without training synapse
filter kernel are shown in Figure 3.6. No improvements are observed for MNIST dataset, the
27

99.0
98.5

With traning filter
without training filter

98
97.5

100
90
80
70
60
50
40
30

Accuracy

Accuracy

99.5

1

21

41

61

81

Epoch
(a) NMNIST accuracy

With traning filter
Without traning filter
1

21

41

61

81 101 121 141

Epoch
(b) DVS128 accuracy

Figure 3.6: Training performance comparison.
Method
[18]
[5]
[30]
[35]
[49]
[34]
[46]
[55]
[56]
[54]
This work

MNIST
99.42
99.42
99.31
98.60
97.20
99.36
98.77
99.46

NMNIST
98.78
98.84
99.35
98.66
99.2
99.39

IBM-DVS128
93.64
94.18
92.7
91.77
96.09

Table 3.1: Results on vision datasets.
accuracy with training and without trained kernel are 99.46% and 99.43% respectively. This
is because MNIST is a static dataset, hence no temporal information. There is slight improvement in NMNIST by training synapse filter kernel, the accuracy increases from 99.24%
to 99.39%. In DVS 128 dataset, the advantage of training the synapse filter kernel is clearly
seen, the model not only converges faster, the accuracy also increases from 94.14% to 96.09%.

28

Method
[25]
[42]
[57]
[38]
[58]
Vanila LSTM
This work

Architecture
SNN
SNN-SVM
MFCC-HMM
SNN-STDP
LSTM-CNN
LSTM
SNN

TIDIGITS
97.6
94.9
99.7
97.9
99.13

Sign language
97.5
97.00
96.7
98.21

Table 3.2: Results on temporal datasets.

3.6.3

Time Series Classification

Our work also shows advantages in time series classification. We evaluated our work in
TIDIGITS and Australian Sign Language [44] dataset. TIDIGITS is a speech dataset that
consists of more than 25,000 digit sequences spoken by 326 individuals. For training and
testing, we extracted MFCC from each sample, resulting 20 time series of length 90. The
Australian sign language dataset [44] is a multivariate time series dataset, collected from 22
data glove sensors that track acceleration and hand movements such as roll, pitch etc. Each
recorded hand sign is a sequence of sensor readings. The average duration of a hand sign
is 45 samples. The dataset has 95 classes of hand signs, To convert time series into spike
trains, we use current-based LIF neuron as encoder. It accumulates input data as current
and converts time varying continuous values to time varying spike patterns.
Networks to classify TIDIGITS and Australian Sign Language have a structure 300300-11 and 300-300-95 respectively. We trained two-layer stacked LSTM of unit size 300 as
baseline. Results are shown in Table 3.2. The best accuracy in TIDIGITS is achieved by [57],
however, it is a non-spiking approach. In Australian Sign Language dataset, we outperformed
vanilla LSTM and DNN based approaches. [38] uses EMSTDP to train an SNN to classify 50
classes of the hand signs, the network size is 990-150-150-50. It buffers the entire sequence and
flattened the time series into a vector. While our work is trained to classify all 95 classes, and
it processes the time series in a more efficient and natural way, the input data is converted into
spikes on the fly. Since flattening is no longer necessary, the input dimension is also reduced.
29

3.7

Conclusion

In this work, we proposed a general model to formulate SNN as network of IIR filters with
neuron non-linearity. The model is independent of neuron types and capable to model
complex neuron and synapse dynamics. Based on this model, we derived a learning rule to
efficiently train synapse weights and synapse filter impulse response kernel. The proposed
model and method are evaluated on various tasks, including associative memory, MNIST,
NMNIST, DVS 128 gesture, TIDIGITS etc. and achieved state-of-the-art accuracy.

30

Chapter 4
Multivariate Time Series
Classification Using Spiking Neural
Networks
4.1

Introduction

The function and behavior of SNN are derived from its inspiration, i.e. the brain, which
is capable of performing numerous cognitive tasks with minimal energy requirements. The
potential of SNN has drawn various research interests including emerging device, algorithm
and applications [11, 59–61]. In general, the brain’s capability comes from the complex
dynamics of the networks of spiking neurons and the plastic synapses connecting them. These
dynamics can capture complex spatial temporal features of input encoded as sparse temporal
spiking activity. Despite the biological inspiration, majority of the existing models of SNNs
are unable to replicate such dynamics to encode, learn and decode temporal information.
The limitations of existing SNNs are multifold. Firstly, most SNN models and training
algorithms consider only the statistics of spike activities. A numerical value is represented by
spike counts in a time window [62]. Though, this type of SNN has demonstrated state-of-art

31

performance in various tasks [7], it suffers from high spike activities [63]. Thus, it cannot fully
benefit from event-driven computation. Secondly, directly adapting backpropagation is not
feasible because spiking neuron’s output is a Dirac delta function. One approach to address
the problem is to train an ANN such as MLP and map the weights to SNN. However,
it suffers form accuracy degradation, additional fine-tuning of weights and thresholds is
required to minimize the performance penalty [31]. Recently gradient surrogate is proposed
to approximate the gradient of the spiking function, enabling backpropagation [18, 19, 35,
38, 45, 46]. [45] derived a cumulative error function as gradient surrogate. [18] derived a
simplified model from LIF neuron, and proposed four functions as gradient surrogates. Other
approaches include replacing hard threshold function by a differentiable soft spike [47] [64].
However, it compromises SNN’s most distinct feature, binary spike.
While most SNN models and training algorithm use spike counts to resemble numerical
value, it is observed that in biological neural networks, temporal structure of spike train also
conveys information [41]. Two spike trains of same spike rates can have distinct temporal
patterns, hence the represented information is different. Such temporal encoding can efficiently encode information using extremely sparse spikes-events [63]. There are some existing
works to train neurons to detect temporal spike patterns. Tempotron [14] trains individual
neuron to perform binary classifications for different spatial temporal input spike patterns.
Neuron generate at least one spike for positive pattern, and remain inactive for other patterns. Based on Tempotron, [15] proposed an algorithm to adjust synaptic weights such that
neuron can generate desired number of spikes given a specific input pattern. SPAN [27] trains
an individual neuron to associate a spatial temporal input pattern with a specific output
spike pattern. However, these works aim at training individual neurons, cannot be extended
to multiple layers, therefore the performance is limited. There are also recent works utilize
Backpropagation Thtough Time (BPTT) to address the temporal dependency problems. [46]
proposed a training rule to reassign errors in time. [49] proposed a novel loss function and
derived an iterative model from Tempotron. Based on the iterative model, network can

32

be unrolled hence BPTT is possible. [48] captures the temporal dependency on membrane
potentials and use membrane potential as objective function to learn temporal patterns.
Existing works have achieved comparable performance with DNN in vision tasks such
as static image or event-based data classification, however few SNN models address the
time series classification tasks. The first challenge is the limitation of rate coding, since it
treats spikes statistically, spike coding cannot represent temporal information. Though it
is possible to flatten the time series into a 1-D array and then represent it by rate coding,
this increases the input size. In addition, for real time applications, flattening input requires
buffering, which increases computation latency. Secondly, unlike images, such as MNIST
images, where all value’s range, precision and scale are identical, multivariate time series
may be collected from different sensors, therefore their precision and range may vary. Rate
coding has to guarantee the precision for the most high-resolution input. Such ”design for
worst case” coding scheme lacks flexibility and hence is not efficient. New coding methods
to represent time series are required to exploit the potential of SNNs.
In this work, our contributions are summarized as follows:
• We present a coding method that can convert time series into sparse spatial temporal
spike patterns.
• We derive an iterative SNN model from Spike Response Model, such that the BPTT
is possible.
• We formulate a backpropagation rule for the iterative SNN model and propose a training algorithm to train the model on spatial temporal patterns.
• We evaluate the proposed method on multiple datasets, and achieve comparable accuracy with DNN. To the best of our knowledge, this is the first work applying SNN for
multivariate time series classification.

33

Synapses

Membrane potential

PSP
Spikes

w1

PSP

w2

Spikes

Output spikes

...

∑

wN

LIF Neuron

PSP
Spikes

-Vth
Reset voltage

Figure 4.1: Spiking neuron model.

4.2

SNN Model

Without loss of generality, we adopt the a widely used LIF neuron defined by Spike Response Model [14, 23]. Each input spike induces a charge in the neuron’s membrane potential, which is called a Postsynaptic Potential (PSP):

P SP (t) =

ti <t
X

K(t − ti )w

(4.1)

ti

where ti denotes the arrival time of ith input spike. K(t) is synapse kernel. Neuron accumulate all input PSPs, such that the membrane potential v(t) is defined as:

v(t) =

N
X

wi P SPi (t) − Vth

i

X

e−

j
t−ts
τ

(4.2)

tjs <t

where N is the number of input synapse. tij is the arrival time of jth spike at ith input
synapse. τ is a time constant of neuron. wi is the weight associated with each input

34

PSP Kernel

0

T

Time
Figure 4.2: PSP kernel.

synapse. tjs < t is the time when the neuron generates an output spike and the rightmost
term can be interpreted as a negative voltage applied to the neuron itself such that the
membrane potential is decreased by a factor of the threshold voltage Vth . This serves as
the reset mechanism at the time of spike. Thus, the neuron’s potential is the summation of
all weighted input PSP plus the negative voltage given by rightmost term [23]. The neuron
model with N inputs is illustrated in figure 4.1. PSP kernel K(t) is defined as:
t

−t

K(t) = V0 (e− τm − e τs )

where τm and τs are two time constants.

(4.3)
η

V0 = η η−1 is a normalization factor

which scales the maximum value of K(t) to 1, and η =

τm
τs

[15].

The shape

of the PSP kernel is shown in figure 4.2.
At time t, PSP and membrane potential are determined by all previous inputs. However,
it is not feasible to directly implement the SNN model defined by Equation 4.2. At any
given time t, v(t) has to be computed by recursively convolving input spike trains with K(t),
thus, incurring significant computation overhead. To address this issue, an incremental way
to update the PSP can be derived from the SRM model in discrete time domain.
35

More formally, input spike train S[t] can be defined as a sequence of time shifted Dirac
Delta functions:

S[t] =

t
X

x[n]δ[t − n]

(4.4)

n

where xi [t] = 1 denotes a spike received at time t, otherwise xi [t] = 0.

Simi-

larly, output spike train O[t] can defined as:

O[t] =

t
X

y[n]δ[t − n]

(4.5)

n

where y[t] satisfies (v[t] < Vth → y[t] = 0) ∩ (v[t] > Vth → y[t] > 0). To derive the
P
P
n
n
incremental model, We define M [t] = n e− τm S[t − n], H[t] = n e− τs S[t − n], such that
the P SP can be expressed as the combination of M [t] and H[t] [65]:

P SP [t] = V0 (M [t] − H[t])

(4.6)

M [t] and H[t] can be computed incrementally:
−1

M (H)[t] = e τm (s) M (H)[t − 1] + S[t]

(4.7)

Similarly, we can compute reset voltage R[t]:
−1

R[t] = e τs R[t − 1] + O[t − 1]

36

(4.8)

Such that the SNN defined in Equation 4.2 can be equivalently expressed as:

Vil [t] = Iil [t] − Vth Ril [t]

(4.9a)

Ml−1

Iil [t]

= V0

X

l
wi,j
(Mil [t] − Hil [t])

(4.9b)

j

Mjl [t] = αNjl [t − 1] + Ojl−1 [t]

(4.9c)

Hjl [t] = βHjl [t − 1] + Ojl−1 [t]

(4.9d)

Ril [t] = γRil [t − 1] + Oil [t − 1]

(4.9e)

OiL [t] = U (Vil [t] − Vth )

(4.9f)

where indexes l, i, j denote layer index, neuron index and input index respectively. Nl
denotes the number of neurons in lth layer. Iil [t] is input current, R[t] is reset voltage, and
−1

−1

Oil [t] is neuron output. α = e τm , β = e τs , γ = e

−1
τ

are three decay factors. More specifically,

l = 0 denotes the encoding layer, which will be discussed in section 4.3, L is the number of
layers in the network and l = L denotes output layer. U (x) is a Heaviside step function:

U (x) = 0, if x < 0, otherwise 1

(4.10)

In the above model, the temporal dependency can be clearly seen in Equation 4.9a - 4.9f.
At each time t, the PSP, membrane potential and output can be computed based on time
t − 1, hence by unfolding the network, BPTT can be used to train the network. Note
that the gradient of U (x) is a Dirac Delta function, therefore backpropagation cannot be
directly applied. Its approximation will be discussed in 4.4.

4.3

Spatial Temporal Population Encoding

Rate coding represents a numerical value by the activity of an individual neuron that fires
at a particular rate. For example, in vision tasks, to encode one pixel, a spike train’s
37

spike count C in a given time window T is proportional to the pixel value. There are
several drawbacks of rate coding. 1) the precision is limited because the value represented
by rate coding is quantized by bin size 1/T . Though higher precision can be obtained
by increasing T , the computational latency increases as well; 2) it is unable to represent
temporal information as it treats spike activity statistically. Time series have to be flattened
and then converted to spike trains. In real time scenarios, it requires data stream to be
buffered, which causes additional latency; 3) Individual neuron is too noisy due to stochastic
nature, thus it introduces additional noise; 4) it causes high spiking activity, as larger value
has to be represented by more spikes, which deprives the SNN energy efficiency. 5) It is
incapable of representing negative values, which are common in sensor inputs.
To address the above issues, we employ a coding method suitable for encoding time series
by combining population coding and temporal coding [39]. In population coding the information is represented by the activity of a group of neurons. Inside a population, each neuron has
its favorable input, i.e. each neuron responds to a particular input and remains relatively inactive for other inputs. In temporal coding, the spike train patterns also convey information.
We utilize a population of Current-based Integrate and Fire(CUBA) neurons as encoder.
A CUBA neuron is defined as a hybrid system [20]:
1
dV
= − V + g · Iext (t)
dt
τ
V ←0

(4.11)

if V > Vth

where Iext (t) is the external time-varying input current, τ is the membrane time
constant, which determines the decay speed of membrane potential, V (t) is the neuron membrane potential, g is the gain.

Neuron accumulates the input current and

updates the membrane potential continuously.

When the V (t) exceeds Vth , a reset

is triggered, membrane potential is forced to 0.
In practice, CUBA neuron model is simulated in discrete time, V (t) is evaluated on a
time grid and the interval of the grid is dt [66]. Iext (t) is also sampled at each time grid.
38

Time series

Encoder

Spikes

...

...

...

Figure 4.3: Coding method.
Such that the discrete version of the model represented by Equation 4.11 is:
dt

V [t + 1] = e− τ V [t] + g · Iext [t]
(4.12)
V [t + 1] = 0

if V [t] > Vth

With this coding scheme, a univariate time series, for example sensor data is treated as
input current and connected to a population of E encoder neurons. Each neuron may have
different time constant τ and gain g, so that each encoder responds to the input differently.
In addition, by setting g to negative, neuron can also respond to negative input, which overcomes the drawback of rate coding. We utilize Neural Engineering Framework (NEF) [21] to
pre-train the encoder. Using this approach, time varying signal is converted into time varying
spike patterns. For multivariate time series of C channels, each channel can be encoded using

39

the above approach. Unlike vision tasks in which all input dimensions have identical resolution and range, multivariate time series maybe collected from different sensors, therefore the
resolution, precision and range may vary. With this coding method, the population size and
tuning curve can be adjusted to provide just enough precision for all the channels [39, 67].

4.4

Training Algorithm

For the classification task, the neuron in the output layer that fires most frequently represents the result, we use cross-entropy loss defined as:

E=−

NL
X

yi log(pi )

(4.13)

i

where pi is the probability of each class calculated by softmax, pi given by:
P
exp ( Tt OiL [t])
pi = PNL
PT L
t Oj [t])
j=1 exp (

(4.14)

where yi is the label, L is number of layers, OiL [t] denotes output of last layer,
and NL is the neuron number of last layer.
Equation 4.9a - 4.9f provide an explicit way to update SNN states and outputs.
δil [t] =

By unfolding the network, BPTT can be used for training.
∂L
,
∂Oil [t]

li [t] =

∂U (Vil [t]−Vth )
∂Vil [t]

and κli [t] =

First, we define

∂Oil [t+1])
.
∂Oil [t]

κli [t] is given by:
κli [t] = −Vth γli [t]

40

(4.15)

At last layer L, δiL [t] can be computed as:
P
∂( Tk=1 OiL [t])
∂E
∂E
=
= PT
∂OiL [k]
∂OiL [t]
∂( k=1 OiL [t])
T
X
∂OiL [k]
= (pi − yi )(
)
L
∂O
[t]
i
k=t

δiL [t]

(4.16)

For hidden layer l < L, δil [t] can be computed recursively from output layer
L and time T to input layer and time 0:
Nl+1
l,i

δ [t] =

X
j

∂Ojl+1 [t]
∂E
∂E ∂Oil [t + 1]
+
∂Oil [t] ∂Oil [t]
∂Ojl+1 [t + 1] ∂Oil [t]

= −Vth δil [t + 1]li [t + 1]
Nl+1

+

X

wji δil+1 [t + 1]l+1
i [t + 1](α − β)

(4.17)

j

Heaviside step function U (x) is non-differentiable. We employ gradient surrogate [47] to
address this issue. In forward path, the spike generation mechanism remains unchanged,
while in the backward path, the derivative of U (x) is replaced by the derivative of a smooth
function. We use a sigmoid function proposed by [18] as the gradient surrogate in the
backward path, such that the gradient of U (x) is approximated as:
∂U (v)
eVth −v
≈
∂V
(1 + eVth −v )2

(4.18)

Based on above equations, the gradient of weight can be computed as:
T

X ∂E ∂O l ∂V l [t] ∂I l [t]
∂E
=
∂wl
∂O l [t] ∂V l [t] ∂I l [t] ∂wl
t=1
=

T
X

V0 · δ l [t]l [t](M l [t] − N l [t])

t=1

41

(4.19)

Algorithm 1: Training process of one iteration
Input: Time-varying input Iext [t]
Output: Optimized weights W l
// Forward
for t = 1 to T do
// Encoding
−1
V 0 ← e τ · V 0 + Iext [t]
if V 0 > Vth then
V 0 ← 0 // Reset
O 0 ← 1 // Generate spike
else
O0 ← 0
for l = 1 to L do
// Update states Eq.4.9a -4.9e
(M l , H l , Rl , V l , I l ) ← Update(M l , H l , Rl , O l−1 , O l )
O l ← SpikeFunction(V l ) // Eq.4.9f
// Calculate loss
E = Loss(O L [1], ..., O L [T ]) // Eq.4.13-4.14
// Backward
for t = T to 1 do
(δ L [t − 1], κL [t − 1]) ← BackProp(E, δ L [t], κL [t])
for l = L to 1 do
(δ l−1 [t − 1], κl−1 [t − 1]) ← BackProp(δ l [t], κl [t]) // Eq.4.15,4.17

4.5

Experiments

The proposed network model and algorithm are implemented in PyTorch. We demonstrate
the effectiveness of our work in two experiments. In the first experiment, we compared
the coding efficiency of rate coding and temporal population coding in terms of spike rate
and input size. In second experiments, the proposed network and algorithm is evaluated
on various multivariate time series classification tasks.

4.5.1

Coding Efficiency

First, we study the efficiency of the proposed coding method by utilizing the Articulary Word
Recognition dataset collected by UEA & UCR Time Series repository [68] as an example.
This dataset consists of multivariate sensor data recorded by Electromagnetic Articulograph

42

Value

Time
Figure 4.4: Articulary Word Recognition dataset sample.
(EMA), which is a device to track the motion of speakers’ tongue and lips. Each sample
contains 9 variates of length 144. An example of this dataset is shown in figure 4.4, each line
represents a time varying input. We use both rate coding and temporal population coding to
convert the time series to spikes. The spike patterns are shown in figure 4.5. Each dot in figure 4.5a represents a spike, and in figure 4.5b a spike is represented by a vertical line. We use
an input window of 300. Due to the incapability of rate coding to represent temporal information, the time series have to be flattened, resulting in 1296 spike trains. For clarity, only the
first 100 spike trains are shown in figure. In temporal coding, for each variate, we use 5 neurons to encode. It is clearly seen that the temporal population coding is sparser. In addition,
it is encoding the input with 45 spike trains, which is significantly smaller than the number
of spike trains obtained by rate coding. This is beneficial to reduce the SNN model size.
We tested our coding method with rate coding on four multivariate datasets, details
including average spike count, spike rate, input size are shown in table 4.1. Temporal in
table 4.1 refers to temporal population coding. The spike rates are significantly lower than
rate-based coding. As can be seen in last column, the input size of our coding method is
also significant less. Particularly for long time series, such as Atrial Fibrillation dataset.

43

It consists of two variates, and the length is 640, therefore flattening operation resulting
a large number of inputs. Buffering such long time series also causes significant latency
in real time applications. While temporal population coding can convert input on the fly,
not only input size is reduced, buffering is also no longer required.
Table 4.1: Coding efficiency of rate coding and temporal coding.

4.5.2

Dataset

Coding
Method

Spike
Count

Spike
Rate

Input
Size

Articulary Word
Recognition
Basic
Motions
Finger
Movements
Atrial
Fibrillation

Rate
Temporal
Rate
Temporal
Rate
Temporal
Rate
Temporal

65653.8
518.7
15061.7
122.4
1069.0
471.3
23041.4
164.5

16.90
3.84
8.36
1.13
25.45
1.12
10.77
4.76

1296
45
600
30
1400
140
1280
10

Time Series Classification
Table 4.2: Accuracy.
Dataset
Articulary Word
Recognition
FaceDetection
BasicMotions
Heartbeat
Spoken Arabic
Digits
JapaneseVowels
RacketSports

ED
[68]

DTW
[68]

TapNet
[69]

WEASEL
[70]

This
work

0.97

0.98

0.987

-

0.98

0.519
0.675
0.62

0.513
1
0.659

0.556
1
0.751

-

0.57
1
0.72

0.967

0.96

0.983

0.992

0.98

0.924
0.868

0.959
0.842

0.965
0.868

0.976
0.934

0.97
0.87

To demonstrate the temporal pattern learning capability of our approach, we apply the algorithm to a proof-of-concept application, time series classification.
optimizer is used and the learning rate is 0.0001.
eral existing approaches.

We compare our model with sev-

The results are shown in Table 4.2.
44

Adam

ED and DTW refer

to 1-Nearest Neighbor with Euclidean Distance and Dynamic Time Warping respectively [68].

TapNet is a DNN-based approach for time series classification [69].

No

accuracy in SNN domain is listed, to our best knowledge, there is no previous work that
comprehensively focusing on time series classification with SNN.
In all the 7 datasets, our method outperforms the 1-Nearest Neighbor classifier, which
is a standard classifier for time series classification. Notice that this is a proof-of-concept
work to show SNN is capable to classify time series, we do not expect results can outperform
DNNs, but our SNN model still achieves comparable performance.

4.6

Conclusion

In this work, we presented an iterative SNN model and training algorithm for spatial
temporal spike pattern classification. A coding method to convert continuous time series to discrete spikes is also proposed. Our coding method is able to represent information by sparse spike patterns, such that the computation overhead can be significantly reduced. We evaluate our algorithm and coding method on various multivariate time series
dataset, and outperform the standard 1-Nearest Neighbor classifiers and also show competitive performance with DNN based approaches.

45

Spike train index

(a) Rate coding

Time
(b) Temporal population coding.

Figure 4.5: Coding comparison

46

Chapter 5
Systematic Optimization for Spiking
Neural Network in FPGAs
5.1

Introduction

Functioning and energy efficiency of the brain have been the inspiration for computational
models neural networks. In SNN, information is transmitted and processed as discrete
spike events rather than continuous values, which helps reduce hardware design complexity.
SNN is inherently a dynamic and stateful system, which processes temporal information
in a nature way. However, these advantages are not fully exploited by existing neuromorphic hardware due to the lack of systematic design optimization. To maximize the potential of SNN, various aspects including neural coding of the input signals, computational
model, hardware architecture should be jointly optimized.
Neural coding refers to how information is represented by spikes. There are various
forms of neural coding which can be divided into two types: rate coding and temporal coding. Rate coding treats spikes statistically. The number of spikes in an encoding window
is proportional to the numerical value to be encoded. It has significant drawbacks. The
information is fully represented only at the end of the encoding window, thus introducing

47

latency inherently. As spike count in an encoding window only represents one numerical
value, it is inefficient to encode time-varying input, although neuromorphic systems are believed to be promising in real-time applications [71], where inputs are continuous streams
of time sequences. Furthermore, the precision of rate coding is limited by the length of the
encoding window. Such limitations have been extensively discussed in [17, 40]. Biological
studies have revealed that the precise timing of spikes also conveys information [40, 43], enabling reliably encoding with low latency. Intuitively, a spiking neuron accumulates time
varying input, the inter-spike interval (ISI) changes over time, hence the information can
be represented by ISIs. There are works utilize this approach [39, 72, 73] but lack theoretical proof that neuron can faithfully encode input by ISI.
There are various works that utilize the information represented by spike timings. Tempotron [14, 15] trains individual neuron to classify temporal spike patterns. Recent advancements include [18, 46], which employ backpropagation through time to train SNN to classify
temporal patterns. All these models use complex neuron models such as Spike Response
Model whose neuron dynamics enable spiking neuron to learn temporal information [14].
Though these works outperform prior rate-based models and also enable novel SNN applications, their models are computationally intensive and also expensive for hardware implementation. Thus, neuromorphic hardware mainly employs simplified IF models as used by [45].
These simplified SNN models ignore temporal neuron dynamics and reduce the complexity
at the cost of degraded model capacity. The trade-offs between SNN’s complexity and its
model capacity poses challenges in designing high performance neuromorphic hardware.
Based on the above discussion, we identify three factors that are critical to the application of neuromorphic hardware. 1) Efficient and accurate neural coding of the input signals.
In the real world, most input values are continuous, while in SNN domain, information
is conveyed using discrete spikes. There is a gap between these two representations. As
the bridge between continuous and spike domains, quality of neural coding is critical for
downstream processing; 2) Flexible and powerful SNN model. Unlike DNNs, which are gen-

48

erally formulated as matrix multiplications, SNNs model and algorithms are highly flexible,
the choice of the specific model determines computational power and hardware complexity.
A unified SNN model supported by a general hardware architecture is highly desirable. 3)
Versatile and high-performance hardware architecture. As the underlying computing engine,
the hardware should support a flexible computational model without impairing performance.
As these three factors are tightly coupled, we propose to systematically optimize for these
design aspects. The contributions of this work are summarized as follows:
• We propose a temporal population neural coding and an algorithm to optimize neural
encoder parameters.
• We proposed a flexible model to implement SNNs as a network of Infinite Impulse
Response filters. It reduces model complexity and retains critical SNN dynamics.
• We developed a novel hardware and an end-to-end framework to optimize and deploy
software model to FPGAs. The implementation achieves up to 38.72× speedup and
33.6× energy efficiency improvement compared with state-of-the-art GPU and neuromorphic ASICs.

5.2
5.2.1

Neural Coding Optimization
Time Coding by LIF Neuron

In this section, we proposed a neural coding scheme that can encode time-varying continuous
input into a sparse spike train using Leaky Integrate and Fire (LIF) neurons. We also present
a gradient descent based approach to optimize neuron encoder parameters.
Intuitively, a spiking neuron accumulates time varying input, the inter-spike interval
(ISI) changes over time, hence the information can be represented by ISIs. There are works
utilize this approach [39, 72, 73]. More formally, temporal coding by spiking neuron can
be interpreted as mapping a continuous input x(t) a sequence of time shifted Dirac delta
49

Figure 5.1: Time encoding by LIF neuron.
functions S(t) =

P

i

δ(t − ti ), i ∈ Z, where ti is the time of ith spike. Such mapping is

similar as Time Encoding proposed by [74], which transforms amplitude information to
a sequence of strictly increasing time sequence (ti ).
Figure 5.1 gives an example of the time encoding by LIF neuron. The blue curve is
an arbitrary input, the vertical dash line indicates the spike time. Neuron fires more frequently as the amplitude increases. Amplitude information is represented by ISI. Essentially
this is a nonuniform sampling technique. Proved by [75], nonuniform sampled signals that
are bandlimited to [−Ω, Ω] can be reliably recovered from sequence of shifted Dirac delta
functions filtered by ideal low pass filter with appropriate weights:

x(t) = h(t) ∗

X
i

wi δ(t − ti ) =

X

wi h(t − ti )

(5.1)

i

Where h(t) = sin (Ωt)/(πt)) is the ideal low pass filter, in which Ω is the bandwidth of the input signal x(t), ∗ denotes convolution.
the weighted low-pass kernel method proposed by [75].

50

Weights can be computed by

5.2.2

Optimizing Population-temporal Neural Encoding by Gradient Descent

How to fine-tune the LIF neuron to find the most efficient coding, and how to process
the encoded information are two questions that we are going to answer in the next. We
start from the neuron model in Section 5.2.1, propose a coding method and optimization
techniques addressing challenges in realistic applications. Usually neuromorphic applications use one spike train to encode one input, we refer it as 1-to-1 coding, e.g. a 28 × 28
image results to 784 independent spike trains. Another common practice is to use identical encoder (i.e. neuron) to map each input to a spike train. Such homogeneous 1-to-1
mapping has a few drawbacks: 1) Spikes get sparser and the ISI becomes longer when
it comes to represent the weak stimulus. This delays the pattern detection and system
response. 2) Negative stimulus has to be mapped to positive range by a bias, resulting
more spikes, even leading to neuron over-activation. 3) Homogeneous mapping is not effective for heterogeneous stimulus, such as sensor data streams or multivariate time series.
Heterogeneous stimulus has different physical meaning, resolution, and range in each dimension, while neuron can only effectively encode input in a specific range [67]. Stimulus
that are out of the optimal range leads to information loss.
A simple solution is to use two complementary neurons, one is active as the stimulus increases (On-neuron), another one is active when the stimulus decreases (Off-neuron).
Generalization of this idea leads to population coding, in which inputs are represented by the
activities of a heterogeneous neuron population jointly. While rate coding or time encoding
assumes one-to-one mapping between the spike train and the stimulus being represented, and
identical mapping functions across the stimulus-spike train pairs, in population coding, each
neuron integrates multiple input stimuli and responds to different input diversely, each neuron has its preferred input, such that multiple stimuli are encoded into multiple spike trains
jointly. The population encoding has solid biological foundation. And neuron heterogeneity
improves accuracy and encoding capacity [21, 76, 77]. To model such population encoding
51

and neuron heterogeneity, gain g and encoding vector e are introduced to equation 5.2:

τ

dvi (t)
= −vi (t) + gi (ei · x(t)) + bi
dt

(5.2)

Where i is the index of the heterogeneous neuron, x(t) is the vector of N inputs, ·
indicates dot product, ei determines preferred input, gi is gain, and bi controls the threshold
of stimulus that can drive neuron to fire. A critical question is how to set gain g, bias b and
encoding vector e such that the population can produce good representations. [21] suggests
that uniformly distributed parameters are generally good. We proposed to use gradient
descent to optimize these parameters. The discrete version of the neuron model is:

vi [t] = e

−1
τ

vi [t − 1] + (1 − e

−1
τ

)(gi (ei · x[t]) + bi )

(5.3a)

vi [t] ← 0 if vi [t − 1] > ϑ

(5.3b)

oi [t] = u(vi [t] − ϑ)

(5.3c)

where oi [t] is the output at time t, u(x) is Heaviside step function. Above model can be
mapped to the model proposed in Section 3.3 Equation 5.4a to 5.4f. Therefore the training
rule in Section 3.4 can be applied to optimize gain g and encoding vector e. 1-to-1 coding
is a special case of population coding; therefore, it can also be be optimized by the above
rule. In section 5.5.1, we will study the impact of different coding schemes on accuracy,
and provide general guideline to select appropriate coding scheme.

5.3

Hardware-friendly and Unified Representation of
SNN

In Section 3.3, we derived Equation 3.6a to 3.6f to model SNN. These equations are listed below for convenience:
52

Vil [t] = λVil [t − 1] + Iil [t] − Vth Ril [t]

(5.4a)

Nl−1

Iil [t]

=

X

l
wi,j
Fjl [t]

(5.4b)

j

Ril [t] = θRil [t − 1] + Oil [t − 1]
Fjl [t]

=

P
X

l
αj,p
Fjl [t

− p] +

Q
X

l
βj,q
Ojl−1 [t − q]

(5.4c)
(5.4d)

q=0

p=1

Oil [t] = U (Vil [t] − Vth )

(5.4e)

U (x) = 0, x < 0 otherwise 1

(5.4f)

Notations are explained as follows:
• l, i, j, t: layer index, neuron index, input index, time step.
• Nl : number of neurons in lth layer.
• Vil [t]: neuron potential, a first order low pass filter.
• λ: decay factor/coefficient of neuron filter.
• Fjl [t]: synapse filter.
• P , Q: feedback and feed forward order of synapse filter.
l
l
• αj,p
, βj,q
: coefficients of synapse filter.

• Iil [t]: summation of weighted PSP.
• Ril [t]: reset filter.
• θ: decay factor/coefficient of reset filter.
• OiL [t]: output spike function.
53

• U (x): Heaviside step function.
Equation 5.4a to 5.4f provide an explicit way to update the state of SNN based on
difference equations, hence it is convenient to implement in FPGAs. It guarantees flexibility
with reduced hardware design complexity. As discussed in Section 3.3, various PSP kernels
and neuron models can implemented by setting different coefficients. In principle, by using
signal processing tools such as Z-transform or Laplace transform, a LTI SNN model defined
by ODEs can be mapped to this form [20, 78]. Setting all coefficients in equation 5.4d to 0,
it reduces to a simple LIF neuron widely adopted by neuromorphic hardware and rate-based
SNN training algorithms [30, 31]. The encoder in section 5.2.2 can be mapped to this form
by setting λ = e

−1
τ

, and removing synapse filters. b(1 − e

−1
τ

) and g(1 − e

−1
τ

)e in equation

5.3a can be implemented as an additional constant input and weights respectively.

5.4

Hardware Design

Equations 5.4a to 5.4f provide general formula of SNNs as a network of IIR filters with neuron non-linearity.
primitives:

They can be decomposed into 3 basic computing

1) IIR filter (equation 5.4a, 5.4c 5.4d); 2) Dot-product (equation 5.4b),

and 3) Thresholding (equation 5.4e, 5.4f).
Based on the general SNN model and the fact that complex SNN model can be realized using basic building blocks, we proposed a universal SNN hardware architecture and a modular
approach to construct complex SNNs. This enables notable advantages: 1) Rather than develop model-specific FPGA implementation, various SNN models can be implemented using
the same architecture by setting appropriate filter coefficients. 2) Synapse dynamic, which
has been proved critical for temporal pattern learning tasks but often ignored by existing
SNN hardware design, is explicitly supported. This enables a set of new applications that are
not supported by prior works. 3) Since the SNN is formulated as filters, it can be efficiently
implemented in FPGAs and utilize abundant DSP resources. 4) The modular approach
54

NN Controller

…

PE

Start signal

Output

PE

Layer N

Inter-layer
Buffer

PE

Layer 2

Inter-layer
Buffer

Host

I/O Module

Layer 1

FPGA
Fabric

Figure 5.2: Overall Architecture.
constructs SNN from basic building blocks, allowing fine-grained hardware optimization.

5.4.1

Hardware Implementation

We implement the design by Vivado High Level Synthesis. The overall architecture is a
layer-wise pipeline of processing elements (PE). Each PE is parameterized and specifically
implements the function of a layer in SNN. The inter-layer buffer is an array stores only one
or zero to indicate spike. A controller schedules the computation. It sends a global start
signal to all PEs. Each PE receives spikes from the previous layer as input, updates synapse
and neuron states, and stores possible output spike in inter-layer buffer.
The detailed PE design is shown in Figure 5.3. It consists of a fine-grained pipeline.
The synapse filter bank is responsible for the computation of all synapses in a layer, it
implements equation 3.6d. It has multiple identical IIR filters to improve throughput, where
the synapse computation workload is evenly distributed among filters. Each filter is timemultiplexed by multiple synapses. The number of filters is parameterized and is determined
by the optimization procedure discussed later. The matrix multiplier implements equation
3.6b, which calculates PSP as the product of spike response and synaptic weights. In a fully
connected (FC) layer, PSP of a batch can be efficiently computed by matrix multiplication.

55

Synapse Filter Bank

Synapse States & Filter
Coefficients
(BRAM)

Neuron Filter Bank
PSP

Buffer

Hb[t]
Hb[t]
Hb[t]

Buffer

Int er-layer Buffer

Ha[t]
Ha[t]
Ha[t]

Spike
response

Matrix Multiplier/
Conv. Unit

Synaptic Weights
(BRAM)

+++

Cmp.

Ha[t]
Ha[t]
Ha[t]
Hb[t]
Hb[t]
Hb[t]

Reset
voltage

Spike
Generation

Reset
Filter

Figure 5.3: PE design.
The multiplier can also be parameterized to perform multiple dot product concurrently.
Depending on the layer type, matrix multiplier may be replaced by a convolution unit. we
use open source implementation from [79] to perform convolution.
Membrane potential (equation 5.4a) is calculated by the adder and neuron filter
bank.

Similar to synapse filter bank, neuron filter bank is also parameterized and

has multiple identical filters to improve throughput. A filter bank consists of multiple
simple first order low pass filters to implement the reset, i.e.

equation 5.4c.

The

spike generation mechanism (equation 5.4e and 5.4f) is achieved by a comparator and
demultiplexer.

The demultiplexer forward spikes to a specific address of inter-pipeline

buffer to indicate each neuron’s activation at current time step. In addition, the spike
also feeds into reset filter bank to compute the reset voltage.
We use a modular approach to construct a filter bank. Two basic building blocks are
implemented as shown in Figure 5.4. The first is a first-order low pass filter. It is mainly
used for neuron, reset and the first order low pass synapse. The coefficients such as λ, θ
are stored in BRAM. The delay unit stores the previous states. The address from controller
selects the corresponding neuron/synapse. The second is a direct form II biquadratic filter, where shift register arrays are used as the delay unit. The filter coefficients are also

56

S[t]
S[t]

×

+
×

Coefficients

BRAM

F[t]

+
+

Addr.

(a) First order low pass filter

+
+

F[t]

Shift register

Z-1

Addr.

×
×

Z-1

×

×

Z-1

×

BRAM

Coefficients

(b) Biquadratic filter

Figure 5.4: Filter bank.
stored in BRAM. It requires only two delay units, while naive implementation of second
order IIR filter requires four delay units. Second order synapse such as Dual-exponential
synapse and alpha synapse can be implemented by this structure. More complex synapse filters can be constructed by cascading biquad filters and first order low pass filters, which
helps to simplify design and improve stability [80].

5.4.2

Performance Analysis and Optimization

At top level coarse-grained pipeline, throughput is directly constrained by the slowest layer,
the performance is defined by the number of steps per second and is calculated as follows:

iteration/s = F req/max(T1 , T2 , · · · , TL )

(5.5)

Each iteration is defined as executing equation 3.6a to 3.6f once to update SNN states.
Ti is the cycles required by the ith layer to update the for one step, and F req is the system
clock frequency. At lower level, the resources in each module (i.e. synapse, matrix multiplication and convolution and thresholding) are shared by neurons in the same layer in a time
multiplexed way. The cycles required by those modules to process one time step is denoted as
Csyn , Cmm , Cconv and Cneu respectively. Without any parallelism, a general formula of these
delay is C = D +(N −1)·II. where D is the depth of data path. N is the number of neurons
57

or synapses. II is initiation interval of data path. D, II are layer-specific, they are obtained
from synthesis report. Time required by a layer is Tl = max(Csyn , Cmm/conv , Cneu ). The
key to improving throughput is using loop unrolling in matrix multiplication and allocate
more filters to exploit parallelism. Optimization must satisfy FPGA resource constraints
such as DSP, LUT etc.. We adopt the resource model proposed by [81], a module is divided
as a part Rbase that is irreverent from additional allocation and an incremental part Rinc .
Rbase and Rinc are obtained by profiling synthesis result. Given an optimization factor F
indicating number of filters or loop unrolling factor, resource consumption is estimated as
R = Rbase + K · Rinc . The cycles after optimization is estimated as C = D + (N/K − 1) · II.
The improvement after optimizing a particular computation is:

∆C = II · N/(K 2 + K)

(5.6)

As more resource are allocated to a module, the benefit decreases. It is desirable to
identify a particular module such that the optimization results to largest improvement
∆C.

Based on above resource and performance model, we use a greedy approach

to determine HLS directives as shown in Algorithm 2.

5.4.3

End-to-end Framework

We leverage High Level Synthesis (HLS) to create model-specific hardware implementation,
instead of developing a mesh-based multi-core architecture like neuromorphic chip Loihi
[71], TrueNorth [7] and mapping SNN to it. This relaxes the constraints on layer size,
neuron fan-in and fan-out etc.. The overall workflow is shown in 5.5. The software SNN
model and training algorithm are implemented in PyTorch. After training, synaptic weights
and filter coefficients can be straightforwardly exported, rounded to given precision and
finally used to create memory initialization file for FPGA.
A computation graph is generated from the inference model. Each node indicates a stage

58

Algorithm 2: HLS Optimization
Input: SNN computation graph G(V, E)
while true do
targetLayer = f indSlowestLayer(G)
maxGain = 0
target = N U LL
foreach stage ∈ targetLayer.stages do
gain = computeGain(stage) // Calculate by equation 5.6
if gain > maxGain then
maxGain = gain
target = stage
if target.type == conv or target.type == matrixM ult then
isSuccess = target.unrollF actor + +
else if target.type == f ilterBank then
isSuccess = target.addF ilyer()
if isSuccess == f alse then
break // No further optimization can be made
if checkResource() == f alse then
target.undoOptimization()
break
in the layer-wise pipeline. Edge direction indicates data dependency. Node’s attributes
include hyperparameter such as type, input/output size etc. of corresponding layer. The
model undergoes performance analysis discussed in section 5.4.2 to determine the HLS directives. We defined a set of templates targeting at Vivado HLS. Templates include different
filters, PE module, matrix multiplier, top level function. Data types, loop bounds and HLS
directives are parameterized. After updating the node attributes, template engine takes
computation graph as input, and generates headers and C++ source files.

5.5
5.5.1

Experiment and Evaluation
Impact of Encoding Window

In SNN, static data e.g. images have to be converted as spike trains in an encoding window
Te for processing. Larger Te provides better precision, but at the cost of longer computation
time. To evaluate the effectiveness of proposed coding method and SNN model, we studied
59

Software Model
SNN
Specification

Filter
HLS template

Mat. Mult.
HLS template

PE
HLS template

Performance
Analysis

HLS Directive
Optimization

Template Engine

Inference
Model

Computation
Graph

Synthesizable
C++ Code

Synthesis

Weights &
Coefficients

Quantization

Memory
Initialization file

Hardware
Implementation

PyTorch
Implementation

Model Traning

Figure 5.5: Framework.
the trade-off between Te and accuracy. Model and training algorithms are implemented in
PyTorch. We tested our model with MNIST dataset. It consists of 60000 hand written digits
images for training and 10000 images for evaluation. Images size is 28 × 28. Results are
shown in Figure 5.6. x-axis indicates the value of Te , y-axis represents accuracy.
We use proposed temporal SNN model to build a Multi-layer Perceptron (MLP) of structure 784-500-500-10 and a convolutional SNN (32C3-32C3-64C3-64C3-512-10). Pixel value is
treated as current to proposed encoder neuron and converted to spatial temporal spike patterns. Their accuracy with respect to Te are depicted by blue solid line and orange solid line
respectively. Since no prior works have studied this trade-off, we use an SNN converted from
an artificial neural network (ANN) (784-500-500-10). It can only process rate encoded inputs.
All neurons here are simple integrate and fire neurons. Its accuracy is shown by the gray dash
line. We also use the source code released by [31] to apply weight-threshold normalization
to the converted SNN, its accuracy is shown by the green dash line. Finally, we collected
related works’ encoding window and accuracy, which are shown as marks in Figure 5.6.

60

The original ANN achieves 98.6% accuracy, however, the accuracy of converted SNN
drops significantly (96.23% with Te = 100). As the Te decreases, the accuracy degradation
is even worse because, reducing Te actually reduces data precision. In addition, neurons do
not have time to accumulate enough input spikes to trigger the generation of output spikes
in a short window. As the network goes deeper, the activity of the upper layer neurons
will be sparser and hence more sampling may incur when rate coding is used. Weightthreshold normalization mitigates this issue by adjusting the threshold. It significantly
recovers accuracy. However, it still has a tendency of accuracy drop as the Te gets smaller.
Most existing works use Te ≥ 50, [35] achieved accuracy 98.64% with Te = 1000. [18] achieved
98.89% with Te = 30. Our approach always performs better under different Te . We achieved
98.89% with Te = 10. Two extreme cases are noteworthy, in which Te = 4 and Te = 7.
Output neuron of SNN that has L layers can fire once at most when Te = L + 1. With Te = 7
our convolutional SNN can indicate correct result by just one spike at accuracy 99.01%. The
best accuracy achieved by our MLP-SNN and convolutional SNN are 98.96% and 99.42%
respectively. This is comparable to the performance of DNN. Therefore, our coding method
and optimization are capable to reliably encode information with small encoding window,
reducing the computation time. This also helps reducing latency in real-time applications.

5.5.2

Neural Coding Evaluation
Table 5.1: Dataset statistics.
MNIST
Fashion MNIST
Articulary Word
Basic Motions
Epilepsy
Finger Movements
Japanese Vowels
Racket Sports
Spoken Arabic Digit

61

Length
144
100
206
50
29
30
90

Class Number
10
10
25
4
4
2
9
4
10

Size
784
784
9
6
3
28
12
6
13

100
98

Accuracy

96
94
This work (MLP SNN)
Conversion
Diehl et al.,2015
Wu et al., 2018
Tavanaei et al., 2019
Shrestha et al., 2019

92
90
88

4

5

6

7

8

9

10

15

This work (Conv. SNN)
Conversion & W/T Norm
Lee et al.,2016
Shrestha et al., 2018
Hunsberger et al., 2015
20

25

30

50

100 200 350 1000

Encoding Window Size
Figure 5.6: Accuracy with different encoding window.
We demonstrate advantages of our coding scheme on static image datasets MNIST, Fashion MNIST and multivariate time series dataset collected from [68]. Dataset statistics are
shown in Table 5.1. Fashion MNIST has the same image size, class number and training/testing splits as MNIST, but it is more challenging as it is generated from realistic
images of fashion products. Multivariate time series classification (MTSC) is non-trivial
in SNN domain, as they are heterogeneous continuous data streams collected from various
sources, posing challenges on spike coding. To our best knowledge, there is no prior SNN
models targeting such applications. We test four variants of proposed coding scheme. Given
input data that has N dimensions, identical 1-to-1 coding uses one dedicated neuron to encode one input dimension, and all encoder neurons are identical, we set g = 1, e = 1 and
b = 0. When input is static data such as image, this is essentially rate-coding. In optimized
1-to-1, encoder parameters are optimized by backpropagation, hence encoder neurons differ
from each other. In random population coding, input is fully connected to an encoder population. b and e are uniformly distributed in (−0.9, 0.9), g is set to 1, encoder parameters
are fixed. In optimized population coding, encoders are optimized by backpropagation.
We use the same configurations as section 5.5.1. for MNIST and Fashion MNIST. Results

62

Table 5.2: Accuracy of different coding method.

Dataset

Our Model
Identical Optimized Random Optimized
1-to-1
1-to1 population population

MNIST
0.9868
Fashion
0.892
MNIST
Articulary
0.723
Word
BasicMotions 0.702
Epilepsy
0.401
Finger
0.572
Movements
Japanese
0.184
Vowels
RacketSports 0.618
SpokenArabic
0.778
Digit

Baseline

0.9871

0.951

0.9888

EMSTDP
0.973

LSTM

TapNet

DTW

-

-

-

0.894

0.834

0.897

0.861

-

-

-

0.786

0.923

0.98

-

0.973

0.987

0.98

0.850
0.442

1
0.971

1
0.961

-

0.925
0.834

1
0.971

1
0.978

0.590

0.56

0.62

-

0.625

0.53

0.52

0.392

0.91

0.969

-

0.935

0.965

0.959

0.664

0.822

0.869

-

0.877

0.868

0.842

0.814

0.957

0.99

-

0.978

0.983

0.96

are shown in Table 5.2. EM-STDP is a training algorithm for SNN [38]. For these two
image datasets, there is no significant difference with identical 1-to-1, optimized 1-to1, and
optimized population. As the inputs are homogeneous, conventional rate-based coding is
good enough, though there is slight improvement in optimized population coding.
In MTSC tasks, superiority of our coding scheme is obvious. We use two-layer SNN
of size X-300-300-C, where X and C are input size and number of classes respectively. A
two-layer stacked LSTM with unit size 300 is built as baseline. Encoding window Te is the
same as time series’ length because encoder converts input to spikes on-the-fly. Identical
1-to-1 coding leads to worst performance, even the optimized one still has poor performance.
Reason is discussed in section 5.2.2. As multivariate times series consists of heterogeneous
input, one mapping function cannot satisfy all different inputs. Neuron only encodes stimulus effectively in a specific range [67], it is difficult for a LIF neuron to cover entire input
space. The random population performs surprisingly well, it achieves competitive accuracy with just randomly selected parameters. This shows the importance of heterogeneity,
even if an individual neuron cannot cover entire input space, the information loss can be

63

compensated by other neurons [76,77]. If the population is furtherly optimized by our training rule, it even outperforms DNN. Detailed results are listed in Table 5.2. TapNet [69]
is a DNN based model, DTW refers to dynamic time warping, a common approach for
MTSC. The advantages of our coding method are clearly seen in this task. It can faithfully convert continuous streams into spike domain on-the-fly, hence it enables novel SNN
applications and it is beneficial for complicated real-time tasks.
Table 5.3: Synthesis result.

5.5.3

Frequency

LUT

FF

BRAM

DSP

MLP MNIST

125 MHz

45.5%

33.8%

35.2%

40.8%

Conv. MNIST

125 MHz

56.9%

42.6%

30.9%

71.2%

MLP MTSC

125 MHz

20.4 %

22.3%

25.5%

14.8%

Hardware Performance

Our target platform is Zynq UltraScale+ XCZU9EG FPGA. We use Vivado HLS
2017 as synthesis backend.

Three network ares implemented, an MLP SNN (784-

500-500-10) and a convolutional SNN (28 × 28-32C3-P2-32C3-P2-256-10) for MNIST
classification and an MLP SNN (12-300-300-9) for multivariate time series Japanese
Vowels classification.

Weights and filter coefficients of trained model are rounded to

fixed point data type 16(4.12).

Detailed synthesis results are shown in Table 5.3.

Performance is obtained from synthesis report and simulation.
We tested the three networks on CPU, GPU, embedded GPU, and neuromorphic chip,
and results are shown in Table 5.5 and 5.6. Table 5.4 summarizes different platform configurations. NVidia RTX 5000 is a GPU which consists of 3,072 CUDA Cores, 384 Tensor
Cores and 16 GB GDDR6 memory. The GPU is mounted to a server equipped with 64G
memory and i9 9900k CPU. Jetson AGX Xavier is an edge deep learning platform, the GPU
employs Volta architecture, comprising 512 CUDA Cores and 64 Tensor Cores. Loihi is a

64

Table 5.4: Platform configurations.
Platform
This Work
RTX 5000
AGX Xavier
i 9900K
Loihi
SpiNNaker [82]
TrueNorth [45]
Minitaur [83]

Type
FPGA
GPU
Edge GPU
CPU
ASIC
ASIC
ASIC
FPGA

Drequency (Hz)
125 M
1.62 G
1.37 G
3.7G
180 M
75 M

1

TrueNorth and Loihi adopt asynchronous event-driven architecture, there is no global clock
in these chips.
Table 5.5: Cross-platform Comparison on MNIST
Platform
This Work
RTX 5000
AGX Xavier
i 9900K
Loihi
SpiNNaker
TrueNorth
Minitaur

Network

FPS

Latency (ms)

Power (W)

CNN
CNN
CNN
CNN
CNN
DBN
CNN
CNN

2124
864
211
100
671
50
1000
108

7.83 (0.52)
18.5
75.8
402.5
1.49
20
1
9.2

4.5
61.2
14
58.1
3.76
0.3
0.18
1.5

Energy
Efficiency
471
14.1
15
1.72
178
167
9259
72.4

1

Accuracy on MNIST: This work (99.2%), Loihi (98%), SpiNNaker (95%), TrueNorth
(95%), Minitaur (92%).
Table 5.6: Cross-platform Comparison on MTSC.
Platform
This work
RTX 5000
AGX Xavie
i 9900K

Network

FPS

Latency (ms)

Power (W)

MLP
MLP
MLP
MLP

717
231
61
220

22.3
72.7
261.7
69.1

4.5
48.5
13
57.1

Energy
Efficiency
201.8
4.7
4.7
3.9

state-of-the-art 14 nm technology neuromorphic processor developed by Intel in 2018 [71].
It consists of 128 neuron cores and 33 MB on-chip memory supporting up to 1.3 million neurons and 130 million synapses. We use Kapoho Bay system, which is a USB stick equipped
with 2 Loihi processors. We use batch size 16 on all platforms except Loihi as it does not
65

support batch processing. We achieved the highest throughput. The design is capable to
process 2124 MNIST images in one second, which is 3.16×, 2.12× and 42.5× improvement
compared with dedicated neuromorphic chip Loihi, TrueNorth and SpiNNaker [82]. Since
batch processing is not supported by ASIC platform Loihi, SpiNNaker, TrueNorth, and
FPGA platform Minitaur, they can only process one sample a time. For fair comparison,
we set batch size to 1 and the latency is 0.52 ms. We achieved 2.9×, 38.5×, 1.9× and
3.3× speedup compared with these works. Compared with RTX 5000, Loihi and SpiNNaker,
the improvements of energy efficiency (image/J) are 33.6×, 2.6× and 2.82× respectively.
Though TrueNorth achieved the highest energy efficiency, it is at the cost of oversimplified
SNN model, strict network constraints and degraded accuracy (95% on MNIST). TrueNorth
only supports IF neuron, and fan-in and fan-out of a neuron cannot be larger than 256,
network has to be specifically designed and trained to support TrueNorth. The oversimplified model also makes it difficult to efficiently perform temporal pattern recognition tasks.
Our work can provide flexibility for various SNN models, support for synapse dynamics,
but still achieved the best throughput, latency and energy efficiency.

5.6

Conclusion

In this work, we proposed a systematic optimization framework for SNN FPGA implementation. To reduce inference latency and improve neural coding accuracy, we proposed a
coding scheme which can reliably encode information into spatial temporal patterns with
small encoding window. Strategies to select coding scheme is also studied. To unleash the
computational power of SNN, we proposed a flexible SNN model that retains critical SNN
characteristics with reduced complexity. We developed an efficient hardware architecture,
an end-to-end flow to deploy SNN in FPGA and optimization strategy to improve throughput. Experiments show that proposed optimization methodology can significantly improve
performance in terms of latency and accuracy. Cross-platform comparison shows proposed

66

implementation outperforms high-end GPU and dedicated neuromorphic processors. Proposed optimization methodology is not specific to FPGA platform, the neural coding and
flexible SNN model can also provide insights for neuromorphic chip design.

67

Chapter 6
Neuromorphic Algorithm-hardware
Codesign for Temporal Pattern
Learning
6.1

Introduction

The brain-inspired Spiking Neural Network (SNN) has drawn interest as it demonstrates
high energy efficiency in performing machine learning tasks. In contrast to artificial neural
networks, which can be generally formulated as matrix operations, SNN is a neural network
with dynamics, and time is an essential computing element in its computation. Another
unique feature of SNNs is that the neuron’s output is a discrete spike event, and hence
information in a SNN is delivered and processed as spike trains. How the information is
encoded into spike trains is still an open question. Existing approaches can be roughly
divided into rate coding models and temporal coding models. Rate coding models assume
there is a time window, where spike number within the window resembles a continuous value
in deep neural network (DNN). Inputs in each window are spike trains of particular rates that
are processed window by window. Hence, a purely rate-based system suffers from significant

68

latency and inefficiency when processing time-varying inputs [17, 21]. Rate-based models
only consider spike statistics inside each window, ignoring dependencies in spike trains and
neuron states, which are proven to be critical in information processing [21].
In biological neural systems, an aforementioned window is hardly observed in which
neurons fire at particular rates. Instead, SNNs are inherently dynamical systems dealing with
time-varying stimulus. Inputs from physical environments change over time, and neurons’
output spike patterns are time-varying correspondingly. Information is naturally embedded
in the temporal structure of spike trains [21], referred to as temporal coding. Temporal coding
models allow fast and accurate information processing and are therefore beneficial for low
latency and real time applications, such as classifying sensor readings or generating control
signals to respond to external stimulus [21]. To exploit the potential of SNNs to process
temporal information, a novel training algorithm is required. Recently, there have been
works employing Backpropagation Through Time (BPTT) to train SNNs to learn temporal
patterns [47,48]. However, these works use relatively complex neuron models and dynamics,
posing challenges for deploying trained models on hardware. Therefore, a model with reduced
complexity but still retains critical aspects of neural dynamics and a training algorithm that
can train such a model to learn temporal information is highly desirable for hardware design.
In the conventional neural network or neuromorphic computing hardware implementation, synaptic weights are stored in SRAM cells. The computations often involve accessing to
SRAM, accumulating activations. The frequent data accessing and movement cause power
and latency overhead. In addition, a SRAM cell can require six or more transistors for a
single bit, consuming much power and area. Therefore, Resistive Random-access Memory
(RRAM) has drawn much attention, as it enables integrated data storage and computation.
A memristor-based memory can store one or more bits of a synapse weight in a single device. Furthermore, RRAM reads are extraordinarily quick and consume little power, and
computations with these synaptic weights can be executed directly in the memory without
the need to retrieve data [84]. These computations are carried out by applying voltages

69

to the word-lines of the memory, generating proportional currents through each cell that
are accumulated at each bit-line. While an DNN implemented in an RRAM-based system
would require costly digital-to-analog and analog-to-digital converters to interface with the
array, SNNs are uniquely compatible for these systems. Binary spike inputs generate currents that are integrated on capacitors at each bit-line, and an analog comparator spikes at a
threshold, discharging the capacitor [85]. However, many existing works assumes rate-based
model and ignores the neural dynamics such as [86]. Indeed, there are hardware designs
that stress the realistic neural model such as BrainScaleS [87]. Such designs target at neuroscience research rather than machine learning tasks, they can replicate realistic neural
dynamics however at the cost of additional design complexity. On one hand, oversimplified
design limits the potential of SNN, on the other hand, the complexity introduced by implementing biologically realistic neural dynamics may not be necessary for machine learning
tasks. This motivates us to find a sweet spot between hardware complexity and computational efficiency. That is, we aim at developing a hardware architecture that exhibits critical
components of neural dynamic without introducing significant overhead.
In this work, our contributions are summarized as follows:
• We derive a training algorithm for hardware friendly Leaky Integrate and Fire (LIF)
spiking neuron model.
• The model allows us to process spatial temporal pattern without explicit recurrent network structure, thus reduces the complexity of crossbar based implementation. Model
simplification and transformation are also adopted to facilitate hardware implementation.
• Though employing a simplified neuron model, our algorithm can still train SNNs to
learn complex spatial temporal patterns. This enables novel applications of SNNs,
rather than static image classification.
• We achieve competitive accuracy on common SNN datasets : 98.40 %, 85.69 % on
70

N-MNIST and Spiking Heidelberg Digits (SHD) datasets.
• We codesign a neuron circuit for RRAM-based architectures that replicates the dynamics of our hardware friendly neurosynaptic model.

6.2

Model

We start from a standard Leaky Integrate and Fire (LIF) model, which is defined by Ordinary
Differential Equations (ODEs) and event-triggered update [20]:

M

τ

X
dv(t)
= −v(t) +
wi xi (t)
dt
i

(6.1a)

v(t) ← vrest , if v(t) = Vth

(6.1b)

where v(t) is neuron’s membrane potential, τ is a time constant, xi (t) is input of the ith
synapse, wi is associated weight, M is the synapse number, Vth is threshold, vrest is rest potential, and for simplicity, we set vrest = 0. When v(t) exceeds Vth , v(t) is set to vrest , this is
called ’hard reset’. It is obvious that the spiking neuron is a stateful system as v(t) integrates
input over time, however the temporal dependency is implicit in this model. Since our purpose is to utilize the temporal dynamics of SNNs, next we employ the Spike Response Model
(SRM) framework [17] to derive an explicit temporal dependency. To simplify discussion let
M = 1, (6.1a) describes a Linear time-invariant (LTI) system. An LTI system can also be
characterized by impulse response. Solving (6.1a) with initial condition v(0) = 0 and a pulse
−t

as input, we can obtain kernel k(t) = e τ [17]. For a neuron with M input synapses, we have

P SP (t) =

M
X
i

Z

∞

−t

e τ xi (t − s)ds

wi

(6.2)

0

P SP (t) is referred as post-synaptic potential, it is the theoretical neuron potential with-

71

Voltage

Voltage

Synapse 1 PSP

Time

Summation of PSPs
Threshold

Voltage

Synapse 2 PSP

Input spikes

Time

Neuron

Output spikes

Figure 6.1: Synapse and adaptive threshold dynamic.
out reset. SRM implements reset by a negative charge induced by output spike to neuron itself, which can also be characterized by a reset kernel h(t). In principle, h(t) can
−t

have various forms, and we choose h(t) = e τr , where τr is a time constant. Therefore,
(6.1a) - (6.1b) can be converted to following form:
Z
v(t) = −ϑ

∞

h(s)O(t − s)ds +
0

where O(t) =

P

M
X

Z

i

∞

k(s)xi (t − s)ds

wi

(6.3)

0

δ(t − tf ), tf ∈ {tf : v(tf ) ≥ Vth } is a sequence of time-shifted Dirac

Delta functions, representing neuron’s output spike train, ϑ determines the strength of
the reset charge. SRM has following advantages: 1) equation (6.3) establishes an analytical relation of temporal dependency, at any time t, v(t) is determined by entire input and output history; 2) neuron is less likely to fire after reset, however, ODE model
cannot quantitatively reflect such likelihood due to the hard reset, while h(t) serves as a
bridge between previous history and current output. 3) the hard reset simply set membrane potential to 0, impairing historical information.
One potential problem with equation (6.3) is that voltage subtraction is more difficult

72

to realize than addition. However, resetting neuron by a negative charge is equivalent to
adopting an adaptive threshold, such that voltage subtraction is avoided. We rewrite (6.3) as:

P SP (t) =

M
X

Z

∞

k(s)xi (t − s)ds

wi

(6.4a)

0

i

Z

∞

h(s)O(t − s)ds

r(t) = ϑ

(6.4b)

0

O(t) =

X

δ(t − tf ), tf ∈ {tf : v(tf ) ≥ Vth + r(t)}

(6.4c)

where r(t) is the amplitude of the reset charge. Instead of comparing v(t) with a fixed
threshold, in this form, P SP (t) is compared with a time-varying threshold Vth + r(t) which
adaptively changes with respect to output spike activity. This leads to adaptive threshold
circuit design to be detailed in Section 6.4. Figure 6.1 illustrates the dynamic of synapse
and threshold. The left two figures indicate the synapse PSP change with respect to input
spikes. Blue curve in top right figure indicates summation of PSPs, and the orange curve is
the threshold. After issuing a spike, the threshold increases immediately, and exponentially
decays over time. The importance of such reset mechanism and drawback of ODE model
with hard reset will be shown by experiment in Section 6.5.1.
Equation (6.3) also has a physical interpretation, which explains the neuron’s capability to remember information and provides insights to circuit design. Essentially, h(t) and
k(t) are two first order low-pass filters. A neuron with M input synapses is expressed as
M + 1 filters. Therefore, neuron’s memory is distributed to filters, and the filter states
are never cleared. Unlike in the ODE model (6.1), memory only exists in neuron, after a
hard reset, historical information is discarded. A first order low-pass filter can be implemented by a RC circuit, with an input signal applied across a resistor and capacitor in series
and an output signal measured across the capacitor. The values selected for these discrete
components largely depends on the duration of a single time-step ∆t in the physicallyrealized algorithm. The time constant of a filter τ can be represented by the equation
73

τ =

RC
.
∆t

While recurrent structure causes additional complexity in memristor crossbar

based non-digital design, the filter-based model allows us to process temporal sequence with
a feedforward network, hence reduces difficulty in implementation.

6.3

Training Algorithm

Our next step is to derive training algorithm based on BPTT, it is necessary to get a
step-based updating rule. By using Z-transform [20], we can obtain the digital counterpart of filter k(t) and h(t) in discrete time domain:

k[t] = e

−1
τ

k[t − 1] + x[t]

−1

h[t] = e τr h[t − 1] + O[t − 1]

(6.5a)
(6.5b)

where t ∈ Z≥0 . Therefore, (6.3) can be transformed into a system of difference equations:

vl [t] = gl [t] − ϑhl [t]

(6.6a)

gl [t] = Wl kl [t]

(6.6b)

−1

hl [t] = e τr hl [t − 1] + Ol [t − 1]
kl [t] = e

−1
τ

kl [t − 1] + Ol−1 [t]

Ol [t] = U (vl [t] − Vth )
U (x) = 0 if x < 0

else 1

(6.6c)
(6.6d)
(6.6e)
(6.6f)

where l denotes layer index, Wl ∈ RNl ×Nl−1 is weight matrix, Nl is size of lth
layer, gl [t] is weighted input (PSP), Ol [t] is neuron output at time t, U (x) is
Heaviside step function.

Like (6.4a) to (6.4c), equation (6.6a) (6.6e) and (6.6f)

74

O[t-1]

O[t]

O[t+1]

v[t-1]

v[t]

v[t+1]

h[t-1]

h[t]
k[t-1]

Time t - 1

h[t+1]
k[t]

k[t+1]

Time t

Time t + 1

Figure 6.2: Unfolded network.
can also be interpreted as adaptive threshold:

O[t] = 1 if g[t] > ϑh[t] + Vth

else 0

(6.7)

Equation (6.6a) to (6.6f) exhibit recursive form, after unfolding BPTT can be applied.
The unfolded network is shown in Figure 6.2. Let E be the loss and δl [t] =
the error signal.

be

δl [t] can be propagated recursively:

T
δl [t] = Wl+1
(l+1 [t]δl+1 [t]) − ϑδl [t + 1]l [t + 1]

where l [t] =

∂E
∂Ol [t]

∂U (vl [t]−ϑ)
.
∂vl [t]

(6.8)

Equation (6.6a) to (6.6e) are differentialble, however

derivative of U (x) is a Dirac Delta function, which is a roadblock of backpropagation.

We solve this issue by pseudo-gradient [47], such that U 0 (x) is approximated

by the directive of a complementary error function:
x2
∂( 12 erfc( √x2σ ))
exp (− 2σ
∂U (x)
2)
≈
=− √
∂x
∂x
2πσ

where σ determines the sharpness of derivative.

75

(6.9)

We consider two learning tasks. First is the conventional classification task, we use
cross-entropy loss and spike rate is mapped to probability by Softmax function. In second
task, we consider a non-trivial scenario. As discussed in section 6.1, SNN can generate
particular temporal patterns in response to specific input patterns, which is common in
control and memory system [17, 21]. Therefore, in this task, we aim at training the network
to generate output O[t] which temporally resembles target spike train Starget [t]. In other
words, the goal is to train network to generate spikes at specific times. The loss is defined
as the distance between O[t] and Starget [t]. Essentially, spike train is a sequence of time
shifted Dirac Delta functions, it is difficult to directly measure the similarity. We employ
kernel method [88] to map discrete spike train to continuous trace, such that the distance
between two arbitrary spike trains Si [t] and Sj [t] is calculated as:
T
1 X
(f [t] ∗ Si [t] − f [t] ∗ Sj [t])2
D(Si , Sj ) =
2T t=1

(6.10)
−t

−t

where T is spike train length, ∗ denotes convolution and kernel function f [t] = e τm − e τs .
The loss is the summation of distances between each output and target pairs:

E=

NL
X

i
D(OiL , Starget
)

(6.11)

i=1

where i denotes output and target spike train index ,NL is number of output spike trains.

6.4

Hardware Design

Equation (6.3) reveals the filter nature of spiking neurons. This provides insights to physical
realization:
• Biological plausible neural dynamic can be achieved by low-pass filters, reducing circuit
design complexity.

76

Synapse filter

Matrix-vector multiplication

k1

Synapse filter

k2

Synapse filter

k3

Neuron compares
weighted PSP with
threshold
Neuron
curcuit
Neuron
curcuit
Neuron
curcuit

Temporal coded Synapse converts
spike to PSP
Spike trains

Synapse filter

kn

Synaptic weight wi,j

Neuron
curcuit

g1

g2

g3

gm

Figure 6.3: System structure.
• Adaptive threshold, an important neural dynamic, can be implemented by a comparator and a low-pass filter.
Figure 6.3 shows the overall structure of our hardware design. An array of synapse filters
receives time-varying spike trains as input, implementing filter k(t). The output of synapse
filter array is fed into crossbar to perform matrix-vector multiplication, corresponding to
(6.6b). The resulting PSPs are compared with threshold in neuron circuit. Output spikes
of neuron circuit are sent to synapse filter array of next layer.
Implementation of proposed neuron model and synaptic dynamics is relatively simple.
An incoming input spike is shaped by the RC filter into an exponentially decaying waveform. These voltage waveforms are applied to their respective word-lines to induce a current in each bit-line that can be interpreted as the dot-product of the bit-line synapse
conductances and word-line voltages. These dot-products are the PSPs. At the edge of
each bit-line is a single resistor that converts the output current of the bit-line to a voltage. While this resistor may have an effect on the resulting current, this can be avoided
with the inclusion of a current amplifier between the output of the bit-line and the re77

Table 6.1: Parameters
Parameter
Value Parameter Value Parameter Value
Optimizer
AdamW Batch size 64
τ
4
Learning rate
0.0001
τr
4
τm
4
(classification)
Learning rate
√1
0.001
σ
τs
1
2π
(pattern association)
sistor as in [89]. We ignore this effect for the sake of this work as it should only affect
the magnitude of the resulting current and not the shape.
The generated PSP voltages (k(t)) are applied to the positive input of an operational
amplifier, which acts as a comparator. The threshold voltage at the negative input is the the
feedback of the comparator through a separate, identical RC filter (h(t)) plus a bias (Vth ).
The bias is added to h(t) through the use of a separate operational amplifier that gives the
comparator’s negative input an offset. When k(t) rises above the threshold, the comparator
goes high, which in turn raises h(t) and the threshold voltage, causing the amplifier to go
low again and creating a spiking behavior. The output of the comparator is buffered to
the output of the neuron by two inverters to ensure the magnitude of the output spikes is
VDD . These spikes can then be buffered to the input of the next layer.
In the first RC filter, a voltage spike over a very short period of time is charging the
capacitor to a voltage that will slowly decay. In order to ensure that the voltage reached by
the capacitor produces an output within the operating range of the amplifier in the neuron
circuit, a level-shifter can be used between layers to raise the input-spike amplitude above
VDD . In order for the amplifier to drive the second RC filter, the amplifier must have a
strong second-stage capable of driving the load. The circuit diagram for our neurosynaptic
circuit with a single neuron and single synapse can be seen in Figure 6.6.

78

Figure 6.4: N-MNIST and SHD dataset samples.

6.5

Experiments and Evaluations

To demonstrate the effectiveness of our model and training algorithm, we applied them on
classification and temporal pattern association tasks. The second is non-trivial as it requires network to recognize specific spatial temporal input patterns and also produce spike
trains that follow particular patterns. Our model and training algorithm are implemented
in PyTorch. Detailed parameters are given in Table 6.1. Results of circuit level simulation is also provided and compared with the desired behavior.

6.5.1

Spiking Dataset Classification

We tested on two spiking datasets and compared our neuron model with standard LIF model
to demonstrate the proposed model’s advantage in temporal pattern classification.
N-MNIST classification.

Neuromorphic MNIST (N-MNIST) is the dynamic

version of MNIST handwritten digits, captured by recording MNIST images displayed on
LCD screen using a DVS camera mounted on a moving platform. Once the brightness
change at position (x, y) exceeds a certain threshold, a spike event is triggered.

79

The

Table 6.2: Classification Results
N-MNIST
Model
Accuracy
This work
98.40
This work (HR)
95.31
Spiking MLP [35]
98.66
Phased LSTM [91]
97.28
Spiking CNN [92]
95.72
Graph CNN [93]
98.5
Spiking CNN [95]
98.32
data exhibits complex temporal patterns.

SHD
Model
This work
This work (HR)
Spiking MLP [90]
R-SNN [90]
LSTM [90]
R-SNN [94]
SRNN [96]

Accuracy
85.69
26.36
47.5
83.2
89.0
82.0
84.4

A sample of N-MNIST demonstrating the

spatial temporal spike distribution is shown in Figure 6.4 (a).
dot represent spikes of two channels.

The blue and orange

Our network to classify N-MNIST is a Multi-

Layer Perceptron (MLP), structure is (34×34×2)−500−500−10.
existing works are shown in Table 6.2.

Comparisons with

We achieved 98.40% accuracy without more

expensive training techniques such as error normalization in [35].

We outperformed

convolutional SNNs such as [92, 95] and DNNs such as [91, 93].
SHD classification. Spiking Heidelberg Digits (SHD) is a speech dataset in spike
format, consists of spoken digits recordings ranging from 0 to 9 in English and German [90].
Spikes are generated by converting audio recordings using artificial inner ear model, resulting
700 spike trains with vary temporal structures. The dataset aims at mimicking the spike
activity of biological auditory system, and the spike events exhibit complex spatial temporal
distribution. The dataset has 20 classes, and splits into 8156 and 2264 samples for training
and testing respectively. A sample of the SHD dataset is shown in Figure 6.4 (b), in which
a dot at (x, y) indicates a spike event in yth channel at time x.
We built a spiking MLP to classify this dataset. Network structure is 700−400−400−20.
Results are shown in Table 6.2. We achieved 85.69% accuracy, which is the best in SNN
domain. It is noteworthy that [90, 96] employ explicit recurrent architecture, i.e. neurons
within one layer are also connected to each other through synapse connection. Such network topology is difficult to implement on crossbar. Our results show that it is possible
80

to achieve similar performance with simple feedforward structure.
Importance of adaptive threshold. The ’HR’ in Table 6.2 refers to ’hard reset’.
We keep the network structure and weights, but swap the neuron to ODE model defined
by (3.1). On N-MNIST dataset, the accuracy dropped to 95.31%, and on SHD dataset,
the accuracy dropped to 26.36%. This clearly shows the drawback of widely used hard
reset model, as claimed in Section 6.2, hard reset clears historical information, therefore
it impairs the capability of SNN to process temporal information. The accuracy degradation on SHD is more significant than N-MNIST. We presume that it is because the
richness of temporal information in the two datasets are different. [90] claimed that spike
timing is essential in SHD dataset. [97] reported that rate-based model can achieve high
accuracy on N-MNIST with data pre-processing, the temporal information is not dominant. Above results justify the capability of our neuron model and the necessity of proposed circuit design, to apply neuromorphic hardware to temporal pattern classification,
neural dynamic should be taken into consideration.

6.5.2

Spatial Temporal Pattern Association

In this task, we build a fully connected network with size 700(input)-500-500-300. The
objective is to train the network to produce a specific spatial temporal pattern when given
another spatial temporal pattern as input. We randomly pick 1000 samples from SHD dataset
as input patterns. Then we convert handwritten digit images representing 0 to 9 to spike
patterns by regarding pixel (x, y) as a spike event in yth spike train at time x. The input has
700 spike trains of length 300, the target output has 300 spike trains of same length. The
network is trained to output corresponding handwritten digits pattern when given a SHD
sample. For example, when given the audio representation of digit 3, the network generates
spike pattern that resembles handwritten digit 3. Example inputs and outputs are shown
in Figure 6.5. The x axis represents time, and y axis represent spike train index, a dot at
coordinate (x, y) represents an input/output spike in yth spike train at time x. This task
81

Input spike
patterns
Output spike
patterns

Figure 6.5: Pattern association input and output samples.
demonstrates the advantage of our training algorithm. It can train SNN to learn precise
spike timings, enabling novel applications beyond simple static image classification.

6.5.3

Hardware

A circuit implementation was designed and simulated in Cadence Virtuoso using TSMC
1V-65 nm technology node. A diagram of the designed neuron circuit can be seen in Figure 6.6. The synaptic circuit is an RC filter placed at the input of the RRAM array and
filters incoming spikes for each word-line. A resistor is placed between the output of bitline and ground to convert the synaptic current into voltage (PSP). Beyond these devices,
the circuit consists of RC filter in the neuron, an operational amplifier that behaves as a
comparator, another operational amplifier acting as our bias voltage source (Vth ), and two
simple inverters. We chose a physical step size, which corresponds to the width of an in-

82

Synapse filters RRAM crossbar
Rsyn

Neuron circuit

WL1
Csyn
Rnrn

Rsyn

WL2

Csyn

Cnrn
Rs

Rsyn

-Vth

R1

R2

WLn
Csyn
BLi

Figure 6.6: Diagram of synapse filter, RRAM crossbar, and neuron circuit.
put spike, of 10ns. To achieve a similar time constant τ to that in the simulations, we
chose an RC filter with R = 4.56kΩ and C = 10.14pF , which gives us the desired 40ns
time constant. A bias voltage of 550mV was chosen for an initial experiment to ensure
that the neuron would not spike with every input spike.

(a) Bit line output, PSP, threshold, input
and output spike.

(b) Comparator output
and feedback voltage.

Figure 6.7: Circuit simulation results.
The dynamics of the implemented circuit can be seen in Figure 6.7. A given input
spike train is filtered to produce k(t), and the output of the comparator rises as the bit83

98.5%

98.3%

Accuracy

98.1%
97.9%

97.7%
97.5%
97.3%
97.1%

4 bit

96.9%

5 bit

96.7%
96.5%

0

0.05

0.1

0.15

0.2

0.25

0.3

Process variation

0.35

0.4

0.45

0.5

Figure 6.8: Accuracy under different quantization level and process variation.
line output g(t) surpasses the overall threshold Vth + r(t). The output of the comparator is filtered to produce h(t), shown as ’feedback’ in Figure 6.7 (b). This feedback voltage goes back to comparator, turning off the comparator as the threshold increases. The
threshold decays slowly back to the bias as h(t) decays, but is high enough to prevent
a subsequent input spike from inducing an output spike. Thus, the circuit replicates the
dynamics demonstrated by the model. Because the output of the first amplifier is nonideal, shown by yellow curve in Figure 6.7, the inverters are used to generate spike with
ideal shape shown by dashed green curve in Figure 6.7 (b).
We next estimated area and power consumption of the circuit. An arbitrary input sample
of 300 time steps with 14 input spikes was provided as input to the circuit, and the threshold
bias was adjusted to replicate the neuron activation statistics of our software model. We
found that a single neuron and synapse circuit consumed a minimum of 1.067mW , a maximum of 1.965mW , and an average of 1.11mW . Integrating the power over the entire sample
duration, the circuit consumed 3.329nJ of total energy. We then calculated the footprint of
all devices in the circuit to estimate a total area of about 0.0125mm2 for a single neuron and
synapse circuit. Note that these estimates are independent of RRAM array size and energy

84

consumption, and are an estimate of the neurosynaptic dynamics circuitry.
We also simulated the influence of process variation and quantization using the same
N-MNIST classification model as in Section 6.5.1. Results of 4-bit and 5-bit quantization with process variation (resistance deviation) ranging from 0 to 0.5 are shown in Figure 6.8. Our model shows robustness to quantization and process variation. With 4-bit
level precision, 0.2 deviation, we achieved 97.97% accuracy.

6.6

Conclusion

In this work, we proposed algorithm hardware co-design for memristor based neuromorphic
computing systems. We developed a hardware friendly spiking neuron model and corresponding training algorithm to learn complex spatial temporal patterns. The algorithm has
achieved competitive performance on commonly used spiking datasets and enables novel applications. The hardware was designed based on proposed model, it exhibits critical neural
dynamics including synapse filter effect and adaptive threshold.

85

Chapter 7
Conclusion
7.1

Summary

In this thesis, a systematic algorithm-hardware codesign for neuromorphic computing including various aspects is presented. The contributions are summarized as follows:
• In Chapter 3, starting from ODE model, we derive a general formulation of SNN as a
network of filters. This model reflects the neuron’s filter dynamic, enabling temporal
pattern learning. An algorithm is proposed to train this model to learn spatial temporal
patterns. Experiments show that the proposed model and algorithm outperform state
of the art approaches on various datasets.
• In Chapter 4, we apply SNN to multivariate time series classification tasks. Evaluations
shows that the classification accuracies are comparable with LSTMs. Considering the
energy efficiency of SNN, this work can provide an alternative solution for power-limited
scenarios such as embedded systems and IoTs where real time signal classification is
required.
• In Chapter 5, we present a method to optimize neural coding. It enables novel applications such as time series classification in SNN domain. We also implement the

86

model proposed by Chapter 3 on FPGA platform, it outperforms CPU and GPU in
throughput.
• In Chapter 6, we present an algorithm-hardware codesign for RRAM based neuromorphic system. We develop a learning rule for hardware friendly LIF neurons. The
algorithm achieves competitive accuracy on common SNN datasets. Based on the LIF
model, we developed a neuron circuit using RRAM crossbars, it replicates the dynamics
of proposed model.

7.2

Future Research Directions

Majority neuromorphic hardware designs employ tile-based architecture [62] to maximize
parallelism. As a result, computing and data storage are tightly coupled. These designs
require to store parameters of entire network on chip. Therefore, most neuromorphic hardware are subject to constraints on network size, parameter number, neuron fanin/fanout.
For example, in RRAM crossbar based designs, neurons fanin/fanout are subject to crossbar
size, and typically crossbar size cannot be arbitrarily large due to manufacturing technology
limitations [98, 99]. Since modern neural networks have millions of parameters [100–102],
aforementioned limitations make it difficult to deploy neural networks on neuromorphic
chips [60, 103, 104]. Luckily, studies have found that there is sufficient redundancy on neural
networks, hence weights can be pruned to reduce network size [105, 106]. There are various techniques targeting at DNN pruning. However, as discussed in Chapter 2, there are
significant differences between DNN and SNN, whether the DNN pruning methods apply
to SNN is unknown. SNN pruning techniques are highly desirable as it can alleviate the
hardware constraints, hence makes neuromorphic hardware more practical.

87

Bibliography
[1] C. Mead, “Neuromorphic electronic systems,” Proceedings of the IEEE, vol. 78, no. 10,
pp. 1629–1636, 1990.
[2] A. G. Andreou, A. A. Dykman, K. D. Fischl, G. Garreau, D. R. Mendat, G. Orchard,
A. S. Cassidy, P. Merolla, J. V. Arthur, R. Alvarez-Icaza et al., “Real-time sensory
information processing using the truenorth neurosynaptic system.” in ISCAS, 2016, p.
2911.
[3] E. Covi, S. Brivio, A. Serb, T. Prodromakis, M. Fanciulli, and S. Spiga, “Analog memristive synapse in spiking networks implementing unsupervised learning,” Frontiers in
neuroscience, vol. 10, p. 482, 2016.
[4] M. Hu, H. Li, Y. Chen, Q. Wu, and G. S. Rose, “Bsb training scheme implementation
on memristor-based circuit,” in Computational Intelligence for Security and Defense
Applications (CISDA), 2013 IEEE Symposium on. IEEE, 2013, pp. 80–87.
[5] Y. Jin, W. Zhang, and P. Li, “Hybrid macro/micro level backpropagation for training
deep spiking neural networks,” in Advances in Neural Information Processing Systems,
2018, pp. 7005–7015.
[6] E. Hunsberger and C. Eliasmith, “Training spiking deep networks for neuromorphic
hardware,” arXiv preprint arXiv:1611.05141, 2016.

88

[7] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan,
B. L. Jackson, N. Imam, C. Guo, Y. Nakamura et al., “A million spiking-neuron
integrated circuit with a scalable communication network and interface,” Science, vol.
345, no. 6197, pp. 668–673, 2014.
[8] S. B. Furber, D. R. Lester, L. A. Plana, J. D. Garside, E. Painkras, S. Temple, and
A. D. Brown, “Overview of the spinnaker system architecture,” IEEE Transactions on
Computers, vol. 62, no. 12, pp. 2454–2467, 2013.
[9] S. Park, A. Sheri, J. Kim, J. Noh, J. Jang, M. Jeon, B. Lee, B. Lee, B. Lee, and
H. Hwang, “Neuromorphic speech systems using advanced reram-based synapse,” in
Electron Devices Meeting (IEDM), 2013 IEEE International. IEEE, 2013, pp. 25–6.
[10] S. Carrillo, J. Harkin, L. J. McDaid, F. Morgan, S. Pande, S. Cawley, and B. McGinley, “Scalable hierarchical network-on-chip architecture for spiking neural network
hardware implementations,” IEEE Transactions on Parallel and Distributed Systems,
vol. 24, no. 12, pp. 2451–2461, 2013.
[11] A. Shrestha, K. Ahmed, Y. Wang, D. P. Widemann, A. T. Moody, B. C. Van Essen,
and Q. Qiu, “A spike-based long short-term memory on a neurosynaptic processor,”
in 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
IEEE, 2017, pp. 631–637.
[12] Y. Hao, X. Huang, M. Dong, and B. Xu, “A biologically plausible supervised learning
method for spiking neural networks using the symmetric stdp rule,” Neural Networks,
vol. 121, pp. 387–395, 2020.
[13] H. Fang, A. Shrestha, D. Ma, and Q. Qiu, “Scalable noc-based neuromorphic hardware
learning and inference,” in 2018 International Joint Conference on Neural Networks
(IJCNN). IEEE, 2018, pp. 1–8.

89

[14] R. Gütig and H. Sompolinsky, “The tempotron: a neuron that learns spike timing–
based decisions,” Nature neuroscience, vol. 9, no. 3, p. 420, 2006.
[15] R. Gütig, “Spiking neurons can discover predictive features by aggregate-label learning,” Science, vol. 351, no. 6277, p. aab4113, 2016.
[16] W. Maass, “Networks of spiking neurons: the third generation of neural network models,” Neural networks, vol. 10, no. 9, pp. 1659–1671, 1997.
[17] W. Gerstner, W. M. Kistler, R. Naud, and L. Paninski, Neuronal dynamics: From
single neurons to networks and models of cognition.

Cambridge University Press,

2014.
[18] Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, “Spatio-temporal backpropagation for
training high-performance spiking neural networks,” Frontiers in neuroscience, vol. 12,
p. 331, 2018.
[19] H. Fang, A. Shrestha, Z. Zhao, and Q. Qiu, “Exploiting neuron and synapse filter
dynamics in spatial temporal learning of deep spiking neural network,” arXiv preprint
arXiv:2003.02944, 2020.
[20] R. Brette, M. Rudolph, T. Carnevale, M. Hines, D. Beeman, J. M. Bower, M. Diesmann, A. Morrison, P. H. Goodman, F. C. Harris et al., “Simulation of networks of
spiking neurons: a review of tools and strategies,” Journal of computational neuroscience, vol. 23, no. 3, pp. 349–398, 2007.
[21] C. Eliasmith and C. H. Anderson, Neural engineering: Computation, representation,
and dynamics in neurobiological systems. MIT press, 2004.
[22] W. Gerstner, “A framework for spiking neuron models - the spike response model,”
Handbook of Biological Physics, vol. 4, pp. 469–516, 2001.

90

[23] W. Gerstner and W. M. Kistler, Spiking neuron models: Single neurons, populations,
plasticity. Cambridge university press, 2002.
[24] D. Sterratt, B. Graham, A. Gillies, and D. Willshaw, Principles of computational
modelling in neuroscience. Cambridge University Press, 2011.
[25] J. Wu, Y. Chua, and H. Li, “A biologically plausible speech recognition framework
based on spiking neural networks,” in 2018 International Joint Conference on Neural
Networks (IJCNN). IEEE, 2018, pp. 1–8.
[26] Y. Miao, H. Tang, and G. Pan, “A supervised multi-spike learning algorithm for
spiking neural networks,” in 2018 International Joint Conference on Neural Networks
(IJCNN). IEEE, 2018, pp. 1–7.
[27] A. Mohemmed, S. Schliebs, S. Matsuda, and N. Kasabov, “Span: Spike pattern association neuron for learning spatio-temporal spike patterns,” International journal of
neural systems, vol. 22, no. 04, p. 1250012, 2012.
[28] R. Florian, “The chronotron: a neuron that learns to fire temporally-precise spike
patterns,” Nature Precedings, pp. 1–1, 2010.
[29] P. U. Diehl and M. Cook, “Unsupervised learning of digit recognition using spiketiming-dependent plasticity,” Frontiers in computational neuroscience, vol. 9, p. 99,
2015.
[30] Y. Wu, L. Deng, G. Li, J. Zhu, Y. Xie, and L. Shi, “Direct training for spiking neural
networks: Faster, larger, better,” in Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 33, 2019, pp. 1311–1318.
[31] P. U. Diehl, D. Neil, J. Binas, M. Cook, S.-C. Liu, and M. Pfeiffer, “Fast-classifying,
high-accuracy spiking deep networks through weight and threshold balancing,” in 2015
International Joint Conference on Neural Networks (IJCNN). IEEE, 2015, pp. 1–8.
91

[32] K. Ahmed, A. Shrestha, Q. Qiu, and Q. Wu, “Probabilistic inference using stochastic
spiking neural networks on a neurosynaptic processor,” in Neural Networks (IJCNN),
2016 International Joint Conference on. IEEE, 2016, pp. 4286–4293.
[33] A. Shrestha, K. Ahmed, Y. Wang, and Q. Qiu, “Stable spike-timing dependent plasticity rule for multilayer unsupervised and supervised learning,” in Neural Networks
(IJCNN), 2017 International Joint Conference on. IEEE, 2017, pp. 1999–2006.
[34] A. Tavanaei and A. Maida, “Bp-stdp: Approximating backpropagation using spike
timing dependent plasticity,” Neurocomputing, vol. 330, pp. 39–47, 2019.
[35] J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking neural networks using
backpropagation,” Frontiers in neuroscience, vol. 10, p. 508, 2016.
[36] Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural networks for
energy-efficient object recognition,” International Journal of Computer Vision, vol.
113, no. 1, pp. 54–66, 2015.
[37] S. Esser, P. Merolla, J. Arthur, A. Cassidy, R. Appuswamy, A. Andreopoulos,
D. Berg, J. McKinstry, T. Melano, D. Barch et al., “Convolutional networks for
fast, energy-efficient neuromorphic computing. 2016,” Preprint on ArXiv. http://arxiv.
org/abs/1603.08270. Accessed, vol. 27, 2016.
[38] A. Shrestha, H. Fang, Q. Wu, and Q. Qiu, “Approximating back-propagation
for a biologically plausible local learning rule in spiking neural networks,” in
Proceedings of the International Conference on Neuromorphic Systems, T. E. Potok
and C. D. Schuman, Eds., ACM.

ACM, 2019, pp. 1–8. [Online]. Available:

https://doi.org/10.1145/3354265.3354275
[39] H. Fang, A. Shrestha, Z. Zhao, Y. Li, and Q. Qiu, “An event-driven neuromorphic system with biologically plausible temporal dynamics,” in 2019 IEEE/ACM International
Conference on Computer-Aided Design (ICCAD). IEEE, Nov 2019, pp. 1–8.
92

[40] T. Gollisch and M. Meister, “Rapid neural coding in the retina with relative spike
latencies,” science, vol. 319, no. 5866, pp. 1108–1111, 2008.
[41] D. A. Butts, C. Weng, J. Jin, C.-I. Yeh, N. A. Lesica, J.-M. Alonso, and G. B. Stanley,
“Temporal precision in the neural code and the timescales of natural vision,” Nature,
vol. 449, no. 7158, p. 92, 2007.
[42] Z. Pan, J. Wu, M. Zhang, H. Li, and Y. Chua, “Neural population coding for effective
temporal classification,” in 2019 International Joint Conference on Neural Networks
(IJCNN). IEEE, 2019, pp. 1–8.
[43] A. Borst and F. E. Theunissen, “Information theory and neural coding,” Nature neuroscience, vol. 2, no. 11, pp. 947–957, 1999.
[44] M. W. Kadous et al., Temporal classification: Extending the classification paradigm to
multivariate time series. University of New South Wales, 2002.
[45] S. K. Esser, R. Appuswamy, P. Merolla, J. V. Arthur, and D. S. Modha, “Backpropagation for energy-efficient neuromorphic computing,” in Advances in Neural Information
Processing Systems, 2015, pp. 1117–1125.
[46] S. B. Shrestha and G. Orchard, “Slayer: Spike layer error reassignment in time,” in
Advances in Neural Information Processing Systems, 2018, pp. 1412–1421.
[47] E. O. Neftci, H. Mostafa, and F. Zenke, “Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural
networks,” IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 51–63, 2019.
[48] F. Zenke and S. Ganguli, “Superspike: Supervised learning in multilayer spiking neural
networks,” Neural computation, vol. 30, no. 6, pp. 1514–1541, 2018.

93

[49] P. Gu, R. Xiao, G. Pan, and H. Tang, “Stca: spatio-temporal credit assignment with
delayed feedback in deep spiking neural networks,” in Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 2019, pp. 1366–1372.
[50] A. D. Back and A. C. Tsoi, “Fir and iir synapses, a new neural network architecture
for time series modeling,” Neural computation, vol. 3, no. 3, pp. 375–385, 1991.
[51] P. Campolucci, A. Uncini, F. Piazza, and B. D. Rao, “On-line learning algorithms
for locally recurrent neural networks,” IEEE transactions on neural networks, vol. 10,
no. 2, pp. 253–271, 1999.
[52] M. H. Hennig, “Theoretical models of synaptic short term plasticity,” Frontiers in
computational neuroscience, vol. 7, p. 45, 2013.
[53] C. Stevens and A. Zador, “When is an integrate-and-fire neuron like a poisson neuron?”
Advances in neural information processing systems, vol. 8, pp. 103–109, 1995.
[54] A. Amir, B. Taba, D. Berg, T. Melano, J. McKinstry, C. Di Nolfo, T. Nayak, A. Andreopoulos, G. Garreau, M. Mendoza et al., “A low power, fully event-based gesture
recognition system,” in Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 2017, pp. 7243–7252.
[55] J. Kaiser, H. Mostafa, and E. Neftci, “Synaptic plasticity dynamics for deep continuous
local learning,” arXiv preprint arXiv:1811.10766, 2018.
[56] J. Kaiser, A. Friedrich, J. Tieck, D. Reichard, A. Roennau, E. Neftci, and R. Dillmann,
“Embodied event-driven random backpropagation,” arXiv preprint arXiv:1904.04805,
2019.
[57] M. Abdollahi and S.-C. Liu, “Speaker-independent isolated digit recognition using
an aer silicon cochlea,” in 2011 IEEE Biomedical Circuits and Systems Conference
(BioCAS). IEEE, 2011, pp. 269–272.
94

[58] F. Karim, S. Majumdar, H. Darabi, and S. Harford, “Multivariate lstm-fcns for time
series classification,” Neural Networks, vol. 116, pp. 237–245, 2019.
[59] G. Yuan, C. Ding, R. Cai, X. Ma, Z. Zhao, A. Ren, B. Yuan, and Y. Wang, “Memristor
crossbar-based ultra-efficient next-generation baseband processors,” in 2017 IEEE 60th
International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE, 2017,
pp. 1121–1124.
[60] X. Ma, G. Yuan, S. Lin, C. Ding, F. Yu, T. Liu, W. Wen, X. Chen, and Y. Wang,
“Tiny but accurate: A pruned, quantized and optimized memristor crossbar framework
for ultra efficient dnn implementation,” in 2020 25th Asia and South Pacific Design
Automation Conference (ASP-DAC), 2020, pp. 301–306.
[61] G. Yuan, X. Ma, C. Ding, S. Lin, T. Zhang, Z. S. Jalali, Y. Zhao, L. Jiang, S. Soundarajan, and Y. Wang, “An ultra-efficient memristor-based dnn framework with structured
weight pruning and quantization using admm,” in 2019 IEEE/ACM International
Symposium on Low Power Electronics and Design (ISLPED), 2019, pp. 1–6.
[62] H. Fang, A. Shrestha, Z. Zhao, Y. Wang, and Q. Qiu, “A general framework to map
neural networks onto neuromorphic processor,” in 20th International Symposium on
Quality Electronic Design (ISQED). IEEE, 2019, pp. 20–25.
[63] T. Liu, Z. Liu, F. Lin, Y. Jin, G. Quan, and W. Wen, “Mt-spike: A multilayer timebased spiking neuromorphic architecture with temporal error backpropagation,” in
Proceedings of the 36th International Conference on Computer-Aided Design.

IEEE

Press, 2017, pp. 450–457.
[64] D. Huh and T. J. Sejnowski, “Gradient descent for spiking neural networks,” in Advances in Neural Information Processing Systems, 2018, pp. 1433–1443.

95

[65] Q. Yu, H. Li, and K. C. Tan, “Spike timing or rate? neurons learn to make decisions for
both through threshold-driven plasticity,” IEEE transactions on cybernetics, vol. 49,
no. 6, pp. 2178–2189, 2018.
[66] S. Rotter and M. Diesmann, “Exact digital simulation of time-invariant linear systems
with applications to neuronal modeling,” Biological cybernetics, vol. 81, no. 5-6, pp.
381–402, 1999.
[67] S. Yarrow and P. Seriès, “The influence of population size, noise strength and behavioral task on best-encoded stimulus for neurons with unimodal or monotonic tuning
curves,” Frontiers in computational neuroscience, vol. 9, p. 18, 2015.
[68] A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, and
E. Keogh, “The uea multivariate time series classification archive, 2018,” arXiv preprint
arXiv:1811.00075, 2018.
[69] X. Zhang, Y. Gao, J. Lin, and C.-T. Lu, “Tapnet: Multivariate time series classificationwith attentional prototype network,” in AAAI 2020 : The Thirty-Fourth AAAI
Conference on Artificial Intelligence, 2020.
[70] P. Schäfer and U. Leser, “Multivariate time series classification with weasel+ muse,”
arXiv preprint arXiv:1711.11343, 2017.
[71] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou,
P. Joshi, N. Imam, S. Jain et al., “Loihi: A neuromorphic manycore processor with
on-chip learning,” IEEE Micro, vol. 38, no. 1, pp. 82–99, 2018.
[72] C. Zhao, J. Li, and Y. Yi, “Making neural encoding robust and energy efficient: an advanced analog temporal encoder for brain-inspired computing systems,” in Proceedings
of the 35th International Conference on Computer-Aided Design, 2016, pp. 1–6.

96

[73] C. Zhao, Y. Yi, J. Li, X. Fu, and L. Liu, “Interspike-interval-based analog spike-timedependent encoder for neuromorphic processors,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 25, no. 8, pp. 2193–2205, 2017.
[74] A. A. Lazar and L. T. Tóth, “Time encoding and perfect recovery of bandlimited
signals,” in 2003 IEEE International Conference on Acoustics, Speech, and Signal
Processing, 2003. Proceedings.(ICASSP’03)., vol. 6. IEEE, 2003, pp. VI–709.
[75] D. Wei, “Time based analog to digital converters,” Ph.D. dissertation, Department of
Electrical and Computer Engineering, University of Florida, Gainesville, FL, 2005.
[76] S. Panzeri, J. H. Macke, J. Gross, and C. Kayser, “Neural population coding: combining insights from microscopic and mass signals,” Trends in cognitive sciences, vol. 19,
no. 3, pp. 162–172, 2015.
[77] M. Shamir and H. Sompolinsky, “Implications of neuronal diversity on population
coding,” Neural computation, vol. 18, no. 8, pp. 1951–1986, 2006.
[78] J. Köhn and F. Wörgötter, “Employing the z-transform to optimize the calculation of
the synaptic conductance of nmda and other synaptic channels in network simulations,”
Neural Computation, vol. 10, no. 7, pp. 1639–1651, 1998.
[79] J. Duarte, S. Han, P. Harris, S. Jindariani, E. Kreinar, B. Kreis, J. Ngadiuba,
M. Pierini, R. Rivera, N. Tran et al., “Fast inference of deep neural networks in fpgas
for particle physics,” Journal of Instrumentation, vol. 13, no. 07, p. P07027, 2018.
[80] J. G. Proakis and D. K. Manolakis, Digital Signal Processing (4th Edition).

USA:

Prentice-Hall, Inc., 2006.
[81] P. Li, P. Zhang, L.-N. Pouchet, and J. Cong, “Resource-aware throughput optimization for high-level synthesis,” in Proceedings of the 2015 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, 2015, pp. 200–209.
97

[82] E. Stromatias, D. Neil, F. Galluppi, M. Pfeiffer, S.-C. Liu, and S. Furber, “Scalable
energy-efficient, low-latency implementations of trained spiking deep belief networks
on spinnaker,” in 2015 International Joint Conference on Neural Networks (IJCNN).
IEEE, 2015, pp. 1–8.
[83] D. Neil and S.-C. Liu, “Minitaur, an event-driven fpga-based spiking network accelerator,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22,
no. 12, pp. 2621–2628, 2014.
[84] J. J. Yang, D. B. Strukov, and D. R. Stewart, “Memristive devices for computing,”
Nature nanotechnology, vol. 8, no. 1, pp. 13–24, 2013.
[85] C. Liu, B. Yan, C. Yang, L. Song, Z. Li, B. Liu, Y. Chen, H. Li, Q. Wu, and H. Jiang, “A
spiking neuromorphic design with resistive crossbar,” in 2015 52nd ACM/EDAC/IEEE
Design Automation Conference (DAC). IEEE, 2015, pp. 1–6.
[86] Y. Wang, T. Tang, L. Xia, B. Li, P. Gu, H. Yang, H. Li, and Y. Xie, “Energy efficient
rram spiking neural network for real time classification,” in Proceedings of the 25th
edition on Great Lakes Symposium on VLSI, 2015, pp. 189–194.
[87] S. Schmitt, J. Klähn, G. Bellec, A. Grübl, M. Guettler, A. Hartel, S. Hartmann, D. Husmann, K. Husmann, S. Jeltsch et al., “Neuromorphic hardware in the loop: Training
a deep spiking network on the brainscales wafer-scale system,” in 2017 International
Joint Conference on Neural Networks (IJCNN). IEEE, 2017, pp. 2227–2234.
[88] I. M. Park, S. Seth, A. R. Paiva, L. Li, and J. C. Principe, “Kernel methods on spike
train space for neuroscience: a tutorial,” IEEE Signal Processing Magazine, vol. 30,
no. 4, pp. 149–160, 2013.
[89] C. Liu, Q. Yang, B. Yan, J. Yang, X. Du, W. Zhu, H. Jiang, Q. Wu, M. Barnell, and
H. Li, “A memristor crossbar based computing engine optimized for high speed and

98

accuracy,” in 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
IEEE, 2016, pp. 110–115.
[90] B. Cramer, Y. Stradmann, J. Schemmel, and F. Zenke, “The heidelberg spiking data
sets for the systematic evaluation of spiking neural networks,” IEEE Transactions on
Neural Networks and Learning Systems, 2020.
[91] D. Neil, M. Pfeiffer, and S.-C. Liu, “Phased lstm: Accelerating recurrent network training for long or event-based sequences,” in Advances in neural information processing
systems, 2016, pp. 3882–3890.
[92] D. Neil and S.-C. Liu, “Effective sensor fusion with event-based sensors and deep network architectures,” in 2016 IEEE International Symposium on Circuits and Systems
(ISCAS). IEEE, 2016, pp. 2282–2285.
[93] Y. Bi, A. Chadha, A. Abbas, E. Bourtsoulatze, and Y. Andreopoulos, “Graph-based
object classification for neuromorphic vision sensing,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 491–501.
[94] F. Zenke and T. P. Vogels, “The remarkable robustness of surrogate gradient learning
for instilling complex function in spiking neural networks,” BioRxiv, 2020.
[95] R. Vaila, J. Chiasson, and V. Saxena, “Deep convolutional spiking neural networks for
image classification,” arXiv preprint arXiv:1903.12272, 2019.
[96] B. Yin,

F. Corradi,

and S. M. Bohté,

“Effective and efficient computa-

tion with multiple-timescale spiking recurrent neural networks,” arXiv preprint
arXiv:2005.11633, 2020.
[97] L. R. Iyer, Y. Chua, and H. Li, “Is neuromorphic mnist neuromorphic? analyzing the
discriminative power of neuromorphic datasets in the time domain,” arXiv preprint
arXiv:1807.01013, 2018.
99

[98] M. Hu, H. Li, Y. Chen, Q. Wu, G. S. Rose, and R. W. Linderman, “Memristor crossbarbased neuromorphic computing system: A case study,” IEEE transactions on neural
networks and learning systems, vol. 25, no. 10, pp. 1864–1878, 2014.
[99] M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves, S. Lam, N. Ge,
J. J. Yang, and R. S. Williams, “Dot-product engine for neuromorphic computing:
Programming 1t1m crossbar to accelerate matrix-vector multiplication,” in 2016 53nd
ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, 2016, pp. 1–6.
[100] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in
Proceedings of the IEEE conference on computer vision and pattern recognition, 2016,
pp. 770–778.
[101] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep
convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–
90, 2017.
[102] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale
image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[103] E. Hunsberger and C. Eliasmith, “Spiking deep networks with lif neurons,” arXiv
preprint arXiv:1510.08829, 2015.
[104] Y. Ji, Y. Zhang, S. Li, P. Chi, C. Jiang, P. Qu, Y. Xie, and W. Chen, “Neutrams: Neural network transformation and co-design under neuromorphic hardware constraints,”
in Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on. IEEE, 2016, pp. 1–13.
[105] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural
networks with pruning, trained quantization and huffman coding,” arXiv preprint
arXiv:1510.00149, 2015.

100

[106] A. Ren, T. Zhang, S. Ye, J. Li, W. Xu, X. Qian, X. Lin, and Y. Wang, “Admmnn: An algorithm-hardware co-design framework of dnns using alternating direction
methods of multipliers,” in Proceedings of the Twenty-Fourth International Conference
on Architectural Support for Programming Languages and Operating Systems, 2019, pp.
925–938.

101

Vita
Haowen Fang received the B.S. degree in Telecommunication Engineering in 2013 from
Heilongjiang University, and the M.S. degree from the Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, USA in 2018. He completed his PhD. degree in the Department of Electrical Engineering and Computer Science
of Syracuse University in 2021. His research interests include deep learning, neuromorphic computing, neural network acceleration, and FPGA.

102

