Boosting Throughput and Efficiency of Hardware Spiking Neural Accelerators using Time Compression Supporting Multiple Spike Codes by Xu, Changqing et al.
Changqing Xu1, Wenrui Zhang2, Yu Liu3, Peng Li4∗
1,2,4Department of Electrical & Computer Engineering, University of California, Santa Barbara
3Department of Electrical & Computer Engineering, Texas A&M University
1changqingxu1020@163.com, 2wenruizhang@ucsb.edu, 4lip@ucsb.edu
Abstract
Spiking neural networks (SNNs) are the third generation of neural networks and can exploit both rate and temporal coding for energy-efficient event-driven computation. However, the decision accuracy of existing SNN designs is contingent upon processing a large number of spikes over a long period. Meanwhile, the switching power of SNN hardware accelerators is proportional to the number of spikes processed, while the length of spike trains limits throughput and static power efficiency. This paper presents the first study on developing temporal compression to significantly boost throughput and reduce energy dissipation of digital hardware SNN accelerators while being applicable to multiple spike codes. The proposed compression architectures consist of low-cost input spike compression units, novel input-and-output-weighted spiking neurons, and reconfigurable time constant scaling to support large and flexible time compression ratios. Our compression architectures can be transparently applied to any given pre-designed SNN employing either rate or temporal codes while incurring minimal modification of the neural models, learning algorithms, and hardware design. Using spiking speech and image recognition datasets, we demonstrate the feasibility of supporting large time compression ratios of up to 16×, delivering up to 15.93×, 13.88×, and 86.21× improvements in throughput, energy dissipation, and the tradeoff between hardware area, runtime, energy, and classification accuracy, respectively, across different spike codes on a Xilinx Zynq-7000 FPGA. These results are achieved while incurring little extra hardware overhead.
Introduction
Spiking neural networks (SNNs) closely emulate the spiking behaviors of biological brains (Ponulak and others 2011). Moreover, the event-driven nature of SNNs offers the potential to achieve high computational and energy efficiency on hardware neuromorphic computing systems (Merolla and others 2014; Furber and others 2014). For instance, processing a single spike may consume only a few pJ of energy on recent neuromorphic chips such as IBM's TrueNorth (Merolla and others 2014) and Intel's Loihi (Davies and others 2018).
SNNs support various rate and temporal spike codes, among which rate coding using Poisson spike trains is popular. In that case, however, the low-power advantage of SNNs may be offset by long latency, during which many spikes must be processed to ensure decision accuracy. Various temporal codes have been explored to improve the efficiency of information representation (Thorpe and others 2001; Kayser and others 2009; Kim and others 2018; Thorpe and others 1990; Izhikevich and others 2002). Time-to-first-spike coding encodes information in the arrival time of the first spike (Thorpe and others 2001). Phase coding (Kayser and others 2009) encodes information in a spike by its phase relative to a periodic reference signal (Kim and others 2018). No code is considered universally optimal thus far; the achievable latency and spike-count reduction of a particular code can vary widely with network structure and application.
∗Corresponding Email: lip@ucsb.edu
Figure 1: Proposed general time compression for SNNs (time compression: neural computation on a faster time scale).
Rather than advocating a particular code, we focus, for the first time, on an orthogonal problem: temporal compression that applies to any given SNN (accelerator) and spike code to boost throughput and energy efficiency. We propose a general compression technique that preserves both the spike count and the temporal characteristics of the original SNN with low information loss, as shown in Fig. 1. It transparently compresses the duration of the spike trains, and hence the classification latency, on top of an existing rate or temporal code. More broadly, this work extends the notion of weight/model pruning and compression in DNN accelerators from the spatial domain to the temporal domain.
The contributions of this paper include: 1) the first general time-compression technique that transparently compresses the spike train duration of a given SNN and achieves large latency reduction on top of the spike code that comes with the SNN; 2) four key ideas that facilitate the proposed time compression: spike train compression using a weighted representation, a new family of input-output-weighted (IOW) spiking neural models for processing time-compressed spike trains under multiple spike codes, scaling of the time constants that define neural, synaptic, and learning dynamics, and low-cost support of flexible compression ratios (powers of two or not) using time averaging; 3) low-overhead hardware modifications that let a given SNN accelerator operate on a compressed time scale while preserving the spike counts and temporal behaviors in inference and training; and 4) a time-compressed SNN (TC-SNN) accelerator architecture and its programmable variant (PTC-SNN), which operate over a wide range of (programmable) compression ratios and achieve significantly improved latency, energy efficiency, and tradeoffs between latency, energy, and classification accuracy.
arXiv:1909.04757v1 [cs.NE] 10 Sep 2019
We demonstrate the proposed TC-SNN and PTC-SNN compression architectures by realizing several liquid state machine (LSM) spiking neural accelerators with time compression ratios of up to 16:1 on a Xilinx Zynq-7000 FPGA. Using the TI46 Speech Corpus (Liberman and others 1991), the CityScape image recognition dataset (Cordts and others 2016), and the N-TIDIGITS18 dataset (Anumula and others 2018), we show that these large compression ratios deliver up to 15.93×, 13.88×, and 86.21× improvements in throughput, energy dissipation, and the tradeoff between hardware area, runtime, energy, and classification accuracy, respectively, under various spike coding mechanisms including burst coding (Park and others 2019). These results are achieved while incurring little extra hardware overhead.
Proposed Time-Compressed Neural Computation
This work aims to enable time-compressed neural computation that preserves the spike counts and temporal behaviors in inference and training of a given SNN while significantly improving latency, energy efficiency, and tradeoffs between latency, energy, and classification accuracy. We develop four techniques for this objective: 1) spike train compression using a weighted representation, 2) a new family of input-output-weighted (IOW) spiking neural models processing time-compressed spike trains for multiple spike codes, 3) scaling of time constants of neural, synaptic, and learning dynamics, and 4) low-cost support of flexible compression ratios (powers of two or not) using time averaging.
Spike Train Compression in Weighted Form
We time-compress a given spiking neural network first by shrinking the duration of its input spike trains. To support large compression ratios, and hence significant latency reductions, we represent the compressed input trains in a weighted form. Typical binary spike trains with temporal sparsity may be time-compressed into another binary spike train of shorter duration. However, as shown in Fig. 2, the spike count and temporal characteristics of the uncompressed train can be preserved only under a small compression ratio bounded by the minimal interspike interval. More aggressive compression merges multiple adjacent spikes into a single spike, significantly altering the firing count and the temporally coded information. This severely limits the achievable compression.
Instead, we propose a new weighted form for representing compressed spike trains, in which multiple adjacent binary spikes are compressed into a single weighted spike whose weight equals the number of binary spikes combined, preserving spike information even under very large compression ratios (Fig. 2).
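The difference between the two compressed forms in Fig. 2 can be illustrated with a minimal Python sketch. It assumes a raw train is a list of 0/1 values, one per uncompressed time step, and that a window of γ steps maps to one compressed step; the function names `compress_binary` and `compress_weighted` are hypothetical and introduced only for illustration.

```python
from typing import List


def compress_binary(train: List[int], gamma: int) -> List[int]:
    """Binary form: any spike inside a window of gamma steps becomes a single spike.

    Adjacent spikes that fall in the same window are merged, so the count can shrink.
    """
    return [int(any(train[i:i + gamma])) for i in range(0, len(train), gamma)]


def compress_weighted(train: List[int], gamma: int) -> List[int]:
    """Weighted form: each window becomes one spike whose weight is its spike count."""
    return [sum(train[i:i + gamma]) for i in range(0, len(train), gamma)]


raw = [0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]  # 6 spikes over 16 steps
gamma = 4
binary = compress_binary(raw, gamma)
weighted = compress_weighted(raw, gamma)
print(binary, sum(binary))      # [1, 1, 0, 1] 3  -> spike information lost
print(weighted, sum(weighted))  # [2, 3, 0, 1] 6  -> spike count preserved
```

Both forms shorten the train by a factor of γ; only the weighted form keeps the total spike count, at the cost of a multi-bit value per compressed time step.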
Figure 2: Binary vs. (compressed) weighted spike trains. (A raw spike train without time compression is shortened to 1/γ of its length for compression ratio γ, with Δtc = γΔt; the compressed binary form loses spike information, while the compressed weighted form preserves it.)
Input-Output-Weighted (IOW) Spiking Neurons
As such, each spiking neuron processes the received input spike trains in the weighted form. Furthermore, as shown in Fig. 3, under large compression ratios the membrane potential of a spiking neuron may rise far above the firing threshold voltage within a single time step as a result of receiving input spikes with large weights. In this case, outputting spike trains in the standard binary form can lead to significant loss of input information, translating into large performance loss, as we demonstrate in our experimental results. Instead, we propose a new family of input-output-weighted (IOW) spiking neural models, which take input spike trains in the weighted form and produce output spike trains in the same weighted form, where the multi-bit weight of each output spike reflects the amplitude of the membrane potential as a multiple of the firing threshold. Spiking neuron models such as the leaky integrate-and-fire (LIF) model and other models supporting various spike codes can be converted to their IOW counterparts with streamlined, low-overhead modifications, as detailed later.
Figure 3: Binary vs. weighted output spikes. (Within one time step, Vmem may exceed Vth, 2Vth, or 3Vth; a standard spiking neuron emitting binary output spikes loses input information, whereas an IOW spiking neuron preserves it through a multi-bit output spike weight of 1, 2, or 3.)
Scaling of Time Constants of SNN Dynamics
The proposed compression is general in the sense that it preserves the spike counts and temporal behaviors in the neural dynamics, synaptic responses, and learning dynamics employed in the given SNN, so that compression introduces no substantial alteration other than making the time-compressed SNN effectively operate on a faster time scale. The dynamics of the cell membrane is typically specified by a membrane time constant τm, which controls the process of action potential (spike) generation and influences the information processing of each spiking neuron (Gerstner and others 2002). Synaptic models also play an important role in an SNN and may be specified by one or multiple time constants, translating received spike inputs into a continuous synaptic current waveform based on dynamics of a particular order (Gerstner and others 2002). Finally, spike traces, or temporal variables filtered with a specific time constant, may be used to implement spike-dependent learning rules (Thorpe and others 2001; Zhang and others 2015).
Maintaining the key spiking and temporal characteristics of the neural, synaptic, and learning processes is favorable because: 1) the time-compressed SNN attains essentially the same dynamic behavior as before, so its classification performance should be similar to that of the uncompressed SNN, i.e., no large performance degradation is expected from time compression; and 2) the deployed learning rules need no modification, and the same rules can effectively train the time-compressed SNN. Attaining this goal requires properly scaling the time constants associated with these processes as a function of the time compression ratio, as shown in Fig. 4.
Figure 4: Scaling of time constants of SNN dynamics. (Without time compression, the synapse model, neuron, and learning rule use the time constants τs1, τs2, τm = RC, and τc with time step Δt; with compression ratio γ and time step Δtc = γΔt, they are scaled to τs1,c, τs2,c, τm,c, and τc,c.)
Without loss of generality, consider the decaying first-order dynamics $\dot{x}(t) = -x(t)/\tau$ with time constant $\tau$. For digital hardware implementation, forward Euler discretization may be adopted to discretize the dynamics over time:
$$X(t+\Delta t) = X(t)\left(1-\frac{\Delta t}{\tau}\right) = X(t)\left(1-\frac{1}{\tau_{nom}}\right), \qquad (1)$$
where $\Delta t$ is the discretization step size and $\tau_{nom} = \tau/\Delta t$ is the normalized time constant used in the digital hardware implementation. Now denote the target time compression ratio by $\gamma$ ($\gamma \ge 1$). The discretization step size with time compression is $\Delta t_c = \gamma \Delta t$, i.e., one time step of the time-compressed SNN equals $\gamma$ time steps of the uncompressed SNN. Based on (1), discretizing the first-order dynamics with time compression for one step gives:
$$X(t+\Delta t_c) = X(t)\left(1-\frac{1}{\tau_{nom,c}}\right) = X(t)\left(1-\frac{1}{\tau_{nom}}\right)^{\gamma}, \qquad (2)$$
where $\tau_{nom,c}$ is the normalized time constant with compression. Linearly scaling it as $\tau_{nom,c} = \tau_{nom}/\gamma$ amounts to the approximation $X(t+\Delta t_c) \approx X(t)\left(1-\frac{1}{\tau_{nom}/\gamma}\right)$, which keeps only the first-order term of the binomial expansion of (2) and produces large errors when $\gamma \gg 1$. Instead, we obtain an accurate $\tau_{nom,c}$ directly from (2):
$$\tau_{nom,c} = \frac{1}{1-\left(1-\frac{1}{\tau_{nom}}\right)^{\gamma}}.$$
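The gap between linear and exact scaling can be checked numerically with a short Python sketch of (1) and (2). The helper names are hypothetical, and the sketch assumes a floating-point model of the per-step decay factor $1 - 1/\tau$ rather than fixed-point hardware.

```python
def tau_linear(tau_nom: float, gamma: int) -> float:
    """Naive linear scaling: tau_nom,c = tau_nom / gamma."""
    return tau_nom / gamma


def tau_exact(tau_nom: float, gamma: int) -> float:
    """Scaling from (2): one compressed step decays like gamma uncompressed steps."""
    return 1.0 / (1.0 - (1.0 - 1.0 / tau_nom) ** gamma)


def decay_factor(tau: float) -> float:
    """Per-step forward-Euler decay factor (1 - 1/tau) from (1)."""
    return 1.0 - 1.0 / tau


tau_nom = 16.0
for gamma in (2, 4, 8, 16):
    target = decay_factor(tau_nom) ** gamma          # gamma uncompressed steps
    lin = decay_factor(tau_linear(tau_nom, gamma))   # one compressed step, linear scaling
    exact = decay_factor(tau_exact(tau_nom, gamma))  # one compressed step, exact scaling
    print(f"gamma={gamma:2d} target={target:.4f} linear={lin:.4f} exact={exact:.4f}")
```

For $\tau_{nom} = 16$ and $\gamma = 16$, the target decay over one compressed step is $(15/16)^{16} \approx 0.356$, whereas linear scaling gives $\tau_{nom,c} = 1$ and a decay factor of 0, i.e., the state is wiped out every step; for $\gamma > \tau_{nom}$ the linear factor even turns negative.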
Flexible Compression Ratios using Time Averaging
Digital multipliers and dividers are costly in area and power dissipation. Normalized time constants in a digital SNN hardware accelerator are therefore typically set to a power of 2, i.e., $\tau_{nom} = 2^K$, so that the dynamics can be implemented efficiently by a shifter rather than by expensive multipliers and dividers (Zhang and others 2015). However, it may be desirable to choose the compression ratio, and thus scale each time constant, over a wide integer range, e.g., within {1, 2, 3, ..., 16}. In this case, a scaled normalized time constant $\tau_{nom,c}$ may not be a power of 2. For example, $\tau_{nom,c} = 10$ lies between its two nearest powers of 2, namely 8 and 16, and setting $\tau_{nom,c}$ to either of them would lead to large errors.
We propose a novel time averaging approach to address this problem (Fig. 5). For a given scaled normalized time constant $\tau_{nom,c}$, we find its two adjacent powers of 2, $2^{K_2} \le \tau_{nom,c} \le 2^{K_1}$. We decay the targeted first-order dynamics by toggling its scaled normalized time constant between the two values $2^{K_2}$ and $2^{K_1}$. Since each of them is a power of two, the corresponding decay can be realized efficiently using a shifter. The usage frequencies of $2^{K_2}$ and $2^{K_1}$ are chosen such that the time-averaged time constant equals the desired $\tau_{nom,c}$. Fig. 5 shows how a time-averaged (normalized) time constant of 5 is achieved by averaging between the two time constant values 4 and 8.
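A minimal Python sketch of the time averaging idea follows, assuming an integer target $\tau_{nom,c}$ and an arithmetic average of the time constant values, which is consistent with Fig. 5 (three uses of 4 and one use of 8 average to 5). The names `averaging_schedule` and `shift_decay` are hypothetical; the schedule is grouped rather than interleaved for simplicity, although hardware would likely spread the two values evenly over time.

```python
from fractions import Fraction
from typing import List


def averaging_schedule(tau_target: int) -> List[int]:
    """One period of power-of-two time constants whose arithmetic mean is tau_target.

    Uses 2**K2 <= tau_target <= 2**K1 with K1 = K2 + 1 (hypothetical helper).
    """
    if tau_target < 1:
        raise ValueError("tau_target must be a positive integer")
    low = 1 << (tau_target.bit_length() - 1)   # 2**K2
    if low == tau_target:                      # already a power of two
        return [low]
    high = low << 1                            # 2**K1
    # Fraction of steps that must use `low` so that the mean equals tau_target.
    p_low = Fraction(high - tau_target, high - low)
    n_low, period = p_low.numerator, p_low.denominator
    return [low] * n_low + [high] * (period - n_low)


def shift_decay(x: int, tau_pow2: int) -> int:
    """x * (1 - 1/tau) for tau = 2**k, using only a right shift and a subtraction."""
    k = tau_pow2.bit_length() - 1
    return x - (x >> k)


schedule = averaging_schedule(5)
print(schedule)   # [4, 4, 4, 8]
x = 1 << 12
for step in range(8):
    x = shift_decay(x, schedule[step % len(schedule)])
print(x)
```

The same helper covers the $\tau_{nom,c} = 10$ example above: it alternates 8 and 16 in a 3:1 ratio, and every decay step still needs only a shifter.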
Figure 5: Time-averaged time constants: the realized averaged time constant is 5. (The plot shows the time constant over time steps, toggling between 4 and 8 with N1 = 3.)
Proposed Input-and-Output-Weighted (IOW) Spiking Neural Models
Any given spiking neural model can be converted into its input-and-output-weighted (IOW) counterpart through straightforward, low-overhead modifications. Without loss of generality, we consider the conversion of two models: the standard leaky integrate-and-fire (LIF) neuron model, which has been widely used in many SNNs including those based on rate coding, and one of its variants that supports burst coding.
IOW Neurons based on Standard LIF Model
The LIF model dynamics is (Gerstner and others 2002):
$$\tau_m \frac{du}{dt} = -u(t) + R\,I(t), \qquad (3)$$
where $u(t)$ is the membrane potential, $\tau_m = RC$ is the membrane time constant, and $I(t)$ is the total received postsynaptic current given by:
$$I(t) = \sum_i w_i \sum_f \alpha\left(t - t_i^{(f)}\right), \qquad (4)$$
where $w_i$ is the synaptic weight from presynaptic neuron $i$, $t_i^{(f)}$ is the time of the $f$-th spike of neuron $i$, $\alpha(t) = \frac{q}{\tau_s}\exp\left(-\frac{t}{\tau_s}\right)H(t)$ for a first-order synaptic model with time constant $\tau_s$, $H(t)$ is the Heaviside step function, and $q$ is the total charge injected into the postsynaptic neuron through a synapse of weight 1. In this work, we adopt a somewhat more complex second-order synaptic model for improved performance.
Once the membrane potential reaches the firing threshold $u_{th}$, an output spike is generated and the membrane potential is reset according to:
$$\lim_{\delta \to 0,\ \delta > 0} u\left(t^{(f)}+\delta\right) = u\left(t^{(f)}\right) - u_{th}, \qquad (5)$$
where $t^{(f)}$ is the firing time.
Because of time compression, IOW LIF neurons process weighted input spikes, with the modified synaptic input:
$$I(t) = \sum_i w_i \sum_f \omega_{spike,i}^{f}\, \alpha\left(t - t_i^{(f)}\right), \qquad (6)$$
where a weight $\omega_{spike,i}^{f}$ is introduced for each input spike.
IOW LIF neurons also generate weighted output spikes. Following Fig. 3, we introduce a set of firing thresholds $\{u_{th}, 2u_{th}, \ldots, n u_{th}\}$, each a multiple of the original threshold $u_{th}$. At each time step $t$, an output spike is generated whenever the membrane potential reaches any firing threshold in the set, and the weight of the output spike is determined by the highest threshold crossed. For example, when $k u_{th} \le u(t) < (k+1) u_{th}$, the output spike weight is set to $k$. Upon firing, the membrane potential is reset according to:
$$\lim_{\delta \to 0,\ \delta > 0} u\left(t^{(f)}+\delta\right) = \begin{cases} u\left(t^{(f)}\right) - u_{th}, & u_{th} \le u\left(t^{(f)}\right) < 2u_{th} \\ u\left(t^{(f)}\right) - 2u_{th}, & 2u_{th} \le u\left(t^{(f)}\right) < 3u_{th} \\ \quad\vdots & \quad\vdots \\ u\left(t^{(f)}\right) - n\,u_{th}, & u\left(t^{(f)}\right) \ge n\,u_{th} \end{cases} \qquad (7)$$
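The weighted firing and reset rule (7) can be sketched for one discrete time step in Python. This is a minimal illustration, not the paper's implementation: it assumes a zeroth-order synapse (the per-step input is simply $\sum_i w_i\,\omega_{spike,i}$) instead of the second-order model used in this work, and the names `iow_lif_step`, `run`, `leak`, and `n_max` are hypothetical. Setting `n_max=1` approximates a binary-output neuron for comparison.

```python
from typing import List, Tuple


def iow_lif_step(u: float, i_in: float, u_th: float = 1.0,
                 n_max: int = 16, leak: float = 0.0) -> Tuple[float, int]:
    """One discrete time step of an IOW LIF neuron (zeroth-order synapse assumed).

    i_in: weighted synaptic input for this step, i.e. sum_i w_i * omega_spike_i.
    leak: 1 / tau_nom,c; 0.0 disables leakage.
    Returns (membrane potential after reset, output spike weight k).
    """
    u = u * (1.0 - leak) + i_in
    # Highest threshold crossed among {u_th, 2*u_th, ..., n_max*u_th}.
    k = min(int(u // u_th), n_max) if u >= u_th else 0
    return u - k * u_th, k     # reset rule (7): subtract k * u_th


def run(inputs: List[float], n_max: int) -> Tuple[List[int], float]:
    """Feed a sequence of per-step inputs; return output weights and final potential."""
    u, weights = 0.0, []
    for i_in in inputs:
        u, k = iow_lif_step(u, i_in, n_max=n_max)
        weights.append(k)
    return weights, u


inputs = [3.5, 0.0, 2.25, 0.25]   # weighted input per compressed step
print(run(inputs, n_max=16))      # ([3, 0, 2, 1], 0.0): all input converted to output weight
print(run(inputs, n_max=1))       # ([1, 1, 1, 1], 2.0): binary output leaves charge behind
```

With binary output, charge that arrives faster than one spike per step accumulates in the membrane, which mirrors the information loss illustrated in Fig. 3.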
IOW Neurons based on Bursting LIF Model
The LIF model for burst coding is also based on (3) (Park and others 2019). A bursting function $g_i(t)$ is introduced to implement the bursting behavior for each presynaptic neuron $i$ (Park and others 2019):
$$g_i(t) = \begin{cases} \beta\, g_i(t-\Delta t), & \text{if } E_i(t-\Delta t) = 1 \\ 1, & \text{otherwise} \end{cases} \qquad (8)$$
where $\beta$ is a burst constant, and $E_i(t-\Delta t) = 1$ if presynaptic neuron $i$ fired at the previous time step and $E_i(t-\Delta t) = 0$ otherwise. We assume a zeroth-order synaptic response model. For input spikes from presynaptic neuron $i$, the firing threshold voltage is modified from $u_{th}$ to $g_i(t)\,u_{th}$, and the corresponding reset of the membrane potential after firing is:
$$\lim_{\delta \to 0,\ \delta > 0} u\left(t^{(f)}+\delta\right) = u\left(t^{(f)}\right) - g_i\left(t^{(f)}\right) u_{th}. \qquad (9)$$
Furthermore, the total postsynaptic current is:
$$I(t) = \sum_i w_i \sum_f g_i(t)\, \alpha\left(t - t_i^{(f)}\right). \qquad (10)$$
To implement the IOW version of the LIF model with burst coding, we modify the burst function to:
$$g_i(t) = \begin{cases} \beta\, \omega_{spike,i}(t)\, g_i(t-\Delta t), & \text{if } E_i(t-\Delta t) = 1 \\ 1, & \text{otherwise} \end{cases} \qquad (11)$$
As in the IOW LIF model, we use a set of firing thresholds to determine the weight of each output spike and a reset behavior similar to (7). The only difference is that the adopted set of firing thresholds is $\{g_i(t)u_{th}, 2g_i(t)u_{th}, \ldots, n\,g_i(t)u_{th}\}$.
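A literal Python transcription of the burst function (11) and the scaled threshold set is sketched below. It assumes that $\omega_{spike,i}(t)$ is the weight of the spike that neuron $i$ emits at time $t$, which is one reading of (11), and it uses a single threshold scale $g$ per call; the names `burst_update` and `burst_iow_fire` are hypothetical.

```python
from typing import Tuple


def burst_update(g_prev: float, fired_prev: bool, omega_spike: int, beta: float) -> float:
    """IOW burst function (11): grow by beta * omega_spike while neuron i keeps firing."""
    return beta * omega_spike * g_prev if fired_prev else 1.0


def burst_iow_fire(u: float, g: float, u_th: float, n_max: int) -> Tuple[float, int]:
    """Weighted output with thresholds {g*u_th, ..., n_max*g*u_th}; reset as in (7)."""
    th = g * u_th
    k = min(int(u // th), n_max) if u >= th else 0
    return u - k * th, k


g = 1.0
for fired, omega in [(True, 1), (True, 2), (False, 3)]:
    g = burst_update(g, fired, omega, beta=2.0)
    print(g)   # 2.0, then 8.0, then 1.0 (burst ends, function resets)
```

Because $g_i(t)u_{th}$ is generally not a power of two, the threshold products in this sketch correspond to the multiplier mentioned later for the hardware burst-coding neuron.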
Time-Compressed SNN Accelerator Architectures
The proposed time compression technique can be employed to support either a fixed time compression ratio or a user-programmable time compression ratio, leading to the time-compressed SNN (TC-SNN) and programmable time-compressed SNN (PTC-SNN) architectures, respectively. We describe the more general PTC-SNN architecture, shown in Fig. 6. It can be adopted by any pre-designed SNN hardware accelerator to add programmable time compression. PTC-SNN introduces three streamlined additions and minor modifications to the embedded SNN accelerator to enable application- and coding-independent time compression.
Based on the discussion presented in Section 2, first, a set of input spike compression units (ISCUs), one for each input spike channel, is incorporated into the input layer of the SNN. ISCUs convert the raw binary input spike trains into the more compact weighted form with shortened time duration. A user-specified command sets the time compression ratio of all ISCUs through the Global Compression Controller. ISCUs compress the given spike channels without assuming sparsity of the input spike trains and can support large compression ratios. Second, with modest added hardware overhead, all original hardware spiking neurons are replaced by input-output-weighted neuron elements (IOW-NEs). Finally, all time constants in the SNN are scaled based on the time compression ratio. Although an SNN may employ a large number of time constants, they can all be scaled in the same way, allowing one common, simple programmable logic unit, i.e., the Global Compression Controller, to scale all time constants according to a user-specified compression ratio command.
Figure 6: Proposed time-compressed SNN architecture with programmable compression ratio (PTC-SNN). ISCU: input spike compression unit; SIPO: serial-in and parallel-out; IOW-NE: input-output-weighted spiking neuron element; SP: synapse response; NE: regular binary-input-output neuron element; Vm: membrane potential. The LUT enables programmable scaling of time constants of the neuron/synaptic models and the learning unit. (Diagram: uncompressed spike inputs pass through ISCUs into input, hidden, and output layers of IOW-NEs; the Global Compression Controller distributes the compression ratio command; each IOW-NE contains weighted input processing, SP update, Vm update, weighted spike generation, a time constant scaling LUT for τs, τm, and τc, an RNG, and a learning unit.)
[Input Spike Compression Unit (ISCU)] Each input spike channel is compressed by one low-cost ISCU according to the user-specified compression ratio $N_{cmp}$. When an uncompressed spike input channel is fed by a single binary serial input, the ISCU uses a demultiplexer to perform a reconfigurable serial-in and parallel-out (SIPO) operation that converts the serial input into $N_{cmp}$ parallel outputs, as shown in Fig. 7(a). If the input spike channel is supplied with parallel spike data, the SIPO operation is skipped. During each clock cycle, an adder sums the $N_{cmp}$ parallel output bits, effectively combining these spikes into a single weighted spike whose weight is the adder output. No spikes are lost, since the sum of the spike weights equals the total number of binary spikes in the raw input spike train. The global temporal spike distribution of the input spike train is preserved up to the temporal resolution of the compressed spike train.
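The ISCU behavior can be modeled cycle by cycle in Python. This is a behavioral sketch under stated assumptions, not RTL: the class `ISCU`, its `clock` and `set_ratio` methods, and `compress_stream` are hypothetical names; `set_ratio` stands in for the Global Compression Controller's ratio command and clears the SIPO register; and trailing bits that do not fill a full group are simply not flushed. The example input is the 16-bit train that appears in Fig. 7(a).

```python
from typing import List, Optional


class ISCU:
    """Cycle-level behavioral model of one input spike compression unit."""

    def __init__(self, n_cmp: int):
        self.set_ratio(n_cmp)

    def set_ratio(self, n_cmp: int) -> None:
        # Stands in for the compression ratio command; clears the SIPO register.
        if n_cmp < 1:
            raise ValueError("compression ratio must be >= 1")
        self.n_cmp = n_cmp
        self.sipo: List[int] = []    # demultiplexed parallel lines

    def clock(self, bit: int) -> Optional[int]:
        """Shift in one raw binary spike; emit a weighted spike every n_cmp cycles."""
        self.sipo.append(bit)
        if len(self.sipo) < self.n_cmp:
            return None
        weight = sum(self.sipo)      # adder combines the parallel bits
        self.sipo = []
        return weight


def compress_stream(bits: str, n_cmp: int) -> List[int]:
    """Run a string of '0'/'1' raw spikes through one ISCU."""
    unit = ISCU(n_cmp)
    out = []
    for b in bits:
        w = unit.clock(int(b))
        if w is not None:
            out.append(w)
    return out


print(compress_stream("1011000101000010", 4))   # [3, 1, 1, 1]
```

The sum of the emitted weights equals the number of ones in the raw train, which is the no-spike-loss property stated above.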
[Input-Output-Weighted (IOW) Neuron Elements] We now discuss efficient hardware realization of the IOW spiking neural models (Section 3). The IOW neuron element (IOW-NE), shown in Fig. 7(b), consists of a synaptic unit (SU), a neural unit (NU), and a time constant configuration module, described later. SU realizes a discretized version of (6). As in many practical hardware SNN implementations, each $\omega_i$ is constrained to the form $2^K$, so the product $\omega_{spike,i} \cdot \omega_i$ is efficiently realized by left-shifting $\omega_{spike,i}$ by $K$ bits. NU performs the membrane potential $u(t)$ update based on the discretization of (3) and the reset behavior (7), and generates a weighted output spike when $u(t)$ exceeds a threshold in the firing threshold set $\{u_{th}, 2u_{th}, \ldots\}$.
Figure 7: (a) ISCU with 4:1 time compression, and (b) LIF IOW neuron: SU - synaptic unit, NU - neural unit. (Panel (a): a demultiplexer splits the input spike train 1011000101000010 into four parallel lines, and an adder produces the compressed weighted spike train 3 1 1 1. Panel (b): weighted spikes are shifted according to the synaptic weight ωi and the time constants, the membrane potential Vm is compared against the thresholds uth to nuth to produce weighted output spikes, and a time constant configuration LUT sets the shift amounts according to the compression ratio.)
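A fixed-point Python sketch of one IOW-NE clock cycle is shown below, combining the shift-based weighting in SU with a shift-based leak and a comparator bank in NU. It assumes a zeroth-order synapse (the paper's SU uses a second-order synaptic response), non-negative power-of-two synaptic weights, and a normalized membrane time constant of $2^{k_m}$; the function name and parameters are hypothetical.

```python
from typing import List, Tuple


def iow_ne_step(u: int, spike_weights: List[int], log2_syn_w: List[int],
                k_m: int, u_th: int, n_max: int) -> Tuple[int, int]:
    """One clock of a fixed-point IOW-NE (zeroth-order synapse assumed).

    spike_weights: omega_spike,i of the incoming weighted spikes (0 = no spike).
    log2_syn_w:    K_i such that the synaptic weight omega_i = 2**K_i.
    k_m:           normalized membrane time constant is 2**k_m.
    Returns (membrane potential after reset, output spike weight).
    """
    # SU: omega_spike,i * omega_i realized as a left shift by K_i.
    syn = sum(w << k for w, k in zip(spike_weights, log2_syn_w))
    # NU: shift-based leak, u * (1 - 2**-k_m), then integrate the synaptic input.
    u = u - (u >> k_m) + syn
    # Comparator bank: highest j in 1..n_max with u >= j * u_th.
    k = 0
    for j in range(1, n_max + 1):
        if u >= j * u_th:
            k = j
    return u - k * u_th, k


u, k = iow_ne_step(0, [3, 1], [2, 0], k_m=4, u_th=4, n_max=8)
print(u, k)   # 1 3
```

In this example the synaptic input is $3 \cdot 2^2 + 1 \cdot 2^0 = 13$, which crosses $u_{th}$, $2u_{th}$, and $3u_{th}$ for $u_{th} = 4$, so a spike of weight 3 is emitted and 1 is left in the membrane.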
The design of IOW LIF neurons with burst coding is almost identical to that of the IOW LIF neurons, except for the following differences. We add a LUT to store the set of firing thresholds $\{g_i(t)u_{th}, 2g_i(t)u_{th}, \ldots\}$, which are calculated based on (11). Because $g_i(t)u_{th}$ might not be in the form of $2^K$, a multiplier is used to compute the product $g_i(t) \cdot u_{th} \cdot \omega_i \cdot \omega_{spike,i}$.
Experimental Evaluations
The proposed time-compressed SNN (TC-SNN) architecture with a fixed compression ratio and the more general PTC-SNN architecture with a user-programmable compression ratio can be adopted to redesign any given digital SNN accelerator into a time-compressed SNN accelerator in a highly streamlined manner, with low additional design overhead. For demonstration purposes, we show how an existing liquid state machine (LSM) SNN accelerator can be redesigned into a TC-SNN and a PTC-SNN on a Xilinx Zynq-7000 FPGA. The LSM is a recurrent spiking neural network model. With its spatiotemporal computing power, it has demonstrated promising performance in various applications (Maass and others 2002).
Three speech/image recognition datasets are adopted for benchmarking. The first dataset is a subset of the TI46 speech corpus (Liberman and others 1991) and consists of 260 isolated spoken English letters recorded by a single speaker. The time-domain speech examples are preprocessed by Lyon's passive ear model (Lyon and others 1982) and transformed into 78-channel spike trains using the BSA spike encoding algorithm (Schrauwen and others 2003). The second one is the CityScape dataset (Cordts and others 2016), which contains 18 classes of 1,080 images of semantic urban scenes taken in several European cities. Each image is segmented, remapped to a size of 15 × 15, and then converted into 225 Poisson spike trains whose mean firing rates are proportional to the corresponding pixel intensities. The third one is a subset of the N-TIDIGITS18 speech dataset (Anumula and others 2018), which is obtained by playing the audio files from the TIDIGITS dataset to a CochleaAMS1b sensor. This dataset contains 10 classes of single digits (the digits 0 to 9). There are 111 male and 114 female speakers in the dataset, and 2,250 training and 2,250 testing examples. For the first two datasets, we use 80% of the examples for training and the remaining 20% for testing. The three datasets present two different types of tasks, i.e., speech vs. image classification, and are based on three different raw input encoding schemes, i.e., BSA encoding, Poisson-based rate coding, and CochleaAMS1b sensor-based coding. Therefore, they are well suited for testing the generality of the proposed time compression.
Table 1: Comparison of the baseline and TC-SNN accelerators with IW/IOW LIF neurons based on TI46 Speech Corpus.
Compression ratio | Neuron model | Accuracy | LUT | FF | Power (W) @ 50 MHz | Runtime (s) (normalized) | Runtime speedup | Energy (J) (normalized) | Energy reduction ratio | Normalized ATEL
baseline | LIF | 96.15% | 57326 | 18200 | 0.073 | 1.991 (100%) | 1.00x | 0.145 (100%) | 1.00x | 100%
2:1 | IW-LIF | 96.15% | 58497 | 18460 | 0.077 | 0.995 (49.97%) | 2.00x | 0.077 (52.71%) | 1.88x | 26.68%
2:1 | IOW-LIF | 96.15% | 60096 | 18532 | 0.086 | 0.995 (49.97%) | 2.00x | 0.086 (58.87%) | 1.69x | 30.72%
3:1 | IW-LIF | 92.31% | 58762 | 18782 | 0.080 | 0.664 (33.35%) | 3.00x | 0.053 (36.55%) | 2.74x | 24.98%
3:1 | IOW-LIF | 92.31% | 61162 | 18799 | 0.092 | 0.664 (33.35%) | 3.00x | 0.061 (42.03%) | 2.38x | 29.74%
4:1 | IW-LIF | 92.31% | 58910 | 18753 | 0.081 | 0.499 (25.06%) | 3.99x | 0.036 (27.81%) | 4.03x | 14.31%
4:1 | IOW-LIF | 92.31% | 61313 | 18923 | 0.095 | 0.499 (25.06%) | 3.99x | 0.047 (32.62%) | 3.09x | 17.40%
8:1 | IW-LIF | 80.77% | 59210 | 19087 | 0.083 | 0.248 (12.46%) | 8.03x | 0.021 (14.16%) | 6.90x | 9.12%
8:1 | IOW-LIF | 86.54% | 62548 | 19098 | 0.099 | 0.248 (12.46%) | 8.03x | 0.025 (16.89%) | 5.80x | 7.98%
16:1 | IW-LIF | 69.23% | 59400 | 20000 | 0.117 | 0.125 (6.28%) | 15.93x | 0.015 (10.06%) | 9.67x | 5.28%
16:1 | IOW-LIF | 80.77% | 65349 | 20808 | 0.134 | 0.125 (6.28%) | 15.93x | 0.017 (11.52%) | 8.53x | 4.12%
The baseline LSM FPGA accelerator (without compression) built in this paper is based on the standard LIF model and consists of an input layer, a recurrent reservoir, and a readout layer. The number of input neurons is set by the number of input spike trains, which is 78, 225, and 64 for the TI46, CityScape, and N-TIDIGITS18 datasets, respectively. The reservoir has 135 neurons for the TI46 and CityScape datasets and 300 neurons for the N-TIDIGITS18 dataset. The reservoir neurons are fully connected to the readout neurons. All readout synapses are plastic and trained using the supervised spike-dependent training algorithm in (Zhang and others 2015). The power consumption of the various FPGA accelerators is measured using the Xilinx Power Analyzer (XPA) tool, and their recognition performance is measured on the FPGA board.
Reservoir responses of the LSMs
To examine the impact of time compression, Fig. 8 shows raster plots of the reservoir IOW-LIF neurons when the input speech example is the letter A from the TI46 Speech Corpus. When the compression ratio is between 2:1 and 4:1, the reservoir response, in terms of both total spike count and spatiotemporal spike distribution, changes little from the one without compression. When the compression ratio increases to the very large values of 8:1 and 16:1, the original spatiotemporal spike distribution is still largely preserved. This is consistent with the decent recognition performance achieved at the 8:1 and 16:1 compression ratios, presented next.
Performance of TC-SNNs with IOW LIF Neurons
For each of the three datasets, we design a baseline LSM SNN without time compression and five time-compressed SNNs (TC-SNNs) with IOW LIF neurons and fixed time compression ratios from 2:1 to 16:1, all clocked at 50 MHz. For the TI46 speech dataset (Liberman and others 1991), we measure the runtime and energy dissipation each accelerator expends on 350 training epochs over a batch of 208 randomly selected examples. We compare the inference accuracy, the hardware overhead measured by FPGA lookup table (LUT) and flip-flop (FF) utilization, the power, the runtime, and the energy of all six accelerators in Table 1. To show the benefit of producing weighted output spikes, we create a new input-weighted (IW) LIF model, which differs from the IOW LIF model in that it generates binary output spikes. We redesign the five TC-SNN accelerators using IW LIF neurons and compare them with their IOW counterparts in Table 1. With large compression ratios, the IOW accelerators significantly outperform their IW counterparts in classification accuracy. For example, with a compression ratio of 16:1, the IOW accelerator improves accuracy from 69.23% to 80.77%.
Figure 8: Reservoir response vs. compression ratio. (Raster plots of neuron index vs. time step for no compression and for 2:1, 3:1, 4:1, 8:1, and 16:1 compression.)
The power and hardware overhead of the TC-SNN accelerators with IOW LIF neurons increases only modestly with the time compression ratio. Over a very wide range of compression ratios, the runtime scales linearly with the compression ratio, while the energy scales almost linearly. For example, 2:1 compression speeds up the runtime by 2× and reduces the energy by 1.69× while retaining the same classification accuracy of 96.15%. With 4:1 compression, the runtime is sped up by 3.99×, the energy is reduced by 3.09×, and the classification accuracy remains as high as 92.31%. With a large 16:1 compression ratio, the runtime and energy are reduced by 15.93× and 8.53×, respectively, and the accuracy is 80.77%.

Table 2: Comparison of the baseline and TC-SNN accelerators with IOW LIF neurons based on the CityScape image dataset.

| Compression ratio | Neuron model | Accuracy | LUT | FF | Power (W) @50MHz | Runtime (s) (normalized) | Runtime speedup | Energy (J) (normalized) | Energy reduction ratio | Normalized ATEL |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | LIF | 99.07% | 57017 | 16373 | 0.074 | 1.497 (100%) | 1.00× | 0.111 (100%) | 1.00× | 100% |
| 2:1 | IOW-LIF | 99.07% | 58826 | 17294 | 0.078 | 0.749 (50.03%) | 2.00× | 0.058 (52.25%) | 1.91× | 27.31% |
| 3:1 | IOW-LIF | 97.69% | 58895 | 17506 | 0.113 | 0.499 (33.33%) | 3.00× | 0.056 (50.45%) | 1.98× | 43.72% |
| 4:1 | IOW-LIF | 97.69% | 59276 | 17374 | 0.082 | 0.375 (25.05%) | 3.99× | 0.031 (27.93%) | 3.58× | 18.00% |
| 8:1 | IOW-LIF | 95.37% | 61254 | 19322 | 0.092 | 0.189 (12.63%) | 7.92× | 0.017 (15.32%) | 6.53× | 10.73% |
| 16:1 | IOW-LIF | 94.91% | 66350 | 21618 | 0.079 | 0.096 (6.41%) | 15.59× | 0.008 (7.21%) | 13.88× | 2.84% |

Table 3: Comparison of the baseline and TC-SNN accelerators with IOW LIF neurons based on the N-TIDIGITS18 dataset.

| Compression ratio | Neuron model | Accuracy | LUT | FF | Power (W) @50MHz | Runtime (s) (normalized) | Runtime speedup | Energy (J) (normalized) | Energy reduction ratio | Normalized ATEL |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | LIF | 83.63% | 106263 | 25778 | 0.116 | 424.61 (100%) | 1.00× | 49.255 (100%) | 1.00× | 100% |
| 2:1 | IOW-LIF | 82.82% | 111688 | 26070 | 0.110 | 212.31 (50.00%) | 2.00× | 23.354 (47.41%) | 2.11× | 26.04% |
| 3:1 | IOW-LIF | 82.22% | 124756 | 28364 | 0.112 | 141.50 (33.32%) | 3.00× | 15.848 (32.18%) | 3.11× | 13.58% |
| 4:1 | IOW-LIF | 81.91% | 112224 | 26158 | 0.113 | 106.87 (25.17%) | 3.97× | 12.076 (24.52%) | 4.08× | 7.17% |
| 8:1 | IOW-LIF | 80.91% | 131614 | 28934 | 0.158 | 53.61 (12.63%) | 7.92× | 8.470 (17.20%) | 5.82× | 3.10% |
| 16:1 | IOW-LIF | 74.54% | 128094 | 34707 | 0.174 | 27.17 (6.40%) | 15.63× | 4.728 (9.60%) | 10.42× | 1.16% |
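Because energy is power multiplied by runtime, the columns of the result tables can be cross-checked against each other. The Python sketch below is our own illustration; the row values are copied from Table 3, and the helper names are hypothetical. It recomputes the runtime speedup and energy reduction ratio relative to the baseline and checks that each reported energy matches Power × Runtime to within rounding.

```python
import math

# (compression ratio, power in W, runtime in s, energy in J), copied from Table 3
TABLE3_ENERGY = [
    (1, 0.116, 424.61, 49.255),
    (2, 0.110, 212.31, 23.354),
    (3, 0.112, 141.50, 15.848),
    (4, 0.113, 106.87, 12.076),
    (8, 0.158, 53.61, 8.470),
    (16, 0.174, 27.17, 4.728),
]


def ratios(rows):
    """Return (ratio, runtime speedup, energy reduction) relative to the first (baseline) row."""
    _, _, base_t, base_e = rows[0]
    return [(c, base_t / t, base_e / e) for c, _, t, e in rows]


def energy_consistent(rows, rel_tol=1e-3):
    """Check that each reported energy matches power x runtime to within rel_tol."""
    return all(math.isclose(p * t, e, rel_tol=rel_tol) for _, p, t, e in rows)


if __name__ == "__main__":
    print(energy_consistent(TABLE3_ENERGY))  # True
    for c, speedup, reduction in ratios(TABLE3_ENERGY):
        print(f"{c:>2}:1  speedup {speedup:5.2f}x  energy reduction {reduction:5.2f}x")
```

The 16:1 row shows why energy scales only almost linearly. The energy reduction equals the runtime speedup multiplied by P_baseline / P, and the power rises from 0.116 W to 0.174 W, so 15.63 × 0.116 / 0.174 ≈ 10.42.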
To jointly evaluate the tradeoffs among hardware area, runtime, energy, and accuracy loss, we define a figure of merit (FOM), ATEL = Area × Time × Energy × Loss, where each metric is normalized with respect to the baseline (no compression) and Loss = 100% − classification accuracy. Here the hardware area is estimated as FF count + 2 × LUT count, as suggested by Xilinx. Table 1 shows that as the compression ratio increases from 1:1 to 16:1, the ATEL of the TC-SNNs with IOW LIF neurons drops favorably from 100% to 4.12%, a more than 24-fold reduction.
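The ATEL definition above can be computed directly from the table entries. The following is a minimal Python sketch that uses the area estimate FF + 2 × LUT and the N-TIDIGITS18 rows from Table 3; the helper names are hypothetical. The recomputed values agree with the reported normalized ATEL (26.04%, 3.10%, and 1.16% for 2:1, 8:1, and 16:1) to within about 0.01 percentage points, which is consistent with rounding in the tabulated inputs.

```python
# Rows copied from Table 3 (N-TIDIGITS18):
# ratio -> (accuracy %, LUT, FF, runtime s, energy J)
ATEL_ROWS = {
    1: (83.63, 106263, 25778, 424.61, 49.255),
    2: (82.82, 111688, 26070, 212.31, 23.354),
    8: (80.91, 131614, 28934, 53.61, 8.470),
    16: (74.54, 128094, 34707, 27.17, 4.728),
}


def area(lut: int, ff: int) -> int:
    """Hardware area estimate: FF count + 2 x LUT count."""
    return ff + 2 * lut


def atel(row, base) -> float:
    """Normalized ATEL = Area x Time x Energy x Loss, each relative to the baseline."""
    acc, lut, ff, t, e = row
    b_acc, b_lut, b_ff, b_t, b_e = base
    area_n = area(lut, ff) / area(b_lut, b_ff)
    loss_n = (100.0 - acc) / (100.0 - b_acc)   # Loss = 100% - accuracy
    return area_n * (t / b_t) * (e / b_e) * loss_n


if __name__ == "__main__":
    for ratio, row in ATEL_ROWS.items():
        print(f"{ratio:>2}:1  ATEL = {100 * atel(row, ATEL_ROWS[1]):.2f}%")
```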
We evaluate the proposed architectures in a similar way on the CityScape image recognition dataset (Cordts and others 2016) and the N-TIDIGITS18 dataset (Anumula and others 2018). The results for the CityScape dataset are reported in Table 2, for which the runtime and energy dissipation of each accelerator are measured over 350 training epochs on a batch of 864 randomly selected examples. Since the proposed compression is application independent, the TC-SNN architectures can be applied to this image recognition task without any modification. The proposed time compression achieves large runtime and energy reductions similar to those for the TI46 dataset, while the classification accuracy degrades more gracefully. The TC-SNN with 8:1 compression reduces the runtime and energy dissipation by 7.92× and 6.53×, respectively, while the accuracy drops only from 99.07% to 95.37%. The figure of merit ATEL improves from 100% to 2.84% (a 35× improvement) when the TC-SNN runs with 16:1 compression. The results on the N-TIDIGITS18 dataset are reported in Table 3, for which the runtime and energy dissipation of each accelerator are measured over 350 training epochs on a batch of 2,250 training samples. Again, the proposed time compression achieves large runtime and energy reductions. The TC-SNN with an 8:1 compression ratio reduces the runtime and energy dissipation by 7.92× and 5.82×, respectively, while the accuracy drops only from 83.63% to 80.91%.
These results show that the proposed compression architectures scale down the runtime linearly, thereby dramatically reducing decision latency and energy dissipation without significant accuracy degradation at low compression ratios (e.g., up to 4:1). Aggressively large compression ratios produce even larger energy and runtime reductions, while the degraded accuracy may still be acceptable for practical applications. The wide range of supported compression ratios gives the user great flexibility in choosing an appropriate performance/overhead tradeoff for a given application.
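As an illustration of how the reported tradeoffs might be used, the following hypothetical helper selects the most aggressive compression ratio whose accuracy loss relative to the baseline stays within a user-specified budget. It uses the N-TIDIGITS18 numbers from Table 3. The selection rule is our own example and is not part of the proposed architecture.

```python
# (compression ratio, accuracy %, runtime speedup, energy reduction), copied from Table 3
TRADEOFF_ROWS = [
    (1, 83.63, 1.00, 1.00),
    (2, 82.82, 2.00, 2.11),
    (3, 82.22, 3.00, 3.11),
    (4, 81.91, 3.97, 4.08),
    (8, 80.91, 7.92, 5.82),
    (16, 74.54, 15.63, 10.42),
]


def pick_ratio(rows, max_drop_pct):
    """Return the row with the largest compression ratio whose accuracy drop
    from the baseline (first row) is at most `max_drop_pct` percentage points."""
    base_acc = rows[0][1]
    feasible = [r for r in rows if base_acc - r[1] <= max_drop_pct]
    return max(feasible, key=lambda r: r[0])   # the baseline is always feasible


if __name__ == "__main__":
    for budget in (0.5, 1.0, 3.0, 10.0):
        ratio, acc, speedup, reduction = pick_ratio(TRADEOFF_ROWS, budget)
        print(f"budget {budget:4.1f} pts -> {ratio}:1 "
              f"({acc}%, {speedup}x faster, {reduction}x less energy)")
```

With a 1-point accuracy budget, the helper picks 2:1. With a 3-point budget, it picks 8:1, which gives roughly 8× faster runtime and 5.8× lower energy.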
Performance of TC-SNNs with Burst Coding
We redesign our TC-SNN accelerators using bursting IOW LIF models to support burst coding (Park and others 2019) and compare their performance with the baseline on the TI46 speech dataset in Table 4. Once again, the proposed time compression leads to large runtime and energy reductions, and the classification accuracy degrades gracefully. The additional hardware cost of supporting burst coding is somewhat higher but still moderate.
Performance of PTC-SNNs with Reconfigurable Compression Ratio
We also design a time-compressed SNN accelerator with a programmable compression ratio (PTC-SNN), supporting ratios from 2:1 to 16:1, and evaluate it on the TI46 dataset in Table 5.
Table 4: Comparison of the baseline and TC-SNN accelerators with burst coding on the TI46 Speech Corpus.

| Compression ratio | Neuron model | Accuracy | LUT | FF | Power (W) @50MHz | Runtime (s) (normalized) | Runtime speedup | Energy (J) (normalized) | Energy reduction ratio | Normalized ATEL |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | LIF | 98.08% | 92052 | 62390 | 0.240 | 2.527 (100%) | 1.00× | 0.606 (100%) | 1.00× | 100% |
| 2:1 | IOW-LIF | 92.31% | 107263 | 64845 | 0.163 | 1.266 (50.10%) | 2.00× | 0.206 (33.99%) | 2.94× | 77.38% |
| 3:1 | IOW-LIF | 92.31% | 124881 | 67343 | 0.168 | 0.946 (37.44%) | 2.67× | 0.158 (26.07%) | 3.82× | 50.55% |
| 4:1 | IOW-LIF | 92.31% | 102362 | 61332 | 0.172 | 0.637 (25.21%) | 3.97× | 0.110 (18.15%) | 5.54× | 19.68% |
| 8:1 | IOW-LIF | 88.46% | 121183 | 64481 | 0.212 | 0.318 (12.58%) | 7.95× | 0.067 (11.06%) | 9.00× | 10.47% |
| 16:1 | IOW-LIF | 80.77% | 132055 | 72508 | 0.289 | 0.163 (6.45%) | 15.50× | 0.047 (7.76%) | 12.87× | 6.85% |
Table 5: Performance of the reconfigurable PTC-SNN hardware accelerator on the TI46 Speech Corpus.

| Compression ratio | Accuracy | Power (W) @50MHz | Runtime (s) | Energy (J) | Normalized ATEL |
|---|---|---|---|---|---|
| Baseline | 96.15% | 0.073 | 1.991 | 0.145 | 100% |
| 2:1 | 96.15% | 0.151 | 0.995 | 0.130 | 57.64% |
| 3:1 | 92.31% | 0.152 | 0.664 | 0.088 | 51.65% |
| 4:1 | 92.31% | 0.155 | 0.499 | 0.067 | 29.87% |
| 8:1 | 86.54% | 0.173 | 0.248 | 0.038 | 14.61% |
| 16:1 | 80.77% | 0.194 | 0.125 | 0.022 | 6.05% |
The LUT and FF utilizations of the PTC-SNN are 74,742 and 21,391, respectively. Its hardware area stays constant regardless of the programmed compression ratio and is only 12.78% larger than that of the TC-SNN accelerator with a fixed 16:1 compression ratio. The runtime and accuracy of the PTC-SNN are identical to those of the corresponding TC-SNN running at the same (fixed) compression ratio. The energy dissipation of the PTC-SNN still scales down nearly linearly with the compression ratio, although it is somewhat greater than that of the corresponding TC-SNN. Even so, when running at a 16:1 compression ratio, the PTC-SNN reduces the energy dissipation and ATEL of the baseline by 6.59× and 16.53×, respectively.
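A programmable compression ratio also requires the neuron's leak to match the selected ratio. This minimal Python sketch assumes a discrete-time LIF neuron whose membrane decays by exp(−1/τ) per original time step; the exact hardware mechanism is not described in this section. Under that assumption, one compressed step spans C original steps, so the per-step decay becomes exp(−C/τ), which is equivalent to scaling the time constant to τ/C. `SUPPORTED_RATIOS` lists only the ratios evaluated in Table 5, and all names are hypothetical.

```python
import math

# Ratios evaluated for the PTC-SNN in Table 5 (1 denotes the uncompressed baseline).
SUPPORTED_RATIOS = (1, 2, 3, 4, 8, 16)


def leak_factor(tau: float, ratio: int) -> float:
    """Membrane decay per compressed step, assuming an exponential leak of
    exp(-1/tau) per original step. One compressed step spans `ratio` original
    steps, which is equivalent to using the scaled time constant tau / ratio."""
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported compression ratio: {ratio}")
    return math.exp(-ratio / tau)


def decayed_potential(v0: float, tau: float, ratio: int, original_steps: int) -> float:
    """Potential left after `original_steps` steps of pure leak (no input),
    simulated with original_steps // ratio compressed steps."""
    return v0 * leak_factor(tau, ratio) ** (original_steps // ratio)


if __name__ == "__main__":
    tau = 20.0
    reference = decayed_potential(1.0, tau, 1, 48)   # uncompressed run over 48 steps
    for c in SUPPORTED_RATIOS[1:]:
        # 48 is divisible by every listed ratio, so each compressed run should agree
        print(c, math.isclose(decayed_potential(1.0, tau, c, 48), reference, rel_tol=1e-9))
```

This sketch covers only the leak term. It does not model how the accelerator handles other ratio-dependent parameters.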
Conclusion
We propose a general time compression technique and two compression architectures, namely TC-SNN and PTC-SNN, to significantly boost the throughput and reduce the energy dissipation of SNN accelerators. Our experimental results show that the proposed time compression architectures support large time compression ratios of up to 16×, delivering up to 15.93×, 13.88×, and 86.21× improvements in throughput, energy dissipation, and the figure of merit ATEL, respectively, while incurring only modest additional hardware overhead.
References
[Anumula and others 2018] Anumula, et al. 2018. Feature representations for neuromorphic audio spike streams. Frontiers in Neuroscience 12:23.
[Cordts and others 2016] Cordts, et al. 2016. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE CVPR, 3213–3223.
[Davies and others 2018] Davies, M., et al. 2018. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 38(1):82–99.
[Furber and others 2014] Furber, S. B., et al. 2014. The SpiNNaker project. Proceedings of the IEEE 102(5):652–665.
[Gerstner and others 2002] Gerstner, et al. 2002. Spiking neuron models: Single neurons, populations, plasticity. Cambridge University Press.
[Izhikevich and others 2002] Izhikevich, et al. 2002. Resonance and selective communication via bursts in neurons having subthreshold oscillations. BioSystems 67(1-3):95–102.
[Kayser and others 2009] Kayser, et al. 2009. Spike-phase coding boosts and stabilizes information carried by spatial and temporal spike patterns. Neuron 61(4):597–608.
[Kim and others 2018] Kim, J., et al. 2018. Deep neural networks with weighted spikes. Neurocomputing 311:373–386.
[Liberman and others 1991] Liberman, et al. 1991. TI 46-word LDC93S9.
[Lyon and others 1982] Lyon, et al. 1982. A computational model of filtering, detection, and compression in the cochlea. In ICASSP’82. IEEE ICASSP, volume 7, 1282–1285. IEEE.
[Maass and others 2002] Maass, et al. 2002. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Comput. 14(11):2531–2560.
[Merolla and others 2014] Merolla, et al. 2014. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345(6197):668–673.
[Park and others 2019] Park, et al. 2019. Fast and efficient information transmission with burst spikes in deep spiking neural networks. In 2019 56th ACM/IEEE DAC, 1–6. IEEE.
[Ponulak and others 2011] Ponulak, et al. 2011. Introduction to spiking neural networks: Information processing, learning and applications. Acta Neurobiologiae Experimentalis 71(4):409–433.
[Schrauwen and others 2003] Schrauwen, B., et al. 2003. BSA, a fast and accurate spike train encoding scheme. In Proceedings of IJCNN 2003, volume 4, 2825–2830.
[Thorpe and others 1990] Thorpe, et al. 1990. Spike arrival times: A highly efficient coding scheme for neural networks. Parallel Processing in Neural Systems 91–94.
[Thorpe and others 2001] Thorpe, et al. 2001. Spike-based strategies for rapid processing. Neural Networks 14(6-7):715–725.
[Zhang and others 2015] Zhang, Y., et al. 2015. A digital liquid state machine with biologically inspired learning and its application to speech recognition. IEEE Transactions on Neural Networks and Learning Systems 26(11):2635–2649.
