A Time-domain Analog Weighted-sum Calculation Model for Extremely Low
  Power VLSI Implementation of Multi-layer Neural Networks by Wang, Quan et al.
arXiv:1810.06819v1 [cs.ET] 16 Oct 2018
A Time-domain Analog
Weighted-sum Calculation Model
for Extremely Low Power VLSI Implementation
of Multi-layer Neural Networks
Quan Wang, Hakaru Tamukoh, and Takashi Morie
Graduate School of Life Science and Systems Engineering,
Kyushu Institute of Technology
2-4, Hibikino, Wakamatsu-ku, Kitakyushu, 808-0196 Japan
Abstract. A time-domain analog weighted-sum calculation model is proposed
based on an integrate-and-fire-type spiking neuron model. The proposed
calculation model is applied to multi-layer feedforward networks, in which
weighted summations with positive and negative weights are separately
performed in each layer and summation results are then fed into the next
layers without their subtraction operation. We also propose very large-scale
integrated (VLSI) circuits to implement the proposed model. Unlike the
conventional analog voltage or current mode circuits, the time-domain analog
circuits use transient operation in charging/discharging processes to
capacitors. Since the circuits can be designed without operational amplifiers,
they can operate with extremely low power consumption. However, they have to
use very high resistance devices on the order of GΩ. We designed a
proof-of-concept (PoC) CMOS VLSI chip to verify weighted-sum operation with
the same weights and evaluated it by post-layout circuit simulation using
250-nm fabrication technology. High resistance operation was realized by using
the subthreshold operation region of MOS transistors. Simulation results showed
that energy efficiency for the weighted-sum calculation was 290 TOPS/W, more
than one order of magnitude higher than that in state-of-the-art digital AI
processors, even though the minimum width of interconnection used in the PoC
chip was several times larger than that in such digital processors. If
state-of-the-art VLSI technology is used to implement the proposed model, an
energy efficiency of more than 1,000 TOPS/W will be possible. For practical
applications, development of emerging analog memory devices such as
ferroelectric-gate FETs is necessary.
Keywords: time-domain analog computing, weighted sum, spike-based computing,
deep neural networks, multi-layer perceptron, artificial intelligence
hardware, AI processor
1 Introduction
Artificial neural networks (ANNs), such as convolutional deep neural networks
(CNNs) [15] and fully-connected multi-layer perceptrons (MLPs) [2], have shown
excellent performance on various tasks including image recognition [2,13,4,30,14].
However, computation in ANNs is very heavy, which leads to high power
consumption in current digital computers, and even in highly parallel
coprocessors such as graphics processing units (GPUs). In order to implement
ANNs at edge devices such as mobile phones and personal service robots, very
low power consumption operation is required.
In ANN models, weighted summation, or multiply-and-accumulate (MAC)
operation, is an essential and heavy calculation task, and dedicated
complementary metal-oxide-semiconductor (CMOS) very-large-scale integration
(VLSI) processors have been developed to accomplish it [29,23,28,11]. As an
implementation approach other than digital processors, use of analog operation
in CMOS VLSI circuits is a promising method for achieving extremely low power
consumption in such a calculation task [5,16,22,21]. Although the calculation
precision is limited due to the non-idealities of analog operation such as
noise and device mismatches, the neural network models and circuits can be
designed to be robust to such non-idealities [24,10,7]. On the other hand, in
the research field of ANN models, low-precision neural networks have been
proposed and their comparable performance has been demonstrated, mainly in
applications of image recognition [3,9]. These models facilitate the
development of energy-efficient hardware implementations [22].
The time-domain weighted-sum calculation model was originally proposed based
on mathematical spiking neuron models inspired by biological neuron behavior
[18,19]. We have simplified and expanded this calculation model under the
assumption of operation in analog circuits with transient states, and call its
VLSI implementation approach “Time-domain Analog Computing with Transient
states (TACT).” In contrast to conventional weighted-sum operation in analog
voltage or current modes, the TACT approach is suitable for much lower power
consumption operation in CMOS VLSI implementation of ANNs.
We have already proposed a device and circuit that performs time-domain
weighted-sum calculation [26,31,25]. The proposed circuit consists of multiple
input resistive elements and a capacitor (RC circuit), which can achieve
extremely low-power operation. The energy consumption could be lowered to the
order of 1 fJ per operation, which is almost comparable to the calculation
efficiency in the brain. We also proposed a circuit architecture to implement
a weighted-sum calculation with different-signed weights with two sets of RC
circuits, one of which calculates positively weighted sums while the other
calculates negatively weighted sums [32].
In this paper, we formulate the weighted-sum calculation model based on the
TACT approach, and propose its applications to ANNs such as MLPs and CNNs. We
also show simulation results of ANNs using the MNIST database in which the
calculation results by the proposed model are compared with the ordinary
numerical calculation results, and verify the usefulness of our model. We then
evaluate the energy consumption of the proposed circuit by conducting
post-layout circuit simulation of a CMOS circuit designed equivalently to the
RC circuit. We propose a VLSI circuit architecture for ANNs based on the
proposed model.

Fig. 1. IF neuron model for weighted-sum operation: (a) schematic of the model
and (b) weighted-sum operation using rise timing of PSPs.
2 Spike-based Time-domain Weighted-sum Calculation
Model
2.1 Time-domain Weighted-sum Calculation with Same-Signed
Weights
A simple spiking neuron model, also known as an integrate-and-fire-type (IF)
neuron model, is shown in Fig. 1 [20]. In this model, a neuron receives spike
pulses via synapses. A spike pulse only indicates the input timing, and its pulse
width and amplitude do not affect the following processing. A spike generates a
temporal voltage change, which is called a post-synaptic potential (PSP), and
the internal potential of the n-th neuron, Vn(t), is equal to the spatiotemporal
summation of all PSPs. When Vn(t) reaches the firing threshold θ, the neuron
outputs a spike, and Vn(t) then settles back to the steady state.
Based on the model proposed in [18], a simplified weighted-sum operation
model using IF neurons is proposed. Time span Tin is defined, during which
only one spike is fed from each neuron, and it is assumed that a PSP generated
by a spike from neuron i increases linearly with slope ki from the timing of the
spike input, ti, as shown in Fig. 1.
The required weighted-sum operation is that normalized variables $x_i$
($0 \le x_i \le 1$, $i = 1, 2, \cdots, N$) are multiplied by weight
coefficients $a_i$, and the multiplication results are summed over $i$, where
$N$ is the number of inputs. This weighted-sum operation can be performed
using the rise timing of PSPs in the IF neuron model. Input spike timing $t_i$
is determined from $x_i$ using the following relations:
\[ t_i = T_{in}(1 - x_i), \tag{1} \]
\[ x_i = 1 - \frac{t_i}{T_{in}}. \tag{2} \]
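Coefficients $a_i$ are transformed into the PSPs' slopes $k_i$:
\[ k_i = \lambda a_i, \tag{3} \]
where $\lambda$ is a positive constant. If the firing time of the neuron is
defined as $t_\nu$, the following equation is easily obtained:
\[ \sum_{i=1}^{N} k_i (t_\nu - t_i) = \theta. \tag{4} \]
If we define the following parameter:
\[ \beta = \sum_{i=1}^{N} a_i, \tag{5} \]
we obtain
\[ \sum_{i=1}^{N} a_i \cdot x_i = \frac{\theta/\lambda + \beta (T_{in} - t_\nu)}{T_{in}} \tag{6} \]
\[ \phantom{\sum_{i=1}^{N} a_i \cdot x_i} = \frac{\theta}{\lambda T_{in}} + \beta \left(1 - \frac{t_\nu}{T_{in}}\right). \tag{7} \]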
Here, we assume that all the weights in the calculation have the same sign;
i.e., $a_i \ge 0$ or $a_i \le 0$ for all $i$. When all inputs are minimum
($\forall i\; x_i = 0$), the left side of Eq. (6) is zero. Then, the output
timing $t_\nu$ is given by
\[ t_\nu^{min} = \frac{\theta}{\lambda\beta} + T_{in}. \tag{8} \]
On the other hand, when all inputs are maximum ($\forall i\; x_i = 1$), the
left side of Eq. (6) is $\beta$, and the output timing $t_\nu$ is given by
\[ t_\nu^{max} = \frac{\theta}{\lambda\beta}. \tag{9} \]
The time span during which $t_\nu$ can be output is
$[t_\nu^{max}, t_\nu^{min}]$, and its interval is
\[ T_{out} \equiv t_\nu^{min} - t_\nu^{max} = T_{in}. \tag{10} \]
Thus, the time span of output spikes is the same as that of input spikes,
$T_{in}$. In this model, since the normalization of the sum of $a_i$
($\beta = 1$) is not required (unlike in our previous work [18,19,31]), the
calculation process becomes much simpler. When implementing the time-domain
weighted-sum operation, setting the threshold potential $\theta$ properly is
the key to making the operation work appropriately. As shown in Fig. 1, the
earliest output spike timing has to be later than the latest input spike
timing $T_{in}$; that is, $t_\nu^{max} \ge T_{in}$. Thus,
\[ \theta \ge \lambda\beta T_{in}. \tag{11} \]
Also, we can rewrite Eq. (11) as
\[ \theta = \lambda\beta T_{in} + \delta, \tag{12} \]
\[ \delta = \epsilon (\lambda\beta T_{in}), \tag{13} \]
where $\epsilon \ge 0$ is an arbitrarily small value. By substituting
Eqs. (12) and (13) into Eqs. (8) and (9), we obtain
\[ t_\nu^{min} = (2 + \epsilon) T_{in}, \tag{14} \]
\[ t_\nu^{max} = (1 + \epsilon) T_{in}, \tag{15} \]
where $\epsilon T_{in}$ is considered as a time slot between input and output
timing spans, as shown in Fig. 1, and $\epsilon$ determines the length of the
slot. Also, the weighted summation expressed by Eq. (6) is rewritten as
follows:
\[ \sum_{i=1}^{N} a_i \cdot x_i = \beta \left[ 2 + \epsilon - \frac{t_\nu}{T_{in}} \right]. \tag{16} \]
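The chain from Eq. (1) to Eq. (16) can be checked numerically. The following is a minimal sketch, assuming ideal linear PSPs and that every input spike arrives before the neuron fires; the function names and the values of `a`, `x`, and `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np

def encode(x, T_in):
    """Eq. (1): map normalized inputs x in [0, 1] to input spike timings."""
    return T_in * (1.0 - np.asarray(x, dtype=float))

def firing_time(a, t, theta, lam=1.0):
    """Solve Eq. (4) for t_nu, assuming all spikes arrive before firing."""
    k = lam * np.asarray(a, dtype=float)      # Eq. (3): PSP slopes
    return (theta + np.dot(k, t)) / k.sum()

def decode(t_nu, beta, T_in, eps):
    """Eq. (16): recover the weighted sum from the output spike timing."""
    return beta * (2.0 + eps - t_nu / T_in)

# Illustrative (hypothetical) values
T_in, lam, eps = 1.0, 1.0, 0.1
a = np.array([0.5, 1.5, 2.0])                 # same-signed weights
x = np.array([0.2, 0.9, 0.4])                 # normalized inputs
beta = a.sum()                                # Eq. (5)
theta = (1.0 + eps) * lam * beta * T_in       # Eqs. (12) and (13)

t_nu = firing_time(a, encode(x, T_in), theta, lam)
result = decode(t_nu, beta, T_in, eps)
print(t_nu, result, np.dot(a, x))
```

With these values the decoded result agrees with the direct dot product, and the firing time falls inside the output window given by Eqs. (14) and (15).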
2.2 Time-domain Weighted-sum Calculation with Different-Signed
Weights
We have proposed a time-domain weighted-sum calculation model with two spik-
ing neurons, one of which is for all the positive weights and the other for all the
negative ones [32]. We apply Eq. (6) to each neuron, and the two results are
summed as the final result of the original weighted-sum. Here, we show the
details of the model.
Let $a_i^+$ and $a_i^-$ indicate the positive and negative weights,
respectively. We define
\[ \beta^+ = \sum_{i=1}^{N^+} a_i^+ \ge 0, \tag{17} \]
\[ \beta^- = \sum_{i=1}^{N^-} a_i^- \le 0, \tag{18} \]
where $N^+$ and $N^-$ are the numbers of positive and negative weights,
respectively;
\[ N = N^+ + N^-, \tag{19} \]
\[ \sum_{i=1}^{N} a_i = \sum_{i=1}^{N^+} a_i^+ + \sum_{i=1}^{N^-} a_i^-, \tag{20} \]
\[ \beta = \beta^+ + \beta^-. \tag{21} \]
Thus, assuming $\lambda = 1$, Eq. (4) is rewritten for the positive and
negative weighted-sum operations as follows:
\[ \sum_{i=1}^{N^+} a_i^+ (t_\nu^+ - t_i) = \theta^+, \tag{22} \]
\[ \sum_{i=1}^{N^-} a_i^- (t_\nu^- - t_i) = \theta^-, \tag{23} \]
where $\theta^+ (> 0)$ and $\theta^- (< 0)$ indicate the threshold values, and
$t_\nu^+$ and $t_\nu^-$ the output timings, for the positively and negatively
weighted-sum operations, respectively. Then we obtain
\[ \sum_{i=1}^{N^+} a_i^+ \cdot x_i = \frac{\theta^+ + \beta^+ (T_{in} - t_\nu^+)}{T_{in}}, \tag{24} \]
\[ \sum_{i=1}^{N^-} a_i^- \cdot x_i = \frac{\theta^- + \beta^- (T_{in} - t_\nu^-)}{T_{in}}. \tag{25} \]
Therefore, we can obtain the original weighted-sum result:
\[ \sum_{i=1}^{N} a_i \cdot x_i = \sum_{i=1}^{N^+} a_i^+ \cdot x_i + \sum_{i=1}^{N^-} a_i^- \cdot x_i \tag{26} \]
\[ \phantom{\sum_{i=1}^{N} a_i \cdot x_i} = \frac{\theta^+ + \theta^- + \beta T_{in} - (\beta^+ t_\nu^+ + \beta^- t_\nu^-)}{T_{in}}. \tag{27} \]
Let us define a dummy weight $a_0$ as the difference between both absolute
values of $\beta^\pm$;
\[ a_0 = -(\beta^+ + \beta^-). \tag{28} \]
If $\beta^+ \ge -\beta^-$, then $a_0 \le 0$ and this dummy weight is
incorporated into the negative weight group, and vice versa. This dummy weight
is related to a zero input, $x_0 = 0$, which means $t_0 = T_{in}$. By using
the dummy weight, we can make the absolute values of $\beta^\pm$ identical
($\beta = 0$), and we define
\[ \beta_o = \beta^+ = -\beta^-. \tag{29} \]
Fig. 2. Neuron model: (a) typical neuron model; (b) neuron model for time-domain
weighted-sum operation with a dummy weight, wn+1; (c) neuron model for time-domain
weighted-sum operation in which each synapse has two sets of inputs and weights. One
is (xi, wi) and the other is (0,−wi) or (ti, wi) and (Tin,−wi) according to Eq. (2).
Also, according to Eqs. (12) and (13), the absolute values of $\theta^+$ and
$\theta^-$ can be the same, and $\theta^+ + \theta^- = 0$. Therefore, Eq. (27)
can be rewritten as
\[ \sum_{i=1}^{N} a_i \cdot x_i = \frac{\beta_o (t_\nu^- - t_\nu^+)}{T_{in}}. \tag{30} \]
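As a numerical illustration of Eqs. (22), (23), and (28) to (30), the sketch below splits a signed weight vector into its positive and negative groups, adds the dummy weight, and decodes the result from the two firing times. It assumes $\lambda = 1$, ideal linear PSPs, at least one nonzero weight, and that all inputs arrive before firing; the function name and the test values are hypothetical.

```python
import numpy as np

def signed_weighted_sum(a, x, T_in=1.0, eps=0.1):
    """Time-domain weighted sum with different-signed weights (lambda = 1)."""
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float)
    t = T_in * (1.0 - x)                        # Eq. (1)
    pos = a >= 0
    a_pos, t_pos = a[pos], t[pos]
    a_neg, t_neg = a[~pos], t[~pos]
    a0 = -(a_pos.sum() + a_neg.sum())           # Eq. (28): dummy weight
    # The dummy weight has input x0 = 0 (t0 = T_in) and joins the group
    # that matches its own sign.
    if a0 >= 0:
        a_pos, t_pos = np.append(a_pos, a0), np.append(t_pos, T_in)
    else:
        a_neg, t_neg = np.append(a_neg, a0), np.append(t_neg, T_in)
    beta_o = a_pos.sum()                        # Eq. (29): beta+ = -beta-
    theta = (1.0 + eps) * beta_o * T_in         # theta+ = -theta- = theta
    t_plus = (theta + np.dot(a_pos, t_pos)) / beta_o        # from Eq. (22)
    t_minus = (-theta + np.dot(a_neg, t_neg)) / (-beta_o)   # from Eq. (23)
    return beta_o * (t_minus - t_plus) / T_in, t_plus, t_minus  # Eq. (30)

a = np.array([0.8, -0.5, 0.3, -1.2])
x = np.array([0.5, 0.2, 1.0, 0.7])
value, t_plus, t_minus = signed_weighted_sum(a, x)
print(value, np.dot(a, x))
```

In this example the total weighted sum is negative, so the negative neuron fires earlier than the positive one and the decoded difference is negative.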
3 Spiking Neural Network model
3.1 Neuron Model
The typical neuron model of artificial neural networks is shown in Fig. 2(a),
which has $N$ inputs $x_i$ with weights $w_i$ and a bias $b$;
\[ y = f\left( \sum_{i=1}^{N} w_i \cdot x_i + b \right), \tag{31} \]
where $y$ is the output of the neuron, and $f$ is an activation function. We
can consider the bias as a weight whose input is always unity. Therefore, our
time-domain weighted-sum calculation model with the dummy weight can be
applied to this neuron model, as shown in Fig. 2(b). According to Eq. (30),
\[ \sum_{i=1}^{N} w_i \cdot x_i + b = \frac{\beta (t_\nu^- - t_\nu^+)}{T_{in}}. \tag{32} \]
Fig. 3. General neural network model with two inputs and outputs for time-domain
weighted-sum calculation with positive and negative weights.
Based on Eq. (32), we propose another model, shown in Fig. 2(c), in which
each synapse has two sets of inputs and weights; one is $(x_i, w_i)$ and the
other is $(0, -w_i)$. In this model, it is not necessary to add a dummy weight
because the summation of positive weights is
$\beta^+ = \sum_{i=0}^{N} |w_i|$ and that of negative ones is
$\beta^- = -\sum_{i=0}^{N} |w_i|$, which means that the absolute values of
both summations are equal.
As the activation function $f$, we often use the rectified linear unit called
“ReLU” [6,8], which is defined as follows:
\[ f(x) = \mathrm{ReLU}(x) = \begin{cases} x & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{33} \]
We can implement the ReLU function by comparing the output timings
$t^{(n)-}_{\nu j}$ and $t^{(n)+}_{\nu j}$ in the time-domain weighted-sum
calculation. If $t^{(n)-}_{\nu j} \ge t^{(n)+}_{\nu j}$, the difference
between the two timing values is regarded as the output transferred to neurons
in the next layer. On the other hand, if
$t^{(n)-}_{\nu j} < t^{(n)+}_{\nu j}$, the output is zero, because the total
weighted sum is negative. To do this, we set $t^{(n)-}_{\nu j}$ and
$t^{(n)+}_{\nu j}$ to be identical. Its circuit implementation will be shown
later.
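The timing comparison can be written as a two-line rule. The sketch below is only a software illustration; it follows the choice stated in Section 4, where both timings are set to the positive-side timing when that one is later. The function names and numbers are hypothetical.

```python
def relu_timing(t_plus, t_minus):
    """ReLU on a timing pair: if t_minus < t_plus (negative weighted sum),
    move t_minus to t_plus so that the decoded difference becomes zero."""
    return t_plus, max(t_minus, t_plus)

def decode_pair(t_plus, t_minus, beta, T_in):
    """Eq. (32): weighted sum represented by a timing pair."""
    return beta * (t_minus - t_plus) / T_in

# Negative weighted sum: the pair collapses and decodes to zero.
neg_case = decode_pair(*relu_timing(1.7, 1.5), beta=2.0, T_in=1.0)
# Non-negative weighted sum: the pair passes through unchanged.
pos_case = decode_pair(*relu_timing(1.5, 1.7), beta=2.0, T_in=1.0)
print(neg_case, pos_case)
```

Because only the difference of the two timings carries the value, clamping one timing is enough and no explicit subtraction is needed.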
3.2 Neural Network Model
In this section, we show an application of our time-domain weighted-sum model
to the MLPs shown in Fig. 3 as an example that has one hidden layer and two
sets of input and weight for each neuron. In this application, after we calculate
a weighted sum using Eq. (32), the result is given to the activation function f ,
and the output is fed into the next layer.
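According to Eq. (32), the weighted-sum result of the $j$-th neuron in the
layer labeled $n$ in Fig. 3 can be written as
\[ \sum_{i=0}^{N} w^{(n)}_{ij} \cdot x_i = \frac{\beta^{(n)}_j}{T_{in}} \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right), \tag{34} \]
where $w^{(n)}_{0j} = b^{(n)}_j$ is the bias of the $j$-th neuron in the
$n$-th layer. The output of the $k$-th neuron in the layer labeled
$p\,(= n + 1)$ in Fig. 3 is
\[ \sum_{j=1}^{N} w^{(p)}_{jk} \cdot f\left( \sum_{i=0}^{N} w^{(n)}_{ij} \cdot x_i \right) + b^{(p)}_k = \sum_{j=1}^{N} w^{(p)}_{jk} \cdot f\left( \frac{\beta^{(n)}_j}{T_{in}} \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right) \right) + b^{(p)}_k. \tag{35} \]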
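Here, substituting the activation function with ReLU, we can obtain
\[ \mathrm{ReLU}\left( \frac{\beta^{(n)}_j}{T_{in}} \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right) \right) = \frac{\beta^{(n)}_j}{T_{in}} \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right), \tag{36} \]
where if $t^{(n)-}_{\nu j} < t^{(n)+}_{\nu j}$, then let
$t^{(n)-}_{\nu j} = t^{(n)+}_{\nu j}$. Thus, Eq. (35) can be rewritten as
\[ \sum_{j=1}^{N} w^{(p)}_{jk} \cdot \mathrm{ReLU}\left( \sum_{i=0}^{N} w^{(n)}_{ij} \cdot x_i \right) + b^{(p)}_k = \sum_{j=1}^{N} w^{(p)}_{jk} \cdot \frac{\beta^{(n)}_j}{T_{in}} \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right) + b^{(p)}_k. \tag{37} \]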
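In the MLP shown in Fig. 3, we transfer the output timings
$t^{(n)+}_{\nu j}$ and $t^{(n)-}_{\nu j}$ generated in layer $n$ to neurons in
layer $p$ and perform the time-domain weighted-sum operation. In layer $n$, we
assume that timing $t^{(n)+}_{\nu j}$ is related to weight $w^{(p)}_{jk}$ and
$t^{(n)-}_{\nu j}$ is related to $-w^{(p)}_{jk}$. We also assume here $N = 3$
(i.e., $j = 1, 2, 3$), and that $w^{(p)}_{1k} \ge 0$, $w^{(p)}_{2k} < 0$,
$w^{(p)}_{3k} \ge 0$, $b^{(p)}_k \ge 0$, and
$\theta^{(p)+}_k = -\theta^{(p)-}_k$, where $\theta^{(p)+}_k$ and
$\theta^{(p)-}_k$ are the threshold values for positively and negatively
weighted-sum operations, respectively. Thus, according to Eq. (4), we can
obtain
\[
\begin{aligned}
& w^{(p)}_{1k} \left( t^{(p)+}_{\nu k} - t^{(n)+}_{\nu 1} \right) + \left( -w^{(p)}_{2k} \right) \left( t^{(p)+}_{\nu k} - t^{(n)-}_{\nu 2} \right) + w^{(p)}_{3k} \left( t^{(p)+}_{\nu k} - t^{(n)+}_{\nu 3} \right) \\
& \quad + b^{(p)}_k \left( t^{(p)+}_{\nu k} - t^{(n)+}_{\nu 0} \right) = \theta^{(p)+}_k,
\end{aligned} \tag{38}
\]
\[
\begin{aligned}
& \left( -w^{(p)}_{1k} \right) \left( t^{(p)-}_{\nu k} - t^{(n)-}_{\nu 1} \right) + w^{(p)}_{2k} \left( t^{(p)-}_{\nu k} - t^{(n)+}_{\nu 2} \right) + \left( -w^{(p)}_{3k} \right) \left( t^{(p)-}_{\nu k} - t^{(n)-}_{\nu 3} \right) \\
& \quad + \left( -b^{(p)}_k \right) \left( t^{(p)-}_{\nu k} - t^{(n)-}_{\nu 0} \right) = \theta^{(p)-}_k.
\end{aligned} \tag{39}
\]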
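By adding Eq. (38) to Eq. (39), the following relationship is obtained:
\[
\begin{aligned}
& w^{(p)}_{1k} \left( t^{(p)+}_{\nu k} - t^{(n)+}_{\nu 1} \right) + \left( -w^{(p)}_{2k} \right) \left( t^{(p)+}_{\nu k} - t^{(n)-}_{\nu 2} \right) + w^{(p)}_{3k} \left( t^{(p)+}_{\nu k} - t^{(n)+}_{\nu 3} \right) \\
& + \left( -w^{(p)}_{1k} \right) \left( t^{(p)-}_{\nu k} - t^{(n)-}_{\nu 1} \right) + w^{(p)}_{2k} \left( t^{(p)-}_{\nu k} - t^{(n)+}_{\nu 2} \right) + \left( -w^{(p)}_{3k} \right) \left( t^{(p)-}_{\nu k} - t^{(n)-}_{\nu 3} \right) \\
& + b^{(p)}_k \left( t^{(p)+}_{\nu k} - t^{(n)+}_{\nu 0} \right) + \left( -b^{(p)}_k \right) \left( t^{(p)-}_{\nu k} - t^{(n)-}_{\nu 0} \right) \\
& = t^{(p)+}_{\nu k} \left( w^{(p)}_{1k} - w^{(p)}_{2k} + w^{(p)}_{3k} + b^{(p)}_k \right) + t^{(p)-}_{\nu k} \left( -w^{(p)}_{1k} + w^{(p)}_{2k} - w^{(p)}_{3k} - b^{(p)}_k \right) \\
& \quad + \sum_{j=0}^{N=3} w^{(p)}_{jk} \cdot t^{(n)-}_{\nu j} - \sum_{j=0}^{N=3} w^{(p)}_{jk} \cdot t^{(n)+}_{\nu j} \\
& = \sum_{j=0}^{N=3} \left| w^{(p)}_{jk} \right| \cdot \left( t^{(p)+}_{\nu k} - t^{(p)-}_{\nu k} \right) + \sum_{j=0}^{N=3} w^{(p)}_{jk} \cdot \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right) = 0,
\end{aligned} \tag{40}
\]
where $w^{(p)}_{0k} = b^{(p)}_k$. Thus, we can obtain the following simple
expression:
\[ \sum_{j=0}^{N=3} w^{(p)}_{jk} \cdot \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right) = \left( t^{(p)-}_{\nu k} - t^{(p)+}_{\nu k} \right) \cdot \sum_{j=0}^{N=3} \left| w^{(p)}_{jk} \right|. \tag{41} \]
Therefore, we can have the more general expression as follows:
\[ \sum_{j=0}^{N} w^{(p)}_{jk} \cdot \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right) = \left( t^{(p)-}_{\nu k} - t^{(p)+}_{\nu k} \right) \cdot \sum_{j=0}^{N} \left| w^{(p)}_{jk} \right|. \tag{42} \]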
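Then, we can obtain the following formula:
\[ \sum_{j=0}^{N} w^{(p)}_{jk} \cdot \frac{\beta^{(n)}_j}{T_{in}} \cdot \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right) = \left( t^{(p)-}_{\nu k} - t^{(p)+}_{\nu k} \right) \cdot \sum_{j=0}^{N} \left| w^{(p)}_{jk} \right| \cdot \frac{\beta^{(n)}_j}{T_{in}}, \tag{43} \]
\[
\begin{aligned}
& \sum_{j=1}^{N} w^{(p)}_{jk} \cdot \frac{\beta^{(n)}_j}{T_{in}} \cdot \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right) + w^{(p)}_{0k} \cdot \frac{\beta^{(n)}_0}{T_{in}} \left( t^{(n)-}_{\nu 0} - t^{(n)+}_{\nu 0} \right) \\
& = \left( t^{(p)-}_{\nu k} - t^{(p)+}_{\nu k} \right) \cdot \sum_{j=0}^{N} \left| w^{(p)}_{jk} \right| \cdot \frac{\beta^{(n)}_j}{T_{in}},
\end{aligned} \tag{44}
\]
where $w^{(p)}_{0k} = b^{(p)}_k$, $t^{(n)-}_{\nu 0} = T_{in}$, and
$t^{(n)+}_{\nu 0} = 0$. Because there is no input to the bias $b^{(p)}_k$, we
let $\beta^{(n)}_0 = 1$. Therefore, Eq. (44) becomes
\[ \sum_{j=1}^{N} w^{(p)}_{jk} \cdot \frac{\beta^{(n)}_j}{T_{in}} \cdot \left( t^{(n)-}_{\nu j} - t^{(n)+}_{\nu j} \right) + b^{(p)}_k = \left( t^{(p)-}_{\nu k} - t^{(p)+}_{\nu k} \right) \cdot \sum_{j=0}^{N} \left| w^{(p)}_{jk} \right| \cdot \frac{\beta^{(n)}_j}{T_{in}}, \tag{45} \]
where $\beta^{(n)}_0 = 1$ on the right side. Therefore, Eq. (37) can be
modified as
\[ \sum_{j=1}^{N} w^{(p)}_{jk} \cdot \mathrm{ReLU}\left( \sum_{i=0}^{N} w^{(n)}_{ij} \cdot x_i \right) + b^{(p)}_k = \left( t^{(p)-}_{\nu k} - t^{(p)+}_{\nu k} \right) \cdot \sum_{j=0}^{N} \left| w^{(p)}_{jk} \right| \cdot \frac{\beta^{(n)}_j}{T_{in}}. \tag{46} \]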
As a result, for neurons in the hidden layer $n$, we apply the time-domain
weighted-sum operation to generate the timings $t^{(n)+}_{\nu j}$ and
$t^{(n)-}_{\nu j}$ for the positively and negatively weighted-sum calculation
from the input layer, respectively. Then, these timings are directly
transferred to neurons in the next layer $p$, and timings $t^{(p)+}_{\nu k}$
and $t^{(p)-}_{\nu k}$ are obtained. Finally, we calculate the final outputs
of the MLP using Eq. (46). Note that intermediate weighted-sum results with
different-signed weights are not calculated in the middle layers.
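The layer-to-layer transfer of timing pairs can be illustrated with a small numerical sketch. It is not the authors' simulation code. It assumes ideal linear PSPs, the two-input synapse of Fig. 2(c), a bias fed by the fixed pair $(0, T_{in})$, and, as one reading of Eqs. (43) to (46), that the synapse slopes of layer $p$ are $|w^{(p)}_{jk}| \beta^{(n)}_j$. The threshold rule in `two_rail_layer` (firing after the latest input) is also an assumption, and all names and sizes are hypothetical.

```python
import numpy as np

def two_rail_layer(W, b, t_plus, t_minus, T_in, eps=0.1):
    """One layer of paired positive/negative IF neurons.

    W: (n_in, n_out) signed weights, b: (n_out,) biases,
    t_plus, t_minus: (n_in,) input timing pairs.
    Returns the output timing pairs and the slope sums sum_j |w_jk|.
    """
    Wb = np.vstack([b, W])                      # bias as weight w_0k
    tp = np.concatenate([[0.0], t_plus])        # bias pair: t+ = 0
    tm = np.concatenate([[T_in], t_minus])      # bias pair: t- = T_in
    A = np.abs(Wb)
    B = A.sum(axis=0)
    pos = Wb >= 0
    # Positive neuron: w >= 0 uses t+, w < 0 uses t- (pattern of Eq. (38)).
    t_for_pos = np.where(pos, tp[:, None], tm[:, None])
    # Negative neuron: the pairing is swapped (pattern of Eq. (39)).
    t_for_neg = np.where(pos, tm[:, None], tp[:, None])
    # Assumed threshold: large enough that firing follows the last input.
    theta = (1.0 + eps) * B * max(tp.max(), tm.max())
    out_plus = (theta + (A * t_for_pos).sum(axis=0)) / B
    out_minus = (theta + (A * t_for_neg).sum(axis=0)) / B
    return out_plus, out_minus, B

def mlp_time_domain(x, W1, b1, W2, b2, T_in=1.0):
    x = np.asarray(x, dtype=float)
    # Layer n: each input is the pair (t_i, T_in), Eq. (1) and Fig. 2(c).
    tp, tm, B1 = two_rail_layer(W1, b1, T_in * (1.0 - x),
                                np.full_like(x, T_in), T_in)
    tm = np.maximum(tm, tp)                     # ReLU on the timing pairs
    # Layer p: slopes scaled by beta_j^(n) = B1 (assumed reading of Eq. (43)).
    tp2, tm2, B2 = two_rail_layer(W2 * B1[:, None], b2, tp, tm, T_in)
    return B2 * (tm2 - tp2) / T_in              # Eq. (46)

def mlp_reference(x, W1, b1, W2, b2):
    """Ordinary numerical forward pass for comparison."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

rng = np.random.default_rng(0)
x = rng.random(6)
W1, b1 = rng.normal(size=(6, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 3)), rng.normal(size=3)
y_time = mlp_time_domain(x, W1, b1, W2, b2)
y_ref = mlp_reference(x, W1, b1, W2, b2)
print(y_time, y_ref)
```

Under these assumptions the decoded outputs match the ordinary forward pass, and the hidden-layer weighted sums are never formed explicitly; only timing pairs move between layers.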
For CNNs, weighted-sum calculations of convolutions can be performed in the
same way. In addition to the convolutions, max pooling operations can also be
implemented simply by considering the difference between positive and negative
timing values, as follows:
\[ y_{maxpooling} = \left( t^{(l)-}_{\nu k},\; t^{(l)+}_{\nu k} \right), \tag{47} \]
where
\[ k = \operatorname*{argmax}_i \left( t^{(l)-}_{\nu 1} - t^{(l)+}_{\nu 1},\; \cdots,\; t^{(l)-}_{\nu i} - t^{(l)+}_{\nu i} \right). \tag{48} \]
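Eqs. (47) and (48) amount to selecting the timing pair with the largest difference. A minimal sketch, with a hypothetical function name and example values, is:

```python
def max_pool_timing(pairs):
    """Eqs. (47), (48): return the (t_minus, t_plus) pair whose difference
    t_minus - t_plus is largest within the pooling window."""
    return max(pairs, key=lambda p: p[0] - p[1])

# Each tuple is (t_minus, t_plus) for one neuron in the pooling window.
window = [(1.9, 1.7), (1.8, 1.2), (1.5, 1.5)]
pooled = max_pool_timing(window)
print(pooled)
```

The selected pair is passed on unchanged, so pooling also avoids an explicit subtraction result being stored between layers.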
3.3 Numerical Simulations of Neural Networks
We performed numerical simulations to verify our weighted-sum calculation
model. First, in order to verify our model for weighted-sum calculation with
different-signed weights, we conducted a simulation to perform a weighted-sum
calculation with 501 pairs of inputs and weights that consisted of 249 positive
and 252 negative weights. We added a dummy weight to make the sum of positive
weights equal to the absolute sum of the negative ones. Figure 4 shows the
simulation results of time-domain weighted-sum calculation with a dummy weight
$w_{n+1}$. The results show that the weighted sum is calculated correctly when
the positive and negative groups receive different input spike timings, each
of which is weighted by the corresponding signed weight.
Then, we applied our model to a four-layer MLP (784-100-100-10) and a CNN
known as LeNet5 [15] to classify the MNIST digit character set. We trained
these two ANNs, and then performed inference according to Eq. (46) with the
obtained weights, which were either binary [3] or analog values. As described
above, output spike timing at each neuron in the previous layer was directly
conveyed to the neurons in the next layer without obtaining the subtraction of
the signed weighted-sum results. We found that the weighted-sum calculation
results in the last layer, and also the recognition accuracy of both NNs, were
the same as those obtained by ordinary numerical calculation.
Fig. 4. Simulation results for the time-domain weighted-sum calculation model:
(a) PSP of positively weighted-sum operation with 249 inputs in which
$T_{in} = 1$, $\lambda = 1$, $\beta^+ = 24.01$, and $\theta^+ = 26.41$. The
output spike timing is $t_\nu^+ = 1.7256$. (b) PSP of negatively weighted-sum
operation with 253 inputs in which $w_0 = -0.06$, $w_{n+1} = -2.819$,
$T_{in} = 1$, $\lambda = 1$, $\beta^- = -24.01$, and $\theta^- = -26.41$. The
output spike timing is $t_\nu^- = 1.9221$. Thus, the result of weighted-sum
calculation is $|\beta^\pm| (t_\nu^- - t_\nu^+)/T_{in} = 4.718$.

4 Circuits and Architectures for TACT-based Neural
Networks
As an implementation for our weighted-sum calculation based on our TACT
approach, we have proposed an RC circuit in which a capacitor is connected to
multiple resistors, as shown in Fig. 5(a). Theoretical estimations have
indicated that this circuit can perform weighted-sum calculations with
extremely low energy consumption [31,32]. In CMOS VLSI implementation,
resistance R can be replaced by a p-type MOS field-effect transistor
(pMOSFET), as shown in Fig. 5(b). The approximately linear slope k is
generated by capacitance C and the ON resistance of a pMOSFET with a step
voltage input Vin, where we use step voltages instead of spike pulses as
inputs. Each resistance should have a rectification function to prevent an
inverse current.
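Why the ramp is only approximately linear can be seen from the step response of a single ideal RC stage, $V(t) = V_{in}(1 - e^{-t/RC})$, whose initial slope is $V_{in}/(RC)$. The sketch below compares the two; it assumes an ideal resistor rather than a subthreshold pMOSFET, and the values of `R`, `C`, and `V_in` are hypothetical.

```python
import math

def rc_step_response(t, R, C, V_in):
    """Capacitor voltage of an ideal RC stage after a step input at t = 0."""
    return V_in * (1.0 - math.exp(-t / (R * C)))

def linear_ramp(t, R, C, V_in):
    """Linear PSP model using the initial slope k = V_in / (R * C)."""
    return V_in / (R * C) * t

def relative_error(t, R, C, V_in):
    exact = rc_step_response(t, R, C, V_in)
    return (linear_ramp(t, R, C, V_in) - exact) / exact

# Hypothetical values: 1 GOhm resistance, 10 fF capacitance, 1 V step.
R, C, V_in = 1e9, 10e-15, 1.0
tau = R * C
for frac in (0.01, 0.1, 0.3):
    print(frac, relative_error(frac * tau, R, C, V_in))
```

The deviation from the linear ramp grows with $t/RC$, which is consistent with the requirement stated later in this section that the time constant be much longer than the input time span.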
The rectification function is automatically realized by the FET operation as
follows. When a pMOSFET receives a step-voltage input, the terminal voltage of
the input is higher than that at C, and therefore the input-side terminal of
the pMOSFET is “source”, and the capacitor-side terminal is “drain.” In this
state, if the gate-source voltage of the pMOSFET is set to exceed its
threshold voltage, the pMOSFET turns on, and C is charged up. On the other
hand, when a pMOSFET receives no input, the terminal voltage of the input is
lower than that at C, and therefore the source-drain position in the pMOSFET
is reversed; i.e., the input-side terminal of the pMOSFET is “drain”, and the
capacitor-side terminal is “source.” In this state, if the gate-source voltage
of the pMOSFET is set not to exceed its threshold voltage, the pMOSFET turns
off, and the charge stored at C does not flow back to the input side.

Fig. 5. Synapse circuit: (a) step voltage input and a resistance-capacitance
(R-C) circuit in which a pMOSFET acts as resistance R, and parasitic
capacitance of interconnection and the gate capacitance of MOSFETs act as C in
a VLSI circuit; (b) approximately linear response of the step voltage input at
timing $t_i$ with a slope determined by gate voltage $V_{ki}$.
In order to evaluate the energy consumption of this circuit, we designed a
CMOS circuit equivalent to the RC circuit. It is obviously difficult to change
the ON resistance of each pMOSFET independently, because different analog
voltages have to be given as the gate voltages of the pMOSFETs. Therefore, in
the PoC circuit, all MOSFETs have the same ON resistance with the same gate
voltage. To realize different analog weights, it is necessary to use analog memory
devices such as resistance-change memory [27,12], ferroelectric-gate FETs [17],
and floating-gate flash memory [1].
We designed a crossbar synapse circuit array to perform the weighted-sum
calculation shown in Fig. 1 with a one-layer MLP model, as shown in Fig. 6.
In the array, the horizontal and vertical lines are referred to as “axons” of the
previous neurons and “dendrites” of the post neurons, respectively; each axon
line has M synapse circuits, and each dendrite line receives N synapse outputs.
An input voltage charges up the parasitic capacitance of the axon line, Cal, and
then charges up the capacitance, which includes the parasitic capacitance of the
dendrite lines, Cdl, and the input capacitance of the post neuron, Ci, via synapse
blocks, each of which consists of a pMOSFET.
Fig. 6. Crossbar architecture: (a) one-layer neural network model; (b)
crossbar architecture with pre-neurons having axons and post-neurons having
dendrites; (c) CMOS comparator that acts as a post-neuron circuit.
We designed two single-layer neural network circuits, which have 500 inputs
and 20 outputs (N = 500, M = 20), and 500 inputs and 100 outputs (N = 500,
M = 100), respectively. The layout and post-layout simulation results are
shown in Fig. 7. TSMC 250-nm fabrication technology was used, and both the
gate length and width of the pMOSFETs were 0.6 µm. The simulation results show
a correct weighted-sum operation, where Tin = 1 µs and θ = 0.3 V.
We extracted the parasitic capacitance of the dendrite and axon lines per
synapse, $c_{dl} = C_{dl}/N$ and $c_{al} = C_{al}/M$, which were 1.76 fF and
1.78 fF, respectively. It is assumed that the input capacitance of pMOSFETs is
included in this capacitance. Therefore, the energy consumption of line
charge/discharge operation per synapse, $E_{syn}$, is expressed by
$E_{syn} = (c_{dl} + c_{al}) V_{dd}^2$, and it was 3.54 fJ, where
$V_{dd} = 1$ V. We note that the above estimation does not include the energy
consumption related to charging/discharging of the input capacitance $C_i$ of
post-neuron circuits; this, however, would be negligible compared to the
dendrite line capacitance under 250-nm technology.
As for the neuron part, which consists of a comparator and the output buffer,
the energy consumption was about 1.67 pJ, which means 3.34 fJ per synapse
operation. As a result, overall energy consumption was 6.88 fJ per synapse
operation, which consists of two operations, multiply and accumulate. This
implies that the energy efficiency is 290 TOPS/W (Tera-Operations Per Second
per Watt). This efficiency is more than one order of magnitude higher than
that of state-of-the-art digital AI processors [11].
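The energy figures above can be reproduced with a few lines of arithmetic. The sketch uses only the numbers quoted in the text; dividing the neuron energy by N = 500 synapses is an inference from the stated values (1.67 pJ and 3.34 fJ), and the variable names are illustrative.

```python
# Values quoted in the text
c_dl = 1.76e-15          # dendrite-line capacitance per synapse [F]
c_al = 1.78e-15          # axon-line capacitance per synapse [F]
V_dd = 1.0               # synapse supply voltage [V]
E_neuron = 1.67e-12      # comparator + output buffer energy [J]
N = 500                  # synapses per neuron (inferred divisor)

E_syn = (c_dl + c_al) * V_dd ** 2        # line charge/discharge per synapse
E_neuron_per_syn = E_neuron / N          # neuron energy shared by N synapses
E_total = E_syn + E_neuron_per_syn       # energy per synapse operation

ops_per_synapse = 2                      # multiply and accumulate
tops_per_watt = ops_per_synapse / E_total / 1e12

print(E_syn, E_neuron_per_syn, E_total, tops_per_watt)
```

The result is about 6.88 fJ per synapse operation and roughly 290 TOPS/W, with the neuron part contributing almost half of the total.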
For sufficient calculation precision, we expect the time constant of RC
circuits to be much more than 1 µs to guarantee a time resolution of 7 bits,
assuming a resolved time step of 10 ns. To obtain this time constant, R should
be more than 1 GΩ, which means that the current flowing through each
resistance is less than 1 nA. Such high resistance can be achieved by using
the subthreshold operation of MOSFETs, and we set all the gate-source voltages
of the pMOSFETs at −0.37 V.
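The orders of magnitude in this paragraph can be checked directly. The sketch below assumes that 7-bit resolution means $2^7$ distinguishable 10-ns steps and that the full 1 V synapse supply appears across the resistance; both are simplifying assumptions, and the variable names are illustrative.

```python
bits = 7
dt = 10e-9               # resolved time step [s]
V_dd = 1.0               # synapse supply voltage [V]
R_min = 1e9              # minimum resistance [ohm]

window = (2 ** bits) * dt    # time span covered by 2^7 steps of 10 ns
i_max = V_dd / R_min         # upper bound on the current per resistance

print(window, i_max)
```

This gives a span of 1.28 µs for 128 steps and a current bound of 1 nA per resistance, in line with the figures quoted above.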
We propose the circuit architecture of a neural network suitable for our TACT
approach, in which the weights may be both positive and negative as described
above. The architecture is shown in Fig. 8, and is composed of synapses acting
as resistive elements, the neuron part functioning as thresholding, the ReLU
part denoting the activation, and the configuration part controlling synapses.
There are two inputs for each synapse circuit, which are $t_i$ as signal input
and $T_{in}$ as a dummy input in the first layer, and $t^+_{\nu i}$ and
$t^-_{\nu i}$ in the subsequent layers. Pairs of positive and negative timings
are directly connected to the next layer without calculating the subtraction
between positively and negatively signed weighted results. The synapse part is
designed with two resistive elements and two pairs of switches. A set of two
identical resistances represents the weight value. We can assume that the
upper-side axon is for $t^+_{\nu i}$ and the other is for $t^-_{\nu i}$, and
the left-side dendrite is for a positive weight connection while the other is
for a negative one. The two switches are exclusively controlled according to
the sign of the corresponding weight, which is set by the weight control
circuit.
The neuron part includes a comparator. The ReLU activation function can
easily be implemented by logic gates, as shown in Fig. 9. When
$t^+_{\nu i} > t^-_{\nu i}$, we set both timings to $t^+_{\nu i}$, as shown in
Fig. 9. With such circuits, the output spike timings at each neuron in the
previous layer are directly transferred into the neurons in the next layer,
while the nonlinear activation function ReLU can be implemented with low
energy consumption.
5 Conclusions
In this paper, we proposed a time-domain weighted-sum calculation model based
on the spiking neuron model, and formulated the calculation model to implement
MLPs with an activation function of ReLU. In the proposed model, weighted-sum
results with different-signed weights are not calculated in intermediate
layers. We also proposed VLSI circuits based on the TACT approach to implement
the calculation model with extremely low energy consumption. We demonstrated
the high energy efficiency of the circuit using 250-nm CMOS VLSI technology.
If we use a more advanced VLSI fabrication technology that achieves lower
parasitic capacitance, the energy efficiency could be improved to more than
1 POPS/W (Peta-Operations Per Second per Watt).
However, there are some issues to be overcome toward developing practical AI
processors using our TACT approach. In terms of synapse circuits, the designed
PoC CMOS VLSI chip provided no memory function. In order to construct neural
network systems, it is necessary to introduce analog memory devices with high
resistance in synapse circuits. As for the neuron parts, the results of
post-layout simulations suggest that the energy consumption of this part is
comparable to that of the whole synapse part with 500 inputs. It will be
necessary to design a comparator with much lower power consumption to improve
the energy efficiency of the whole weighted-sum calculation circuit.

Fig. 7. Layout results: (a) layout of 500-20 and 500-100 MLPs; (b) VLSI
circuit specification; and (c) post-layout simulation results for time-domain
analog calculation (voltage V [V] versus time [µs]).
LSI Specification (panel (b)):
  Process        TSMC 250 nm
  Chip size      5 mm sq.
  Voltage        1.5 / 1.0 V (Neuron), 1.0 V (Synapse)
  # of Synapses  500 × 20, 500 × 100
Acknowledgments. This work was supported by JSPS KAKENHI Grant Nos.
22240022 and 15H01706. Part of the work was carried out under a project com-
missioned by the New Energy and Industrial Technology Development Organi-
zation (NEDO), and the Collaborative Research Project of the Institute of Fluid
Science, Tohoku University.
References
1. Bavandpour, M., Mahmoodi, M.R., Strukov, D.B.: Energy-efficient time-domain
vector-by-matrix multiplier for neurocomputing and beyond. arXiv preprint
arXiv:1711.10673 (2017)
2. Cireşan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Deep, big, simple
neural nets for handwritten digit recognition. Neural Comp. 22(12), 3207–3220
(2010)
3. Courbariaux, M., Bengio, Y., David, J.P.: Binaryconnect: Training deep neural
networks with binary weights during propagations. In: Advances in Neural Infor-
mation Processing Systems. pp. 3123–3131 (2015)
Fig. 8. Two-layer MLP architecture.
4. Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features
for scene labeling. IEEE Trans. Pattern Analysis and Machine Intelligence 35(8),
1915–1929 (2013)
5. Fick, L., Blaauw, D., Sylvester, D., Skrzyniarz, S., Parikh, M., Fick, D.: Analog in-
memory subthreshold deep neural network accelerator. In: Proc. of IEEE Custom
Integrated Circuits Conf. (CICC). pp. 1–4 (2017)
6. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proc.
of Int. Conf. on Artificial Intelligence and Statistics (AISTATS). pp. 315–323 (2011)
7. Guo, X., Bayat, F.M., Prezioso, M., Chen, Y., Nguyen, B., Do, N., Strukov, D.B.:
Temperature-insensitive analog vector-by-matrix multiplier based on 55 nm NOR
flash memory cells. In: Proc. of IEEE Custom Integrated Circuits Conf. (CICC).
pp. 1–4 (2017)
8. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-
level performance on imagenet classification. In: Proc. of IEEE Int. Conf. on Com-
puter Vision (ICCV). pp. 1026–1034 (2015)
Fig. 9. ReLU circuit.
9. Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., Bengio, Y.: Quantized neu-
ral networks: Training neural networks with low precision weights and activations.
J. Mach. Learn. Res. 18(1), 6869–6898 (2017)
10. Indiveri, G.: Computation in neuromorphic analog VLSI systems. In: Proc. of Ital-
ian Workshop on Neural Nets (WIRN). pp. 3–19 (2001)
11. Khwa, W.S., Chen, J.J., Li, J.F., Si, X., Yang, E.Y., Sun, X., Liu, R., Chen, P.Y.,
Li, Q., Yu, S., Chang, M.F.: A 65nm 4Kb algorithm-dependent computing-in-
memory SRAM unit-macro with 2.3 ns and 55.8 TOPS/W fully parallel product-
sum operation for binary DNN edge processors. In: IEEE Int. Solid-State Circuits
Conf. (ISSCC). pp. 496–498 (2018)
12. Kim, S., Ishii, M., Lewis, S., Perri, T., BrightSky, M., Kim, W., Jordan, R., Burr,
G., Sosa, N., Ray, A., Han, J., Miller, C., Hosokawa, K., Lam, C.: NVM neu-
romorphic core with 64k-cell (256-by-256) phase change memory synaptic array
with on-chip neuron circuits for continuous in-situ learning. In: IEEE Int. Electron
Devices Meeting (IEDM). pp. 17–1 (2015)
13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Sys-
tems. pp. 1097–1105 (2012)
14. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444
(2015)
15. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to
document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
16. Lee, E.H., Wong, S.S.: A 2.5 GHz 7.7 TOPS/W switched-capacitor matrix multi-
plier with co-designed local memory in 40nm. In: IEEE Int. Solid-State Circuits
Conf. (ISSCC). pp. 418–419 (2016)
17. Li, Q.H., Horiuchi, T., Wang, S., Takahashi, M., Sakai, S.: Threshold voltage ad-
justment of ferroelectric-gate field effect transistors by ion implantation. Semicond.
Sci. Technol. 24(2), 025012 (2009)
18. Maass, W.: Fast sigmoidal networks via spiking neurons. Neural Comp. 9(2), 279–
304 (1997)
19. Maass, W.: Computing with spiking neurons. In: Pulsed Neural Networks. pp.
55–85. MIT Press (1999)
20. Maass, W., Bishop, C.M.: Pulsed Neural Networks. MIT Press (1999)
21. Mahmoodi, M.R., Strukov, D.: An ultra-low energy internally analog, externally
digital vector-matrix multiplier based on NOR flash memory technology. In: Proc.
of Design Automation Conf. (DAC). p. 22 (2018)
22. Miyashita, D., Kousai, S., Suzuki, T., Deguchi, J.: Time-domain neural network:
A 48.5 TSOp/s/W neuromorphic chip optimized for deep learning and CMOS
technology. In: IEEE Asian Solid-State Circuits Conf. (A-SSCC). pp. 25–28 (2016)
23. Moons, B., Uytterhoeven, R., Dehaene, W., Verhelst, M.: Envision: A 0.26-to-
10TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable Con-
volutional Neural Network processor in 28nm FDSOI. In: IEEE Int. Solid-State
Circuits Conf. (ISSCC). pp. 246–247 (2017)
24. Morie, T., Amemiya, Y.: An all-analog expandable neural network LSI with on-chip
backpropagation learning. IEEE J. Solid-State Circuits 29(9), 1086–1093 (1994)
25. Morie, T., Liang, H., Tohara, T., Tanaka, H., Igarashi, M., Samukawa, S., Endo,
K., Takahashi, Y.: Spike-based time-domain weighted-sum calculation using nan-
odevices for low power operation. In: IEEE Int. Conf. on Nanotechnology (IEEE-
NANO). pp. 390–392 (2016)
26. Morie, T., Sun, Y., Liang, H., Igarashi, M., Huang, C.H., Samukawa, S.: A 2-
dimensional Si nanodisk array structure for spiking neuron models. In: IEEE Proc.
of Int. Symp. Circuits and Systems (ISCAS). pp. 781–784 (2010)
27. Prezioso, M., Merrikh-Bayat, F., Hoskins, B., Adam, G., Likharev, K.K., Strukov,
D.B.: Training and operation of an integrated neuromorphic network based on
metal-oxide memristors. Nature 521(7550), 61–64 (2015)
28. Shin, D., Lee, J., Lee, J., Yoo, H.J.: DNPU: An 8.1 TOPS/W reconfigurable CNN-
RNN processor for general-purpose deep neural networks. In: IEEE Int. Solid-State
Circuits Conf. (ISSCC). pp. 240–241 (2017)
29. Sim, J., Park, J.S., Kim, M., Bae, D., Choi, Y., Kim, L.S.: A 1.42TOPS/W deep
convolutional neural network recognition processor for intelligent IoE systems. In:
IEEE Int. Solid-State Circuits Conf. (ISSCC). pp. 264–265 (2016)
30. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proc. of IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR). pp. 1–9 (2015)
31. Tohara, T., Liang, H., Tanaka, H., Igarashi, M., Samukawa, S., Endo, K., Taka-
hashi, Y., Morie, T.: Silicon nanodisk array with a fin field-effect transistor for
time-domain weighted sum calculation toward massively parallel spiking neural
networks. Appl. Phys. Express 9(3), 034201 (2016)
32. Wang, Q., Tamukoh, H., Morie, T.: Time-domain weighted-sum calculation for
ultimately low power VLSI neural networks. In: Proc. Int. Conf. on Neural Infor-
mation Processing (ICONIP). pp. 240–247 (2016)
