Neural Network Design for Energy-Autonomous AI Applications using
  Temporal Encoding by Mileiko, Sergey et al.
rsta.royalsocietypublishing.org
Research
Article submitted to journal
Subject Areas:
Edge computing, artificial intelligence.
Keywords:
Neural networks, hardware design,
energy efficiency, energy autonomy.
Author for correspondence:
Serhii Mileiko
e-mail: S.Mileiko2@newcastle.ac.uk
Neural Network Design for
Energy-Autonomous AI
Applications using Temporal
Encoding
Sergey Mileiko1, Thanasin Bunnam2, Fei
Xia3, Rishad Shafik4, Alex Yakovlev5 and
Shidhartha Das6
1S.Mileiko2@newcastle.ac.uk
2T.Bunnam2@newcastle.ac.uk
3Fei.Xia@newcastle.ac.uk
4Rishad.Shafik@newcastle.ac.uk
5Alex.Yakovlev@newcastle.ac.uk
6Shidhartha.Das@arm.com
Neural Networks (NNs) are steering a new generation
of artificial intelligence (AI) applications at the micro-
edge. Examples include wireless sensors, wearables
and cybernetic systems that collect data and process
them to support real-world decisions and controls.
For energy autonomy, these applications are typically
powered by energy harvesters. As harvesters and
other power sources which provide energy autonomy
inevitably have power variations, the circuits need to
robustly operate over a dynamic power envelope. In
other words, the NN hardware needs to be able to
function correctly under unpredictable and variable
supply voltages.
In this paper, we propose a novel NN design
approach using the principle of pulse width modulation
(PWM). PWM signals represent information with their
duty cycle values which may be made independent
of the voltages and frequencies of the carrier signals.
We design a PWM-based perceptron which can serve
as the fundamental building block for NNs, by
using an entirely new method of realising arithmetic
in the PWM domain. We analyse the proposed
approach building from a 3× 3 perceptron circuit
to a complex multi-layer NN. Using handwritten
character recognition as an exemplar of AI applications,
we demonstrate the power elasticity, resilience and
efficiency of the proposed NN design in the presence
of functional and parametric variations including
large voltage variations in the power supply.
© The Authors. Published by the Royal Society under the terms of the
Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/,
which permits unrestricted use, provided the original author and
source are credited.
arXiv:1910.07492v1 [eess.SP] 15 Oct 2019
Phil. Trans. R. Soc. A 0000000
1. Introduction
Advances in sensing devices are causing a shift towards the fourth industrial revolution [1]. The
large volumes of the data produced by these devices are enabling a new generation of artificial
intelligence (AI) systems at the micro-edge that are designed to infer important decisions in
the real world [2]. A promising direction for these AI systems is the leap towards perpetual
computability, allowing an always-available local AI service. To enable this, designers of pervasive
AI systems face two grand challenges: energy efficiency and energy autonomy [3–6].
Energy efficiency refers to economising the energy consumption of elementary compute
operations. The aim is to prolong operating lifetime with a given energy budget, typically defined
by the batteries. Reducing energy requires careful design considerations at device-, circuit- and
system levels. Examples include reducing device geometry [7], scaling operating voltage [8] and
designing circuits with reduced or approximate logic [9].
New generations of pervasive AI-based systems require a long, maintenance-free life. Traditional
energy-efficient design principles for battery-operated systems are therefore not sufficient,
as batteries need periodic recharging and replacement. Portable energy harvesters, which produce
electrical energy for computation loads by scavenging energy from the environment,
are gradually making inroads. Such energy harvesting can remove the need
for maintenance in favour of energy autonomy. However, coping with the resulting energy variations
requires the ability to compute over a dynamic power envelope, otherwise known as power
elasticity [10,11].
Despite advances in low-power design methodologies, the energy footprint of existing AI
systems, such as Neural Networks (NNs), has generally remained high [12]. Our persistence in
using arithmetic-heavy circuits with growing algorithmic complexities is a major contributor to
this. For instance, object detection using deep NNs may require a hundred to over ten thousand
times the energy needed by the traditional histogram of oriented gradient techniques [13]. Due
to such poor efficiency, the widespread adoption of energy-autonomous AI hardware at the
micro-edge has proven challenging [14].
To appreciate the importance of efficient AI hardware design, we show the example of a
perceptron, whose idea originates from Rosenblatt’s work of 1958 [15]. It is a basic building block
of NNs used in AI applications [16–18]. It consists of an input vector, a set of weights and a bias
to produce binary classification outcomes, as follows:
$$
f(x) =
\begin{cases}
1, & \text{if } \mathbf{w}\cdot\mathbf{x} + b > 0 \\
0, & \text{otherwise}
\end{cases}
\tag{1.1}
$$
where $\mathbf{w}$ is a vector of real-valued weights, $\mathbf{w}\cdot\mathbf{x}$ is the dot product
$\sum_{i=1}^{m} w_i x_i$ over $m$ inputs, and $b$ is the bias. The process of deciding the appropriate
weights ($\mathbf{w}$), often also known as training, serves as the basic principle of supervised learning.
When $m$ becomes large, it approximates the behaviour of a biological neuron. Figure 1 shows the typical
structure of a perceptron [19,20]. At its core is an adder that sums $m$ weighted inputs. The result of the addition
is compared with a reference during the training phase, during which the weights are updated
to ensure the reference is matched. For hardware implementation, multiplication and addition
are crucial arithmetic circuits in a perceptron [21]. Such arithmetic operations require significant
area and power costs, which depend on the number of input-weight pairs, the precision of the
multipliers/adders, their underlying technology nodes and algorithmic complexities.
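As a minimal sketch of Equation 1.1, the decision rule can be written in a few lines of Python. The function name `perceptron` and the example weights (chosen here to realise a two-input AND) are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def perceptron(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """Binary perceptron of Equation 1.1: returns 1 if w.x + b > 0, else 0."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # Multiply-accumulate: this is the arithmetic that dominates hardware cost.
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else 0


# Illustrative values only (an assumption): weights and bias realising a 2-input AND.
and_weights, and_bias = [1.0, 1.0], -1.5
outputs = [perceptron([a, c], and_weights, and_bias) for a in (0, 1) for c in (0, 1)]
print(outputs)
```

Each input-weight pair costs one multiplication and one addition, which is why the arithmetic cost grows with the number of inputs and the operand precision.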
Over the years, substantial research has been dedicated to improving the energy efficiency of
AI hardware [22]. A vast body of this research has predominantly remained within the remits
of Landauer’s logic boundaries for energy or power reduction [23]. Reducing threshold voltage
that defines the logic boundaries and designing new low-complexity architectures are key to
achieving this. Andri et al. [14] proposed a NN architecture that showed how high-performance
NN operations can be achieved by parallel logic blocks. These blocks are designed using low-
threshold technology nodes that are faster and ultra-low power. Prado et al. [24] showed a
Figure 1. Structural organisation of a perceptron, which is the basic building block of NNs.
logic approximation method applied in parallel NNs. Owing to the low-complexity architecture, the
individual components are faster and more energy-efficient. Among others, Qiqieh et al. [9]
proposed logic compression approaches for reducing power consumption, area and critical path
delay of NNs. By combining the circuit-level approaches with online system-wide techniques,
significant energy reduction was reported.
However, reducing power or energy alone using the above principles does not solve the
problem of energy-autonomous pervasive AI systems [10]. These systems will need to be
able to not only work with limited power supplies but also survive extreme variations as
power regulation and energy storage options are limited and expensive in low-end micro-edge
devices [10,25]. Indeed, these systems will need to be built with natural power elasticity to operate
over a large power domain [10,25,26].
Existing perceptron designs are predominantly digital, although a number of analogue
implementations have been reported [27,28]. The digital designs can operate over a range
of powers defined by paired voltages (Vdd) and frequencies (f). These designs are, however,
vulnerable to dynamic power supply variations, for example where the voltage of the
power source changes over time, and continuous Vdd and f pairing can prove expensive under
limited energy budgets. As such, existing designs have poor power elasticity, which prevents them
from providing useful computation under unreliable or unstable power supply conditions.
In this paper, for the first time, we tackle power-elastic AI hardware design. Our design
rests on a radically new approach of duty cycle based computing for pulse-width modulated
(PWM) signals using a number of parallel inverters. These building blocks are then integrated
as part of a higher level analysis method to support system-wide investigations in the context of
an exemplar application. Our key motivation to use the duty-cycle time-domain representation
of data is due to its potential fundamental resilience to dynamic variations in the amplitude and
frequency of the signal, which are inevitable for energy autonomous systems drawing energy
from the environment. The other motivating factor is the natural ability of CMOS logic to perform
multiplication and addition operation on the duty-cycled inputs. This is enabled by the inherent
effects of proportionally ratioed current switching in CMOS networks between P and N subnets
during the operational cycle of each gate. This gives way to implementing the PWM-based
compute functions directly on CMOS logic gates. Thus, one of the important goals we pursue
in this investigation is to verify our hypothesis that the combined use of PWM representation
and CMOS logic, with minimal additional analogue electronics (ideally, only passive components),
will deliver the sought efficiency and robustness of AI hardware.
In more concrete terms, the aim of this paper is to design and demonstrate a voltage and
frequency elastic perceptron, which performs its arithmetic computation in the PWM-coded
format for robustness to energy supply variations. To this purpose, a method of modelling and
analysing such power and frequency elastic NN components also needs to be developed. The
main contributions are:
(i) a mixed-signal perceptron design using duty cycle-based temporal weight encoding and
input switching via inverters,
(ii) extensive validation experiments in Cadence Analog Design tool demonstrating the
perceptron design’s resilience in the presence of static or dynamic voltage and frequency
variations,
(iii) a mathematical model describing the input-output relations of the proposed perceptron
for system-level design and analysis to support its use in NN design, and
(iv) configurations, analyses and evaluations using an example PWM perceptron-based NN
which solves the MNIST handwritten digits classification.
The rest of the paper is organised as follows: Section 2 establishes the method of designing the
PWM-based perceptron: the idea of the PWM to voltage conversion, performing the arithmetic
operations, and the implementation of the voltage to PWM converter. Section 3 validates the
approach using a number of parametric sweeps to demonstrate the frequency elasticity and
power resilience. Section 4 discusses the strengths and weaknesses of the proposed approach,
and ideas for future improvements. Finally, Section 5 concludes the paper.
2. Method
This section focuses on the design of the PWM-based perceptron, including the fundamental
theories, the circuits of its constituent parts, methods of PWM-based arithmetic, leading to the
construction of NNs. The design methods form the basis of extensive analysis supporting the
validation of the perceptrons integrated in a NN.
A perceptron capable of voltage and frequency elasticity may be constructed by exploiting
the fact that relative temporal properties, such as a PWM’s duty cycle, are resilient to voltage
and frequency variations. As the supply voltage reduces, any oscillatory activity, such as a clock
signal, may show a reduced amplitude and reduced frequency. However, the ratio between the
time within a period when the clock signal is high and the time within a period when the clock
signal is low stays the same as both would increase at the same rate.
Our method, therefore, is dedicated to finding ways of exploiting this fact by transferring
computation from the digital domain, which is affected by voltage and frequency variations, to
the relative temporal domain, which is not. This means making use of PWM-based techniques.
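The invariance argument above can be illustrated numerically. The sketch below assumes, as the text does, that a lower supply voltage stretches the high and low intervals by the same factor; the timing values and the stretch factor are hypothetical and are not measurements from the paper.

```python
import math


def duty_cycle(t_high: float, t_low: float) -> float:
    """Fraction of the period during which the signal is high."""
    return t_high / (t_high + t_low)


# Hypothetical nominal timing: 3 ns high, 7 ns low (duty cycle 0.3).
nominal = duty_cycle(t_high=3e-9, t_low=7e-9)

# Assumed effect of a reduced supply voltage: both intervals stretch equally.
stretch = 2.5
slowed = duty_cycle(t_high=3e-9 * stretch, t_low=7e-9 * stretch)

print(nominal, slowed, math.isclose(nominal, slowed))
```

The frequency drops by the stretch factor while the duty cycle, and hence the encoded value, is unchanged.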
(a) Principles of Duty Cycle to Voltage Conversion
Figure 2 shows an inverter-based PWM to voltage converter, which produces an output voltage
whose value represents the value carried by the input PWM signal, i.e. its duty cycle. Here we
exploit the principle that if the input of an inverter is a periodic signal, such as a clock, the average
voltage on its output decreases linearly as the duty cycle of the input signal increases. In other words,
the analogue average value of the inverter’s output voltage encodes the value of the duty cycle of
the input signal. Since an inverter is a digital component, whose output equals logic '0' or '1'
at any moment in time, it needs to be "analogised" (i.e. transcoded) in order to convert the input
duty cycle into the output voltage that is a corresponding proportion of the supply voltage. This
may be achieved in the following ways:
• increasing the input switching frequency,
• increasing the output capacitance,
• limiting the output current.
For the inverter-based PWM to voltage converter shown in Figure 2, with the input clock duty
cycle at 50%, the average output voltage is around Vdd/2 (Figure 3). This is due to the fact that
during the interval of time when the input is low the output capacitance is charged with current
from the power source via the PMOS transistor, and during the interval of input being high the
capacitance is discharged via the NMOS transistor. With a 50% duty cycle these two periods of
time are the same length and, assuming the transistors are balanced, the output voltage averages out to
half the supply voltage. When the duty cycle deviates from 50%, the average value of the output
voltage deviates from Vdd/2 proportionally, in the opposite direction (see Equation 2.4).
Figure 2. A CMOS-based inverter circuit.
Figure 3. Inverter output with PWM-coded input (V(in) and V(out) plotted against time).
If the frequency is high enough that the output capacitor is never fully charged or discharged,
the inverter may be equivalently represented as a resistive voltage divider (see Figure 4). The
output voltage of such a divider can be calculated using the following equation.
$$
V_{out} = (V_{dd} - GND) \cdot \frac{R^*_n + R^*_{out}}{(R^*_n + R^*_{out}) + (R^*_p + R^*_{out})}, \tag{2.1}
$$
where Rn and Rp are parasitic resistances of NMOS and PMOS transistors. During the charging
phase (tlow) the input of the inverter is low and current passes through the PMOS and the output
resistor. During the discharging phase (thigh), the input of the inverter is high and current goes
through the output resistor and NMOS. As a result, the resistance values can be calculated from
the lengths of time of each phase:
$$
R^*_n + R^*_{out} = (R_n + R_{out}) \cdot \frac{t_{low} + t_{high}}{t_{high}}; \tag{2.2}
$$
$$
R^*_p + R^*_{out} = (R_p + R_{out}) \cdot \frac{t_{low} + t_{high}}{t_{low}}. \tag{2.3}
$$
Assuming that Rn ≈Rp (this transistor balancing can be achieved by the appropriate relative
sizing of the PMOS and NMOS transistors, for instance, by setting the PMOS width to 2.7 times
the NMOS width for the UMC65nm technology) and GND= 0, the equation 2.1 is simplified to:
$$
V_{out} = V_{dd} \cdot \frac{t_{low}}{t_{low} + t_{high}} = V_{dd} \cdot (1 - DC), \tag{2.4}
$$
where DC is the input duty cycle - the ratio between the length of time when the input clock is
high during a clock period and the length of the clock period.
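The divider model of Equations 2.1 to 2.3 can be checked numerically. The following is a minimal sketch under stated assumptions: the function name `inverter_vout` and all resistance values are hypothetical, and the model is the static voltage divider only (it does not capture the dependence of transistor resistance on drain voltage).

```python
import math


def inverter_vout(vdd: float, duty: float, r_n: float, r_p: float,
                  r_out: float, gnd: float = 0.0) -> float:
    """Average output voltage of the PWM inverter, per Equations 2.1 to 2.3."""
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must lie strictly between 0 and 1")
    # Equation 2.2: pull-down path conducts for t_high, so scale by 1 / DC.
    r_n_eff = (r_n + r_out) / duty
    # Equation 2.3: pull-up path conducts for t_low, so scale by 1 / (1 - DC).
    r_p_eff = (r_p + r_out) / (1.0 - duty)
    # Equation 2.1: resistive voltage divider.
    return (vdd - gnd) * r_n_eff / (r_n_eff + r_p_eff)


# Balanced transistors (hypothetical 1 kOhm each): reproduces Vdd * (1 - DC).
balanced = inverter_vout(vdd=1.0, duty=0.25, r_n=1e3, r_p=1e3, r_out=0.0)

# Mismatched transistors (hypothetical 1 kOhm vs 2 kOhm), without and with
# a large output resistor (hypothetical 1 MOhm).
mismatch_no_rout = inverter_vout(vdd=1.0, duty=0.25, r_n=1e3, r_p=2e3, r_out=0.0)
mismatch_big_rout = inverter_vout(vdd=1.0, duty=0.25, r_n=1e3, r_p=2e3, r_out=1e6)

print(balanced, mismatch_no_rout, mismatch_big_rout)
```

With balanced transistors the result matches Equation 2.4 exactly. In this simplified model, a mismatch between the two transistor resistances shifts the output noticeably when no output resistor is present, and the shift becomes small once the output resistor dominates both.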
Figure 5 shows the relationship between the input duty cycle and the output voltage of the
PWM inverter. In the case when there is no output resistor, the dependency of the output voltage
on the input is not linear. The reason for this non-linearity is that the PMOS and NMOS resistances
change with the change of their drain voltages. Thus, Rp ≠ Rn when the value of Vout is different
from Vdd/2. This non-linearity, given the arithmetic functional requirements of a perceptron, is
undesirable and needs to be either removed or compensated for. Compensation means very high
per-inverter overheads which need to be precise in the analogue domain. However, by adding an
output resistor Rout ≫ (Rp, Rn), the difference between PMOS and NMOS resistances no longer
affects the output, and the input duty cycle to output voltage relationship becomes completely
linear. This requires no high-precision tuning in the analogue domain.
Figure 4. PWM inverter equivalent circuit, approximated as a voltage divider.
Figure 5. Output voltage of the PWM inverter vs input duty cycle.
(b) PWM Arithmetic
A perceptron needs to perform arithmetic operations. Converting from PWM to voltage is not
the only function of the PWM inverters. They can also be used to construct arithmetic units, such
as adders and weighted accumulators. Below we discuss these two operations and their circuits
relevant to NNs.
PWM Adder The circuit of a PWM adder is shown in Figure 6. To add n PWM-coded numbers
we use n inverters connected in parallel. Each inverter has an output resistor. The result is stored
in the output capacitor in the form of its average voltage.
Figure 6. PWM adder circuit performed by parallel inverters, with outputs connected via a capacitor.
This kind of adder works on the principle of current summation and charge (i.e. voltage)
accumulation. In other words, the values encoded in the input PWM signals are accumulated in
the voltage on the output capacitor, and such circuits can be called voltage accumulators (VACs).
To calculate the VAC output voltage, we use the principle of current summation and rewrite Equation 2.1
using conductances instead of resistances. The following equation is for a single PWM inverter:
$$
V_{out} = V_{dd} \cdot \frac{G^*_p}{G^*_p + G^*_n}, \tag{2.5}
$$
where $G^*_p = \frac{1}{R^*_p + R^*_{out}}$ and $G^*_n = \frac{1}{R^*_n + R^*_{out}}$.
Likewise equation 2.4 can be expressed as follows:
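$$
G^*_p = G \cdot \frac{t_{low}}{t_{low} + t_{high}} = G \cdot (1 - DC), \tag{2.6}
$$
$$
G^*_n = G \cdot \frac{t_{high}}{t_{low} + t_{high}} = G \cdot DC, \tag{2.7}
$$
where $G = \frac{1}{R_p + R_{out}} = \frac{1}{R_{out} + R_n}$.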
Since the inverters in Figure 6 are connected in parallel, the output voltage of a multi-inverter
VAC can be given by:
$$
V_{out} = V_{dd} \cdot \frac{\sum_{i=1}^{n} G^*_{pi}}{\sum_{i=1}^{n} (G^*_{pi} + G^*_{ni})}. \tag{2.8}
$$
Using equations 2.6 and 2.7, equation 2.8 can be simplified as:
$$
V_{out} = V_{dd} \cdot \left(1 - \frac{\sum_{i=1}^{n} DC_i}{n}\right). \tag{2.9}
$$
In simple terms, the output voltage of a multi-inverter VAC decreases linearly with the
average value of the duty cycles of its inputs, which is exactly what is required.
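The step from Equation 2.8 to Equation 2.9 can be verified with a short numerical sketch. The function name `vac_vout`, the unit conductance and the example duty cycles are assumptions made for illustration; identical cells (equal G) are assumed, as in the derivation.

```python
import math
from typing import Sequence


def vac_vout(vdd: float, duties: Sequence[float], g: float = 1.0) -> float:
    """Output voltage of a multi-inverter VAC via conductances (Equation 2.8)."""
    g_p = [g * (1.0 - dc) for dc in duties]  # Equation 2.6, per inverter
    g_n = [g * dc for dc in duties]          # Equation 2.7, per inverter
    return vdd * sum(g_p) / (sum(g_p) + sum(g_n))


# Hypothetical inputs: three PWM signals and a 1.2 V supply.
duties = [0.2, 0.5, 0.8]
vdd = 1.2

from_conductances = vac_vout(vdd, duties)
closed_form = vdd * (1.0 - sum(duties) / len(duties))  # Equation 2.9

print(from_conductances, closed_form)
```

Because every cell contributes the same total conductance G, the value of G cancels and only the mean duty cycle remains.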
Figure 7. A single cell of the PWM weighted adder, based on a NAND gate.
Weighted PWM Accumulation In order to design a perceptron, the ability to integrate weighted
additions is another crucial design requirement. The VACs must allow the input weights
to be programmed when required. This is performed by replacing the inverters by two-input
NAND gates (Figure 7). One input of this gate is the PWM-coded signal, and the other is a digital
switch signal for enabling or disabling this cell. The output of a disabled cell is always connected
to Vdd having the same effect as an enabled cell with zero input duty cycle. In this way, the
perceptron can be programmed to determine which NAND gates participate in the accumulation.
This programming may be carried out in the digital domain without affecting the voltage and
frequency elasticity of the computation.
Figure 8 shows a perceptron arithmetic VAC architecture for 3× 3 weighted addition based
on these types of gates. As can be seen, the circuit adds 3 PWM-coded inputs multiplied by 3-bit
weights. Every weight bit is implemented on a separate cell. The least significant bit goes to the
cells with the smallest transistor sizes and the largest output resistors (cells ’×1’). The second bit
is computed at the cells with doubled transistor widths and halved output resistances (cells ’×2’).
And the most significant bit is coded with 4 times the transistor widths, and 1/4 times the output
resistances (cells ’×4’).
Figure 8. PWM weighted addition VAC with 3 inputs and 3-bit weights.
The output voltage of the 3× 3 weighted addition VAC can be calculated using 2.9,
considering the ×2 and ×4 cells as 2 and 4 single cells respectively.
$$
V_{out} = V_{dd} \cdot \left(1 - \frac{\sum_{i=1}^{n} DC_i \cdot W_i}{n \cdot (2^k - 1)}\right), \tag{2.10}
$$
where n is the number of inputs, k is the number of bits of the weight, DCi is the duty cycle of
the input i, and Wi is the weight of the input i.
In the case of the 3× 3 weighted addition VAC, where n= 3 and k= 3, the output voltage is:
$$
V_{out} = V_{dd} \cdot \left(1 - \frac{\sum_{i=1}^{3} DC_i \cdot W_i}{21}\right). \tag{2.11}
$$
The arithmetic part of this equation is the weighted sum of duty cycles DCsum:
$$
DC_{sum} = \frac{\sum_{i=1}^{3} DC_i \cdot W_i}{21}. \tag{2.12}
$$
Thus, the 3× 3 weighted addition VAC produces an output voltage that decreases linearly
with the weighted sum of its input duty cycles, which is exactly as required:
$$
V_{out} = V_{dd} \cdot (1 - DC_{sum}). \tag{2.13}
$$
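The weighted VAC can be sketched at the cell level to show how Equation 2.10 follows from treating the ×2 and ×4 cells as 2 and 4 single cells. The function name `weighted_vac_vout`, the unit-conductance abstraction and the example values are assumptions for illustration; a disabled cell is modelled as an enabled cell with zero input duty cycle, as described in the text.

```python
import math
from typing import Sequence


def weighted_vac_vout(vdd: float, duties: Sequence[float],
                      weights: Sequence[int], k: int = 3) -> float:
    """Cell-level model of the weighted VAC; agrees with Equation 2.10."""
    max_weight = 2 ** k - 1
    total_g_p = 0.0  # summed pull-up conductance over all cells
    total_g = 0.0    # summed total conductance over all cells
    for dc, w in zip(duties, weights):
        if not 0 <= w <= max_weight:
            raise ValueError("weight does not fit in k bits")
        for bit in range(k):
            cell_g = float(2 ** bit)          # x1, x2, x4 cells in units of G
            enabled = (w >> bit) & 1          # weight bit drives the enable input
            dc_cell = dc if enabled else 0.0  # disabled cell acts as zero duty cycle
            total_g_p += cell_g * (1.0 - dc_cell)
            total_g += cell_g
    return vdd * total_g_p / total_g


# Hypothetical example: three inputs with 3-bit weights 7, 4 and 0.
duties = [0.5, 0.25, 1.0]
weights = [7, 4, 0]
v_cells = weighted_vac_vout(1.0, duties, weights)

# Closed form of Equation 2.11 for n = 3, k = 3.
dc_sum = sum(dc * w for dc, w in zip(duties, weights)) / 21
v_closed = 1.0 * (1.0 - dc_sum)

print(v_cells, v_closed)
```

The total conductance is always n · (2^k − 1) units regardless of the programmed weights, which is why that term appears as the normalising denominator.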
(c) Voltage to PWM Conversion
In order to design a perceptron based on the type of VAC described in Section 2(b), we need to
provide an output interface for it. The output of the perceptron must be used as an input for
the perceptrons of any subsequent layer in an NN. Therefore the output voltage of the PWM
arithmetic unit (its VAC) should be converted back to the PWM format.
The schematic of the voltage to PWM converter is shown in Figure 9. The converter circuit was
proposed originally by [29]. The converter is a ring oscillator with different power supplies: the
odd-numbered inverters are supplied with a voltage of Vdd/2, and the even-numbered inverters
are supplied with the input voltage, which is the output voltage of the VAC. The difference
between the supply voltages of the odd- and even-numbered inverters determines the output
duty cycle. If the input voltage equals Vdd/2, the inverters have equal delay and the output duty
cycle is 50%. If the input voltage increases, the period of switching from 0 to 1 increases, and the
output duty cycle goes down. If the input voltage is lower than Vdd/2, the switching from 1 to 0
takes more time, and the output duty cycle goes up.
Figure 9. The ring oscillator based voltage to PWM converter.
Given that the VAC theoretically achieves a linear relationship between its input duty cycle and
its output analogue average voltage, the voltage to PWM converter should also ideally achieve
a linear conversion relationship. In that case, the overall relationship between the input duty
cycle signal and the output duty cycle signal would also be linear, for the simple case where the
perceptron is programmed to do no arithmetic processing. In theory, the inverter chain-based
voltage to PWM converter should be able to achieve this if the inverters are set to work in the
linear regions of their transistors.
(d) PWM-coded Perceptron Design
The PWM-based perceptron consists of two main parts. The first part is the PWM arithmetic unit
in the form of a VAC. This converts the PWM-coded inputs to a voltage which encodes the result
of the computation as programmed by the enable signals. The second part then converts this
voltage result to PWM format for use as inputs by subsequent perceptrons.
Figure 10. Structure of the perceptron: PWM adder, voltage to PWM converter, and compensation transistor.
This structure is shown in Figure 10, with the 3× 3 weighted addition VAC as an example
PWM-based arithmetic unit. Any desired VAC arithmetic unit can be put in this place to satisfy
specific perceptron functionality requirements. The simple glue logic consisting of a PMOS
transistor between the two blocks will be discussed in detail in Section 3(a).
Such a perceptron is small enough that its design may be entirely analysed and validated
through simulations within the VLSI CAD environment in which it is implemented. At least some
of this analysis must be conducted in the analogue signal domain as the voltage signal between
the two parts of the perceptron holds the computation results in its analogue value. As a result,
simulations in a VLSI CAD tool environment that support mixed-signal or analogue studies are
the best way of analysing and validating such designs. In this work we implement our perceptron
and analyse it using the Cadence Analogue Design Environment. Detailed results will be shown
in Section 3(a).
(e) PWM-coded Neural Network Design
The proposed PWM perceptrons can be used in constructing traditional NNs such as the example
shown in Figure 11. In this NN, the input vector (in) is fed to the input layer, and the activity
propagates through a number of hidden layers to reach the output layer, where the output vector
(out) is generated. Then, the output vector is compared to the target vector and the error is
back-propagated to update the weights of each layer using gradient descent. This procedure is repeated
for the specified number of epochs.
Figure 11. Neural network for MNIST. The DCsum signals are voltages. in and out signals are duty cycles.
In this work, the in and out signals are of the PWM-type. The value of such a signal, which
is between 0 and 1, is represented by its duty cycle value between 0% and 100%. The VAC
arithmetic units then compute on such in values. This is illustrated by the example described
by equation 2.12, where each in is multiplied by its weight and all results are accumulated by
the VAC in the DCsum voltages. In other words, the weight and sum blocks in Figure 11 are
implemented by the proposed perceptron’s VAC. Then, every DCsum, which is an analogue
voltage across a capacitor, is fed to the activation function (AF) whose output is in PWM format
to be used as the input of the next layer. To include the AF, equation 2.12 can be modified as
expressed in equation 2.14. This requires that the voltage to PWM conversion also implements
the AF. Potential modifications from the basic ring oscillator may be necessary, although the basic
ring oscillator already approximates a popular AF. This will be discussed in detail in Section 3(a).
Finally, out is obtained, the error is calculated and every weight is adjusted by the
back-propagation (BP) algorithm. The comparison to the target vector is not necessarily implemented with
a perceptron-like device and may be implemented by some external controller, which is outside
the scope of this paper.
For the PWM-coded NNs, a number of design choices must be made: weight encoding,
maximum weight, AF and number of layers, among others. This section establishes a method
of making the best use of the proposed PWM-based perceptron to construct NNs to perform
specific computational tasks. We will explore aspects of NN design, including weight types, AF,
maximum weight and number of layers. We will use the well-known handwritten digit (MNIST)
classification problem [30], which is widely used for machine learning algorithm testing [31], as
the benchmark application and case study for this investigation. The goal is to suitably determine
the best NN configurations for the proposed PWM-coded NN.
$$
out = f(DC_{sum}) = f\left(\frac{\sum_{i=1}^{n} in_i \cdot W_i}{n \cdot (2^k - 1)}\right) \tag{2.14}
$$
Integer Weight and Training For the circuit design, the weight is discretised to an integer
value. This is different from most related work where floating-point (FP) numbers are used for
weights. As the circuit size depends on the bit-width of the weight, it is crucial to find the smallest
bit-width that still provides the specified error rate tolerance.
The integer weight training can be designed as illustrated in Figure 12. The MNIST input vector
(in) is multiplied by the integer weight (W) and the results are accumulated as out. Then, out is
divided by n · (2^k − 1) (i.e. normalised), which yields a final value of out between 0 and 1.
Consequently, out is scaled to the same range as, and is comparable to, the target vector. Then, out
passes through the AF, and the output vector is obtained and compared to the target vector. Next, the
FP update is computed from the gradient descent, the learning rate, and the error. To adjust the
integer weight, the update is scaled back to an integer by multiplying by n · (2^k − 1)
and rounding. Next, the integer weight is updated and capped if it exceeds the specified
bit-width (e.g. the example 3× 3 weighted addition VAC in Figure 8 has 3-bit weights). Finally, the
training process iterates until the specified number of epochs is reached. Note that the weight
capping can be disabled to allow unlimited weight adjustments to mimic FP training.
Figure 12. Integer weight training.
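The training flow of Figure 12 can be sketched for a single neuron as follows. This is an illustrative sketch and not the authors' implementation: the function name `train_step`, the delta-rule form of the gradient, the lower weight bound of 0, the learning rate and the example values are all assumptions. The capped ReLU of Equation 2.16 is used as the AF.

```python
import math
from typing import List, Sequence, Tuple


def capped_relu(x: float) -> float:
    """Capped ReLU (Equation 2.16): output limited to the PWM range [0, 1]."""
    return min(max(x, 0.0), 1.0)


def train_step(inputs: Sequence[float], weights: Sequence[int], target: float,
               k: int = 3, lr: float = 0.1, cap: bool = True) -> Tuple[float, List[int]]:
    """One integer-weight update for a single neuron, following Figure 12."""
    scale = len(inputs) * (2 ** k - 1)             # n * (2^k - 1)
    z = sum(x * w for x, w in zip(inputs, weights)) / scale  # normalised sum
    out = capped_relu(z)
    error = target - out
    slope = 1.0 if 0.0 < z < 1.0 else 0.0          # AF derivative (assumed rule)
    new_weights = []
    for x, w in zip(inputs, weights):
        fp_update = lr * error * slope * x         # floating-point update
        w_new = w + round(fp_update * scale)       # scale back and round to integer
        if cap:
            # Assumption: weights are unsigned k-bit values in [0, 2^k - 1].
            w_new = min(max(w_new, 0), 2 ** k - 1)
        new_weights.append(w_new)
    return out, new_weights


# Hypothetical example: three inputs, 3-bit weights, target output 1.0.
out, updated = train_step([1.0, 0.5, 0.0], [1, 1, 1], target=1.0)
print(out, updated)
```

Setting `cap=False` corresponds to the uncapped mode mentioned in the text, in which weights may grow beyond the k-bit range to mimic FP training.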
Activation Function The AF is necessary in an NN-based learning process because it provides
non-linearity to the computation so that the learning is not limited to linear problems. It also helps
map the resulting values into a certain range, depending on the function.
In this work, the input and output ranges of the AF are a main concern because they need to
match the output format of the problem and the circuit behaviour. In other words, depending on
the purpose of the NN, it may expect its input and output variables to take values within certain
ranges. These ranges then need to be mapped onto the working signal range of our perceptron,
which is restricted by the duty-cycle representation between 0% and 100%. Here we take the popular
MNIST benchmark [32] as an exemplar to explore this aspect of NN design using our perceptron
as the basic building block.
In the context of MNIST, the AFs are needed to provide a fully positive output to comply with
the target vector [30]. Also, our perceptron design stores the VAC result as the voltage across
Cout between its two blocks, which means that the DCsum signals are entirely positive voltages.
And such a voltage gets converted to a PWM duty cycle, which is also entirely positive. For these
reasons, the well-known AF ReLU [16,33], which has an entirely positive output range, is best
suited.
Certain other popular AFs are less suitable for this initial investigation. For instance, the
sigmoid function is clearly non-linear across an input range between -5 and 5 [16], which requires
representation of negative values. The non-linearity also means that major modifications to the
voltage to PWM part need to be investigated for implementing such AFs. Hence, we decided to
concentrate on trying to mimic the ReLU AF using our perceptron’s voltage to PWM converter.
The ReLU function in equation 2.15 [33] is depicted in Figure 13. One of its attractions is that
it is easily differentiable, facilitating gradient descent. To mimic the output of the VAC, it is better
than the sigmoid function because the charge in the output capacitor (Cout) is emptied when the
VAC result is negative. Otherwise, the capacitor is charged and the positive result is obtained.
However, the output of this function must be capped at 1 to represent the limit of the PWM
range as shown in equation 2.16 and Figure 14. This work will attempt to construct an AF that
approximates the capped ReLU function.
The size of an entire NN designed for the MNIST problem is such that it is not possible to
analyse it entirely within a VLSI CAD environment. For instance, to analyse an image of 784
pixels (cf. the example in Fig. 11) there need to be 784 perceptrons in the first layer of the NN
alone and this is clearly beyond analogue simulations at the VLSI level. Effort must be expended
in building models in a higher-level language to investigate the design properly.
$$f(x) = \begin{cases} 0, & x < 0 \\ x, & x > 0 \end{cases} \tag{2.15}$$
$$f(x) = \begin{cases} 0, & x < 0 \\ x, & 0 < x < 1 \\ 1, & x > 1 \end{cases} \tag{2.16}$$
Figure 13. ReLU function. Figure 14. Capped ReLU function.
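Equations 2.15 and 2.16 can be written directly in a few lines. The following is a minimal Python sketch; the function names and the `cap` parameter are illustrative and do not come from this work.

```python
import numpy as np

def relu(x):
    """ReLU of equation 2.15: zero for negative inputs, identity otherwise."""
    return np.maximum(x, 0.0)

def capped_relu(x, cap=1.0):
    """Capped ReLU of equation 2.16: ReLU limited to `cap` (100% duty cycle)."""
    return np.clip(x, 0.0, cap)

samples = np.array([-1.0, 0.5, 2.0])
print(relu(samples))         # [0.  0.5 2. ]
print(capped_relu(samples))  # [0.  0.5 1. ]
```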
3. Results
This section reports experimental results of the PWM-based perceptron, leading to an NN
architecture. These results validate the design methods at both circuit- and architecture-level.
(a) Analysis and Validation of PWM-coded Perceptron
A prototype circuit of the PWM perceptron is designed using UMC65nm technology and
simulated in the Cadence Analog Design Environment tool1. We used the high voltage transistors
(with 2.5V nominal voltage) for the purpose of better observation. Below we analyse the behaviour of
the perceptron circuit under different parametric variations, generated by the design tool.
1. VAC Validation The first constituent part of the perceptron is the VAC. Figure 15 shows
the charging of the capacitor in the VAC based on three inverters connected in parallel as
shown in Figure 6. The frequencies and duty cycles of the inputs are: f1 = 140MHz, DC1 = 70%,
f2 = 120MHz, DC2 = 30%, f3 = 100MHz, DC3 = 50%. The capacitor is charged to a
voltage proportional to the average duty cycle of the inputs. The charging time of the
capacitor depends on the RC value, and the input frequency does not affect it. However, if
the frequency is too low, the ripple of the output voltage may become too high, which
reduces accuracy.
Figure 15. Capacitor charging in the 3 inverters VAC. (Traces V(in1), V(in2), V(in3) and V(out)
against time, 0 to 150 ns.)
To support our VAC design based on inverters/NANDs and voltage summation on a capacitor,
we implemented the 3× 3 weighted addition VAC shown in Figure 8 in Cadence and ran
simulation experiments on it. The results of these simulations are compared in Table 1 with
theoretical results obtained from equation 2.11. The differences between the theoretical and
simulation results do not exceed 10%. These results validate the correctness of the PWM-based
weighted addition VAC design.
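The theoretical column of Table 1 can be approximated with a short calculation. Equation 2.11 is not reproduced in this section, so the form used below is an assumption: an inverting weighted average with Vdd = 2.5V and a full scale of n · (2^k − 1) = 21 for three inputs with 3-bit weights. Under that assumption the sketch agrees with the theoretical values in Table 1 to within about 0.02V; the function name is hypothetical.

```python
def vac_output_voltage(duty_cycles, weights, vdd=2.5, k=3):
    """Assumed ideal VAC output: Vdd * (1 - sum(DC_i * W_i) / (n * (2^k - 1)))."""
    n = len(duty_cycles)
    full_scale = n * (2 ** k - 1)
    weighted = sum(dc * w for dc, w in zip(duty_cycles, weights))
    return vdd * (1.0 - weighted / full_scale)

# Rows of Table 1: (duty cycles, weights, theoretical Vout in volts)
table_1 = [
    ([0.70, 0.80, 0.90], [7, 7, 7], 0.50),
    ([0.50, 0.50, 0.50], [1, 2, 4], 2.08),
    ([0.20, 0.60, 0.80], [5, 6, 7], 1.29),
    ([0.95, 0.90, 0.80], [7, 6, 6], 0.50),
    ([0.30, 0.40, 0.50], [1, 4, 2], 2.16),
    ([0.80, 0.20, 0.50], [7, 3, 4], 1.54),
]
for dcs, ws, v_theory in table_1:
    print(f"{vac_output_voltage(dcs, ws):.3f} V (Table 1: {v_theory:.2f} V)")
```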
1 URL: https://tinyurl.com/y6k73k4t
Table 1. Experimental and theoretical results of the 3× 3 weighted adder.
DC1   W1   DC2   W2   DC3   W3   Vout (theoretical)   Vout (simulation)
70%   7    80%   7    90%   7    0.50V                0.51V
50%   1    50%   2    50%   4    2.08V                2.11V
20%   5    60%   6    80%   7    1.29V                1.33V
95%   7    90%   6    80%   6    0.50V                0.45V
30%   1    40%   4    50%   2    2.16V                2.21V
80%   7    20%   3    50%   4    1.54V                1.61V
2. Validation of Voltage to PWM Conversion The second constituent part of the proposed
perceptron is the voltage to PWM converter. This converts the result of VAC arithmetic
computation stored as an analogue voltage (a DCsum signal) back to the PWM format for
output to subsequent perceptrons, as presented in Section 2(c). The Cadence Analog Design
Environment simulation results of the voltage to PWM converter are shown in Figure 16. Ideally,
the voltage to PWM conversion should be linear. The real relationship between the output and the
input is almost linear for input voltages between 0.7V and 2.3V. However, outside this range the
ring oscillator stops oscillating. The reason for this will be discussed below.
Another interesting effect is that the linearity of the output improves as the number of
inverters increases, but the difference between 9 and 13 inverters is small. Thus, a chain of 9
inverters should be considered a reasonable voltage to PWM converter and we use this design in
our subsequent studies.
Figure 16. Output duty cycle of the voltage to PWM converter. (Output duty cycle (%) against
input voltage (V) for 5, 9 and 13 inverters.)
Figure 17. Output vs input duty cycle of the perceptron. (Duty cycle out (%) against duty cycle
in (%) for Vdd = 1.8V, 2.5V and 3.3V, with the ideal line.)
3. Perceptron Experimental Results and Design Adjustments Figure 17 shows the combined
operation of both parts of the perceptron: the 3× 3 weighted PWM addition VAC (Figure 8)
connected to the voltage to PWM converter (Figure 9). The three inputs of the perceptron are
connected together, and all the weights are 7 (all the cells are enabled). The line labelled ’ideal’ is
obtained through the equations in Section 2. Analysing these results we can say that:
• In this simulation for the ideal case we expect the output duty cycle to be equal to the
input duty cycle. However, the real output is slightly different from the ideal; and this
difference increases with the input duty cycle above 50%.
• The output duty cycles for different supply voltages are similar. The difference does not
exceed 10%. This indicates voltage variation resilience in the perceptron design.
• The input duty cycle has a limited range, from 20% to 70%. Beyond this range the output
stops oscillating and becomes a constant signal.
Figure 18. Output vs input duty cycle of the perceptron with and without compensation. (Output
duty cycle (%) against input duty cycle (%), with the ideal line.)
The observed reduction of operational range and loss of linearity in the voltage to PWM
converter are caused by the fact that the voltage DCsum powers the voltage to PWM converter.
When the input duty cycle is above 70%, DCsum is below 30% of Vdd. For Vdd = 2.5V this is
below the threshold voltage. In this case the NMOS transistors of the ring oscillator are
always off, and the output stops oscillating. In other words, there is a mismatch between the
voltage ranges of the two parts of the perceptron. The output voltage range of the PWM weighted
addition VAC is from 0V to 2.5V (Figure 5), whereas the input voltage range of the voltage to PWM
converter is from 0.7V to 2.3V (Figure 16).
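The range mismatch can be illustrated with a short calculation. The sketch below assumes an ideal inverting VAC with all weights at maximum, so that DCsum = Vdd · (1 − DC), and takes the 0.7V to 2.3V converter range quoted above; the function name is hypothetical. It gives a usable input duty cycle window of roughly 8% to 72%. The upper limit is close to the 70% observed in Figure 17, while the observed lower limit of 20% is higher, so the ideal model is only a rough guide.

```python
def usable_duty_cycle_range(vdd=2.5, v_min=0.7, v_max=2.3):
    """Input duty cycles whose ideal DCsum = Vdd * (1 - DC) lies in [v_min, v_max]."""
    dc_low = 1.0 - v_max / vdd    # DCsum reaches the upper converter limit
    dc_high = 1.0 - v_min / vdd   # DCsum reaches the lower converter limit
    return dc_low, dc_high

low, high = usable_duty_cycle_range()
print(f"usable input duty cycle: {low:.0%} to {high:.0%}")
```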
We may limit the range of the output voltage of the PWM weighted addition VAC. This can
be done by adding a small glue logic between the two blocks of the perceptron. This may consist
of no more than a compensation PMOS transistor, whose gate and drain are connected to the
capacitor as shown in Figure 10. In this case, when the voltage on the capacitor goes below the
threshold, the PMOS starts charging this capacitor, and when the voltage is above the threshold,
the PMOS is off.
The input and output duty cycles of the perceptron with compensation are depicted in
Figure 18. The output is closer to the ideal, and its range is much wider: from 10% to 90%.
4. Power Elasticity and Resilience To demonstrate the perceptron’s resilience to power
variations we simulated the 3× 3 PWM-based weighted addition VAC circuit (Figure 8) with
different values of supply voltage and input signal amplitude. The results are shown in Figure 19.
As can be seen, the output voltage grows almost linearly with increased Vdd. As expected, higher
duty cycles show lower output voltages, and vice versa. In the case of an unstable supply voltage,
the absolute value of the output voltage does not bear any reliable information. In this case, we
should consider the relative relationship between the output voltage and the supply voltage.
This relationship should be proportional to the input duty cycle independently of Vdd. This
is demonstrated by Figure 20 where the y axis represents not the absolute value of Vout, but the
ratio between Vout and Vdd, which is more relevant for unstable power conditions.
The circuit shows high resilience to static supply voltage variations. Starting from 1 - 1.5V the
ratio between Vout and Vdd remains the same for each duty cycle value of the input signal.
Further simulation experiments are carried out to investigate the VAC’s resilience to static
frequency variations. Two sizes of the 3× 3 VAC are investigated: the small one, with the output
capacitor Cout = 10pF and the output resistors of each cell Rout = 100kΩ; and the large one, with
Cout = 100pF and Rout = 1MΩ. The duty cycle of all the inputs is 50%, and all the weights
are equal to 7 (all the cells are enabled). Figure 21 shows that both VACs produce an output of
1.25V, which equals Vdd/2. The average output voltage remains the same over the simulated range
of frequencies: from 1kHz to 1GHz.
Figure 19. Output voltage (absolute values) vs static variation of power supply. (Vout (V) against
supply voltage Vdd (V) for DC = 25%, 50% and 75%.)
Figure 20. Output voltage (relative values) vs static variation of power supply. (Vout/Vdd against
supply voltage Vdd (V) for DC = 25%, 50% and 75%.)
Figure 21. Output voltage vs static variation of input frequency. (Average output voltage (V)
against input frequency (Hz) for the small and large VAC.)
Figure 22. Output voltage swing vs frequency. (Output voltage swing (V) against input frequency
(Hz) for the small and large VAC.)
On the other hand, the value of Cout does affect other aspects of perceptron performance. Cout
contributes to the RC time constant of the VAC circuit, providing a low-pass filter effect on the
voltage DCsum. As a result, a larger Cout is less suitable than a smaller Cout for fast response,
but would provide better robustness in the presence of frequency variations. In addition, as
the voltage to PWM converter depends on the charge on Cout for energy, a smaller Cout may
encounter difficulties in keeping DCsum constant enough to complete the conversion.
Figure 22 shows the DCsum voltage swing in the presence of static frequency variations. As
can be seen, with reduced input frequency the voltage swing increases, and at some point the VAC
operates as a simple inverter with the output voltage DCsum oscillating between Vdd and GND.
Ideally we would like the voltage swing to be no larger than 0.2V. This means that the frequency of
the input PWM signals should not be lower than 1MHz for the large VAC and 100MHz for the
small VAC.
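The trend in Figure 22 can be reproduced qualitatively with a first-order model. The sketch below is an assumption rather than the circuit simulated in this work: it treats the VAC as an ideal rail-to-rail PWM source driving a single lumped RC filter with time constant `tau_s`, and uses the closed-form steady-state peak-to-peak ripple of such a filter. The function name and the example time constants are hypothetical; the per-cell Rout · Cout products are used as placeholders and overestimate the real time constant when several cells drive Cout in parallel.

```python
import math

def pwm_rc_ripple(freq_hz, tau_s, duty=0.5, vdd=2.5):
    """Steady-state peak-to-peak ripple of an ideal PWM source behind an RC filter."""
    period = 1.0 / freq_hz
    rise = -math.expm1(-duty * period / tau_s)          # 1 - exp(-D*T/tau)
    fall = -math.expm1(-(1.0 - duty) * period / tau_s)  # 1 - exp(-(1-D)*T/tau)
    cycle = -math.expm1(-period / tau_s)                # 1 - exp(-T/tau)
    return vdd * rise * fall / cycle

# Placeholder time constants: per-cell Rout * Cout for the small and large VAC.
tau_small = 100e3 * 10e-12    # 1 us
tau_large = 1e6 * 100e-12     # 100 us
for f in (1e3, 1e5, 1e7, 1e9):
    print(f"{f:8.0e} Hz  small: {pwm_rc_ripple(f, tau_small):.4f} V"
          f"  large: {pwm_rc_ripple(f, tau_large):.4f} V")
```

In this model the ripple approaches Vdd at low frequency, which matches the rail-to-rail inverter behaviour described above, and a 100× larger time constant tolerates a 100× lower input frequency for the same ripple.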
In addition, Figure 23 shows that VAC size and frequency also affect power consumption. The
small VAC has higher power consumption. This is due to the output resistor limiting the charging
current. The resistor is 10× larger in the large VAC, and the current and the power consumption
are smaller.
In the large VAC we increase the size of Cout and reduce the charging current. This increases
the charging time of the capacitor. To investigate this we simulated the time when the voltage on
Cout reaches the average output value (which is Vdd/2 = 1.25V for the 50% input duty cycle). The
capacitor is initially charged to Vdd = 2.5V . The charging time of the capacitor is around 0.14µs
for the small VAC and 14.5µs for the large VAC, which is true for the entire range of frequencies.
Figure 23. Power vs frequency of the 3x3 VAC.
This ∼ 100× ratio is because the RC product is 100× as large for the large VAC as for the small
VAC.
Figure 24. The operation of the perceptron with dynamic power supply voltage variations using an AC
supply with 100kHz frequency (for illustration purposes only). (Panels: Vdd (V), Vcap/Vdd and
DCout (%) against time (µs), with Regions A and B and Observations 1 and 2 marked.)
Figure 24 shows the operation of the perceptron with dynamic supply voltage variations to
investigate the dynamic power elasticity. The simulations have the following parameters: the cell
output resistance Rout = 100kΩ; the VAC output capacitor Cout = 100pF; the supply voltage
varies from 1.8V to 3.2V with a period of 10µs; input duty cycle DC1 = DC2 = DC3 = 50%; the
weights are W1 = W2 = W3 = 7 in region A, and then change to W1 = W2 = W3 = 2 in region B.
This simulation illustrates the behaviour of the perceptron under very rapid voltage variations
at 100kHz frequency (for illustration purposes only). The time period of this change is the
same value as the system time constant RC = 10µs. The voltage value swing is also very
large - a variation amplitude of 1.4V for a voltage whose nominal value is 2.5V (also used for
demonstrating extreme variations). Even under these extreme conditions Vcap/Vdd still maintains
a high degree of resilience. After putting the VAC together with the voltage to PWM converter,
however, the combination fares less well, with the output duty cycle changing up to 47% in
Region B (Observation 2). This may be mitigated by improving the compensation mechanism
in the voltage to PWM conversion circuit (see Section 2) or by in-situ voltage regulation (not
discussed in this paper for brevity).
(b) Validation and Analysis of PWM-coded Neural Network
This section explores an NN system built using the proposed PWM-based perceptron. This NN is
designed for solving the MNIST problem and has the structure shown in Figure 11. Firstly, a
high-level model of the perceptron is constructed so that analysis can be carried out in MATLAB, at a
higher level than analogue VLSI simulations, which are impractical for systems of this size. Then
this model is used in MATLAB investigations on system properties to validate our NN-design
approach.
1. PWM Model and Voltage to PWM Converter Serving as ReLU AF In this section, the
duty cycle output of the perceptron is modelled in the form of a mathematical equation with
parameters. Then, the model and the voltage to PWM converter itself are studied to verify that
the device approximately incorporates the capped ReLU AF.
The equations in Section 2(e) pertain to ideal cases. These can be compared with what
the implemented perceptron actually delivers. In order to make this comparison at the whole
system level, we need to generate a high-level mathematical model based on observations made
whilst experimenting with the perceptron circuit at low level.
We experimented in the Cadence Analog Design Environment with a single perceptron, two
perceptrons connected in series, mimicking the simplest two-layer NN, and three perceptrons
connected in series, emulating the simplest NN with a depth of 3. This is as far as analogue VLSI
simulations could practically go, as the three-layer study took many hours on a competitively
specified server machine.
The outputs of these perceptron connection topologies are shown in Figure 25. In the ideal
case, the input and output duty cycles should be equal when every weight is at the maximum
value (dashed line). However, there is a non-linear relationship between the input and output of
the single perceptron (red line) and the degree of non-linearity increases when the depth of the
NN is increased (blue and green lines). In addition, the output begins to saturate in the last (third)
stage when the input (DCsum) reaches 0.85.
To model this relationship, a third-order polynomial equation, which is easy to differentiate, is
curve-fitted to the response of the single perceptron using basic regression in MATLAB. The result
is shown in equation 3.1. Note that the saturation point of the model is set at the maximum output
duty cycle, which is 98%. Then, the model is connected in the same two- and three-stage series
topologies as in the Cadence experiments and their outputs are plotted in Figures 26 to 28, together
with the relevant Cadence results for comparison. All figures show that this model accurately
estimates the input-output relationship of the perceptron. The accuracy is quantified by the
R-squared values of stages one, two and three, which are 99.88%, 99.33% and 97.66% respectively.
$$DC_{out} = 107.27\,V_{Cout}^{3} - 53.25\,V_{Cout}^{2} + 52.92\,V_{Cout} + 13.44 \tag{3.1}$$
$$f(x) = \begin{cases} 0, & x < 0 \\ DC_{out}, & 0 < x < 1 \\ 0.98, & x > 0.98 \end{cases} \tag{3.2}$$
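The polynomial model and the way such a fit can be scored are sketched below in Python rather than MATLAB. Several points are assumptions: the input is taken as a normalised value between 0 and 1, the output of equation 3.1 is taken to be in percent and capped at 98, and the data used for the regression is synthetic (generated from equation 3.1 itself), since the Cadence measurements are not reproduced here. The function names are hypothetical.

```python
import numpy as np

# Coefficients of equation 3.1, highest power first.
COEFFS = np.array([107.27, -53.25, 52.92, 13.44])

def perceptron_model(x, cap=98.0):
    """Equation 3.1 with the saturation of equation 3.2 (output in percent)."""
    x = np.asarray(x, dtype=float)
    dc = np.minimum(np.polyval(COEFFS, x), cap)   # saturate at the maximum duty cycle
    return np.where(x < 0.0, 0.0, dc)             # no output for negative inputs

def r_squared(y_true, y_pred):
    """Coefficient of determination used to score a fit."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for measured data: samples of equation 3.1 itself.
x = np.linspace(0.0, 0.85, 18)
y = np.polyval(COEFFS, x)
fit = np.polyfit(x, y, 3)                         # third-order regression
print(fit)
print(r_squared(y, np.polyval(fit, x)))
print(perceptron_model([0.0, 0.5, 1.0]))
```

With real measurements in place of the synthetic samples, the same `np.polyfit` and `r_squared` calls would give the coefficients and the goodness of fit.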
2. PWM-based NN Simulations We use the perceptron model in equation 3.1 to assemble
models of large-size MNIST NNs and then simulate these systems in full in MATLAB. The model
plot is shown in Fig. 29. We also create a capped ReLU function with offset (Oft.ReLU), expressed
in equation 3.3. As can be seen in the equation, this offset ReLU function takes the constant 13.44
from the perceptron model in equation 3.1 to have the same offset nonlinearity as the perceptron
model in Fig. 29, but otherwise has a similar straight line behaviour to the non-offset capped
ReLU in Fig. 14. The plot of this offset ReLU can be found in Fig. 30.
Figure 25. PWM output.
Figure 26. Output vs model stage 1.
Figure 27. Output vs model stage 2.
Figure 28. Output vs model stage 3.
Figure 29. PWM perceptron output model function.
Figure 30. Capped ReLU function with PWM-like offset.
Table 2. Simulation result of the floating-point weight neural network.
No.   No. of perceptrons   Activation function   Learning rate   Error (%)
1     784/10               ReLU                  0.010           1.40
2     784/10               Cap.ReLU              0.008           1.75
3     784/10               Oft.ReLU              0.009           5.09
4     784/10               PWM percept.          0.004           8.54
5     784/300/10           ReLU                  0.040           1.63
6     784/300/10           Cap.ReLU              0.009           1.91
7     784/300/10           Oft.ReLU              0.002           79.54
8     784/300/10           PWM percept.          0.004           27.01
9     784/300/100/10       ReLU                  0.040           2.07
10    784/300/100/10       Cap.ReLU              0.010           3.60
11    784/300/100/10       Oft.ReLU              0.010           90.20
12    784/300/100/10       PWM percept.          0.090           79.07
This function is used to investigate whether the step nonlinearity or the curvature nonlinearity
higher in the curve of equation 3.1 is more important when it comes to NN performance, through
comparisons with both the perceptron model in equation 3.1 and the capped ReLU function
without offset in Fig. 14.
$$f(x) = \begin{cases} 0, & x < 0 \\ DC_{out} + 13.44, & 0 < x < 1 \\ 1, & x > 0.8656 \end{cases} \tag{3.3}$$
There are two groups of simulations using MATLAB: without/with limiting the maximum
weight. All implement the training procedure described in Figure 12 for the MNIST problem,
which is selected as our benchmark. Without defining the maximum weight, the weight is
adjusted freely like the basic FP training while the proposed NN is demonstrated by the
limited weight simulation. Four AFs: ReLU, capped ReLU (Cap.ReLU), capped ReLU with offset
(Oft.ReLU) and PWM perceptron (PWM percept.) are applied in three network configurations:
two (784/10), three (784/300/10) and four (784/300/100/10) layers. The PWM perceptron AF is
implemented by the PWM perceptron on its own unmodified - the justification being that it may
be considered as an approximation of the capped ReLU (cf. Figure 29 and Figure 14).
As can be seen from these configurations, the simulated systems include hundreds of perceptrons
and are well beyond what can be analysed in the VLSI design domain.
For the unlimited weight simulation, the learning rate is swept from 0.001 to 0.1 for every AF.
The configurations with the smallest error are listed in Table 2. The limited weight simulation
is carried out in the same way except that the initial weight is swept from ±1 to ±255. This is
because a small initial weight causes small updates, which can keep the final weight within the
specified range. Then, the configurations with higher than 90% accuracy are selected to sweep
their maximum weights down until the accuracy is nearly equal to 90%. This is to save circuit
area by using the smallest bit-width. The simulation results are listed in Table 3.
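The maximum-weight sweep described above can be sketched as a simple search. This is a hypothetical Python outline, not the MATLAB procedure used in this work: `evaluate` stands for a user-supplied function that trains and tests the NN with a given weight cap and returns the accuracy, and the caps are assumed to take the values 2^bits − 1 (±255, ±127, ..., ±1). The accuracy figures in the usage example are placeholders and not results from this work.

```python
def smallest_max_weight(evaluate, start_bits=8, min_accuracy=0.90):
    """Sweep the weight cap down while the accuracy stays at or above the target."""
    best = None
    for bits in range(start_bits, 0, -1):
        w_max = 2 ** bits - 1            # 255, 127, 63, ...
        if evaluate(w_max) >= min_accuracy:
            best = w_max                 # this cap is still acceptable
        else:
            break                        # accuracy dropped below the target
    return best

# Placeholder accuracies for illustration only (not measured data).
fake_accuracy = {255: 0.94, 127: 0.93, 63: 0.92, 31: 0.907, 15: 0.85}
print(smallest_max_weight(lambda w: fake_accuracy.get(w, 0.0)))
```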
Table 3. Simulation result of the integer weight neural network.
No.   No. of perceptrons   Activation function   Learning rate   Initial weight   Max weight   Error (%)
1     784/10               ReLU                  0.030           ±3               ±31          9.28
2     784/10               Cap.ReLU              0.040           ±3               ±63          6.12
3     784/10               Oft.ReLU              0.004           ±7               ±255         7.10
4     784/10               PWM percept.          0.030           ±1               ±255         9.98
5     784/300/10           ReLU                  0.020           ±255             ±255         79.49
6     784/300/10           Cap.ReLU              0.020           ±255             ±255         79.49
7     784/300/10           Oft.ReLU              0.020           ±3               ±255         18.35
8     784/300/10           PWM percept.          0.010           ±15              ±255         25.17
9     784/300/100/10       ReLU                  0.010           ±31              ±255         88.50
10    784/300/100/10       Cap.ReLU              0.010           ±31              ±255         88.50
11    784/300/100/10       Oft.ReLU              0.020           ±127             ±255         64.09
12    784/300/100/10       PWM percept.          0.010           ±63              ±255         53.25
4. Discussions
This section discusses the results, and highlights the challenges and opportunities in the
proposed approach. The section is organised hierarchically from perceptron circuit design to
NN experiments validated through the MNIST benchmark application. Towards the end of the
section, we relate to our original hypothesis and summarise the key features of the proposed
design approach.
(a) PWM-coded Perceptron
The design of the PWM-based VAC is shown in Figure 20 to have satisfied its main design aim,
which is to provide resilience in the presence of power supply uncertainty. Adopting a PWM-
based approach in order to transfer computation from the digital domain to the relative temporal
domain resulted in a device which is essentially independent of the value of the supply Vdd.
The use of an analogue voltage across a capacitor to represent the result of perceptron
arithmetic computations allowed the use of the simplest digital gate, the inverter, to be the
fundamental building block for both parts of the perceptron. Programmability, required for the
FP functionality of NNs, for instance, can then be realised by using the next simplest digital gate,
a two-input NAND gate. This results in an approach which implements digital computations
using the smallest digital components working in the relative temporal domain on analogue
values. Trading potential loss of precision for power resilience in this way is acceptable for NN
applications, as they tend to be accuracy resilient at the point of any particular perceptron. The
reduction of circuit complexity and avoidance of conventional multipliers and adders should also
contribute to savings in both circuit size and energy consumption.
Both parts of the perceptron have been shown to have acceptable quality, including being
reasonably linear within their operating ranges. However, after putting both together, the
perceptron as a whole has certain range problems because the VAC output voltage may be outside
the linear region of transistors in the voltage to PWM converter, which is a design assumption for
that part. A single-transistor glue logic is then shown to help mitigate this problem.
The entire perceptron, however, shows weaker power supply variation resilience than the VAC
on its own, under extreme dynamic Vdd variation conditions. This is primarily because of the
following two factors:
• as can be seen from Figure 9, the oscillator used to convert the voltage on the capacitor
between the two parts of the perceptron (DCsum) to PWM (DCout) draws different
amounts of power from the capacitor under different values of Vdd, and
• the simple glue logic in the form of a PMOS transistor does not fully compensate for
this because its main function is to keep the current going once the voltage across the
capacitor DCsum gets down to threshold.
Computationally, if the correct value of DCsum should be below threshold, which is entirely
possible coming from the VAC, this glue logic will cause an error in that value by keeping DCsum
at threshold, leading to DCout becoming inaccurate. This is preferable to having the voltage to
PWM part cease to operate, but computation correctness is still lost.
The above discussion is supported by observing the behaviour shown in Figure 24. The
most likely places for errors to appear are when Vdd becomes low. In only some of these cases
(Observation 2 but not Observation 1) the computation in the VAC would lead to a low DCsum
value which might dip below threshold. This will cause the output duty cycle DCout to become
incorrect. During Observation 1, although Vdd dips low, the computation because of the weights
etc. produces a high relative value result that manages to keep DCsum above threshold. This is
not the case during Observation 2. On the other hand, if Vdd itself reduces below threshold, no
matter what value the VAC produces the voltage to PWM part will produce relatively large errors
as the perceptron’s glue logic would kick in anyway.
Also of concern is the accumulation of non-linearity after both parts of the perceptron have
been put together, and this non-linearity continues to increase once the perceptron is connected
in series across multiple stages of an NN of non-trivial depth.
Another issue is the fact that there may be difficulties for the perceptron to implement AFs
other than flavours of ReLU. Of particular concern is that the perceptron, because of the use of
the voltage across a grounded capacitor to represent a crucial value, only works in the positive
value domain.
Future research topics include the better matching between the constituent parts of the
perceptron to overcome the threshold and non-linearity problems and the extension of the
perceptron to cover a larger set of AFs.
(b) PWM-coded Neural Network
In the single perceptron, two-perceptron and three-perceptron experiments, both the resultant
model and the Cadence simulations indicate that the voltage to PWM converter, without
modifications, may serve as an approximate capped ReLU AF, qualitatively. In addition, the
three-perceptron, three-stage full analogue simulation analysis shows that the single-perceptron
MATLAB model can be used in multi-stage system analysis without worrying about the fidelity
of high-level MATLAB models when multiple layers of perceptrons exist in a system.
Quantitatively, however, the use of a nonlinear perceptron to approximate linear behaviour
becomes increasingly problematic when the depth of the network increases, as the non-linearity
accumulates. This is shown to be true by the subsequent whole-NN experiments.
The unlimited weight simulation result in Table 2 gives us traditional NN examples which
contain FP weights. It shows that both ReLU and capped ReLU functions give less than 4% errors
at every depth. The results for the PWM perceptron AF are similar to those from the capped
ReLU with offset. They obtain small error rates only at the shallowest NN depth, while the capped
ReLU without offset outperforms both of them at every depth. This suggests that it is mainly the step
transition at DCsum = 0 that causes the convergence problem in our NN, more than the curvature
nonlinearity higher in the curve of Fig. 29. Therefore, compensating the circuit to shift the output
duty cycle back to 0% appears to be a promising route of investigation. This will be a subject in
our future work.
Table 3 shows the results with weight limitations. All results for the two-layer NN are worse than
the ones in Table 2 due to the weight capping and rounding. The PWM perceptron, however,
performs better at higher layer depths than in the unlimited weight case (25.17 against 27.01 at
three layers and 53.25 against 79.07 at four), suggesting an advantage for it when the weight is
limited and represented by an integer. However, it again fails to approximate the ReLU function
quantitatively at higher layer depths, by returning obviously better performances than the latter.
These results confirm the discussions in the previous section. Our proposed PWM-based
perceptron’s non-linearity, as well as its inability to properly extend to the low-percentage
range of the PWM duty cycle (it starts from ∼ 13% rather than 0% as shown in Figs. 26 and 29),
makes it less suitable for deeper NNs. Its lack of support for negative values also limits its wider
usability as the fundamental element of NNs, without further modification to better incorporate
established AFs. The approximation of ReLU, although qualitatively promising, proves to be
quantitatively unsatisfactory at higher NN depths, although in some cases this results in the
perceptron’s AF being better than the ReLU AF. The high error rate also comes from the resolution
loss in basic weight rounding which may be solved by implementing a rounding technique and
PF inference quantization presented in [34] and [35] respectively.
Table 4. Performance comparison.
Work   Weight type   MNIST conf.           NN     Error   Power (µW)       Power elastic   Hardware
[36]   integer       quantized             WAGE   0.4     n/a              N               MCU
[34]   integer       binarized             MLP    <3%     n/a              N               n/a
[37]   n/a           reduced & quantized   MLP    13.5%   53,000 (NN)      N               memristor crossbar
[31]   fixed-point   original              MLP    ∼5%     14,800 (NN)      N               Spec. CMOS
This   integer       original              MLP    <10%    14-1,080 (VAC)   Y               Std. CMOS
In other words, even if the negative value representation problem is solved, computing AFs
in the analogue and relative temporal domains remains a challenge that must be solved. As a
result, a future work direction is the incorporation of more general arithmetic operations in these
domains, which is needed to improve the accuracy of AF implementations.
A related and interesting unsolved problem is the ’comparing with target vector’ function in
Figure 11, which is currently relegated to external controllers. It is a duty cycle in and digital out
block and can potentially be designed by extending the methods in this work.
Table 4 summarises our design compared to related work. The work in [36] quantizes the
entire NN and yields the lowest error. However, it is designed for a digital-based processing
unit which suffers from the CPU-memory bottleneck, implying extra power budget and latency.
Furthermore, real power measurement is missing as it estimates the power consumption from
the literature. Weight rounding methods are proposed in [34]. Although they achieve the second
lowest error, they require binarized input data which is a challenge for analogue applications.
Moreover, that work does not include an investigation of hardware implementation. The memristor
crossbar NN in [37] has the lowest accuracy and the highest power consumption even though
its input image size is reduced. A charge trap transistor-based NN which performs MNIST
classification from the original data is presented in [31]. It mainly aims to save power and requires
a specific CMOS technology to fabricate the special transistors.
Even though the error of our design is higher, it is still within the same order of magnitude,
and we have yet to investigate more sophisticated techniques for compensating the voltage-to-PWM
part to improve the duty cycle coverage and eliminate the step nonlinearity at DCsum = 0,
which promises to reduce error. We also do not investigate beyond standard CMOS technology,
as that is out of scope for this investigation. Our focus is on tolerating unstable and unpredictable
power supply voltages, a necessity for energy-autonomous AI devices. In this regard, this solution
is unique, as existing research in the literature invariably requires stable and known voltages
and operating frequencies. Note that the power figures in Table 4 are not directly comparable.
The figure for [31] is obtained from measurements carried out on a fabricated chip at the 28nm
technology node with special non-CMOS transistor techniques aiming to showcase the advantage
of that technology in low-power operations, whilst that for this work is obtained from simulating
one VAC at the 65nm technology node with a deliberately high (2.5V) nominal supply voltage
to facilitate studying voltage instability scenarios. The same applies to [37], where the circuit is
implemented with non-CMOS technology. A fair power consumption comparison with [31]
and [37] is not yet possible without fabricating and testing real chips, preferably at the same
VLSI technology node, as whole-NN simulations at the VLSI level are not practical. In addition,
the power figures for [31] and [37] are themselves not comparable with each other, as [37] covers
the entire system including peripherals and [31] covers the NN engine only.
(c) Overall Summary
Our design methods, supported by extensive analysis and validations, have proven the original
hypothesis, and demonstrated the following features:
• power elasticity and resilience across a dynamic range of Vdd (statistically varying by 5×)
and f (statistically varying by up to 6 orders of magnitude). Such elasticity is achieved
for the VAC without requiring any voltage regulator circuit or clock pairing between
Vdd and f. The design also shows good resilience to dynamic power supply variations. However,
further compensation will be required at lower voltages to avoid large errors at the
whole-perceptron level;
• minimal use of additional analogue electronics (ideally, only passive components),
coupled with low-complexity PWM-coded arithmetic using primarily digital components,
making our approach highly power efficient and suitable for low-cost fabrication;
• extensive validation and analysis using multi-layer PWM-coded NNs show good
scalability of the proposed approach; however, deeper NNs may need circuit-level
compensation after each layer or high-precision representation techniques to improve
the overall accuracy and efficiency.
5. Conclusions
We propose the first mixed-signal (analogue/digital/relative temporal) perceptron design using
the principles of PWM. Central to our design are a number of parallel inverters that suitably
transcode the input-weight pairs from the spatial domain to the relative temporal domain. This
approach aims to deliver high resilience to amplitude and frequency variations in the supply
voltage, exploiting the fact that PWM-based solutions are typically agnostic to such variations.
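The claim that a duty cycle carries the same information regardless of carrier amplitude and frequency can be checked with a small idealised Python sketch. The waveform model (perfect square edges, no noise), the helper names and the 0.5 V and 2.5 V amplitudes are assumptions chosen for illustration; they are not measurements of the proposed circuit.

```python
def pwm_wave(duty, vdd, samples_per_period, periods):
    """Return samples of an ideal PWM wave: `vdd` while high, 0.0 while low."""
    high = round(duty * samples_per_period)
    one_period = [vdd] * high + [0.0] * (samples_per_period - high)
    return one_period * periods


def measured_duty(wave):
    """Fraction of samples above half of the observed peak amplitude."""
    peak = max(wave)
    if peak <= 0.0:
        return 0.0
    return sum(1 for v in wave if v > peak / 2) / len(wave)


# The same duty cycle (0.3) on two very different carriers, sampled over the
# same time window: a 5x amplitude ratio and a 100x frequency ratio.
slow_low = pwm_wave(0.3, vdd=0.5, samples_per_period=1000, periods=2)
fast_high = pwm_wave(0.3, vdd=2.5, samples_per_period=10, periods=200)
```

Both waves yield the same measured duty cycle in this model, because the threshold is taken relative to the observed peak and the ratio of high time to period does not depend on the period length.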
Another advantage of the proposed design is its simplicity. Whilst conventional
implementations of the perceptron require complex logic to perform multiplication and addition,
the proposed approach uses only one gate (either an inverter or a two-input NAND) per bit
for every input. Thus, for the 3 × 3 weighted addition VAC we used only 54 transistors. This
significantly reduces the logic requirement and, therefore, the power consumption of the entire
device.
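As a rough behavioural illustration of weighted addition in the duty-cycle domain, the Python sketch below treats each weight as an unsigned integer whose bit k contributes the input duty cycle scaled by 2^k, and normalises by the largest possible sum. This is an idealised model under stated assumptions (3-bit unsigned weights, perfect averaging, no gate inversion or nonlinearity); the function name is hypothetical and the model is not the circuit equation of the proposed design.

```python
def weighted_duty_sum(duty_cycles, weights, bits=3):
    """Idealised output duty cycle of a weighted adder.

    Assumptions: each weight is an unsigned integer of `bits` bits, and the
    result is normalised by the maximum achievable sum so it stays in [0, 1].
    """
    full_scale = (1 << bits) - 1
    total = 0.0
    for dc, w in zip(duty_cycles, weights):
        if not 0.0 <= dc <= 1.0:
            raise ValueError("duty cycle must lie in [0, 1]")
        if not 0 <= w <= full_scale:
            raise ValueError("weight must be an unsigned integer code")
        # One contribution per weight bit, mirroring one gate per bit per input.
        for k in range(bits):
            if (w >> k) & 1:
                total += dc * (1 << k)
    return total / (len(duty_cycles) * full_scale)


# Example: three inputs with 3-bit weights, as in a 3 x 3 weighted addition.
out = weighted_duty_sum([0.5, 0.5, 0.5], [7, 7, 7])
```

With all weights at full scale the output equals the mean input duty cycle, and with all weights at zero the output is zero; negative weights are rejected, which reflects the unsigned representation assumed here.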
Extensive experimentation on the perceptron and its use in neural networks of relatively
significant sizes helps to explore the perceptron design’s advantages, usability and limitations.
Also through experimental studies, design improvements are found which further strengthen the
perceptron’s case. These experimental explorations also lead to further insights into the design
and provide guidance on potential future work.
The perceptron’s arithmetic unit design is shown to fully accomplish its design aim of
power and frequency resilience. It is also shown to be working within reasonable boundaries in
pragmatic applications, especially in NNs of limited depth which are nevertheless of significant
size. Future work should concentrate on improving its linearity, its threshold voltage
independence and its representation of negative values, to increase its usability in NNs of greater
depth and with more sophisticated AFs.
This work points to potentially exciting research to extend the computation capabilities of
devices working in the relative temporal and analogue domains.
Machine learning is finding more applications at the micro-edge, where power variation
from energy harvesters is becoming commonplace. We believe the proposed perceptron will find
practical implementations in these applications as it is highly robust to these variations.
Authors’ Contributions. Serhii Mileiko was responsible for the low-level perceptron and NN circuit
design, experiments and analysis, and also led the writing of the paper. Thanasin Bunnam was responsible
for the scale-up NN circuit modeling, analysis and experiments using the MNIST benchmark application, and
co-led the writing of the paper. Fei Xia contributed to the writing of the paper, led its editing, and participated
in technical discussions. Rishad Shafik contributed to the scale-up models, tying them to the low-level
circuits, co-supervised the circuit- and system-level work of S. Mileiko and T. Bunnam, and contributed to
writing/editing. Alex Yakovlev contributed by proposing the idea of using CMOS logic for duty-cycle based
computing for power elasticity and robustness, supervising the work of S. Mileiko and T. Bunnam, and
participating in all technical discussions. Shidhartha Das contributed to the circuit design aspects,
particularly reflecting on how averaging could affect the overall accuracy.
Competing Interests. The authors declare that they have no competing interests.
Funding. This work was supported by the EPSRC STRATA project (EP/N023641/1).
Acknowledgements. The authors would like to thank Mr. Jonathan Edwards, CTO of Temporal Computing
Ltd., for his useful suggestions in the early stages.
References
1. K. Schwab, The fourth industrial revolution. London : Portfolio Penguin, 2017.
2. N. Javaid, A. Sher, H. Nasir, and N. Guizani, “Intelligence in IoT-based 5G networks:
Opportunities and challenges,” IEEE Communications Magazine, vol. 56, no. 10, pp. 94–100,
October 2018.
3. A. Biswas and A. P. Chandrakasan, “Conv-RAM: An energy-efficient SRAM with embedded
convolution computation for low-power CNN-based machine learning applications,” in 2018
IEEE International Solid-State Circuits Conference (ISSCC), Feb 2018, pp. 488–490.
4. M. Chen, Y. Miao, X. Jian, X. Wang, and I. Humar, “Cognitive-LPWAN: Towards
Intelligent Wireless Services in Hybrid Low Power Wide Area Networks,” arXiv e-prints, p.
arXiv:1810.00300, Sep 2018.
5. M. T. Sharbati, Y. Du, J. Torres, N. D. Ardolino, M. Yun, and F. Xiong, “Low-power,
electrochemically tunable graphene synapses for neuromorphic computing,” Advanced
Materials, vol. 30, no. 36, p. 1802353, 2018. [Online]. Available:
https://onlinelibrary.wiley.com/doi/abs/10.1002/adma.201802353
6. E. O. Neftci, “Data and power efficient intelligence with neuromorphic learning machines,”
iScience, vol. 5, pp. 52–68, 2018. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S2589004218300865
7. R. Shafik and A. Yakovlev, Chapter: From Power-Efficient to Power-Driven Computing, in Many-
Core Computing: Hardware and Software. Ed: G. Merrett and B. M. Al-Hashimi, IET, 2019.
8. R. A. Shafik, S. Yang, A. Das, L. A. Maeda-Nunez, G. V. Merrett, and B. M. Al-Hashimi,
“Learning transfer-based adaptive energy minimization in embedded systems,” IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 35, no. 6, pp.
877–890, June 2016.
9. I. Qiqieh, R. Shafik, G. Tarawneh, D. Sokolov, and A. Yakovlev, “Energy-efficient
approximate multiplier design using bit significance-driven logic compression,” in
Proceedings of the Conference on Design, Automation & Test in Europe, ser. DATE ’17.
European Design and Automation Association, 2017, pp. 7–12. [Online]. Available:
http://dl.acm.org/citation.cfm?id=3130379.3130382
10. R. Shafik, A. Yakovlev, and S. Das, “Real-power computing,” IEEE Transactions on Computers,
vol. 67, no. 10, pp. 1445–1461, Oct 2018.
11. S. Beeby and N. M. White, Energy harvesting for autonomous systems. Artech House, 2010.
12. T.-J. Yang, Y.-H. Chen, and V. Sze, “Designing energy-efficient convolutional neural networks
using energy-aware pruning,” in Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 2017, pp. 5687–5695.
13. A. Suleiman, Y.-H. Chen, J. Emer, and V. Sze, “Towards closing the energy gap between HOG
and CNN features for embedded vision,” in 2017 IEEE International Symposium on Circuits and
Systems (ISCAS). IEEE, 2017, pp. 1–4.
14. R. Andri, L. Cavigelli, D. Rossi, and L. Benini, “YodaNN: An architecture for ultralow power
binary-weight CNN acceleration,” IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol. 37, no. 1, pp. 48–60, Jan 2018.
15. F. Rosenblatt, “The perceptron: A probabilistic model for information storage and
organization in the brain,” Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.
16. M. T. Hagan, H. B. Demuth, and M. Beale, Neural Network Design. Boston, MA, USA: PWS
Publishing Co., 1996.
17. E. Wilson and D. W. Tufts, “Multilayer perceptron design algorithm,” in Proceedings of IEEE
Workshop on Neural Networks for Signal Processing, Sep. 1994, pp. 61–68.
18. H. Adeli and C. Yeh, “Perceptron learning in engineering design,” Computer-Aided
Civil and Infrastructure Engineering, vol. 4, no. 4, pp. 247–256, 1989. [Online]. Available:
https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8667.1989.tb00026.x
19. S. Hung and H. Adeli, “A model of perceptron learning with a hidden layer for
engineering design,” Neurocomputing, vol. 3, no. 1, pp. 3–14, 1991. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/0925231291900165
20. B. Jeong and Y. H. Lee, “Design of weighted order statistic filters using the
perceptron algorithm,” IEEE Transactions on Signal Processing, vol. 42, no. 11, pp. 3264–3269,
Nov 1994.
21. Wang Qinruo, Yi Bo, Xie Yun, and Liu Bingru, “The hardware structure design of perceptron
with FPGA implementation,” in SMC’03 Conference Proceedings. 2003 IEEE International
Conference on Systems, Man and Cybernetics. Conference Theme - System Security and Assurance
(Cat. No.03CH37483), vol. 1, 2003, pp. 762–767.
22. Y. hsin Chen, T.-J. Yang, and J. S. Emer, “Understanding the limitations of existing energy-
efficient design approaches for deep neural networks,” in Energy, vol. 2, no. L1, 2018, p. L3.
23. R. W. Keyes and R. Landauer, “Minimal energy dissipation in logic,” IBM Journal of Research
and Development, vol. 14, no. 2, pp. 152–157, 1970.
24. M. de Prado, M. Denna, L. Benini, and N. Pazos, “Quenn: Quantization engine for low-
power neural networks,” in Proceedings of the 15th ACM International Conference on Computing
Frontiers. ACM, 2018, pp. 36–44.
25. A. Yakovlev, “Energy-modulated computing,” in 2011 Design, Automation Test in Europe,
March 2011, pp. 1–6.
26. D. Shang, X. Zhang, F. Xia, and A. Yakovlev, “Asynchronous design for new on-chip wide
dynamic range power electronics,” in 2014 Design, Automation Test in Europe Conference
Exhibition (DATE), March 2014, pp. 1–6.
27. R. LiKamWa, Y. Hou, Y. Gao, M. Polansky, and L. Zhong, “Redeye: Analog convnet image
sensor architecture for continuous mobile vision,” in 2016 ACM/IEEE 43rd Annual International
Symposium on Computer Architecture (ISCA), June 2016, pp. 255–266.
28. Y.-H. Chen, T. Krishna, J. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable
accelerator for deep convolutional neural networks,” in IEEE International Solid-State
Circuits Conference, ISSCC 2016, Digest of Technical Papers, 2016, pp. 262–263.
29. I. Vaisband, M. Azhar, E. G. Friedman, and S. Köse, “Digitally controlled pulse width
modulator for on-chip power management,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 22, no. 12, pp. 2527–2534, 2014.
30. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document
recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
31. Y. Du, L. Du, X. Gu, J. Du, X. S. Wang, B. Hu, M. Jiang, X. Chen, S. S. Iyer, and M. F. Chang,
“An analog neural network computing engine using CMOS-compatible charge-trap-transistor
(CTT),” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 1–1,
2018.
32. L. Deng, “The MNIST database of handwritten digit images for machine learning research [best
of the web],” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.
33. M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. V. Le, P. Nguyen, A. Senior,
V. Vanhoucke, J. Dean, and G. E. Hinton, “On rectified linear units for speech processing,”
in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp.
3517–3521.
34. L. K. Muller and G. Indiveri, “Rounding methods for neural networks with low resolution
synaptic weights,” arXiv e-prints, arXiv:1504.05767, 2015. [Online]. Available:
https://ui.adsabs.harvard.edu/abs/2015arXiv150405767M
35. B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam,
and D. Kalenichenko, Quantization and Training of Neural Networks for Efficient
Integer-Arithmetic-Only Inference, ser. arXiv e-prints, 2017. [Online]. Available:
https://ui.adsabs.harvard.edu/#abs/2017arXiv171205877J
36. S. Wu, G. Li, F. Chen, and L. Shi, Training and Inference with Integers in Deep Neural
Networks, ser. arXiv e-prints, 2018. [Online]. Available:
https://ui.adsabs.harvard.edu/#abs/2018arXiv180204680W
37. H. Jiang, K. Yamada, Z. Ren, T. Kwok, F. Luo, Q. Yang, X. Zhang, J. J. Yang, Q. Xia,
Y. Chen, H. Li, Q. Wu, and M. Barnell, “Pulse-width modulation based dot-product engine for
neuromorphic computing system using memristor crossbar array,” in 2018 IEEE International
Symposium on Circuits and Systems (ISCAS), 2018, pp. 1–4.
