Design of Neuromemristive Systems for Visual Information Processing by Merkel, Cory E
Rochester Institute of Technology
RIT Scholar Works
Theses Thesis/Dissertation Collections
11-2015
Design of Neuromemristive Systems for Visual
Information Processing
Cory E. Merkel
cem1103@rit.edu
This Dissertation is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for
inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact ritscholarworks@rit.edu.
Recommended Citation
Merkel, Cory E., "Design of Neuromemristive Systems for Visual Information Processing" (2015). Thesis. Rochester Institute of
Technology. Accessed from
Design of Neuromemristive Systems for Visual
Information Processing
by
Cory E. Merkel
A dissertation submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Microsystems Engineering
Microsystems Engineering Program
Kate Gleason College of Engineering
Rochester Institute of Technology
Rochester, New York
November 2015
Design of Neuromemristive Systems for Visual Information Processing
By
Cory E. Merkel
Committee Approval:
We, the undersigned committee members, certify that we have advised and/or supervised the candidate on the
work described in this dissertation. We further certify that we have reviewed the dissertation manuscript and
approve it in partial fulfillment of the requirements of the degree of Doctor of Philosophy in Microsystems
Engineering.
Dhireesha Kudithipudi, Ph.D. Date
Associate Professor
Computer Engineering, RIT
Ray Ptucha, Ph.D. Date
Assistant Professor
Computer Engineering, RIT
Santosh Kurinec, Ph.D. Date
Professor
Electrical and Microelectronic Engineering, RIT
Haibo He, Ph.D. Date
Associate Professor
Computer and Biomedical Engineering, URI
Manan Suri, Ph.D. Date
Assistant Professor
Electrical Engineering, IIT, Delhi
Bryant Wysocki, Ph.D. Date
Chief Engineer
Information Directorate, AFRL
Bradford Mahon, Ph.D. Date
Assistant Professor
Brain and Cognitive Sciences, UR
Gabrielle Gaustad, Ph.D. Date
Associate Professor
Golisano Institute for Sustainability, RIT
Certified By:
Bruce Smith, Ph.D. Date
Director
Microsystems Engineering Ph.D. Program, RIT
Harvey J. Palmer, Ph.D. Date
Dean
Kate Gleason College of Engineering, RIT
ABSTRACT
Kate Gleason College of Engineering
Rochester Institute of Technology
Degree Doctor of Philosophy Program Microsystems Engineering
Author’s Name Cory E. Merkel
Advisor’s Name Dhireesha Kudithipudi, Ph.D.
Dissertation Title Design of Neuromemristive Systems for Visual Information Processing
Neuromemristive systems (NMSs) are brain-inspired, adaptive computer architectures based on emerging resistive memory technology (memristors). NMSs adopt a mixed-signal design approach with closely coupled memory and processing, resulting in high area and energy efficiency. Previous work suggests that NMSs could even supplant conventional architectures in niche application domains such as visual information processing. However, given the infancy of the field, there are still several obstacles impeding the transition of these systems from theory to practice.
This dissertation advances the state of NMS research by addressing open design problems spanning the circuit, architecture, and system levels. Novel synapse, neuron, and plasticity circuits are designed to reduce NMSs’ area and power consumption by using current-mode design techniques and exploiting device variability. Circuits are designed in a 45 nm CMOS process with memristor models based on multilevel (W/Ag-chalcogenide/W) and bistable (Ag/GeS2/W) device data. Higher-level behavioral, power, area, and variability models are ported into MATLAB to reduce the overall simulation time. The circuits designed in this work are integrated into neural network architectures for visual information processing tasks, including feature detection, clustering, and classification. Networks in the NMSs are trained with novel stochastic learning algorithms that achieve a ≈3.5× reduction in circuit area and reduced design complexity, while exhibiting convergence properties similar to those of the least-mean-squares algorithm. This work also examines the effects of device-level variations on NMS performance, which have received limited attention in previous work. The impact of device variations is reduced with a partial on-chip training methodology that enables NMSs to be configured with relatively sophisticated algorithms (e.g. resilient backpropagation), while optimizing their area-accuracy tradeoff.
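For reference, the least-mean-squares (LMS) baseline mentioned above can be sketched in a few lines. This is a minimal sketch of the textbook online LMS (Widrow-Hoff) rule, w ← w + α(y − ŷ)u, written with this document's symbols (α learning rate, u input, y target, ŷ output, w weight). It is not the stochastic (SLMS) formulation or its circuit implementation; the function names, the bias handling, and the synthetic linear-regression data are illustrative assumptions.

```python
import random


def lms_train(samples, n_inputs, alpha=0.05, epochs=200, seed=0):
    """Fit a linear unit y_hat = w . [u, 1] with the standard online
    least-mean-squares rule: w <- w + alpha * (y - y_hat) * u.

    samples: list of (u, y) pairs, where u is a list of n_inputs floats.
    Returns the weight list; the last entry is the bias weight.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    w = [0.0] * (n_inputs + 1)                  # weights start at zero
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)                      # present samples in random order
        for u, y in order:
            u_aug = list(u) + [1.0]             # constant input for the bias
            y_hat = sum(wi * ui for wi, ui in zip(w, u_aug))
            err = y - y_hat                     # prediction error (y - y_hat)
            w = [wi + alpha * err * ui for wi, ui in zip(w, u_aug)]
    return w


def mse(samples, w):
    """Mean squared error of the linear unit on the given samples."""
    total = 0.0
    for u, y in samples:
        y_hat = sum(wi * ui for wi, ui in zip(w, list(u) + [1.0]))
        total += (y - y_hat) ** 2
    return total / len(samples)


if __name__ == "__main__":
    # Illustrative data: noisy points on the line y = 0.8*u + 0.1.
    rng = random.Random(1)
    data = []
    for _ in range(100):
        u = rng.random()
        data.append(([u], 0.8 * u + 0.1 + rng.gauss(0.0, 0.05)))
    w = lms_train(data, n_inputs=1)
    print("slope=%.3f bias=%.3f mse=%.5f" % (w[0], w[1], mse(data, w)))
```

Zero initialization and per-epoch shuffling are conventional choices for online LMS, not details taken from this work.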
Contents
Abstract
List of Figures
List of Tables
Frequently Used Symbols
1 Introduction
2 Background and Related Work
2.1 Overview of the Human Visual System
2.2 Memristors for Plasticity in Neuromemristive Systems
2.3 Neuromemristive Systems
2.4 Stochastic Computation
2.4.1 Stochastic Representation of a Digital Value
2.4.2 Stochastic Arithmetic Operations
2.5 Summary
3 Device Models, Simulation Strategy, and Design Methodology
3.1 45 nm Low Power PTM MOSFET Characterization
3.1.1 Current-Voltage Characteristics
3.1.2 Output Impedance
3.1.3 Diode-Connected MOSFETs
3.1.4 ON Resistance
3.1.5 Process Variations
3.2 Memristor Models
3.2.1 Silver Chalcogenide Memristor
3.2.2 CBRAM Memristor
3.2.3 Process Variations
3.3 Simulation Strategy
3.4 Current-mode Designs and Analog Signal Representation
3.5 Summary
4 Synapse and Neuron Circuits
4.1 Synapse Circuits
4.1.1 Constant Current Mirror Synapse
4.1.1.1 Basic Operation
4.1.1.2 Area and Power
4.1.1.3 Process Variations
4.1.1.4 Constant Current Mirror Synapse with Bipolar Input
4.1.2 Bipolar Weight Memristive Synapse
4.1.2.1 Basic Operation
4.1.2.2 Area and Power
4.1.2.3 Process Variations
4.1.2.4 Crossbar Implementation
4.2 Neuron Circuits
4.2.1 Input Stages
4.2.2 Sigmoid and Hyperbolic Tangent Activation Functions
4.2.2.1 Basic Operation
4.2.2.2 Area and Power Consumption
4.2.2.3 Process Variations
4.2.3 Periodic Activation Functions
4.2.3.1 Basic Operation
4.2.3.2 Area, Power Consumption, and Process Variations
4.2.4 Additional Activation Functions
4.3 Voltage-Mode CBRAM Synapse and Neuron Circuits
4.4 Summary
5 Synaptic Plasticity Circuits
5.1 Online SLMS Algorithm
5.1.1 Overview
5.1.2 Hardware Implementation
5.1.3 Algorithm Performance
5.2 Batch SLMS Algorithm
5.3 Min Algorithm
5.4 Summary
6 NMSs for Visual Information Processing
6.1 Feature Detection
6.2 Classification
6.2.1 MNIST Dataset
6.2.2 Caltech101 Dataset
6.3 Clustering
6.3.1 Overview
6.3.2 Clustering MNIST Images
6.4 Summary
7 Effects of Device Variations on System-Level Performance
7.1 Case Study: NMS for Electrical Load Forecasting
7.1.1 Simulation with Nominal Parameters
7.1.2 Variation Analyses
7.2 Off-Chip Training
7.2.1 Weight Programming
7.2.2 Feature Training
7.2.3 Results for Classification
7.3 Summary
8 Conclusions and Future Work
Appendices
A Derivation of $s_i$ for Constant Current Mirror Synapses
Bibliography
List of Figures
2.1 Lateral view of the human brain’s visual pathways. Black filled rectangles show the visual information processing tasks explored in this work.
2.2 Pinched hysteresis current-voltage relationship that characterizes memristive devices.
2.3 High-level depiction of an NMS. The NMS design process requires interdisciplinary collaboration between experts in neuroscience/neuropsychology, machine learning, and integrated circuit/architecture design. In this particular example, an NMS is designed for image classification. A neural network architecture is used to extract features at lower levels and make predictions at higher levels.
2.4 Comparison of a biological synapse and a memristor as a synapse emulator.
2.5 NMS design space for visual information processing.
3.1 Logarithm of the NMOS drain current versus vgs (W = L = 45 nm, vds = 1.1 V, Vsb = 0 V). The solid curve shows the result from SPICE simulation using the BSIM model, while the dashed curve shows the proposed semi-empirical weak inversion model.
3.2 Fitting the L- and vds-dependent ζ parameters for (a) NMOS and (b) PMOS devices.
3.3 ON resistance of 45 nm low-power PTM MOSFETs.
3.4 Memristor with energy band diagram showing dips in the energy level of the insulator corresponding to defect states.
3.5 Silver chalcogenide memristor stack (adapted from [66]).
3.6 Silver chalcogenide current versus time resulting from the application of a sinusoidal voltage. (Top) Voltage across the memristor vs. time. (Bottom) Current flowing through the memristor vs. time for 8 devices [73].
3.7 On and off conductances of 8 silver chalcogenide memristor devices. Dashed lines show the mean values. Shown above the bars for each device is the sum of the percent differences between that device’s on and off conductances and the mean values.
3.8 Silver chalcogenide memristor I-V characteristic: experimental data [73] and semi-empirical model.
3.9 Change in state variable γ vs. number of applied write pulses. (a) |vw| = 0.75 V, tw = 1 ns. (b) |vw| = 1.0 V, tw = 1 ns. (c) |vw| = 0.75 V, tw = 1 µs. (d) |vw| = 1.0 V, tw = 1 µs.
3.10 SEM image of the CBRAM device used in this work [75].
3.11 CBRAM switching probability vs. applied flux.
3.12 Simulation times for a single-layer perceptron with different numbers of inputs N and test vectors m applied. (a) Mean HSPICE simulation time. (b) Mean MATLAB simulation time.
3.13 Speedup of MATLAB simulation compared to HSPICE for a single-layer perceptron.
3.14 NMS simulation strategy, where HSPICE is used for circuit-level design and analysis, and MATLAB is used for system-level simulation.
3.15 (a) Voltage-mode NMS, where neuronal activations are represented as voltages and synapses typically operate via transconductance. (b) Current-mode NMS, where neuronal activations and the results of synaptic weighting are represented as currents.
4.1 Constant current mirror synapse circuits for (a) positive weights and (b) negative weights.
4.2 Probability densities for w′ in the constant current mirror synapse. (a) $W_1/L_1 = 10/1$, $A_{cs}^{\min} = A_{\min}$. (b) $W_1/L_1 = 10/1$, $A_{cs}^{\min} = 25A_{\min}$. (c) $W_1/L_1 = 1/1$, $A_{cs}^{\min} = A_{\min}$. (d) $W_1/L_1 = 100/1$, $A_{cs}^{\min} = A_{\min}$. In all cases, $W_2/L_2$ is drawn from a discrete uniform distribution between 1 and $W_1/L_1$. Zero-valued weights are also added with probability $1/(1 + W_1/L_1)$.
4.3 Conditional distributions of w′ for excitatory synapses connected to different pre-synaptic neurons with $W_2 L_2 = A_{\min}$. (a) $W_1 L_1 = A_{\min}$, resulting in a large variation in the distributions’ means. (b) $W_1 L_1 = 10A_{\min}$, resulting in reduced variation of the distributions’ means.
4.4 Constant weight synapse circuit with bipolar input.
4.5 Bipolar weight memristive synapse. Two anti-parallel memristors control the relative ratio of excitation to inhibition at the output.
4.6 Bipolar weight memristive synapse output. (a) L = 45 nm, (b) L = 180 nm.
4.7 Equivalent write circuit for the bipolar weight memristive synapse.
4.8 Evolution of the weight in the bipolar weight memristive synapse. (a) The weight is changed from -1 to 1 by applying positive write pulses. (b) The weight is changed from 1 to -1 by applying negative write pulses. In both cases, |vw| = 3.5 V, and the write pulse width is 1 µs.
4.9 Crossbar and summing amplifier circuit for computing the distance between the input and a weight vector.
4.10 (a) Simplified model of the bipolar weight memristive synapse connected to the input of a post-synaptic neuron. (b) Number of neuron inputs and opamp gain vs. the maximum absolute value of the fractional error between the total synaptic output current and the ideal total synaptic output current.
4.11 Constant current mirror synapses converging on a resistive input stage of a post-synaptic neuron.
4.12 isi vs. ideal isi for a number N of constant current mirror synapses converging on a resistive input stage of a neuron. (a) L = 45 nm, N = 2. (b) L = 45 nm, N = 100. (c) L = 45 nm, N = 1000. (d) L = 90 nm, N = 2. (e) L = 90 nm, N = 100. (f) L = 90 nm, N = 1000.
4.13 Current-mode sigmoid/tanh activation function circuit.
4.14 Transfer characteristics of the sigmoid activation function circuit.
4.15 Monte Carlo simulations of the sigmoid neuron activation function with variation in Vth0. (a) and (b) have minimum sizing, except for the Imax current mirror, which has L = 180 nm. (c) and (d) have W = 90 nm and L = 90 nm, except for the Imax current mirror, which has L = 180 nm.
4.16 Folding amplifier activation function with an opamp input stage.
4.17 Folding amplifier activation function with different fold factors (F).
4.18 (a) Proposed synapse circuit consisting of excitatory and inhibitory groups of Ag/GeS2 CBRAM devices. (b) Effective synaptic weight vs. individual conductances (Ge, Gi).
5.1 (a) Random current generator circuit. (b) Random current comparator (RCC) for converting an analog current iz to a Bernoulli-distributed digital voltage vZ.
5.2 Hardware implementation of the online SLMS algorithm.
5.3 Illustration of the transformation that must be applied to the output of the Gilbert multiplier to implement the LMS algorithm: (Left) Unshifted multiplier output and (Right) output after applying the shift.
5.4 Hardware implementation of the online LMS algorithm.
5.5 Performance of the proposed algorithm on a linear regression problem: (a) MSE vs. training epoch for the LMS and SLMS algorithms. (b) Datapoints from a straight line with random Gaussian noise added and the linear fits provided by the LMS and SLMS algorithms.
5.6 Mean MSE versus α (learning rate) over 10 runs.
5.7 Batch SLMS circuit design.
5.8 Circuits for implementing the min function.
6.1 (a) Simulation setup for testing a single-layer, one-output neural network for an edge detection application. (b) Training patterns for edge detection (adapted from [111]).
6.2 Quantitative comparison of edge detection simulation results: Lenna image with (a) adaptive and (b) fixed learning rates. Clock image with (c) adaptive and (d) fixed learning rates. Each plot shows the fraction of pixels correctly classified by the network for networks with 1-, 2-, 3-, and 4-fold output neurons. Each group of three bars shows results, from left to right, after 1, 2, and 5 training cycles, respectively.
6.3 Qualitative comparison of edge detection simulation results: (a) Original Lenna image, and edge-detected images after 5 training cycles with a fixed learning rate and (b) 1-fold, (c) 2-fold, (d) 3-fold, and (e) 4-fold output neuron. (f) Original Clock image, and edge-detected images after 5 training cycles with a fixed learning rate and (g) 1-fold, (h) 2-fold, (i) 3-fold, and (j) 4-fold output neuron [114].
6.4 Simulation setup for classification on the MNIST database. The original MNIST images are reduced to 5×5 pixels and fed into the NMS. The NMS’s output layer is trained using online SLMS, and a winner-take-all output is used to generate the hypothesis.
6.5 (a) Classification accuracy vs. epoch and (b) classification accuracy vs. learning rate for the logistic regression problem.
6.6 Simulation setup for classification on the Caltech101 database. The original (color) images are first converted to black and white and normalized. Then, their dimensionality is reduced using LPP. The lower-dimensional vectors are classified using a single-layer perceptron NMS.
6.7 (a) Classification accuracy of a linear SVM and a single-layer neuromemristive architecture as a function of the top d dimensions from LPP. (b) Learning curves for the NMS using SLMS.
6.8 (Left) Number of training samples per class; (right) confusion matrix showing the input ground-truth class on the rows and the corresponding mapping on the columns.
6.9 The NMS network size vs. the number of LPP output dimensions o.
6.10 The Caltech101 dataset. Two example images from each of the 100 classes are shown, after resizing, normalization, and conversion to grayscale.
6.11 Block diagram of the proposed NMS for unsupervised clustering.
6.12 10 cluster centroids found in a set of 1000 MNIST images using the proposed NMS.
6.13 Cost function versus epoch while clustering MNIST images using the proposed NMS.
7.1 Block-level diagram of the proposed NMS for linear regression using the SLMS training algorithm.
7.2 (a) Predicted load vs. time for the month of January 2013 in 8-hour increments. (b) Predicted load vs. time for January 1, 2013 (24 hours). (c) Load prediction accuracy versus training epoch.
7.3 Impact of device and parameter variations on the system forecasting accuracy. (a) Impact of variations in the CBRAM programming window (Gmon/Gmoff). (b) Forecast accuracy vs. the number of CBRAM devices used in each synapse. (c) Forecast accuracy vs. CBRAM switching probability. (d) Forecast accuracy vs. σC (variation in the capacitors used in the synaptic circuits).
7.4 High-level depiction of off-chip training for an NMS.
7.5 MNIST classification results: (a) Classification accuracy versus training epoch for the ideal NMS model (off-chip). (b) Classification accuracy versus the transistor size factor ts for weight programming and feature training. (c) Classification accuracy versus 1/ts, showing the inverse area dependence of the accuracy for both methods. (d) Accuracy per unit area versus ts for both training methods.
List of Tables
2.1 Comparison of non-volatile memory technologies for brain-inspired computing [8, 12–14].
3.1 ζ1 parameters for different values of W, L, and vds.
3.2 Low power PTM 45 nm MOSFET parameters.
3.3 Model parameters for a silver chalcogenide memristor.
3.4 Memristor variation parameters for Ag chalcogenide and CBRAM devices.
5.1 Comparison of online SLMS and LMS area.
7.1 Nominal system parameters used for electrical load forecasting.
Frequently Used Symbols¹,²
a Mean distance between defects in a memristor
A Circuit area
A0 Open-loop gain
α NMS learning rate
Av Voltage gain
AVth0 Fitting parameter for MOSFET Vth0 mismatch
B Boost factor
β Geometry-dependent MOSFET current factor
C Capacitance or a binary number to be represented stochastically
χ Function that governs the change in memristor state variable
Cox MOSFET oxide capacitance
Cs Stochastic bit stream or bundle
d Distance measure for clustering
D Memristor film thickness or duty factor
δ Variation operator
δ (·) Dirac delta function
E Total energy
Ea Activation energy
η Activity factor (e.g. neuron activity)
f Neuron activation function
F Wire pitch or folding amplifier folding factor or electric field
g Memristor conductance ratio
G Conductance
γ Memristor state variable
gm Transconductance
Gm Memristor conductance
Gmon Memristor on conductance
Gmoff Memristor off conductance
h NMS hypothesis function
ℏ Planck’s constant divided by 2π
¹ In general, the symbols defined here may contain superscripts or subscripts to specify indices, component names, maximum, minimum, average, etc.
² All symbols listed here are scalar values (e.g. $x$), but in general, they may be components of vectors, denoted by lowercase boldface symbols (e.g. $\mathbf{x}$), or matrices, denoted by uppercase boldface symbols (e.g. $\mathbf{X}$).
H (·) Heaviside step function
ids MOSFET drain current
ids0 Constant prefactor in the MOSFET long channel drain current
i′ds Long channel MOSFET drain current
im Memristor current
Imax Constant current value used for normalization in an NMS
L MOSFET channel length
Λ Multiplicative effect of drain-source voltage on MOSFET drain current
lr Number of clock cycles to apply memristor write voltage
m Number of training examples or mass
µ0 MOSFET low-field mobility
n MOSFET subthreshold slope factor
N Number of NMS inputs or length (width) of a stochastic bit stream (bundle)
Nepochs Number of training epochs
Nh Number of hidden-layer neurons in an NMS
Nw Number of weights/synapses in an NMS
Nx Number of neurons in an NMS
ν Attempt-to-escape frequency
o NMS input dimensionality after dimensionality reduction
O NMS input dimensionality before dimensionality reduction
p NMS training/test vector index or momentum
P Power consumption
φ Flux linkage
pi Reduced dimensionality SLMS input vectors
Π Ratio of Λ values for MOSFETs in a current mirror
Pr (·) Probability
pswitch Memristor switching probability
ψ Quantum wave function
Ψ Number of bits in a binary number to be represented stochastically
q Elementary electric charge
R Resistance
Rin Circuit input impedance
Rm Memristor resistance
ro MOSFET output impedance
Ro Circuit output impedance
Θ Factor related to the dependence of the saturated MOSFET drain current on vds in subthreshold
s Synaptic output or sum of synaptic outputs
T Temperature or tunneling probability or period
θ Threshold for threshold activation function
ts Transistor size factor
tw Memristor write time
u NMS input
U Stochastic NMS input or potential energy
vds MOSFET drain-source voltage
vgs MOSFET gate-source voltage
vm Memristor voltage
VL Lower bound of gate-source voltage for MOSFET weak inversion
VM Upper bound of gate-source voltage for MOSFET weak inversion
VDD Positive supply voltage
Vdssat MOSFET saturation voltage
Vsb MOSFET source-body voltage
VSS Negative supply voltage
Vtn Negative memristor threshold voltage
Vtp Positive memristor threshold voltage
Vth0 MOSFET threshold voltage
Vth Folding amplifier neuron threshold voltage
VT Thermal voltage
vw Memristor write voltage
w Synaptic weight
w′ Synaptic weight without effect of vds
W MOSFET channel width
x Neuron output (activation)
X Stochastic neuron output (activation)
ξ Memristor model fitting parameter
y Expected NMS output (target)
Y Stochastic expected NMS output (target)
ŷ NMS output
Ŷ Stochastic NMS output
ζ MOSFET model fitting parameter
To my beautiful wife, Melissa,
for all of your patience, love, and encouragement.
Acknowledgements
This research was made possible by the support and encouragement of a number of people inside and outside of RIT. First, I’d like to acknowledge the Air Force Research Laboratory and the National Science Foundation (NSF EAGER Grant ECCS-1445386) for partially funding this research. In addition, I’d like to thank my advisor, Dr. Dhireesha Kudithipudi, who has been a wonderful mentor and role model during the course of my Ph.D. work. I am also grateful for the support and flexibility of my dissertation committee: Dr. Ray Ptucha, Dr. Santosh Kurinec, Dr. Bryant Wysocki, Dr. Brad Mahon, Dr. Haibo He, Dr. Manan Suri, and Dr. Gabrielle Gaustad.
The RIT Computer Engineering Department’s faculty and staff, including Dr. Shanchieh Yang, Dr. Andres Savakis, Kathy Stefanic, Pam Steinkirchner, Anne DeFelice, Rick Tolleson, Emilio Del Plato, and Sarah Buell, have been extremely helpful with administrative tasks, providing resources, and addressing technical problems. I also want to thank the faculty and staff of the Microsystems Engineering Ph.D. Program, including Dr. Bruce Smith and Lisa Zimmerman.
Several researchers from the Air Force Research Laboratory, including Dr. Bryant Wysocki, Dr. Garrett Rose, Nathan McDonald, and Claire Thiem, have been outstanding, providing valuable technical contributions to this work. Additional collaborators include Dr. Nate Cady (SUNY Albany), Dr. Kris Campbell (Boise State University), and Alex Nugent (Knowm, Inc.). I must also thank all of the members of the NanoComputing Research Lab, including Dan Christiani, James Mnatzaganian, Qutaiba Saleh, Abdullah Zyarah, James Thesing, and many more. They have been a wonderful source of encouragement and technical insight. In addition, thank you to the many friends in my undergraduate class at RIT, including David Brenner, Mike Sanfilippo, Daniel Liu, Jeff Kemp, and many more.
Finally, a sincere and heartfelt thank you to my wife, mom, dad, brother, and the rest of
my family and friends for all of their support.
Chapter 1
Introduction
How can emerging nanotechnologies be exploited to design the next generation of intelligent computers? This is the central question explored in this dissertation and the underlying theme of neuromemristive systems (NMSs). An NMS is a brain-inspired, special-purpose computing platform based on nanoscale resistive memory (memristor) technology. NMSs represent a subclass of a broader movement in brain-inspired computing called neuromorphic systems, which were pioneered by Carver Mead in the late 1980s [1]. The primary goal of both neuromorphic and neuromemristive systems is to provide levels of intelligent information processing, adaptation/learning, energy/area efficiency, and noise/fault tolerance in niche application domains that are not achievable using conventional computing paradigms. Conventional computer architectures are limited in these aspects because of their adherence to the von Neumann model, where the hardware is digital and immutable, computation is sequential and precise, and a distinct separation exists between computation and memory. Although the von Neumann model is unparalleled for well-defined sequential problems (e.g. arithmetic and logic), it is ill-suited to application domains such as visual information processing, where problems are not well-posed, data are analog and noisy, and solutions are inherently parallel. Mead and many researchers before him recognized that biological systems such as the primate brain solve these types of problems with much greater efficiency than conventional computing systems. In fact, it is estimated that for applications such as visual information processing, the brain is a factor of $10^{7}$ more energy efficient than any conceivable digital computer. The explanation for this large efficiency gap lies in the stark contrast between conventional computer architectures and the computing methods employed by the brain.
The human brain is inherently mixed-signal, massively parallel, approximate, and plas-
tic, giving rise to its incredible processing ability, low power consumption, and capacity
for adaptation. Both neuromorphic and neuromemristive systems attempt to emulate brain
functionality with neural networks built from mixed-signal circuits. The two distinguishing
features of an NMS are:
• The incorporation of memristive devices into NMSs enables plasticity at multiple
levels, beyond the synaptic plasticity that is typically implemented in neuromorphic
systems.
• NMSs focus on abstraction of the computational principles found in the nervous system
rather than on biological plausibility. This approach is preferred for two reasons. First,
it is still unclear how behavior at the level of single neurons and small neuronal
populations leads to system-level behavior of the brain. Second, the basic components of
the brain (e.g. proteins, cells, etc.) are very different from those used in integrated
circuit (IC) design (e.g. transistors and memristors). Therefore, it is unlikely that
copying the brain's structure in an IC will yield the same emergent properties.
Note that there are other computing platforms that are attractive for brain-like information
processing, including general-purpose graphics processing units (GPGPUs) and field-
programmable gate arrays (FPGAs). GPGPUs are optimized for the types of linear algebra
computations that govern neural network behavior. However, they lack the reconfigurability
that is offered by NMSs. On the other hand, FPGAs have a high degree of reconfigurability,
but they have very high area and power overheads to support their interface and
routing resources.
Neuromorphic and neuromemristive systems have attracted considerable research
attention through various projects, such as the DARPA SYNAPSE [2] program, DARPA's
Cortical Processor Program [3], Physical Intelligence [4], UPSIDE [5], the Human Brain
Project [6], and the Blue Brain Project [7]. Vision-related applications have been the targets
of many of these projects and other programs. Visual information processing has a large
application spectrum, including surveillance, medical imaging, content filtering/searching,
object classification, and many others. In order to explore these applications within NMSs,
one must first consider the breadth of the NMS design space, which spans circuits, ar-
chitectures, and system-level components. Although NMSs have the potential to replace
von Neumann architectures in niche application domains, there are several gaps in existing
work, particularly at the circuit level, that have impeded their transition from research to
practice. This dissertation fills those gaps, exploring the design and modeling of primitive
circuits and learning algorithms to improve the efficiency and variation tolerance of visual
information processing in NMSs. The specific novel contributions of this work are:
• New current-mode synapse, neuron, and plasticity circuits for NMSs that have
reduced area and power consumption compared with voltage-mode designs.
• Novel stochastic training algorithms for NMSs with circuit implementations that
have reduced area overhead and complexity.
• Integration of the circuits and training algorithms designed in this work into NMSs for
visual feature detection, clustering, and classification.
• Detailed models and analyses of the effects of device-level variations on system-level
accuracy, enabling the development of better training methodologies.
The rest of this dissertation is outlined as follows: Chapter 2 provides background and
related work on the human visual system, memristor devices, NMSs, and stochastic com-
putation. Chapter 3 presents the metal-oxide-semiconductor field-effect transistor (MOS-
FET) and memristor models used in this work, the strategy for circuit and system-level
simulations, and the current-mode design methodology adopted in this research. Chapter
4 presents the synapse and neuron circuits designed in this work, along with models of
their area, power consumption, and variation analyses. Novel NMS training algorithms
and circuit-level implementations are presented in Chapter 5. Chapter 6 demonstrates the
utility of the circuits and training algorithms designed in this work for visual information
processing tasks. The effects of device-level variations on system-level performance are
evaluated in Chapter 7. Chapter 8 concludes this dissertation.
Chapter 2
Background and Related Work
2.1 Overview of the Human Visual System
The human visual system and visual perception are remarkable products of evolution. We
can easily classify objects, identify emotions from facial cues, approximate distances, un-
derstand scenes, and perform several other complex visual processing tasks. A simplified
depiction of the human visual pathways is shown in Figure 2.1. Light enters the eyes,
stimulating photoreceptors (rods and cones) on the retina. The response of the stimulus
propagates to ganglion cells whose axons leave the eye via the optic nerve. The eyes ac-
tually act as a pre-processor for visual information by transforming the single-ended input
signal to a differential signal, making it noise-tolerant and invariant to overall light level.
After information leaves the eyes, it follows the optic nerves and eventually the optic tract
to the lateral geniculate nucleus (LGN) of the thalamus. From there, information radiates
through several pathways in the temporal and parietal lobes collectively called Meyer’s
loop, eventually reaching the V1 and other visual cortices in the occipital lobe. It is here
that several low-level features are extracted, such as edges, corners, etc. The pathway from
the retina to the V1 cortex is known as the retino-geniculo-striate pathway. From the occip-
ital lobe, visual information takes two distinct pathways to the parietal (dorsal stream) and
temporal (ventral stream) lobes. Processing in these locations allows us to locate objects
in space/time, and identify/classify objects. This work will focus on the ventral stream
pathway, where neurons in the inferotemporal (IT) cortex respond to classes of objects.
Higher-level cortical areas such as the prefrontal cortex (PFC) perform clustering opera-
tions, where we can associate images with similar meaning.
[Graphic labels: raw sensory input, sensory pre-processing, LGN, Meyer's loop, V1 (feature extraction), dorsal stream ("Where?"), ventral stream, IT (classification), PFC (clustering).]
Figure 2.1: Lateral view of the human brain’s visual pathways. Black filled rectangles show
the visual information processing tasks explored in this work.
2.2 Memristors for Plasticity in Neuromemristive Systems
Our abilities to learn a new face, drive a car, identify objects, and perform other visual tasks
are the results of brain plasticity. Plasticity is the characteristic of a system that allows it
to undergo permanent changes in response to an external force. Biological systems exhibit
remarkable levels of plasticity, enabling organisms to adapt to a changing environment,
maintain a homeostatic state, and recover from injury. The same characteristics are of in-
terest for future computing systems as they facilitate reconfigurability and noise tolerance,
[Plot: current (µA, −100 to 100) versus voltage (V, −1 to 1), with the low conductance state and high conductance state branches labeled.]
Figure 2.2: Pinched hysteresis current-voltage relationship that characterizes memristive
devices.
reliability, and self-healing/resilience. The mechanisms that enable plasticity in a biolog-
ical context occur at multiple scales, from the level of individual cells up to functional
brain regions. These include neurogenesis, epigenetic mechanisms, long-term potentiation
and depression in chemical synapses, and changes in topological mappings between brain
regions and brain functions (e.g. retinotopic maps). At an abstract level, each of these
plasticity mechanisms requires some form of memory. In particular, there is a certain level
of persistence in e.g. the locations of specific neurons, the efficacy of synaptic transmission
in a particular synapse, and the topology of brain regions. Hence, any brain-inspired com-
puting system should ideally employ some form of non-volatile memory (NVM) to achieve
plastic behavior.
Flash has been the dominant non-volatile memory technology used in computing sys-
tems for many years because of its high density and low cost. However, due to many
scaling-related challenges, flash is expected to be superseded by a novel memory technol-
ogy within the next decade. Table 2.1 shows a comparison of NAND flash and prototyp-
ical/emerging non-volatile memories across energy, performance, and reliability metrics.
Biologically-motivated targets for each metric are listed in the right column. In particular,
phase change memory (PCM), spin transfer torque random access memory (STT-RAM),
and resistive random access memory (RRAM) are among the most promising candidates
for future NVM implementations [8]. Each of these technologies may also be described as
a memristor or memristive device. A memristor is a two-terminal passive circuit element
that follows a state-dependent Ohm’s Law, characterized by a pinched hysteresis current-
voltage relationship as shown in Figure 2.2 [9–11]:
$$i_m(t) = G_m(\gamma)\, v_m(t) \qquad (2.1)$$
$$\frac{d\gamma}{dt} = \chi(\gamma, v_m(t)) \qquad (2.2)$$
where $i_m$ is the current through the memristor, $v_m$ is the voltage across the memristor,
$\gamma \in [0, 1]$ is a state variable, $G_m(\gamma)$ is the state-dependent conductance, and $\chi$ governs how
$\gamma$ changes over time. The conductance ranges from $G_{m,\mathrm{off}} \equiv G_m(\gamma = 0)$ to
$G_{m,\mathrm{on}} \equiv G_m(\gamma = 1)$. By applying short voltage pulses to these devices, one can incrementally
modify their conductance states, enabling the storage of multi-level memory values.
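To make (2.1) and (2.2) concrete, the sketch below integrates a memristor state equation under rectangular voltage pulses. The linear $G_m(\gamma)$, the threshold-type $\chi$, and every numeric parameter (`G_OFF`, `G_ON`, `K`, `V_TH`) are hypothetical choices for illustration. They are not the experimentally fitted models used later in this work. The example only shows how short pulses above threshold move $\gamma$ in small increments, while a small read voltage leaves the state unchanged.

```python
# Hypothetical device parameters (illustrative only, not fitted to any real device).
G_OFF = 1e-6   # G_m(gamma = 0), siemens
G_ON = 1e-4    # G_m(gamma = 1), siemens
K = 50.0       # hypothetical state-change rate, 1/(V*s)
V_TH = 0.5     # hypothetical switching threshold, volts


def conductance(gamma):
    """State-dependent conductance G_m(gamma); linear interpolation is an assumption."""
    return G_OFF + gamma * (G_ON - G_OFF)


def chi(gamma, v):
    """Hypothetical state equation d(gamma)/dt = chi(gamma, v).

    The state only changes when |v| exceeds V_TH. This toy form ignores gamma,
    although real devices usually show state-dependent switching rates.
    """
    if v > V_TH:
        return K * (v - V_TH)
    if v < -V_TH:
        return K * (v + V_TH)
    return 0.0


def current(gamma, v):
    """Equation (2.1): i_m = G_m(gamma) * v_m."""
    return conductance(gamma) * v


def apply_pulse(gamma, v, width, steps=100):
    """Integrate (2.2) with forward Euler over one rectangular pulse, keeping gamma in [0, 1]."""
    dt = width / steps
    for _ in range(steps):
        gamma = min(1.0, max(0.0, gamma + chi(gamma, v) * dt))
    return gamma


if __name__ == "__main__":
    gamma = 0.0
    for k in range(5):
        gamma = apply_pulse(gamma, v=1.0, width=1e-3)   # 1 V, 1 ms programming pulse
        print(k, round(gamma, 4), current(gamma, 0.1))  # read at 0.1 V (below threshold)
```

With these placeholder values, each 1 V, 1 ms pulse raises $\gamma$ by 0.025. This is the kind of incremental, multi-level programming described above.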
The most important metrics for an NVM technology within an NMS are dynamic range,
number of memory states, retention, energy efficiency, and endurance. A memristor's dynamic
range can be measured as the ratio of its on and off conductances ($G_{m,\mathrm{on}}/G_{m,\mathrm{off}}$).
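Table 2.1: Comparison of non-volatile memory technologies for brain-inspired computing
[8, 12–14]. PCM, STT-RAM, and RRAM are grouped as memristors.

| Metric | Flash | PCM | STT-RAM | RRAM | Targets |
|---|---|---|---|---|---|
| Dynamic range (on/off ratio) | - | >1000 | 2 | 1000 | >4 |
| Number of states | 8-16 | 100 | 4 | 100 | 20-100 |
| Retention | Several years at room temp. | Several years at room temp. | Several years at room temp. | Several years at room temp. | Years |
| Energy (pJ/bit) | >100 | 2-25 | 0.1-2.5 | 0.1-3 | 0.01 |
| Endurance (cycles) | 10^4 | 10^9 | 10^15 | 10^12 | 10^9 |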
A large dynamic range allows sense circuitry to easily distinguish an NVM cell’s different
memory states. The number of states that the NVM cell can achieve has a direct impact
on the area and energy efficiencies, as well as the functionality of an NMS. The number of
memory states in a memristive device is equal to the number of distinguishable $G_m(\gamma)$
values. For two conductance states to be distinguishable, the current levels they produce
(when placed in a circuit) must differ by more than the noise level of the circuit (e.g.
thermal and shot noise). It may take several bistable NVM cells (able to achieve only two
memory states) in an NMS to attain the same level of functionality as an NMS with a single
NVM cell that has many memory states. Retention
is another critical characteristic for an NVM technology. Within an NMS, a large retention
allows the system to accumulate and integrate information over long periods of time. Low
power is a primary NMS design goal, making energy-per-bit a critical metric in evaluating
NVM technologies within these systems. Finally, in order for an NMS to learn and adapt,
its underlying memory must be able to endure a large number of write events. Based on
these metrics, RRAM is the most suitable NVM for NMS implementation. Although it
has a good dynamic range, number of states, and retention, PCM requires high energy and
voltages for writing. In addition, its endurance is borderline. In contrast, STT-RAM has
very high endurance, but its dynamic range and number of resistance states are too small
for NMS implementation. In addition, RRAM has excellent compatibility with CMOS and
is highly scalable, and competition with other emerging NVM technologies will continue to
drive research that benefits RRAM-based NMSs.
RRAM cells, which will be referred to as memristors for the rest of this document,
have a metal-insulator-metal (MIM) structure, where two conducting electrodes sandwich
a thin-film switching layer. Various MIM memristor stacks have been explored, and there
are several ways to categorize them based on their material properties (e.g. crystalline
structure, band gap, etc.), proposed switching mechanism (e.g. anion, cation, ferroelectric,
etc.), or observed switching characteristics (e.g. bipolar or unipolar switching). Compre-
hensive reviews are provided in [12, 15–18]. The proposed switching mechanism for most
of the fabricated devices is based on redox reactions and migration of defects such as in-
terstitial ions or vacancies. Several different models have been proposed to capture the
physical phenomena underlying memristive behavior [18–21]. However, many of them are
computationally expensive and are not amenable to large-scale circuit simulations. Simpler
empirical models such as the piecewise linear (PWL) model proposed in [22] are parametrized
by experimental memristor data and have lower computational complexity. Finally, several groups have
proposed simulation program with integrated circuit emphasis (SPICE) or Verilog-AMS
models for circuit-level simulations [23–30].
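[Graphic: an NMS processes an image of a cat; lower levels extract features (ears, fur, face) and higher levels represent concepts (cat), producing the prediction ŷ. The NMS draws on neuroscience & psychology, machine learning, and IC and architecture design.]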
Figure 2.3: High-level depiction of an NMS. The NMS design process requires interdisci-
plinary collaboration between experts in neuroscience/neuropsychology, machine learning,
and integrated circuit/architecture design. In this particular example, an NMS is designed
for image classification. A neural network architecture is used to extract features at lower
levels and make predictions at higher levels.
2.3 Neuromemristive Systems
NMSs leverage memristors' small footprint, simple structure, potential for high density
(possibly on the order of $10^{14}$ bits/cm$^2$ [31]), and capacity for incremental multi-level
memory to achieve plastic behavior. Figure 2.3 presents a high-level depiction of an
NMS. Here, it is emphasized that NMS designs are inspired by principles from the
neuroscience/neuropsychology, machine learning, and IC design domains. An NMS has three
levels of design abstraction: primitive (circuit), architectural, and system level. The NMS
design space is shown in Figure 2.5. At the circuit, or primitive, level, choices related to
memristive devices, signaling type (e.g. analog or digital), mode, and interface between
different technologies are important. Synaptic weighting circuits provide weighted connections
between neurons: $s_{i,j} = x_j w_{i,j}$, where $s_{i,j}$ is the synapse's output, $w_{i,j}$ is the weight of the
connection, and $x_j$ is the input to the synapse. Neuron circuits sum their inputs $s_{i,j}$ and
apply an activation function $f$ to produce an output $x_i = f\left(\sum_j s_{i,j}\right)$. Finally, synaptic
adaptation circuits are used to modify the synaptic weights by changing the
conductance states of their memristors. At the architecture level, a topology and a training
algorithm must be specified. Topologies often take the form of neural network structures
such as multilayer perceptrons (MLPs), and training algorithms range from simple unsuper-
vised learning rules to complex algorithms such as backpropagation. At the system level,
multiple architectures can be combined and applied to a specific problem. In Figure 2.3, the
NMS is analyzing a picture to determine the most-likely classification given the pictures
that it has seen before. More generally, an NMS maps an input vector $\mathbf{u}$ to an output vector
$\hat{\mathbf{y}}$ through a hypothesis function $h$. Other design choices related to data pre-processing,
classification/regression algorithms, and partitioning of functions across different parts of
the system must also be made.
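The synapse and neuron equations above can be sketched in a few lines. Here the weights are plain numbers rather than memristor conductances, and `math.tanh` is only an illustrative activation function $f$. The function `layer` is a hypothetical single-layer hypothesis function $h$ that maps $\mathbf{u}$ to $\hat{\mathbf{y}}$. The circuits in this work realize these operations in hardware, not in software.

```python
import math


def synapse(x_j, w_ij):
    """Synaptic weighting: s_ij = x_j * w_ij."""
    return x_j * w_ij


def neuron(inputs, weights, f=math.tanh):
    """Neuron i: x_i = f(sum_j s_ij)."""
    return f(sum(synapse(x, w) for x, w in zip(inputs, weights)))


def layer(u, W, f=math.tanh):
    """Single-layer hypothesis h(u): one neuron per row of the weight matrix W."""
    return [neuron(u, row, f) for row in W]


if __name__ == "__main__":
    u = [0.2, -0.5, 1.0]
    W = [[0.5, 0.1, -0.3],
         [-0.2, 0.4, 0.8]]
    print(layer(u, W))
```

Passing a different `f` changes only the neuronal activation, not the synaptic weighting. This mirrors the separation between synapse and neuron circuits at the primitive level.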
[Graphic panels: Biological Synapse (labels: pre-synaptic neuron, post-synaptic neuron, neurotransmitters (e.g. glutamate), ionotropic receptor, metabotropic receptor, second messenger system, Na+, Ca2+) and Memristor as a Synapse (labels: electrons, defects).]
Figure 2.4: Comparison of a biological synapse and a memristor as a synapse emulator.
There is an abstract behavioral similarity between biological synapses and memristors
which has sparked wide interest in the use of these devices as hardware synapses. A mem-
ristive synapse in an NMS must provide three functions: point-to-point communication,
[Graphic: three levels of NMS design. System-level design (feature detection, clustering, classification, etc.): pre-processing; datasets; functional partitioning. Architecture-level design (network topology, training algorithm): feedforward, reservoir, etc.; training algorithms; on- or off-chip training, online or batch-mode training. Circuit-level design (synaptic weighting, neuronal activation, synaptic plasticity): device threshold, polarity, number of states; voltage mode, current mode, spiking; analog, digital, stochastic; CMOS/memristor interface (e.g. crossbar, 1T1R).]
Figure 2.5: NMS design space for visual information processing.
linear computation¹, and learning. In theory, a single memristor is sufficient to provide all
three synaptic functions [32]. This idea is illustrated in Figure 2.4. In the case of a
biological synapse, information is communicated between a pre-synaptic and post-synaptic
neuron through the diffusion of neurotransmitter molecules. For example, the neurotransmitter
glutamate opens ligand-gated ion channels at the post-synaptic neuron's dendrite,
allowing charge-carrying ions to diffuse in. The strength, or weight, of this communication
pathway is dependent on the number of ion channels present and the efficacy of each
channel in facilitating ion diffusion. Furthermore, the synaptic weight can be changed by
adding or removing ion channels or changing their transmission efficacy. In comparison,
memristive devices can communicate information between their electrodes via electrons.
The weight of transmission depends on the state of the memristive device. The weight can
be changed by modifying the memristor’s defect state.
Several groups have leveraged the similarity illustrated in Figure 2.4 in networks of
spiking neurons that implement spike-timing-dependent plasticity (STDP)-based Hebbian/anti-Hebbian
learning. Networks of analog spiking neurons with single-memristor synapses are
¹ The synaptic computation can be nonlinear as well, but linear computation is better suited
to implementing most neural network architectures.
presented in [33, 34], and a digital implementation is proposed in [35]. However, these
implementations require neurons to output three different voltage levels, complicating the
hardware neuron design. Furthermore, the digital implementation requires a complex spik-
ing sequence controlled by a finite state machine. Single memristors have also been used
for binary synapse (on and off states only) realization in cellular neural networks [36].
However, additional circuitry is generally needed in a memristor-based synapse design
(switches, current mirrors, etc.) depending on which type(s) of learning algorithms (e.g. su-
pervised or unsupervised learning, synchronous or asynchronous learning, etc.) and neuron
designs (e.g. neuron activation function, analog/digital implementation, etc.) are present in
the network. In [37], a series combination of an ambipolar thin-film transistor (TFT) and
a memristor is used for synaptic transmission and weight storage in a spiking neural net-
work. The gate of the TFT is controlled by the pre-synaptic neuron, which enables or disables
a constant voltage across the memristor, creating a memristance-modulated current
at the input of the post-synaptic neuron. The authors demonstrate learning in a two-neuron
network with an average-spike frequency-based learning rule. Kim et al. [38, 39] propose
a memristor synapse based on a bridge circuit and a differential amplifier. It can be pro-
grammed to implement both positive (excitatory) and negative (inhibitory) weights. It also
has good noise performance due to its fully differential architecture. However, it requires
three MOSFETs, five memristors, and additional training circuitry (depending on which learning
algorithm is being implemented). Another synapse design, presented in [40], incorporates
two memristors which can be trained to provide a desired ratio of excitation to inhibi-
tion. The synapse also allows bidirectional communication. However, the authors do not
include a detailed description of the training circuitry, and the design also consumes a con-
stant static power, which could cause high power dissipation in large networks. Another
memristor-based synapse design is proposed in [41]. The design operates in subthresh-
old, resulting in low power consumption. However, the charge sharing technique that the
authors employ requires separate pre-charge and evaluate phases of operation, similar to
dynamic logic.
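The sketch below shows the common pair-based form of STDP. In this form, the weight change depends on the spike-time difference $\Delta t = t_{post} - t_{pre}$ through exponential windows. The amplitudes `A_PLUS` and `A_MINUS` and the time constants `TAU_PLUS` and `TAU_MINUS` are hypothetical. This is a generic textbook form, not the specific rule implemented in [33–35].

```python
import math

# Hypothetical STDP parameters (illustrative only).
A_PLUS, A_MINUS = 0.010, 0.012        # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20e-3, 20e-3    # exponential window time constants, s


def stdp_dw(t_pre, t_post):
    """Pair-based STDP weight change for one pre/post spike pair."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: Hebbian potentiation
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:   # post fires before pre: depression
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0


def update_weight(w, t_pre, t_post, w_min=0.0, w_max=1.0):
    """Apply one STDP update and clip to the range a memristor can represent."""
    return min(w_max, max(w_min, w + stdp_dw(t_pre, t_post)))
```

Closely spaced spikes produce larger changes than widely spaced ones. The clipping in `update_weight` stands in for the bounded conductance range of a memristive synapse.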
At the architecture and system levels, NMSs have been designed for associative mem-
ory [42], brain-state-in-a-box recall [43], temporal pattern recognition in a reservoir net-
work [44], implication logic [36], and RRAM architectures [17, 45–50]. This work focuses
on vision because of its numerous application domains and its well-established models
within the brain. Other groups have designed NMSs for vision-related applications. In [35],
a neuromemristive winner-take-all type network is designed to detect the position of an ob-
ject. STDP is used for unsupervised training. The authors of [51] propose the integration of
a memristor bridge synapse into a multilayer perceptron network for classifying automo-
biles. An NMS for optical character recognition (OCR) is designed in [52] using a simple
feedforward network and STDP training. In [53], the authors propose an NMS that uses
stochastic conductive bridge random access memory (CBRAM) devices for visual pattern
extraction.
2.4 Stochastic Computation
One of the contributions of this work is the design of an NMS training algorithm based on
stochastic logic. The roots of stochastic computing can be traced back to the work of John
von Neumann. In his 1956 paper [54], von Neumann proposes using a bundle of N wires
to send a message. If (1 − Δ)N or more wires carry a 1, then the message is interpreted as
a 1, where 0 ≤ Δ < 1/2. If ΔN or fewer wires carry a 1, then the message is interpreted
as a 0. The idea of distributing a message across multiple wires or multiple time windows
on a single wire was attractive to the machine learning community. Indeed, stochastic
computing grew out of the need to reduce hardware complexity, power, and unreliability in
machine learning applications [55, 56]. Stochastic computing achieves these goals simply
by changing the way data is represented in a computer.
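Von Neumann's bundle rule can be written as a small decoder. The function name is illustrative. Returning `None` when the number of 1s falls strictly between ΔN and (1 − Δ)N is also an illustrative assumption, since the two rules above do not cover that case.

```python
def decode_bundle(bits, delta):
    """Interpret an N-wire bundle with von Neumann's threshold rule.

    Returns 1, 0, or None when neither threshold is met.
    """
    if not 0 <= delta < 0.5:
        raise ValueError("delta must satisfy 0 <= delta < 1/2")
    n = len(bits)
    ones = sum(bits)
    if ones >= (1 - delta) * n:
        return 1
    if ones <= delta * n:
        return 0
    return None


if __name__ == "__main__":
    print(decode_bundle([1] * 9 + [0], 0.2))  # one faulty wire still decodes as 1
```

A single faulty wire does not change the decoded value. This is the redundancy that later motivated stochastic computing.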
2.4.1 Stochastic Representation of a Digital Value
At the heart of stochastic computing is the stochastic representation of digital values. In a
unipolar [56] stochastic representation, a Ψ-bit number $C \in \{0, 1, \ldots, 2^{\Psi} - 1\}$ is mapped
to an N-bit stream (serial) or bundle (parallel) $C_s = C_{s,N-1}, C_{s,N-2}, \ldots, C_{s,0}$. Then, $C_s$ is
interpreted as the probability that any $C_{s,i}$ will be a logic 1. For example, the stream or
bundle 0,1,0,1,0,0 represents the probability 2/6. The stream or bundle is characterized by
a Bernoulli process $X = X_{N-1}, X_{N-2}, \ldots, X_0$, where [57, 58]
$$x \equiv \Pr(X = 1) \equiv \Pr(X_i = 1) = 1 - \Pr(X_i = 0) = \frac{C}{2^{\Psi} - 1} \qquad (2.3)$$
Here, $X_i = 0$ if $C_{s,i} = 0$, or 1 otherwise. It is also possible to have a bipolar stochastic
representation such that $X_i = -1$ if $C_{s,i} = 0$, or 1 otherwise. In this case, $x$ is defined as
[56]
$$x \equiv 2\Pr(X = 1) - 1 \qquad (2.4)$$
The bipolar representation is useful in the cases where negative numbers need to be repre-
sented.
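Equations (2.3) and (2.4) can be checked directly on a finite stream by estimating Pr(X = 1) as the fraction of 1s. The helper names are hypothetical. Exact fractions are used only so that the example stream 0,1,0,1,0,0 gives exactly 2/6.

```python
from fractions import Fraction


def unipolar_value(stream):
    """Estimate x = Pr(X = 1) as the fraction of 1s in the stream (unipolar, eq. 2.3)."""
    return Fraction(sum(stream), len(stream))


def bipolar_value(stream):
    """Estimate x = 2 Pr(X = 1) - 1 (bipolar, eq. 2.4); the result lies in [-1, 1]."""
    return 2 * unipolar_value(stream) - 1


def target_probability(c, psi):
    """Right-hand side of (2.3): the probability that encodes the psi-bit number c."""
    return Fraction(c, 2 ** psi - 1)


if __name__ == "__main__":
    s = [0, 1, 0, 1, 0, 0]
    print(unipolar_value(s), bipolar_value(s))  # 1/3 and -1/3
```

The same stream therefore encodes 1/3 under the unipolar mapping and −1/3 under the bipolar mapping. Only the interpretation differs, not the hardware.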
Converting from the digital to the stochastic representation can be achieved using a
random number generator and a comparator. If the random number is less than or equal
to the value held in a register, then a logic 1 value is produced on the output. Otherwise,
the comparator output is logic 0 [57]. As N becomes large, Pr(X = 1) approaches the
value in (2.3). Linear feedback shift registers (LFSRs) are commonly used to generate
pseudorandom numbers. Note that the bits in $C_s$ could also be generated in parallel, but this
would require N independent random number generators and N comparators. Converting
from a stochastic representation back to a digital number can be achieved by counting the
number of 1s in the stochastic bit stream. In this work, analog values, rather than digital
values, are converted to stochastic bit streams, requiring different circuitry.
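The sketch below models the digital-to-stochastic converter described above, an LFSR feeding a comparator, together with the counter used for the reverse conversion. The 8-bit Galois LFSR with tap mask `0xB8` (a maximal-length configuration) is an illustrative choice. It is not the circuit used in this work, which converts analog values. A maximal-length Ψ-bit LFSR visits every value from 1 to 2^Ψ − 1 exactly once per period. Comparing with R ≤ C over one full period of N = 2^Ψ − 1 cycles therefore produces exactly C ones, matching (2.3).

```python
def lfsr8(seed=1):
    """8-bit maximal-length Galois LFSR (tap mask 0xB8).

    Yields pseudorandom values in 1..255; each value appears once per period of 255.
    """
    state = seed & 0xFF
    if state == 0:
        raise ValueError("the LFSR seed must be non-zero")
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB8


def to_stochastic(c, n, seed=1):
    """Comparator: output 1 whenever the pseudorandom number R satisfies R <= C."""
    rng = lfsr8(seed)
    return [1 if next(rng) <= c else 0 for _ in range(n)]


def to_digital(stream):
    """Counter: recover the value by counting the 1s in the stream."""
    return sum(stream)


if __name__ == "__main__":
    bits = to_stochastic(100, 255)
    print(to_digital(bits), to_digital(bits) / 255)
```

Streams shorter than one LFSR period only approximate C/(2^Ψ − 1). This is one face of the accuracy versus latency trade-off in stochastic computing.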
One advantage of the stochastic representation is its inherent fault tolerance. Consider
the binary representation of a (Ψ + 1)-bit number $C = C_{\Psi} C_{\Psi-1} \ldots C_0$. In the case
of a single soft error, one bit of C is flipped, producing a new number C′. The maximum
error occurs when $C_{\Psi}$ is flipped. In that case, the error is $|C - C'| = 2^{\Psi}$. However, when
the same number is represented stochastically, a single bit flip changes the count of 1s by only one,
so the maximum error is $|C_s - C'_s| = 1$ (i.e. 1/N of the represented probability).
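The error comparison above can be reproduced numerically. The sketch flips each bit of an 8-bit binary word (Ψ = 7) and each bit of a stochastic stream, then reports the worst-case error. The word value and the stream are arbitrary examples.

```python
def binary_flip_error(c, bit):
    """Absolute error after flipping one bit of the binary word c."""
    return abs(c - (c ^ (1 << bit)))


def stochastic_flip_error(stream, index):
    """Change in the count of 1s after flipping one bit of a stochastic stream."""
    flipped = list(stream)
    flipped[index] ^= 1
    return abs(sum(stream) - sum(flipped))


if __name__ == "__main__":
    c = 0b01011010                       # an arbitrary 8-bit value (psi = 7)
    stream = [1, 0, 1, 1, 0, 0, 1, 0]    # an arbitrary stochastic stream
    print(max(binary_flip_error(c, b) for b in range(8)))                      # 2**7 = 128
    print(max(stochastic_flip_error(stream, i) for i in range(len(stream))))  # 1
```

In binary, the error depends on which bit is hit. In the stochastic stream, every bit carries equal weight.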
2.4.2 Stochastic Arithmetic Operations
Another key advantage of the stochastic data representation is that arithmetic operations such
as multiplication become trivial to implement in hardware. For example, consider a 2-input
AND gate with inputs mapped to stochastic bit streams X1 and X2. Let the probabilities
(probability that a 1 will occur) of the bit streams be x1 = 3/6 and x2 = 2/6. The
output probability is y ≡ Pr(Y = 1) ≡ Pr(X1 = 1 and X2 = 1). If X1 and X2 are
statistically independent, then y = x1x2 = 1/6 [59]. Therefore, a single AND gate can be
used to multiply two numbers in the stochastic domain. In general, any logic function
$g = f(I_1, I_2, \ldots, I_n)$ with inputs mapped to independent stochastic bit streams $X_1, X_2, \ldots, X_n$
will have an output probability of [59]
$$y \equiv \Pr(Y = 1) \equiv \sum_{I_1, \ldots, I_n \,:\, f(I_1, \ldots, I_n) = 1} \left( \prod_{k=1}^{n} \Pr(X_k = I_k) \right), \qquad (2.5)$$
which is a multivariate polynomial in x1, x2, . . . , xn with integer coefficients and powers
no greater than 1. Integration, division, square root, and squaring operations have also been
demonstrated using stochastic logic [56, 57]. In [56], a state machine-based stochastic logic
architecture is used to implement more complex functions such as the hyperbolic tangent.
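Equation (2.5) can be evaluated exactly by enumerating the input patterns for which the gate outputs 1. The AND-gate multiplier can also be checked with a Monte Carlo simulation of two independent Bernoulli streams. In this sketch, Python's `random` module stands in for hardware random number generators. The stream length `n` and the seed are arbitrary.

```python
import itertools
import random
from fractions import Fraction


def output_probability(f, probs):
    """Equation (2.5): sum, over input patterns I with f(I) = 1, of prod_k Pr(X_k = I_k)."""
    total = 0
    for pattern in itertools.product((0, 1), repeat=len(probs)):
        if f(*pattern):
            term = 1
            for bit, x in zip(pattern, probs):
                term *= x if bit else 1 - x   # Pr(X_k = 1) = x_k, Pr(X_k = 0) = 1 - x_k
            total += term
    return total


def and_multiply(x1, x2, n=100_000, seed=0):
    """Monte Carlo estimate of y for a 2-input AND gate fed by independent streams."""
    rng = random.Random(seed)
    s1 = [rng.random() < x1 for _ in range(n)]
    s2 = [rng.random() < x2 for _ in range(n)]
    return sum(a and b for a, b in zip(s1, s2)) / n


if __name__ == "__main__":
    x1, x2 = Fraction(3, 6), Fraction(2, 6)
    print(output_probability(lambda a, b: a and b, [x1, x2]))  # 1/6
    print(output_probability(lambda a, b: a or b, [x1, x2]))   # 2/3
    print(and_multiply(0.5, 1 / 3))                            # close to 0.1667
```

The OR gate gives $x_1 + x_2 - x_1 x_2$, a polynomial with powers no greater than 1, as stated above. The Monte Carlo estimate only approaches $x_1 x_2$ when the two streams are statistically independent.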
Several applications and examples of stochastic computing have been reported in the
literature. In [60], stochastic logic with parallel bit streams (bundles) is used to synthesize
logic on self-assembled nanowire crossbar arrays. A neurochip based on stochastic logic is
designed and fabricated in [61]. Overall, stochastic computing has several advantages over
deterministic computation. These include reduced hardware complexity, fault tolerance,
single-wire communication of signals, easy implementation of pipelining, and the ability
to trade off performance and accuracy [56].
2.5 Summary
This chapter reviewed background and existing work related to NMSs. The key topics/ideas
covered in this chapter were:
• The human visual system is composed of feedforward pathways with areas dedi-
cated to feature detection, clustering, and classification. Well-studied pathways in
the brain, such as the retino-geniculo-striate and ventral stream pathways, can be
modeled using neural networks.
• The brain’s visual system is plastic, allowing us to make predictions about what we
see based on our past experience. This plasticity occurs at many levels of abstraction,
such as in the adaptive strength of synapses.
• Memristors have a behavioral similarity to biological synapses and have been used
in previous NMS designs to facilitate learning.
• Stochastic representation of data can reduce the hardware complexity of an NMS.
• Key aspects that are missing or have received limited attention from previous work
include exploration of current-mode circuit designs, circuit design based on realistic
(experimentally-driven) memristor models, exploration of non-monotonic neuronal
activation functions, reduced complexity stochastic training algorithms, exploration
of hardware-friendly topologies (e.g. single-layer networks), and the effect of device
variations on system-level performance.
Chapter 3
Device Models, Simulation
Strategy, and Design Methodology
This chapter discusses the MOSFET and memristor device models, as well as the simu-
lation strategy and circuit design methodology used in this work. A 45 nm low power
predictive technology MOSFET model (PTM) was used for all of the circuit designs pre-
sented in this research. The PTM model and a simplified behavioral MOSFET model are
discussed in Section 3.1. Section 3.2 discusses the memristor models used in this work.
Section 3.3 presents the simulation/verification strategy used in this research. Section 3.4
concludes this chapter with an overview of the simulation/verification strategy that was
adopted for circuit- and system-level simulations.
3.1 45 nm Low Power PTM MOSFET Characterization
The MOSFET model chosen for this work is a 45 nm high threshold (low power) PTM
model. The detailed device behavior is captured using Berkeley’s BSIM4.0 (SPICE model
level 54). The BSIM4.0 model accounts for short channel effects, narrow width effects,
non-uniform doping, mobility degradation, velocity saturation, and many other non-ideal
behaviors. While this is good for verification purposes, the model becomes too complex
for system-level design and analysis. Therefore, a simplified model is developed to capture
the essential device behavior in a more tractable set of equations.
3.1.1 Current-Voltage Characteristics
Almost all of the analog circuits designed in this work operate in subthreshold, where
the gate-source voltage vgs is well below the MOSFET threshold voltage Vth0. The long
channel model of the subthreshold drain current can be written as
$$i'_{ds} = i_{ds0} \exp\left( \frac{\zeta_1 v_{gs} - V_{th0}}{n V_T} \right), \qquad (3.1)$$
where
$$i_{ds0} = \zeta_2 \beta (n - 1) V_T^2, \qquad (3.2)$$
and β ≡ µ0CoxW/L is the current factor, µ0 is the low field carrier mobility, Cox is the
oxide capacitance, W is the channel width, L is the channel length, n is the subthreshold
slope factor, VT is the thermal voltage, vgs is the gate-source voltage, ζ1 and ζ2 are fitting
parameters, and Vth0 is the threshold voltage. It is assumed that the drain-source voltage
vds ≥ 4VT ≈ 100 mV at room temperature (i.e. the device is saturated). Note that this model is
most accurate when VL ≤ vgs ≤ VM , where VL and VM define the depletion/weak inversion
and weak inversion/moderate inversion borders, respectively. This work uses VL = 0.2 V
and VM = Vth0 − 0.1 V. Although these values can be found rigorously from the flat-band
and Fermi voltages, the chosen values of VL and VM are sufficient for the model being developed.
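Equations (3.1) and (3.2) can be evaluated with the sketch below. The physical constants are exact SI values. The device parameters in the example (`beta`, `n`, `vth0`) are hypothetical placeholders rather than the fitted 45 nm PTM values. The model should only be trusted for VL ≤ vgs ≤ VM with vds ≥ 4VT (about 103 mV at 300 K).

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k=300.0):
    """V_T = k T / q."""
    return K_B * temp_k / Q_E


def ids_long_channel(vgs, beta, n, vth0, zeta1=1.0, zeta2=1.0, temp_k=300.0):
    """Equations (3.1)-(3.2): subthreshold drain current i'_ds of a saturated MOSFET."""
    vt = thermal_voltage(temp_k)
    ids0 = zeta2 * beta * (n - 1) * vt ** 2                  # eq. (3.2)
    return ids0 * math.exp((zeta1 * vgs - vth0) / (n * vt))  # eq. (3.1)


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    beta, n, vth0 = 2e-4, 1.5, 0.6
    for vgs in (0.2, 0.3, 0.4):
        print(vgs, ids_long_channel(vgs, beta, n, vth0))
```

With n = 1.5 and ζ1 = 1, each 0.1 V increase in vgs multiplies the current by exp(0.1/(nVT)), roughly 13×. This is the exponential sensitivity that subthreshold current-mode circuits exploit.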
Now, (3.1) must be modified to match the results of SPICE simulations. First, the log (base
10) is taken of both sides, and then two additional fitting parameters ζ3, ζ4 are introduced:
$$\log(i_{ds}) = \zeta_1 \log(e) \frac{v_{gs}}{n V_T} + \log\left( \beta (n - 1) V_T^2 \right) - \frac{\log(e)\, V_{th0}}{n V_T} + f(\zeta_2, \zeta_3, \zeta_4). \qquad (3.3)$$
Now, the ζ parameters, as well as the form of f , are adjusted to match the slope and
intercept of the log of the simulated voltage transfer characteristics in weak inversion. The
idea is illustrated in Figure 3.1 for an NMOS device with W = L = 45 nm, vds = 1.1 V,
and Vsb = 0 V. Note that, unless stated otherwise, the source-body voltage Vsb = 0 V for
all devices in this work.
Parameter ζ1 can be found easily since the slope in the weak inversion region has very
little dependence on any of the free device parameters (W, L, vds, vgs)¹. A straight line is fit
to log(ids) in the domain VL ≤ vgs ≤ VM for several different sets of {W, L, vds}, resulting
in multiple values of ζ1 = S nVT/log(e), where S is the slope of the line. Then the final
value of ζ1 is taken as the median over all of the parameter sets. In particular, W and L
are varied from 45 nm to 4500 nm, while vds is varied from 0.1 V to 1.1 V. The results are
shown in Table 3.1. Notice that there is little variation except in the case where W = 45 nm,
L = 4500 nm, and vds = 1.1 V. It is important to point out that the circuits presented in this work will
never have such extreme aspect ratios, and they are only included for completeness.
The function f is a little more complicated because the intercept depends on L and vds:
$$f(\zeta_2, \zeta_3, \zeta_4) = \log(\zeta_2) + \log(e)\frac{\zeta_3 \exp(-\zeta_4 L)\, v_{ds}}{n V_T}. \qquad (3.4)$$
¹ There is some dependence, however. For example, gate leakage will be related to the area of the device.
Table 3.1: ζ1 parameters for different values of W, L, and vds.
W [nm]    L [nm]    |vds| [V]    ζ1 (NMOS)    ζ1 (PMOS)
45        45        0.1          0.97         0.97
45        45        1.1          0.92         0.92
45        4500      0.1          0.94         0.92
45        4500      1.1          0.82         0.72
4500      45        0.1          0.97         0.97
4500      45        1.1          0.92         0.92
4500      4500      0.1          0.97         0.98
4500      4500      1.1          0.97         0.97
Median                           0.96         0.95
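The ζ1 extraction procedure can be sketched in Python as follows. This is a minimal illustration, not the author's MATLAB script: the sweep is a hypothetical ideal curve generated from (3.1) rather than BSIM data, VT = 0.0259 V is assumed, and the helper names are hypothetical. The median step uses the NMOS column of Table 3.1.

```python
import math
import statistics

import numpy as np

# Assumed constants for the synthetic sweep (NMOS values from Table 3.2; V_T is an assumption).
N_SUB = 1.5                    # subthreshold slope factor n
V_T = 0.0259                   # thermal voltage near 300 K [V]
V_TH0 = 0.62                   # NMOS threshold voltage [V]
V_L, V_M = 0.2, V_TH0 - 0.1    # weak-inversion fit window [V]


def zeta1_from_sweep(vgs, log10_ids, n=N_SUB, vt=V_T):
    """Fit log10(ids) vs. vgs with a line; convert its slope S to zeta1 = S*n*VT/log10(e)."""
    slope, _intercept = np.polyfit(vgs, log10_ids, 1)
    return slope * n * vt / math.log10(math.e)


def synthetic_log10_ids(vgs, zeta1, ids0=1e-6):
    """Hypothetical ideal curve from (3.1) with an arbitrary ids0 (not SPICE data)."""
    return np.log10(ids0 * np.exp((zeta1 * vgs - V_TH0) / (N_SUB * V_T)))


vgs = np.linspace(V_L, V_M, 50)
zeta1_fit = zeta1_from_sweep(vgs, synthetic_log10_ids(vgs, zeta1=0.96))

# The final zeta1 is the median over all {W, L, vds} sets (NMOS column of Table 3.1).
nmos_zeta1 = [0.97, 0.92, 0.94, 0.82, 0.97, 0.92, 0.97, 0.97]
zeta1_nmos = statistics.median(nmos_zeta1)
```

With eight entries, `statistics.median` averages the two middle values (0.94 and 0.97), giving 0.955, which rounds to the 0.96 listed in the table.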
The parameters ζ2, ζ3, and ζ4 were found by simulating the devices at a fixed |vgs| = 0.4 V
(approximately midway between VL and VM) while sweeping L and vds. Then, surfaces of the
form (3.3) were fit to the data. The results are shown in Figure 3.2, and the corresponding
ζ parameters are listed in Table 3.2, along with all of the other parameters required for the
simplified model. Note that (3.3) can also be written as
$$i_{ds} = i'_{ds}\exp\left(\frac{\zeta_3 \exp(-\zeta_4 L)\, v_{ds}}{n V_T}\right) = i'_{ds}\exp\left(\Theta(L)\, v_{ds}\right) = i'_{ds}\,\Lambda(v_{ds}, L). \qquad (3.5)$$
In this form, it is easier to see the effect of vds and L on the I-V characteristics. For
example, when the channel length is small, there is an exponential dependence on vds, as is
the case in drain-induced barrier lowering. However, when the channel length is larger, the
vds dependence becomes weak, closer to the effect of channel length modulation.
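A minimal Python sketch of the simplified weak-inversion model, combining (3.1), (3.2), and (3.5) with the NMOS entries of Table 3.2. The function names are hypothetical, VT = 0.0259 V (room temperature) is an assumption, and ζ4 is assumed to multiply L expressed in metres. The model is only meant for VL ≤ vgs ≤ VM and vds ≥ 4VT.

```python
import math

# NMOS values from Table 3.2; V_T at ~300 K is an assumption not stated in the text.
COX = 0.019        # oxide capacitance [F/m^2]
MU0 = 0.049        # low-field mobility [m^2/(V s)]
VTH0 = 0.62        # threshold voltage at Vsb = 0 [V]
N = 1.5            # subthreshold slope factor
ZETA = (0.96, 6.742, 0.2756, 3.3956e7)   # zeta1..zeta4 (zeta4 assumed to be in 1/m)
V_T = 0.0259       # thermal voltage [V] (assumed)


def theta(L):
    """Theta(L) = zeta3*exp(-zeta4*L)/(n*VT), the vds coefficient in (3.5) [1/V]."""
    _, _, z3, z4 = ZETA
    return z3 * math.exp(-z4 * L) / (N * V_T)


def ids_weak_inversion(vgs, vds, W, L):
    """Drain current [A] from (3.1), (3.2), and (3.5); W and L in metres."""
    z1, z2, _, _ = ZETA
    beta = MU0 * COX * W / L                                   # current factor
    ids0 = z2 * beta * (N - 1) * V_T ** 2                      # (3.2)
    ids_prime = ids0 * math.exp((z1 * vgs - VTH0) / (N * V_T))  # (3.1)
    return ids_prime * math.exp(theta(L) * vds)                # (3.5): times Lambda(vds, L)


if __name__ == "__main__":
    for L in (45e-9, 4500e-9):
        ratio = ids_weak_inversion(0.4, 1.1, 45e-9, L) / ids_weak_inversion(0.4, 0.1, 45e-9, L)
        print(f"L = {L * 1e9:.0f} nm: ids(vds=1.1 V)/ids(vds=0.1 V) = {ratio:.3g}")
```

Running it shows a several-fold increase in ids between vds = 0.1 V and 1.1 V for L = 45 nm and essentially none for L = 4500 nm, mirroring the DIBL-like versus CLM-like behavior described above.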
[Figure 3.1 plot: log ids [log A] (−15 to 0) vs. vgs [V] (0 to 1), with VL, VM, and the weak inversion region marked; curves: BSIM and simplified model.]
Figure 3.1: Logarithm of the NMOS drain current versus vgs (W = L = 45 nm, vds = 1.1
V, Vsb = 0 V). The solid curve shows the result from SPICE simulation using the BSIM
model, while the dashed curve shows the proposed semi-empirical weak inversion model.
[Figure 3.2 plots: (a) NMOS log ids [log A] and (b) PMOS log |ids| [log A] (−11 to −7) vs. L [nm] (up to 4000) and |vds| [V] (0.2 to 1); BSIM data and fitted surface, R² = 1.]
Figure 3.2: Fitting the L and vds-dependent ζ parameters for (a) NMOS and (b) PMOS
devices.
3.1.2 Output Impedance
A critical characteristic of a MOSFET is its output impedance ro, which is defined as
$$r_o \equiv \left(\frac{\partial i_{ds}}{\partial v_{ds}}\right)^{-1}. \qquad (3.6)$$
Table 3.2: Low power PTM 45 nm MOSFET parameters.
Parameter         NMOS           PMOS           Description
Cox [F/m²]        0.019          0.019          Oxide capacitance
Vth0 [V]          0.62           −0.59          Threshold voltage at Vsb = 0
µ0 [m²/Vs]        0.049          0.021          Low field mobility
n                 1.5            1.5            n ≡ (dψsa/dVgb)^−1
ζ1                0.96           0.95           Fitting parameter
ζ2                6.742          3.838          Fitting parameter
ζ3                0.2756         0.3425         Fitting parameter
ζ4                3.3956×10^7    3.4489×10^7    Fitting parameter
AVth0 [mV·µm]     1.8            1.8            Process constant describing threshold voltage mismatch
From (3.5), ro can be written as
$$r_o = \frac{1}{i_{ds}\,\Theta(L)}. \qquad (3.7)$$
The output impedance therefore decreases with increasing drain current and increases with
channel length (since Θ(L) decays exponentially with L), as expected.
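The closed form (3.7) can be checked numerically against the definition (3.6) with a central difference. A Python sketch, assuming the NMOS ζ3, ζ4, and n values from Table 3.2, VT = 0.0259 V, and a fixed (hypothetical) value of i′ds; the function names are hypothetical.

```python
import math

# NMOS values from Table 3.2; V_T ~ 300 K is an assumption.
N, V_T = 1.5, 0.0259
ZETA3, ZETA4 = 0.2756, 3.3956e7   # zeta4 assumed to be in 1/m


def theta(L):
    return ZETA3 * math.exp(-ZETA4 * L) / (N * V_T)


def ids_model(vds, ids_prime, L):
    """(3.5) with the vgs-dependent factor i'_ds held fixed."""
    return ids_prime * math.exp(theta(L) * vds)


def ro_analytic(ids, L):
    """(3.7): ro = 1 / (ids * Theta(L))."""
    return 1.0 / (ids * theta(L))


def ro_numeric(vds, ids_prime, L, h=1e-6):
    """Central-difference evaluation of the definition (3.6)."""
    didv = (ids_model(vds + h, ids_prime, L) - ids_model(vds - h, ids_prime, L)) / (2 * h)
    return 1.0 / didv
```

Because ids grows exponentially in vds, its derivative is simply ids·Θ(L), which is why the analytic and numerical values agree.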
3.1.3 Diode-Connected MOSFETs
This work makes frequent use of diode-connected MOSFETs, where the device’s gate is
connected to its drain. Equivalently, vgs = vds, so (3.5) can be solved for vgs in terms of ids as
$$v_{gs} = \frac{n V_T \ln\left(\frac{i_{ds}}{i_{ds0}}\right) + V_{th0}}{\zeta_1 + n V_T\, \Theta(L)}, \qquad (3.8)$$
where ln(·) is the natural logarithm.
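As a sanity check on (3.8), the sketch below evaluates the forward model with vgs = vds and then inverts it; the round trip should return the original voltage. Parameter values are the NMOS entries of Table 3.2 with an assumed VT = 0.0259 V; the function names are hypothetical.

```python
import math

# NMOS values from Table 3.2; V_T ~ 300 K is an assumption.
COX, MU0, VTH0, N, V_T = 0.019, 0.049, 0.62, 1.5, 0.0259
Z1, Z2, Z3, Z4 = 0.96, 6.742, 0.2756, 3.3956e7   # zeta4 assumed to be in 1/m


def theta(L):
    return Z3 * math.exp(-Z4 * L) / (N * V_T)


def ids0(W, L):
    """(3.2) with beta = mu0*Cox*W/L."""
    return Z2 * MU0 * COX * (W / L) * (N - 1) * V_T ** 2


def ids_diode(v, W, L):
    """Forward model (3.1) and (3.5) with vgs = vds = v."""
    return ids0(W, L) * math.exp((Z1 * v - VTH0) / (N * V_T) + theta(L) * v)


def vgs_diode(ids, W, L):
    """(3.8): gate-source voltage of a diode-connected device carrying ids."""
    return (N * V_T * math.log(ids / ids0(W, L)) + VTH0) / (Z1 + N * V_T * theta(L))
```

The denominator ζ1 + nVTΘ(L) shows that drain-induced current increase in short devices slightly reduces the voltage needed for a given current.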
3.1.4 ON Resistance
An important superthreshold characteristic of the MOSFETs used in this work is their ON
resistance, which is measured when |vgs| and |vds| are at their maximum values (i.e. VDD +
VSS). Figure 3.3 shows the ON resistance of NMOS and PMOS devices with varying
channel widths. All channel lengths are equal to L = 45 nm. The large values of the
ON resistance result in large IR drops when MOSFETs are used as analog switches (e.g.
transmission gates or pass transistors). This has a direct implication for the maximum
conductance Gmon that a memristor can have in order to program it using a transistor circuit.
[Figure 3.3 plot: ON resistance [kΩ] (0 to 100) vs. channel width [nm] (45 to 450) for NMOS and PMOS devices.]
Figure 3.3: ON resistance of 45 nm low-power PTM MOSFETs.
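To make the ON-resistance limitation concrete, the switch and memristor can be treated, as a simplifying assumption, as a linear voltage divider: a programming voltage v applied through an ON resistance Ron leaves v/(1 + Ron·Gm) across the memristor. The Python sketch below (hypothetical function names and values) computes that voltage and the largest Gmon that still receives a required programming voltage.

```python
def memristor_voltage(v_applied, r_on, g_m):
    """Voltage across a memristor of conductance g_m driven through a switch of resistance r_on."""
    r_m = 1.0 / g_m
    return v_applied * r_m / (r_on + r_m)


def max_gmon(v_applied, v_needed, r_on):
    """Largest memristor conductance that still sees at least v_needed across it."""
    if v_needed >= v_applied:
        raise ValueError("v_needed must be smaller than v_applied")
    # v_needed = v_applied / (1 + r_on * g)  =>  g = (v_applied / v_needed - 1) / r_on
    return (v_applied / v_needed - 1.0) / r_on
```

For example, with a hypothetical Ron = 20 kΩ and Gmon = (1800 Ω)^−1 (the silver chalcogenide value in Table 3.3), only about 8% of the applied voltage appears across the memristor.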
3.1.5 Process Variations
The accuracy (target output voltage or current vs. actual output voltage or current) of
CMOS analog circuits is highly sensitive to physical parameters that vary across devices
on the same die. In general, mismatch in both the β and Vth0 parameters is modeled in order
to capture these effects. However, in subthreshold operation, variations in Vth0 have a
dominant exponential effect on the variations in circuit behavior, so variations in β can
be ignored. From Pelgrom’s work, it can be shown that the variance of the difference in
threshold voltage Vth0 between two closely-spaced MOSFETs is given as [62, 63]
$$\sigma^2(\Delta V_{th0}) = \sigma^2(\delta V_{th0,1}) + \sigma^2(\delta V_{th0,2}) = \frac{A_{V_{th0}}^2}{2 W_1 L_1} + \frac{A_{V_{th0}}^2}{2 W_2 L_2}, \qquad (3.9)$$
where AVth0 is a process-dependent constant. In this work, a value of AVth0 = 1.8 mVµm
was used for both NMOS and PMOS devices, which follows from the empirical law
AVth0 = (1 mVµm/nm) tox [64] (i.e. tox = 1.8 nm). This value of AVth0 is also in close agreement with data
published from Intel’s 45 nm process [65]. Equation (3.9) reveals that transistors with
larger areas will match better, so there is an inherent area/reliability tradeoff. This model
of the threshold voltage variation will be used to assess the random variations in NMS
primitive circuits.
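Equation (3.9) is straightforward to evaluate, and the same per-device variance A²Vth0/(2WL) can be used to draw random threshold voltages for Monte Carlo runs. A Python sketch, with W and L in µm and AVth0 = 1.8 mV·µm as in this work; treating each device's δVth0 as an independent Gaussian is an assumption of the sketch, and the function names are hypothetical.

```python
import math
import random

A_VTH0 = 1.8  # [mV*um], value used in this work for NMOS and PMOS


def sigma_delta_vth(w1, l1, w2, l2, a=A_VTH0):
    """Standard deviation [mV] of the Vth0 difference between two devices, from (3.9); W, L in um."""
    return math.sqrt(a ** 2 / (2 * w1 * l1) + a ** 2 / (2 * w2 * l2))


def sample_vth0(vth_nominal, w, l, rng, a=A_VTH0):
    """Draw one device's Vth0 [V]; its deviation is Gaussian with variance A^2/(2WL) (assumed)."""
    sigma_mv = a / math.sqrt(2 * w * l)
    return vth_nominal + rng.gauss(0.0, sigma_mv) * 1e-3


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(sample_vth0(0.62, 0.045, 0.045, rng), 3) for _ in range(5)])
```

For a pair of minimum-size 45 nm × 45 nm devices this gives σ(ΔVth0) = 40 mV, while a pair with 100 times the area gives 4 mV, illustrating the area/reliability tradeoff.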
3.2 Memristor Models
NMS design choices are guided by the characteristics of a target memristor technology.
Therefore, a memristor model must be chosen that accurately reflects the behavior of fabri-
cated devices. The best choice would be a model that is in use by a majority of other NMS
researchers. This would allow for a fair comparison of state-of-the-art NMS designs to
those proposed in this work. However, each group bases their designs on different models,
making it impossible to identify one standard choice. Even more difficult is the fact that
memristor materials, fabrication processes, and theory have been rapidly evolving.
To overcome these challenges, a semi-empirical model is developed that is strongly
rooted in memristor device physics, but flexible enough to account for static and dynamic
behavior that is not yet fully understood. Highlighted here are the two key theoretical
components that must be modeled. The first is the I-V relationship when the memristor is
in a fixed state. The second is the evolution of the memristor state with the application of
an electric field.
This work makes use of two types of memristors with different I-V and switching char-
acteristics. The first is a silver chalcogenide device that exhibits incremental conductance
switching and a weak exponential I-V relationship. The second is a bi-stable (two conduc-
tance states) CBRAM device that has stochastic switching behavior. Critically, both de-
vices are compatible with CMOS fabrication and can be integrated into a back end of line
(BEOL) process [53, 66]. This enables the heterogeneous design of MOSFET/memristor
circuits.
3.2.1 Silver Chalcogenide Memristor
A memristor is a thin, nominally insulating film, sandwiched between two electrodes. How
insulating the film is depends on its state, which effectively modifies the film’s band gap
in different regions. Figure 3.4 shows a memristor with a particular distribution of defects
in the insulating film. A simplified energy band diagram of a 1-dimensional slice (dashed
rectangle) is shown on the bottom. It is assumed that a voltage vm has been applied across
the electrodes, from the right to the left. If the insulator film were actually a conductor,
then electrons would drift in the electric field from the left electrode to the right electrode.
However, the insulator imposes a potential energy barrier which has to be overcome. Notice
that there are valleys in the energy barrier corresponding to local defects. If the distance a
between the valleys is small enough, then electrons can tunnel between defect sites, giving
rise to a current. One of the first to accurately model this phenomenon was Simmons
in 1963 [67]. Outlined here are the key points of his derivation and how it leads to a
semi-empirical memristor model that shows excellent agreement with a wide variety of
experimental devices.
[Figure 3.4 diagram: metal/insulator/metal memristor containing defects and electrons, with an energy band diagram of a 1-dimensional slice showing the Fermi levels of both metals, the vacuum level, the bias energy evm, the dimension D, and the inter-defect spacing a.]
Figure 3.4: Memristor with energy band diagram showing dips in the energy level of the
insulator corresponding to defect states.
To start, consider the time-independent Schrödinger equation in one dimension:
$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + U(x)\psi = E\psi, \qquad (3.10)$$
where ħ is Planck's constant divided by 2π, m is the mass of an electron, ψ is the wave
function, x is the horizontal dimension, U is the potential energy, and E is the total energy.
Now, consider an electron in Figure 3.4. On approaching one of the potential hills within
the insulator, part of its incident wave function will be reflected, and part of it will be
transmitted to the next valley. Therefore, the probability that the electron will tunnel between
two valleys is the ratio of the squared amplitude of the transmitted wave to that of the
incident wave. Using the WKB approximation, it can be shown that the
tunneling probability is [68]
$$T(E_x) \approx \exp\left(-\frac{2}{\hbar}\int_0^a |p(x)|\, dx\right), \qquad (3.11)$$
where p is the electron momentum, and x = 0 is taken to be the left side of the energy
barrier in question. Now, the total probability of tunneling will be the sum of all the prob-
abilities at each energy level E times the number of electrons in that energy level, which
can be described by Fermi-Dirac statistics. Simmons showed that the total probability can
be written as
$$T = \exp\left(\alpha_2 a \sqrt{\bar{E}_a}\right), \qquad (3.12)$$
where α2 is a constant and Ēa is the mean height of the barrier, or activation energy. Now,
the electron current can be written as
$$i_m \propto \nu (T_R - T_L), \qquad (3.13)$$
where ν is the attempt-to-escape frequency, and TR and TL are the probabilities of an electron
tunneling to the right and left, respectively. Of course, the difference between ĒaR and ĒaL
will depend on the electric field, which creates higher barriers at the cathode (left electrode),
and lower barriers at the anode (right electrode). With a few more approximations, it can
be shown that [67, 69]
$$i_m = \alpha_1 \exp\left(\alpha_2 \sqrt{\bar{E}_a - \frac{F_0 a}{2}}\right)\sinh\left(\frac{\alpha_3 v_m a}{2\sqrt{\bar{E}_a - F_0 a/2}}\right), \qquad (3.14)$$
where α1, α2, and α3 are constants, and F0 is the electric field at x = 0.
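The WKB expression (3.11) can be evaluated numerically for an arbitrary barrier profile. The Python sketch below integrates |p(x)| = √(2m(U(x) − E)) with the trapezoid rule. For a constant barrier of height φ above E it reduces to exp(−2a√(2mφ)/ħ), which has the same a√Ēa dependence as (3.12) with α2 = −2√(2m)/ħ. This rectangular-barrier case is an illustration only and omits the Fermi-Dirac averaging in Simmons' derivation; the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34     # reduced Planck constant [J s]
M_E = 9.1093837015e-31     # electron mass [kg]
Q = 1.602176634e-19        # elementary charge [C], converts eV to J


def wkb_transmission(barrier_ev, a, steps=2000):
    """T ~ exp(-(2/hbar) * integral_0^a |p(x)| dx), from (3.11).

    barrier_ev(x) returns U(x) - E in eV at position x [m]. Classically allowed
    regions (negative values) contribute nothing. Integral via the trapezoid rule.
    """
    dx = a / steps

    def momentum(x):
        phi = max(barrier_ev(x), 0.0) * Q
        return math.sqrt(2.0 * M_E * phi)

    total = 0.5 * (momentum(0.0) + momentum(a))
    total += sum(momentum(i * dx) for i in range(1, steps))
    return math.exp(-2.0 / HBAR * total * dx)


def rectangular_closed_form(phi_ev, a):
    """Closed form for a constant barrier: exp(-2 a sqrt(2 m phi) / hbar)."""
    return math.exp(-2.0 * a * math.sqrt(2.0 * M_E * phi_ev * Q) / HBAR)
```

Doubling the barrier width a squares the transmission for a rectangular barrier, which is why small changes in defect spacing produce large changes in current.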
The state of a memristor γ can be defined in several ways. In this work, γ characterizes
the distribution of defects within the memristor film. Like electrons and holes, defects,
whether they are interstitial ions (Frenkel defects), vacancies (Schottky defects), etc., can
drift in an electric field. Suppose, for example, that two interstitial sites are separated by
a potential energy barrier of width a and height Ea. The probability that an interstitial ion
will move from one site to the other is related to the fraction of time that its energy E is
above Ea, which is given by Boltzmann's law as exp(−Ea/kT). Now, when an electric field
is applied to the film, it is easy to show that the mean drift velocity and, hence, the rate of
change of the memristor state can be expressed as [70]
$$\frac{d\gamma}{dt} = \chi(v_m(t)) = \nu a \exp\left(-\frac{E_a}{kT}\right)\sinh\left(\frac{q a v_m / D}{kT}\right), \qquad (3.15)$$
where ν is the lattice vibration frequency and D is the film thickness (Figure 3.4), so that
vm/D is the applied electric field. Notice that at low voltages and nominal temperatures
(e.g. room temperature), changes in γ are very small, resulting in non-volatile behavior.
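The strong non-linearity of (3.15) is what makes the state non-volatile at low bias. The Python sketch below evaluates (3.15) with hypothetical parameter values (ν = 10^13 Hz, a = 1 nm, D = 5 nm, Ea = 0.5 eV, T = 300 K), chosen only to show the shape of the dependence and not fitted to any device. The function name is hypothetical.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant [eV/K]


def state_rate(vm, nu, a, ea_ev, d, temp=300.0):
    """Ionic drift rate from (3.15): nu*a*exp(-Ea/kT)*sinh(q*a*vm/(D*k*T)).

    Energies are in eV, so q*a*vm/D in eV is numerically a*vm/D with vm in volts.
    a and d are in metres and nu in 1/s; the result carries the units of nu*a.
    """
    kt = K_B * temp
    return nu * a * math.exp(-ea_ev / kt) * math.sinh(a * vm / (d * kt))


if __name__ == "__main__":
    params = dict(nu=1e13, a=1e-9, ea_ev=0.5, d=5e-9)   # hypothetical values
    r_low, r_high = state_rate(0.1, **params), state_rate(1.0, **params)
    print(f"rate(1 V) / rate(0.1 V) = {r_high / r_low:.3g}")
```

With these values, a tenfold increase in vm (0.1 V to 1 V) raises the rate by roughly three orders of magnitude, whereas a linear drift model would predict a factor of 10.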
Equations (3.14) and (3.15) can be modified to create a semi-empirical memristor
model, with fitting parameters that can be adjusted to fit a wide range of experimental
devices. This idea has been previously explored by Yakopcic et al. [71]. In this work, a
few small adjustments are made in order to make the model more amenable to constrained
non-linear curve fitting. First, the state variable γ must be incorporated into (3.14).
Since γ characterizes the distribution of defects within the memristor film, it should have
some dependence on the tunneling barrier width a, which appears in the exponential and the
sinh functions in (3.14). However, only including the exponential dependence still yields
an accurate model with fewer fitting parameters. This leads to the following expression for
the memristor I-V characteristic:
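$$i_m = \begin{cases} \gamma G_{m,\mathrm{on}} v_m + (1-\gamma) G_{m,\mathrm{off}}\, \xi_1^{+} \sinh\left(\frac{v_m}{\xi_1^{+}}\right), & v_m \ge 0 \\ \gamma G_{m,\mathrm{on}} v_m + (1-\gamma) G_{m,\mathrm{off}}\, \xi_1^{-} \sinh\left(\frac{v_m}{\xi_1^{-}}\right), & v_m < 0 \end{cases} \qquad (3.16)$$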
where ξ1+ (ξ1−) is a fitting parameter for the positive (negative) portion of the I-V characteristic.
Notice that, when vm is small, sinh(vm/ξ1±) ≈ vm/ξ1±, so
$$i_m \approx \left[\gamma G_{m,\mathrm{on}} + (1-\gamma) G_{m,\mathrm{off}}\right] v_m, \quad v_m \approx 0, \qquad (3.17)$$
which agrees with other models, such as the linear ionic drift model [19]. In contrast, the
model presented in [27] has im ∝ γ sinh(vm), implying that the state variable must be
non-zero to have any current flow, which is not in agreement with other models.
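A minimal Python sketch of (3.16) with the silver chalcogenide parameters of Table 3.3, together with the small-signal limit (3.17); the function names are hypothetical. Comparing the two at a few millivolts confirms that the sinh branch reduces to a linear conductance near the origin.

```python
import math

# Silver chalcogenide values from Table 3.3.
G_ON = 1.0 / 1800.0        # [S]
G_OFF = 1.0 / 4.637e4      # [S]
XI1_POS, XI1_NEG = 0.9934, 0.2727


def memristor_current(vm, gamma):
    """(3.16): linear on-state path plus a sinh-shaped off-state path, weighted by gamma."""
    xi1 = XI1_POS if vm >= 0 else XI1_NEG
    return gamma * G_ON * vm + (1.0 - gamma) * G_OFF * xi1 * math.sinh(vm / xi1)


def small_signal_current(vm, gamma):
    """(3.17): the limit of (3.16) as vm approaches 0."""
    return (gamma * G_ON + (1.0 - gamma) * G_OFF) * vm
```

Unlike a model with im ∝ γ sinh(vm), this form still conducts through the off-state path when γ = 0, and the smaller ξ1− makes the negative branch more strongly non-linear.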
Next, (3.15) is modified to generate a tractable expression for the evolution of γ. First,
the activation energy Ea is taken as a hard threshold voltage, below which there will be no
change in state. The change in the state variable becomes
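$$\frac{\Delta\gamma}{\Delta t} = \chi(v_m(t)) = \begin{cases} \xi_4^{+}\sinh\left(\xi_5^{+} v_m(t) - \xi_6^{+} V_{tp}\right) f_{win}(\gamma), & v_m > V_{tp} \\ \xi_4^{-}\sinh\left(\xi_5^{-} v_m(t) - \xi_6^{-} V_{tn}\right) f_{win}(\gamma), & v_m < V_{tn} \\ 0, & \text{otherwise} \end{cases} \qquad (3.18)$$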
where Vtp and Vtn are the positive and negative memristor threshold voltages, which are
usually easy to identify from experimental data. In addition, the ξi+ and ξi− parameters are
fitting parameters, and fwin is a window function that accounts for degradation in state variable
evolution near the boundaries of the film. The use of window functions in memristor models was
first proposed by Biolek et al. [23], where fwin was a concave geometric function. This
work adopts an exponential decay for the window function, as in [72]. The function takes
the form
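$$f_{win}(\gamma) = \begin{cases} \exp\left[-\xi_2^{+}(\gamma - \xi_3^{+})\right]\dfrac{1-\gamma}{1-\xi_3^{+}}, & v_m \ge 0,\ \gamma \ge \xi_3^{+} \\ \exp\left[\xi_2^{-}(\gamma - \xi_3^{-})\right]\dfrac{\gamma}{\xi_3^{-}}, & v_m < 0,\ \gamma \le \xi_3^{-} \\ 1, & \text{otherwise} \end{cases} \qquad (3.19)$$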
where ξ2± and ξ3± are fitting parameters. For vm ≥ 0, fwin = 1 from γ = 0 up to γ = ξ3+, then
exponentially decays, and becomes 0 at γ = 1. For vm < 0, fwin = 1 from γ = 1 down to
γ = ξ3−, then exponentially decays, and becomes 0 at γ = 0. These boundary conditions
ensure that γ stays between 0 and 1. Physically, this models the fact that defects generally
won’t drift into the electrode materials.
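The thresholded update (3.18) with the window (3.19) can be simulated pulse by pulse. The Python sketch below uses the Table 3.3 parameters and a forward-Euler step of length tw per rectangular pulse. Treating ξ4± as rates per second, the Euler discretization, and the extra clipping to [0, 1] are assumptions of the sketch, and the function names are hypothetical. It illustrates the qualitative behavior (no change below threshold, saturation near the boundaries) rather than reproducing Figure 3.9 exactly.

```python
import math

# Silver chalcogenide fitting parameters from Table 3.3, keyed by index i of xi_i.
XI_POS = {2: 2.5275, 3: 0.3394, 4: 113.5, 5: 3.8153, 6: -2.0429}
XI_NEG = {2: 4.2894, 3: 0.4837, 4: 106.2875, 5: 4.0992, 6: -3.0634}
V_TP, V_TN = 0.4, -0.55


def f_win(gamma, vm):
    """Window function (3.19)."""
    if vm >= 0 and gamma >= XI_POS[3]:
        x2, x3 = XI_POS[2], XI_POS[3]
        return math.exp(-x2 * (gamma - x3)) * (1.0 - gamma) / (1.0 - x3)
    if vm < 0 and gamma <= XI_NEG[3]:
        x2, x3 = XI_NEG[2], XI_NEG[3]
        return math.exp(x2 * (gamma - x3)) * gamma / x3
    return 1.0


def chi(vm, gamma):
    """Thresholded state-change rate (3.18)."""
    if vm > V_TP:
        return XI_POS[4] * math.sinh(XI_POS[5] * vm - XI_POS[6] * V_TP) * f_win(gamma, vm)
    if vm < V_TN:
        return XI_NEG[4] * math.sinh(XI_NEG[5] * vm - XI_NEG[6] * V_TN) * f_win(gamma, vm)
    return 0.0


def apply_pulses(gamma, vw, tw, n_pulses):
    """One forward-Euler step per pulse of height vw [V] and width tw [s]."""
    for _ in range(n_pulses):
        gamma = min(1.0, max(0.0, gamma + chi(vw, gamma) * tw))
    return gamma
```

Because fwin goes to zero at the film boundaries, each step shrinks as γ approaches 0 or 1, so the clipping only guards against large time steps.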
The model described above was fit to experimental data from a silver chalcogenide
device published in [66]. The material stack is shown in Figure 3.5.
[Figure 3.5 diagram: memristor stack with layers labeled W top electrode, Ag (50 nm), Ag2Se (50 nm), Ge2Se3 (50 nm), and W bottom electrode, with a nitride layer and a 180 nm dimension.]
Figure 3.5: Silver chalcogenide memristor stack (adapted from [66]).
Figure 3.6 shows the memristor current versus time when a sinusoidal voltage is applied [73]. The top plot shows
the voltage across the memristor, which was placed in series with a 1.6 kΩ resistor. The
bottom plot shows the memristor current vs. time for 8 different devices. Inter-device and
inter-cycle variations are observed. The devices show small inter-cycle variations, which
are not modeled in this work. Instead, the mean is taken over the five cycles to generate
I-V characteristics for each of the 8 devices. Since the inter-cycle variations are small, the
averaging does not cause significant distortion to the characteristic features.
A single I-V curve needs to be generated from the 8 devices to determine the parameters
for the nominal memristor model. One approach is to take the mean response over all of
the devices. However, given the large inter-device variation, especially in Gmon and Gmoff,
this leads to smoothing that does not faithfully represent the behavior of any one device.
Instead, the device with values of Gmon and Gmoff that are closest to the mean is chosen as
the nominal device. Figure 3.7 shows the on and off conductances for each of the 8 devices.
Horizontal dashed lines indicate the mean values, 〈Gmon〉 and 〈Gmoff〉. Above each device
is the sum of the differences (in percentage) of that device's on and off conductances from
the mean values:
$$\%\mathrm{diff} \equiv 100\% \left(\frac{|G_{m,\mathrm{on},i} - \langle G_{m,\mathrm{on}}\rangle|}{\langle G_{m,\mathrm{on}}\rangle} + \frac{|G_{m,\mathrm{off},i} - \langle G_{m,\mathrm{off}}\rangle|}{\langle G_{m,\mathrm{off}}\rangle}\right), \qquad (3.20)$$
where i is the device number. Device 4 has the smallest sum of differences from the means,
so it is used as the nominal device. Table 3.3 shows the fitting parameters, which were
found in MATLAB using a constrained non-linear optimization technique. Figure 3.8
shows the I-V plot of the experimental data and the model. The model matches the
experimental data with a mean error of 25%.
[Figure 3.6 plots: (top) vm [V] (−1 to 1) vs. t [ms] (−25 to 25); (bottom) im [µA] (−500 to 500) vs. t [ms] for 8 devices.]
Figure 3.6: Silver chalcogenide current versus time resulting from the application of a
sinusoidal voltage. (Top) Voltage across the memristor vs. time. (Bottom) Current flowing
through the memristor vs. time for 8 devices [73].
[Figure 3.7 plot: conductance [S] (10^−6 to 10^−2, log scale) of devices 1 to 8, showing Gmon, Gmoff, and dashed lines at ⟨Gmon⟩ and ⟨Gmoff⟩; %diff per device: 274%, 145%, 50%, 18%, 85%, 56%, 76%, 118%.]
Figure 3.7: On and off conductances of 8 silver chalcogenide memristor devices. Dashed
lines show the mean values. Shown above the bars for each device is the sum of the percent
difference between that device's on and off conductance and the mean values.
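The nominal-device selection based on (3.20) amounts to an argmin over devices. A Python sketch with hypothetical function names; the conductance values in the example are made up for illustration and are not the measured data of Figure 3.7.

```python
def percent_diff(g_on, g_off):
    """Per-device sum of relative deviations (3.20) of on/off conductances from their means."""
    mean_on = sum(g_on) / len(g_on)
    mean_off = sum(g_off) / len(g_off)
    return [100.0 * (abs(on - mean_on) / mean_on + abs(off - mean_off) / mean_off)
            for on, off in zip(g_on, g_off)]


def nominal_device(g_on, g_off):
    """Index (0-based) of the device whose on/off conductances are closest to the means."""
    diffs = percent_diff(g_on, g_off)
    return min(range(len(diffs)), key=diffs.__getitem__)


if __name__ == "__main__":
    # Hypothetical conductances [S] for four devices (illustration only).
    g_on = [1e-3, 5e-4, 7e-4, 2e-3]
    g_off = [2e-5, 1e-5, 2.2e-5, 5e-5]
    print([round(d, 1) for d in percent_diff(g_on, g_off)], nominal_device(g_on, g_off))
```

Choosing a single representative device in this way avoids the smoothing that averaging I-V curves across devices would introduce.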
Table 3.3: Model parameters for a silver chalcogenide memristor.
Parameter    Value                Bounds
ξ1+          0.9934               [0, +∞]
ξ2+          2.5275               [0, +∞]
ξ3+          0.3394               [0, 1]
ξ4+          113.5                [0, +∞]
ξ5+          3.8153               [0, +∞]
ξ6+          −2.0429              [−∞, +∞]
ξ1−          0.2727               [0, +∞]
ξ2−          4.2894               [0, +∞]
ξ3−          0.4837               [0, 1]
ξ4−          106.2875             [0, +∞]
ξ5−          4.0992               [0, +∞]
ξ6−          −3.0634              [−∞, +∞]
Vtp          0.4 V
Vtn          −0.55 V
Gmon         (1800 Ω)^−1
Gmoff        (4.637×10^4 Ω)^−1
g            25.76
[Figure 3.8 plot: im [mA] (−0.4 to 0.4) vs. vm [V] (−1 to 1); experimental data and model.]
Figure 3.8: Silver chalcogenide memristor I-V characteristic: Experimental data [73] and
semi-empirical model.
The evolution of the memristor state variable γ is critical to an NMS's learning process.
Figure 3.9 shows several different state transitions when multiple write pulses of
magnitude vw and duration tw are applied to the silver chalcogenide device. In Figure 3.9(a),
|vw| = 0.75 V and tw = 1 ns. The transitions from 0 to 1 and 1 to 0 both take ≈ 10^7 pulses.
When the pulse magnitude is increased to 1 V (Figure 3.9(b)), the number of pulses
required to transition is reduced by 2-3 orders of magnitude. Figures 3.9(c) and 3.9(d) show
the evolution of γ when the pulse duration is increased to 1 µs, with pulse magnitudes of
0.75 V and 1 V, respectively. The number of write pulses required to change γ between the
two extreme values is significantly reduced. However, the energy consumption per write
pulse will increase with increased tw and vw. Therefore, there will be a tradeoff
between an NMS's power consumption and the rate at which it can adapt/learn. This is
explored in more detail throughout this dissertation. Finally, it should be noted that there is
an inherent asymmetry in the memristor switching characteristics, which is evident from all
of the plots in Figure 3.9. In fact, most memristive devices show some form of switching
asymmetry. As a consequence, synapses and other memristor circuits in an NMS will
generally adapt at different rates depending on whether they are being potentiated or depressed.
[Figure 3.9 plots, panels (a) to (d): γ (0 to 1) vs. pulse number (10^0 to 10^10, log scale) for the 0 → 1 and 1 → 0 transitions.]
Figure 3.9: Change in state variable γ vs. number of applied write pulses. (a) |vw| = 0.75
V, tw = 1 ns. (b) |vw| = 1.0 V, tw = 1 ns. (c) |vw| = 0.75 V, tw = 1 µs. (d) |vw| = 1.0 V,
tw = 1 µs.
3.2.2 CBRAM Memristor
CBRAM memristors operate on the principle of reversible electrochemical reactions. The
device is an MIM-type two-terminal structure, with an active solid electrolyte sandwiched
between two metal electrodes [74]. In particular, the devices studied in this work² consisted
of an active Ag top electrode (anode), inert W bottom electrode, and 30 nm thick GeS2
electrolyte [75] (see Figure 3.10). On application of a positive write voltage vw, Ag atoms
from the anode are oxidized and the resulting ions enter the switching layer where they
drift to the cathode. At the cathode, the Ag ions are reduced. Over time, a conductive Ag
filament forms in the GeS2 layer, changing the CBRAM conductance Gm to a
high conductance on state. On reversing the polarity of the applied voltage, the conductive
Ag filament is dissolved, changing Gm to a low conductance off state [76].
² The CBRAM devices used for this work were fabricated and tested at CEA-LETI-Grenoble.
[Figure 3.10 image: cross-section of the Ag/GeS2/W stack with a 50 nm scale bar.]
Figure 3.10: SEM image of the CBRAM device used in this work [75].
Previously, CBRAM devices have been modeled by considering the growth/dissolution
rate of the metallic filament in the switching layer [77, 78]:
$$\frac{dh}{dt} = v_h \exp\left(-\frac{E_a}{kT}\right)\sinh\left(\frac{Z q E a}{2kT}\right) \qquad (3.21)$$
$$\frac{dr}{dt} = v_r \exp\left(-\frac{E_a}{kT}\right)\sinh\left(\frac{\beta q v_w}{kT}\right). \qquad (3.22)$$
In (3.21), h is the filament height or length, Ea is the activation energy required for the
metal ions to drift, k is Boltzmann's constant, T is temperature, q is the ion charge, Z is
the number of charged ions, E is the electric field, and vh and a are fitting parameters. In
(3.22), r is the radius of the metallic filament, vw is the applied voltage, and vr and β are
fitting parameters. Finally, the on (Gmon) and off (Gmoff ) conductance values can easily
be obtained from the geometries of the conductive filament and CBRAM cell, as well as
the resistivities of the materials in the CBRAM stack. All of the model parameters for
Ag/GeS2/W devices can be found in [77].
Recently, Suri et al. showed that CBRAM devices switch stochastically under weak
programming conditions (small applied write voltages and programming durations) [53].
For example, it was shown that write voltages around 1.5 V applied for 500 ns can be
used to achieve switching probabilities pswitch in Ag/GeS2/W devices that are below 0.1.
Later, it will be shown that this probabilistic switching phenomenon is very useful for
designing and training CBRAM-based synapse circuits. Unfortunately, this behavior is not
easily described within the theoretical framework discussed above. Therefore, a simple
phenomenological model is developed based on the experimental data presented in [53].
For simplicity, it is assumed that the CBRAM switching behavior is symmetric. That is,
the metallic filament growth and dissolution rates are equal when voltages of equal magni-
tude and opposite sign are applied to the CBRAM device. This assumption is supported by
(3.21) and (3.22) since the sinh function is odd (antisymmetric about the origin). Figure 3.11
shows the switching probability of CBRAM devices versus the applied flux (φ = ∫ vw dt) from
the data presented in [53]. The data are fit to a log-normal cumulative distribution function
(CDF):
$$p_{switch} = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left[\frac{\ln\phi - \mu}{\sqrt{2}\,\sigma}\right], \qquad (3.23)$$
where erf is the error function, and µ and σ are fitting parameters which are associated with the
corresponding normal distribution. Other functions such as the gamma and beta CDFs also
fit the data well. However, it was shown in [53] that the CBRAM on and off conductance
states are log-normally distributed, so it is hypothesized that the processes governing the
switching probability are also associated with a log-normal distribution. It is important to
note that the pswitch model presented here is only valid if the applied write voltage is large
enough so that the Ag ions can drift inside the GeS2 lattice. It is estimated in [78] that the
required activation energy is 0.4 eV. This work uses write voltages that have experimentally
been shown to induce change in the device’s conductance states, thereby ensuring that the
model is valid.
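The fit (3.23) is a log-normal CDF in the applied flux, so it can be evaluated with the standard error function. A Python sketch using the CBRAM values µ = 0.024 and σ = 0.587 (ln µVs) from Table 3.4. Flux is expressed in µV·s, a rectangular pulse is assumed so that φ = vw·tw, and |φ| is used for negative pulses per the symmetry assumption above; the function names are hypothetical.

```python
import math

# CBRAM fit parameters from Table 3.4, for flux measured in uV*s.
MU, SIGMA = 0.024, 0.587


def p_switch(flux_uvs, mu=MU, sigma=SIGMA):
    """Log-normal CDF (3.23) evaluated at |flux| (symmetric switching assumed)."""
    if flux_uvs == 0.0:
        return 0.0
    z = (math.log(abs(flux_uvs)) - mu) / (math.sqrt(2.0) * sigma)
    return 0.5 + 0.5 * math.erf(z)


def rect_pulse_flux_uvs(v_w, t_w):
    """Flux of a rectangular write pulse, phi = v_w * t_w, converted from V*s to uV*s."""
    return v_w * t_w * 1e6
```

Because the CDF is monotonic in φ, the switching probability can be tuned by adjusting either the pulse amplitude or its duration.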
The overall CBRAM model is described by
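$$\Delta\gamma = S\,\mathrm{sgn}(\phi)\,\left|\gamma - \mathrm{sgn}(\phi)\right| \qquad (3.24)$$
and
$$G_m := \begin{cases} G_m, & \mathrm{sgn}(\Delta\gamma) = 0 \\ G_{m,\mathrm{off}}, & \mathrm{sgn}(\Delta\gamma) = -1 \\ G_{m,\mathrm{on}}, & \mathrm{sgn}(\Delta\gamma) = +1 \end{cases}. \qquad (3.25)$$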
Here, S is a Bernoulli-distributed random variable with a success probability equal to
pswitch. The sgn function is -1, 0, or 1 for negative, zero, or positive arguments, respec-
tively. The variable γ ∈ {−1, 1} represents the state of the CBRAM device, with -1 cor-
responding to an off state and 1 corresponding to an on state. Note that if the sign of the
applied flux is the same as the sign of the state variable, then the device will not switch.
This is a simplified model as, for example, applying a positive flux to a device already in
the on state may still create some small changes in the conductance. From (3.25), it can be
seen that if the state variable changes to a negative (positive) value, then the device’s con-
ductance will change to Gmoff (Gmon) which is sampled from a log-normal distribution.
The variations in these conductances and the switching probabilities are discussed below.
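A single write event of the CBRAM model can be sketched as below. The code follows the verbal description of the model: no switching when the flux has the same sign as γ, otherwise a Bernoulli trial with success probability pswitch, after which a new on or off conductance is drawn from a log-normal distribution. Interpreting the σ(Gm) percentages in Table 3.4 as coefficients of variation, and the function names, are assumptions of this sketch.

```python
import math
import random

MU_P, SIGMA_P = 0.024, 0.587           # p_switch fit in ln(uV*s), Table 3.4
G_ON_MEAN, G_ON_CV = 0.38e-3, 0.0946   # Table 3.4; sigma [%] read as coefficient of variation
G_OFF_MEAN, G_OFF_CV = 1.12e-6, 1.28


def p_switch(flux_uvs):
    """Log-normal CDF (3.23) evaluated at |flux|."""
    if flux_uvs == 0.0:
        return 0.0
    z = (math.log(abs(flux_uvs)) - MU_P) / (math.sqrt(2.0) * SIGMA_P)
    return 0.5 + 0.5 * math.erf(z)


def sample_lognormal(mean, cv, rng):
    """Log-normal sample with the requested mean and coefficient of variation."""
    s2 = math.log(1.0 + cv * cv)
    return rng.lognormvariate(math.log(mean) - 0.5 * s2, math.sqrt(s2))


def cbram_write(gamma, g_m, flux_uvs, rng):
    """One write event with gamma in {-1, +1}; returns the updated (gamma, g_m)."""
    sgn = (flux_uvs > 0) - (flux_uvs < 0)
    if sgn == 0 or sgn == gamma:
        return gamma, g_m                    # flux has the state's polarity: no switching
    if rng.random() < p_switch(flux_uvs):    # Bernoulli trial with success prob. p_switch
        if sgn > 0:
            return 1, sample_lognormal(G_ON_MEAN, G_ON_CV, rng)
        return -1, sample_lognormal(G_OFF_MEAN, G_OFF_CV, rng)
    return gamma, g_m
```

Applying many identical weak pulses to a population of such devices switches roughly a fraction pswitch of them, which is the property exploited later for stochastic synapse training.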
[Figure 3.11 plot: switching probability pswitch (0 to 1.2) vs. applied flux [µVs] (0 to 2.5); experimental data and fit.]
Figure 3.11: CBRAM switching probability vs. applied flux.
3.2.3 Process Variations
There are several process variations and wearout mechanisms that affect memristor I-V
and switching characteristics. This work considers two variations that are particularly
important when using a memristor as a synapse in an NMS. The first is the variation in the
minimum and maximum memristor conductances, Gmoff and Gmon. These variations are
caused by various non-idealities in the fabrication process such as line edge roughness,
memristor thickness fluctuation, and random discrete doping, among others [79–82]. In
general, the most important characteristic to control during fabrication is the switching
layer's defect profile [13, 15] (e.g. number of vacancies, interstitial defects, grain
boundaries, etc.), which affects the device's on and off conductances as well as its switching time
and threshold voltages. The effects of these variations on Gmon and Gmoff have been
estimated in previous work. For example, in [81], the authors present a device with nominal
conductance values of Gmon = 1.0×10^−2 S and Gmoff = 6.25×10^−5 S and estimate the
3σ variation in the on and off conductances to be ≈ 7%. In this work, the variations in
Gmon and Gmoff were measured from the Ag chalcogenide and CBRAM memristor exper-
imental data. The results are shown in Table 3.4. The variations in these maximum and
minimum conductances will directly affect the maximum and minimum weight values that
can be achieved in synapse circuits, which are discussed in the next section. This can be
particularly problematic when the network weights are limited to a small range, such as
[−1, 1], for the following reason: The distribution of weights within a trained network with
unrestricted weight values is typically Gaussian [83]. However, when the weights are
restricted, the distribution changes such that most of the values lie at the extrema of the range.
Therefore, many memristive synapses affected by process variations may not be
programmable to an ideal weight value for a specific NMS application.
Table 3.4: Memristor variation parameters for Ag chalcogenide and CBRAM devices.
Device Parameter        Ag Chalcogenide    CBRAM
µ(pswitch) [ln(µVs)]    N/A                0.024
σ(pswitch) [ln(µVs)]    N/A                0.587
µ(Gmoff) [S]            2.08×10^−5         1.12×10^−6
σ(Gmoff) [%]            119                128
µ(Gmon) [S]             7.26×10^−4         0.38×10^−3
σ(Gmon) [%]             28.3               9.46
σ(tw) [%]               10                 N/A
In addition to Gmon and Gmoff , the write time tw of a memristor (the time that it takes
to change its conductance between two values) is also affected by process variations [81].
In particular, variations in the thickness of the memristor film will have a non-linear effect
on the write time. In [81], the authors estimate variations in the film thickness to be ≈ 2%.
Assuming that the write time is proportional to the inverse of the thickness squared, and the
nominal thickness is ≈ 50 nm, the standard deviation of the write time will be approximately 2 × 2% = 4% (to first order, squaring a quantity doubles its relative variation). However,
a more conservative estimate of 10% is used in this work to account for other factors that
will affect the write time, including variations in defect mobility, etc. The variations in
write time will affect the learning/programming rate of the weights within the NMS. For
CBRAM devices, there are variations in the switching probability pswitch instead of tw. The
variation parameters for tw and pswitch are shown in Table 3.4.
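The 4% figure follows from first-order error propagation: if tw ∝ D^−2, then δtw/tw ≈ −2 δD/D. The Python sketch below checks this with a quick Monte Carlo run, assuming the quoted 2% thickness variation is a Gaussian standard deviation (an assumption); the function names are hypothetical.

```python
import math
import random


def write_time_rel_std(thickness_nm=50.0, thickness_rel_std=0.02, n=50_000, seed=0):
    """Monte Carlo estimate of std(tw)/mean(tw) when tw is proportional to 1/D**2."""
    rng = random.Random(seed)
    tws = [1.0 / rng.gauss(thickness_nm, thickness_rel_std * thickness_nm) ** 2
           for _ in range(n)]
    mean = sum(tws) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in tws) / n)
    return std / mean


def first_order_rel_std(thickness_rel_std, exponent=-2):
    """Linearized propagation: relative std of D**exponent is |exponent| times that of D."""
    return abs(exponent) * thickness_rel_std
```

The result depends only on the relative thickness variation, not on the 50 nm nominal value.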
[Figure 3.12 plots: simulation time [s] vs. number of vectors m (10^0 to 10^3, log scale) for N = 1, 10, 100, and 1000; (a) HSPICE, 10^−1 to 10^3 s; (b) MATLAB, 10^−6 to 10^−2 s.]
Figure 3.12: Simulation times for a single-layer perceptron with different numbers of inputs
N and test vectors m applied. (a) Mean HSPICE simulation time. (b) Mean MATLAB
simulation time.
3.3 Simulation Strategy
The most accurate way to verify an NMS is to simulate it entirely in SPICE. SPICE
simulators use nodal analysis to numerically solve circuits composed of linear and non-linear
elements in the steady state, time, or frequency domains. However, SPICE simulation is
prohibitively time consuming and doesn’t allow for rapid feedback in the design process.
To illustrate this, a single-layer perceptron was simulated in both HSPICE (a SPICE sim-
ulator from Synopsys) and MATLAB, a mathematical modeling language and simulation
environment that is optimized for linear algebra computations. The perceptron is a good
sub-architecture to evaluate simulation time because it is an essential building block in an
NMS (and most neuromorphic systems), and is compounded many times to create a large-
scale system. Figure 3.12(a) shows the simulation time of a perceptron in HSPICE. A very
simple current mirror synapse (discussed in the next chapter) connects each linear input
neuron to a linear output neuron. A number of vectors between 1 and 1000 were applied to
the inputs (which were either all ‘0’s or all ‘1’s) for perceptrons of different sizes. From the
plot, the simulation time appears to be superlinear in both N and m. More precisely, the
simulation time will grow with the switching activity and the number of circuit elements
present, because each time the voltage or current at a particular node changes, the SPICE
simulator has to reevaluate device models and solve the nodal equations. Indeed, it has been
shown that neural network simulation times in SPICE grow almost quadratically with the
size of the network [84]. Figure 3.12(b) shows the same perceptrons simulated in MATLAB,
where the output is simply given by y = u · w. Notice that the simulation time is
several orders of magnitude lower than in HSPICE. There is still a linear dependence
on the number of input vectors, because MATLAB has to evaluate a new dot product
for each input. Importantly, however, the simulation time has a weak sublinear
dependence on the size of the perceptron. This is critical because, for a given problem, the
number of training/test patterns applied to an NMS is usually fixed, while the size of the
NMS will grow to achieve better performance. The speedup (calculated by dividing the
HSPICE simulation time by the MATLAB simulation time) is shown in Figure 3.13. For
small perceptron sizes, the speedup drops off linearly with the number of input vectors.
This is because each dot product is computed in the body of a for loop. Since MATLAB
is an interpreted language, the time spent evaluating the for-loop code is comparable to
the time spent in the body when N is small. However, as N becomes large, the dot product
evaluation takes the majority of the time, and the speedup curve starts to flatten out. A
speedup of ≈10^5 is observed for N = 1000, m = 1000, which is a modest size, considering
that each NMS will have more than one perceptron and each input pattern will be evaluated
multiple times.
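The behavioral side of this comparison reduces to one dot product per applied vector. The sketch below mirrors that per-vector loop in Python (not MATLAB, so its absolute timings say nothing about the figures above); the function names and the alternating all-‘0’/all-‘1’ test vectors are illustrative assumptions, and `speedup` simply applies the definition used for Figure 3.13.

```python
import time


def simulate_perceptron(inputs, weights):
    """Behavioral single-layer perceptron: y = u . w for each input vector u.

    One dot product per applied vector, so run time grows linearly with the
    number of vectors m (and, in this pure-Python loop, also with N).
    """
    outputs = []
    for u in inputs:  # m iterations, one per test vector
        outputs.append(sum(ui * wi for ui, wi in zip(u, weights)))  # N multiply-adds
    return outputs


def speedup(t_spice, t_behavioral):
    """Speedup as defined in the text: SPICE time divided by behavioral time."""
    return t_spice / t_behavioral


# Hypothetical sizes: N = 1000 inputs, m = 1000 vectors alternating all-0 / all-1.
N, m = 1000, 1000
weights = [1.0] * N
inputs = [[float(k % 2)] * N for k in range(m)]

start = time.perf_counter()
ys = simulate_perceptron(inputs, weights)
elapsed = time.perf_counter() - start
print(f"{m} vectors, N={N}: {elapsed:.3f} s")
```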
[Figure 3.13 plot: speedup (MATLAB/HSPICE) versus number of vectors m from 10^0 to 10^3 on log-log axes, for N = 1, 10, 100, and 1000; the speedup axis spans 10^2 to 10^6.]
Figure 3.13: Speedup of MATLAB simulation compared to HSPICE for a single-layer
perceptron.
[Figure 3.14 diagram: circuit blocks (neuron circuits, synapse circuits, ...) are simulated in HSPICE using memristor models, MOSFET models, variability parameters, and the technology node; the DC, AC, and transient simulation results feed analytical model creation, which produces behavioral, variation, power, and area models; after model verification, these models drive the MATLAB simulation of the NMS, yielding estimated performance, power, and area.]
Figure 3.14: NMS simulation strategy, where HSPICE is used for circuit-level design and
analysis, and MATLAB is used for system-level simulation.
In order to take advantage of this large speedup, this dissertation adopts a simulation
strategy that makes use of both HSPICE and MATLAB (Figure 3.14). Small circuit blocks
are designed and simulated using HSPICE. These include neurons, synapses, and other
neuromemristive primitives, which are discussed in the next chapter. Then, the simula-
tion results are used to create analytical models of the circuits’ behavior, variability, power
consumption, and area. These models are checked against the HSPICE simulations and ad-
justed to maximize their accuracy. Once the models have been verified, they are integrated
into a MATLAB toolbox developed in this dissertation, which is used for all system-level
simulations. The advantages of this approach are that it reduces NMS simulation time by
several orders of magnitude and it affords the user the convenience of the MATLAB envi-
ronment, which easily integrates with computer vision toolboxes. The obvious drawback
is that there will be some loss of accuracy when approximating circuit behavior. In addi-
tion, the behavioral models developed will not include any information about timing and
transient behavior, so overall latency can only be roughly estimated.
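The model-verification step (checking an analytical model against circuit simulation and adjusting it) can be sketched as a fit followed by an error check. This is a minimal illustration, assuming a block whose behavior is a simple gain y ≈ w·x and a relative-error acceptance tolerance; the reference samples are synthetic stand-ins generated in the code, not HSPICE results, and `fit_gain`/`verify_model` are hypothetical helper names.

```python
def fit_gain(samples):
    """Least-squares gain w (through the origin) for (x, y) samples: y ~ w * x."""
    sxy = sum(x * y for x, y in samples)
    sxx = sum(x * x for x, _ in samples)
    return sxy / sxx


def verify_model(model, samples, rel_tol):
    """Return (max relative error, accepted?) of `model` against reference samples."""
    max_err = max(abs(model(x) - y) / abs(y) for x, y in samples)
    return max_err, max_err <= rel_tol


# Synthetic stand-in for circuit-simulation data (NOT HSPICE results):
# a nearly linear block y = 0.8 x + 0.01 x^2 sampled at x = 0.1 ... 1.0.
reference = [(k / 10, 0.8 * (k / 10) + 0.01 * (k / 10) ** 2) for k in range(1, 11)]

w = fit_gain(reference)
max_err, ok = verify_model(lambda x: w * x, reference, rel_tol=0.05)
print(f"fitted gain {w:.4f}, max relative error {max_err:.4f}, accepted: {ok}")
```

If the check fails, the model form (not just its parameters) would be revised before it is moved into the system-level toolbox.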
3.4 Current-mode Designs and Analog Signal Representation
So far, this chapter has outlined the device models and simulation strategy that are used
in this work. In the next chapters, those models and simulation techniques will be ap-
plied to the design of synapse, neuron, and plasticity circuits for NMSs. However, there
is one important design choice that must be made before continuing with the design of
primitive circuits: Should NMSs perform computations on information represented by cur-
rents (current-mode), voltages (voltage-mode), or a combination of the two? Current-mode
design techniques date back to 1975, when Gilbert proposed a general class of circuits
composed of devices (called translinear elements) that have an exponential current-voltage
relationship [85]. Gilbert’s translinear principle states that circuits configured with loops
of translinear elements (translinear circuits) behave in a very predictable way: for matched
devices in a loop with equal numbers of clockwise and counterclockwise elements, the product
of the clockwise currents equals the product of the counterclockwise currents. Initially,
the translinear principle was demonstrated with bipolar junction
transistor (BJT) devices, but it is also applicable to MOS devices operating in weak inver-
sion. Complex computations such as vector magnitude calculations can be implemented
in current-mode using the translinear principle with a handful of transistors. There is no
similar design principle for voltage-mode circuits.
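A minimal numerical illustration of the translinear principle, assuming an idealized weak-inversion device with I = I0·exp(V/(nV_T)) and identical I0 for every device (the values of n and I0 are assumptions). Kirchhoff's voltage law around the loop makes the clockwise voltages sum to the counterclockwise ones, and the exponential I–V relation turns that sum into a product. With two equal counterclockwise devices, the loop computes a geometric mean, a building block of vector-magnitude circuits.

```python
import math

N_VT = 1.5 * 0.0258  # assumed slope factor n = 1.5 times V_T at ~300 K [V]
I0 = 1e-12           # assumed pre-exponential current [A], identical for all devices


def current(v):
    """Idealized weak-inversion device: I = I0 * exp(V / (n V_T))."""
    return I0 * math.exp(v / N_VT)


def voltage(i):
    """Inverse relation: V = n V_T * ln(I / I0)."""
    return N_VT * math.log(i / I0)


def translinear_geometric_mean(ix, iy):
    """Loop with ix, iy clockwise and two equal devices counterclockwise.

    KVL: V(ix) + V(iy) = 2 V(iout), so ix * iy = iout**2.
    """
    v_out = (voltage(ix) + voltage(iy)) / 2.0  # enforce KVL around the loop
    return current(v_out)


ix, iy = 2e-9, 8e-9
iout = translinear_geometric_mean(ix, iy)
print(iout)  # approximately 4e-9 A, i.e. sqrt(2 nA * 8 nA)
```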
Current-mode circuits have several other advantages over voltage-mode designs. They
are generally able to operate at lower supply voltages and typically can achieve higher
bandwidths, sometimes approaching the MOSFET intrinsic frequency fT [86]. In addition,
current representations of information have an inherent advantage in terms of communi-
cation. When voltages are sent along long routing paths, they incur losses due to series
resistances, diminishing the integrity of the signal. Biology’s solution to this problem has
been to send long-range communications in the form of spikes, which are regenerated along
myelinated axons. However, it is still largely unknown how neural information is encoded
in spikes, and rate encoding is still the dominant scheme used in hardware implementations
of spiking networks. It is often easier to represent spiking rates in hardware as continuous
analog values, albeit with somewhat reduced noise tolerance. However, buffering analog volt-
ages requires expensive hardware, with carefully designed amplifier circuits (e.g. common
drain amplifiers) to achieve unity gain. In addition, simple analog voltage buffers typically
operate in small-signal operating regions and require very careful biasing to obtain zero
offset. Better designs typically employ an op-amp in a voltage-follower (unity-gain) configuration,
which can handle rail-to-rail input and output signals. However, even the simplest op-amps consisting
of differential and gain stages require 7 transistors. Contrast that with current-mode de-
signs, which can communicate information over long distances with relatively little signal
degradation.
In addition to the general advantages of current-mode circuits, there are also specific
benefits when these designs are used to implement neuromemristive architectures and sys-
tems. Consider the two configurations in Figure 3.15.

[Figure 3.15 schematics: (a) a pre-synaptic neuron with output voltage v_{x_j}^{(l)} and output impedance R_o drives the outgoing synapses w_{i-1,j}^{(l+1)}, w_{i,j}^{(l+1)}, w_{i+1,j}^{(l+1)}, ... directly; (b) the pre-synaptic neuron's output current i_{x_j}^{(l)} is copied to each outgoing synapse.]
Figure 3.15: (a) Voltage-mode NMS, where neuronal activations are represented as volt-
ages and synapses typically operate via transconductance. (b) Current-mode NMS, where
neuronal activations and the results of synaptic weighting are represented as currents.

In the first case (Figure 3.15(a)), a pre-synaptic neuron has a voltage output $v_{x_j}^{(l)}$,
which falls across an output impedance $R_o$ connected to a small-signal ground. Here,
each neuron is modeled as an ideal voltage source that implements a linear activation
function. However, the pre- and post-synaptic neurons may generally have any activation
function. The gain of the pre-synaptic neuron can be described as $A_v = g_m R_o$, where
$g_m$ is a transconductance factor. Now, consider the direct connection of the pre-synaptic
neuron to all of its outgoing synapses, which can be characterized by several parallel
conductance values. In this case, the gain of the pre-synaptic neuron becomes

$$A_v = g_m\left(R_o \,\|\, \cdots \,\|\, \frac{1}{G^{(l+1)}_{i-1,j}} \,\|\, \frac{1}{G^{(l+1)}_{i,j}} \,\|\, \frac{1}{G^{(l+1)}_{i+1,j}} \,\|\, \cdots\right).$$

Therefore, if the neuron's fanout is high, then small weight values in the outgoing synapses
(typically represented as high conductances) will significantly diminish the neuron gain. For
this reason, it is usually best practice to add a buffer before each outgoing synapse to reduce
the loading effect on the pre-synaptic neuron. In fact, this technique is analogous to biological
nervous systems, where each outgoing synapse is buffered using synaptic vesicles held within
the pre-synaptic terminal (synaptic bouton). As discussed earlier, buffering voltage-mode
analog neurons is expensive in terms of hardware area. In contrast, a current-mode design
(Figure 3.15(b)) affords the ability to buffer pre-synaptic neuron outputs using current-
controlled current sources (typically simple CMOS current mirrors), which have smaller
area and power requirements. One voltage-mode design that has overcome the loading is-
sue is the memristor bridge synapse proposed by Kim et al. [39, 51, 87, 88]. The design
uses complementary memristive devices in series combinations such that the total synapse
input resistance is always high, regardless of the weight state. However, the design is com-
plex, requiring 4 memristive devices and 3 MOSFETs for each synapse circuit.
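The loading argument can be made concrete by evaluating the parallel-combination gain expression for increasing fanout. In this sketch the values of g_m, R_o, and the per-synapse conductance are hypothetical, chosen only to show the trend, and every synapse is assumed to present the same conductance.

```python
def parallel(*resistances):
    """Parallel combination R1 || R2 || ... [ohm]."""
    return 1.0 / sum(1.0 / r for r in resistances)


def loaded_gain(gm, ro, synapse_conductances):
    """A_v = g_m (R_o || 1/G_1 || 1/G_2 || ...) for an unbuffered voltage-mode neuron."""
    loads = [1.0 / g for g in synapse_conductances]
    return gm * parallel(ro, *loads)


gm, ro = 1e-4, 1e5  # hypothetical: 100 uS transconductance, 100 kOhm output impedance
g_syn = 1e-6        # hypothetical conductance of each outgoing synapse [S]
for fanout in (0, 1, 10, 100):
    print(fanout, round(loaded_gain(gm, ro, [g_syn] * fanout), 3))
```

With these numbers the unloaded gain of 10 falls to about 5 at a fanout of 10 and below 1 at a fanout of 100, which is the degradation that per-synapse buffering is meant to prevent.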
Although the current-mode design approach is attractive, there are some challenges to
consider when designing NMSs using current-mode circuits. First, a current can’t be dis-
tributed through multiple circuit branches without buffering. Second, since current mirrors
are employed extensively, current-mode designs are especially prone to mismatch-related
problems. The effects of mismatch on the circuit, architecture, and system-level perfor-
mance are studied extensively in this work.
3.5 Summary
This chapter provided a detailed discussion on the device models (MOSFETs and memris-
tors), simulation strategy, and design methodology adopted in this work. The key contribu-
tions and results from this chapter are:
• Semi-empirical MOSFET (45 nm high Vth0) and memristor (Ag chalcogenide and
CBRAM) models were designed to capture essential device behavior while minimiz-
ing model complexity. Key memristor properties that can be improved for future
work are the on resistance, which should be increased for easier programming, and
the conductance ratio g, which should also be increased to improve sense margins.
• The device models designed in this work can be simulated in SPICE or MATLAB.
While SPICE provides more detailed simulation results (especially transient behav-
ior), MATLAB simulations yield orders-of-magnitude reduction in simulation time.
• MOSFET and memristor devices will be integrated, primarily, into subthreshold
current-mode circuits for primitive NMS operations. The current-mode approach
has several advantages, including reduced design complexity, improved input/output
range, etc. There are also some challenges to consider with a current-mode design,
including device mismatch, which is exacerbated in subthreshold.
Chapter 4
Synapse and Neuron Circuits
This chapter presents the synapse and neuron circuits designed in this work. Synapses
were designed for constant weight values, random weight values, and adjustable weight
values. Careful attention is given to minimizing the synaptic area. This is critical, since
the number of synapses in an NMS (or any neural network) grows quadratically with the
number of neurons. Previous synapse designs have assumed ideal memristor behavior,
where switching is deterministic and continuous [33–40, 89]. For example, many designs
have been simulated using a linear ionic drift memristor model [19], characterized by a
smooth hysteresis curve. However, memristor switching behavior is usually discontinuous,
indicating the devices can only achieve a small set of conductance states. Furthermore,
the exact values of these states and the required conditions (e.g. write voltage) have high
variability. This work captures memristive circuit behavior that accurately reflects the ex-
perimental characteristics of the devices. The neuron designs presented in this chapter
include common activation functions (e.g. sigmoid, linear, and threshold), as well as peri-
odic and rectified linear. Behavioral, power, area, and variation models are presented for
each circuit. In later chapters, these models are integrated into system-level simulations of
NMSs for visual information processing.
Chapter 4. Synapse and Neuron Circuits 53
4.1 Synapse Circuits
4.1.1 Constant Current Mirror Synapse
4.1.1.1 Basic Operation
[Figure 4.1 schematics: (a) excitatory synapse, a PMOS current mirror from V_DD with size ratio β_{1_{i,j}} : β_{2_{i,j}}, whose input device (source-drain voltage v_{sd1_{i,j}}) belongs to the pre-synaptic neuron and carries i_{x_j}, and whose output device (v_{sd2_{i,j}}) forms the synapse and carries i_{s_{i,j}}; (b) inhibitory synapse, the corresponding NMOS current mirror to −V_SS with drain-source voltages v_{ds1_{i,j}} and v_{ds2_{i,j}}.]
Figure 4.1: Constant current mirror synapse circuits for (a) positive weights and (b) nega-
tive weights.
NMSs often employ synapses with constant and sometimes random weight values.
Two network topologies where these are encountered are MLPs with random hidden layer
weights [90] and reservoirs [91]. In these topologies, constant and random weights are
used to map data from lower to higher dimensional spaces and vice versa. Additionally,
they sometimes form feedback connections in recurrent networks. One approach to implementing
constant weights is to use a CMOS current mirror, as shown in Figure 4.1. The synapse
is the output MOSFET, while the input MOSFET is part of the pre-synaptic neuron. The
weight value is defined as
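$$w_{i,j} \equiv \frac{i_{s_{i,j}}}{i_{x_j}} = \frac{\beta_{2_{i,j}}}{\beta_{1_{i,j}}}\exp\left(\frac{\Delta V_{th0}}{nV_T}\right)\frac{\Lambda(v_{ds2},L_2)_{i,j}}{\Lambda(v_{ds1},L_1)_{i,j}} = w'_{i,j}\,\Pi_{i,j}. \tag{4.1}$$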
The ratio of Λ functions occurs frequently, so it has been given the designation Πi,j . If both
MOSFETs in the mirror have the same length and the same vds value, then the synaptic
weight w′i,j will be the ratio of the current factors β2i,j : β1i,j of the two MOSFETs (the
gain of the current mirror), which is controlled via sizing. PMOS mirrors allow the synapse
to excite the post-synaptic neuron (positive weights), whereas NMOS mirrors inhibit the
post-synaptic neuron (negative weights).
There are several important design aspects related to the synapse circuits in Figure 4.1.
First, the maximum or minimum voltages at the output nodes are equal to VDD−Vdssat and
−VSS +Vdssat , respectively, where Vdssat ≈ 100 mV. Second, the inputs to the synapses are
unipolar, so
$$s_{i,j} = H(x_j)\,x_j\,w_{i,j} = H(x_j)\,x_j\,w'_{i,j}\,\Pi_{i,j}, \tag{4.2}$$
where $H(\cdot)$ is the Heaviside step function, $v_{ds1_{i,j}}$ is the drain-source (source-drain for
PMOS) voltage of the input MOSFET, and $v_{ds2_{i,j}}$ is the drain-source voltage of the output
MOSFET. Note that this equation is approximate, because $s_{i,j} \neq 0$ when $x_j \le 0$. Rather,
it will be bounded by the leakage currents of the MOSFETs.
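The weight definition and the unipolar-input output can be sketched directly. This is a minimal illustration, assuming that the Λ ratio Π is supplied as a number (the Λ function itself, from Chapter 3, is not reproduced here), that n V_T ≈ 1.5 × 25.8 mV, and that the output is the input scaled by w (from w = i_s/i_x) and clipped to zero for non-positive inputs, with the leakage floor ignored.

```python
import math


def mirror_weight(beta_ratio, delta_vth0=0.0, n_vt=1.5 * 0.0258, pi_ratio=1.0):
    """Current-mirror synapse weight per (4.1):
    w = (beta2/beta1) * exp(dVth0 / (n V_T)) * Pi, with Pi the Lambda ratio."""
    w_prime = beta_ratio * math.exp(delta_vth0 / n_vt)
    return w_prime * pi_ratio


def synapse_output(x, w):
    """Unipolar-input synapse: s = H(x) * x * w (MOSFET leakage for x <= 0 ignored)."""
    return x * w if x > 0 else 0.0


w = mirror_weight(0.5)           # nominal devices, equal v_ds: w = beta2/beta1
print(synapse_output(2.0, w))    # 1.0
print(synapse_output(-1.0, w))   # 0.0
```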
It is also important to consider the case where the vds values are different for each MOS-
FET in the mirror. For a given current flowing through the input transistor, Equation (3.8)
can be used to find the value of vds1i,j . If vds2i,j is constant, then (3.8) can be calculated
immediately. In general, however, vds2i,j will vary as a function of the synaptic current.
This means that, especially for small channel lengths, |wi,j| will not be constant, as it is
typically assumed to be in artificial neural networks. To reduce the effect of the Λ ratio in
(4.2), one can increase the channel lengths. For example, just doubling the channel lengths
(i.e. from 45 nm to 90 nm) will significantly reduce the dependence of the weight values
on variations in vds. Another alternative would be to use a current mirror with higher output
impedance, such as a cascode configuration. Both options, however (doubling the channel
length, or using cascode mirrors with minimum channel length), will double the synaptic
area. In addition, the use of a cascode configuration results in additional MOSFETs be-
tween the supply rails, reducing (increasing) the maximum (minimum) synaptic input and
output voltages.
4.1.1.2 Area and Power
The area of a constant current mirror synapse is equal to the area of the output MOSFET,
$W_{2_{i,j}}L_{2_{i,j}}$, which will depend on the synaptic weight. Let $A^*_{cs_{i,j}}$ be defined as
$$A^*_{cs_{i,j}} = A_{\min}\,w'_{i,j}\left(\frac{W_{1_{i,j}}}{L_{1_{i,j}}}\right)\frac{L_{2_{i,j}}^2}{A_{\min}}, \tag{4.3}$$
where Amin =45 nm × 45 nm is the minimum area of a transistor. Now, the synaptic area
can be expressed as
$$A_{cs_{i,j}} = A^*_{cs_{i,j}}\left\lceil\left(\frac{A^{\min}_{cs}}{A^*_{cs_{i,j}}}\right)^{1/2}\right\rceil^2, \tag{4.4}$$
where $A^{\min}_{cs}$ is a minimum area constraint. Equation (4.4) warrants some explanation. If
$A^{\min}_{cs} \le A^*_{cs_{i,j}}$, then $A_{cs_{i,j}} = A^*_{cs_{i,j}}$. Otherwise, the area increases to meet the minimum
area requirement. While increasing the area, one must maintain the same $\beta_{2_{i,j}}$ to keep
$w'_{i,j}$ constant. This amounts to increasing both the length and width by the same integer factor,
which increases the area by the square of that factor.
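The area rule can be sketched with areas expressed in units of A_min = 45 nm × 45 nm. Two assumptions are made here: β is taken as proportional to W/L with identical process parameters, so W2/L2 = w'·(W1/L1) and the output-transistor area is W2·L2 = w'·(W1/L1)·L2²; and L2 is given in units of the 45 nm minimum length. The scaling step then multiplies both W2 and L2 by the smallest integer k that meets A^min_cs, as described above.

```python
import math


def a_star(w_prime, w1_over_l1, l2_over_lmin):
    """Output-transistor area in units of A_min: W2*L2 = w' (W1/L1) L2^2,
    with L2 in units of L_min, so that L2^2 / A_min = l2_over_lmin**2."""
    return w_prime * w1_over_l1 * l2_over_lmin ** 2


def synapse_area(a_star_units, a_min_cs_units):
    """Scale W2 and L2 by the smallest integer k with k^2 * A* >= A_min_cs.

    k = 1 when A* already meets the constraint; W2/L2 (and hence w') is unchanged.
    """
    k = math.ceil(math.sqrt(a_min_cs_units / a_star_units))
    return a_star_units * k ** 2


a = a_star(0.1, 10, 1)           # w' = 0.1, W1/L1 = 10, minimum-length L2 -> 1 A_min
print(synapse_area(a, 25.0))     # k = 5 -> 25 A_min
print(synapse_area(2.0, 25.0))   # k = 4 -> 32 A_min (overshoots the constraint)
```

The second call shows why the integer constraint matters: keeping W2/L2 fixed can force the area noticeably above A^min_cs.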
The power consumption of the constant current mirror synapse can be approximated at
low frequencies as
$$P^{avg}_{cs_{i,j}} \approx \eta_j\,w_{i,j}\,I_{\max}\,V_{DD}, \tag{4.5}$$
where $\eta_j$ is the activity factor of the pre-synaptic neuron. A value of η = 0.5 is usually a good
estimate.
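Equation (4.5) is a one-line estimate; the sketch below evaluates it with hypothetical values (I_max = 1 nA, V_DD = 1 V). It takes the magnitude of w so that inhibitory (negative) weights also give a positive power, which is an interpretation assumed here rather than stated in (4.5).

```python
def avg_synapse_power(w, i_max, vdd, activity=0.5):
    """Low-frequency average power per (4.5): P ~= eta * |w| * I_max * V_DD [W]."""
    return activity * abs(w) * i_max * vdd


# Hypothetical operating point: |w| = 1, I_max = 1 nA, V_DD = 1 V, eta = 0.5.
print(avg_synapse_power(1.0, 1e-9, 1.0))  # about 0.5 nW
```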
4.1.1.3 Process Variations
Variations in threshold voltage will have a large impact on most of the NMS primitive cir-
cuits. For the constant current mirror synapse, the effect will be reflected in the distribution
of w′i,j , defined in (4.1). Recall that ∆Vth0 is a Gaussian-distributed random variable with
zero mean and variance given by (3.9). In addition, any desired (discrete) distribution may
be chosen for the current factors by sizing transistors appropriately. The probability density
function for |w′| becomes
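$$
f(|w'|) =
\begin{cases}
\displaystyle\sum_{\phi} \Pr(\Phi=\phi)\,\frac{\beta_1}{\beta_2}\,\ln\mathcal{N}\!\left(|w'|\frac{\beta_1}{\beta_2};\;0,\;\frac{1}{nV_T}\sqrt{\frac{1}{2}\frac{A_{V_{th0}}^2}{W_1L_1}+\frac{1}{2}\frac{A_{V_{th0}}^2}{W_2L_2}}\right), & |w'|>0\\[1ex]
\Pr(|w'|=0)\,\delta(|w'|), & |w'|=0
\end{cases}
\tag{4.6}
$$
where Φ is a random set of W1, L1, W2, and L2, and φ is a particular value of Φ. In addition,
$\ln\mathcal{N}(x;\mu,\sigma)$ is the log-normal PDF with µ and σ being the mean and standard deviation of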
the associated normal distribution. Finally, δ (·) is the Dirac delta function. The distribution
represented by (4.6) can be derived based on Rohatgi’s result for the distribution of products
of random variables [92]. The delta function accounts for the fact that weight values equal
to 0 are represented by removing the synaptic connection between a pre- and post-synaptic
neuron, so it no longer makes sense to define the distribution in terms of transistor geometry
and threshold voltage variation when |w′| =0.
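For a single geometry φ, (4.6) says that ln(|w'|·β1/β2) is Gaussian with zero mean and standard deviation sqrt(A²/(2W1L1) + A²/(2W2L2))/(nV_T). The Monte Carlo sketch below checks that reading. The mismatch coefficient A_Vth0 = 2.5 mV·µm and n V_T = 1.5 × 25.8 mV are assumed values (the thesis's Equation (3.9) is not reproduced here), and the full (4.6) would be a mixture of such terms weighted by Pr(Φ = φ), plus the delta at zero.

```python
import math
import random

N_VT = 1.5 * 0.0258    # assumed n * V_T at ~300 K [V]
A_VTH0 = 2.5e-9        # assumed mismatch coefficient, 2.5 mV*um expressed in V*m
A_MIN = 45e-9 * 45e-9  # minimum transistor area [m^2]


def lognormal_pdf(x, mu, s):
    """Log-normal PDF lnN(x; mu, s) for x > 0."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * s * s)) / (x * s * math.sqrt(2 * math.pi))


def log_sigma(w1l1, w2l2):
    """Std. dev. of ln(|w'| beta1/beta2), as read off (4.6)."""
    return math.sqrt(A_VTH0 ** 2 / (2 * w1l1) + A_VTH0 ** 2 / (2 * w2l2)) / N_VT


def weight_pdf(w_abs, beta_ratio, s):
    """One geometry term of (4.6): (beta1/beta2) * lnN(|w'| beta1/beta2; 0, s)."""
    return lognormal_pdf(w_abs / beta_ratio, 0.0, s) / beta_ratio


def sample_weights(beta_ratio, w1l1, w2l2, n, seed=0):
    """Monte Carlo: |w'| = (beta2/beta1) exp(dVth0 / (n V_T)), dVth0 zero-mean Gaussian."""
    rng = random.Random(seed)
    sigma_v = log_sigma(w1l1, w2l2) * N_VT  # threshold-offset std. dev. [V]
    return [beta_ratio * math.exp(rng.gauss(0.0, sigma_v) / N_VT) for _ in range(n)]


beta_ratio, w1l1, w2l2 = 0.5, 10 * A_MIN, A_MIN
s = log_sigma(w1l1, w2l2)
samples = sample_weights(beta_ratio, w1l1, w2l2, 20000)
logs = [math.log(w / beta_ratio) for w in samples]
mean = sum(logs) / len(logs)
std = math.sqrt(sum((v - mean) ** 2 for v in logs) / (len(logs) - 1))
print(f"model sigma {s:.3f}, Monte Carlo sigma {std:.3f}")
```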
[Figure 4.2 plots: probability density (0 to 1) versus w′ for Monte Carlo samples and the model; the w′ axis spans −2 to 2 in panels (a), (b), and (d), and −10 to 10 in panel (c).]
Figure 4.2: Probability densities for w′ in the constant current mirror synapse. (a)
W1/L1 = 10/1, $A^{\min}_{cs} = A_{\min}$. (b) W1/L1 = 10/1, $A^{\min}_{cs} = 25A_{\min}$. (c) W1/L1 = 1/1,
$A^{\min}_{cs} = A_{\min}$. (d) W1/L1 = 100/1, $A^{\min}_{cs} = A_{\min}$. In all cases W2/L2 is drawn from a
discrete uniform distribution between 1 and W1/L1. Zero-valued weights are also added
with probability 1/(1 + W1/L1).
The effects of threshold variation on constant current mirror weight distributions are
shown in Figure 4.2. Each plot shows the results of 100,000 Monte Carlo simulations
(performed in HSPICE) for both excitatory and inhibitory synapses, resulting in a total
of 200,000 samples. For each sample, W1, L1, and L2 were held constant while W2 and
∆Vth0 were drawn from a discrete uniform distribution and a continuous Gaussian distri-
bution, respectively. Specifically, W2 varied such that β2/β1 followed a discrete uniform
distribution between L1/W1 and 1. In addition, zero-valued weights were added with prob-
ability 1/(1 + W1/L1), which modified β2/β1 to be drawn from a uniform distribution between
0 and 1 with a resolution of L1/W1. In Figure 4.2(a), W1/L1 = 10/1 and $A^{\min}_{cs} = A_{\min}$. The
normally-distributed threshold variation changes the shape of the distribution of |w′| from
discrete uniform (what would be expected if AVth0 =0) to a continuous distribution with
long tails. This enables very high resolution weights at a much lower area cost than might
be predicted by (4.4). Based on the fact that the variance is related to the inverse of the
transistor area, one might expect to tighten the distribution by increasing the minimum
synapse transistor area $A^{\min}_{cs}$. Figure 4.2(b) shows the case where W1/L1 = 10/1 and $A^{\min}_{cs}$
has been increased to $25A_{\min}$. We see two effects. As expected, the distribution tightens,
with very low density outside of the [−1, 1] domain. In addition, one notices the formation
of peaks around the values ±1/10, ±2/10, . . ., because the continuous distribution approaches
the discrete distribution (where AVth0 =0) as the transistor areas increase. It’s also possi-
ble to use minimum sizing for all of the MOSFETs in the synapse circuit, resulting in the
distribution shown in Figure 4.2(c). Notice that the tails of the distribution are very long.
While using minimum sizing does reduce the synapse area, (4.5) reveals that it will also
increase the average synaptic power consumption. Therefore, there is a three-way trade-
off between the distribution shape, the synapse area, and the synaptic power consumption.
Figure 4.2(d) shows a fourth case where W1/L1 = 100/1 and $A^{\min}_{cs} = A_{\min}$. Observe the
difference between Figures 4.2(d) and 4.2(b). In both cases, the average synaptic area has
been increased, resulting in a tightening of the distribution. However, there are no promi-
nent peaks (except for the one at |w′| =0) in Figure 4.2(d). The reason is that, for larger
weight values (e.g. |w′|≈1), the large variance resulting from the β2/β1 ratio is cancelled
by the small variance from the larger synapse area. The opposite is true for smaller weight
values. This is also the reason that the peaks in Figure 4.2(b) are large near the center of
the distribution and smaller moving outward. Until now, we have made the tacit assump-
tion that synapses do not share pre-synaptic neurons. In practice, however, each neuron
will drive several post-synaptic neurons through synaptic connections. In Figure 4.1, this
amounts to multiple output MOSFETs being driven by the same diode-connected input
MOSFET. Consequently, every set of synapses belonging to a particular pre-synaptic neu-
ron will have its own distribution of weights, which could be problematic. For example,
consider the simple case where an NMS has a single input neuron, driving several hidden-
layer neurons through constant current mirror synapses. If we are lucky, the threshold
voltage of the input MOSFET will fall somewhere in the middle of the distribution. How-
ever, if the threshold voltage falls at an extreme value of the distribution, then all of the
magnitudes of the synaptic weights will be either very large or very small. In other words,
the mean of the distribution depends on the particular value of the threshold voltage of the
input MOSFET, which is reflected in the following conditional distribution:
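$$
f\left(|w'| \mid V_{th0_1}\right) =
\begin{cases}
\displaystyle\sum_{\phi} \Pr(\Phi=\phi)\,\frac{\beta_1}{\beta_2}\,\ln\mathcal{N}\!\left(|w'|\frac{\beta_1}{\beta_2};\;\frac{V_{th0p(n)}-V_{th0_1}}{nV_T},\;\frac{1}{nV_T}\sqrt{\frac{1}{2}\frac{A_{V_{th0}}^2}{W_2L_2}}\right), & |w'|>0\\[1ex]
\Pr(|w'|=0)\,\delta(|w'|), & |w'|=0
\end{cases}
\tag{4.7}
$$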
[Figure 4.3 plots: probability density (0 to 2) versus w′ (0 to 5) for synapses attached to different pre-synaptic neurons.]
Figure 4.3: Conditional distributions of w′ for excitatory synapses connected to different
pre-synaptic neurons with W2L2 = Amin. (a) W1L1 = Amin, resulting in a large variation
in the distributions’ means. (b) W1L1 = 10Amin, resulting in reduced variation of the
distributions’ means.

Figure 4.3 shows distributions of w′ from an excitatory synapse with different areas for
the input MOSFET. In Figure 4.3(a), W1L1 = Amin and W2L2 = Amin, resulting in
a large variation in the distributions’ means. In contrast, Figure 4.3(b) shows results for
W1L1 = 10Amin and W2L2 = Amin, which yields a tighter distribution of the means.
Therefore, one can reduce pre-synaptic neuron-dependent bias by increasing the size of
the input MOSFET.
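The shared-input-transistor effect can be reproduced with a small Monte Carlo sketch. Each pre-synaptic neuron draws one input-MOSFET threshold offset that is shared by all of its output MOSFETs, and each output MOSFET draws its own offset; the per-neuron mean of ln|w'| then spreads with the input device's variation. The mismatch coefficient, n V_T, fanout, neuron count, and the sign convention ΔVth0 = Vth1 − Vth2 are assumptions (the sign does not affect the spread), and β2/β1 = 1 throughout.

```python
import math
import random
import statistics

N_VT = 1.5 * 0.0258    # assumed n * V_T [V]
A_VTH0 = 2.5e-9        # assumed mismatch coefficient [V*m] (2.5 mV*um)
A_MIN = 45e-9 * 45e-9  # minimum transistor area [m^2]


def neuron_mean_log_weights(w1l1, w2l2, n_neurons=200, fanout=50, seed=1):
    """Per-neuron mean of ln|w'| when the input MOSFET is shared by `fanout` synapses."""
    rng = random.Random(seed)
    s1 = A_VTH0 / math.sqrt(2 * w1l1)  # std. dev. of the input-MOSFET Vth offset
    s2 = A_VTH0 / math.sqrt(2 * w2l2)  # std. dev. of each output-MOSFET Vth offset
    means = []
    for _ in range(n_neurons):
        dv1 = rng.gauss(0.0, s1)  # one draw, shared by every synapse of this neuron
        logs = [(dv1 - rng.gauss(0.0, s2)) / N_VT for _ in range(fanout)]
        means.append(statistics.fmean(logs))
    return means


spread_small = statistics.stdev(neuron_mean_log_weights(A_MIN, A_MIN))
spread_large = statistics.stdev(neuron_mean_log_weights(10 * A_MIN, A_MIN))
print(f"spread of per-neuron means: W1L1 = Amin -> {spread_small:.2f}, "
      f"W1L1 = 10 Amin -> {spread_large:.2f}")
```

Enlarging only the input MOSFET shrinks the spread of the per-neuron means, which is the qualitative trend reported for Figure 4.3.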
As a final note, it is important to remember that MOSFETs were assumed to be closely
spaced in the variation analysis, allowing one to neglect the spacing-dependent variation
of threshold voltage [62]. In fact, this assumption is used for all variation analyses in
this work. However, as a pre-synaptic neuron’s fanout increases, the spacing between the
input and output MOSFETs will also increase, making the assumption less valid. One
solution may be to have each pre-synaptic neuron drive multiple subsets of synapses through
dedicated input MOSFETs, each of which is sized large enough to ignore the effect of
threshold voltage variation. However, it would be very difficult to determine the number
of subsets required to keep the closely spaced assumption valid, and doing so would likely
require statistical data from fabricated test chips.
4.1.1.4 Constant Current Mirror Synapse with Bipolar Input
[Figure 4.4 schematic: the excitatory (PMOS, V_DD) and inhibitory (NMOS, −V_SS) constant current mirror synapses combined, each with size ratio β_{1_{i,j}} : β_{2_{i,j}}; the PMOS mirror is driven by the positive input component i_{x_j}^+ and the NMOS mirror by the negative component i_{x_j}^−, and their output currents combine to form i_{s_{i,j}}.]
Figure 4.4: Constant weight synapse circuit with bipolar input.
The constant current mirror synapse, as it has been presented so far, has a unipolar
input, making it incompatible with pre-synaptic neurons that have bipolar activation func-
tions, such as tanh. However, the design can be modified to accommodate bipolar inputs,
provided that the pre-synaptic activation function can be expressed as a difference of posi-
tive and negative components, $x_j = x_j^+ - x_j^-$. The circuit schematic is shown in Figure 4.4.
The circuit is essentially a combination of the excitatory and inhibitory constant weight
synapses, where the excitatory part is driven by the positive component of the pre-synaptic
neuron’s output, and the inhibitory part is driven by the negative component of the pre-
synaptic neuron’s output. The output is expressed as
$$s_{i,j} = x_j^+\,w'^+_{i,j}\,\Pi^+_{i,j} - x_j^-\,w'^-_{i,j}\,\Pi^-_{i,j}. \tag{4.8}$$
Of course, if the effects of vds are ignored and it is assumed that $w'^+_{i,j} = w'^-_{i,j} = w_{i,j}$, then
(4.8) just becomes $s_{i,j} = x_j w_{i,j}$.
The area and power of the synapse in Figure 4.4 will be double the area and power
of the unipolar input synapse. In addition, the effect of threshold voltage variation can be
modeled by following the same approach as in the unipolar case.
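The bipolar-input synapse can be sketched by splitting the pre-synaptic activation into its non-negative parts and applying (4.8). The split x+ = max(x, 0), x− = max(−x, 0) is one natural choice assumed here; with equal weights and Π = 1 the result collapses to x·w, as noted above.

```python
import math


def split_bipolar(x):
    """Split x into non-negative components with x = x_plus - x_minus."""
    return max(x, 0.0), max(-x, 0.0)


def bipolar_input_synapse(x, w_plus, w_minus, pi_plus=1.0, pi_minus=1.0):
    """Output per (4.8): s = x+ * w'+ * Pi+  -  x- * w'- * Pi-."""
    x_plus, x_minus = split_bipolar(x)
    return x_plus * w_plus * pi_plus - x_minus * w_minus * pi_minus


# Example with a tanh pre-synaptic neuron and matched excitatory/inhibitory weights.
for u in (-1.0, 0.0, 1.0):
    x = math.tanh(u)
    print(round(x, 4), round(bipolar_input_synapse(x, 0.8, 0.8), 4))
```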
4.1.2 Bipolar Weight Memristive Synapse
4.1.2.1 Basic Operation
In addition to the constant weight synapses presented above, an NMS requires adjustable
synapses to facilitate learning. Furthermore, it is advantageous to design synapse circuits
that can have bipolar weight values. The reason is that neural networks composed of bipo-
lar weights can generally separate and fit data better than those that have unipolar weight
values, especially with little or no input pre-processing. To this end, a current-mode mem-
ristive synapse was designed to achieve adjustable bipolar weight values. The design is in-
spired by the cortical microcircuit shown in Figure 4.5(a). Here, a post-synaptic pyramidal
neuron is driven by both excitatory (glutamatergic) and inhibitory (GABAergic) synapses.
The relative strength of the two synaptic connections determines an effective weight. The
circuit schematic is shown in Figure 4.5(b). The synapse’s input current is the output cur-
rent of the pre-synaptic neuron. Notice that both the diode-connected PMOSFET and the
diode-connected NMOSFET from the pre-synaptic neuron are used to mirror the input in
two places. The PMOS mirror has a 1:2 size ratio, so the output of the mirror is $2i_{x_j}$.

[Figure 4.5 diagrams: (a) cortical microcircuit with pyramidal neurons (pre-synaptic spike rate x_j, post-synaptic spike rate x_i), an interneuron, and glutamate (excitatory) and GABA (inhibitory) connections that together form an effective synapse between the pre- and post-synaptic neurons; (b) current-mode bipolar weight memristive synapse: the pre-synaptic input current i_{x_j} is mirrored 1:2 from V_DD and 1:1 to −V_SS, memristors G_m1 and G_m2 share a common top node, a write voltage v_w is applied through switches controlled by we, and the synapse output i_{s_{i,j}} flows into the post-synaptic neuron (input resistance R_in).]
Figure 4.5: Bipolar weight memristive synapse. Two anti-parallel memristors control the
relative ratio of excitation to inhibition at the output.

The memristors in the circuit are in parallel, since they share a common top node, and their
bottom nodes are both at 0 V. The memristor m1 is connected to the input of a post-synaptic
neuron. Notice that the input of this neuron is a virtual ground. Assuming this synapse
connects a jth pre-synaptic neuron to an ith post-synaptic neuron, then its output si,j can
be described as (assuming infinite op-amp gain)
$$s_{i,j} = 2x_j\,\Pi^+_{i,j}\,\frac{G_{m1}}{G_{m1}+G_{m2}} - x_j\,\Pi^-_{i,j}, \tag{4.9}$$
where Gm1 and Gm2 are the conductances of the two memristors. Therefore, the synaptic
weight is given by
$$w_{i,j} = 2\,\Pi^+_{i,j}\,\frac{G_{m1}}{G_{m1}+G_{m2}} - \Pi^-_{i,j}. \tag{4.10}$$
If the effects of the drain-source voltages are negligible, then this becomes
$$w'_{i,j} = 2\,\frac{G_{m1}}{G_{m1}+G_{m2}} - 1. \tag{4.11}$$
When both memristors have a high g ratio, $w'_{i,j}$ will range approximately from −1 to +1.
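The weight expressions (4.10) and (4.11) are easy to evaluate directly. For an on/off conductance ratio g, the extreme weights are ±(2g/(g+1) − 1) = ±(g − 1)/(g + 1), which approaches ±1 as g grows. The conductance values below are hypothetical.

```python
def memristive_weight(g_m1, g_m2, pi_plus=1.0, pi_minus=1.0):
    """Weight per (4.10): w = 2 Pi+ Gm1 / (Gm1 + Gm2) - Pi-; with Pi+ = Pi- = 1 this is (4.11)."""
    return 2.0 * pi_plus * g_m1 / (g_m1 + g_m2) - pi_minus


g_on, g_off = 1e-4, 1e-6  # hypothetical on/off conductances [S], ratio g = 100
print(memristive_weight(g_on, g_off))  # (g - 1)/(g + 1), close to +1
print(memristive_weight(g_off, g_on))  # close to -1
print(memristive_weight(g_on, g_on))   # 0.0
```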
Figure 4.6 shows the synaptic output characteristics for circuits with two different values
of the channel length L. In Figure 4.6(a), L = 45 nm, resulting in an increased range of the
synaptic weight value to [-1.6, 2.2]. Despite this increased range, the synapse’s linearity is
relatively unaffected by effects of drain-source voltages. The reason is that both Π+i,j and
Π−i,j are approximately constant. This stems from i.) the fact that the drain-source voltage
of the NMOS transistor is constant (because the drain is at a virtual ground), and ii.) the
small currents flowing through the memristors cause only very small changes in the PMOS
transistor’s drain voltage. The effects of the drain-source voltages are diminished further
when the channel length is increased, as shown in Figure 4.6(b). Here, L is increased by
a factor of 4, resulting in a synaptic weight range close to the ideal one. Of course, the
question becomes whether or not it matters if the weight range is increased (as in the case
Chapter 4. Synapse and Neuron Circuits 65
x j
0 0.5 1
s i
,j
-2
-1
0
1
2 -1.4 5  w i,j  5  1.9
Increasing wi,j
(a)
x j
0 0.5 1
s i
,j
-2
-1
0
1
2 -0.9 5 wi,j 5 1.2
Increasing wi,j
(b)
Figure 4.6: Bipolar weight memristive synapse output. (a) L = 45 nm, (b) L = 180 nm.
 
𝐺𝑚1 𝐺𝑚2
𝑅𝑇𝐺1
Equivalent write circuit for the current-mode bipolar 
weight memristive synapse
imemsynapsebipolarwriteequiv.pdf
𝑣𝑤
𝑅𝑇𝐺2
Figure 4.7: Equivalent write circuit for the bipolar weight memristive synapse.
of L = 45 nm). Since the synapse shows excellent linearity, its behavior can be modeled as a constant factor $w_{i,j}$ that multiplies the pre-synaptic output, with a range that is determined from SPICE simulations.
It is critical that the states of the memristors are not changed unintentionally. To ensure this, the maximum quotient of the memristor current and the memristor conductance (i.e., the maximum voltage across a memristor) should be well below the threshold voltage. Since the maximum current is on the order of $10^{-9}$ A and the minimum conductance is on the order of $10^{-5}$ S, the voltage across the memristors will nominally be on the order of 100 µV, which is about 3 orders of magnitude below the threshold voltages.
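The read-disturb margin argued above reduces to a single division, V ≈ I/G. The minimal sketch below reproduces it with the order-of-magnitude current and conductance from the text; the threshold voltage V_TH is a hypothetical placeholder chosen only to illustrate how the margin is computed.

```python
import math

I_MAX = 1e-9   # maximum memristor current, order of magnitude from the text (A)
G_MIN = 1e-5   # minimum memristor conductance, order of magnitude from the text (S)
V_TH = 0.1     # hypothetical memristor threshold voltage (V), illustrative only

# Worst-case voltage across a memristor during normal (non-write) operation.
v_read = I_MAX / G_MIN
# Number of decades between the read voltage and the switching threshold.
margin_decades = math.log10(V_TH / v_read)

print(f"v_read = {v_read * 1e6:.0f} uV, margin = {margin_decades:.1f} decades")
```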
Modification of the synaptic weight value is accomplished through two switches in the
synapse and post-synaptic neuron. The first switch, in the synapse, is controlled by a write enable signal $we$, which allows a write voltage $v_w$ to be applied to the memristors. Notice that the memristors are anti-parallel, so applying a positive write voltage will increase $G_{m1}$ and decrease $G_{m2}$, while applying a negative write voltage will decrease $G_{m1}$ and increase $G_{m2}$.
A second switch, which is part of the post-synaptic neuron, creates a strong connection to
ground at the negative terminal of the first memristor. Both switches are implemented using
transmission gates. A critical design consideration stems from the large ON resistance
of the 45 nm MOSFETs used in this work. During a write operation, if vw is not large
enough, then the series resistances (see Figure 4.7) of the switches will cause the voltage
across the memristors to be below their thresholds. In turn, the weight of the synapse will
not be modifiable. This is a critical point, which has often been missed in the literature,
where switches are frequently modeled with zero impedance. Figures 4.8(a) and 4.8(b)
show the transition of the synaptic weight from -1 to 1 and 1 to -1, respectively. In both
cases, a 1 µs, 3.5 V pulse was applied repeatedly, causing the memristors’ state variables
to transition. Both transitions take approximately $10^3$ pulses to complete. The transition rate is directly related to the learning rate $\alpha$ and can be controlled via the applied pulse width: shorter pulse widths correspond to smaller $\alpha$ values, and longer pulse widths correspond to larger $\alpha$ values. Furthermore, from (3.18), it can be seen that $\Delta\gamma$ is linear in $\Delta t$, so it will be assumed that $\Delta w_{i,j}$ is also linear in the applied pulse width.
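Under the linearity assumption above, each write pulse moves the weight by an amount proportional to its width. The minimal sketch below turns that into a pulse count. RATE is a hypothetical proportionality constant, calibrated so that 1 µs pulses need about $10^3$ pulses for a full −1 to +1 transition (the order of magnitude reported for Figure 4.8); it is not a measured device parameter.

```python
import math


def pulses_to_transition(w_start: float, w_end: float,
                         pulse_width: float, rate: float) -> int:
    """Pulses needed to move the weight from w_start to w_end, assuming each
    pulse changes the weight by rate * pulse_width (linear in pulse width)."""
    dw_per_pulse = rate * pulse_width
    # A small epsilon absorbs floating-point round-off before rounding up.
    return math.ceil(abs(w_end - w_start) / dw_per_pulse - 1e-9)


# Hypothetical rate (weight units per second of applied pulse width).
RATE = 2.0 / (1e3 * 1e-6)

n_1us = pulses_to_transition(-1.0, 1.0, 1e-6, RATE)
n_half = pulses_to_transition(-1.0, 1.0, 0.5e-6, RATE)
print(n_1us, n_half)
```

Halving the pulse width (a smaller $\alpha$) doubles the number of pulses needed for the same weight change, which is the trade-off between learning speed and update granularity described above.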
[Figure 4.8 plots: $w_{i,j}$ vs. pulse number on a logarithmic axis ($10^0$ to $10^4$), panels (a) and (b).]
Figure 4.8: Evolution of the weight in the bipolar weight memristive synapse. (a) The
weight is changed from -1 to 1 by applying positive write pulses. (b) The weight is changed
from 1 to -1 by applying negative write pulses. In both cases, |vw| = 3.5 V, and the write
pulse width is 1 µs.
4.1.2.2 Area and Power
The bipolar weight memristive synapse designed in this work is composed of 2 MOSFETs,
2 memristors, and a transmission gate, making the total area
$$A_{ms} \approx 3A^{min}_{ms} + 2A_{TG} + 8F^2, \qquad (4.12)$$
where $A^{min}_{ms}$ is a minimum area constraint for the two current mirror output transistors, $A_{TG}$ is the area of the transmission gate transistors, and $F$ is the wire pitch. The $8F^2$ term is not necessarily needed, since the memristors are fabricated in a back-end-of-line process, above the MOSFETs. Compared to the memristor bridge synapse proposed by Kim et al. [93], the memristive synapse designed in this work requires 2 fewer memristors and 1 fewer MOSFET. If $A^{min}_{ms} = A_{min}$, then this translates to a 38% reduction in area. However, as $A^{min}_{ms}$ becomes larger, this reduction in area becomes smaller.
The power consumption of the memristive synapse designed in this work can be broken
into two components. The first component is from normal signaling operation (when the
weight is not being adjusted):
$$P^{avg}_{ms,signal} \approx \eta_j\left(2\Pi^{+}_{i,j} + \Pi^{-}_{i,j}\right) I_{max} V_{DD} \qquad (4.13)$$
This is an order-of-magnitude approximation, since the maximum output of the pre-synaptic neuron may, in general, be larger or smaller than $I_{max}$ depending on process variations. One major advantage of the proposed design over the memristor bridge synapse [93] is that the power consumption is tied to the activity of the network. In [93], there is a constant power consumption originating from a differential pair bias current. As a result, the synapse design in this work reduces power consumption by approximately 25%. Although it has not been explored in this work, the proposed synapse design also affords the opportunity to reduce power consumption at the system level by making network activity sparse.
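For a rough budget, (4.13) can be evaluated directly. In the minimal sketch below, the activity factor, supply voltage, and $I_{max}$ are hypothetical example values, and Π+ = Π− = 1 corresponds to ideal current mirrors with no drain-source voltage effect.

```python
def synapse_signal_power(eta_j: float, pi_plus: float, pi_minus: float,
                         i_max: float, vdd: float) -> float:
    """Average signaling power of the bipolar weight synapse, per (4.13).
    This is an order-of-magnitude estimate, as noted in the text."""
    return eta_j * (2.0 * pi_plus + pi_minus) * i_max * vdd


# Hypothetical operating point: 10% activity, ideal mirrors, 1 nA, 1 V supply.
p = synapse_signal_power(eta_j=0.1, pi_plus=1.0, pi_minus=1.0,
                         i_max=1e-9, vdd=1.0)
print(f"{p:.2e} W")  # 0.1 * 3 * 1 nA * 1 V = 0.3 nW
```

Because the estimate scales with $\eta_j$, an idle pre-synaptic neuron contributes no signaling power, which is the activity-dependence argued above.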
Another key attribute of the memristive synapse designed in this work is that its power
consumption is independent of the memristors’ states, which is not the case for other
synapse designs proposed in the literature [33–38, 40, 41]. This independence stems from
the current-mode design paradigm adopted in this work and has implications for defect
tolerance. For example, consider a short-circuit defect in the second memristor in Figure
4.5(b). Since the synapse’s input is a current, the defect will not affect the average power
consumption. However, if the input (which is the output of the pre-synaptic neuron) were a voltage, then a short circuit to ground could cause a large increase in the circuit's power consumption, which could damage the chip.
The second component of the power consumption is from adjusting the synaptic weight:
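$$P^{avg}_{ms,write} \approx D_{train}\,\frac{1}{2}\,\frac{v_w^2}{R_{TG1} + \left[\left(R_{m_{avg}} + R_{TG2}\right) \parallel R_{m_{avg}}\right]}, \qquad (4.14)$$
where $D_{train}$ is the duty factor of the training clock, which is in its positive phase while writing and in its negative phase otherwise. The mean memristor resistance $R_{m_{avg}}$ is given as $R_{m_{avg}} \equiv \left(R_{m_{on}} + R_{m_{off}}\right)/2$.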
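Expression (4.14) can be evaluated with a small helper for the parallel combination. In the minimal sketch below, $v_w$ = 3.5 V is taken from the text, but the duty factor, memristor ON/OFF resistances, and transmission-gate resistances are hypothetical. The node voltage after $R_{TG1}$ assumes the series/parallel topology implied by the denominator of (4.14); it illustrates the earlier point that the switch resistances take a significant share of $v_w$.

```python
def parallel(r_a: float, r_b: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)


def write_power(d_train, v_w, r_tg1, r_tg2, r_on, r_off):
    """Average write power per (4.14), using the mean memristor resistance."""
    r_m_avg = (r_on + r_off) / 2.0
    r_total = r_tg1 + parallel(r_m_avg + r_tg2, r_m_avg)
    return d_train * 0.5 * v_w ** 2 / r_total


# v_w is from the text; every other value is a hypothetical example.
D_TRAIN, V_W = 0.5, 3.5
R_TG1 = R_TG2 = 20e3          # assumed transmission-gate ON resistances (ohm)
R_ON, R_OFF = 10e3, 100e3     # assumed memristor resistances (ohm)

p_write = write_power(D_TRAIN, V_W, R_TG1, R_TG2, R_ON, R_OFF)

# Voltage left at the memristor node after the drop across R_TG1
# (assumed topology: R_TG1 in series with the parallel network of (4.14)).
r_m_avg = (R_ON + R_OFF) / 2.0
r_net = parallel(r_m_avg + R_TG2, r_m_avg)
v_node = V_W * r_net / (R_TG1 + r_net)
print(f"P_write = {p_write * 1e6:.1f} uW, v_node = {v_node:.2f} V")
```

With these placeholder resistances, roughly 40% of $v_w$ is lost across $R_{TG1}$ alone, which is why an ideal zero-impedance switch model can overestimate the voltage actually available for programming.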
4.1.2.3 Process Variations
The memristive synapse in Figure 4.5(b) is affected by both MOSFET and memristor variations. The MOSFET variations are modeled in a similar way as for the constant current mirror synapse, where the dominant effect is from variations in $V_{th0}$. To start, notice that (4.9) can be written as
$$s_{i,j} = s^{+\prime}_{i,j}\Pi^{+}_{i,j}\frac{G_{m1}}{G_{m1}+G_{m2}} - s^{-\prime}_{i,j}\Pi^{-}_{i,j} = w^{+\prime}_{i,j}x_j\Pi^{+}_{i,j}\frac{G_{m1}}{G_{m1}+G_{m2}} - w^{-\prime}_{i,j}x_j\Pi^{-}_{i,j}. \qquad (4.15)$$
Here, $w^{+\prime}_{i,j}$ and $w^{-\prime}_{i,j}$ are the weight values associated with the two current mirrors, which can be modeled using (4.6) or (4.7).
Memristor variations, discussed in Section 3.2.3, also have an effect on the behavior
of the synapse circuit. In particular, variations in the on and off conductance values will
have a direct impact on the maximum and minimum weight values, which can be seen from
(4.11). This can be especially problematic since a large fraction of trained weight values
within an NMS will lie at the extrema of the weight range. This will be discussed in more
detail in Chapter 7, where the effect of device variations on system-level performance is
analyzed.
4.1.2.4 Crossbar Implementation
The current-mode memristive synapse described above can also be integrated into a cross-
bar structure as shown in Figure 4.9. Memristors in the top row inhibit, or contribute a
negative component to the output, while memristors in the bottom row excite, or contribute
a positive component to the output. Therefore, each crossbar column represents one component of one weight vector $\mathbf{w}_i$, which can be positive or negative. If the opamp is assumed to have high open loop gain and the wire resistances are small, then
$$x_i = \sum_{j=1}^{N} u^{(p)}_j I_{max} R \left(\frac{G_{m2} - G_{m1}}{G_{m1} + G_{m2}}\right)_{i,j}, \qquad (4.16)$$
where $G_{m1}$ and $G_{m2}$ are the conductances of the top and bottom memristors in each column, respectively. The advantages of this circuit over the stand-alone memristive synapse are that the area is reduced by one transistor and the inputs are bipolar (i.e., the input to each synapse can be a positive or negative current). The power consumption is also reduced to
$$P^{avg}_{ms,signal} \approx \eta_j\, 2\Pi_{i,j} I_{max} V_{DD}, \qquad (4.17)$$
which is about 2/3 of the power consumption for the non-crossbar based design. The pri-
mary disadvantage arises from unwanted current sneak paths that result from the crossbar
design.
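Equation (4.16) is a weighted sum in which each memristor pair contributes a normalized weight $(G_{m2} - G_{m1})/(G_{m1} + G_{m2})$ in [−1, 1]. The minimal sketch below evaluates it for one post-synaptic output and also checks the 2/3 power ratio between (4.17) and (4.13) for ideal mirrors (Π = 1). The conductances, inputs, $I_{max}$, and $R$ are hypothetical example values, and sneak-path currents are ignored.

```python
def crossbar_output(u, g_top, g_bottom, i_max, r):
    """Summing-amplifier output x_i of (4.16), ignoring sneak paths.
    u[j]        : normalized pre-synaptic outputs u_j^(p) (may be negative)
    g_top[j]    : conductance of the top (inhibitory) memristor, G_m1
    g_bottom[j] : conductance of the bottom (excitatory) memristor, G_m2
    """
    return sum(u_j * i_max * r * (g2 - g1) / (g1 + g2)
               for u_j, g1, g2 in zip(u, g_top, g_bottom))


# Hypothetical example: one excitatory-dominated and one inhibitory-dominated pair.
x_i = crossbar_output(u=[1.0, 0.5],
                      g_top=[1e-5, 1e-3],
                      g_bottom=[1e-3, 1e-5],
                      i_max=1e-9, r=1e6)

# Ratio of crossbar power (4.17) to stand-alone power (4.13) with Pi = 1.
power_ratio = (2 * 1.0) / (2 * 1.0 + 1.0)
print(f"x_i = {x_i:.3e} V, power ratio = {power_ratio:.3f}")
```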
[Figure 4.9 schematic: memristor crossbar for $\mathbf{w}_i$ with pre-synaptic neuron output currents $i_{u_1}, \ldots, i_{u_N}$, write voltages $v_{w_{i,1}}, \ldots, v_{w_{i,N}}$, write enable $we$, and a post-synaptic neuron summing amplifier (resistors $R$, output $v_{x_i}$).]
Figure 4.9: Crossbar and summing amplifier circuit for computing the distance between
the input and a weight vector.
4.2 Neuron Circuits
Neurons are composed of two stages. An input stage integrates all of the signals coming
from pre-synaptic neurons (convergence). The integrated signal is then passed to a sec-
ond stage that applies an activation function and then distributes the new information to
other neurons through outgoing synapses (divergence). This section discusses circuits and
models of both of these stages.
4.2.1 Input Stages
There are two types of neuronal input stages implemented in this work. The first is an
opamp input stage, where all converging synapses are connected to a virtual ground via
an inverting opamp configuration. If the opamp has infinite open loop gain, then the input
4.2. Neuron Circuits 72
stage will sum the incoming signals as
$$s_i = -\sum_j w_{i,j} x_j I_{max} R_{in}, \qquad (4.18)$$
where the − sign comes from the inverting opamp configuration.¹ However, practical opamp designs will not have infinite gain, so it is important to model the behavior of $s_i$ for arbitrary values of the open loop gain $A_0$.
Figure 4.10(a) shows a simplified version of the bipolar weight memristive synapse
circuit, where the memristors have been replaced with resistors, the programming switch
has been removed, and the inputs are ideal current sources. The small signal model of the
input stage of the post-synaptic neuron is also shown. Generally, each neuron will have
multiple synapses connected to its input node, labeled Common Node in the figure (this is also the negative
input of the post-synaptic neuron’s opamp). Shown here is the synapse that connects the
jth neuron in the network to the ith neuron. When the finite gain is considered, the value
of si changes such that each synapse has an effective weight value of:
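$$w_{i,j} = \left(2\frac{G_{m1_{i,j}}}{G_{m1_{i,j}} + G_{m2_{i,j}}}\right)\frac{1 + A_0 + \dfrac{G_{m2_{i,j}} R_{in} i^{*}_{s_i}}{2 i_{x_j}}}{1 + A_0 + \displaystyle\sum_k \frac{G_{m1_{i,k}} G_{m2_{i,k}} R_{in}}{G_{m1_{i,k}} + G_{m2_{i,k}}}} - 1, \qquad (4.19)$$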
where the index $k$ has the same range as $j$, and $i^{*}_{s_i}$ is the sum of all of the individual synapse input currents. When the opamp gain $A_0$ is large, $w_{i,j}$ reduces to the original definition. However, when the gain is smaller, the accuracy of the synapse circuit is degraded. This is illustrated in Figure 4.10(b), where the maximum absolute value of the fractional error in $s_i$
¹ Note that here, $s_i$ is normalized to a voltage instead of a current.
[Figure 4.10: (a) schematic of synapse $i,j$ (current sources $i_{x_j}$ and $2i_{x_j}$ between $V_{DD}$ and $-V_{SS}$, memristor conductances $G_{m1_{i,j}}$ and $G_{m2_{i,j}}$) connected to the common node of post-synaptic neuron $i$, whose opamp is modeled with $R_{in}$, $R_o$, and a dependent source $A_0 v_{s_i}$. (b) Maximum absolute value of the fractional error in $s$ vs. number of inputs (0 to 100) and opamp gain (60 to 100 dB).]
Figure 4.10: (a) Simplified model of the bipolar weight memristive synapse connected to
the input of a post-synaptic neuron. (b) Number of neuron inputs and opamp gain vs. the
maximum absolute value of the fractional error between the total synaptic output current
and the ideal total synaptic output current.
is plotted versus the number of synapses connected to neuron $i$ and the gain of the neuron's opamp. The fractional error is defined as $|(\bar{s}_i - s_i)/\bar{s}_i|$, where $\bar{s}_i$ is the expected ($A_0 = \infty$) sum of all synapse outputs divided by $I_{max}$. For each set of independent parameters (number of inputs and opamp gain), 1000 sets of random values were picked for each synapse's conductance and input current. The value of $s_i$ was determined using (4.18) and (4.19), while $\bar{s}_i$ was determined using (4.18) and (4.11). The value plotted in Figure 4.10(b) is the maximum over all 1000 sets of random parameters. For this work, an opamp was designed with gain $A_0 > 100$ dB, which enables neurons to have approximately 50 synaptic inputs while keeping the fractional error below approximately 20%. The opamp design uses a high-gain folded cascode input stage and a common source output stage.
When an opamp is not needed (for virtual ground or otherwise), a single grounded
resistor or memristor can be used for the neuron’s input stage. A memristor may be a better
choice because it will be significantly smaller and can be adjusted to change the shape of
the neuronal response. In that case, the ideal input current from the synapses is summed
across the resistor (or memristor) as
$$s_i = \sum_j w_{i,j} x_j I_{max} R_{in}. \qquad (4.20)$$
The only difference between (4.18) and (4.20) is the sign. However, consider the case
where the converging synapses are current mirror-based designs, as in Figure 4.11. If the
current mirrors have short channel lengths, then their output will depend exponentially on
$v_{ds}$, which is related to $v_{s_i}$ in this case. However, the output current also depends linearly on $v_{s_i}$ through Ohm's law: $i_{s_i} = v_{s_i}/R_{in}$. This leads to a transcendental equation in $v_{s_i}$ that can be approximated as (see Appendix A)
$$s_i \approx \frac{2C}{I_{max}\left(-B + \sqrt{B^2 - 4AC}\right)}, \qquad (4.21)$$
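[Figure 4.11 schematic: constant current mirror synapses with output currents $i_{s_{i,1}}, \ldots, i_{s_{i,N_e}}$ and $i_{s_{i,N_e+1}}, \ldots, i_{s_{i,N_e+N_i}}$ (supplies $V_{DD}$ and $-V_{SS}$) converging on the post-synaptic neuron input, where the total current $i_{s_i}$ flows through $R_{in}$ to produce $v_{s_i}$.]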
Figure 4.11: Constant current mirror synapses converging on a resistive input stage of a
post-synaptic neuron.
where $A$, $B$, and $C$ are constants that depend on the synaptic weights, pre-synaptic outputs, and channel lengths.
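The form of (4.21) is the alternative form of the quadratic formula (sometimes called the "citardauq" form): a root of $A z^2 + Bz + C = 0$ written as $z = 2C/(-B + \sqrt{B^2 - 4AC})$. The minimal sketch below assumes, for illustration only, that the unknown $z$ is the total input current, so that $s_i = z/I_{max}$. The coefficient values are hypothetical, since the actual $A$, $B$, and $C$ come from the expressions in Appendix A. One practical property of this form is that it stays finite as $A \to 0$ (when $B < 0$), reducing to the linear, Ohm's-law-only solution $-C/B$.

```python
import math


def quadratic_root_alt(a: float, b: float, c: float) -> float:
    """Root of a*z**2 + b*z + c = 0 in the form used by (4.21):
    z = 2c / (-b + sqrt(b**2 - 4ac)), equal to (-b - sqrt(...)) / (2a) for a != 0."""
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("no real root for these coefficients")
    return 2.0 * c / (-b + math.sqrt(disc))


# Hypothetical coefficients: a weak quadratic term stands in for the v_ds effect.
A, B, C = 1e-3, -1.0, 0.5
I_MAX = 1.0  # normalization current (arbitrary units in this sketch)

z = quadratic_root_alt(A, B, C)
s_i = z / I_MAX
residual = A * z * z + B * z + C
print(f"z = {z:.6f}, residual = {residual:.1e}")
```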
The effect of this nonlinear input is illustrated in Figure 4.12, where the circuit in Figure 4.11 was simulated in HSPICE for 6 different cases. In the first 3 simulations (Figures 4.12(a)-4.12(c)), all transistor channel lengths are 45 nm, and the number of input synapses $N$ is 2, 100, and 1000, respectively. In each case, 100,000 Monte Carlo runs were simulated, varying the outputs of the pre-synaptic neurons (between 0 and 1). The magnitude of the weights was kept constant at 1. Similarly, Figures 4.12(d)-4.12(f) show simulation results for 2, 100, and 1000 inputs, but with the channel length set at 90 nm. Along with the Monte Carlo data are linear fits, with the equations shown in each plot. Ideally, the slope and intercept would be 1 and 0, respectively. However, the slope is observed to be larger for smaller channel lengths, and the intercept increases with an increasing number of inputs. This behavior is predicted by the model in (4.21) and is intuitive once one realizes that the effect of $v_{ds}$ is stronger for smaller channel lengths (yielding the increasing slope) and also stronger for PMOS devices (yielding the increasing
[Figure 4.12 plots: simulated $i_s$ [nA] vs. ideal $i_s$ [nA], showing Monte Carlo data and linear fits. Fit equations: (a) $y = 1.5x + 0.029$; (b) $y = 1.5x + 1.5$; (c) $y = 1.5x + 15$; (d) $y = 1.1x + 0.0023$; (e) $y = 1.1x + 0.11$; (f) $y = 1.1x + 1.1$.]
Figure 4.12: $i_{s_i}$ vs. ideal $i_{s_i}$ for a number $N$ of constant current mirror synapses converging on a resistive input stage of a neuron. (a) L = 45 nm, N = 2. (b) L = 45 nm, N = 100. (c) L = 45 nm, N = 1000. (d) L = 90 nm, N = 2. (e) L = 90 nm, N = 100. (f) L = 90 nm, N = 1000.
intercept). This analysis also reveals that a fairly large fan-in is possible without much nonlinearity. However, it is important to realize that this work has not considered the layout of such circuits, which would be challenging in terms of interconnect routing.
4.2.2 Sigmoid and Hyperbolic Tangent Activation Functions
4.2.2.1 Basic Operation
The second stage of a neuron applies an activation function to the summation of its inputs.
Sigmoid and hyperbolic tangent activation functions are most commonly used in artifi-
cial neural networks. Their saturating behavior closely approximates firing rates exhibited
by biological neurons. More importantly, because they are smooth and have continuous
derivatives, these activation functions can be used in gradient-based neural network train-
ing algorithms such as backpropagation.
The current-mode sigmoid/tanh neuron circuit is shown in Figure 4.13. The circuit
is essentially an NMOS differential amplifier. It is biased with a sink current equal to
$I_{max}$, which is mirrored in through a simple current mirror. The lengths of the transistors in
the current mirror are L = 180 nm to get better output impedance and current matching.
Since there are fewer neurons than synapses, we can afford to have larger transistors in the
neurons, allowing us to simplify the models by removing vds dependence. It can be shown
[94] that, if $x_i = x^{+}_{i1} = x^{+}_{i2}$, then
$$x_i = f_{sig}(s_i) = \frac{1}{1 + \exp\left(-\dfrac{\zeta_1 s_i I_{max} R_{in}}{n V_T}\right)}, \qquad (4.22)$$
where $R_{in}$ is from the input stage and controls the slope of the sigmoid function at $s_i = 0$. Here, it is assumed that the input voltage $v_{s_i}$ comes from an (inverting) opamp input stage. If the input stage is non-inverting (e.g., resistive), then the inputs to the differential pair would be switched. The positive output current is mirrored to another branch where it is
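[Figure 4.13 schematic: differential pair with input $v_{s_i}$, bias current $I_{max}$, 1:1 current mirrors between $V_{DD}$ and $-V_{SS}$, and output currents $i_{x^{+}_{i1}}$, $i_{x^{+}_{i2}}$, and $i_{x^{-}_{i}}$.]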
Figure 4.13: Current-mode sigmoid/tanh activation function circuit.
pulled down to $-V_{SS}$ through a diode-connected NMOS transistor. Therefore, the positive output current can be mirrored to a synapse through either a PMOS or an NMOS current mirror. This allows the neuron to connect to both excitatory and inhibitory constant current mirror synapses, as well as the bipolar weight memristive synapse.
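The model in (4.22) is straightforward to evaluate numerically. In the minimal sketch below, $\zeta_1$, $n$, $R_{in}$, and $I_{max}$ are hypothetical example values, and $V_T$ is the thermal voltage at roughly 300 K (about 25.85 mV). The helper uses a numerically stable logistic form so that large $|s_i|$ does not overflow, and the comparison at the end illustrates the statement above that $R_{in}$ sets the slope at $s_i = 0$.

```python
import math

V_T = 0.02585  # thermal voltage kT/q at about 300 K (V)


def f_sig(s_i, zeta1=1.0, i_max=1e-9, r_in=1e8, n=1.5, v_t=V_T):
    """Sigmoid activation of (4.22); all default parameters are hypothetical."""
    z = zeta1 * s_i * i_max * r_in / (n * v_t)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable branch for negative arguments
    return e / (1.0 + e)


def slope_at_zero(r_in, h=1e-6):
    """Central-difference estimate of d f_sig / d s_i at s_i = 0."""
    return (f_sig(h, r_in=r_in) - f_sig(-h, r_in=r_in)) / (2 * h)


print(f_sig(0.0), f_sig(1.0), f_sig(-1.0))
print(slope_at_zero(1e8), slope_at_zero(2e8))  # larger R_in -> steeper slope
```

Analytically, the slope at $s_i = 0$ is $\zeta_1 I_{max} R_{in}/(4 n V_T)$, so doubling $R_{in}$ doubles it.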
[Figure 4.14 plot: $x_i$ vs. $s_i$ (−1 to 1) for $x^{+}_{i1}$, $x^{+}_{i2}$ (cascode), the model, and $x^{+}_{i2}$ (simple); the inset zooms in around $s_i = 0$.]
Figure 4.14: Transfer characteristics of the sigmoid activation function circuit.
The activation function's transfer characteristics are shown in Figure 4.14. The solid and dotted lines are results from HSPICE simulations for $x^{+}_{i1}$ and $x^{+}_{i2}$, respectively. The dashed line is from the model given in (4.22). The model gives high accuracy (see the inset for a zoomed view at $s_i = 0$), with mean absolute errors of 0.013 and 0.009 for $x^{+}_{i1}$ and $x^{+}_{i2}$, respectively. It is important to justify the use of a cascode current mirror in the activation function circuit rather than a simple current mirror, which has only half of the area. The solid line in Figure 4.14 shows the HSPICE simulation result for $x^{+}_{i2}$ when a simple current mirror is used. The range of the output is more than 2× the range obtained when the cascode mirror is used. This is a direct result of a large $v_{ds}$ value across the mirror's output transistor, which results in a large $\Pi$ function for the mirror. Equivalently, the simple current mirror's output impedance is small, so the $v_{ds}$ value of the output transistor can cause a large current mismatch. Since the neuron's power consumption will be related to the output currents, it is important to reduce $x^{+}_{i2}$. The current matching can be improved in several ways, such as increasing the channel lengths of the transistors in the simple current mirror, or adding voltage dividers to the $x^{+}_{i2}$ branch (i.e., reducing $v_{ds}$ across the output transistor). However, it was found that the most area-efficient way to improve matching is to use a cascode mirror, which multiplies the output impedance by approximately the intrinsic gain $g_m r_o$ (i.e., if the simple current mirror's output impedance is $r_o$, then the cascode mirror's output impedance will be approximately $g_m r_o^2$).
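The benefit of cascoding can be estimated with the usual small-signal approximations: a drain-source voltage difference $\Delta v_{ds}$ between the mirror's input and output devices produces a current error of roughly $\Delta v_{ds}/r_{out}$, with $r_{out} \approx r_o$ for a simple mirror and $r_{out} \approx g_m r_o^2$ for a cascode. The minimal sketch below compares the two; $g_m$, $r_o$, $\Delta v_{ds}$, and $I_{max}$ are hypothetical values chosen only to show the scaling, not parameters extracted from the 45 nm models used in this work.

```python
def mirror_error(delta_vds: float, r_out: float) -> float:
    """Approximate output-current error caused by a drain-source voltage mismatch."""
    return delta_vds / r_out


# Hypothetical small-signal parameters for illustration only.
G_M = 2.6e-8      # transconductance (S)
R_O = 1e9         # output resistance of one transistor (ohm)
DELTA_VDS = 0.5   # drain-source mismatch between mirror input and output (V)
I_MAX = 1e-9      # nominal mirrored current (A)

err_simple = mirror_error(DELTA_VDS, R_O)               # r_out ~ r_o
err_cascode = mirror_error(DELTA_VDS, G_M * R_O * R_O)  # r_out ~ g_m * r_o**2

print(f"simple: {err_simple / I_MAX:.1%}, cascode: {err_cascode / I_MAX:.1%}")
```

The improvement factor is simply $g_m r_o$, so the benefit grows with the intrinsic gain of the devices.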
The circuit in Figure 4.13 can also be used to implement a hyperbolic tangent activation function. If $x_i = x^{+}_i - x^{-}_i$, then
$$x_i = f_{tanh}(s_i) = \tanh\left(\frac{\zeta_1 s_i I_{max} R_{in}}{n V_T}\right). \qquad (4.23)$$
Notice that the two current components would have to be subtracted and then applied to the
input of a synapse circuit. This work mostly focuses on the sigmoid, rather than the tanh
activation function.
The sigmoid neuron used in this work has several advantages over other designs. For example, voltage-mode designs [95] have a limited input/output voltage range and are more complex. Current-mode neurons that operate in superthreshold [96] have increased power consumption and do not exhibit a true sigmoid activation. Digital designs, such as the one in [97], rely on piecewise linear approximations, yielding large area overhead.
4.2.2.2 Area and Power Consumption
The sigmoid activation function circuit is composed of 10 transistors (the input transistor
for the bias current mirror is shared):
$$A_{sn} \approx A_{sn,bias} + 2A_{sn,dp} + 6A_{sn,pmirror} + A_{sn,nmirror}, \qquad (4.24)$$
where $A_{sn,bias}$ is the area of the NMOS bias transistor, $A_{sn,dp}$ is the area of the differential pair transistors, $A_{sn,pmirror}$ is the area of the PMOS transistors in the cascode configuration, and $A_{sn,nmirror}$ is the area of the NMOS transistor for mirroring $i_{x^{+}_{i2}}$. Note that $A_{sn,pmirror}$ and $A_{sn,nmirror}$ will depend on the minimum sizing requirements for the outgoing synapses.
The activation function's power consumption is estimated as
$$P_{sn} \approx 2V_{DD}\left(i_{x^{-}_i} + i_{x^{+}_{i1}} + \eta_i i_{x^{+}_{i2}}\right), \qquad (4.25)$$
where $\eta_i$ is the neuron's activity factor. Nominally, $i_{x^{-}_i} + i_{x^{+}_{i1}} = I_{max}$. However, the sum may be larger or smaller due to process variations.
4.2.2.3 Process Variations
Variations in $V_{th0}$ and $R_{in}$ have four primary effects on the behavior of the sigmoid activation function. Figure 4.15 shows the results of 1000 Monte Carlo simulations while varying $V_{th0}$. Variations in the current mirror for $I_{max}$ increase or decrease the maximum value of $x^{+}_{i1}$. Then, variations in the cascode current mirror cause $x^{+}_{i2}$ to vary from $x^{+}_{i1}$. Variations in the differential pair create an offset current, effectively shifting the activation function left or right. Finally, variations in $R_{in}$ (normally distributed with 30% variation) cause the sigmoid slope to increase or decrease. The overall effect can be written as
$$x^{+}_{i1} = \hat{f}_{sig_i}(s_i) = x_{max_i} f_{sig}\left(\left(1 + \frac{\Delta R_{in_i}}{R_{in}}\right)s_i + \Delta s_i\right). \qquad (4.26)$$
The distributions of the maximum output $x_{max_i}$ and its mirrored value $x^{+}_{i2}$ are modeled in the same way as for the constant current mirror synapses. The offset $\Delta s_i$ can be modeled as a Gaussian distribution with mean $s_i$ and variance $\sigma^2(\Delta V_{th0})/\left(R_{in} + \Delta R_{in_i}\right)^2$.
4.2.3 Periodic Activation Functions
4.2.3.1 Basic Operation
The sigmoid and tanh activation functions are monotonic and, as a result, can only implement one decision boundary in their input space. Non-monotonic activation functions, on the other hand, can implement several decision boundaries. Networks with non-monotonic
[Figure 4.15 plots: Monte Carlo transfer curves vs. $s_i$ (−1 to 1). (a) and (c): $x^{+}_{i1}$; (b) and (d): $x^{+}_{i2}$.]
Figure 4.15: Monte Carlo simulations of the sigmoid neuron activation function with vari-
ation in Vth0. (a) and (b) have minimum sizing, except for the Imax current mirror, which
has L = 180 nm. (c) and (d) have W = 90 nm and L = 90 nm, except for the Imax current
mirror, which has L = 180 nm.
activation functions can learn more complex tasks with fewer neurons. This translates to lower area and power overhead. For example, in [98, 99], Soltiz et al. showed that neural networks employing digital square wave activation functions yield energy-delay products (EDPs) that are orders of magnitude lower than those of other designs. This work introduces an analog counterpart to Soltiz's neuron design.
The activation function design is based on CMOS folding amplifiers, which are com-
monly used in high-speed analog-to-digital converters [100]. Folding amplifiers produce
voltage outputs that are periodic in their input voltages. That functionality is exploited to
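[Figure 4.16 schematic: input current $i_{s_i}$ converted to $v_{s_i}$ by an opamp input stage with $R_{in}$; $F$ differential pairs, biased by $I_{max}$ and referenced to threshold voltages $V_{th1}, V_{th2}, \ldots, V_{thF}$, produce the output current $i_{x_i}$.]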
Figure 4.16: Folding amplifier activation function with an opamp input stage.
[Figure 4.17 plots: $x_i$ vs. $s_i$ (−1 to 1) for F = 1, 2, 3, and 4.]
Figure 4.17: Folding amplifier activation function with different fold factors (F ).
design an analog neuron with a periodic activation function (Figure 4.16). The folding fac-
tor F is the number of times that the amplifier’s output makes a low-to-high or high-to-low
transition. The folding amplifier’s operation is straightforward. For each fold, there is an
NMOS differential pair, biased by $I_{max}$. One differential input is connected to $v_{s_i}$, while
the other is connected to a threshold voltage $V_{thk}$. When $v_{s_i} \ll V_{thk}$, all of the bias current flows through the transistor connected to the threshold voltage. When $v_{s_i} = V_{thk}$, an equal amount of current flows through both branches. Finally, when $v_{s_i} \gg V_{thk}$, all of the bias current flows through the transistor connected to the input voltage. Therefore, $v_{s_i}$ can be used to modulate the current flowing through the output node. Figure 4.17 shows the voltage transfer characteristics of four different folding amplifiers with F = 1, 2, 3, and 4, from top to bottom. The first plot is similar to the commonly used sigmoid transfer function. The second one, with F = 2, is similar to a radial basis function. The shapes of the functions can be adjusted by changing the threshold voltages and gain of the circuit. However, the exploration of that parameter space is left for future work. The third and fourth plots are not commonly used in neural networks, but they allow each neuron to learn a broader class of non-linearly separable functions. In fact, a single neuron with folding factor $F$ can learn functions with $F$ decision boundaries.
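The folding behavior can be illustrated with an idealized behavioral model: each differential pair is represented by a logistic step centered at its threshold, and the steps alternately add to and subtract from the output, so the output folds $F$ times. This is a simplified sketch, not a circuit-level model of Figure 4.16; the evenly spaced thresholds and the gain value are hypothetical. Counting crossings of the 0.5 level shows $F$ folds producing $F$ decision boundaries.

```python
import math


def folding_activation(v, thresholds, gain=50.0):
    """Idealized folding transfer curve with F = len(thresholds) folds."""
    x = 0.0
    for k, v_th in enumerate(thresholds):
        step = 1.0 / (1.0 + math.exp(-gain * (v - v_th)))
        x += step if k % 2 == 0 else -step  # alternate pairs fold the output
    return x


def even_thresholds(F, lo=-1.0, hi=1.0):
    """F thresholds centered in F equal-width bins of [lo, hi] (an assumption)."""
    width = (hi - lo) / F
    return [lo + (k + 0.5) * width for k in range(F)]


def count_boundaries(F, n=2000, level=0.5):
    """Number of times the output crosses `level` over the input range [-1, 1]."""
    th = even_thresholds(F)
    vs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    ys = [folding_activation(v, th) - level for v in vs]
    return sum(1 for a, b in zip(ys, ys[1:]) if a * b < 0)


print([count_boundaries(F) for F in (1, 2, 3, 4)])  # expected: [1, 2, 3, 4]
```

With F = 1 the curve reduces to a single sigmoid-like step, and with F = 2 it forms a bump, consistent with the sigmoid and radial-basis-like shapes described above.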
4.2.3.2 Area, Power Consumption, and Process Variations
The area of the folding amplifier’s periodic activation function can be estimated as
$$A_{pn} \approx F\left(A_{pn,bias} + 2A_{min}\right) + \left\lfloor \frac{F}{2} \right\rfloor A_{pn,bias}, \qquad (4.27)$$
where $A_{pn,bias}$ is the area of the current mirror and current source transistors, which have longer channels to reduce the effect of $v_{ds}$. The power consumption of the periodic activation function circuit is
$$P^{avg}_{pn} \approx (F + 1) I_{max} V_{DD}. \qquad (4.28)$$
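Expressions (4.27) and (4.28) make the cost of extra folds explicit: both area and power grow roughly linearly with $F$. The minimal sketch below tabulates them; the unit areas, $I_{max}$, and $V_{DD}$ are hypothetical normalized values.

```python
def folding_area(F: int, a_bias: float, a_min: float) -> float:
    """Area estimate of (4.27): F*(A_bias + 2*A_min) + floor(F/2)*A_bias."""
    return F * (a_bias + 2 * a_min) + (F // 2) * a_bias


def folding_power(F: int, i_max: float, vdd: float) -> float:
    """Average power estimate of (4.28): (F + 1) * I_max * V_DD."""
    return (F + 1) * i_max * vdd


# Hypothetical normalized values: A_min = 1, A_bias = 4 (longer-channel devices).
for F in (1, 2, 3, 4):
    print(F, folding_area(F, a_bias=4.0, a_min=1.0),
          folding_power(F, i_max=1e-9, vdd=1.0))
```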
The effect of process variations on the folding amplifier activation function has not been
modeled in this work. However, the development of such a model will follow a similar
derivation as for the sigmoid/tanh activation function circuit. In particular, variations will
affect the maximum output and the position(s) of the function's transition(s) from 0 to 1
or 1 to 0.
4.2.4 Additional Activation Functions
Additional activation function circuits used in this work include linear, rectified linear and
threshold functions, which are described by
$$f_{lin}(s_i) = B s_i, \qquad (4.29)$$
$$f_{rlin}(s_i) = H(s_i) B s_i, \qquad (4.30)$$
and
$$f_{thresh}(s_i) = H(s_i - \theta), \qquad (4.31)$$
where $H(\cdot)$ is the Heaviside step function, $B$ is a slope factor (sometimes called a boost factor), and $\theta$ is a threshold value. The linear activation functions are implemented using diode-connected MOSFETs and current mirrors, while the threshold activation function is implemented with a current comparator [101].
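The three functions in (4.29) to (4.31) map directly to code. In this minimal sketch, $H$ is implemented as a unit step with the convention $H(0) = 1$, which is an assumption (the value at exactly zero is not specified in the text, and it does not affect the rectified linear function because $B \cdot 0 = 0$). The values of $B$ and $\theta$ are hypothetical.

```python
def heaviside(z: float) -> float:
    """Unit step; H(0) = 1 is an assumed convention."""
    return 1.0 if z >= 0.0 else 0.0


def f_lin(s: float, boost: float) -> float:
    return boost * s                      # (4.29)


def f_rlin(s: float, boost: float) -> float:
    return heaviside(s) * boost * s       # (4.30)


def f_thresh(s: float, theta: float) -> float:
    return heaviside(s - theta)           # (4.31)


B, THETA = 2.0, 0.5  # hypothetical boost factor and threshold
print([f_rlin(s, B) for s in (0.0, 0.25, 1.0)], f_thresh(0.7, THETA))
```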
4.3. Voltage-Mode CBRAM Synapse and Neuron Circuits 86
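[Figure 4.18: (a) synapse schematic with excitatory and inhibitory groups of CBRAM devices and the device circuit symbol (Ag / 50 nm GeS2 / W stack); (b) surface plot of $w$ (−1 to 1) vs. $G_i$ [mS] and $G_e$ [mS] (0 to 4 mS each).]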
Figure 4.18: (a) Proposed synapse circuit consisting of excitatory and inhibitory groups
of Ag/GeS2 CBRAM devices. (b) Effective synaptic weight vs. individual conductances
(Ge, Gi).
4.3 Voltage-Mode CBRAM Synapse and Neuron Circuits
A large number of the memristive devices reported in the literature exhibit low-resolution
discrete switching operation. Moreover, progress in memristor device and fabrication re-
search is primarily being driven by a push for new non-volatile memory technology. Con-
sequently, the majority of near-term commercial memristor processes will be optimized for
discrete, likely bi-stable, behavior. This work exploits the stochastic behavior of bi-stable CBRAM devices to achieve quasi-continuous behavior. Previous work [75, 102] has explored this idea primarily at the device level and proposed system-level models that embed
the device characteristics. However, there has been little work that comprehensively studies
the implications of experimental device behavior on circuit and system-level design.
The CBRAM-based synapse circuit multiplies an input value $v_u$ by a weight value $w$ to produce an output $v_s$. The design (Figure 4.18(a)) consists of two sets of $k$ CBRAM
devices (Ag/GeS2/W stack, adopted from [75]) connected in parallel. The first set, $G_e$, forms the positive (excitatory) input, which allows the capacitor $C$ to charge towards $v_u$. The second set, $G_i$, forms the negative (inhibitory) input, which allows $C$ to charge towards $-v_u$. After discharging the capacitor, the positive and negative inputs are applied for a period of time $t$ (we have used $t = 100$ ns). Then, the inputs are disconnected. The final synaptic output voltage $v_s$ is given by
$$v_s = v_u\left(\frac{G_e - G_i}{G_e + G_i}\right)\left[1 - \exp\left(-t\,\frac{G_e + G_i}{C}\right)\right] = v_u w. \qquad (4.32)$$
Figure 4.18(b) shows the range of weights $w$ that can be achieved in an individual synapse with respect to the CBRAM device conductances. Unlike previous works [17, 103], our synapse can attain both positive and negative values for $w$, which is important for fitting different types of data. The circuit's inputs can also be switched from $\pm v_u$ to ground for programming via a write voltage $v_w$. Each CBRAM device switches probabilistically (and independently) with
the applied write voltage. This fact is critical for the synapse to achieve multiple weight
states. If all of the devices switched simultaneously with 100% probability, then each
synapse would only have 2 weight states, severely limiting the precision of the proposed
system. In fact, in order to build a similar system with devices that switch deterministically,
we would need individual control of each CBRAM device, which would significantly in-
crease the design complexity and area overhead.
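The role of stochastic switching can be illustrated with a small model. Each group holds $k$ bi-stable devices, each either ON (G_ON) or OFF (G_OFF), and the weight follows (4.32). Because devices switch independently, the number $m$ of ON devices in a group can take any value from 0 to $k$, giving $k + 1$ weight levels per group, whereas simultaneous switching would allow only the two extremes. In this minimal sketch, $t$ = 100 ns is from the text, while K, G_ON, G_OFF, C, and the switching probability are hypothetical (G_ON is chosen so that a full group reaches the 4 mS scale of Figure 4.18(b)).

```python
import math
import random

T = 100e-9      # integration time from the text (s)
C = 1e-12       # hypothetical output capacitance (F)
K = 8           # hypothetical number of devices per group
G_ON, G_OFF = 0.5e-3, 1e-6  # hypothetical per-device conductances (S)


def cbram_weight(g_e: float, g_i: float, t: float = T, c: float = C) -> float:
    """Effective weight of (4.32)."""
    g_sum = g_e + g_i
    return (g_e - g_i) / g_sum * (1.0 - math.exp(-t * g_sum / c))


def group_conductance(n_on: int, k: int = K) -> float:
    """Total conductance of k parallel devices with n_on of them ON."""
    return n_on * G_ON + (k - n_on) * G_OFF


def set_pulse(n_on: int, p_switch: float, rng: random.Random, k: int = K) -> int:
    """Apply one SET pulse: each OFF device switches ON independently with p_switch."""
    return n_on + sum(rng.random() < p_switch for _ in range(k - n_on))


# Weight levels reachable by varying the excitatory group (inhibitory group fully ON).
levels = [cbram_weight(group_conductance(m), group_conductance(K)) for m in range(K + 1)]

# One possible stochastic trajectory of the excitatory group under repeated pulses.
rng = random.Random(0)
n_on, trajectory = 0, [0]
for _ in range(10):
    n_on = set_pulse(n_on, p_switch=0.2, rng=rng)
    trajectory.append(n_on)
print(len(set(levels)), trajectory)
```

The trajectory advances by a random number of devices per pulse, so repeated pulses walk the weight through intermediate levels rather than jumping directly between the two extremes.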
A charge sharing-based neuron can be implemented by connecting the outputs of $N$ CBRAM synapses together through switches. When the switches are closed, a common
4.4. Summary 88
node is formed with a voltage given by:
$$v_{\hat{y}} = \frac{1}{1 + N}\sum_i v_{s_i}. \qquad (4.33)$$
This is accomplished by sharing the charge on all of the synapses’ output nodes using the
switches (transmission gates) in the neuron circuit. Although this type of charge sharing
scheme has been used elsewhere for neuron design (cf. [104]), we note an important prop-
erty that has been previously overlooked. A common issue in regression problems is that
the training process will find a model that fits the training data very well, but the model
will fit unseen data from the same distribution very poorly. This is called overfitting. One
method that is often used to combat overfitting is regularization, where the weights in the
models are kept small. Generally, the more weights there are, the more flexible the model
will be, leading to a higher probability of overfitting. Therefore, in our system, it is desir-
able to have smaller weight values when the number of inputs to the system is larger. It can
be seen from (4.33) that this is a natural property of our system. That is, the effective weight
of each synapse is scaled by 1/(1 +N), leading to smaller weights for larger systems.
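As a rough illustration of this scaling, the following Python sketch models charge sharing as ideal averaging of the node voltages. It assumes that every synapse output node has the same capacitance and that there are N + 1 nodes (i = 0, ..., N), and it ignores parasitics and switch charge injection; the function names are hypothetical.

```python
def charge_share(v_s):
    """Common-node voltage after the switches close (ideal charge sharing).

    Assumes each of the len(v_s) output nodes has the same capacitance C, so the
    total charge C * sum(v_s) is redistributed over len(v_s) * C.
    """
    return sum(v_s) / len(v_s)


def effective_weights(w):
    """Per-synapse weight seen at the common node: each w_i is scaled by 1/(N + 1)."""
    n_nodes = len(w)  # assumed: N + 1 synapses, i = 0..N
    return [wi / n_nodes for wi in w]


# Larger networks automatically shrink the effective weights.
print(effective_weights([1.0] * 4))   # N = 3
print(effective_weights([1.0] * 10))  # N = 9
```

This behaves like a built-in, size-dependent regularizer: nothing in the circuit has to be tuned as N grows.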
4.4 Summary
This chapter presented designs and models for synapse and neuron circuits that can be
integrated into NMSs. The novel contributions discussed in this chapter are
• Design and modeling of current-mode synapses that leverage process variations for
random weight distributions
• Design and modeling of a current-mode bipolar weight memristive synapse that has
reduced area and power consumption
• Design and modeling of neuron circuits with sigmoid/tanh, periodic, linear, rectified
linear, and threshold activation functions
• Design and modeling of a voltage-mode regression circuit that leverages the stochas-
tic switching nature of CBRAM synapses
The synapse and neuron circuits discussed in this chapter provide essential building blocks
of an NMS. The next chapter discusses methods for training an NMS by modifying synaptic
weight values.
Chapter 5
Synaptic Plasticity Circuits
Synaptic plasticity facilitates the modulation of connection strength between different neu-
rons in an NMS. It is the primary mechanism of learning and adaptation employed in these
systems. Within an NMS, it is critical to have efficient circuits that facilitate synaptic plasticity. Synapses are modified via a particular training algorithm, which can be unsupervised, supervised, or semi-supervised. However, these algorithms share some common characteristics. One commonality is that each algorithm typically involves the product of two quantities, such as a pre-synaptic neuron’s output and an error value. Multiplication of single-bit digital values is easily achieved using an AND gate, but analog multiplication is more costly, requiring at least a Gilbert multiplier. This chapter discusses novel training algorithms that reduce the cost of training circuits. First, a stochastic least-mean-squares algorithm is developed for both online and batch-mode training. Then, another approach is developed in which multiplication on the unit square is approximated by a minimum function.
5.1 Online SLMS Algorithm
5.1.1 Overview
Consider the simple linear regression problem, where m datapoints $(u^{(p)}, y^{(p)}) \in \mathbb{R}^{N+1}$ are to be modeled by a linear fit:
$$ \hat{y}^{(p)} = h_w(u^{(p)}) = w_0 + w_1 u_1^{(p)} + w_2 u_2^{(p)} + \cdots + w_N u_N^{(p)}, \tag{5.1} $$
where $u^{(p)} \in \mathbb{R}^N$ are the independent variables, $\hat{y}^{(p)}$ is the model estimate of $y^{(p)}$, and $w = [w_0, w_1, \cdots, w_N]$ is a parameter vector of the unknown line coefficients. Note that we always assume that $u_0^{(p)}$ is a bias input and therefore has a value of 1. A common strategy to solve this problem is to define a cost, or error, function as
$$ J(w) = \frac{1}{2m}\sum_{p=1}^{m}\left(h_w(u^{(p)}) - y^{(p)}\right)^2 \tag{5.2} $$
and minimize it by adjusting $w_j$ as
$$ w_j := w_j - \alpha\frac{\partial}{\partial w_j}J(w) \tag{5.3} $$
where α is a constant called the learning rate. This leads to the common batch gradient descent or least-mean-squares (LMS) algorithm:
$$ w_j := w_j + \frac{\alpha}{m}\sum_{p=1}^{m}\left(y^{(p)} - \hat{y}^{(p)}\right)u_j^{(p)}, \tag{5.4} $$
where each $w_j$ is updated simultaneously. The summation in (5.4) presents a challenge in hardware, especially if m is large (which it usually is). To circumvent this, a simpler version estimates the partial derivative in (5.3) from a single datapoint as [105]
$$ \frac{\hat{\partial}}{\partial w_j}J(w) = \left(\hat{y}^{(p)} - y^{(p)}\right)u_j^{(p)}. \tag{5.5} $$
Finally, plugging this estimate into (5.3) gives the online version of the LMS algorithm:
$$ w_j^{(p)} := w_j^{(p)} + \alpha\left(y^{(p)} - \hat{y}^{(p)}\right)u_j^{(p)}. \tag{5.6} $$
The weight update rule in (5.6), although fairly simple, involves multiplications, which are costly to implement in analog hardware. Ideally, we would eliminate them.
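The online LMS rule (5.6) can be sketched in a few lines of Python. This is only a software reference, assuming a plain linear model with bias input $u_0 = 1$ and noise-free example data chosen arbitrarily; it makes explicit the per-sample multiplications that the hardware must avoid.

```python
def lms_online_epoch(w, data, alpha):
    """One pass of the online LMS rule (5.6) over data = [(u, y), ...].

    Each u includes the bias input u[0] = 1. Every w_j is updated simultaneously
    from the same error value.
    """
    for u, y in data:
        y_hat = sum(wj * uj for wj, uj in zip(w, u))         # model (5.1)
        err = y - y_hat                                      # y - y_hat
        w = [wj + alpha * err * uj for wj, uj in zip(w, u)]  # two multiplies per weight
    return w


# Noise-free samples of y = 0.3 + 0.4 * u1 at 11 linearly spaced points.
data = [([1.0, k / 10], 0.3 + 0.4 * k / 10) for k in range(11)]
w = [0.0, 0.0]
for _ in range(2000):
    w = lms_online_epoch(w, data, alpha=0.1)
print(w)  # approaches [0.3, 0.4]
```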
The approach used in this work is to convert all of the analog variables in (5.6) to random variables with Bernoulli distributions. Concretely, for each variable z (assumed to lie in [0, 1]), a random variable Z is created such that
$$ Z \sim \mathcal{B}(1, z) \tag{5.7} $$
where $\mathcal{B}(1, z)$ is the binomial distribution with a single trial and success probability z, i.e., a Bernoulli distribution. This leads to a new weight update equation:
$$ w_j^{(p)} := w_j^{(p)} + \alpha\left(Y^{(p)} - \hat{Y}^{(p)}\right)U_j^{(p)} \tag{5.8} $$
where each uppercase variable is defined in the same way as in (5.7). Finally, we notice that using this weight update rule may cause the sign of the weight change to be negative when it should be positive (or vice versa). Therefore, we make a slight modification, leading to the proposed algorithm for linear regression:
$$ w_j^{(p)} := w_j^{(p)} + \alpha\,\mathrm{sign}\left(y^{(p)} - \hat{y}^{(p)}\right)\left|Y^{(p)} - \hat{Y}^{(p)}\right|U_j^{(p)}. \tag{5.9} $$
At first glance, (5.9) looks complex, but we will show in the next section that it can be implemented in hardware using only comparators and digital logic gates. We call this the stochastic least-mean-squares (SLMS) algorithm. If we treat the change in a weight value $\Delta w_j^{(p)}$ as a random variable, then, because $Y$, $\hat{Y}$, and $U_j$ are sampled independently and $E[|Y - \hat{Y}|] = y(1 - \hat{y}) + \hat{y}(1 - y)$, it is easy to show that its expected value is
$$ E\left[\Delta w_j^{(p)}\right] = \alpha\,\mathrm{sign}\left(y^{(p)} - \hat{y}^{(p)}\right)\left(y^{(p)} + \hat{y}^{(p)} - 2y^{(p)}\hat{y}^{(p)}\right)u_j^{(p)}. \tag{5.10} $$
The second parenthesized term, $y^{(p)} + \hat{y}^{(p)} - 2y^{(p)}\hat{y}^{(p)}$, behaves similarly to the absolute value $|y^{(p)} - \hat{y}^{(p)}|$ except when $y^{(p)}$ and $\hat{y}^{(p)}$ are approximately equal and near the middle of their range (i.e., 0.5); for example, at $y^{(p)} = \hat{y}^{(p)} = 0.5$ it equals 0.5 rather than 0. This gives us some mathematical justification for using this algorithm.
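The expectation in (5.10) rests on $E[|Y - \hat{Y}|] = y + \hat{y} - 2y\hat{y}$ for independent Bernoulli samples. A quick Monte Carlo check in Python (sample count and seed chosen arbitrarily) illustrates this, including the midpoint case where the term is 0.5 even though $|y - \hat{y}| = 0$.

```python
import random


def mc_abs_diff(y, y_hat, trials=100_000, seed=0):
    """Estimate E[|Y - Y_hat|] with Y ~ B(1, y) and Y_hat ~ B(1, y_hat) independent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        Y = rng.random() < y
        Y_hat = rng.random() < y_hat
        hits += Y != Y_hat  # XOR of the two samples
    return hits / trials


def expected_abs_diff(y, y_hat):
    """Closed form used in (5.10)."""
    return y + y_hat - 2 * y * y_hat


for y, y_hat in [(0.9, 0.2), (0.5, 0.5), (1.0, 0.0)]:
    print(y, y_hat, mc_abs_diff(y, y_hat), expected_abs_diff(y, y_hat), abs(y - y_hat))
```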
In the discussion above, we developed a learning algorithm for NMSs. Specifically, we have considered linear regression problems. It is easy to show that the SLMS algorithm will also work for logistic regression, where several datapoints are to be classified as belonging to one of two groups, or classes. In this case
$$ \hat{y}^{(p)} = h_w(u^{(p)}) = \frac{1}{1 + \exp\left(-B\,w^{T}u^{(p)}\right)}, \tag{5.11} $$
where B > 0 is a constant that scales the sigmoid’s slope. By defining the cost function as
$$ J(w, y^{(p)}) = \begin{cases} -\ln\left(h_w(u^{(p)})\right), & y^{(p)} = 1 \\ -\ln\left(1 - h_w(u^{(p)})\right), & y^{(p)} = 0 \end{cases} \tag{5.12} $$
we get an online weight update equation identical to (5.6) (with the constant B absorbed into α), so the stochastic implementation can be applied to classification problems as well.
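To check that the cost (5.12) yields the same update form as (5.6), the Python sketch below compares a central finite-difference gradient with the closed form $\partial J/\partial w_j = B(\hat{y} - y)u_j$, a standard result for the sigmoid in (5.11). The values of w, u, and B are arbitrary test inputs.

```python
import math


def h(w, u, B):
    """Sigmoid model (5.11)."""
    z = B * sum(wj * uj for wj, uj in zip(w, u))
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, u, y, B):
    """Cost (5.12) for a single datapoint with label y in {0, 1}."""
    p = h(w, u, B)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


def grad_numeric(w, u, y, B, j, eps=1e-6):
    """Central finite difference of the cost with respect to w_j."""
    wp, wm = list(w), list(w)
    wp[j] += eps
    wm[j] -= eps
    return (cost(wp, u, y, B) - cost(wm, u, y, B)) / (2 * eps)


def grad_closed(w, u, y, B, j):
    """B * (y_hat - y) * u_j; descending it gives (5.6) scaled by B."""
    return B * (h(w, u, B) - y) * u[j]


w, u, B = [0.2, -0.5], [1.0, 0.7], 2.0
for y in (0, 1):
    for j in (0, 1):
        print(y, j, grad_numeric(w, u, y, B, j), grad_closed(w, u, y, B, j))
```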
5.1.2 Hardware Implementation
This section discusses a hardware implementation of the proposed SLMS algorithm. For comparison, we also provide a hardware implementation of the LMS algorithm. We estimate the area of both implementations as a function of the number of independent variables N. Although many memristive synapse (weight) designs have been proposed [33–36, 38, 39, 89, 106], we make a few simplifying generalizations: 1.) The synapse has bipolar weights, which can take values between -1 and +1. 2.) The synapse has two terminals that are used for applying positive and negative write voltages ($v_{w+}$ and $v_{w-}$) to change the weight value. 3.) The magnitude of $v_{w+} - v_{w-}$ must exceed a threshold before any weight change is effected. 4.) Provided condition 3 is met, the synapse’s weight value changes in proportion to the flux (the time integral of the write voltage) applied across those two terminals. In other words, $\Delta w_j \propto (v_{w+} - v_{w-})t_\alpha$, where $t_\alpha$ is the amount of time for which the write voltage is applied.
The first step in implementing the proposed SLMS algorithm, shown in (5.9), is to convert the expected NMS output $y^{(p)}$, the actual output $\hat{y}^{(p)}$, and each input $u_j^{(p)}$ to Bernoulli-distributed random variables, as described in the last section. A straightforward method is to compare each value z (where z can be any of the analog variables) to a uniformly-distributed random value r. If z > r, then Z = 1; otherwise, Z = 0. Therefore, we need a method for generating random numbers that take on real values over the same range as z. The proposed circuit is shown in Figure 5.1(a). A linear feedback shift register (LFSR) is used to generate pseudorandom digital values, which drive a simple binary-weighted digital-to-analog converter. The output current $i_r$ will range from 0 to $I_{max}\sum_{i=1}^{\Psi-1} 1/2^{i}$. Here, Ψ is the number of flip-flops in the LFSR. The upper bound on the index i is Ψ - 1 rather than Ψ because the LFSR output will never be all zeros. Furthermore, the resolution of $i_r$ will be $I_{max}/2^{\Psi}$. As an example, when Ψ = 4, $i_r$ will range from 0 to $(7/8)I_{max}$ with a resolution of $I_{max}/16$. Finally, $I_{max}$ should be set to
$$ I_{max} = \frac{\max(z)}{\sum_{i=1}^{\Psi-1} 1/2^{i}}, \tag{5.13} $$
where max(z) is the maximum value that an analog variable in the NMS can take on. The random current $i_r$ is compared to the analog current $i_z$ that represents z (we assume a current-mode NMS implementation) using the current comparator shown in Figure 5.1(b). Two current mirrors drive a common node, which is connected to the input of a digital buffer. The output of the buffer is a Bernoulli-distributed random variable representation of z.

Figure 5.1: (a) Random current generator circuit. (b) Random current comparator (RCC) for converting an analog current $i_z$ to a Bernoulli-distributed digital voltage $v_Z$.

Figure 5.2: Hardware implementation of the online SLMS algorithm.
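The conversion chain of Figure 5.1 can be approximated in software as follows. The sketch assumes a 4-bit Fibonacci LFSR with the primitive feedback polynomial $x^4 + x^3 + 1$ and an idealized DAC whose output is $i_r = I_{max}\cdot\text{state}/2^{\Psi}$. Both are illustrative choices rather than the exact circuit, so the range of $i_r$ here differs slightly from the one derived above. Over one full LFSR period, the fraction of 1s at the comparator output approximates $i_z/I_{max}$.

```python
def lfsr4_states(seed=0b0001):
    """The 15 nonzero states of a 4-bit maximal-length Fibonacci LFSR (x^4 + x^3 + 1)."""
    state, states = seed, []
    for _ in range(15):
        states.append(state)
        bit = ((state >> 3) ^ (state >> 2)) & 1  # XOR of the two tapped bits
        state = ((state << 1) | bit) & 0b1111    # shift left, feed the XOR back in
    return states


def rcc(i_z, i_r):
    """Random current comparator: digital 1 if i_z exceeds the random current i_r."""
    return 1 if i_z > i_r else 0


def bernoulli_stream(i_z, i_max, states, psi=4):
    """Compare i_z with the hypothetical DAC output i_r = i_max * state / 2**psi."""
    return [rcc(i_z, i_max * s / 2**psi) for s in states]


states = lfsr4_states()
stream = bernoulli_stream(0.5, 1.0, states)  # currents in arbitrary units
print(sum(stream), len(stream))              # 7 ones out of 15, i.e. about 0.47
```

Using a different seed per generator, as the text requires, only rotates the sequence; statistical independence between generators additionally depends on them not being in lockstep.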
Combining the circuits in Figure 5.1 with some simple logic gates allows a direct implementation of the proposed SLMS algorithm. The circuit architecture is shown in Figure 5.2. It is split into two parts: a global circuit and a synaptic trainer. In the global circuit, each independent current source is a random current generator like the one shown in Figure 5.1(a). It is important that the random seed in the LFSR of each current source is different, so that their outputs will be statistically independent. The global circuit takes current representations of the expected and actual outputs and converts them into random variables, as described above. We use an XOR gate to compute the absolute value of $Y^{(p)} - \hat{Y}^{(p)}$, since for single bits $|Y - \hat{Y}| = Y \oplus \hat{Y}$. Finally, this value is multiplied by an enable value (en) using an AND gate. The enable signal is held high for $t_\alpha$. This is one of the two outputs of the global circuit. The other output is the sign of the error $y^{(p)} - \hat{y}^{(p)}$. Again, we use the current comparator design in Figure 5.1(b), with a current representation of the expected output as the positive input and a current representation of the actual output as the negative input. The output will be a digital ‘1’ if $y^{(p)} > \hat{y}^{(p)}$ and ‘0’ otherwise.
The two outputs of the global circuit are used by each synaptic trainer to modify the synaptic weight values. There is one synaptic trainer per input (N in total). The input value $u_j^{(p)}$ is converted to a random variable in the same way as the expected and actual outputs. The absolute-value output of the global circuit is multiplied with $U_j^{(p)}$ using an AND gate. This value and the sign of the error from the global circuit are used to generate the synaptic write voltage.
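At the logic level, the SLMS datapath of Figure 5.2 reduces to an XOR gate, two AND gates, and a sign bit. The Python sketch below models one clock cycle with 0/1 integers. The mapping of the sign bit to $+v_w$ or $-v_w$ is an assumed encoding, and the function names are hypothetical.

```python
def global_circuit(Y, Y_hat, en, y_gt_yhat):
    """Outputs of the global circuit for one clock cycle.

    Y, Y_hat, en: 0/1 bits; y_gt_yhat: current-comparator output (1 if y > y_hat).
    Returns (magnitude bit, sign bit).
    """
    mag = (Y ^ Y_hat) & en  # XOR gives |Y - Y_hat|, gated by the enable signal
    return mag, y_gt_yhat


def synaptic_trainer(mag, sign_bit, U_j, v_w):
    """Write voltage for synapse j: +v_w, -v_w, or 0 (assumed polarity encoding)."""
    if mag & U_j:  # AND gate: |Y - Y_hat| * U_j
        return v_w if sign_bit else -v_w
    return 0.0


mag, sign_bit = global_circuit(Y=1, Y_hat=0, en=1, y_gt_yhat=1)
print(synaptic_trainer(mag, sign_bit, U_j=1, v_w=1.2))
```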
For comparison, we consider a straightforward implementation of the LMS algorithm based on a four-quadrant Gilbert multiplier with voltage inputs and current outputs [107, 108]. The biggest challenge in implementing the LMS algorithm is caused by the threshold assumption that we made earlier. That is, each synapse has a threshold voltage that must be exceeded in order to change its weight value. This is illustrated in Figure 5.3. The left plot shows the product $(y_i^{(p)} - \hat{y}_i^{(p)})u_j^{(p)}$. The synapse’s threshold voltages ($V_{tp}$ and $V_{tn}$) are also shown on the vertical axis. When the product is positive, the result needs to be shifted up so that it starts at the positive threshold voltage $V_{tp}$, and when it is negative, it needs to be shifted down so that it starts at the negative threshold voltage $V_{tn}$. The result is shown on the right. We can easily achieve this in hardware using the sign of the error and level shifters.
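The level-shifting step of Figure 5.3 can be written as a piecewise function. In this Python sketch, `v_tp > 0` and `v_tn < 0` are assumed names for the synapse's positive and negative write thresholds, and a zero product is assumed to produce no write.

```python
def shift_product(p, v_tp, v_tn):
    """Map the multiplier output p onto the synapse's write range (Figure 5.3).

    v_tp > 0 and v_tn < 0 are the positive and negative write thresholds.
    """
    if p > 0:
        return p + v_tp  # positive products start at the positive threshold
    if p < 0:
        return p + v_tn  # negative products start at the negative threshold
    return 0.0           # assumed: a zero product applies no write voltage


for p in (-0.2, 0.0, 0.1):
    print(p, shift_product(p, v_tp=0.5, v_tn=-0.5))
```

Any nonzero product therefore exceeds the threshold, so even small errors produce a weight change, at the cost of the extra level-shifting circuitry counted in Table 5.1.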
The circuit is shown in Figure 5.4. A global circuit computes the sign of the error in the same way as in the SLMS implementation. In the synaptic trainer, the inputs to the Gilbert multiplier are level shifted (LS block) to satisfy the multiplier’s input range. If the product is negative (the multiplier’s output is a differential current), then it is shifted down by $I_s$. Otherwise, it is shifted up by $I_s$. The value of $I_s$ depends on the pull-down resistors $R_{pd}$, which are used to convert the multiplier’s shifted output to a differential voltage. Output stages (common-source amplifiers in this case) are used to provide low output impedance in order to drive the synapse load. Finally, as in the case of the SLMS implementation, we add an enable signal, which is held high for $t_\alpha$.
Figure 5.3: Illustration of the transformation that must be applied to the output of the
Gilbert multiplier to implement the LMS algorithm: (Left) Unshifted multiplier output and
(Right) output after applying the shift.
Figure 5.4: Hardware implementation of the online LMS algorithm.
Let us now derive expressions to estimate the area costs of the SLMS and LMS hardware implementations. We will assume that the total area of an implementation is roughly equal to the sum of the areas of its components, and that the area of each component is equal to the area of its constituent transistors. It is important to note that this is a simplified estimation; further optimizations are possible at the layout level. In addition, transistors in digital circuits (e.g., inverters) have minimum sizing (and area), while transistors in analog circuits have some multiple κ of the minimum sizing. Analog circuits are typically much larger than digital circuits for various reasons. For example, the length of transistors in analog circuits is often increased to improve the output impedance of a circuit, and the width is usually increased to carry a specified current. Furthermore, the total transistor area is often increased to improve matching properties. Table 5.1 shows a breakdown of the area of the global circuit and synaptic trainer for both the SLMS and LMS algorithms. Each entry is a multiple of the minimum transistor area. As an example, the current comparator circuit has 2 current mirrors, each with 2 analog transistors, for a total of 4 analog transistors. It also has 4 minimum-sized transistors for the buffer, yielding an area of 4κ + 4. There are 3 of these circuits in the SLMS global circuit, one in its synaptic trainer, and one in the LMS global circuit.

We notice that the cost of the SLMS global circuit is much higher than that of the LMS global circuit, and vice versa for the synaptic trainers. However, there is one synaptic trainer for every input, so the area costs of the SLMS and LMS implementations are
$$ A_{SLMS} = 4\kappa\Psi + 22\Psi + 12\kappa + 24 + 24N + 4N\kappa \tag{5.14} $$
and
$$ A_{LMS} = 4\kappa + 4 + 28N\kappa, \tag{5.15} $$
respectively. Based on these expressions, we expect $A_{SLMS} > A_{LMS}$ for small N and $A_{SLMS} < A_{LMS}$ for larger N. In the next section, we will evaluate the area and performance of each implementation for two different applications.

Table 5.1: Comparison of online SLMS and LMS area.

Component | SLMS Global Circuit | SLMS Synaptic Trainer | LMS Global Circuit | LMS Synaptic Trainer
Digital | | | |
  AND2 | 6 | 6 | - | -
  XOR | 6Ψ + 6 | - | - | -
  DFF | 16Ψ | - | - | -
Analog | | | |
  Switch | κΨ | 3κ | - | 4κ
  Current Comparator | 3(4κ + 4) | 4κ + 4 | 4κ + 4 | -
  Current Source/Sink | 3κΨ | - | - | 2κ
  Resistor | - | - | - | 2κ
  Multiplier | - | - | - | 10κ
  Level Shift | - | - | - | 6κ
  CS Amp. | - | - | - | 4κ
Total | 4κΨ + 22Ψ + 12κ + 24 | 7κ + 24 | 4κ + 4 | 28κ
5.1.3 Algorithm Performance
For the linear regression problem, we took 11 linearly-spaced points lying on random lines $y = w_0 + w_1u_1$ (with random values for $w_0$ and $w_1$). We added Gaussian noise to each point and used the resulting points as the data to be regressed. The results are shown in Figure 5.5. Figure 5.5(a) shows the mean squared error (MSE) versus training epoch for a single run of both the SLMS and LMS algorithms. In this case, α = 0.001. The SLMS algorithm takes longer to converge. This is an expected result, since it will often not adjust weight values in cases where the LMS algorithm would. Figure 5.5(b) shows random line data and the linear fits found via the SLMS and LMS algorithms. In Figure 5.6, we show the mean MSE value over 10 runs of each algorithm for 4 different learning rates. As expected, the LMS algorithm is relatively insensitive to the learning rate. The SLMS algorithm, however, will generally have a higher MSE with a larger learning rate. This is because the learning rate defines the resolution of the weight change for the SLMS algorithm. If the resolution is low (large learning rate), then the weights cannot be fine-tuned enough to fit the data well. Interestingly, however, at very small values of α, the MSE is also large. We believe this is tied to the slower convergence of the SLMS algorithm.

Figure 5.5: Performance of the proposed algorithm on a linear regression problem: (a) MSE vs. training epoch for the LMS and SLMS algorithms. (b) Datapoints from a straight line with random Gaussian noise added and the linear fits provided by the LMS and SLMS algorithms.

Figure 5.6: Mean MSE versus α (learning rate) over 10 runs.
Assuming Ψ = 10, κ = 10, and N = 2, we have area costs of $A_{SLMS}$ = 892 and $A_{LMS}$ = 604. Therefore, for such a small number of inputs, the area cost of the proposed algorithm is considerably larger (≈1.5×) than that of the LMS implementation.
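Equations (5.14) and (5.15) can be evaluated directly to locate the break-even point. The Python sketch below reproduces the N = 2 figures quoted above and searches for the smallest N at which the SLMS implementation becomes smaller, under the same Ψ = κ = 10 assumption. The result inherits all of the simplifications of the area model.

```python
def area_slms(N, kappa, psi):
    """Eq. (5.14): SLMS area in units of minimum transistor area."""
    return 4 * kappa * psi + 22 * psi + 12 * kappa + 24 + 24 * N + 4 * N * kappa


def area_lms(N, kappa):
    """Eq. (5.15): LMS area in units of minimum transistor area."""
    return 4 * kappa + 4 + 28 * N * kappa


def crossover(kappa, psi, n_max=1000):
    """Smallest N for which the SLMS implementation is smaller than the LMS one."""
    for N in range(1, n_max + 1):
        if area_slms(N, kappa, psi) < area_lms(N, kappa):
            return N
    return None


print(area_slms(2, 10, 10), area_lms(2, 10))  # 892 604
print(crossover(10, 10))                       # 4
```

Under these equations, the SLMS global circuit is a fixed overhead that is amortized once roughly four inputs share it.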
5.2 Batch SLMS Algorithm
Gradient descent approaches for training neuromorphic systems can be expensive in terms
of area overhead and design complexity, especially in the case of analog designs. We have
addressed this problem in [109] by designing a stochastic version of the online least-mean-
squares (LMS) training algorithm [110], which was able to achieve ≈3.5× area reduction
and similar accuracy when compared to a conventional LMS implementation. The SLMS
algorithm converts the signals used for training from analog values to stochastic values
from a Bernoulli distribution. In this work, we have designed a batch-mode version of the
SLMS algorithm, which proved to yield significantly better training results for regression
problems than the online version. The batch-mode LMS algorithm is given as
$$ \Delta w_i = -\alpha\frac{\partial J}{\partial w_i} = \frac{\alpha}{m}\sum_{p=1}^{m} u_i^{(p)}\left(y^{(p)} - \hat{y}^{(p)}\right), \tag{5.16} $$
where J is the mean square cost function. Notice that in addition to performing analog
multiplications, this algorithm requires the accumulation of analog values, which will be
costly in hardware.
In contrast, the circuit design for the batch-mode SLMS algorithm (shown in Figure 5.7) uses only digital logic gates and comparators. The circuit consists of a global trainer and several individual synaptic trainers (one for each synapse). Input signals $v_y$ and $v_u$ are converted to the stochastic domain [58] by comparing them with independent, uniformly-distributed random voltages $v_r$. For each training pattern, the SLMS algorithm stochastically evaluates the difference between the neuron output $v_{\hat{y}}$ and the expected system output $v_y$, and multiplies it by the system input $v_{u_i}$. The result probabilistically increments, decrements, or holds the value of a counter, which will be used for synaptic weight adjustment. After all training patterns have been presented, a write voltage signal is applied to each of the synaptic CBRAM devices. The write voltage is either $+v_w$, $-v_w$, or 0, depending on the value stored in the counter and a threshold value $\theta_w$.
Figure 5.7: Batch SLMS circuit design.
Algorithm 1 summarizes the training scheme in more detail. First, training is enabled by setting train_en = ‘1’. When this signal and the train signal are both high, the switch connected to $v_w$ in the synapse circuit is closed (see Figure 4.18(a)), allowing a write voltage to be applied to the CBRAM devices. Lines 2-26 perform the batch-mode SLMS training algorithm described above. Initially, the train signal is set to ‘0’ and train_rs is pulsed to reset all of the counters in each of the synaptic trainers. Then, lines 5-15 update the counters in the synaptic trainers. Note that the loop in lines 8-14 is actually unrolled in hardware, so each of the counters is updated in parallel. After the counters are updated, the train signal is set to ‘1’ (line 16) and the write voltages are applied to the synapses (lines 17-25, which are also unrolled). Finally, training is disabled by setting train_en = ‘0’ (line 27).

Algorithm 1 Batch SLMS training algorithm.
1: train_en = ‘1’ to enter training mode
2: for epoch = 1 : N_epochs do
3:   train = ‘0’
4:   Pulse train_rs for one clock cycle to set count_i = 0 ∀i
5:   for p = 1 : m do
6:     u = u^(p)
7:     y = y^(p)
8:     for i = 0 : N do
9:       v_ŷ = v_u · w^T / (N + 1)
10:      U_i ∼ B(1, u_i^(p))
11:      Y_i ∼ B(1, y^(p))
12:      Ŷ_i ∼ B(1, ŷ^(p))
13:      count_i += sgn(y^(p) − ŷ^(p)) · (U_i ∧ (Y_i ⊕ Ŷ_i))
14:    end for
15:  end for
16:  train = ‘1’
17:  for i = 0 : N do
18:    if count_i > θ_w then
19:      Apply +v_w V to synapse i
20:    else if count_i < −θ_w then
21:      Apply −v_w V to synapse i
22:    else
23:      Apply 0 V to synapse i
24:    end if
25:  end for
26: end for
27: train_en = ‘0’ to enter test mode
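A software model of one epoch of Algorithm 1 is sketched below. It assumes normalized inputs in [0, 1], treats line 9 as a plain dot product divided by N + 1, and returns the write decision per synapse rather than modeling the CBRAM devices. The function name is hypothetical.

```python
import random


def batch_slms_epoch(w, patterns, theta_w, rng):
    """One epoch of Algorithm 1 (lines 3-25) in software.

    patterns: list of (u, y) with u[0] = 1 (bias) and all values in [0, 1].
    Returns, per synapse, +1 (apply +v_w), -1 (apply -v_w), or 0 (apply 0 V).
    """
    n = len(w)       # synapses i = 0..N
    count = [0] * n  # line 4: counters reset by train_rs

    def bern(p):
        return rng.random() < p  # B(1, p) sample as a bool

    for u, y in patterns:                                   # lines 5-15
        y_hat = sum(wi * ui for wi, ui in zip(w, u)) / n    # line 9
        s = (y > y_hat) - (y < y_hat)                       # sgn(y - y_hat)
        for i in range(n):                                  # lines 8-14, unrolled in hardware
            U_i, Y_i, Y_hat_i = bern(u[i]), bern(y), bern(y_hat)
            count[i] += s * (U_i and Y_i != Y_hat_i)        # line 13
    # Lines 17-25: compare each counter against +/- theta_w.
    return [1 if c > theta_w else -1 if c < -theta_w else 0 for c in count]


rng = random.Random(0)
patterns = [([1.0, 1.0, 1.0], 1.0)] * 20
print(batch_slms_epoch([0.0, 0.0, 0.0], patterns, theta_w=5, rng=rng))  # [1, 1, 1]
```

Because the weights are held fixed while the counters accumulate, each epoch applies at most one discrete write per synapse, matching the batch behavior described above.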
The reason that the batch-mode SLMS algorithm works can be described intuitively as follows. If the error gradient for a particular weight is consistently large and positive over all of the training patterns, then that weight is likely to be decreased. If the error gradient is consistently large and negative over all of the training patterns, then that weight is likely to be increased. Otherwise, for small error gradients, the synaptic weights are less likely to be modified. These changes allow the weights to descend the error gradient and settle at a global minimum. Mathematically, we can express the expected value of each synaptic write voltage (which relates to the expected change in synaptic weight) as
$$ E[v_{w_i}] \approx \begin{cases} +v_w, & E_i > \theta_w \\ -v_w, & E_i < -\theta_w \\ 0, & \text{otherwise} \end{cases} \tag{5.17} $$
where
$$ E_i \equiv E\left[\sum_{p} v_{u_i}^{(p)}\left(v_y^{(p)} - v_{\hat{y}}^{(p)}\right)\right] \tag{5.18} $$
and E[·] is the expectation. An important free parameter that must be tuned in our system is $\theta_w$. If $\theta_w$ is too small, then the algorithm will become very sensitive to noise in the training data and will not converge. Conversely, if $\theta_w$ is too large, the algorithm will stop updating the weights before they reach the global minimum. In general, the threshold value should be larger for larger training batch sizes because the expected count values in the synaptic trainers will be higher. In this work, we have used $\theta_w$ = 5, which was determined empirically.
5.3 Min Algorithm
Another approach to training an NMS, shown in Figure 5.8, is to approximate complex functions such as multiplication with functions that are easier to implement in hardware. The proposed circuit first converts an input current to a pulse width (left schematic). Then, using two current-to-pulse-width (itop) converters and an AND gate, one can compute the min. In this case, the pulse width of we will be proportional to $\min(i_{x_j}, |i_{x_i} - i_{\hat{x}_i}|)$, and this pulse gates the resulting write voltage $v_w$. If the normalized values of $i_{x_j}$ and $|i_{x_i} - i_{\hat{x}_i}|$ both lie in the unit interval, then the normalized write voltage $v_w$ will approximate their product.
Figure 5.8: Circuits for implementing the min function.
The simplification presented above is used to design a learning circuit similar to the LMS rule [110], which can be written as
$$ w_{ij} := w_{ij} + \alpha x_j x_{iD} = w_{ij} + \alpha x_j(\hat{x}_i - x_i). \tag{5.19} $$
Here, $x_{iD}$ is the difference between the neuron’s actual and expected outputs. In this work, a novel circuit is designed to implement a learning rule similar to (5.19). The modified training rule becomes
$$ w_{ij} := w_{ij} + \alpha\,\mathrm{sign}(x_{iD})\min(x_j, |x_{iD}|). \tag{5.20} $$
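To see how closely min tracks the product that it replaces, the Python sketch below compares $\min(a, b)$ with $ab$ on a midpoint grid over the unit square (the grid size is arbitrary). Analytically, the gap $\min(a, b) - ab$ is never negative, vanishes when either argument is 0 or 1, peaks at 1/4 at $a = b = 1/2$, and averages $1/3 - 1/4 = 1/12$. Because both functions are nondecreasing in each argument, the direction of the weight update is preserved.

```python
def min_vs_product(steps=200):
    """Max and mean of min(a, b) - a*b over a midpoint grid on the unit square."""
    grid = [(k + 0.5) / steps for k in range(steps)]
    gaps = [min(a, b) - a * b for a in grid for b in grid]
    return max(gaps), sum(gaps) / len(gaps)


max_gap, mean_gap = min_vs_product()
print(max_gap, mean_gap)  # close to 1/4 and 1/12
```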
To implement this in hardware, we first find $|x_{iD}|$ using a modified current subtraction circuit. We also find $\mathrm{sign}(x_{iD})$ using a current comparator. Then, $|i_{x_{iD}}|$ and $i_{x_j}$ are converted into pulse widths using the circuit in Figure 5.8 (left). If the buffer’s threshold voltage is low, then the width of the pulse at $v_{out}$, measured from the rising edge of clk to the falling edge of $v_{out}$, will be
$$ t_w \approx \frac{i_{in}}{I_c}\cdot\frac{T_{clk}}{2}, \tag{5.21} $$
where $T_{clk}$ is the clock period. Combining two such circuits and an AND gate gives us the min function, which is used as a write enable (WE) signal for each synapse. Finally, the sign of $i_{x_{iD}} = i_{\hat{x}_i} - i_{x_i}$ is used to select a positive or negative write voltage. Variation in current matching will have the largest effect on the functionality of the proposed training circuit.
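A behavioral Python sketch of the itop-plus-AND scheme is given below. It uses the idealized pulse width of (5.21) and a discrete time base of 1 ps steps; the clock period, reference current, and input currents are arbitrary assumed values. Because both pulses rise at the same clock edge, their AND stays high only until the shorter pulse falls, so its width equals the minimum of the two.

```python
def pulse_width(i_in, i_c, t_clk):
    """Idealized itop pulse width from (5.21): t_w = (i_in / I_c) * T_clk / 2."""
    return (i_in / i_c) * t_clk / 2


def to_steps(t, dt=1e-12):
    """Quantize a duration to an integer number of dt-long time steps."""
    return round(t / dt)


def and_width(w1, w2):
    """Width (in steps) of AND(pulse1, pulse2), where both pulses rise at step 0."""
    horizon = max(w1, w2)
    p1 = [1 if k < w1 else 0 for k in range(horizon)]
    p2 = [1 if k < w2 else 0 for k in range(horizon)]
    return sum(a & b for a, b in zip(p1, p2))


t_clk, i_c = 10e-9, 1e-6                         # assumed 10 ns clock, 1 uA reference
w_x = to_steps(pulse_width(0.3e-6, i_c, t_clk))  # pulse for i_xj
w_d = to_steps(pulse_width(0.7e-6, i_c, t_clk))  # pulse for |i_xiD|
print(w_x, w_d, and_width(w_x, w_d))             # write-enable width equals min(w_x, w_d)
```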
5.4 Summary
This chapter has discussed the primary design overheads in existing training algorithms for NMSs. New algorithms were designed to reduce the area and complexity of existing algorithms. Specifically:
• A novel stochastic training algorithm was developed that reduces the area cost of the LMS algorithm by ≈3.5×. The algorithm can operate in both online and batch modes.
• A novel circuit was designed to approximate multiplications in the unit interval as a
min function, reducing the complexity of gradient calculations for NMS training.
Chapter 6
NMSs for Visual Information Processing
This chapter focuses on NMS designs for visual information processing tasks using the
primitive synapse, neuron, and plasticity circuits discussed in the previous chapters. Here,
the primary design challenge is to choose a network topology that meets constraints from
both the algorithm and the circuit levels. In particular, the topology must facilitate a spe-
cific visual information processing task, and it should also be feasible to implement using
primitive circuits. The vision tasks studied in this work include
• Feature detection
• Classification
• Clustering
Each task corresponds to a different level in the brain’s vision processing hierarchy. In addition, the topologies explored in this chapter are all previously unstudied within NMS design. Each was chosen to reduce the cost of on-chip training. The rest of this chapter discusses the design of NMSs for the 3 information processing tasks listed above in detail.
6.1 Feature Detection
Figure 6.1: (a) Simulation setup for testing a single-layer, one-output neural network for an
edge detection application. (b) Training patterns for edge detection (adapted from [111]).
Detection of low-level features such as edges, lines, and corners is a critical step that takes place in the brain’s early visual system [112]. Edge detection, in particular, is an important vision task that has wide utility in areas such as automated medical diagnosis [111] and is a difficult problem for neural networks. The difficulty stems from the fact that edge detection is not a linearly separable problem. In fact, it is very similar to a parity or XOR problem, where an NMS (or any classifier) must determine whether the number of ‘1’ inputs is even or odd. Minsky and Papert [113] proved that this type of problem cannot be solved by single-layer neural networks with monotonic activation functions. However, if the activation function is not monotonic, as is the case for the folding amplifier design, then a single-layer network should be able to learn multiple decision boundaries.
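The argument can be illustrated on two-input XOR, the smallest parity problem. The Python sketch below shows that a grid search over weights and bias finds no single threshold unit that solves XOR (this illustrates the Minsky and Papert result; it does not prove it). A single unit with a non-monotonic activation does solve it. The raised-cosine activation is a hypothetical stand-in with two decision boundaries on [0, 2], not the folding amplifier's actual transfer curve.

```python
import itertools
import math

# Truth table for two-input XOR (the 2-bit parity problem).
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def step(s):
    """Monotonic threshold activation (standard single-layer perceptron)."""
    return 1 if s > 0 else 0


def folded(s):
    """Non-monotonic stand-in: raised cosine thresholded at 0.5 (hypothetical)."""
    return 1 if (1 - math.cos(math.pi * s)) / 2 > 0.5 else 0


def solves_xor(act, w1, w2, b):
    return all(act(w1 * u1 + w2 * u2 + b) == y for (u1, u2), y in XOR.items())


grid = [k / 4 for k in range(-8, 9)]  # weights and bias in [-2, 2], step 0.25
monotonic_found = any(
    solves_xor(step, w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3)
)
folded_ok = solves_xor(folded, 1.0, 1.0, 0.0)
print(monotonic_found, folded_ok)  # False True
```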
This hypothesis was tested by designing a single-layer NMS to detect edges in grayscale images, as shown in Figure 6.1(a). Each of the NMS’s 9 inputs connects to one pixel in a 3-pixel by 3-pixel window, which scans across the entire input image. At each location, the network determines whether the pixel in the center of the window is part of an edge. A threshold function is applied to the network’s output to get a binary decision. The 17 patterns shown in Figure 6.1(b) [111] were used for supervised training. One training cycle consists of randomly initializing the weights, selecting a training pattern at random, applying it to the network’s inputs, setting the expected network output to ‘1’ in the case of an edge pattern or ‘0’ in the case of a non-edge pattern, and applying the Perceptron learning rule.
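The scanning setup and the Perceptron update can be sketched in Python as follows. This is a generic illustration under stated assumptions: border pixels are skipped because the text does not specify padding, the update uses a plain threshold output rather than a folded neuron, and the function names are hypothetical.

```python
def windows_3x3(image):
    """Yield ((row, col), 9-element input vector) for every interior pixel.

    Border pixels are skipped (an assumption; the padding scheme is not specified).
    """
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            vec = [image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            yield (r, c), vec


def perceptron_update(w, b, u, target, alpha):
    """One Perceptron-rule step with a threshold output (generic form)."""
    out = 1 if sum(wi * ui for wi, ui in zip(w, u)) + b > 0 else 0
    err = target - out  # 0 if correct, +1 or -1 otherwise
    return [wi + alpha * err * ui for wi, ui in zip(w, u)], b + alpha * err


image = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
for (r, c), vec in windows_3x3(image):
    print((r, c), vec)
```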
The accuracy results are shown in Figure 7.2. Figures 7.2(a) and 7.2(b) show results
for the Lenna image with adaptive and fixed learning rates, respectively. Similarly, Figures
7.2(c) and 7.2(d) show results for the Clock image with adaptive and fixed learning rates,
respectively. These input images can be seen in Figures 6.3(a) and 6.3(f). In each plot,
the fraction of correctly-classified pixels is plotted against the number of folds used for
the output neuron. We determined whether or not our networks correctly classified pixels
by comparing our output (edge-detected) images with an edge-detected image produced
by MATLAB. For each fold factor, we plotted the results after 1, 2, and 5 training cycles,
shown as the left, center, and right bar in each group, respectively.
We hypothesized that we would see steady improvement in the fraction of correctly-
classified pixels as the number of output neuron folds increases. Again, this is because
each fold represents another decision boundary for the network. e.g., when F = 4, the
network can theoretically learn functions with four decision boundaries. Our hypothesis is
correct in the cases of one and four folds, which consistently perform the worst and best,
respectively. For the Lenna image, the 4-fold neuron performs ≈20% better than the 1-
fold neuron. The difference is even larger for the Clock image, where the 4-fold neuron
Chapter 6. NMSs for Visual Information Processing 111
1 4
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Fr
ac
tio
n 
C
or
re
ct
Lenna with Fixed Learning Rate
0 
Edges Non−Edges
      2 3 
Number of Folds (F)
(a)
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Fr
ac
tio
n 
C
or
re
ct
0 1 42 3
Number of Folds (F)
Lenna with Adaptive Learning Rate
Edges Non−Edges
(b)
1 4
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Fr
ac
tio
n 
C
or
re
ct
Clock with Fixed Learning Rate
0 
Edges Non−Edges
      2 3 
Number of Folds (F)
(c)
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Fr
ac
tio
n 
C
or
re
ct
0 1 42 3
Number of Folds (F)
Clock with Adaptive Learning Rate
Edges Non−Edges
(d)
Figure 6.2: Quantitative comparison of edge detection simulation results: Lenna image
with (a) adaptive and (b) fixed learning rates. Clock image with (c) adaptive and (d) fixed
learning rates. Each plot shows the fraction of pixels correctly classified by the network for
networks with 1, 2, 3, and 4-fold output neurons. Each group of three bars shows results,
from left to right, after 1, 2, and 5 training cycles, respectively.
performs≈65% better than the one 1-fold neuron. What we did not expect, however, is that
the 2-fold neurons would perform almost as well as the 4-fold neurons and 3-fold neurons
would perform almost as poorly as one fold neurons, consistently across all experiments.
We are not sure if this is true in general (i.e. it may be application specific). However, it
6.1. Feature Detection 112
(a) (b) (c) (d) (e)
(f) (g) (h) (i) (j)
Figure 6.3: Qualitative comparison of edge detection simulation results: (a) Original Lenna
image, and edge-detected images after 5 training cycles with a fixed learning rate and (b)
1-fold, (c) 2-fold, (d) 3-fold, and (e) 4-fold output neuron. (f) Original Clock image, and
edge-detected images after 5 training cycles with a fixed learning rate and (g) 1-fold, (h)
2-fold, (i) 3-fold, (j) 4-fold output neuron [114].
tells us that the activation function should be chosen so that it can learn enough decision
boundaries for the target application, keeping in mind that, in our design, neurons with
more folds have a higher area and power cost. For this application, therefore, a 2-fold
neuron is the more sensible choice, even though it does not perform quite as well as the
4-fold neuron.
A qualitative comparison of our results is shown in Figure 6.3. Figures 6.3(a) and
6.3(f) show the original grayscale images used for testing. Then, Figures 6.3(b) and 6.3(g),
6.3(c) and 6.3(h), 6.3(d) and 6.3(i), and 6.3(e) and 6.3(j) show the results after five training
cycles with a fixed learning rate for the 1, 2, 3, and 4-fold neurons, respectively. One can
immediately see the effect of only having one decision boundary in the case of the 1-fold
neurons. Instead of classifying pixels as edges or non-edges, the network classifies them as
belonging to light or dark regions of the image, which is a linearly-separable function. In
the cases of the 2 and 4-fold neurons, the network is able to classify many of the different
types of edges and non-edges that exist within the original images.
6.2 Classification
Object classification is a critical task in visual information processing. In the primate brain,
objects are classified in the inferotemporal cortex, which receives information from the
feature extraction layers of the visual cortex (e.g. the primary visual cortex, V1). A key
parameter when designing an NMS for object classification is the dimensionality of the
input space. When the number of dimensions is low, accuracy can often be improved by first
increasing the dimensionality and then training on the higher-dimensional space. In contrast,
when the number of dimensions is high, accuracy (especially generalization) can often be
improved through dimensionality reduction techniques. More importantly, reducing the number
of dimensions also reduces the size of the NMS, which improves area and energy efficiency.
Both ideas, increasing and reducing the dimensionality of the input space, are explored
in this work.
6.2.1 MNIST Dataset
An NMS with an MLP topology was designed for classifying MNIST images, which are
handwritten images of the digits 0-9. To reduce the cost of training, only the output
layer is adjusted (using online SLMS), while the hidden layer is left untrained and provides a
random projection to a higher-dimensional space. This approach has been explored previously
in software [90, 115, 116]; however, this work is the first to apply the idea in an NMS. The
[Figure 6.4 diagram: original MNIST image (28 × 28 pixels) → downsampled image (5 × 5 pixels) → NMS for image classification (inputs u1, u2, …, u25 and bias b; constant current mirror synapses into the hidden layer; memristive synapses into the outputs y1, y2, …, y10) → one-hot output.]
Figure 6.4: Simulation setup for classification on the MNIST database. The original
MNIST images are reduced to 5×5 pixels and fed into the NMS. The NMS’s output layer is
trained using online SLMS, and a winner-take-all output is used to generate the hypothesis.
fixed, random weights in the hidden layer are implemented using constant current mirror
synapses. The outputs of the hidden layer are the inputs to 10 (M = 10) logistic classifiers.
Each of the weights for the logistic classifiers is implemented using a bipolar-weight
memristive synapse.
The simulation setup is shown in Figure 6.4. One thousand examples were used for each of the
training and test sets. Each sample is first reduced to 5×5 pixels by averaging, and the
averaged values are scaled to lie between 0 and 1. These values are the inputs to the network.
The inputs are randomly projected to a 50-neuron hidden layer, and the output layer is trained
using online SLMS. The final output is taken using winner-take-all to create a one-hot encoding
of the hypothesized class.
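The pipeline just described can be sketched in NumPy as follows. This is a hedged software model, not the author's simulator: the block-averaging downsampler (28 is not divisible by 5, so near-equal bins from `np.array_split` are assumed), division by 255 for scaling, the logistic hidden layer with uniform random weights, and all names (`downsample`, `RandomProjectionNMS`) are assumptions. Because the SLMS rule is not reproduced here, a plain online delta-rule update stands in for it.

```python
import numpy as np


def downsample(img, out=5):
    """Block-average an 8-bit image onto an out x out grid, scaled to [0, 1]."""
    rows = np.array_split(np.arange(img.shape[0]), out)
    cols = np.array_split(np.arange(img.shape[1]), out)
    small = np.array([[img[np.ix_(r, c)].mean() for c in cols] for r in rows])
    return small / 255.0  # assumes 8-bit pixel values in [0, 255]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class RandomProjectionNMS:
    """Fixed random hidden layer + trainable logistic output layer (software model)."""

    def __init__(self, n_in=25, n_hidden=50, n_out=10, alpha=0.001, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, untrained hidden weights (stand-in for current-mirror synapses).
        self.W_h = rng.uniform(-1.0, 1.0, size=(n_hidden, n_in + 1))
        # Trainable output weights (stand-in for memristive synapses).
        self.W_o = np.zeros((n_out, n_hidden + 1))
        self.alpha = alpha

    def _forward(self, u):
        h = np.append(sigmoid(self.W_h @ np.append(u, 1.0)), 1.0)  # hidden + bias
        return h, sigmoid(self.W_o @ h)

    def train_step(self, u, target):
        h, y = self._forward(u)
        # Online delta-rule update of the output layer only (not the SLMS rule).
        self.W_o += self.alpha * np.outer(target - y, h)

    def predict(self, u):
        _, y = self._forward(u)
        onehot = np.zeros_like(y)
        onehot[np.argmax(y)] = 1.0  # winner-take-all
        return onehot
```

In use, `train_step` would be called once per (input, one-hot label) pair in each epoch and `predict` applied to the test set; only `W_o` changes, mirroring the fixed hidden layer of the NMS.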
The results are shown in Figure 6.5. Figure 6.5(a) shows the classification accuracy
of the logistic classifiers versus training epoch for a single run with α = 0.001. Both
algorithms (SLMS and LMS) converge similarly in this case. The MSE versus learning rate is plotted
[Figure 6.5 plots: (a) Classification Accuracy (0 to 0.8) vs. Epoch (0 to 300) for SLMS and LMS; (b) MSE (0.5 to 0.9) vs. Learning Rate (10⁻⁴ to 10⁻¹) for SLMS and LMS.]
Figure 6.5: (a) Classification accuracy vs. epoch and (b) classification accuracy vs. learning
rate for the logistic regression problem.
in Figure 6.5(b). For small learning rates, the accuracy is diminished, most likely because
training is limited to only 250 epochs. For large learning rates, the accuracy also falls off
because the algorithms are unable to fine-tune the weights. Notice that the SLMS algorithm
achieves accuracies similar to those of the LMS algorithm. Using the expressions for the area
costs of SLMS and LMS, one finds that the LMS algorithm has ≈3.5× larger area overhead
than SLMS.
6.2.2 Caltech101 Dataset
This work also studied the effect of dimensionality reduction on an NMS’s classification
accuracy and area cost. Dimensionality reduction occurs in the early stages of visual
processing in the primate brain. The goal is to map vectors $\mathbf{u}^{(p)} \in \mathbb{R}^{O}$
in a high-dimensional space to vectors $\boldsymbol{\pi}^{(p)} = \mathbf{L}\mathbf{u}^{(p)} \in \mathbb{R}^{o}$,
where $o \ll O$. Here, $\mathbf{L}$ is an $o \times O$ matrix, which can be found using a variety
of techniques. In this work, $\mathbf{L}$ is found using locality preserving projections (LPP) [117].
The LPP dimensionality reduction step captures 99.9% of the variance in the original space.
This particular study used LPP in conjunction with a
[Figure 6.6 diagram: pre-processed image (u1, u2, …, uO) → dimensionality reduction, π(p) = L u(p) → NMS for image classification (inputs π1, π2, …, πo and bias b; memristive synapses; outputs y1, y2, …, y10) → one-hot output.]
Figure 6.6: Simulation setup for classification on the Caltech101 database. The original
(color) images are first converted to black and white and normalized. Then, their dimen-
sionality is reduced using LPP. The lower-dimensional vectors are classified using a single-
layer perceptron NMS.
[Figure 6.7 plots: (a) Classification Accuracy [%] (23 to 28) vs. Output Dimension from LPP (o = 20 to 100) for SVM and NMS; (b) Classification Accuracy [%] (0 to 60) vs. Training Epoch (0 to 100) for the training and test sets.]
Figure 6.7: (a) Classification accuracy of a linear SVM and the single-layer neuromemristive
architecture as a function of the top o dimensions from LPP. (b) Learning curves for the
NMS using SLMS.
single-layer NMS to classify images from the Caltech101 dataset [118].
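The LPP step can be sketched as below, following the standard formulation: build a k-nearest-neighbor graph with heat-kernel weights, then solve the generalized eigenproblem X^T Lap X a = λ X^T D X a (Lap is the graph Laplacian, D the degree matrix) and keep the eigenvectors with the smallest eigenvalues. The neighbor count `k`, the kernel-width heuristic, and the small ridge `reg` are assumptions rather than the settings used in this work; for image-sized inputs with O larger than the number of samples, a PCA step is usually applied first, which this sketch omits.

```python
import numpy as np
from scipy.linalg import eigh


def lpp(X, o, k=5, t=None, reg=1e-6):
    """Locality preserving projections (standard formulation).

    X: (m, O) array, one sample per row. Returns L with shape (o, O), so a
    sample u is reduced as pi = L @ u.
    """
    m, O = X.shape
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if t is None:
        t = D2.mean()  # heat-kernel width: a heuristic assumption
    # k-nearest-neighbor graph (self excluded) with heat-kernel weights.
    D2_nn = D2.copy()
    np.fill_diagonal(D2_nn, np.inf)
    nbrs = np.argsort(D2_nn, axis=1)[:, :k]
    rows = np.repeat(np.arange(m), k)
    cols = nbrs.ravel()
    W = np.zeros((m, m))
    W[rows, cols] = np.exp(-D2[rows, cols] / t)
    W = np.maximum(W, W.T)                # symmetrize the graph
    Dg = np.diag(W.sum(axis=1))           # degree matrix
    Lap = Dg - W                          # graph Laplacian
    # Solve X^T Lap X a = lambda X^T Dg X a; keep the smallest eigenvalues.
    A = X.T @ Lap @ X
    B = X.T @ Dg @ X + reg * np.eye(O)    # ridge keeps B positive definite
    _, vecs = eigh(A, B)                  # eigenvalues in ascending order
    return vecs[:, :o].T
```

Once `L` is computed on the training set, each sample is reduced with `L @ u` before being fed to the classifier, matching π(p) = L u(p) above.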
A diagram of the simulation setup is shown in Figure 6.6. The Caltech101 dataset
contains pictures of objects from 101 categories, with between 40 and 800 images per category.
Each image was converted to black and white and normalized. Then, a dimensionality reduction matrix
$\mathbf{L}$ was found using LPP. The lower-dimensional samples were classified using a
single-layer NMS trained with SLMS (α = 0.0001). Five-fold cross-validation was used, so each
of the five training/test splits used 80% of the dataset for training and 20% for testing.
Figure 6.7(a) shows the mean classification accuracy versus o (the number of dimensions
retained after dimensionality reduction) for both the NMS and a support vector machine
(SVM). The training accuracies were ≈70% for the NMS and ≈90% for the SVM. On
average, the NMS performs only 1.5% worse than the SVM.
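The five-fold protocol can be sketched as follows. The random shuffling and the function name `kfold_splits` are assumptions; the text does not say whether the splits were stratified by class.

```python
import numpy as np


def kfold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Every sample lands in exactly one test fold, so with k = 5 each split
    uses about 80% of the data for training and 20% for testing.
    """
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

The reported mean accuracy would then be the average of the five test-fold accuracies.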
[Figure 6.8 panels: (left) Class Count; (right) confusion matrix with rows labeled Ground Truth Classification and columns labeled Classified As.]
Figure 6.8: (left) Number of training samples per class; (right) Confusion matrix showing
input ground truth class on the rows, and corresponding mapping on the columns.
Figure 6.8 shows the confusion matrix, along with the relative number of training samples
used for each class in the dataset. For high classification accuracy, we should see a
prominent diagonal and few gray squares off the diagonal. Although the diagonal is
[Figure 6.9 plot: Number of NMS Components (10² to 10⁶) vs. Number of LPP Output Dimensions o (10⁰ to 10³), for neurons and synapses.]
Figure 6.9: The NMS network size vs. the number of LPP output dimensions o.
clearly present, there are many non-zero (non-white) off-diagonal entries, indicating that
the NMS found many features common to different object classes. Also note that the
classes with the highest classification accuracy (darkest diagonal entries) also have the
largest number of samples in the training set.
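A confusion matrix like the one in Figure 6.8 can be tallied as below, with rows indexed by ground-truth class and columns by predicted class. The per-class accuracy (diagonal entry over row total) is what each diagonal square's darkness would encode if rows are normalized; that normalization and the helper names are assumptions of this sketch.

```python
import numpy as np


def confusion_matrix(truth, predicted, n_classes):
    """Rows: ground-truth class; columns: class assigned by the classifier."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(C, (truth, predicted), 1)  # count each (truth, predicted) pair
    return C


def per_class_accuracy(C):
    """Diagonal entry divided by the row total (fraction of each class correct)."""
    totals = C.sum(axis=1)
    return np.divide(np.diag(C), totals, out=np.zeros(len(C)), where=totals > 0)
```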
In this work, the dimensionality reduction step allows the size of the NMS classifier
to be reduced significantly. This is illustrated in Figure 6.9, which plots the number of
NMS components (i.e. neurons and synapses) versus the number of LPP outputs o. Before
dimensionality reduction, the input images have O = 60 × 60 = 3600 pixel values. Feeding
these directly into the NMS would require ≈3.7×10³ neurons and ≈3.6×10⁵ synapses.
However, when o is small (e.g. 100), the size of the NMS needed for classification is
reduced by ≈97%.
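The component counts quoted above can be reproduced with a simple counting model. The model is an assumption: each input counts as one neuron, the single layer is fully connected to 101 output neurons (one per Caltech101 category), and each output neuron has one bias synapse.

```python
def nms_size(n_inputs, n_classes=101):
    """Return (neurons, synapses) for a fully connected single-layer NMS."""
    neurons = n_inputs + n_classes           # input nodes + output neurons
    synapses = (n_inputs + 1) * n_classes    # weights plus one bias per output
    return neurons, synapses


before = nms_size(60 * 60)   # raw 60 x 60 pixel images
after = nms_size(100)        # o = 100 LPP outputs
reduction = 1 - sum(after) / sum(before)
print(before, after, round(reduction, 3))  # → (3701, 363701) (201, 10201) 0.972
```

Because the synapse count scales with (o + 1) times the number of classes, shrinking o dominates the savings.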
[Figure 6.10 image grid, labeled by class: face, leopard, motorbike, accordion, airplane, anchor, ant, barrel, bass, beaver, binocular, bonsai, brain, brontosaurus, buddha, butterfly, camera, cannon, car_side, ceiling_fan, cellphone, chair, chandelier, cougar_body, cougar_face, crab, crayfish, crocodile, crocodile_head, cup, dalmatian, dollar_bill, dolphin, dragonfly, electric_guitar, elephant, emu, euphonium, ewer, ferry, flamingo, flamingo_head, garfield, gerenuk, gramophone, grand_piano, hawksbill, headphone, hedgehog, helicopter, ibis, inline_skate, joshua_tree, kangaroo, ketch, lamp, laptop, llama, lobster, lotus, mandolin, mayfly, menorah, metronome, minaret, nautilus, octopus, okapi, pagoda, panda, pigeon, pizza, platypus, pyramid, revolver, rhino, rooster, saxophone, schooner, scissors, scorpion, sea_horse, snoopy, soccer_ball, stapler, starfish, stegosaurus, stop_sign, strawberry, sunflower, tick, trilobite, umbrella, watch, water_lilly, wheelchair, wild_cat, windsor_chair, wrench, yin_yang.]
Figure 6.10: The Caltech101 dataset. Two example images from each of the 100 classes are
shown, after resizing, normalization, and conversion to grayscale.
6.3 Clustering
6.3.1 Overview
Clustering algorithms uncover structure in a set of m unlabeled input vectors $\{\mathbf{u}^{(p)}\}$ by
identifying M groups, or clusters, of vectors that are similar in some way. In one common
approach, each cluster is represented by its centroid, so the clustering algorithm reduces
to finding the M centroids. This can be achieved through a simple competitive
learning algorithm: initialize M vectors $\mathbf{w}_i$, referred to as weight vectors, by setting
them equal to randomly chosen input vectors. Then, for each input vector, move the closest
weight vector a little closer to it. After several iterations, the algorithm should converge
with the weight vectors lying at (or close to) the centroids. Of course, several parameters
must be defined, including a distance metric for measuring closeness.
The most obvious choice is the ℓ2-norm. However, computing it is expensive in hardware
because it requires units for calculating squares and square roots. In addition,
as we will discuss later, a high-density memristor circuit called a crossbar can easily
compute dot products between input and weight vectors. Therefore, it is preferable to
measure closeness with a dot product. For example, if all of the vectors are normalized
($\|\mathbf{u}^{(p)}\| = \|\mathbf{w}_i\| = 1$), then $\|\mathbf{w}_i - \mathbf{u}^{(p)}\|^2 = 2 - 2\,\mathbf{w}_i \cdot \mathbf{u}^{(p)}$, so
$\mathbf{w}_{i^*} \cdot \mathbf{u}^{(p)} > \mathbf{w}_i \cdot \mathbf{u}^{(p)}\ \forall\, \mathbf{w}_i \neq \mathbf{w}_{i^*}$, where $\mathbf{w}_{i^*}$ is the closest
weight vector to $\mathbf{u}^{(p)}$. However, the constraint $\|\mathbf{u}^{(p)}\| = \|\mathbf{w}_i\| = 1$ creates a large
overhead, because every input vector has to be normalized and every weight vector has to
be re-normalized each time it is updated.
We propose the following solution: map each input vector to a vertex of a hypercube
centered about the origin, $\mathbf{u}^{(p)} \in \{-1, 1\}^N$, where N is the dimensionality of the input
space. Now, $\mathbf{w}_i \cdot \mathbf{u}^{(p)}$ yields a scalar value $d^*_{i,p}$ between $-N$ and $+N$ (for weights with
$|w_{i,j}| \le 1$). Moreover, this scalar value can be linearly transformed to a distance $d_{i,p}$, which is the ℓ1-norm, or
Manhattan distance, between the weight vector and the input:
$$d_{i,p} \equiv N - d^*_{i,p} = \sum_{j=1}^{N} \left| w_{i,j} - u^{(p)}_j \right|. \qquad (6.1)$$
Equation (6.1) holds whenever $|w_{i,j}| \le 1$: because $u^{(p)}_j = \pm 1$, each term satisfies
$|w_{i,j} - u^{(p)}_j| = 1 - w_{i,j} u^{(p)}_j$. Using this distance metric, the weight vectors never need
to be re-normalized. Furthermore, mapping input vectors to hypercube vertices can usually be
accomplished by thresholding. For example, grayscale images can be mapped by assigning −1 to
pixel values from 0 to 127 and +1 to pixel values from 128 to 255. Algorithm 2 summarizes the
procedure. The first two lines are initialization steps. Within the double for loop, $x_i$ is 1
when i corresponds to the index of the closest weight vector (called the winner) and 0 otherwise.
Then, the weight components of the winner are moved closer to the current input vector
using a Hebbian update rule. The pre-factor α, called the learning rate, determines how far
the weight vectors move each time they win. Notice that this algorithm is completely
unsupervised: no labeled input vectors are required.
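The thresholding map and Eq. (6.1) can be checked numerically with the sketch below. The helper name `to_hypercube` and the random test data are hypothetical; the weights are drawn from [−1, 1], the range for which the identity holds.

```python
import numpy as np


def to_hypercube(img):
    """Threshold 8-bit pixels: 0-127 -> -1, 128-255 -> +1."""
    return np.where(np.asarray(img) >= 128, 1, -1)


rng = np.random.default_rng(1)
u = to_hypercube(rng.integers(0, 256, size=400))  # N = 400 inputs
w = rng.uniform(-1.0, 1.0, size=400)              # any weights with |w_j| <= 1

d_star = w @ u                 # what the crossbar computes (dot product)
d = len(u) - d_star            # linear transform of Eq. (6.1)
l1 = np.abs(w - u).sum()       # Manhattan distance computed directly
print(np.isclose(d, l1))       # → True
```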
Algorithm 2 Proposed clustering algorithm.
1: Map inputs to hypercube vertices.
2: Initialize weight vectors to random input vectors.
3: for epoch = 1 : $N_{epochs}$ do
4:   for p = 1 : m do
5:     $d^*_{i,p} = \mathbf{w}_i \cdot \mathbf{u}^{(p)} \quad \forall i = 1, 2, \ldots, M$
6:     $x_i = \begin{cases} 1, & d^*_{i,p} = \max_k d^*_{k,p} \\ 0, & \text{otherwise} \end{cases} \quad \forall i = 1, 2, \ldots, M$
7:     $\Delta w_{i,j} = \alpha x_i u^{(p)}_j \quad \forall i = 1, 2, \ldots, M,\ \forall j = 1, 2, \ldots, N$
8:   end for
9: end for
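Algorithm 2 translates almost line by line into the NumPy sketch below, a software model rather than the hardware. Two details are assumptions: ties in the winner-take-all step go to the lowest index (`np.argmax`), and the Hebbian update is clipped to [−1, 1] to stand in for the bounded memristor conductance range, which also keeps Eq. (6.1) valid. The function name `cluster` is hypothetical.

```python
import numpy as np


def cluster(U, M, alpha=0.005, n_epochs=500, seed=0):
    """Competitive clustering on hypercube inputs, following Algorithm 2.

    U: (m, N) array with entries in {-1, +1} (line 1 already applied).
    Returns W: (M, N) weight vectors (approximate centroids).
    """
    rng = np.random.default_rng(seed)
    m, N = U.shape
    # Line 2: initialize weights to randomly chosen input vectors.
    W = U[rng.choice(m, size=M, replace=False)].astype(float)
    for _ in range(n_epochs):                 # line 3
        for p in range(m):                    # line 4
            d_star = W @ U[p]                 # line 5: crossbar dot products
            i_win = np.argmax(d_star)         # line 6: winner-take-all (x_i = 1)
            # Line 7: Hebbian update of the winner only; clipping to [-1, 1]
            # is an assumption modeling the bounded conductance range.
            W[i_win] = np.clip(W[i_win] + alpha * U[p], -1.0, 1.0)
    return W
```

With α = 0.005 and 500 epochs, as in the MNIST experiment below, the rows of the returned `W` play the role of the centroids plotted in Figure 6.12.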
The unsupervised clustering algorithm discussed above can be implemented efficiently
in an NMS by representing weight vectors as memristor conductances. A block diagram
of the proposed design is shown in Figure 6.11. The inputs, which are represented as
positive and negative currents, are fed through M crossbar circuits using the crossbar-based
memristive synapse designed in this work. Each crossbar computes the distance between
the current input and the weight vector represented by its memristors’ conductances.
[Figure 6.11 diagram: inputs u1, u2, …, u400 → memristor crossbar (distance calculation) → WTA → outputs y1, y2, …, y10, with a weight update block acting on the crossbar.]
Figure 6.11: Block diagram of proposed NMS for unsupervised clustering.
So far, only the distance calculation part of Figure 6.11 (line 5 in Algorithm 2) has been
discussed. The winner-takes-all circuit (line 6 in Algorithm 2) can be implemented in a
number of ways. In this work, we used the current-mode design described in [119]. Finally,
the weight update (line 7 in Algorithm 2) can be computed using simple combinational
logic circuits.
6.3.2 Clustering MNIST Images
One exciting application of the proposed hardware is automatically identifying clusters in
sets of images. We took 1000 images (m=1000) from the MNIST handwritten digit dataset
and clustered them using a behavioral model of the NMS described in the last section. Each
Figure 6.12: 10 cluster centroids found in a set of 1000 MNIST images using the proposed
NMS.
image was originally 20×20 grayscale pixels (N = 400), and the images were mapped to hypercube
vertices using the thresholding approach discussed earlier. In addition, we used 10 clusters
(M = 10), 500 training epochs ($N_{epochs}$ = 500), and α = 0.005. The results are shown in Figure
6.12, which plots the weight vectors representing the centroid of each cluster.
Figure 6.13 shows the cost versus the training epoch, where the cost is defined as
$$J = \sum_{p=1}^{m} \min_{i} d_{i,p}. \qquad (6.2)$$
The cost function for the proposed NMS approaches that of MATLAB’s built-in k-means
clustering after 500 epochs.
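The cost in Eq. (6.2) is cheap to evaluate in software because each crossbar distance is just N − w_i · u(p). The sketch below assumes the same hypercube inputs and weights bounded by 1 as Eq. (6.1); the function name `cost` is hypothetical.

```python
import numpy as np


def cost(W, U):
    """Eq. (6.2): sum over inputs of the Manhattan distance to the nearest centroid.

    Uses d_{i,p} = N - w_i . u(p), valid for u(p) in {-1, +1}^N and |w_ij| <= 1.
    """
    N = U.shape[1]
    D = N - U @ W.T          # (m, M) matrix of distances d_{i,p}
    return D.min(axis=1).sum()
```

Evaluating `cost` after each epoch yields a learning curve of the kind shown in Figure 6.13.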
[Figure 6.13 plot: cost J (×10⁵, 1.3 to 1.8) vs. Epoch (0 to 500) for MATLAB and the proposed NMS.]
Figure 6.13: Cost function versus epoch while clustering MNIST images using the pro-
posed NMS.
6.4 Summary
This chapter discussed three visual information processing tasks that can be implemented with
NMSs built from the primitive circuits designed in this work. The novel contributions
presented in this chapter are
• Demonstration of edge detection in a single-layer NMS using a novel periodic acti-
vation function circuit.
• Demonstration of the functionality of the memristive synapse and SLMS algorithm
designed in this work for classification in both multi-layer and single-layer NMSs.
• Design of a novel memristor crossbar-based approach for image clustering that achieves
accuracy similar to the k-means algorithm.
Chapter 7
Effects of Device Variations on
System-Level Performance
The NMSs designed in the last chapter were simulated assuming nominal device and circuit
characteristics. However, it was shown in Chapters 3 and 4 that both MOSFET and
memristor variations lead to large variations in synapse and neuron circuit behavior,
which are further exacerbated when operating in the subthreshold region. Therefore, it is
critical to study the effects of device-level variations on system-level performance. As far
as the author is aware, this is the first study to include both active (i.e. MOSFET) and passive
(i.e. memristor, resistor, and capacitor) device variations in a system-level NMS simulation.
In the first part of this chapter, the voltage-mode regression circuit discussed in
Chapter 4 is applied to an electrical load forecasting problem. On-chip batch-mode SLMS
is used to train CBRAM-based synapses. As a result of on-chip training, high accuracy
can be achieved despite large device-level variations. The latter part of this chapter discusses
the effects of device variations when off-chip training is used. A partial on-chip training
method is developed to achieve a balance between the pros and cons of off-chip
training.
7.1 Case Study: NMS for Electrical Load Forecasting
An NMS based on the CBRAM synapse and voltage-mode neuron circuits was designed
to study the effects of device-level variations on system-level performance. The NMS
performs linear regression on a set of data points $\{(\mathbf{u}^{(p)}, y^{(p)})\}$, where $\mathbf{u}^{(p)} \in \mathbb{R}^{N+1}$
(N-dimensional inputs with an additional bias term) and $y^{(p)} \in \mathbb{R}$. The overall system block
diagram is shown in Figure 7.1. During training (training mode), each input vector $\mathbf{u}$ and
output scalar y are pre-processed to convert them from their original representations (e.g.
binary) to voltages $\mathbf{v}_u$ and $v_y$ that range from 0 to $V_{DD}$. Then, the synapses $S_0, S_1, \ldots, S_N$
and the neuron circuit evaluate the scaled dot product $v_{\hat{y}} = \mathbf{v}_u \mathbf{w}^T/(N + 1)$,
where $\mathbf{w} = [w_0, w_1, \ldots, w_N]$ is a weight vector. Each $w_i$ corresponds to one of the synapse
circuits $S_i$. The circuit operates in either training or test mode. In training mode, the batch
SLMS training algorithm uses the error between the expected output $v_y$ and the actual output $v_{\hat{y}}$ to update weight
values. After the training process is complete, the SLMS training circuit is disabled and
the NMS enters test mode. In test mode, the synaptic weights remain constant, and the
system predicts the output vyˆ corresponding to new input vectors using a linear fit. The
post-processing step transforms the voltage representation of the output to another form
such as binary. It may also shift and scale the output depending on the application.
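The forward path of Figure 7.1 can be modeled in a few lines. The linear mapping in `to_voltage`, the value of `VDD`, and both function names are assumptions; the actual pre-processing circuit is not specified here. If every |w_i| ≤ 1, dividing by N + 1 keeps the output within [−V_DD, V_DD].

```python
import numpy as np

VDD = 1.0  # hypothetical supply voltage


def to_voltage(x, x_min, x_max, vdd=VDD):
    """Pre-processing: map values linearly from [x_min, x_max] to [0, vdd]."""
    return vdd * (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)


def neuron_output(v_u, w):
    """Scaled dot product v_yhat = v_u . w / (N + 1) for N + 1 synapses."""
    v_u = np.asarray(v_u, dtype=float)
    return float(v_u @ np.asarray(w, dtype=float) / v_u.size)
```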
The NMS described above was applied to electrical load forecasting in smart grids
using an autoregression model. Accurate load prediction is critical for managing smart
grids efficiently and minimizing their failures. The NMS has 3 inputs (N = 2
plus 1 bias input). The training and test data are composed of hourly power consumption
values (in MW) from a power grid in the US Mid-Atlantic region [120], collected
in January 2012 and January 2013, respectively. The goal of the system was
to forecast the power consumption for hour t + 1 given the power consumption values
from hours t, t − 1, and t − 2. More specifically, the inputs to the system during training
were of the form $\mathbf{u} = [b, P_t - P_{t-1}, P_{t-1} - P_{t-2}]$, where b is the constant bias term and P
is the power consumption. Each of the 3 synapses used 20 CBRAM devices (10 excitatory
and 10 inhibitory). During training, the expected output $y = P_{t+1}$ is also supplied as an
input. The total learning duration for the experiment was ≈50 µs (500 epochs).
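The autoregressive training set described above can be built from an hourly load series as sketched below; the function name and the choice b = 1 are assumptions.

```python
import numpy as np


def make_ar_dataset(P, b=1.0):
    """Build u = [b, P_t - P_{t-1}, P_{t-1} - P_{t-2}] and target y = P_{t+1}."""
    P = np.asarray(P, dtype=float)
    t = np.arange(2, len(P) - 1)  # hours for which t-2 and t+1 both exist
    U = np.column_stack([np.full(t.size, b), P[t] - P[t - 1], P[t - 1] - P[t - 2]])
    y = P[t + 1]
    return U, y
```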
[Figure 7.1 diagram blocks: Pre-Process, Level Shift, Synapses S0, S1, …, SN, Neuron, SLMS Learning (control signals train, train_en, train_rs, clk), Level Shift, Post-Process.]
Figure 7.1: Block-level diagram of the proposed NMS for linear regression using the SLMS
training algorithm.
7.1.1 Simulation with Nominal Parameters
The nominal values of the simulation parameters for the NMS are shown in Table 7.1.
These values were used to determine the baseline system accuracy. Figure 7.2(a) shows
the actual and predicted electrical loads vs. time for the test data in 8-hour increments.
Before training, we observe a large error in both the offset and slope of the predicted load.
Table 7.1: Nominal system parameters used for electrical load forecasting.
Parameter/Metric                                  Value
Simulation Parameters
  t (capacitance charge time)                     100 ns
  CBRAM devices per synapse                       20
  N + 1                                           3
  Training epochs                                 500
  |v_w|                                           1.5 V
  t_w (time that write voltage is applied)        500 ns
  p_switch                                        0.05
  Training time                                   50 µs
Simulation Metrics
  Mean accuracy                                   96%
  Peak accuracy                                   97.5%
  Energy per training epoch                       5.4 mJ
  Power                                           15 µW
  Area                                            14.5 µm²
  CBRAM switch events                             150
After training, the predicted and actual loads are very similar. Figure 7.2(b) gives a more
detailed look at the data from the first day of January 2013. In addition to the forecasted
and actual load values, we have also plotted the forecast obtained assuming ideal synapses.
An ideal synapse can attain any (continuous) synaptic weight value; the weights in this
case were found by solving the normal equation for the associated linear system. Notice
that the forecasted load after training is very close to the ideal fit.
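The ideal-synapse fit can be sketched with NumPy's least-squares solver, which returns the normal-equation solution w = (UᵀU)⁻¹Uᵀy when U has full column rank while avoiding an explicit matrix inverse. The sketch works directly in the power domain and ignores the circuit's voltage scaling and 1/(N + 1) factor, which is a simplifying assumption; the function name is hypothetical.

```python
import numpy as np


def ideal_weights(U, y):
    """Least-squares weights solving the normal equation U^T U w = U^T y."""
    w, *_ = np.linalg.lstsq(U, y, rcond=None)
    return w
```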
Figure 7.2(c) shows the accuracy for the test data versus training epoch. The accuracy
is calculated as:
$$\text{Accuracy} = 100\% \left(1 - \frac{1}{m}\sum_{p=1}^{m}\left|\frac{y^{(p)} - \hat{y}^{(p)}}{y^{(p)}}\right|\right), \qquad (7.1)$$
where m is the number of test patterns (number of predictions made on the January 2013
test data). Notice that the trend is generally increasing, but occasionally decreases due to
the stochastic nature of the SLMS training algorithm. The curve in Figure 7.2(c) is for one
[Figure 7.2 plots: (a) Load [MW] (×10⁴, 2 to 6) vs. Time [hours] (0 to 700) for the actual load data and the CBRAM predictions before and after training; (b) Load [MW] (×10⁴, 2.5 to 5) vs. Time [hours] (0 to 25) for the actual load data, the ideal-synapse prediction, and the CBRAM predictions before and after training; (c) Accuracy [%] (60 to 100) vs. Epoch (0 to 500).]
Figure 7.2: (a) Predicted load vs. time for the month of January 2013 in 8-hour increments.
(b) Predicted load vs. time for January 1, 2013 (24 hours). (c) Load prediction accuracy
versus training epoch.
simulation run. However, all of the simulations discussed in this section were run 10 times
and averaged.
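Eq. (7.1) is one minus the mean absolute percentage error, expressed in percent. A minimal sketch follows (the function name is hypothetical); averaging its value over 10 independent runs gives figures of the kind reported here.

```python
import numpy as np


def forecast_accuracy(y, y_hat):
    """Eq. (7.1): 100% * (1 - mean absolute percentage error)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 100.0 * (1.0 - np.mean(np.abs((y - y_hat) / y)))
```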
The simulation metrics for the nominal parameters are listed at the bottom of Table 7.1.
After training, the mean accuracy approaches 96%. In comparison, the ideal fit reaches an
accuracy of 98%. The energy consumed per training epoch was 5.4 mJ, and the overall
system power consumption was 15 µW. We estimated the area of our design to be 14.5 µm².
An additional metric that is important in our system is the number of CBRAM switch-
ing events. Memristive devices, including CBRAM, have a limited endurance (number of
times they can be switched before they become stuck in one state). Therefore, it is critical
to limit the number of switching events in order to maximize the lifetime of the system.
A deterministic online training algorithm would induce approximately 1 switching event
per synapse per training pattern per epoch. For our system, this amounts to ≈1.86×10⁴
switching events per device during training. In contrast, our batch-mode SLMS training
algorithm reduces the number of switching events per device to ≈2.5. If we assume that the
system lifetime is inversely proportional to the number of CBRAM switching events per device,
the proposed training algorithm increases the system lifetime by over 7000×.
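The switching-event figures above can be reproduced with back-of-the-envelope arithmetic under stated assumptions: 744 hourly training patterns (31 days × 24 hours, ignoring the few hours lost to the lagged inputs), online switching events spread evenly over each synapse's 20 devices, and the 150 total SLMS switch events from Table 7.1 spread over all 3 × 20 devices.

```python
patterns = 31 * 24            # assumption: one training pattern per hour of January 2012
epochs = 500
devices_per_synapse = 20
synapses = 3

# Deterministic online training: ~1 switch per synapse per pattern per epoch,
# assumed to be spread evenly over the synapse's devices.
online_per_device = patterns * epochs / devices_per_synapse
# Batch-mode SLMS: total switch events (Table 7.1) spread over all devices.
slms_per_device = 150 / (synapses * devices_per_synapse)
print(online_per_device, slms_per_device, online_per_device / slms_per_device)
# → 18600.0 2.5 7440.0
```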
7.1.2 Variation Analyses
In addition to the nominal simulations discussed above, we have also investigated the ef-
fects of different device and parameter variations on the system’s forecasting accuracy. The
results are summarized in Figure 7.3. Figure 7.3(a) shows the impact of the CBRAM programming
window ($G^{m}_{on}/G^{m}_{off}$) on accuracy. The programming window can be modulated
by controlling the compliance current (the maximum current allowed to pass through the devices),
as demonstrated in [75]. The system shows good immunity to variations in the device
conductance ratio. Surprisingly, the system accuracy is still good even when $G^{m}_{on}/G^{m}_{off} = 1$.
At first, this seems counterintuitive, because a conductance ratio equal to unity implies that
each synapse will have one fixed weight state. However, the log-normally-distributed ran-
dom variations in the CBRAM on and off state conductances allow the synaptic weights
to reach multiple states even if the mean ratio is unity. In fact, as a result of the wide
distribution of the state conductances, we can consider each CBRAM device as having
quasi-continuous conductance states that can be reached probabilistically.
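The following toy model illustrates why a unity mean programming window still yields many weight levels. It is hypothetical, not the device model used in this work: a bank of 10 devices, each with its own log-normally distributed on- and off-state conductances sharing the same median, with the synapse conductance taken as the sum over the bank. With σ = 0 every on/off pattern gives the same total; with σ > 0 almost every pattern gives a distinct one.

```python
import itertools

import numpy as np


def reachable_levels(n_devices=10, ratio=1.0, sigma=0.5, g_off=1e-6, seed=0):
    """Count distinct total conductances of a bank of CBRAM devices.

    Each device gets its own log-normally distributed on/off conductances
    (medians ratio * g_off and g_off); every on/off pattern is enumerated.
    """
    rng = np.random.default_rng(seed)
    g_off_dev = g_off * rng.lognormal(mean=0.0, sigma=sigma, size=n_devices)
    g_on_dev = ratio * g_off * rng.lognormal(mean=0.0, sigma=sigma, size=n_devices)
    totals = set()
    for states in itertools.product((0, 1), repeat=n_devices):
        on = np.array(states) == 1
        total = np.where(on, g_on_dev, g_off_dev).sum()
        totals.add(round(total / g_off, 9))  # round to merge float noise
    return len(totals)
```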
[Figure 7.3 plots: (a) Accuracy [%] vs. $G^{m}_{on}/G^{m}_{off}$; (b) Accuracy [%] vs. CBRAM Devices Per Synapse (Synaptic Redundancy) of 2, 4, 8, 16, 20, and 32, with the optimal redundancy marked; (c) Accuracy [%] vs. CBRAM Switching Probability (0.01, 0.05, 0.1, 0.25, 0.5), with the optimal switching probability marked; (d) Accuracy [%] vs. σC [%] (0 to 50).]
Figure 7.3: Impact of device and parameter variations on the system forecasting accuracy.
(a) Impact of variations in the CBRAM programming window ($G^{m}_{on}/G^{m}_{off}$). (b) Forecast
accuracy vs. the number of CBRAM devices used in each synapse. (c) Forecast accuracy
vs. CBRAM switching probability. (d) Forecast accuracy vs. σC (variation in the capacitors
used in the synaptic circuits).
An important design parameter is the number of CBRAM devices used per synapse, k.
Using fewer devices reduces area, whereas using more devices gives each synapse a higher
weight resolution. The results are shown in Figure 7.3(b). We observe an
asymptotic increase in accuracy as the number of devices increases. In fact, one can
show that as the number of CBRAM devices per synapse becomes large, the forecast
accuracy approaches that of the ideal fit (98%). It is also worth noting that the
run-to-run variance of the accuracy becomes smaller with increased synaptic redundancy.
We have empirically chosen a synaptic redundancy of 20 as the optimal value because it exhibits
high accuracy and low variance while using relatively few devices (e.g. compared to 32).
Another critical design parameter is the CBRAM switching probability. Figure 7.3(c) shows
the forecasting accuracy vs. $p_{switch}$. The mean accuracy is a concave function of $p_{switch}$,
with the optimum (highest mean accuracy) at a switching probability of 0.05. At lower
switching probabilities, the system takes a long time
to converge, so 500 training epochs are not enough to reach a maximum accuracy. On the
other hand, as the switching probability becomes large, it is difficult for each synapse to
achieve high resolution weight values because several CBRAM devices may switch concur-
rently. Therefore, a low switching probability (< 0.25) is desired for > 90% test accuracy.
As discussed in Section 3.2.2, the switching probability can be controlled by adjusting the
flux applied to the CBRAM devices during the write process.
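The concurrency problem at large switching probabilities can be quantified under a simple assumption: each of the k = 20 devices in a synapse switches independently with probability p during a write. The probability that more than one device switches is then 1 − (1 − p)^k − kp(1 − p)^(k−1). This independence model is an illustration, not a measured device property.

```python
def p_multiple_switch(k, p):
    """P(more than one of k devices switches), assuming independent switching."""
    p_none = (1 - p) ** k
    p_one = k * p * (1 - p) ** (k - 1)
    return 1 - p_none - p_one


k = 20  # CBRAM devices per synapse (Table 7.1)
for p in (0.01, 0.05, 0.1, 0.25, 0.5):
    print(f"p_switch={p}: expected switches={k * p:.1f}, "
          f"P(>1 switch)={p_multiple_switch(k, p):.3f}")
```

Under this model, roughly a quarter of writes at p = 0.05 flip more than one device, versus nearly all writes at p = 0.25, which is consistent with the loss of weight resolution described above.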
Figure 7.3(d) shows the impact of synapse capacitance variation. The accuracy remains fairly
consistent even when the capacitance variations are large (e.g. σC = 50%), demonstrating the
robustness of the proposed system. This is a critical property, as it is generally difficult to
control capacitance values to within 30% error. This result also demonstrates a general property
of neuromorphic systems, which we have explored in detail in [121]: the system is
able to incorporate and compensate for device-level variations during the learning process,
making it inherently robust.
7.2 Off-Chip Training
Training NMSs presents a formidable challenge due to CMOS and memristor process
variations. On-chip training is very effective at overcoming these, as demonstrated in
the last section. However, it is not feasible to implement many sophisticated training algo-
rithms on-chip. On the other hand, off-chip training has the advantage of flexible software
training algorithms. A high-level overview of off-chip training for an NMS is illustrated
in Figure 7.4. An ideal model of the network is trained off-chip using a software training
algorithm (e.g. backpropagation, resilient backpropagation, Levenberg-Marquardt, genetic
algorithms, etc.). Then, data from the trained model are used to train the on-chip NMS.
Choosing which data (the weight matrix, the training vectors, the expected outputs of in-
dividual neurons, etc.) to incorporate into this training process is challenging, especially
when the on-chip NMS is affected by process variations.
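[Figure 7.4 diagram: an ideal model of the neuromemristive system (off-chip) is trained by a software training algorithm on the input data and produces software outputs; data from it (W, u, x_i) are used to train the neuromemristive system (on-chip, CMOS and memristor), which is subject to CMOS and memristor process variations and produces hardware outputs.]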
Figure 7.4: High-level depiction of off-chip training for an NMS.
This research presents two fundamentally different approaches to training NMSs off-chip: weight programming and feature training. The first step in both methods is to train an ideal (no process variations) software model of the NMS. In the weight programming method, the weight matrix W is transferred directly onto the chip. That is, each weight value in the hardware NMS is programmed to match the corresponding value in the ideal NMS model. This method is the most straightforward and has been successfully demonstrated in [122] for an NMS perceptron (single-layer neural network). In the feature training method, each neuron in the NMS is trained sequentially (using our proposed training rule), with its expected output taken from the trained software model. Both approaches require analog and digital interface circuitry, including shift registers, logic gates, counters, current comparators, and sample-and-hold circuits. The interface circuitry overhead is similar for both weight programming and feature training.
7.2.1 Weight Programming
The proposed weight programming algorithm is shown in Algorithm 3. First, the weight matrix W is found by training the ideal NMS model using, e.g., the resilient backpropagation algorithm. Then, each valid weight value (corresponding to a non-zero entry in the adjacency matrix A) is transferred onto the chip. The address of the corresponding hardware synapse is found using an address mapping Addr, and programming is enabled, allowing write voltages to be applied to the memristors in the synapse. Note that the output node of the synapse is grounded during programming. After a specified number of epochs N_epochs, the process repeats for the next weight value.
Algorithm 3 Weight programming
  W = weights from off-chip software training
  for all i = 1 : N_x do
    for all j = 1 : N_x do
      prog_en = ‘0’
      if A(i, j) then
        addr = Addr(i, j)
        w_addr = w_ij
        prog_en = ‘1’
        Wait for N_epochs·T_clk
      end if
    end for
  end for
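The control flow of Algorithm 3 can be mimicked in software to see why open-loop programming cannot absorb device mismatch. The sketch below is illustrative only: it assumes NumPy, a dense adjacency matrix A, and a hypothetical per-synapse multiplicative mismatch `gain` standing in for process variations. `program_weights` is not a function from this work, and writing a weight is idealized as an exact transfer of w_ij.

```python
import numpy as np


def program_weights(W, A, gain, n_epochs=1, t_clk=10e-6):
    """Software stand-in for Algorithm 3 (weight programming), a sketch.

    W     : ideal weights from off-chip training, shape (N_x, N_x)
    A     : adjacency matrix; only synapses with A[i, j] != 0 are programmed
    gain  : hypothetical per-synapse mismatch (1 + delta) that open-loop
            programming never observes, shape (N_x, N_x)
    Returns the effective hardware weights and the total programming time.
    """
    n_x = W.shape[0]
    W_hw = np.zeros_like(W, dtype=float)
    elapsed = 0.0
    for i in range(n_x):
        for j in range(n_x):
            if A[i, j]:  # prog_en stays '0' for non-existent synapses
                # addr = Addr(i, j); w_addr = w_ij; prog_en = '1'
                W_hw[i, j] = gain[i, j] * W[i, j]  # target written, mismatch remains
                elapsed += n_epochs * t_clk        # "Wait for N_epochs * T_clk"
    return W_hw, elapsed


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
A = np.triu(np.ones((4, 4), dtype=int), k=1)  # 6 valid synapses
gain = rng.uniform(0.8, 1.2, size=(4, 4))     # hypothetical +/-20% mismatch
W_hw, t_wp = program_weights(W, A, gain)
print(np.abs(W_hw - W * A).max(), t_wp)
```

The accumulated time is the number of programmed weights times N_epochs·T_clk, the same count formalized in (7.2). The residual error W_hw − W·A is exactly the mismatch the devices introduce, because nothing in the loop observes the hardware output.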
The total training time for weight programming is estimated as

$$ t_{wp} \approx N_w T_{clk} N_{epochs} \qquad (7.2) $$

where N_w is the total number of weights to be programmed. The area of the NMS, excluding the interface circuitry, can roughly be estimated as

$$ \frac{A_{wp}}{WL} \approx t_s \left[ 3(2+N) + 7(N_h+M) + 3N_w \right] \qquad (7.3) $$

where t_s is a transistor size factor, which is nominally 1, and N, N_h, and M are the numbers of inputs, hidden-layer (middle-layer) neurons, and outputs, respectively.
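Equations (7.2) and (7.3) are simple enough to evaluate directly when exploring design points. A minimal sketch, assuming the symbols above map one-to-one onto the function arguments; the function name and defaults are illustrative, not from this work.

```python
def weight_programming_cost(n_in, n_hidden, n_out, n_weights,
                            n_epochs=1, t_clk=10e-6, ts=1.0):
    """Return (t_wp in seconds, A_wp / (W*L)) using (7.2) and (7.3).

    n_in, n_hidden, n_out : N, N_h, M
    n_weights             : N_w, the number of weights to program
    ts                    : transistor size factor t_s (nominally 1)
    """
    t_wp = n_weights * t_clk * n_epochs                                    # (7.2)
    area = ts * (3 * (2 + n_in) + 7 * (n_hidden + n_out) + 3 * n_weights)  # (7.3)
    return t_wp, area


t, a = weight_programming_cost(n_in=1, n_hidden=1, n_out=1, n_weights=2)
print(t, a)
```

Because every term in (7.3) is multiplied by t_s, the area of the weight-programmed NMS scales linearly with the transistor size factor.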
7.2.2 Feature Training
In the weight programming scheme, no training is actually performed in hardware. In contrast, feature training treats each of the neurons in the network as a single-layer perceptron. Each one is trained using our proposed training rule, starting with those in the hidden layer. The algorithm is shown in Algorithm 4. First, the ideal model of the network is trained off-chip. Then, each on-chip sigmoid neuron i is trained for N_epochs epochs, where i ranges over the N_x = N_h + M hidden and output neurons. Within each epoch, the algorithm loops through all of the training vectors u. An address signal is set to isolate the synapses that are inputs to the ith neuron. Then, a training enable signal is set and the neuron is trained for l_r clock cycles. The constant l_r, which is a positive integer, is related to the learning rate α as

$$ \alpha = \frac{(V_{DD} + V_{SS})\, l_r T_{clk} I_{max}}{2 I_c} \qquad (7.4) $$
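For a target learning rate, (7.4) can be inverted to find l_r. The sketch below does this; the function name is hypothetical, and rounding up to an integer is an assumption made here because l_r must be a positive integer number of clock cycles. With the values listed in Section 7.2.3 (V_DD + V_SS = 1.1 V, T_clk = 10 µs, I_max = I_c = 100 nA, α = 0.01) it evaluates to about 1.8 × 10^3 cycles; the simulations in that section use l_r = 2000.

```python
import math


def lr_from_alpha(alpha, v_supply, t_clk, i_max, i_c):
    """Invert (7.4) for the number of training clock cycles l_r.

    v_supply is V_DD + V_SS. The exact value is returned together with
    a version rounded up to a positive integer (a choice of this sketch).
    """
    exact = 2.0 * i_c * alpha / (v_supply * t_clk * i_max)
    return exact, max(1, math.ceil(exact))


exact, lr = lr_from_alpha(alpha=0.01, v_supply=1.1, t_clk=10e-6,
                          i_max=100e-9, i_c=100e-9)
print(exact, lr)
```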
Algorithm 4 Feature training
  Train ideal network model off-chip
  for all i = 1 : N_x do
    for all epoch = 1 : N_epochs do
      for all u do
        train_en = ‘0’
        addr = Addr(i)
        Set x̂_i from trained network
        Set u as the input
        train_en = ‘1’
        Wait for l_r·T_clk
      end for
    end for
  end for
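The essential difference from Algorithm 3 is that each neuron's error is measured at its hardware output, so whatever the devices do to the effective weights gets learned around. The sketch below illustrates this for one neuron. It is a simplified software model under stated assumptions: NumPy, a logistic neuron, a hypothetical per-synapse gain mismatch, and a plain delta rule standing in for our proposed on-chip training rule (which is not reproduced here). The targets x̂_i are generated from a known weight vector so that the outcome can be checked.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def feature_train_neuron(U, x_hat, gain, n_epochs=30, alpha=0.1):
    """Inner two loops of Algorithm 4 for a single neuron i (sketch).

    U     : training vectors u, one per row, shape (N_u, N_in)
    x_hat : expected outputs of neuron i from the trained software model
    gain  : hypothetical per-synapse mismatch seen only by the "hardware"
    The delta-rule update is a stand-in, not the training rule of this work.
    """
    w = np.zeros(U.shape[1])
    for _ in range(n_epochs):                # for all epoch = 1 : N_epochs
        for u, target in zip(U, x_hat):      # for all u
            y = sigmoid((gain * w) @ u)      # output of the mismatched neuron
            w += alpha * (target - y) * u    # error is measured on-chip
    return w


rng = np.random.default_rng(1)
U = rng.normal(size=(200, 5))
w_ideal = rng.normal(size=5)                 # stands in for the off-chip model
x_hat = sigmoid(U @ w_ideal)
gain = rng.uniform(0.7, 1.3, size=5)

w_ft = feature_train_neuron(U, x_hat, gain, n_epochs=200)
err_ft = np.mean(np.abs(sigmoid(U @ (gain * w_ft)) - x_hat))
err_wp = np.mean(np.abs(sigmoid(U @ (gain * w_ideal)) - x_hat))  # open-loop copy
print(err_ft, err_wp)
```

Here err_wp plays the role of weight programming (the ideal weights copied onto mismatched synapses), while err_ft comes from training through the mismatch; the learned weights settle near w_ideal / gain rather than w_ideal.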
The total training time for feature training is estimated, similarly to the weight programming training time, as

$$ t_{ft} \approx N_x N_{epochs} N_u T_{clk} l_r \qquad (7.5) $$

where N_u is the number of training vectors. The input neurons and the bias neurons are not trained, so they do not contribute to the total training time. For small networks, this training time may be much larger than the weight programming training time. The area of the NMS for feature training can be estimated as

$$ \frac{A_{ft}}{WL} \approx t_s \left[ 3(2+N) + 7(N_h+M) + 3N_w + 2(N+N_h+2) \right] + 6(N+N_h+2) \qquad (7.6) $$

Comparing (7.6) and (7.3), we see that extra terms have been added to account for the training circuitry required by the feature training method.
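The same kind of sketch works for (7.5) and (7.6). Function names are illustrative; the defaults N_epochs = 30, l_r = 2000, and T_clk = 10 µs are the values used in Section 7.2.3, while N_u is not given in this section and is set to a hypothetical 1000 in the example.

```python
def feature_training_time(n_hidden, n_out, n_vectors,
                          n_epochs=30, lr=2000, t_clk=10e-6):
    """t_ft from (7.5); only the N_x = N_h + M trained neurons count."""
    n_x = n_hidden + n_out
    return n_x * n_epochs * n_vectors * t_clk * lr


def feature_training_area(n_in, n_hidden, n_out, n_weights, ts=1.0):
    """A_ft / (W*L) from (7.6)."""
    k = n_in + n_hidden + 2
    shared = 3 * (2 + n_in) + 7 * (n_hidden + n_out) + 3 * n_weights  # same terms as (7.3)
    return ts * (shared + 2 * k) + 6 * k  # the 6k term does not scale with t_s


# N_h = 20 and M = 10 as in Section 7.2.3; N_u = 1000 is a hypothetical value.
t_ft = feature_training_time(n_hidden=20, n_out=10, n_vectors=1000)
print(t_ft / 3600.0)  # estimated training time in hours
```

With N_u = 1000 the estimate is 18,000 s, or about 5 h, the same order as the figure quoted in Section 7.2.3. Note also that the 6(N + N_h + 2) term in (7.6) is not multiplied by t_s, so the relative area overhead of feature training shrinks as t_s grows.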
7.2.3 Results for Classification
Both of the proposed off-chip training methods were tested on MNIST classification. The simulation setup is identical to that of the MNIST classification experiment in the previous chapter. However, instead of using random hidden-layer weights, the NMS in this section is a conventional single-hidden-layer MLP. An ideal model (nominal device parameters) of the NMS with 25 inputs, 20 hidden neurons, and 10 outputs was trained in MATLAB using the resilient backpropagation algorithm. The results are shown in Figure 7.5(a). After 500 training epochs, the training and test classification accuracies (percentage of correctly classified input vectors) were ≈94% and ≈85%, respectively. Therefore, 85% test classification accuracy was our target and an upper limit for each of the off-chip training methods.
Behavioral models of the NMS hardware with process variations were trained using the results from the ideal NMS, using the two training algorithms presented above: weight programming and feature training. The simulation parameters for the feature training algorithm are N_epochs = 30 and l_r = 2000. The value of l_r is calculated from (7.4) with V_DD + V_SS = 1.1 V, T_clk = 10 µs, I_max = I_c = 100 nA, and α = 0.01. All of the same parameters are used for the weight programming algorithm where applicable, except for N_epochs, which has a value of 1.

[Figure 7.5 panels: (a) Classification Accuracy [%] vs. Training Epoch (0 to 500), Training Set and Test Set; (b) Classification Accuracy [%] vs. t_s (0 to 30), Weight Programming, Feature Training, and Ideal Model; (c) Classification Accuracy [%] vs. 1/t_s (0 to 1), same three cases; (d) Accuracy/Unit Area [%] vs. t_s (0 to 30), Weight Programming and Feature Training.]
Figure 7.5: MNIST classification results: (a) Classification accuracy versus training epoch for the ideal NMS model (off-chip). (b) Classification accuracy versus the transistor size factor t_s for weight programming and feature training. (c) Classification accuracy versus 1/t_s, showing the inverse area dependence of the accuracy for both methods. (d) Accuracy per unit area versus t_s for both training methods.
Figure 7.5(b) shows the test classification accuracies for both methods versus the transistor size factor t_s. A black horizontal line at 85% indicates the accuracy of the ideal network. Each data point is the average over 10 independent runs, and error bars indicate the standard deviations. We notice a few key points from the plot. First, except for the case where t_s = 1, the accuracy of the feature training method is relatively constant and does not vary much with t_s. This indicates that, since part of the training is performed on-chip, the NMS learns to compensate for its own process variations. Notice that very high accuracy can be achieved with relatively low t_s values. This is not the case for the weight programming method, which cannot compensate for process variations. Second, we observe that even with very large t_s, the weight programming method does not quite achieve 85% accuracy. This is likely because not all process variations are reduced by increasing the transistor sizes. For example, the resistor R_in at the input of the sigmoid neuron varies independently of t_s. Also note that t_s governs the transistor sizes, and the magnitude of the process variations is inversely proportional to the transistor area. Therefore, we would expect the classification accuracy to decrease as 1/t_s increases. Figure 7.5(c) shows the classification accuracy versus 1/t_s. Straight-line fits show that the accuracy indeed decreases approximately linearly with 1/t_s, i.e., with the inverse of the transistor area. The slope of each line indicates the sensitivity of the accuracy to variations in 1/t_s. Note that the slope for the feature training method is much smaller in magnitude than that of the weight programming algorithm.
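The sensitivity read off Figure 7.5(c) is the slope of a least-squares line of accuracy against 1/t_s. A minimal sketch using NumPy's `polyfit`; the function name is illustrative, and the data below are synthetic placeholders chosen to lie exactly on a line, not values taken from Figure 7.5.

```python
import numpy as np


def accuracy_sensitivity(ts_values, accuracy):
    """Least-squares fit accuracy ~= intercept + slope * (1 / t_s).

    Returns (slope, intercept). A more negative slope means accuracy
    degrades faster as transistors (and hence t_s) shrink.
    """
    x = 1.0 / np.asarray(ts_values, dtype=float)
    slope, intercept = np.polyfit(x, np.asarray(accuracy, dtype=float), 1)
    return slope, intercept


ts = np.array([1.0, 2.0, 5.0, 10.0, 30.0])
synthetic_acc = 84.0 - 20.0 / ts  # synthetic placeholder data, not measurements
print(accuracy_sensitivity(ts, synthetic_acc))
```

The intercept is the accuracy extrapolated to 1/t_s → 0, i.e., to arbitrarily large transistors, which is one way to quantify the residual gap that the R_in variations leave for weight programming.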
So far, the results in this section indicate that the feature training method achieves higher accuracy than weight programming, especially when small transistors are used. However, one must also consider the overheads associated with each method in terms of area and performance. Figure 7.5(d) shows the accuracy per unit area versus t_s for both training methods. The unit area is the area divided by the minimum area WL and is found using (7.3) and (7.6). For small t_s, the accuracy per unit area is much larger (more than 2× at t_s = 1) for the feature training method. However, as t_s increases, this advantage quickly diminishes. We have also calculated the approximate training time for both algorithms using (7.2) and (7.5). While weight programming takes only ≈5.5 ms, feature training takes close to 5 hours. This time can be reduced by tuning the circuit parameters, for example by increasing the memristor write voltage of the training circuit. It could also be reduced significantly by using smaller training sets. Regardless, feature training is the better method for applications where an NMS needs to be trained infrequently and there are tight area and accuracy constraints.
7.3 Summary
This chapter studied the effects of device-level variations on system-level NMS performance. On-chip, off-chip, and partial off-chip training methods were compared. The main conclusions are:
• On-chip training is very robust to device-level variations, but is limited to simple
training algorithms.
• Off-chip training enables the use of sophisticated training algorithms, but must be
implemented carefully to avoid loss of accuracy. This work proposes an off-chip
training method where partial on-chip training is used to improve accuracy.
Chapter 8
Conclusions and Future Work
This dissertation has addressed several open problems related to circuit and architecture
design for NMSs. Memristive circuit designs were based on semi-empirical device models
to capture realistic effects observed in experimental data. To successfully move forward
and design better NMSs, it is critical that device engineers work more closely with circuit
designers. Currently, there is not enough breadth or depth of experimental results for any
one RRAM memristor device to develop accurate models. In addition, models that are de-
veloped by device engineers are often only valid under very restricted operating conditions.
It is imperative that certain device properties, such as the on-state resistance and the number of achievable resistance states, be improved. In this work, memristors were programmed by
applying constant or varying-width voltage pulses. However, there are several devices that
exhibit incremental resistance switching based on altering voltage pulse amplitude or com-
pliance current. Future work should explore circuit designs that can leverage these devices
as well.
At the circuit design level, this work provides new designs for synaptic communication,
neuronal computation, and plasticity with reduced area and power consumption compared with previous work. Models were developed to describe the effects of process variations. Moving
forward, it is important that the effects of PVT corners and parasitic R, L, and C com-
ponents are also modeled. The designs presented in this research exhibit some obvious
advantages over analog voltage-mode circuits (e.g. input/output range, reduced buffering,
etc.) and digital circuits (area), but it is unclear at the present time whether NMSs are
generally more efficient when implemented in an analog, digital, or spiking paradigm. It is likely that the optimal design choices will be application-dependent.
Novel training algorithms and hardware implementations were proposed to reduce the design and area cost of existing training circuits. Simulations show that the algorithms converge; however, their convergence properties remain to be studied rigorously.
One approach is to apply stochastic approximation methods to approximate the algorithm’s
behavior as a continuous differential equation. Further improvement of the SLMS (both on-
line and batch versions) could be made with reduced-cost random number generation. The
current method uses linear feedback shift registers, which have a large area cost.
The utility of the circuits designed in this research was demonstrated by integrating
them into NMSs for visual information processing tasks. Neural network topologies were
designed for edge detection, image clustering, and image classification. Performance on these tasks is modest in terms of accuracy compared to state-of-the-art algorithms, which do not have the same constraints placed on the designs in this work (e.g. power and area). A major hurdle at this stage of research is the lack of metrics that can be used to compare designs across power, area, accuracy, and other dimensions. Overall, the results of this dissertation suggest that current-mode NMSs trained with stochastic learning algorithms represent a feasible approach to designing intelligent and efficient computing architectures. In the near term, NMSs are likely to serve as accelerators for special processes (e.g. anomaly detection) alongside conventional architectures.
Appendix A
Derivation of s_i for Constant Current Mirror Synapses
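From (4.2), the post-synaptic input current is

$$ i_{s_i} = \sum_{j=1}^{N_e} i_{x_j} w_{i,j} \frac{\Lambda_p(|v_{ds2}|, L_2)_{i,j}}{\Lambda_p(|v_{ds1}|, L_1)_{i,j}} + \sum_{j=N_e+1}^{N_e+N_i} i_{x_j} w_{i,j} \frac{\Lambda_n(|v_{ds2}|, L_2)_{i,j}}{\Lambda_n(|v_{ds1}|, L_1)_{i,j}} \qquad \text{(A.1)} $$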
Let us now focus on the Λ functions of the output transistors:

$$ \Lambda_{p(n)}(|v_{ds2}|, L_2)_{i,j} = \exp\left( \Theta_{p(n)}(L_{2_{i,j}})\, |v_{ds2_{i,j}}| \right) \qquad \text{(A.2)} $$
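For excitatory synapses, $|v_{ds2_{i,j}}| = V_{DD} - i_{s_i} R_{in}$. Substituting this into (A.2) and expanding into a second-order Taylor polynomial around $i_{s_i} = 0$ leads to

$$ \Lambda_p(|v_{ds2}|, L_2)_{i,j} \approx \Lambda_p(V_{DD}, L_{2_{i,j}}) \left[ 1 - \Theta_p(L_{2_{i,j}})\, R_{in}\, i_{s_i} + \frac{\Theta_p^2(L_{2_{i,j}})\, R_{in}^2\, i_{s_i}^2}{2} \right] \qquad \text{(A.3)} $$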
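Similarly, since $|v_{ds2_{i,j}}| = V_{SS} + i_{s_i} R_{in}$ for inhibitory synapses,

$$ \Lambda_n(|v_{ds2}|, L_2)_{i,j} \approx \Lambda_n(V_{SS}, L_{2_{i,j}}) \left[ 1 + \Theta_n(L_{2_{i,j}})\, R_{in}\, i_{s_i} + \frac{\Theta_n^2(L_{2_{i,j}})\, R_{in}^2\, i_{s_i}^2}{2} \right] \qquad \text{(A.4)} $$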
Now, plugging (A.3) and (A.4) into (A.1) leads to a quadratic equation in $i_{s_i}$, with the solution

$$ i_{s_i} \approx \frac{-B \pm \sqrt{B^2 - 4AC}}{2A} \qquad \text{(A.5)} $$
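where

$$
\begin{aligned}
A &= \sum_{j=1}^{N_e} i_{x_j} w_{i,j} \frac{\Lambda_p(V_{DD}, L_{2_{i,j}})\, \Theta_p^2(L_{2_{i,j}})\, R_{in}^2}{2\Lambda_p(|v_{ds1}|, L_1)_{i,j}} + \sum_{j=N_e+1}^{N_e+N_i} i_{x_j} w_{i,j} \frac{\Lambda_n(V_{SS}, L_{2_{i,j}})\, \Theta_n^2(L_{2_{i,j}})\, R_{in}^2}{2\Lambda_n(|v_{ds1}|, L_1)_{i,j}} \\
B &= -\sum_{j=1}^{N_e} i_{x_j} w_{i,j} \frac{\Lambda_p(V_{DD}, L_{2_{i,j}})\, \Theta_p(L_{2_{i,j}})\, R_{in}}{\Lambda_p(|v_{ds1}|, L_1)_{i,j}} + \sum_{j=N_e+1}^{N_e+N_i} i_{x_j} w_{i,j} \frac{\Lambda_n(V_{SS}, L_{2_{i,j}})\, \Theta_n(L_{2_{i,j}})\, R_{in}}{\Lambda_n(|v_{ds1}|, L_1)_{i,j}} - 1 \\
C &= \sum_{j=1}^{N_e} i_{x_j} w_{i,j} \frac{\Lambda_p(V_{DD}, L_{2_{i,j}})}{\Lambda_p(|v_{ds1}|, L_1)_{i,j}} + \sum_{j=N_e+1}^{N_e+N_i} i_{x_j} w_{i,j} \frac{\Lambda_n(V_{SS}, L_{2_{i,j}})}{\Lambda_n(|v_{ds1}|, L_1)_{i,j}}
\end{aligned}
\qquad \text{(A.6)}
$$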
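Written out, the step from (A.1) to the quadratic is a matter of collecting powers of $i_{s_i}$: $C$ gathers the zeroth-order terms, $B + 1$ the first-order $\Theta R_{in}$ terms from (A.3) and (A.4), and $A$ the second-order terms. This makes explicit where the $-1$ in $B$ comes from:

$$ i_{s_i} \approx C + (B + 1)\, i_{s_i} + A\, i_{s_i}^2 \quad\Longrightarrow\quad A\, i_{s_i}^2 + B\, i_{s_i} + C = 0 $$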
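When channel lengths are large or the input resistances are small, $4AC \ll B^2$, so $|B| \approx \sqrt{B^2 - 4AC}$, and evaluating the required root directly from (A.5) results in a loss of precision. Therefore, we use the fact that the product of the two roots is equal to $C/A$, resulting in

$$ s_i \approx \frac{2C}{I_{max}\left(-B + \sqrt{B^2 - 4AC}\right)} \qquad \text{(A.7)} $$

Note that (A.5) is only valid when $R_{in} \neq 0$. Otherwise, $A = 0$ and $B = -1$, so that

$$ s_i \approx \frac{C}{I_{max}} \qquad \text{(A.8)} $$

which is also the value that (A.7) takes in this limit.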
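The precision issue behind (A.7) is ordinary floating-point cancellation and is easy to reproduce. The sketch below (plain Python; the function names are ours) compares the small root of A x² + B x + C = 0 evaluated directly from (A.5) with the product-of-roots form used in (A.7). It assumes B < 0, as in (A.6), where B is dominated by the −1 term; the coefficient values are arbitrary illustrations, not circuit values.

```python
import math


def small_root_naive(A, B, C):
    """Root of A x^2 + B x + C = 0 taken directly from (A.5); fails for A = 0."""
    return (-B - math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)


def small_root_stable(A, B, C):
    """Same root via x1 * x2 = C / A, the form used in (A.7).

    Assumes B < 0, so -B + sqrt(...) adds two positive numbers and no
    cancellation occurs. For A = 0 it reduces to -C / B (C when B = -1).
    """
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        raise ValueError("complex roots")
    return 2.0 * C / (-B + math.sqrt(disc))


def residual(A, B, C, x):
    return A * x * x + B * x + C


# A tiny relative to B**2 / C, as with large channel lengths or small R_in.
A, B, C = 1e-9, -1.0, 1e-7
x_naive = small_root_naive(A, B, C)
x_stable = small_root_stable(A, B, C)
print(x_naive, x_stable)
print(residual(A, B, C, x_naive) / C, residual(A, B, C, x_stable) / C)
```

Dividing the stable root by I_max gives s_i as in (A.7); with A = 0 and B = −1 (the R_in = 0 case) the same function returns C, matching (A.8), whereas the direct form divides by zero.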
Bibliography
[1] C. Mead, Analog VLSI and Neural Systems. Addison-Wesley, 1989.
[2] http://www.darpa.mil/Our Work/DSO/Programs/Systems of Neuromorphic
Adaptive Plastic Scalable Electronics (SYNAPSE).aspx.
[3] https://www.fbo.gov/index?s=opportunity&mode=form&id=
91bc9e58d6fa024d55d7c0583d38fc21&tab=core& cview=0.
[4] http://www.darpa.mil/Our Work/DSO/Programs/Physical Intelligence.aspx.
[5] http://www.darpa.mil/Our Work/MTO/Programs/Unconventional Processing of
Signals for Intelligent Data Exploitation (UPSIDE).aspx.
[6] https://www.humanbrainproject.eu/.
[7] http://bluebrain.epfl.ch/.
[8] ITRS, “International technology roadmap for semiconductors,” 2013. [Online].
Available: www.itrs.net
[9] L. Chua, “Memristor–The missing circuit element,” IEEE Transactions on Circuit
Theory, vol. CT-18, no. 5, pp. 507–519, 1971.
[10] L. Chua and S.-M. Kang, “Memristive Devices and Systems,” vol. 64, no. 2, 1976.
[11] L. Chua, “Resistance switching memories are memristors,” Applied Physics A, vol.
102, no. 4, pp. 765–783, Jan. 2011.
[12] J. J. Yang, D. B. Strukov, and D. R. Stewart, “Memristive devices for computing,”
Nature Nanotechnology, vol. 8, no. 1, pp. 13–24, Jan. 2013. [Online]. Available:
http://www.ncbi.nlm.nih.gov/pubmed/23269430
[13] D. Kuzum, S. Yu, and H.-S. P. Wong, “Synaptic electronics: materials, devices and
applications,” Nanotechnology, vol. 24, no. 38, p. 382001, Sep. 2013. [Online].
Available: http://www.ncbi.nlm.nih.gov/pubmed/23999572
[14] T. Ishigaki, T. Kawahara, R. Takemura, K. Ono, K. Ito, H. Matsuoka, and H. Ohno,
“A multi-level-cell spin-transfer torque memory with series-stacked magnetotunnel
junctions,” in 2010 Symposium on VLSI Technology, Jun. 2010, pp. 47–48.
[15] R. Waser, R. Dittmann, G. Staikov, and K. Szot, “Redox-Based Resistive Switching
Memories - Nanoionic Mechanisms, Prospects, and Challenges,” Advanced
Materials, vol. 21, no. 25-26, pp. 2632–2663, Jul. 2009. [Online]. Available:
http://doi.wiley.com/10.1002/adma.200900375
[16] S. D. Ha and S. Ramanathan, “Adaptive oxide electronics: A review,” Journal
of Applied Physics, vol. 110, no. 7, pp. 071101–1, 2011. [Online]. Available:
http://link.aip.org/link/JAPIAU/v110/i7/p071101/s1&Agg=doi
[17] S. Yu, B. Lee, and H. S. P. Wong, “Metal oxide resistive switching memory,” in
Functional Metal Oxide Nanostructures, ser. Springer Series in Materials Science,
J. Wu, J. Cao, W.-Q. Han, A. Janotti, and H.-C. Kim, Eds. New York, NY: Springer
New York, 2012, vol. 149, pp. 303–335.
[18] Y. Yang and W. Lu, “Nanoscale resistive switching devices: mechanisms and modeling,” Nanoscale, vol. 5, no. 21, pp. 10076–10092, Nov. 2013.
[19] D. B. Strukov and R. S. Williams, “Exponential ionic drift: fast switching and low
volatility of thin-film memristors,” Applied Physics A, vol. 94, no. 3, pp. 515–519,
Nov. 2008.
[20] J. J. Yang, M. D. Pickett, X. Li, D. Ohlberg, D. R. Stewart, and R. S. Williams,
“Memristive switching mechanism for metal/oxide/metal nanodevices,” Nature
Nanotechnology, vol. 3, no. 7, pp. 429–433, Jul. 2008.
[21] D. B. Strukov, J. L. Borghetti, and R. S. Williams, “Coupled ionic and electronic
transport model of thin-film semiconductor memristive behavior,” Small, vol. 5,
no. 9, pp. 1058–63, May 2009.
[22] N. R. Mcdonald, “Al/CuxO/Cu memristive devices : fabrication, characterization,
and modeling,” Master’s Thesis, SUNY Albany, 2012.
[23] Z. Biolek, D. Biolek, and V. Biolková, “SPICE Model of Memristor with Nonlinear
Dopant Drift,” Radioengineering, vol. 18, no. 2, pp. 210–214, 2009.
[24] A. Rak and G. Cserey, “Macromodeling of the Memristor in SPICE,” IEEE Transac-
tions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 4,
pp. 632–636, Apr. 2010.
[25] Y. Zhang, X. Zhang, and J. Yu, “Approximated SPICE model for memristor,”
2009 International Conference on Communications, Circuits and Systems, no. 5, pp.
928–931, Jul. 2009. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/
wrapper.htm?arnumber=5250371
[26] D. Batas and H. Fiedler, “A memristor SPICE implementation and a new
approach for magnetic flux-controlled memristor modeling,” IEEE Transactions
on Nanotechnology, vol. 10, no. 2, pp. 250–255, Mar. 2011. [Online]. Available:
http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5373921
[27] C. Yakopcic, T. Taha, G. Subramanyam, and R. Pino, “Memristor SPICE Model
and Crossbar Simulation Based on Devices with Nanosecond Switching Time,” in
International Joint Conference on Neural Networks, 2013, pp. 464–470.
[28] Y. Chen and X. Wang, “Compact Modeling and Corner Analysis of Spintronic Mem-
ristor Invited Paper,” in IEEE/ACM International Symposium on Nanoscale Archi-
tectures, 2009, pp. 7–12.
[29] S. Shin, K. Kim, and S.-M. Kang, “Compact Models for Memristors Based on
Charge-Flux Constitutive Relationships,” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 29, no. 4, pp. 590–598, Apr. 2010.
[30] P. Sheridan, K.-H. Kim, S. Gaba, T. Chang, L. Chen, and W. Lu, “Device and SPICE
Modeling of RRAM Devices,” Nanoscale, vol. 3, no. 9, pp. 3833–40, Sep. 2011.
[31] K.-t. Cheng and D. B. Strukov, “3D CMOS-Memristor Hybrid Circuits: Devices,
Integration, Architecture, and Applications,” in ISPD, 2012, pp. 33–40.
[32] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, “Nanoscale
memristor device as synapse in neuromorphic systems,” Nano Letters, vol. 10, no. 4,
pp. 1297–301, Apr. 2010.
[33] Perez-Carrasco, “On neuromorphic spiking architectures for asynchronous STDP
memristive systems,” in International Symposium on Circuits and Systems, 2010,
pp. 77–80.
[34] A. Afifi, A. Ayatollahi, and F. Raissi, “Implementation of biologically plausible spik-
ing neural network models on the memristor crossbar-based CMOS/nano circuits,”
European Conference on Circuit Theory and Design, pp. 563–566, Aug. 2009.
[35] I. E. Ebong and P. Mazumder, “CMOS and memristor-based neural network design
for position detection,” Proceedings of the IEEE, vol. 100, no. 6, pp. 2050–2060,
Jun. 2012.
[36] M. Laiho and E. Lehtonen, “Cellular nanoscale network cell with memristors for
local implication logic and synapses,” in International Symposium on Circuits and
Systems, May 2010, pp. 2051–2054.
[37] K. D. Cantley, “Artificial Neural Systems Using Memristive Synapses and Nano-
Crystalline Silicon Thin-Film Transistors,” Ph.D. dissertation, University of Texas
at Dallas, 2011.
[38] H. Kim, M. P. Sah, C. Yang, T. Roska, and L. O. Chua, “Neural synaptic weighting
with a pulse-based memristor circuit,” IEEE Transactions on Circuit Theory, vol. 59,
no. 1, pp. 148–158, 2012.
[39] ——, “Memristor Bridge Synapses,” Proceedings of the IEEE, vol. 100, no. 6, pp.
2061–2070, Jun. 2012.
[40] B. Liu, Y. Chen, B. Wysocki, and T. Huang, “The Circuit Realization of a Neuro-
morphic Computing System with Memristor-Based Synapse Design,” in ICONIP,
2012, vol. 1, pp. 357–365.
[41] G. S. Rose, R. Pino, and Q. Wu, “A Low-Power Memristive Neuromorphic Circuit
Utilizing a Global/Local Training Mechanism,” in IJCNN, no. 1, 2011, pp. 2080–
2086.
[42] A. Sinha, M. S. Kulkarni, and C. Teuscher, “Evolving nanoscale associative memo-
ries with memristors,” in IEEE International Conference on Nanotechnology, 2011,
pp. 861–864.
[43] M. Hu, H. Li, Q. Wu, G. S. Rose, and Y. Chen, “Memristor crossbar based
hardware realization of BSB recall function,” in International Joint Conference
on Neural Networks, ser. IJCNN’12, Jun. 2012, pp. 1–7. [Online]. Available:
http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6252563
[44] M. S. Kulkarni, “Memristor-based Reservoir Computing,” in Nanoarch, 2012, pp.
226–232.
[45] C. Xu, X. Dong, N. P. Jouppi, and Y. Xie, “Design implications of memristor-
based RRAM cross-point structures,” in Design Automation and Test in Europe, ser.
DATE’11, 2011.
[46] M. Zangeneh and A. Joshi, “Performance and energy models for memristor-based 1T1R RRAM cell,” in ACM Great Lakes Symposium on VLSI Design, ser.
GLSVLSI’12, 2012, pp. 9–14.
[47] D. Niu, “Low power memristor-based ReRAM design with Error Correcting
Code,” in Asia and South Pacific Design Automation Conference, ser. ASPDAC’12,
Jan. 2012, pp. 79–84. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/
wrapper.htm?arnumber=6165062
[48] D. Brenner, C. Merkel, and D. Kudithipudi, “Design-time performance evaluation of
thermal management policies for SRAM and RRAM based 3D MPSoCs,” in ACM
Great Lakes Symposium on VLSI Design, ser. GLSVLSI’12, 2012, pp. 177–182.
[49] H. Kim, M. P. Sah, C. Yang, and L. O. Chua, “Memristor-based multilevel memory,”
in International Workshop on Cellular Nanoscale Networks and their Applications,
ser. CNNA’10, vol. 1, no. 5, 2010, pp. 1–6.
[50] C. Merkel, “Thermal Profiling in CMOS/Memristor Hybrid Architectures,” Master’s
Thesis, Rochester Institute of Technology, 2011.
[51] S. P. Adhikari, C. Yang, H. Kim, and L. O. Chua, “Memristor Bridge Synapse-
Based Neural Network and Its Learning,” IEEE Transactions on Neural Networks
and Learning Systems, vol. 23, no. 9, pp. 1426–1435, Sep. 2012.
[52] D. Querlioz, O. Bichler, and C. Gamrat, “Simulation of a memristor-based
spiking neural network immune to device variations,” The 2011 International Joint
Conference on Neural Networks, pp. 1775–1781, Jul. 2011. [Online]. Available:
http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6033439
[53] M. Suri, D. Querlioz, O. Bichler, G. Palma, E. Vianello, D. Vuillaume, C. Gam-
rat, and B. Desalvo, “Bio-inspired stochastic computing using binary CBRAM
synapses,” IEEE Transactions on Electron Devices, vol. 60, no. 7, pp. 2402–2409,
2013.
[54] J. von Neumann, “Probabilistic logics and the synthesis of reliable organisms from
unreliable components,” in Automata Studies, C. E. Shannon and J. McCarthy, Eds.
Princeton University Press, 1956, pp. 43–98.
[55] B. R. Gaines, “Stochastic Computing Systems,” in Computing Systems, 1965.
[56] B. D. Brown and H. C. Card, “Stochastic Neural Computation I: Computational
Elements,” IEEE Transactions on Computers, vol. 50, no. 9, pp. 891–905, 2001.
[57] S. L. Toral, J. M. Quero, and L. G. Franquelo, “Stochastic pulse coded arithmetic,”
in International Symposium on Circuits and Systems, ser. ISCAS’00, 2000, pp. I–
599–I–602.
[58] W. Qian, M. D. Riedel, and I. Rosenberg, “Uniform approximation and
Bernstein polynomials with coefficients in the unit interval,” European Journal
of Combinatorics, vol. 32, no. 3, pp. 448–463, Apr. 2011. [Online]. Available:
http://linkinghub.elsevier.com/retrieve/pii/S0195669810001666
[59] W. Qian and M. D. Riedel, “The Synthesis of Robust Polynomial Arithmetic with
Stochastic Logic,” in Design Automation Conference, ser. DAC’08, 2008, pp. 648–
653.
[60] W. Qian and J. Backes, “The Synthesis of Stochastic Circuits for Nanoscale Compu-
tation,” International Journal of Nanotechnology, vol. 1, no. December 2009, 2010.
[61] S. Sato, K. Nemoto, S. Akimoto, M. Kinjo, and K. Nakajima, “Implementation of
a new neurochip using stochastic logic,” IEEE Transactions on Neural Networks, vol. 14, no. 5, pp. 1122–1127, Jan. 2003.
[62] M. Pelgrom, A. Duinmaijer, and A. Welbers, “Matching properties of MOS transis-
tors,” IEEE J. Solid-State Circuits, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.
[63] P. R. Kinget, “Device Mismatch and Tradeoffs in the Design of Analog Circuits,”
IEEE Journal of Solid-State Circuits, vol. 40, no. 6, pp. 1212–1224, 2005.
[64] M. Pelgrom, H. Tuinhout, and M. Vertregt, “A designer’s view on mismatch,” in
Nyquist AD Converters, Sensor Interfaces, and Robustness, A. H. van Roermund,
A. Baschirotto, and M. Steyaert, Eds. New York, NY: Springer New York,
2013, ch. 13, pp. 245–267. [Online]. Available: http://link.springer.com/10.1007/
978-1-4614-4587-6
[65] K. Kuhn, C. Kenyon, A. Kornfeld, M. Liu, A. Maheshwari, W.-k. Shih, S. Sivaku-
mar, G. Taylor, P. Vandervoorn, and K. Zawadzki, “Managing process variation in
Intel’s 45 nm CMOS technology,” Intel Technology Journal, vol. 12, no. 2, pp. 93–110, 2008.
[66] A. S. Oblea, A. Timilsina, D. Moore, and K. A. Campbell, “Silver Chalcogenide
Based Memristor Devices,” in Proceedings of the IEEE, vol. 3, 2010, pp. 4–6.
[67] J. G. Simmons, “Generalized formula for the electric tunnel effect between similar
electrodes separated by a thin insulating film,” Journal of Applied Physics, vol. 34,
no. 6, p. 1793, 1963.
[68] D. J. Griffiths, Introduction to quantum mechanics, 2nd ed. Pearson Education,
Inc., 2005.
[69] J. G. Simmons and R. R. Verderber, “New conduction and reversible memory phe-
nomena in thin insulating films,” Proceedings of the Royal Society A: Mathematical,
Physical and Engineering Sciences, vol. 301, no. 1464, pp. 77–102, 1967.
[70] N. F. Mott and R. W. Gurney, Electronic processes in ionic crystals. Oxford Uni-
versity Press, 1940.
[71] C. Yakopcic, T. M. Taha, G. Subramanyam, R. E. Pino, and
S. Rogers, “A Memristor Device Model,” vol. 32, no. 10, pp. 1436–1438, 2011.
[72] C. Yakopcic, T. M. Taha, G. Subramanyam, R. E. Pino, and S. Rogers, “Analysis of
a Memristor based 1T1M Crossbar Architecture,” in IJCNN, 2011, pp. 3243–3247.
[73] Memristor data courtesy of Kris Campbell, Boise State University, and Knowm, Inc.
[74] R. Waser and M. Aono, “Nanoionics-based resistive switching memories,” Nature
Materials, vol. 6, no. 11, pp. 833–840, 2007.
[75] M. Suri, O. Bichler, D. Querlioz, G. Palma, E. Vianello, D. Vuillaume, C. Gamrat,
and B. DeSalvo, “Cbram devices as binary synapses for low-power stochastic neu-
romorphic systems: auditory (cochlea) and visual (retina) cognitive processing ap-
plications,” in Electron Devices Meeting (IEDM), 2012 IEEE International. IEEE,
2012, pp. 10–3.
[76] S. Lin, L. Zhao, J. Zhang, H. Wu, Y. Wang, H. Qian, and Z. Yu, “Electrochemical
simulation of filament growth and dissolution in conductive-bridging ram (cbram)
with cylindrical coordinates,” in Electron Devices Meeting (IEDM), 2012 IEEE In-
ternational. IEEE, 2012, pp. 26–3.
[77] S. Yu and H. P. Wong, “Compact Modeling of Conducting-Bridge,” IEEE Transac-
tions on Electron Devices, vol. 58, no. 5, pp. 1352–1360, 2011.
[78] G. Palma, E. Vianello, C. Cagli, G. Molas, M. Reyboz, P. Blaise, B. D. Salvo,
F. Longnos, and F. Dahmani, “Experimental investigation and empirical modeling
of the set and reset kinetics of Ag-GeS 2 Conductive Bridging Memories,” in 4th
IEEE International Memory Workshop, 2012.
[79] H. Manem, G. S. Rose, X. He, and W. Wang, “Design Considerations for Variation
Tolerant Multilevel CMOS / Nano Memristor Memory,” Computer Engineering, pp.
287–292, 2010.
[80] D. Niu, Y. Chen, C. Xu, and Y. Xie, “Impact of Process Variations on Emerging
Memristor,” Analysis, pp. 877–882, 2010.
[81] M. Hu, H. Li, Y. Chen, X. Wang, and R. E. Pino, “Geometry variations analysis
of TiO2 thin-film and spintronic memristors,” 16th Asia and South Pacific Design
Automation Conference (ASP-DAC 2011), pp. 25–30, Jan. 2011.
[82] J. Rajendran, H. Maenm, R. Karri, and G. S. Rose, “An Approach to Tolerate Pro-
cess Related Variations in Memristor-Based Applications,” 2011 24th Internatioal
Conference on VLSI Design, pp. 18–23, Jan. 2011.
[83] I. Bellido and E. Fiesler, “Do backpropagation trained neural networks have normal
weight distributions?” in International Conference on Artificial Neural Networks,
ser. ICANN’93, 1993, pp. 772–725.
[84] I. Bayraktaroglu, A. S. Ogrenci, G. Dundar, S. Balkir, and E. Alpaydin, “ANNSyS:
an Analog Neural Network Synthesis System,” Neural Networks, vol. 12, pp. 325–
338, 1999.
[85] B. Gilbert, “Translinear circuits: a proposed classification,” Electronics Letters,
vol. 11, no. 1, p. 14, 1975. [Online]. Available: http://digital-library.theiet.org/
content/journals/10.1049/el\ 19750011
[86] C. Toumazou, F. J. Lidgey, and D. G. Haigh, Eds., Analogue IC Design: The current-
mode approach. Peter Peregrinus Ltd., 1990.
[87] M. P. Sah, C. Yang, H. Kim, T. Roska, and L. Chua, “Memristor bridge
circuit for neural synaptic weighting,” 2012 13th International Workshop on
Cellular Nanoscale Networks and their Applications, vol. 2, no. 3, pp. 1–5, Aug.
2012. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?
arnumber=6331434
[88] M. P. Sah, C. Yang, H. Kim, and L. Chua, “A voltage mode memristor bridge
synaptic circuit with memristor emulators,” Sensors (Basel, Switzerland), vol. 12,
no. 3, pp. 3587–3604, Jan. 2012. [Online]. Available: http://www.pubmedcentral.nih.
gov/articlerender.fcgi?artid=3376599&tool=pmcentrez&rendertype=abstract
[89] J. Rajendran, H. Manem, R. Karri, and G. S. Rose, “An energy-efficient memristive
threshold logic circuit,” IEEE Transactions on Computers, vol. 61, no. 4, pp. 474–
487, 2012.
[90] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: a new learning
scheme of feedforward neural networks,” in International Joint Conference on
Neural Networks, vol. 2, 2004, pp. 985–990.
[91] M. Lukoševičius and H. Jaeger, “Reservoir computing approaches to recurrent neural
network training,” Computer Science Review, vol. 3, no. 3, pp. 127–149, Aug.
2009.
[92] V. K. Rohatgi, An introduction to probability theory and mathematical statistics.
John Wiley & Sons, Inc., 1976.
[93] K.-H. Kim, S. Hyun Jo, S. Gaba, and W. Lu, “Nanoscale resistive memory with
intrinsic diode characteristics and long endurance,” Applied Physics Letters, vol. 96,
no. 5, p. 053106, 2010. [Online]. Available:
http://link.aip.org/link/APPLAB/v96/i5/p053106/s1&Agg=doi
[94] C. Mead, “Neuromorphic electronic systems,” Proceedings of the IEEE, vol. 78,
no. 10, pp. 1629–1636, 1990. [Online]. Available:
http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=58356
[95] C. Lu and B. Shi, “Circuit design of an adjustable neuron activation function and its
derivative,” Electronics Letters, vol. 36, no. 6, pp. 553–555, 2000.
[96] V. S. Babu and M. R. Biju, “Novel circuit realizations of neuron activation function
and its derivative with continuously programmable characteristics and low power
consumption,” International Journal of Advanced Research in Engineering and
Technology, vol. 5, no. 10, pp. 185–200, 2014.
[97] H. Hikawa, “A Digital Hardware Pulse-Mode Neuron With Piecewise Linear Acti-
vation Function,” IEEE Transactions on Neural Networks, vol. 14, no. 5, pp. 1028–
1037, 2003.
[98] M. Soltiz, C. Merkel, D. Kudithipudi, and G. S. Rose, “RRAM-based adaptive neural
logic block for implementing non-linearly separable functions in a single layer,” in
IEEE/ACM International Symposium on Nanoscale Architectures, 2012, pp. 218–
225.
[99] M. Soltiz, D. Kudithipudi, C. Merkel, G. S. Rose, and R. E. Pino,
“Memristor-based Neural Logic Blocks for Non-linearly Separable Functions,”
IEEE Transactions on Computers, vol. 62, no. 8, pp. 1597–1606, 2013.
[100] T. C. Carusone, D. A. Johns, and K. W. Martin, Analog Integrated Circuit Design,
2nd ed. John Wiley & Sons, 2012.
[101] C. Merkel, Q. Saleh, C. Donahue, and D. Kudithipudi, “Memristive Reservoir Computing
Architecture for Epileptic Seizure Detection,” Procedia Computer Science,
vol. 41, pp. 249–254, 2014.
[102] S. Gaba, P. Sheridan, J. Zhou, S. Choi, and W. Lu, “Stochastic memristive devices
for computing and neuromorphic applications,” Nanoscale, vol. 5, no. 13, pp. 5872–
5878, 2013.
[103] J.-W. Jang, S. Park, Y.-H. Jeong, and H. Hwang, “ReRAM-based synaptic device for
neuromorphic computing,” in Circuits and Systems (ISCAS), 2014 IEEE International
Symposium on. IEEE, 2014, pp. 1054–1057.
[104] A. Schmid, Y. Leblebici, and D. Mlynek, “Two-Stage Charge-Based Analog/Digital
Neuron Circuit with Adjustable Weights,” in International Joint Conference on
Neural Networks, 1999, pp. 2357–2362.
[105] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation.
Addison-Wesley, 1991.
[106] K.-H. Jo, C.-M. Jung, K.-S. Min, and S.-M. S. Kang, “Self-adaptive write circuit for
low-power and variation-tolerant memristors,” IEEE Transactions on Nanotechnology,
vol. 9, no. 6, pp. 675–678, 2010.
[107] D. McNeill, C. Schneider, and H. Card, “Analog CMOS neural networks based on
Gilbert multipliers with in-circuit learning,” Proceedings of 36th Midwest Sympo-
sium on Circuits and Systems, pp. 1271–1274.
[108] G. Han and E. Sánchez-Sinencio, “CMOS Transconductance Multipliers: A Tutorial,”
IEEE Transactions on Circuits and Systems II, vol. 45, no. 12, pp. 1550–1563, 1998.
[109] C. Merkel and D. Kudithipudi, “A Stochastic Learning Algorithm for Neuromem-
ristive Systems,” in System on Chip Conference, 2014, pp. 359–364.
[110] B. Widrow, “An adaptive ‘ADALINE’ neuron using chemical ‘Memistors’,” Stanford
University, Tech. Rep., 1960.
[111] D. Lu, X.-H. Yu, X. Jin, B. Li, Q. Chen, and J. Zhu, “Neural Network Based Edge
Detection for Automated Medical Diagnosis,” in International Conference on Information
and Automation, Jun. 2011, pp. 343–348.
[112] E. R. Kandel, J. H. Schwartz, T. M. Jessell, S. A. Siegelbaum, and A. J. Hudspeth,
Principles of Neural Science, 5th ed. McGraw Hill, 2013.
[113] M. Minsky and S. Papert, Perceptrons - Expanded edition: An introduction to com-
putational geometry. MIT Press, 1987.
[114] C. Merkel, D. Kudithipudi, and N. Sereni, “Periodic Activation Functions in
Memristor-based Analog Neural Networks,” in International Joint Conference on
Neural Networks, 2013, pp. 1–7.
[115] S. Chen, C. F. N. Cowan, and P. M. Grant, “Orthogonal Least Squares Learning
Algorithm,” IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 302–309,
1991.
[116] B. Widrow, A. Greenblatt, Y. Kim, and D. Park, “The No-Prop algorithm: a new
learning algorithm for multilayer neural networks,” Neural Networks, vol. 37, pp.
182–188, Jan. 2013.
[117] X. He and P. Niyogi, “Locality preserving projections,” in Advances in Neural
Information Processing Systems, 2003.
[118] http://www.vision.caltech.edu/Image_Datasets/Caltech101/
[119] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead, “Winner-take-all net-
works of O(N) complexity,” in Advances in Neural Information Processing Systems,
1988, pp. 703–711.
[120] “PJM.” [Online]. Available: www.pjm.com
[121] C. Merkel and D. Kudithipudi, “Comparison of off-chip training methods for neu-
romemristive systems,” in International Conference on VLSI Design, 2015.
[122] F. Alibart, E. Zamanidoost, and D. B. Strukov, “Pattern classification by memristive
crossbar circuits using ex situ and in situ training,” Nature Communications, vol. 4,
p. 2072, Jun. 2013.
