Wafer Quality Inspection using Memristive LSTM, ANN, DNN and HTM by Adam, Kazybek et al.
Kazybek Adam, Kamilya Smagulova, Olga Krestinskaya, Alex Pappachen James
Department of Electrical and Computer Engineering
Nazarbayev University
Astana, Kazakhstan
Email: {apj}@ieee.org
arXiv:1809.10438v1 [cs.ET] 27 Sep 2018
Abstract—The automated wafer inspection and quality control
is a complex and time-consuming task, which can be sped up using
neuromorphic memristive architectures, either as a separate
inspection device or integrated directly into sensors. This paper
presents the performance analysis and comparison of different
neuromorphic architectures for patterned wafer quality inspection
and classification. The application of non-volatile memristive
devices in these architectures ensures low power consumption,
small on-chip area, and scalability. We demonstrate that Long
Short-Term Memory (LSTM) outperforms the other architectures for
the same number of training iterations, and has relatively low
on-chip area and power consumption.
I. INTRODUCTION
With the increase in density and complexity of semiconductor
devices on the wafer, wafer surface inspection becomes an
increasingly complex, important, and time-consuming task.
Various techniques are applied to detect wafer defects,
including image processing [1], optical methods [2] and electron
beam inspection [3]. The automated wafer inspection process can
be sped up using machine learning architectures, deployed either
as a separate inspection device or integrated directly into the
sensing devices. Such architectures can analyze the wafer without
manual inspection and without sending data to a computer for
software processing.
In this paper, we investigate and compare the application
of neuromorphic architectures for wafer quality inspection.
We present the performance analysis and comparison of wafer
classification accuracy, on-chip area and power consumption of
a single perceptron [4], a three-layer Artificial Neural Network
(ANN) [5], a Long Short-Term Memory (LSTM) neural network
[6], a Deep Neural Network (DNN) [7], [8] and Hierarchical
Temporal Memory (HTM) [9]. The performance of the neuromorphic
architectures is tested using the database of wafer
parameters from [10], consisting of two classes of time-series
data obtained during measurement of inline semiconductor
processing.
II. NEUROMORPHIC ARCHITECTURES FOR WAFER
QUALITY INSPECTION
This paper analyses and compares the ability of hybrid
CMOS-memristive neural architectures such as the perceptron,
ANN, DNN, LSTM and modified HTM to perform wafer
classification. Fig. 1 illustrates the circuit blocks of the compared
neuromorphic memristive architectures. The weights of these
architectures are implemented using memristive devices, while
neurons and other computational components are implemented using
MOSFETs in TSMC CMOS process technology.
Fig. 1. Circuit designs of (a) DNN, (b) LSTM gate and (c) modified HTM.
A single-layer perceptron is a linear binary classifier inspired
by the biological neuron, as proposed in [4]. In this
work, we used a single-layer perceptron that has 151 input
values, corresponding to the size of the database patterns, and
1 output neuron with a hyperbolic tangent (tanh) activation
function. The output of the network is $y_i = \tanh(\sum_j x_{ij} w_{ij})$,
where $y_i$ is the output of neuron $i$, $x_{ij}$ are the inputs to the
perceptron and $w_{ij}$ are the weights in the network. The three-layer
ANN used in this work consists of input, hidden and output layers with
151 inputs, 300 hidden-layer neurons and a single output neuron
indicating the binary class of good or bad wafers. The DNN
shown in Fig. 1(a) has a larger number of layers. In this
work, we investigated the performance of a 5-layer DNN with
151-300-50-100-1 nodes in the respective layers. The tanh activation
function was used in both the ANN and the DNN.
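The three feed-forward topologies described above can be sketched in software as follows. This is a minimal illustration, not the memristive circuit: the layer sizes come from the text, while the random weights, the omission of biases, and the helper names `init_layers`, `forward` and `count_weights` are assumptions made for the example.

```python
import math
import random

def init_layers(sizes, rng):
    """One weight matrix (list of rows) per pair of adjacent layers; biases omitted."""
    return [[[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Propagate one input vector, applying y_k = tanh(sum_j x_j * w_jk) per layer."""
    for w in layers:
        n_out = len(w[0])
        x = [math.tanh(sum(x[j] * w[j][k] for j in range(len(x))))
             for k in range(n_out)]
    return x

def count_weights(sizes):
    """Number of trainable weights between adjacent layers, ignoring biases."""
    return sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))

# Layer sizes as stated in the text.
TOPOLOGIES = {
    "perceptron": [151, 1],
    "ann": [151, 300, 1],
    "dnn": [151, 300, 50, 100, 1],
}

rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(151)]  # made-up input pattern
for name, sizes in TOPOLOGIES.items():
    y = forward(x, init_layers(sizes, rng))
    print(f"{name}: {count_weights(sizes)} weights, output {y[0]:+.3f}")
```

Under these assumptions the perceptron has 151 weights, the ANN 45,600 and the DNN 65,400, which gives a rough sense of how much more there is to train in the deeper networks.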
While ANN and DNN rely on simple multilayered information
processing architectures, LSTM and HTM are more complex and
emulate more of the information processing in the human brain,
incorporating the concepts of gated memories and contextual
processing. LSTM is a neural architecture that uses a gated
computation approach, and the LSTM output depends on the
previous state. Fig. 1(b) illustrates the memristive hardware
implementation of the gated LSTM proposed in [6], [11]. In this
work, classification using LSTM was performed with a two-layer
network topology: an LSTM layer with LSTM units unrolled for
151 time steps, and an ANN layer with a linear activation
function.
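The two-layer topology can be illustrated with a small software sketch. It assumes the standard LSTM cell equations (input, forget and output gates plus a tanh candidate), one scalar input per time step and 4 hidden units as listed in Table I; the weights are random and the input trace is synthetic, so this is an illustration of the data flow rather than the circuit of [6], [11].

```python
import math
import random

HIDDEN = 4   # hidden units, as listed in Table I
STEPS = 151  # time steps, as stated in the text

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_gate(rng):
    """Weights for one gate: input weights, recurrent weights, bias (hypothetical values)."""
    return {
        "w_x": [rng.gauss(0.0, 0.5) for _ in range(HIDDEN)],
        "w_h": [[rng.gauss(0.0, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)],
        "b": [0.0] * HIDDEN,
    }

def gate_preact(gate, x_t, h_prev):
    """Pre-activation of every unit of a gate for the scalar input x_t."""
    return [gate["w_x"][k] * x_t
            + sum(gate["w_h"][k][j] * h_prev[j] for j in range(HIDDEN))
            + gate["b"][k]
            for k in range(HIDDEN)]

def lstm_step(params, x_t, h_prev, c_prev):
    """One time step: the new state depends on the input and on the previous state."""
    i = [sigmoid(z) for z in gate_preact(params["input"], x_t, h_prev)]
    f = [sigmoid(z) for z in gate_preact(params["forget"], x_t, h_prev)]
    o = [sigmoid(z) for z in gate_preact(params["output"], x_t, h_prev)]
    g = [math.tanh(z) for z in gate_preact(params["candidate"], x_t, h_prev)]
    c = [f[k] * c_prev[k] + i[k] * g[k] for k in range(HIDDEN)]
    h = [o[k] * math.tanh(c[k]) for k in range(HIDDEN)]
    return h, c

def classify(params, w_out, series):
    """Unroll the LSTM over the series, then apply the linear output layer."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for x_t in series:
        h, c = lstm_step(params, x_t, h, c)
    return sum(w_out[k] * h[k] for k in range(HIDDEN))

rng = random.Random(0)
params = {name: init_gate(rng) for name in ("input", "forget", "output", "candidate")}
w_out = [rng.gauss(0.0, 0.5) for _ in range(HIDDEN)]
series = [math.sin(0.1 * t) for t in range(STEPS)]  # synthetic stand-in for one wafer trace
score = classify(params, w_out, series)
print("class", 1 if score >= 0 else -1)
```

The sign of the linear output is used here as the class label, which matches the positive and negative classes reported in Table II but is otherwise an assumption of the sketch.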
Hierarchical temporal memory is a neural architecture that
mimics the computational process of the cerebral neocortex [12].
HTM consists of two main parts: (1) the Spatial Pooler (SP), which
encodes input patterns and outputs a Sparse Distributed
Representation (SDR) useful for pattern recognition and image
processing tasks, and (2) the Temporal Memory (TM), which can be
used for sequence learning and prediction. There are several
hardware implementations of HTM. In this work, we explore the
analog hardware implementation of the modified memristive HTM
proposed in [9] and shown in Fig. 1(c).
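As a rough illustration of what the Spatial Pooler produces, the sketch below encodes a binary input into an SDR by keeping only the columns with the highest overlap. It is a generic, simplified textbook-style pooler without learning, and not the modified memristive design of [9]; the column count, synapse count, sparsity and the function names are all hypothetical.

```python
import random

def make_columns(n_columns, n_inputs, n_synapses, rng):
    """Each column connects to a random subset of input bits (hypothetical sizes)."""
    return [rng.sample(range(n_inputs), n_synapses) for _ in range(n_columns)]

def spatial_pooler(bits, columns, n_active):
    """Return an SDR: 1 for the n_active columns with the highest overlap, else 0."""
    overlaps = [sum(bits[j] for j in synapses) for synapses in columns]
    ranked = sorted(range(len(columns)), key=lambda c: overlaps[c], reverse=True)
    sdr = [0] * len(columns)
    for c in ranked[:n_active]:
        sdr[c] = 1
    return sdr

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(151)]  # made-up binarised input pattern
columns = make_columns(n_columns=64, n_inputs=151, n_synapses=16, rng=rng)
sdr = spatial_pooler(bits, columns, n_active=4)
print("active columns:", [c for c, v in enumerate(sdr) if v])
```

Only 4 of the 64 columns are active in the output, which is the sparsity property that the SDR name refers to.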
III. RESULTS AND DISCUSSION
The system-level simulations were performed in Python and
MATLAB, while the circuit-level simulations were performed in
SPICE for the TSMC 180nm CMOS process. To investigate the
performance of the neuromorphic architectures, we used the wafer
database that includes 151 patterns of inline process control
measurements collected from different sensors during the silicon
wafer fabrication process [10]. The wafers are divided into two
classes: normal and abnormal wafers. Approximately 14% of the
patterns were used for training and the remaining 86% for
testing. The database is imbalanced between classes:
approximately 90% of both the training and testing wafer samples
are normal wafers.
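The class imbalance matters when reading accuracy figures, and a short calculation makes this concrete. The sketch below assumes that accuracy means the plain fraction of correctly classified wafers and uses a hypothetical test set of 1000 wafers with the 90/10 split quoted above; balanced accuracy is included only as a point of comparison and is not a metric taken from the source.

```python
def majority_baseline_accuracy(normal_fraction):
    """Accuracy of a classifier that always predicts the larger class."""
    return max(normal_fraction, 1.0 - normal_fraction)

def balanced_accuracy(tp, fn, tn, fp):
    """Mean of the per-class recalls; 0.5 for any constant predictor."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Hypothetical test set: 900 normal and 100 abnormal wafers.
n_normal, n_abnormal = 900, 100

# "Always normal" predictor: every normal wafer is right, every abnormal one is wrong.
acc = majority_baseline_accuracy(n_normal / (n_normal + n_abnormal))
bal = balanced_accuracy(tp=n_normal, fn=0, tn=0, fp=n_abnormal)
print(f"plain accuracy {acc:.2f}, balanced accuracy {bal:.2f}")
```

Under these assumptions a predictor that never flags an abnormal wafer already reaches about 90% plain accuracy, so accuracies near that level carry little information about how well abnormal wafers are detected.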
Table I shows the performance analysis of the different
memristive neural architectures for wafer quality inspection. For all
the architectures, performance analysis was performed for 40
iterations with 1000 training patterns in each, with learning
rate λ = 0.001. The LSTM architecture consumes less power
than ANN and DNN and achieves the highest accuracy for the same
number of iterations. LSTM can be considered the best
alternative for the wafer quality inspection task. For the
perceptron, ANN and DNN, the verification was
performed using the tanh activation function, as we observed that
linear and sigmoid activation functions did not ensure the
convergence of the neural network weights for the wafer database.
The perceptron with a single neuron shows better accuracy for
40×1000 iterations than ANN and DNN, as the number of
weights to be trained is smaller. However, even for
4000×1000 training epochs, the maximum accuracy that can
be achieved is approximately 94%. Also, the DNN accuracy
is lower than that of the ANN because its larger number of layers
cannot be trained with such a small number of iterations. For
ANN and DNN, the accuracy can be increased
by increasing the number of training iterations. The modified HTM
method proposed in [9] is not able to converge for the wafer
database, as the number of inputs is limited and the inputs
are data from various sensors. HTM is more effective
for pattern recognition from images, where there is a
correlation between the features, than for data from various
separate sensors. The on-chip area and power dissipation of
the corresponding circuits were calculated for offline learning
circuits and pretrained memristive weights based on the data
from [9] and [13].
Fig. 2. LSTM cell outputs for analog and software implementations for test
wafer 23.
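The training schedule described above (40 iterations over 1000 patterns with λ = 0.001) can be sketched for the single tanh neuron as follows. This is a software illustration only: it uses plain per-pattern gradient descent on squared error rather than the analog learning circuits of [5], [13], and the data are synthetic and linearly separable, not the wafer database.

```python
import math
import random

N_INPUTS = 151        # inputs per pattern (from the text)
N_PATTERNS = 1000     # training patterns per iteration (from the text)
N_ITERATIONS = 40     # passes over the training set (from the text)
LEARNING_RATE = 0.001

def predict(w, x):
    return math.tanh(sum(wj * xj for wj, xj in zip(w, x)))

def mean_squared_error(w, data):
    return sum((t - predict(w, x)) ** 2 for x, t in data) / len(data)

def train(w, data, iterations, lr):
    """Per-pattern gradient descent on squared error for a single tanh neuron."""
    for _ in range(iterations):
        for x, t in data:
            y = predict(w, x)
            delta = (t - y) * (1.0 - y * y)  # error times the tanh derivative
            w = [wj + lr * delta * xj for wj, xj in zip(w, x)]
    return w

# Synthetic, linearly separable stand-in for the wafer data (NOT the real database).
rng = random.Random(0)
w_true = [rng.gauss(0.0, 1.0) for _ in range(N_INPUTS)]
data = []
for _ in range(N_PATTERNS):
    x = [rng.gauss(0.0, 1.0) for _ in range(N_INPUTS)]
    t = 1.0 if sum(a * b for a, b in zip(w_true, x)) >= 0 else -1.0
    data.append((x, t))

w0 = [0.0] * N_INPUTS
w = train(w0, data, N_ITERATIONS, LEARNING_RATE)
print("MSE before training:", mean_squared_error(w0, data))
print("MSE after training: ", mean_squared_error(w, data))
```

Starting from zero weights, every output is 0 and the initial mean squared error against ±1 targets is exactly 1; the error after training depends on the data and is therefore not quoted here.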
As LSTM demonstrates the best performance for the wafer
classification task, we investigate the system- and circuit-level
performance of LSTM further. The simulation results for
LSTM are shown in Table II and Fig. 2. Table II compares
the LSTM results of system-level and circuit-level
simulations for several exemplar wafers from the
database. Both approaches produce a positive or negative result
depending on the class; therefore, even though the circuit and
software simulation results differ, the wafer classification
can still be performed successfully. Fig. 2 illustrates the
time-dependent LSTM outputs for a single wafer, comparing
software system-level simulation and the corresponding circuit-level
outputs in 4 hidden units.
TABLE I
COMPARISON OF LSTM PERFORMANCE WITH ANN, DNN AND HTM FOR 40×1000 ITERATIONS.
| Method | On-chip area | Power consumption | Classification accuracy | Comments |
|---|---|---|---|---|
| LSTM + ANN layer (sequential: 4 hidden units, 1 input, 152 time steps) | 257,503.20 µm² (with non-ideal current sources) | 255.8 mW (maximum input values were scaled down to 0.5) | 98.51% | Learns significantly slower than single LSTM layer with 1 time step. Exhibits increasing accuracy as epoch size is increased. Gave accuracy of 97.26% for 25 epochs, 98.51% for 40 epochs, and 98.86% for 55 epochs. |
| Single LSTM layer (parallel: 1 hidden unit, 152 inputs, 1 time step) | 115,967.4 µm² (with non-ideal current sources) | 312.4 mW (maximum input values were scaled down to 0.1) | 96.09% | Exploits windowing method: each time step is considered as a feature. Can give up to 99.29% accuracy when epoch size is increased to 100. |
| Perceptron (tangent activation function) | 2,994.00 µm² (without buffer) | 80 mW (without buffer) | 90% | Learns faster than ANN, however cannot reach accuracy more than 93.7% even with increased number of iterations. |
| 3 layer ANN (300 neurons in a hidden layer) | 4,839.90 µm² | 1072.4 mW | 83% | Converges slower than LSTM for the same learning rate and number of training iterations. |
| DNN (5 layers, 300-50-100 neurons in hidden layers) | 0.0121 mm² | 2681.1 mW | 64% | The number of iterations should be increased to achieve higher accuracy. |
| Modified HTM | 0.096 mm² (for sequential processing) | 1756 mW (for sequential processing) | 50% | Not effective for the small number of input features and not able to converge. |
TABLE II
COMPARISON OF SOFTWARE AND HARDWARE RESULTS BY LSTM.
| Wafer test number | Predicted value (analog) (-mV) | Predicted value (software) (1e-3) | Class |
|---|---|---|---|
| 23 | -171.5 | -423.7 | -1 |
| 47 | 523.2 | 473.0 | 1 |
| 7 | -247.4 | -507.4 | -1 |
| 3 | 470.7 | 511.6 | 1 |
| 3838 | -485.5 | -418.3 | -1 |
| 193 | 501.84 | 497.0 | 1 |
| 6157 | -247.3 | -460.1 | -1 |
| 411 | 531.9 | 489.8 | 1 |
| 1534 | -437.2 | -456.2 | -1 |
| 4507 | 255.6 | 493.6 | 1 |
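The statement that both simulations give the same class despite different magnitudes can be checked directly on the Table II values. The sketch below assumes the decision rule is a threshold at zero (positive output means class 1, negative means class -1); the function names are illustrative.

```python
# (wafer test number, analog prediction, software prediction, class) from Table II
TABLE_II = [
    (23, -171.5, -423.7, -1),
    (47, 523.2, 473.0, 1),
    (7, -247.4, -507.4, -1),
    (3, 470.7, 511.6, 1),
    (3838, -485.5, -418.3, -1),
    (193, 501.84, 497.0, 1),
    (6157, -247.3, -460.1, -1),
    (411, 531.9, 489.8, 1),
    (1534, -437.2, -456.2, -1),
    (4507, 255.6, 493.6, 1),
]

def sign_class(value):
    """Threshold at zero: positive -> class 1, otherwise class -1 (assumed rule)."""
    return 1 if value > 0 else -1

def agreement(rows):
    """For each wafer, report whether the analog and software signs match the class."""
    return [(wafer, sign_class(analog) == label, sign_class(software) == label)
            for wafer, analog, software, label in rows]

results = agreement(TABLE_II)
all_agree = all(analog_ok and software_ok for _, analog_ok, software_ok in results)
print("all rows classified consistently:", all_agree)
```

For the ten wafers listed, the sign of both the analog and the software output matches the class label in every row.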
IV. CONCLUSION
In this work, wafer quality inspection and classification
are performed using different neuromorphic memristive
architectures. LSTM outperforms the other architectures,
demonstrating a classification accuracy of 96-98%. This can be
explained by its gated structure, which is capable of controlling
the flow of information. The same accuracy
can be achieved by ANN and DNN; however, more learning
time, on-chip area and power consumption are required. A
single perceptron has the smallest on-chip area and power
consumption; however, it cannot be trained to achieve high
accuracy even with a large number of training
iterations. The modified HTM has been found to
be ineffective for the given task. Overall, neuromorphic memristive
architectures can speed up the process of wafer quality inspection and can
be integrated directly into sensors for measurements without
sending data for software processing and manual analysis.
This can also reduce the cost of wafer inspection.
REFERENCES
[1] H. Yoda, Y. Ohuchi, Y. Taniguchi, and M. Ejiri, “An automatic wafer
inspection system using pipelined image processing techniques,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 10,
no. 1, pp. 4–16, 1988.
[2] C. R. Fairley, T.-Y. Fu, G. Perelman, and B.-m. B. Tsai, “High throughput
brightfield/darkfield wafer inspection system using advanced optical
techniques,” Jan. 16 2007, US Patent 7,164,475.
[3] M. C. Sun, S. Jansen, R. Mann, and O. D. Patterson, “Semiconductor
integrated test structures for electron beam inspection of semiconductor
wafers,” Mar. 16 2010, US Patent 7,679,083.
[4] F. Rosenblatt, “The perceptron: a probabilistic model for information
storage and organization in the brain.” Psychological review, vol. 65,
no. 6, p. 386, 1958.
[5] O. Krestinskaya, K. N. Salama, and A. P. James, “Analog backpropagation
learning circuits for memristive crossbar neural networks,” in 2018
IEEE International Symposium on Circuits and Systems (ISCAS), May
2018, pp. 1–5.
[6] K. Smagulova, O. Krestinskaya, and A. P. James, “A memristor-based
long short term memory circuit,” Analog Integrated Circuits and Signal
Processing, vol. 95, no. 3, pp. 467–472, 2018.
[7] M. Cheng, L. Xia, Z. Zhu, Y. Cai, Y. Xie, Y. Wang, and H. Yang,
“Time: A training-in-memory architecture for memristor-based deep
neural networks,” in 2017 54th ACM/EDAC/IEEE Design Automation
Conference (DAC), June 2017, pp. 1–6.
[8] R. Hasan, T. M. Taha, and C. Yakopcic, “On-chip training of memristor
based deep neural networks,” in 2017 International Joint Conference on
Neural Networks (IJCNN), May 2017, pp. 3527–3534.
[9] O. Krestinskaya, T. Ibrayev, and A. P. James, “Hierarchical temporal
memory features with memristor logic circuits for pattern recognition,”
IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, vol. 37, no. 6, pp. 1143–1156, June 2018.
[10] R. T. Olszewski, “Generalized feature extraction for structural pattern
recognition in time-series data,” Carnegie Mellon University, School of
Computer Science, Pittsburgh, PA, Tech. Rep., 2001.
[11] K. Smagulova, K. Adam, O. Krestinskaya, and A. P. James, “Design
of CMOS-memristor circuits for LSTM architecture,” arXiv preprint
arXiv:1806.02366, 2018.
[12] J. Hawkins and S. Blakeslee, “On intelligence,” New York: St. Martin's
Griffin, pp. 156–8, 2004.
[13] O. Krestinskaya, K. Salama, and A. P. James, “Learning in memristive
neural network architectures using analog backpropagation circuits,”
IEEE Transactions on Circuits and Systems I: Regular Papers, 2018.
