Neural Networks for Defect Recognition on Masks and Integrated Circuits: First Result by Surmann, Hartmut et al.
NEURAL NETS FOR DEFECT RECOGNITION ON MASKS
AND INTEGRATED CIRCUITS : FIRST RESULTS
Hartmut Surmann, Benhür Kiziloglu, Ulrich Rückert & Karl Goser
University of Dortmund - Dortmund, Germany
Abstract: The first results of applying artificial neural networks to defect detection on masks
and integrated circuits are presented. The use of neural nets opens up the possibility of
developing fast and flexible inspection systems. In this publication two different
models of neural nets - the Kohonen map and a feed-forward net using the backpropagation
learning algorithm - are discussed with regard to their applicability to pattern recognition in
inspection systems on the basis of simulation results.
Keywords: self-organising feature map, backpropagation, automatic inspection system,
defect recognition.
The rapid evolution of microelectronics leads to integrated circuits with
minimal structure sizes in the sub-µm range. Such fine geometrical structures
require complex and error-free masks for the patterning steps of the
production process. Mask defects can generate local defects on the chips and
can lead to functional failures or to a reduced lifetime of the devices.
To reach a satisfactory yield, it is necessary to inspect the photomasks
before using them in the production process. After fabrication, an inspection
of the completed semiconductor devices is essential for maintaining a high
quality standard. To cope with decreasing structure sizes and increasing
pattern complexity, automatic inspection systems are called for that offer
high computing speed, reliable failure detection and economy.
Classical inspection systems used in present fabrication processes
are based on comparing one chip pattern to another or to the design data
base [1]. Both methods demand substantial computing resources and
require a very accurate optical and mechanical
adjustment. Another disadvantage is the missing classification of the defects:
Differences are recognized, but they are not interpreted as failures or
tolerable deviations.
A rather new approach is the use of artificial neural networks for defect
detection on masks. Expected advantages are high speed (because of the
massively parallel processing), adaptability to special problems, fault
tolerance, and cost reduction. In the following, first results of research
studies are presented and discussed which investigate the practical
usefulness of two neural network models in the area of static image
recognition (specific application: detection of mask faults).
1. Detection of mask defects
1.1 Recording techniques and occurring problems
Defect recognition on masks and integrated circuits requires the recording
of mask and circuit data using light-optical or electron-microscopic
techniques (electron-beam scanning, X-ray scanning). All techniques use a kind
of beam (light, electrons) to scan the given picture spot by spot and record
the measured intensity values. The resolution of the scanning system depends
on the wavelength of the radiation used: with techniques based on
electromagnetic radiation, the highest resolution is obtained by using hard
X-rays. A typical scanning result is shown in Figure 1.
The picture contains a portion of noise caused by external disturbances (such
as strong lighting or shadows) and by inaccurate recording techniques. These
disturbances have to be handled by the inspection system.
The main problem is to distinguish between real failures and tolerable deviations.
The reduction of structure sizes in microelectronic devices makes it necessary
to detect mask and circuit defects in the sub-µm range.
1.2 Occurring failure classes
As shown in Figure 2, the possible types of failures can be divided into four
main failure classes: corner, edge, bridge, and pinhole/pinspot. The
reliability and functionality of integrated circuits depend on these failures,
which can cause cut-offs, short circuits, and reduction or enlargement of
circuit structures.

Figure 1: Half-tone picture: 512 x 512 pixel, resolution 8 bit.

Failure classes:
  Pinhole, Pinspot      Pattern No. 1, 5, 7
  Link defect           Pattern No. 6, 8
  Edge defect           Pattern No. 2, 3
  Corner defect         Pattern No. 4, 9
Significant data:
  Minimal structure length:        L (1 µm)
  Minimal detectable error size:   D = L/5 (200 nm)
  Resolution:                      512 pixel / 0.1 mm = 5 pixel / 1 µm
  Size of the mask surface:        4 cm² = 10^10 pixel

Figure 2: Failure classes and significant mask data

Figure 2 also shows some significant data of the inspected mask: the minimal
detectable failure size is 0.2 µm and the minimal structure size is 1 µm. This
leads to a resolution of 5 pixel/µm. The resolution of the inspection system
must be at least 4 to 5 times finer than the minimal structure size. With
present light-optical techniques it is possible to reach a resolution of
100 nm, which corresponds to a minimal structure size of 0.4 µm.
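The relation between structure size, defect size and scan resolution stated above can be checked with a few lines of arithmetic. The following is only a sketch: the function names are illustrative and not taken from the original work, and it assumes that the smallest detectable defect should span exactly one pixel.

```python
def required_pixels_per_um(min_defect_um, pixels_per_defect=1.0):
    """Pixels per micrometre needed so that the smallest defect spans
    `pixels_per_defect` pixels (assumed here: one pixel per defect)."""
    return pixels_per_defect / min_defect_um


def field_of_view_um(image_pixels, pixels_per_um):
    """Edge length in micrometres covered by one scanned image."""
    return image_pixels / pixels_per_um


min_structure_um = 1.0                     # minimal structure size L
min_defect_um = min_structure_um / 5       # minimal detectable defect D = L/5
resolution = required_pixels_per_um(min_defect_um)    # pixel per micrometre
fov = field_of_view_um(512, resolution)    # one 512 x 512 pixel picture

print(f"D = {min_defect_um} um, resolution = {resolution} pixel/um, "
      f"field of view = {fov} um")
```

Under these assumptions a single 512 x 512 picture covers only about a tenth of a millimetre per side, which illustrates why a complete mask requires a very large number of pixels.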
1.3 Conventional techniques
The most common technique compares one mask pattern to
another spot by spot. The result is only a list of differences, not a
classification of the failures. The classification has to be done afterwards
by the user. The main disadvantage of this technique is the high adjustment
accuracy it requires. The ongoing reduction of structure sizes intensifies
this problem.
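The spot-by-spot comparison and its sensitivity to adjustment errors can be illustrated with a toy example. This is a hedged sketch, not the procedure of any particular inspection system; the helper names, the threshold and the 6 x 6 test pattern are assumptions made for illustration.

```python
def difference_map(pattern_a, pattern_b, threshold):
    """Return the (row, col) positions where two equally sized grey-value
    images differ by more than `threshold`."""
    diffs = []
    for r, (row_a, row_b) in enumerate(zip(pattern_a, pattern_b)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if abs(a - b) > threshold:
                diffs.append((r, c))
    return diffs


def shift_right(image, pixels, fill=0):
    """Simulate a horizontal misalignment by shifting every row."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]


# Toy 6 x 6 pattern: a vertical line (grey value 255) on background 0.
reference = [[255 if 2 <= c <= 3 else 0 for c in range(6)] for _ in range(6)]

# The same pattern with a single pinhole in the line.
defective = [row[:] for row in reference]
defective[3][2] = 0

pinhole_diffs = difference_map(reference, defective, threshold=50)
misaligned_diffs = difference_map(reference, shift_right(reference, 1),
                                  threshold=50)
print("pinhole:", pinhole_diffs)
print("differences caused by a one-pixel shift:", len(misaligned_diffs))
```

In this toy case a misalignment of a single pixel produces more differences than the real defect, and the list of positions says nothing about the failure class.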
Another classical inspection technique is based on the verification of the
design rules. Here, violations are interpreted as possible failures. The
main disadvantage in this case is that there may be
failures which are not design rule violations; these failures will not be
detected. On the other hand, this technique allows a higher adjustment tolerance.
Both techniques demand powerful computing resources to achieve
acceptable processing times [1].
1.4 Expected advantages of using neural networks
Compared with conventional techniques, the following advantages are expected
from the use of neural networks:
- high performance through parallel data processing
- flexibility of the inspection systems through learning capabilities
- insensitivity to disturbances (e.g. noise)
- high data reduction and good classification capabilities
Parallel data processing leads to real-time pattern recognition, which is
necessary for powerful inspection systems. In addition, neural nets
offer higher flexibility because of their ability to learn and therefore to
detect new patterns. This property is especially important for smaller
structure sizes.
2. Feed-forward Networks and Backpropagation
The most commonly used neural network architecture is the feed-forward
net (FFN) with the well-known backpropagation learning algorithm [2].
Therefore, we tried to use this architecture and learning algorithm for
defect recognition as well. The general working scheme is summarized in
Figure 3. After digitizing the original mask data (Figure 1), small 8 x 8
pixel windows are used as input data for our feed-forward net. The
network was trained with selected input patterns and the corresponding
desired output patterns. We used different coding schemes for the output
patterns; one of them is shown in Table 1. This coding scheme, for example,
incorporates structural information (see part (b) of Table 1) in the
desired output pattern.
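The generation of input patterns and desired output patterns described above might look as follows. This is a minimal sketch under stated assumptions: the bit assignment in `FAILURE_BITS` is one reading of "Code 1" in Table 1, the scaling of grey values to [0, 1] is an assumption, and all function names are hypothetical.

```python
FAILURE_BITS = {          # assumed reading of "Code 1" in Table 1
    "pinhole": "0001",
    "edge":    "0010",
    "link":    "0100",
    "corner":  "1000",
}


def extract_windows(image, size=8, step=8):
    """Cut a 2-D grey-value image (list of rows) into size x size windows
    and flatten each window into one input vector scaled to [0, 1]."""
    vectors = []
    for top in range(0, len(image) - size + 1, step):
        for left in range(0, len(image[0]) - size + 1, step):
            window = [image[top + r][left + c] / 255.0
                      for r in range(size) for c in range(size)]
            vectors.append(((top, left), window))
    return vectors


def code1_target(failure_class, faulty):
    """Desired output vector: leading bit x = 1 for a faulty pattern and
    x = 0 for a correct one, followed by the four class bits."""
    bits = ("1" if faulty else "0") + FAILURE_BITS[failure_class]
    return [float(b) for b in bits]


# Synthetic 16 x 16 test image standing in for digitized mask data.
image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
windows = extract_windows(image)
print(len(windows), "windows with", len(windows[0][1]), "inputs each")
print(code1_target("edge", faulty=True))
```

Each 8 x 8 window yields a vector of 64 values, which matches the 64 input units of the networks listed in Table 2.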
[Block diagram: Picture (512 x 512 pixel) -> Digitalization of the picture -> Input pattern generation, pattern coding -> Learning -> Defect interpretation]
Figure 3: Working process of the feed-forward net
(a)
Failure class         Coding (Code 1)
Pinhole (Pinspot)     x0001
Edge                  x0010
Link                  x0100
Corner                x1000

(b)
Local characteristic  Coding (Code 2)
[pictogram]           00000 00001
[pictogram]           00000 00010
[pictogram]           00000 00100
[pictogram]           00000 01000
[pictogram]           00000 10000
[pictogram]           00001 00000
[pictogram]           00010 00000
[pictogram]           00100 00000
[pictogram]           01000 00000

Table 1: Coding technique: x = 1 for faulty pattern, x = 0 for correct pattern
Before the simulations, several system parameters have to be fixed. First
of all, the number of layers and the number of processing elements in each
layer have to be selected. So far we have used a 3-layer and a 4-layer
configuration with a variable number of processing elements per layer.
Then the parameters of the backpropagation learning algorithm
have to be determined, especially the learning step width ε and the amount
of weight change per learning step α [2]. Last but not least, the training set
and the number of training vectors have to be fixed.
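To make the role of these parameters concrete, the following sketch trains a small feed-forward net with backpropagation in plain Python. It is not the authors' simulation program. In particular, α is interpreted here as a momentum factor applied to the previous weight change, which is an assumption; the random training patterns, the weight initialization and the number of passes are likewise illustrative choices.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FeedForwardNet:
    """Fully connected feed-forward net trained by backpropagation."""

    def __init__(self, layer_sizes, epsilon=0.8, alpha=0.7, seed=0):
        rng = random.Random(seed)
        self.epsilon = epsilon   # learning step width
        self.alpha = alpha       # assumed: momentum factor on the previous change
        # weights[k][j][i] connects unit i of layer k to unit j of layer k + 1;
        # the last entry of each row is the bias weight.
        self.weights = [[[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
                         for _ in range(n_out)]
                        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
        self.previous = [[[0.0] * len(row) for row in layer]
                         for layer in self.weights]

    def forward(self, x):
        """Return the activations of all layers, input layer first."""
        activations = [list(x)]
        for layer in self.weights:
            inputs = activations[-1] + [1.0]          # constant bias input
            activations.append(
                [sigmoid(sum(w * v for w, v in zip(row, inputs)))
                 for row in layer])
        return activations

    def error(self, patterns):
        """Summed squared output error over all (input, target) pairs."""
        return sum((t - o) ** 2
                   for x, target in patterns
                   for t, o in zip(target, self.forward(x)[-1]))

    def train_pattern(self, x, target):
        """One backpropagation step for a single training pattern."""
        acts = self.forward(x)
        deltas = [(t - o) * o * (1.0 - o) for t, o in zip(target, acts[-1])]
        for k in reversed(range(len(self.weights))):
            layer, inputs = self.weights[k], acts[k] + [1.0]
            # Error signals of the layer below, computed with the old weights
            # (for k = 0 they belong to the input layer and are not used).
            below = [sum(layer[j][i] * d for j, d in enumerate(deltas))
                     * a * (1.0 - a) for i, a in enumerate(acts[k])]
            for j, d in enumerate(deltas):
                for i, v in enumerate(inputs):
                    change = (self.epsilon * d * v
                              + self.alpha * self.previous[k][j][i])
                    layer[j][i] += change
                    self.previous[k][j][i] = change
            deltas = below


rng = random.Random(1)
patterns = []
for n in range(5):                                   # five training patterns
    x = [rng.random() for _ in range(64)]            # stand-in for an 8 x 8 window
    target = [1.0 if i == n else 0.0 for i in range(10)]
    patterns.append((x, target))

net = FeedForwardNet([64, 30, 15, 10])               # a 4-layer topology
error_before = net.error(patterns)
for _ in range(30):                                  # 30 passes over the set
    for x, target in patterns:
        net.train_pattern(x, target)
error_after = net.error(patterns)
print(f"summed squared error: {error_before:.3f} -> {error_after:.3f}")
```

Changing `layer_sizes`, `epsilon` or `alpha` in this sketch gives a feeling for how strongly the convergence depends on these choices, although the numbers are not comparable with the results reported in the tables.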
Network      Input units  Hidden units  Hidden units  Output units  Learning  End     Training
topology     (layer 1)    (layer 2)     (layer 3)     (layer 4)     steps     error   time
3-layer net  64           30            -             10            2789      0.6168  10 min
4-layer net  64           30            15            10            568       0.0456  140 sec

Table 2: 3-layer net and 4-layer net;
α = 0.7, ε = 0.8, number of patterns = 5
Concerning the number of layers, our simulations have shown no notable
influence on the system behaviour, except that the 3-layer network needs more
training steps to reach the same recognition performance (Table 2). The
number of processing elements per layer, in contrast, has a perceptible
impact on the convergence of the learning phase. An increasing number of
processing elements does not necessarily lead to a better system
performance (Table 3).
Network topology  Input units  Hidden units  Hidden units  Output units  Learning  End    Training
(4-layer net)     (layer 1)    (layer 2)     (layer 3)     (layer 4)     steps     error  time
Net-A             100          30            15            15            261       0.033  2 min
Net-B             121          30            15            15            264       0.043  2 min
Net-C             144          30            15            15            4008      3.2    30 min

Table 3: Number of patterns = 5, α = 0.7, ε = 0.8
The learning parameter ε (learning step width) has a substantial influence
on the convergence behaviour during learning, as can be seen from Tables 4
and 5. A suitable choice of ε leads to faster convergence. The parameter
α influences the convergence behaviour as well (Tables 4 and 5). The maximal
number of patterns successfully learned by the network topologies we used was
25, independent of the choice of the learning set. Further
simulation results of the learning behaviour for different parameter choices
and network topologies are given in Figures 3-5. All simulations have
been done on an HP 9000 (Unix, C language, MC 68030, 50 MHz, 24 MB RAM).
Network topology  α    ε    Number of  Learning  End     Training
(4-layer net)               patterns   steps     error   time
Net-B2            0.7  0.8  25         6000      17.539  3? h
Net-B3            0.7  0.4  25         ?000      13.2?7  3? h
Net-B4            0.8  0.4  25         ?000      31.15?  ?

Table 4: Network topology as net-B1 in Table 3
Network topology  α    ε    Number of  Learning  End      Training
(4-layer net)               patterns   steps     error    time
Net-D1            0.7  0.8  25         6000      35.0435  ?
Net-D2            0.7  0.4  25         ?000      25.606?  3? h
Net-D3            0.8  0.4  25         3175      1.3491   2 h
Net-D4            0.7  0.4  10         362       0.0477   3 min
Net-D5            0.7  0.8  10         1000      0.5128   13 min

Table 5: Network topology [121 30 15 10]
[Plot: error versus learning steps; α = 0.7, ε = 0.8, number of patterns: 5]
Figure 3: Simulation results for the net-A and net-B1
[Plot: error versus learning steps; Net-B2: α = 0.7, ε = 0.8; Net-B3: α = 0.7, ε = 0.4]
Figure 4: Simulation results for the net-B2 and net-B3
[Plot: error versus learning steps; Net-D1: α = 0.7, ε = 0.8; Net-D3: α = 0.8, ε = 0.4; number of patterns: 25]
Figure 5: Simulation results for net-D1 and net-D3
In summary, our results confirm the experience of many other research groups
dealing with the backpropagation learning rule. The main problem is a proper
choice of the network topology and the system parameters for a given
application. For our intended application (defect recognition on masks and
integrated circuits) we have not succeeded in solving this problem so far.
Hence, our first results are not very promising.
3. The self-organizing feature map
The self-organizing feature map consists of a two-dimensional array of
identical processing units. Each of these processing units stores a single
weight vector, each component of which is connected to the corresponding
component of an external input vector. As the learning algorithm we use the
Kohonen algorithm [3], [4].
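One learning step of such a map can be sketched as follows. The Gaussian neighbourhood function, the random initialization and the tiny 4 x 4 map with three-component vectors are assumptions chosen for illustration; they are not taken from the original simulation program.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position (row, col) of the unit whose weight vector has the
    smallest Euclidean distance to the input vector x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_step(weights, x, step_size, radius):
    """One Kohonen update: every unit moves towards x, weighted by a
    Gaussian neighbourhood (an assumed choice) around the winning unit."""
    win_r, win_c = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_dist2 = (r - win_r) ** 2 + (c - win_c) ** 2
            h = step_size * math.exp(-grid_dist2 / (2.0 * radius ** 2))
            for i in range(len(w)):
                w[i] += h * (x[i] - w[i])
    return win_r, win_c


rng = random.Random(0)
rows, cols, dim = 4, 4, 3          # small map, for illustration only
weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
           for _ in range(rows)]
sample = [0.2, 0.9, 0.4]
winner = train_step(weights, sample, step_size=0.5, radius=1.0)
print("winning unit:", winner)
```

Because neighbouring units are pulled in the same direction as the winner, repeated steps lead to the topological ordering in which similar input patterns end up in the same region of the map.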
[Block diagram: Picture -> Pattern generation -> Learning -> Pattern classification -> Defect inspection]
Figure 8: Working process for the feature map
The working process can again be organized in four segments, as shown in
Figure 8. In the first step we extract windows of 16 x 16 pixels with 8-bit
resolution. As the self-organizing feature map cannot assign the learning
patterns automatically to the error classes, an expert has to do this work
after training. During the classification process, the expert chooses
different parts of the feature map and assigns these parts to the failure
classes in Table 6.
Code  Classification
0     No error
1     Pinhole error
2     Pinspot error
3     Edge error
4     Corner error
5     Link error
Table 6: Failure classes
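Once the expert has assigned regions of the map to failure classes, classifying a new pattern reduces to a table lookup at the best-matching unit. The sketch below illustrates this idea; the class codes follow one reading of Table 6, while the toy map, the labels and the fallback for unlabelled units are hypothetical.

```python
FAILURE_CLASSES = {0: "no error", 1: "pinhole error", 2: "pinspot error",
                   3: "edge error", 4: "corner error", 5: "link error"}


def nearest_unit(weights, x):
    """Grid position of the unit with the smallest Euclidean distance to x."""
    return min(((r, c) for r, row in enumerate(weights)
                for c in range(len(row))),
               key=lambda rc: sum((w - v) ** 2
                                  for w, v in zip(weights[rc[0]][rc[1]], x)))


def classify(weights, unit_labels, x, default=0):
    """Class code of the map region that the input pattern falls into.
    Units without an expert label fall back to `default` (an assumption)."""
    return unit_labels.get(nearest_unit(weights, x), default)


# Toy 1 x 3 map with 4-component weight vectors instead of 16 x 16 patterns.
weights = [[[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]]
unit_labels = {(0, 0): 0, (0, 1): 3, (0, 2): 0}     # assigned by the expert

code = classify(weights, unit_labels, [0.9, 1.0, 0.1, 0.0])
print(code, FAILURE_CLASSES[code])
```

The slightly noisy input is still mapped to the unit labelled as edge error, which is the kind of tolerance against disturbances expected from the feature map.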
One problem of the self-organizing feature map is to find an optimal number
of processing units. The number is correlated with the number and the variety
of the learning patterns. If we use too few processing units, the feature
map may fail to capture some characteristics needed during the inspection
process. If we use too many processing units, the time for training and
inspection grows exponentially.
In the same way as for the FFN, we have performed several
simulations with different numbers of processing units (144, 196, 1024) and
patterns. Figure 10 shows one good result: for every processing unit in the
feature map it displays the adapted 16 x 16 pattern mask. It can be seen that
similar input patterns (in the sense of the Euclidean norm) are mapped to
processing units in the same region of the map. Table 7 lists the learning and
inspection times of our simulation program (HP 9000).
Network   h0   h1    b0  b1    ni  nj  Input  Number of  Learning  Training  Error evaluation
topology                               units  patterns   steps     time      time
Net-KA    0.5  0.02  ?   1.00  14  14  256    3?         38        ?         -
Net-KB    0.3  0.02  ?   1.00  14  14  256    ?          38        53 min    25 min
Net-K?    0.1  0.02  ?   1.00  ?   ?   256    25?        38        88 min    -
Table 7: Learning and inspection times
h0, b0: learning step and radius size at the beginning
h1, b1: learning step and radius size at the end
ni, nj: number of processing units in x and y direction [4]
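How the learning step and the radius move from their initial to their final values is not spelled out here. One common choice is an exponential interpolation between the two values, sketched below as an assumption. The function name and the radius values are illustrative; only h0 = 0.5 and h1 = 0.02 follow a reading of Table 7.

```python
def schedule(start, end, step, total_steps):
    """Exponential interpolation from `start` (step 0) to `end` (last step).
    This particular decay law is an assumption, not taken from the paper."""
    if total_steps <= 1:
        return end
    return start * (end / start) ** (step / (total_steps - 1))


total_steps = 30
for t in (0, 10, 20, 29):
    h = schedule(0.5, 0.02, t, total_steps)      # learning step: h0 -> h1
    b = schedule(7.0, 1.0, t, total_steps)       # radius: illustrative b0 -> b1
    print(f"step {t:2d}: learning step {h:.3f}, radius {b:.2f}")
```

A large initial radius orders the map globally, while the small final values only fine-tune individual weight vectors.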
During the inspection process we generate 16 x 16 pixel input patterns from
the original mask pictures. In our simulation program the user can choose an
offset in x and y direction between successive input patterns. After its
generation, each input pattern is classified by the feature map.
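The inspection loop described above can be sketched as a sliding window whose step equals the chosen offset. The classifier is passed in as a function, so that a trained and labelled feature map could be plugged in; the `toy_classifier` used here is only a hypothetical stand-in, and class code 2 (pinspot) follows one reading of Table 6.

```python
def window_positions(height, width, size=16, offset=8):
    """Top-left corners of all size x size windows for a given offset."""
    return [(top, left)
            for top in range(0, height - size + 1, offset)
            for left in range(0, width - size + 1, offset)]


def inspect(image, classify, size=16, offset=8):
    """Classify every window and return (top, left, code) for each window
    whose class code differs from 0 (no error)."""
    findings = []
    for top, left in window_positions(len(image), len(image[0]), size, offset):
        window = [image[top + r][left + c]
                  for r in range(size) for c in range(size)]
        code = classify(window)
        if code != 0:
            findings.append((top, left, code))
    return findings


def toy_classifier(window):
    """Hypothetical stand-in for the feature map: report a pinspot (code 2)
    if exactly one pixel of the window is bright."""
    return 2 if sum(1 for v in window if v > 128) == 1 else 0


image = [[0] * 32 for _ in range(32)]     # empty 32 x 32 test picture
image[20][20] = 255                       # one isolated bright pixel
findings = inspect(image, toy_classifier, offset=8)
print(findings)
```

With an offset smaller than the window size the windows overlap, so the same defect is reported by several neighbouring windows.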
Figure 10: Weight vectors of the self-organizing feature map after learning.
Each weight vector represents a small 16 x 16 picture.
In our examination we could successfully inspect masks automatically with the
knowledge of the feature map. Figure 11 shows a mask which has been
inspected with an offset x = y = 8. If we use smaller offsets (e.g.
x = y = 4), we detect more errors, but the inspection time increases. For a
real-time inspection system it is necessary to have special neural network
hardware [5], [6].
Figure 11: Result of an inspection process. Offset x = y = 8. Errors are
marked with a frame.
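The trade-off between offset and inspection time mentioned above can be quantified by counting the windows that have to be classified. The following lines are a small worked example for a 512 x 512 picture and 16 x 16 windows; the function name is illustrative.

```python
def windows_per_axis(image_size, window_size, offset):
    """Number of window positions along one axis."""
    return (image_size - window_size) // offset + 1


counts = {}
for offset in (8, 4):
    n = windows_per_axis(512, 16, offset)
    counts[offset] = n * n
    print(f"offset {offset}: {n} x {n} = {n * n} windows")
```

Halving the offset roughly quadruples the number of windows, and with it the inspection time of a sequential simulation.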
4. Conclusion
In this paper two neural network models - the self-organizing feature map
and several feed-forward nets - have been analysed with regard to their
applicability to defect recognition on microelectronic structures.
The simulation results for the feed-forward nets have shown that the
training procedure becomes more complicated as the number of patterns and
input elements rises. The important question now is whether this neural
network model is generally unsuitable for the problem of defect
recognition or whether the selected settings of the adjustable parameters
(number of hidden elements, coding, ...) prevent satisfactory learning
behaviour. Our experiences and the problems encountered are similar to those
of other research groups.
On the other hand, the simulations of Kohonen's algorithm have led to useful
results which are applicable to failure diagnosis. The interpretation of the
Kohonen net has shown good generalization results, so that noisy patterns
have been correctly classified. In further examinations, modified learning
procedures with different pattern codings as well as the automatic assignment
of the different parts of the feature map to the error classes will be analysed.
Furthermore, we have to develop a "recognition measure" so that we can
quantify the recognition efficiency or quality. Up to now, other
inspection systems (see section 1.3) could only show differences, but they
cannot classify these differences. Neural networks can therefore be regarded
as a useful addition to conventional inspection systems at least.
Acknowledgements
The authors would like to thank Siemens AG, Munich, especially Dr. E.
Wolfgang and Dr. U. Ramacher, and Sietec GmbH & Co. OHG
(Siemens-Systemtechnik, Berlin) for supplying us with mask data, and Stefan
Rüping as well as Thomas Will for their assistance in preparing the manuscript.
Literature
[1] Sischka, D., Bisek, R.: "Detection of Defects on the Surface of
    Microelectronic Structures", IEEE Transactions on Electron Devices,
    Vol. 36, No. 1, January 1989, pp. 8-13.
[2] McClelland, J., Rumelhart, D.: "Explorations in Parallel Distributed
    Processing", MIT Press, London 1988.
[3] Kohonen, T.: "Self-Organization and Associative Memory", Springer-Verlag,
    1984.
[4] Ritter, H., Martinetz, T., Schulten, K.: "Neuroinformatik
    selbstorganisierender Abbildungen", Addison-Wesley-Verlag, 1989.
[5] Tryba, V., Speckmann, H., Goser, K.: "A Digital Hardware Implementation
    of a Selforganizing Feature Map as a Neural Coprocessor to a
    Von Neumann Computer", 1st Int. Workshop on Microelectronics for
    Neural Networks, June 1990, pp. 177-186.
[6] Ramacher, U., Rückert, U.: "VLSI Design of Neural Networks", Kluwer
    Academic, 1991.