We investigate the estimation of fault probabilities and yield for VLSI implementations of neural computational models. Our analysis is limited to structures that can be mapped directly onto silicon as truly distributed parallel processing systems. Our work improves on the framework suggested recently by Feltham and Maly [3] and is also applicable to analog or mixed analog/digital VLSI systems.
Introduction
Much of the recent excitement about hardware implementations of neural network models stems from the belief [6] that, when these models are mapped onto silicon as truly parallel distributed processing systems, they will possess an inherent defect and fault tolerance. In this correspondence, we revisit the work of Feltham and Maly [3] on physically realistic fault models for neural networks. We attempt to improve upon their results and extend them to the estimation of yield for VLSI neural systems.
In section 2 we provide a general algorithm for computing system-level fault probabilities. Using these fault probabilities as parameters, we estimate chip yield, following closely a model introduced by Stapper [8]. As an example, we apply these models to a VLSI realization of an 88-neuron Hopfield network [3]. For a more complete discussion of this work, please see [4].
Computing fault probabilities
A general algorithm for computing fault probabilities can be constructed as follows. Suppose there are $N$ different components which together make up a circuit. For example, for the $n$-input, $m$-output Hopfield network of Fig. 1 we identify three components: (1) the neuron drive circuitry, (2) the connection matrix circuitry, and (3) the comparator circuitry.
One element, or cell, from each of these $N$ components is analyzed separately for the probability of a fault given a defect density distribution for each defect type. When one performs fault analysis, it is generally necessary to include the effects of neighboring cells for those defects which occur near the border of two adjacent cells.
Each fault should be identified by its extent. For example, one fault may be local, causing a single cell to malfunction, while another may be global. Suppose we identify a total of $M$ different fault types. Then we define $p_{j|d_i}$ as the probability of a fault of type $j$ given that a defect occurs in component $i$. Let $a_i$ represent the fraction of the system made up of cells from component $i$. If we assume that each defect occurs randomly on the chip, then $a_i$ equals the probability that a defect occurs in component $i$ given that a defect occurs somewhere on the chip. This assumption ignores the effects of defect clustering within the chip.
The system-level probabilities for each fault type $j$ can now be computed. Let $p_{j|d_c}$ be the probability that an individual defect occurring somewhere on the chip causes a fault of type $j$, where
$$p_{j|d_c} = \sum_{i=1}^{N} a_i \, p_{j|d_i} \qquad (1)$$
If we assume that each defect within the chip occurs independently of other defects, then $p_{j|d_c}$ times the mean number of defects on the chip represents the average number of faults of type $j$. The assumption of independence of defects is an approximation.
In a real manufacturing process, defects from one process step may influence defects in subsequent process steps. However, we do not attempt to treat interactions between defects of different layers in this work.
Given that a particular defect occurs in one of the components on the chip, let $p_{f|d_c}$ be the probability of a fault occurring on the chip, where
$$p_{f|d_c} = \sum_{j=1}^{M} p_{j|d_c} \qquad (2)$$
Equation 2 assumes that one defect is associated with precisely one fault. In reality, a large defect on a metallization layer may cause both a global and a local fault, e.g., an open power supply line and an open signal line. One can handle this case in one of two ways. The first method is to assign the fault to the class of the greatest extent, which in this example is global. The second method is to define a composite fault class, e.g., a fault class for the occurrence of both a local and a global fault.
Additionally, given this same defect, one may want to compute $p_{j|f}$, the probability that a fault of type $j$ occurs given that a fault of any type occurs on the chip. In this case a defect is implied by conditioning on the occurrence of a fault, and we have
$$p_{j|f} = \frac{p_{j|d_c}}{p_{f|d_c}} \qquad (3)$$
If we assume that each defect occurs independently within the chip, then $p_{j|f}$ represents the frequency of fault type $j$ in relation to other fault types.
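The three quantities defined in this section (the area-weighted probability of each fault type given a defect on the chip, the total probability of a fault given a defect, and the relative frequency of each fault type) can be sketched in a few lines of Python. This is only an illustration under the one-defect-one-fault assumption stated above; the function name, the dictionary layout, and the numbers in the usage example are hypothetical and are not taken from the paper.

```python
def system_fault_probabilities(area_fractions, component_fault_probs):
    """Combine per-component fault probabilities into chip-level values.

    area_fractions[i]        -- a_i, fraction of the system occupied by component i
    component_fault_probs[i] -- dict mapping fault type j to p(j | defect in i)
    """
    fault_types = sorted({j for probs in component_fault_probs for j in probs})

    # Area-weighted sum over components: p(j | defect somewhere on the chip).
    p_j_given_defect = {
        j: sum(a * probs.get(j, 0.0)
               for a, probs in zip(area_fractions, component_fault_probs))
        for j in fault_types
    }

    # One defect is assumed to cause at most one fault, so the fault types
    # are mutually exclusive and their probabilities can simply be added.
    p_fault_given_defect = sum(p_j_given_defect.values())
    if p_fault_given_defect == 0.0:
        raise ValueError("no fault type has nonzero probability")

    # Normalize to obtain the relative frequency of each fault type.
    p_j_given_fault = {j: p / p_fault_given_defect
                       for j, p in p_j_given_defect.items()}
    return p_j_given_defect, p_fault_given_defect, p_j_given_fault


# Illustrative numbers only (hypothetical, not from the paper).
areas = [0.25, 0.75]
probs = [{"local": 0.04, "global": 0.02}, {"local": 0.08}]
p_jd, p_fd, p_jf = system_fault_probabilities(areas, probs)
print(p_jd, p_fd, p_jf)
```

Keeping the absolute probabilities and the normalized ones as separate return values makes it harder to mix them up when components are combined.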
Estimating yield
Following a procedure similar to that outlined by Stapper [8, 7], we first relate the mean number of defects per chip to the mean number of faults per chip. Let $d_k$ represent the mean number of defects of type $k$ per square centimeter. Suppose there are a total of $K$ defect types. Then, using $A_c$ as the area of the chip in square centimeters, we can estimate $d_c$, the mean number of defects per chip, as follows:
$$d_c = A_c \sum_{k=1}^{K} d_k \qquad (4)$$
Equation 4 assumes that defects of different types are independent.
Next we estimate $\lambda_j$, the mean number of faults of type $j$, by multiplying the mean number of defects per chip by the probability of a fault of type $j$ given that a defect lands on the chip, i.e.,
$$\lambda_j = p_{j|d_c} \, d_c \qquad (5)$$
Equation 5 assumes one defect is associated with precisely one fault. This assumption rules out the occurrence of two or more defects which, when examined separately, result in no fault, but, when taken in combination, result in a behavioral fault. It also rules out the occurrence of two defects which effectively cancel each other. Finally, Equation 5 assumes that a single defect cannot cause more than one fault.
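As a minimal sketch of the two steps just described, the mean number of defects per chip can be taken as the chip area times the summed per-type defect densities, and the mean number of faults of each type as that value times the probability of the fault type given a defect. The function names and all numbers below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def mean_defects_per_chip(defect_densities_per_cm2, chip_area_cm2):
    """Chip area times the sum of the per-type defect densities."""
    return chip_area_cm2 * sum(defect_densities_per_cm2)


def mean_faults_per_chip(p_j_given_defect, defects_per_chip):
    """Mean number of faults of each type, assuming one fault per defect at most."""
    return {j: p * defects_per_chip for j, p in p_j_given_defect.items()}


# Illustrative values only (hypothetical, not from the paper).
densities = [0.5, 1.0, 0.5]   # defects per cm^2 for three defect types
area = 0.5                    # chip area in cm^2
defects = mean_defects_per_chip(densities, area)
faults = mean_faults_per_chip({"fatal": 0.01, "stuck": 0.02}, defects)
print(defects, faults)
```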
The second parameter that we need to obtain is $\alpha$, the degree of fault clustering that occurs between chips on the wafer. We may find that $\alpha$ is not the same for each fault type, in which case we must define $\alpha_j$ as the amount of between-chip fault clustering for each fault type $j$.
More generally, the clustering parameter also depends on the area of the chip [7]. However, with the inclusion of a third parameter, Stapper found the assumption of constant $\alpha$ for differing chip areas to be adequate for estimating yield as a function of chip area for an area range of 1 to 36 [7]. The third parameter Stapper employed is $Y_0$, gross yield. Gross yield corresponds to the percentage of failures that can be explained either by non-random disturbances in the manufacturing process or by very severe defect clustering.
Assuming negative binomial fault statistics, we can apply the yield formula [8]
$$Y = Y_0 \prod_{j=1}^{M} \left( 1 + \frac{\lambda_j}{\alpha_j} \right)^{-\alpha_j} \qquad (6)$$
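A minimal sketch of such a yield calculation follows. It assumes the commonly used negative binomial form in which the gross yield is multiplied by one factor $(1 + \lambda_j/\alpha_j)^{-\alpha_j}$ per fault type, where $\lambda_j$ is the mean number of faults of type $j$ per chip and $\alpha_j$ its clustering parameter. The function name and the example values are hypothetical.

```python
def negative_binomial_yield(gross_yield, faults):
    """Yield under negative binomial fault statistics.

    faults -- iterable of (mean_faults_per_chip, clustering_parameter) pairs,
              one pair per fault type.
    """
    y = gross_yield
    for lam, alpha in faults:
        if lam < 0 or alpha <= 0:
            raise ValueError("need lam >= 0 and alpha > 0")
        # Each fault type contributes one multiplicative yield factor.
        y *= (1.0 + lam / alpha) ** (-alpha)
    return y


# Illustrative values only (hypothetical, not from the paper).
example_yield = negative_binomial_yield(0.9, [(0.1, 2.0), (0.2, 2.0)])
print(example_yield)
```

As the clustering parameter grows large, each factor approaches the Poisson value $e^{-\lambda_j}$, which gives a quick sanity check on any implementation.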
Example
In the paper of Feltham and Maly [3], individual components in the layout of an 88-neuron Hopfield network circuit are analyzed using the VLASIC catastrophic fault simulator [9]. They assume a $1/x^3$ defect density distribution. Relative defect frequencies for different defect types are given in Table 1. Absolute defect densities per square centimeter are proportional to relative defect densities. Examining realistic defect densities given by Walker [9, Table 8-2], a reasonable choice for the constant of proportionality is 0.1.
The circuit faults for each component were classified into one of three classes [3] depending on their extent: fatal faults, stuck faults, and intermittent faults. Fatal faults are those faults which render the chip completely nonfunctional, such as a short between $V_{dd}$ and ground. Stuck faults are failures which permanently alter the output behavior of a component of the network in a predictable way. One example is a short between a data line and ground. Intermittent faults result in errors in the component output in a manner that depends on the input signals, e.g., a short between two adjacent data lines.
Table 2 summarizes the number of occurrences of each fault type for the three components in the Hopfield network shown in Fig. 1. One hundred thousand defects were randomly placed on one cell of the neuron drive circuitry. Subsequently, all faults were analyzed and categorized into one of three fault classes [3]. For the connection matrix and comparator circuitry, one million defects were randomly introduced to the layouts. However, only the most probable faults were analyzed and put into one of three fault categories. Thus, the total number of faults occurring in each of the three fault groups could be substantially higher if all faults had been analyzed.
A fourth component of the circuitry, the interlayer signal busses, escaped the VLASIC fault simulation experiment of Feltham and Maly. They provide relative fault frequencies (1% stuck and 99% intermittent), but we have no way of determining the absolute probability of a fault given that a defect occurs in the interlayer signal busses. Therefore, we omit this portion of the circuitry from our calculation.
From Table 2, we estimate $p_{j|d_i}$, the probability of a fault of type $j$ given that a defect occurs in component $i$, by dividing the number of defects that caused electrical fault type $j$ by the total number of defects placed in component $i$. Table 3 presents the estimated fault probabilities for each of the three components of the 88-neuron Hopfield network chip, as well as system-level fault probabilities. To compute the system-level fault probabilities $p_{j|d_c}$, the probability of fault type $j$ given that a defect occurs somewhere on the chip, we add up these probabilities for each component in the system, weighted by the percentage of the chip area occupied by that component. Thus, to compute the system-level probability of a stuck fault given that a defect occurs randomly on the chip, we add up the probabilities of a stuck fault for each component, weighted by the chip area of each component, as follows:
$$p_{\mathrm{stuck}|d_c} = (0.00599)(0.025) + (0.007808)(0.795) + (0.001307)(0.054) = 0.00643 \qquad (7)$$
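The arithmetic of Equation 7 can be checked directly. The short sketch below reads the equation as three products of a stuck-fault probability and an area fraction, which is an interpretation of the printed figures. The variable names are hypothetical, and no assumption is made here about which component each pair belongs to.

```python
# Stuck-fault probabilities and area fractions as read from Equation 7.
stuck_probs = [0.00599, 0.007808, 0.001307]
area_fractions = [0.025, 0.795, 0.054]

# Area-weighted sum over the three analyzed components.
p_stuck = sum(p * a for p, a in zip(stuck_probs, area_fractions))
print(round(p_stuck, 5))  # 0.00643

# The fractions do not sum to one; the three components cover this share.
covered_area = sum(area_fractions)
print(round(covered_area, 3))  # 0.874
```

The three area fractions sum to 0.874 rather than 1, which is consistent with part of the chip, such as the omitted interlayer signal busses, being left out of the calculation.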
Relative system-level fault probabilities $p_{j|f}$, the probability of a fault of type $j$ occurring given that a fault occurs somewhere on the chip, are obtained by normalizing the system-level fault probabilities to sum to one. Table 3 does not agree with a similar treatment of this circuit by Feltham and Maly [3]. It appears that they combine relative fault probabilities for each component, weighted by the percentage of the chip occupied by that component, to compute relative system-level fault probabilities. However, it is necessary to combine the absolute fault probabilities of each component, as described previously, and then normalize the result to obtain relative system-level fault probabilities. Only in the case that layout styles and [...] [3].
The average number of defects landing on a chip of this size can be found by summing the relative number of defects of each type in Table 1 and multiplying by the constant of proportionality, assumed earlier to be 0.1, to estimate the average number of defects per cm$^2$ [...]
In one yield study, Stapper found a good fit to actual defect data by letting the gross yield, $Y_0$, be 0.8339 and the clustering parameter, $\alpha$, equal 1.1772 [7]. We use this value of $\alpha$ for each of the three fault types considered. Table 4 gives the values for the estimated fault-limited yield for the 88-neuron Hopfield network chip. The format of this table illustrates the incremental effect of each failure mechanism on yield, beginning with gross failures and ending with intermittent faults.
Discussion and conclusion
We would like to point out several interesting features of the example of section 4. First, we see that, according to the model, approximately 23% of the chips manufactured with this design would not function at all. If this concerned the designer, he or she could modify the layout so as to minimize that part of the 23% failure rate which is under his or her control, i.e., the rate of failure due to fatal flaws, which is roughly 8%. Second, we see that another 23% of the chips manufactured would function, but with reduced performance. The nature of this performance degradation depends greatly on the circuit and the specific fault. In the case of a Hopfield network, the overall effect of stuck and intermittent failures may be a higher classification error rate and/or reduced storage capacity [6]. For other VLSI neural circuits, such as the silicon retina [2, 5] and the focal plane processor [1], the effect of these faults may be dark spots or lines in the former and reduced precision in the output of the latter.
In summary, we have presented a mathematical framework for estimating fault probabilities and yield for VLSI neural networks. The sample yield calculation of section 4 applies to a specific layout of an 88-neuron Hopfield network manufactured in a particular CMOS process with hypothetical defect densities. We are by no means attempting to assert that this result applies to (a) other layouts of the same circuit, (b) other layouts of other neural circuits, or (c) this layout subject to a different manufacturing environment. Rather, we only wish to illustrate how the methods of computing fault probabilities and predicting yield can be used to assess a design's tolerance of manufacturing defects.
