Data Engineering for the Analysis of Semiconductor Manufacturing Data
We have analyzed manufacturing data from several different semiconductor
manufacturing plants, using decision tree induction software called
Q-YIELD. The software generates rules for predicting when a given product
should be rejected. The rules are intended to help the process engineers
improve the yield of the product, by helping them to discover the causes
of rejection. Experience with Q-YIELD has taught us the importance of
data engineering -- preprocessing the data to enable or facilitate
decision tree induction. This paper discusses some of the data engineering
problems we have encountered with semiconductor manufacturing data.
The paper deals with two broad classes of problems: engineering the features
in a feature vector representation and engineering the definition of the
target concept (the classes). Manufacturing process data present special
problems for feature engineering, since the data have multiple levels of
granularity (detail, resolution). Engineering the target concept is important
because our focus is on understanding the past, as opposed to the more common
focus in machine learning on predicting the future.
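To make the granularity problem concrete, the following is a minimal sketch of one common way to reconcile two levels of detail: rolling wafer-level measurements up into lot-level summary features so that each lot has a single feature vector. The abstract does not describe how Q-YIELD's inputs were actually prepared, so the record layout, the field names, and the choice of summary statistics are all assumptions made for illustration, and the numbers are made-up example values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wafer-level records: (lot_id, wafer_id, measurement).
# These values are invented for illustration only.
wafer_rows = [
    ("lot1", 1, 0.92), ("lot1", 2, 0.88), ("lot1", 3, 0.95),
    ("lot2", 1, 0.70), ("lot2", 2, 0.81),
]

# Hypothetical lot-level class labels (the target concept).
lot_labels = {"lot1": "accept", "lot2": "reject"}


def lot_features(rows):
    """Summarize wafer-level measurements into one feature dict per lot."""
    by_lot = defaultdict(list)
    for lot_id, _wafer_id, value in rows:
        by_lot[lot_id].append(value)
    return {
        lot: {
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "n_wafers": len(values),
        }
        for lot, values in by_lot.items()
    }


features = lot_features(wafer_rows)

# One (feature vector, class) pair per lot, ready for a tree learner.
training_set = [(features[lot], lot_labels[lot]) for lot in sorted(features)]
```

Summarizing to the coarser level discards within-lot detail, which is one reason the choice of features at each level of granularity needs care.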
Technical note: Bias and the quantification of stability
Research on bias in machine learning algorithms has generally been concerned with the
impact of bias on predictive accuracy. We believe that there are other factors that should
also play a role in the evaluation of bias. One such factor is the stability of the algorithm;
in other words, the repeatability of the results. If we obtain two sets of data from the same
phenomenon, with the same underlying probability distribution, then we would like our
learning algorithm to induce approximately the same concepts from both sets of data. This
paper introduces a method for quantifying stability, based on a measure of the agreement
between concepts. We also discuss the relationships among stability, predictive accuracy,
and bias.
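The idea of stability as agreement between concepts can be illustrated with a small sketch. It draws two samples from the same distribution, induces a concept from each, and estimates agreement as the fraction of fresh random points that both concepts classify identically. This is not the paper's exact procedure, which the abstract does not specify. The one-dimensional data, the threshold learner `learn_stump`, the noise rate, and the sample sizes are all assumptions chosen to keep the example short.

```python
import random


def sample(n, rng, noise=0.1):
    """Draw n labeled points: x uniform in [0, 1), label is (x > 0.5) with label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = x > 0.5
        if rng.random() < noise:
            y = not y  # flip the label with probability `noise`
        data.append((x, y))
    return data


def learn_stump(data):
    """Return the threshold t that minimizes training error of the rule 'x > t'."""
    best_t, best_err = 0.0, len(data) + 1
    for t in sorted(x for x, _ in data):
        err = sum((x > t) != y for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def agreement(t1, t2, rng, n_probe=10000):
    """Estimate the fraction of random points on which two threshold concepts agree."""
    same = 0
    for _ in range(n_probe):
        x = rng.random()
        same += (x > t1) == (x > t2)
    return same / n_probe


rng = random.Random(0)                 # fixed seed for repeatability
t_a = learn_stump(sample(200, rng))    # concept induced from the first sample
t_b = learn_stump(sample(200, rng))    # concept induced from the second sample
stability = agreement(t_a, t_b, rng)   # a value near 1.0 indicates a stable learner
```

Agreement is measured on unlabeled probe points, so it says nothing about whether either concept is correct. That is why stability and predictive accuracy are distinct quantities.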