28 research outputs found

    Estimation of error rate for linear discriminant functions by resampling: Non-Gaussian populations

    This article presents simulation results comparing various resampling estimators of classification error rate for linear-discriminant-type classification algorithms. Three non-Gaussian multivariate populations are studied, namely exponential, Cauchy, and uniform. Simulations are conducted for small sample sizes, two-class and three-class problems, and 2-D, 3-D, and 5-D distributions. Estimation procedures and sample sizes are the same as in our previous study of Gaussian populations; again, 200 bootstrap replications are used for each simulation trial. For exponential and uniform distributions, the 0.632 estimator generally performs best. However, for Cauchy distributions, the convex bootstrap and the e0 estimator often outperform the 0.632 estimator.
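As a rough illustration of two of the estimators named in this abstract, the sketch below computes the resubstitution error, the e0 estimate (errors on points left out of each bootstrap sample, pooled over replications), and the 0.632 combination 0.368 * resubstitution + 0.632 * e0. It is not the authors' code: the nearest-class-mean rule is an assumed stand-in for a linear discriminant, all function names are hypothetical, and the convex bootstrap is not shown.

```python
import random


def nearest_mean_fit(X, y):
    """Fit class means; a simple stand-in for a linear discriminant (assumption)."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_mean_predict(means, x):
    """Assign x to the class whose mean is closest in squared Euclidean distance."""
    return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))


def error_rate(means, X, y):
    """Fraction of points in (X, y) misclassified by the fitted rule."""
    wrong = sum(nearest_mean_predict(means, xi) != yi for xi, yi in zip(X, y))
    return wrong / len(X)


def bootstrap_632(X, y, n_boot=200, seed=0):
    """Return (resubstitution error, e0, 0.632 estimate)."""
    rng = random.Random(seed)
    n = len(X)
    resub = error_rate(nearest_mean_fit(X, y), X, y)
    wrong = total = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n points with replacement
        chosen = set(idx)
        left_out = [i for i in range(n) if i not in chosen]
        if not left_out:
            continue  # nothing to test on in this replication
        means = nearest_mean_fit([X[i] for i in idx], [y[i] for i in idx])
        wrong += sum(nearest_mean_predict(means, X[i]) != y[i] for i in left_out)
        total += len(left_out)
    e0 = wrong / total if total else resub
    return resub, e0, 0.368 * resub + 0.632 * e0


if __name__ == "__main__":
    # Toy two-class, 2-D data (made up for illustration only).
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    y = [0, 0, 0, 1, 1, 1]
    print(bootstrap_632(X, y, n_boot=200, seed=1))
```

The default of 200 replications mirrors the number stated in the abstract; the seed only makes the sketch reproducible.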

    Sampling theory methodology applicable to data validation studies

    In data validation studies, surveys are conducted to obtain information about the data collection process and the uses of the data. In many cases, standard sampling techniques can be used. Two methods, stratified random sampling and cluster sampling, were used for surveys in the Form 4 data validation study. Form 4 is a data collection system on monthly generation and consumption of fuels by electric power plants. A description of those applications is given. Sometimes time and cost constraints make more sophisticated controlled sampling approaches necessary. One such approach, using balanced incomplete block designs, is described; an appendix surveys the existence results for these designs. Sequential methods, which may prove to be more cost-effective, are discussed, as are sequential approaches to the problem of determining the size of a population. Problems requiring further research are also discussed. Some preliminary results on the problem of stratification with respect to more than one variable are included. The results were obtained for the Form 4 respondent population. The Form 4 study indicated that standard statistical sampling methods could be useful in data validation surveys. For example, at least 30 percent of the respondents do not report net generation as the instructions define it, and only 25 percent of the state regulatory agencies use the Form 4 data. Such inferences were possible only because statistical sampling procedures were used. 3 tables.
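For readers unfamiliar with stratified random sampling, the following is a minimal sketch with proportional allocation. It is not the design used in the Form 4 study: the function name, the `stratum_of` callback, the sampling fraction, and the toy population are all assumptions made for illustration.

```python
import random


def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw a simple random sample of about `fraction` of the units in each stratum."""
    rng = random.Random(seed)
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    sample = {}
    for name, members in strata.items():
        k = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample[name] = rng.sample(members, k)       # sampling without replacement
    return sample


if __name__ == "__main__":
    # Hypothetical population: 100 respondent IDs split into two size classes.
    respondents = list(range(100))
    picked = stratified_sample(
        respondents,
        stratum_of=lambda u: "large" if u >= 80 else "small",
        fraction=0.1,
    )
    for name, members in picked.items():
        print(name, sorted(members))
```

Sampling within each stratum separately guarantees that small but important groups are represented, which a single simple random sample does not.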

    Influence function and its application to data validation

    Hampel's influence function has been used by Devlin, Gnanadesikan, and Kettenring to detect bivariate observations that have unusual influence on estimates of correlation. In the validation of energy data systems, such observations may sometimes be considered outliers. Identifying such outliers may be valuable for detecting errors in a database. When data are used in regression equations, the observations that have the greatest effect on the multiple correlation coefficient or the regression coefficients are of interest. The contours of constant influence are derived for the multiple correlation coefficient in the case of regressing two variables on a third. In some problems, the analytic form of the influence function may be difficult to derive. In such cases, the empiric estimator of the influence function, as proposed by Mallows, may be useful for detecting outliers. For FPC Form 4 power plant data, the correlation between generation and consumption is a parameter of interest to users of the data. Estimates of the contours of constant influence were determined and used to detect outliers with respect to bivariate correlation.
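The idea of flagging observations by their influence on a correlation estimate can be sketched with a leave-one-out calculation. The code below uses one common finite-sample form, (n - 1) * (r - r_without_i); this is an assumption and may differ from the empiric estimator discussed in the abstract. The function names and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_influence(xs, ys):
    """Leave-one-out influence of each observation on r: (n - 1) * (r - r_without_i)."""
    n = len(xs)
    r = pearson_r(xs, ys)
    return [
        (n - 1) * (r - pearson_r(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]))
        for i in range(n)
    ]


if __name__ == "__main__":
    # Made-up paired values; the last pair breaks an otherwise linear trend.
    generation = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    consumption = [1.1, 2.0, 2.9, 4.2, 5.1, 0.5]
    infl = sample_influence(generation, consumption)
    suspect = max(range(len(infl)), key=lambda i: abs(infl[i]))
    print("most influential observation index:", suspect)
```

An observation with a large absolute influence is a candidate outlier to check against the source records; it is not by itself proof of a data error.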

    Conservative Reliability Estimates in Design Optimization Using Multiple Tail Median
