
    Validation of nonlinear PCA

    Linear principal component analysis (PCA) can be extended to nonlinear PCA by using artificial neural networks. However, the benefit of curved components requires careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and, more generally, the use of an independent test set, fail when applied to nonlinear PCA because of its inherently unsupervised character. This paper presents a new approach for validating the complexity of nonlinear PCA models: the error in missing data estimation is used as the criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test set validation usually favours over-fitted nonlinear PCA models, the proposed validation approach correctly selects the optimal model complexity.
    Comment: 12 pages, 5 figures
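
    As a rough illustration of the validation idea, the sketch below compares autoassociative (bottleneck) networks of increasing bottleneck dimension by their error on artificially removed validation entries. The network architecture, the mean-imputation step, and all settings are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical sketch: rank nonlinear PCA models of different complexity by
# their error in estimating artificially removed validation entries.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy data near a curved one-dimensional manifold in 3-D.
t = rng.uniform(-1, 1, size=(500, 1))
X = np.hstack([t, t**2, t**3]) + 0.05 * rng.standard_normal((500, 3))
X_train, X_val = X[:350], X[350:]

# Artificially remove 20% of the validation entries; impute with training means.
mask = rng.random(X_val.shape) < 0.2
X_val_obs = X_val.copy()
X_val_obs[mask] = X_train.mean(axis=0)[np.where(mask)[1]]

for k in (1, 2, 3):  # candidate bottleneck dimensions (model complexity)
    # Autoassociative network trained to reproduce its own input.
    ae = MLPRegressor(hidden_layer_sizes=(16, k, 16), activation="tanh",
                      max_iter=3000, random_state=0)
    ae.fit(X_train, X_train)
    X_hat = ae.predict(X_val_obs)
    # Score only the entries that were hidden: the model of optimal complexity
    # should predict them most accurately.
    mse = np.mean((X_hat[mask] - X_val[mask]) ** 2)
    print(f"bottleneck dimension {k}: missing-data MSE = {mse:.4f}")
```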

    Semiparametric penalty function method in partially linear model selection

    Model selection in nonparametric and semiparametric regression is of both theoretical and practical interest. Gao and Tong (2004) proposed a semiparametric leave-more-out cross-validation procedure for choosing both the parametric and nonparametric regressors in a nonlinear time series regression model. As the authors recognized, implementing that procedure requires relatively large sample sizes. To address the model selection problem with small or medium sample sizes, we propose a model selection procedure for practical use. By extending the penalty function method of Zheng and Loh (1995, 1997) to incorporate features of leave-one-out cross-validation, we develop a semiparametric, consistent selection procedure suitable for choosing optimal subsets in a partially linear model. The newly proposed method is implemented using the full set of data, and simulations show that it works well for both small and medium sample sizes.
    Keywords: linear model; model selection; nonparametric method; partially linear model; semiparametric method
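
    A minimal sketch of the flavour of such a procedure, assuming a Robinson-type double-residual device (a Nadaraya-Watson smoother profiles out the nonparametric component of y = x'beta + g(t) + e) and an RSS-based criterion inflated by a penalty in the subset size. The kernel bandwidth and the penalty weight are illustrative choices, not those of the paper.

```python
# Hypothetical sketch of penalty-function subset selection in a partially
# linear model; only terms 0 and 2 of X actually enter the toy model.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 200
t = rng.uniform(0, 1, n)
X = rng.standard_normal((n, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + np.sin(2 * np.pi * t) \
    + 0.3 * rng.standard_normal(n)

def nw_smooth(t, V, h=0.1):
    # Nadaraya-Watson smoother: row-normalised Gaussian kernel weights.
    W = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    W /= W.sum(axis=1, keepdims=True)
    return W @ V

# Robinson-style residuals: remove the nonparametric trend in t from y and X.
y_res = y - nw_smooth(t, y)
X_res = X - nw_smooth(t, X)

best = None
for k in range(1, X.shape[1] + 1):
    for S in itertools.combinations(range(X.shape[1]), k):
        XS = X_res[:, S]
        beta, *_ = np.linalg.lstsq(XS, y_res, rcond=None)
        rss = np.sum((y_res - XS @ beta) ** 2)
        # Penalty-function criterion: RSS inflated by a term growing in |S|
        # (the weight 2*log(n)/n per regressor is an illustrative choice).
        crit = rss * (1 + 2.0 * len(S) * np.log(n) / n)
        if best is None or crit < best[0]:
            best = (crit, S)

print("selected subset:", best[1])
```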

    Variance Estimates and Model Selection

    The large majority of model selection criteria are functions of the usual variance estimate for a regression model. The validity of that estimate depends on several assumptions, most critically that the model being estimated is itself valid. This assumption is often violated in model selection contexts, where the search ranges over invalid models. A cross-validated variance estimate is more robust to specification errors (see, for example, Efron, 1983). We consider the effects of replacing the usual variance estimate by a cross-validated one, the Prediction Sum of Squares (PRESS), in several model selection criteria. Such replacements improve the probability of finding the true model, at least in large samples.
    Keywords: autoregressive process; lag order determination; model selection criteria; cross-validation
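
    The PRESS statistic can be computed in closed form from a single least-squares fit via the hat-matrix leverages, PRESS = sum_i (e_i / (1 - h_ii))^2. The sketch below plugs it into an AIC-style criterion in place of the usual residual variance; the specific criterion shown is one common form and need not match the paper's.

```python
# Illustrative comparison of an AIC-style criterion computed with the usual
# RSS-based variance estimate versus the cross-validated PRESS statistic.
import numpy as np

def press_and_rss(X, y):
    H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix
    e = y - H @ y                              # ordinary residuals
    h = np.diag(H)                             # leverages h_ii
    press = np.sum((e / (1 - h)) ** 2)         # leave-one-out prediction SSE
    return press, np.sum(e ** 2)

rng = np.random.default_rng(2)
n = 100
x = rng.standard_normal(n)
y = 1.0 + 0.8 * x + 0.5 * rng.standard_normal(n)   # true model is linear

for degree in (1, 2, 5):                           # candidate polynomial orders
    X = np.vander(x, degree + 1)                   # includes intercept column
    press, rss = press_and_rss(X, y)
    p = X.shape[1]
    aic_usual = n * np.log(rss / n) + 2 * p
    aic_press = n * np.log(press / n) + 2 * p      # PRESS-based variant
    print(f"degree={degree}: AIC(RSS)={aic_usual:.1f}  "
          f"AIC(PRESS)={aic_press:.1f}")
```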

    Model structure selection using an integrated forward orthogonal search algorithm assisted by squared correlation and mutual information

    Model structure selection plays a key role in non-linear system identification. The first step is to determine which model terms should be included in the model; once significant terms have been determined, a model selection criterion can be applied to select a suitable model subset. The well-known Orthogonal Least Squares (OLS) type algorithms are among the most efficient and commonly used techniques for model structure selection. However, OLS-type algorithms may occasionally select incorrect model terms or yield a redundant model subset in the presence of particular noise structures or input signals. An efficient Integrated Forward Orthogonal Search (IFOS) algorithm, assisted by squared correlation and mutual information and incorporating a Generalised Cross-Validation (GCV) criterion and hypothesis tests, is introduced to overcome these limitations in model structure selection.
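
    The forward orthogonal mechanics can be sketched compactly: each candidate term is Gram-Schmidt orthogonalised against the terms already chosen, candidates are ranked by squared correlation with the current residual (the ERR-style criterion), and the search stops when the GCV score, GCV = (RSS/n) / (1 - p/n)^2, stops improving. This is a generic OLS-type sketch; the paper's mutual-information ranking and hypothesis tests are omitted.

```python
# Hypothetical forward orthogonal search over a toy candidate-term dictionary;
# only terms 0 (u) and 1 (u**2) actually generate y.
import numpy as np

rng = np.random.default_rng(3)
n = 300
u = rng.standard_normal(n)
terms = np.column_stack([u, u**2, u**3, np.sin(u), np.cos(u), u * np.abs(u)])
y = 1.5 * terms[:, 0] - 2.0 * terms[:, 1] + 0.1 * rng.standard_normal(n)

selected, Q = [], []        # chosen term indices and their orthogonal columns
residual = y.copy()
best_gcv = np.inf           # GCV of the best model found so far
for _ in range(terms.shape[1]):
    best_j, best_score, best_q = None, -1.0, None
    for j in range(terms.shape[1]):
        if j in selected:
            continue
        # Gram-Schmidt: orthogonalise candidate j against chosen terms.
        q = terms[:, j].copy()
        for q_sel in Q:
            q -= (q @ q_sel) / (q_sel @ q_sel) * q_sel
        if q @ q < 1e-12:   # candidate is (near-)dependent on chosen terms
            continue
        # Squared correlation of the orthogonalised term with the residual,
        # the ERR-style ranking used by OLS-type algorithms.
        score = (q @ residual) ** 2 / ((q @ q) * (residual @ residual))
        if score > best_score:
            best_j, best_score, best_q = j, score, q
    if best_j is None:
        break
    new_res = residual - (best_q @ residual) / (best_q @ best_q) * best_q
    p = len(selected) + 1
    gcv = (new_res @ new_res / n) / (1 - p / n) ** 2
    if gcv >= best_gcv:     # stop once GCV no longer improves
        break
    best_gcv, residual = gcv, new_res
    selected.append(best_j)
    Q.append(best_q)

print("selected term indices:", selected)
```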