2,251 research outputs found

    NARX-based nonlinear system identification using orthogonal least squares basis hunting

    No full text
    An orthogonal least squares technique for basis hunting (OLS-BH) is proposed to construct sparse radial basis function (RBF) models for NARX-type nonlinear systems. Unlike most of the existing RBF or kernel modelling methods, whichplaces the RBF or kernel centers at the training input data points and use a fixed common variance for all the regressors, the proposed OLS-BH technique tunes the RBF center and diagonal covariance matrix of individual regressor by minimizing the training mean square error. An efficient optimization method isadopted for this basis hunting to select regressors in an orthogonal forward selection procedure. Experimental results obtained using this OLS-BH technique demonstrate that it offers a state-of-the-art method for constructing parsimonious RBF models with excellent generalization performance

    Local Regularization Assisted Orthogonal Least Squares Regression

    No full text
    A locally regularized orthogonal least squares (LROLS) algorithm is proposed for constructing parsimonious or sparse regression models that generalize well. By associating each orthogonal weight in the regression model with an individual regularization parameter, the ability for the orthogonal least squares (OLS) model selection to produce a very sparse model with good generalization performance is greatly enhanced. Furthermore, with the assistance of local regularization, when to terminate the model selection procedure becomes much clearer. This LROLS algorithm has computational advantages over the recently introduced relevance vector machine (RVM) method

    Improved model identification for non-linear systems using a random subsampling and multifold modelling (RSMM) approach

    Get PDF
    In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data that are used for model identification and the test data that are used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning method is convenient and the associated model identification procedure is in general easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may occasionally lack robustness and generalisation to represent future unseen data, because the performance of the identified model may be highly dependent on how the data partition is made. To overcome the drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or preferably unbiased models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets), using a K-fold random subsampling method. Secondly, detect significant model terms and identify a common model structure that fits all the K datasets using a new proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance

    High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables

    Get PDF
    In this work we address the problem of approximating high-dimensional data with a low-dimensional representation. We make the following contributions. We propose an inverse regression method which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable. We introduce a mixture of locally-linear probabilistic mapping model that starts with estimating the parameters of inverse regression, and follows with inferring closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, we introduce a partially-latent paradigm, such that the vector-valued response variable is composed of both observed and latent entries, thus being able to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation could be viewed as a latent-variable augmentation of regression. We devise expectation-maximization (EM) procedures based on a data augmentation strategy which facilitates the maximum-likelihood search over the model parameters. We propose two augmentation schemes and we describe in detail the associated EM inference procedures that may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. We provide experimental evidence that our method outperforms several existing regression techniques

    Model Selection with the Loss Rank Principle

    Full text link
    A key issue in statistics and machine learning is to automatically select the "right" model complexity, e.g., the number of neighbors to be averaged over in k nearest neighbor (kNN) regression or the polynomial degree in regression with polynomials. We suggest a novel principle - the Loss Rank Principle (LoRP) - for model selection in regression and classification. It is based on the loss rank, which counts how many other (fictitious) data would be fitted better. LoRP selects the model that has minimal loss rank. Unlike most penalized maximum likelihood variants (AIC, BIC, MDL), LoRP depends only on the regression functions and the loss function. It works without a stochastic noise model, and is directly applicable to any non-parametric regressor, like kNN.Comment: 31 LaTeX pages, 1 figur
    corecore