NARX-based nonlinear system identification using orthogonal least squares basis hunting
An orthogonal least squares technique for basis hunting (OLS-BH) is proposed to construct sparse radial basis function (RBF) models for NARX-type nonlinear systems. Unlike most existing RBF or kernel modelling methods, which place the RBF or kernel centers at the training input data points and use a fixed common variance for all the regressors, the proposed OLS-BH technique tunes the RBF center and diagonal covariance matrix of each individual regressor by minimizing the training mean square error. An efficient optimization method is adopted for this basis hunting to select regressors in an orthogonal forward selection procedure. Experimental results obtained using this OLS-BH technique demonstrate that it offers a state-of-the-art method for constructing parsimonious RBF models with excellent generalization performance.
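The idea of orthogonal forward selection in which every new regressor gets its own tuned center and diagonal covariance can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only refers to "an efficient optimization method", so plain random search over each candidate's center and widths stands in for it, and the names `gaussian_rbf`, `ols_basis_hunting`, `design_matrix`, `n_candidates` and the search ranges are hypothetical. For a NARX model, each row of `X` would hold lagged outputs and inputs.

```python
import numpy as np


def gaussian_rbf(X, center, widths):
    """Gaussian RBF with its own center and diagonal covariance diag(widths**2)."""
    d = (X - center) / widths
    return np.exp(-0.5 * np.sum(d * d, axis=1))


def ols_basis_hunting(X, y, n_terms, n_candidates=200, seed=0):
    """Orthogonal forward selection in which each stage tunes one new RBF.

    Random search over (center, widths) stands in for the optimiser (assumption):
    at each stage the candidate giving the largest drop in training MSE is kept.
    """
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    residual = np.asarray(y, dtype=float).copy()
    W, params = [], []  # orthogonalised regressors and their (center, widths)
    for _ in range(n_terms):
        best = None
        for _ in range(n_candidates):
            center = X[rng.integers(n)] + rng.normal(scale=0.1, size=dim)
            widths = rng.uniform(0.1, 2.0, size=dim)
            p = gaussian_rbf(X, center, widths)
            w = p.copy()
            for wj in W:  # modified Gram-Schmidt against the chosen regressors
                w -= (wj @ w) / (wj @ wj) * wj
            ww = w @ w
            if ww < 1e-8 * (p @ p):  # (almost) inside the span already chosen
                continue
            g = (w @ residual) / ww   # orthogonal weight of this candidate
            drop = g * g * ww         # reduction in the training sum of squares
            if best is None or drop > best[0]:
                best = (drop, w, g, center, widths)
        if best is None:
            break
        _, w, g, center, widths = best
        W.append(w)
        params.append((center, widths))
        residual -= g * w
    return params, residual


def design_matrix(X, params):
    """Columns are the selected (non-orthogonalised) RBFs evaluated at X."""
    return np.column_stack([gaussian_rbf(X, c, s) for c, s in params])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.linspace(-3, 3, 100).reshape(-1, 1)
    y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=100)
    params, residual = ols_basis_hunting(X, y, n_terms=5)
    P = design_matrix(X, params)
    theta, *_ = np.linalg.lstsq(P, y, rcond=None)  # weights on the original RBFs
    print(len(params), np.mean(residual ** 2))
```

Because each new regressor is orthogonalised against those already chosen, its weight can be fitted on its own and the drop in training squared error is simply g² w'w; weights on the original RBFs are then recovered by least squares on the selected columns, which spans the same space.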
Local Regularization Assisted Orthogonal Least Squares Regression
A locally regularized orthogonal least squares (LROLS) algorithm is proposed for constructing parsimonious or sparse regression models that generalize well. By associating each orthogonal weight in the regression model with an individual regularization parameter, the ability of orthogonal least squares (OLS) model selection to produce a very sparse model with good generalization performance is greatly enhanced. Furthermore, with the assistance of local regularization, the point at which to terminate the model selection procedure becomes much clearer. This LROLS algorithm has computational advantages over the recently introduced relevance vector machine (RVM) method.
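To make the role of the individual regularizers concrete: with orthogonalised regressors w_i and parameters λ_i, minimising e'e + Σ λ_i g_i² gives g_i = w_i'y / (w_i'w_i + λ_i), and each selected term lowers that criterion by (w_i'w_i + λ_i) g_i², so candidates can be ranked by a regularized error reduction ratio. The sketch below assumes a fixed candidate matrix `P` and re-estimates the λ_i with an evidence-style update λ_i ← γ_i / (N − γ) · e'e / g_i², where γ_i = w_i'w_i / (w_i'w_i + λ_i) and γ = Σ γ_i. That update, the stopping threshold `tol`, and all function names are assumptions made for illustration, not a statement of the paper's exact procedure.

```python
import numpy as np


def regularized_ofs(P, y, lam, tol=1e-3):
    """Orthogonal forward selection with an individual regularizer lam[i] per
    candidate column of P, ranked by the regularized error reduction ratio
        rerr_i = (w_i'w_i + lam_i) * g_i**2 / (y'y),  g_i = w_i'y / (w_i'w_i + lam_i).
    Stops when no remaining candidate has rerr_i > tol."""
    n, m = P.shape
    yy = y @ y
    W = P.astype(float)  # copy of the candidates, orthogonalised in place
    selected, g_sel, ww_sel = [], [], []
    while len(selected) < m:
        best, best_rerr = None, tol
        for i in range(m):
            if i in selected:
                continue
            ww = W[:, i] @ W[:, i]
            if ww < 1e-10:
                continue
            g = (W[:, i] @ y) / (ww + lam[i])
            rerr = (ww + lam[i]) * g * g / yy
            if rerr > best_rerr:
                best, best_rerr = i, rerr
        if best is None:
            break
        w = W[:, best].copy()
        ww = w @ w
        selected.append(best)
        g_sel.append((w @ y) / (ww + lam[best]))
        ww_sel.append(ww)
        for j in range(m):  # orthogonalise the remaining candidates
            if j not in selected:
                W[:, j] -= (w @ W[:, j]) / ww * w
    return selected, np.array(g_sel), np.array(ww_sel), W[:, selected]


def update_lambdas(y, g, ww, W_sel, lam_sel):
    """Evidence-style re-estimate of the selected terms' regularizers (assumed form)."""
    e = y - W_sel @ g
    gamma_i = ww / (ww + lam_sel)
    return gamma_i / (len(y) - gamma_i.sum()) * (e @ e) / (g * g)


def lrols(P, y, n_iter=5, lam0=1e-3, tol=1e-3):
    """Alternate selection and regularizer updates for a fixed number of passes."""
    lam = np.full(P.shape[1], lam0)
    for _ in range(n_iter):
        sel, g, ww, W_sel = regularized_ofs(P, y, lam, tol)
        if not sel:
            break
        lam[sel] = update_lambdas(y, g, ww, W_sel, lam[sel])
    return sel, g, lam
```

A large λ_i shrinks a candidate's regularized ratio, so weakly supported terms stop clearing `tol`; in this sketch that is what makes the stopping point easy to read.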
Improved model identification for non-linear systems using a random subsampling and multifold modelling (RSMM) approach
In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data, which are used for model identification, and the test data, which are used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning method is convenient, and the associated model identification procedure is generally easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may occasionally lack the robustness and generalisation needed to represent future unseen data, because the performance of the identified model may be highly dependent on how the data partition is made. To overcome this drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or, preferably, unbiased models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets) using a K-fold random subsampling method. Secondly, detect significant model terms and identify a common model structure that fits all the K datasets using a newly proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.
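The three RSMM steps can be sketched on a fixed matrix `P` of candidate model terms (for example, lagged and cross-product terms of a polynomial NARX model). Assumptions: the common structure is chosen greedily by the error reduction ratio (ERR) averaged over the K training sets, and the "multifold" parameters are taken as the average of the K per-fold least-squares estimates. Both are simple stand-ins for the paper's multiple orthogonal search and multifold estimation, and every name below is hypothetical.

```python
import numpy as np


def random_subsamples(n, k, train_frac=0.7, seed=0):
    """Step 1: K random (train, validation) index splits of n samples."""
    rng = np.random.default_rng(seed)
    cut = int(train_frac * n)
    splits = []
    for _ in range(k):
        perm = rng.permutation(n)
        splits.append((perm[:cut], perm[cut:]))
    return splits


def common_structure(P, y, splits, n_terms):
    """Step 2: pick, term by term, the candidate whose ERR averaged over the
    K training sets is largest (a simplified common-structure search)."""
    m = P.shape[1]
    Ws = [P[tr].astype(float) for tr, _ in splits]  # per-fold copies, orthogonalised in place
    ys = [y[tr] for tr, _ in splits]
    selected = []
    for _ in range(n_terms):
        scores = np.full(m, -np.inf)
        for i in range(m):
            if i in selected:
                continue
            errs = []
            for W, yk in zip(Ws, ys):
                w = W[:, i]
                ww = w @ w
                errs.append(0.0 if ww < 1e-12 else (w @ yk) ** 2 / (ww * (yk @ yk)))
            scores[i] = np.mean(errs)
        best = int(np.argmax(scores))
        selected.append(best)
        for W in Ws:  # orthogonalise remaining candidates against the new term
            w = W[:, best].copy()
            ww = w @ w
            if ww < 1e-12:
                continue
            for j in range(m):
                if j not in selected:
                    W[:, j] -= (w @ W[:, j]) / ww * w
    return selected


def multifold_estimate(P, y, splits, selected):
    """Step 3 (assumed form): average the per-fold least-squares estimates."""
    thetas = [np.linalg.lstsq(P[tr][:, selected], y[tr], rcond=None)[0]
              for tr, _ in splits]
    return np.mean(thetas, axis=0)


def validation_mse(P, y, splits, selected, theta):
    """Mean squared error of the common model on each of the K validation sets."""
    return [float(np.mean((y[va] - P[va][:, selected] @ theta) ** 2))
            for _, va in splits]
```

Scoring each term across all K training sets at once, rather than on a single split, is what ties the selected structure to the data as a whole instead of to one particular partition.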
High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables
In this work we address the problem of approximating high-dimensional data
with a low-dimensional representation. We make the following contributions. We
propose a tractable inverse regression method that exchanges the roles of input
and response, such that the low-dimensional variable becomes the regressor. We
introduce a mixture of locally-linear probabilistic mapping models that starts
by estimating the parameters of the inverse regression, and then infers
closed-form solutions for the forward parameters of the high-dimensional
regression problem of interest. Moreover, we introduce a partially-latent
paradigm, in which the vector-valued response variable is composed of both
observed and latent entries, making it possible to deal with data contaminated
by experimental artifacts that cannot be explained by noise models. The
proposed probabilistic formulation can be viewed as a latent-variable
augmentation of regression. We devise expectation-maximization (EM) procedures
based on a data augmentation strategy that facilitates the maximum-likelihood
search over the model parameters. We propose two augmentation schemes and
describe in detail the associated EM inference procedures, which may well be
viewed as generalizations of a number of EM regression, dimension reduction,
and factor analysis algorithms. The proposed framework is validated with both
synthetic and real data. We provide experimental evidence that our method
outperforms several existing regression techniques.
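The closed-form passage from inverse to forward parameters can be illustrated for Gaussian locally-linear components. Assumption (our notation, not necessarily the paper's): component k of the inverse model has t ~ N(c_k, Γ_k) for the low-dimensional variable and x | t ~ N(A_k t + b_k, Σ_k) for the high-dimensional one. Bayes' rule for linear-Gaussian models then gives the forward component t | x ~ N(A*_k x + b*_k, Σ*_k) with Σ*_k = (Γ_k⁻¹ + A_k'Σ_k⁻¹A_k)⁻¹, A*_k = Σ*_k A_k'Σ_k⁻¹, b*_k = Σ*_k(Γ_k⁻¹c_k − A_k'Σ_k⁻¹b_k), and the marginal x ~ N(A_k c_k + b_k, Σ_k + A_k Γ_k A_k'), with unchanged mixing weights. The sketch leaves out the partially-latent response and the EM steps, and all function names are hypothetical.

```python
import numpy as np


def inverse_to_forward(c, Gamma, A, b, Sigma):
    """Map one locally-linear inverse component to its forward counterpart.

    Inverse (low-dim t -> high-dim x): t ~ N(c, Gamma), x | t ~ N(A t + b, Sigma).
    Returns (c_f, Gamma_f, A_f, b_f, Sigma_f) such that
    x ~ N(c_f, Gamma_f) and t | x ~ N(A_f x + b_f, Sigma_f).
    """
    Sigma_inv = np.linalg.inv(Sigma)
    Gamma_inv = np.linalg.inv(Gamma)
    Sigma_f = np.linalg.inv(Gamma_inv + A.T @ Sigma_inv @ A)  # small L x L matrix
    A_f = Sigma_f @ A.T @ Sigma_inv
    b_f = Sigma_f @ (Gamma_inv @ c - A.T @ Sigma_inv @ b)
    c_f = A @ c + b
    Gamma_f = Sigma + A @ Gamma @ A.T
    return c_f, Gamma_f, A_f, b_f, Sigma_f


def log_gauss(x, mean, cov):
    """Log-density of a multivariate normal evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(x) * np.log(2 * np.pi))


def forward_predict(x, pis, components):
    """E[t | x] for a mixture of forward components (outputs of inverse_to_forward)."""
    logw = np.array([np.log(p) + log_gauss(x, cf, Gf)
                     for p, (cf, Gf, _, _, _) in zip(pis, components)])
    r = np.exp(logw - logw.max())  # responsibilities, computed stably
    r /= r.sum()
    return sum(rk * (Af @ x + bf) for rk, (_, _, Af, bf, _) in zip(r, components))
```

Each forward component only needs an L × L inverse for Σ*_k, which is one reason regressing in the inverse direction stays manageable when x is high-dimensional and Σ_k is cheap to invert (for example, diagonal).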
Model Selection with the Loss Rank Principle
A key issue in statistics and machine learning is to automatically select the
"right" model complexity, e.g., the number of neighbors to be averaged over in
k nearest neighbor (kNN) regression or the polynomial degree in regression with
polynomials. We suggest a novel principle, the Loss Rank Principle (LoRP),
for model selection in regression and classification. It is based on the loss
rank, which counts how many other (fictitious) data would be fitted better.
LoRP selects the model that has minimal loss rank. Unlike most penalized
maximum likelihood variants (AIC, BIC, MDL), LoRP depends only on the
regression functions and the loss function. It works without a stochastic noise
model and is directly applicable to any non-parametric regressor, such as kNN.
Comment: 31 LaTeX pages, 1 figure
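To make "counts how many other (fictitious) data would be fitted better" concrete, here is a toy Python sketch for kNN regression in which outcomes are restricted to {0, 1}, so the loss rank becomes a literal count over all 2^n alternative outcome vectors y' for the same inputs. This is an illustrative assumption: the paper is not limited to such finite outcome sets, and both the tie convention used here (a y' counts if its loss is no larger than that of y, so y itself is counted) and every function name are ours. Brute-force enumeration is only feasible for very small n.

```python
import itertools

import numpy as np


def knn_fit(x, y, k):
    """In-sample kNN regression: each point is averaged with its k nearest
    neighbors, itself included (distance 0); ties broken by index."""
    d = np.abs(x[:, None] - x[None, :])
    idx = np.argsort(d, axis=1, kind="stable")[:, :k]
    return y[idx].mean(axis=1)


def loss_rank(x, y, k, values=(0.0, 1.0)):
    """Count the fictitious outcome vectors y' that kNN fits at least as well as y."""
    loss = np.sum((y - knn_fit(x, y, k)) ** 2)
    rank = 0
    for yp in itertools.product(values, repeat=len(y)):
        yp = np.array(yp)
        if np.sum((yp - knn_fit(x, yp, k)) ** 2) <= loss + 1e-12:
            rank += 1
    return rank


def select_k(x, y, ks, values=(0.0, 1.0)):
    """LoRP-style choice: the k with minimal loss rank."""
    ranks = {k: loss_rank(x, y, k, values) for k in ks}
    return min(ranks, key=ranks.get), ranks
```

With k = 1 the in-sample fit reproduces every y' exactly, so all 2^n vectors tie at zero loss and the rank is maximal; this is how the loss rank penalizes an overly flexible regressor without an explicit complexity term or noise model.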