Kernel-based Information Criterion
This paper introduces the Kernel-based Information Criterion (KIC) for model
selection in regression analysis. The novel kernel-based complexity measure in
KIC efficiently computes the interdependency between the parameters of the model
using a variable-wise variance, which leads to the selection of better, more
robust regressors. Experimental results on both simulated and real data sets
show superior performance compared with Leave-One-Out Cross-Validation (LOOCV),
kernel-based Information Complexity (ICOMP), and the maximum log marginal
likelihood in Gaussian Process Regression (GPR).
Comment: We modified reference 17 and the subcaptions of Figure
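One of the baselines above, LOOCV for GPR, does not require refitting the model n times. The sketch below is a minimal illustration, not KIC itself: it assumes an isotropic squared exponential kernel with fixed, hand-picked hyper-parameters (the function names are hypothetical) and uses the standard closed-form leave-one-out identities mu_i = y_i - [K^-1 y]_i / [K^-1]_ii and sigma_i^2 = 1 / [K^-1]_ii.

```python
import numpy as np

def se_kernel(a, b, lengthscale, signal_var):
    """Isotropic squared exponential covariance between the rows of a and b."""
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def loo_log_predictive(x, y, lengthscale, signal_var, noise_var):
    """Closed-form LOOCV for GP regression (hypothetical helper).

    Returns the summed leave-one-out log predictive density and the
    leave-one-out predictive means, computed from a single inverse of K.
    """
    K = se_kernel(x, x, lengthscale, signal_var) + noise_var * np.eye(len(x))
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    var = 1.0 / np.diag(K_inv)   # sigma_i^2 = 1 / [K^-1]_ii
    mu = y - alpha * var         # mu_i = y_i - [K^-1 y]_i / [K^-1]_ii
    log_pred = -0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mu) ** 2 / var
    return np.sum(log_pred), mu
```

Used as a model selection criterion, the summed LOO log predictive density would be maximised over the hyper-parameters, in the same role that the log marginal likelihood plays.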
Efficient robust nonparametric estimation in a semimartingale regression model
The paper considers the problem of robustly estimating a periodic function in a
continuous-time regression model with dependent disturbances given by a general
square-integrable semimartingale with unknown distribution. An example of such
noise is a non-Gaussian Ornstein-Uhlenbeck process with a Lévy process
subordinator, which is used to model financial Black-Scholes-type markets
with jumps. An adaptive model selection procedure based on weighted least
squares estimates is proposed. Under general moment conditions on the noise
distribution, sharp non-asymptotic oracle inequalities for the robust risks
have been derived, and the robust efficiency of the model selection procedure
has been shown.
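The idea of selecting among weighted least squares estimates can be illustrated with a discrete-time toy analogue. This is a minimal sketch under loudly stated assumptions: equally spaced samples over one period, i.i.d. Gaussian noise with known sigma (not the paper's continuous-time semimartingale noise), a trigonometric basis, and a Mallows-type unbiased risk estimate as the selection cost, which is not necessarily the paper's exact criterion. The estimator is S_lambda = sum_j lambda_j theta_hat_j phi_j; the helper names are hypothetical.

```python
import numpy as np

def trig_basis(t, m):
    """First m trigonometric basis functions on [0, 1] evaluated at t:
    1, sqrt(2) cos(2 pi k t), sqrt(2) sin(2 pi k t), k = 1, 2, ..."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < m:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        if len(cols) < m:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def select_weights(theta_hat, sigma, n, weight_family):
    """Index of the weight vector minimising an unbiased estimate (up to an
    additive constant) of the quadratic risk of sum_j lambda_j theta_hat_j phi_j."""
    theta_sq = theta_hat**2 - sigma**2 / n  # unbiased estimate of theta_j**2
    costs = [np.sum(lam**2 * theta_hat**2) - 2 * np.sum(lam * theta_sq)
             for lam in weight_family]
    return int(np.argmin(costs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, sigma, max_m = 200, 0.3, 15
    t = np.arange(n) / n  # one period, equally spaced
    s_true = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)
    y = s_true + sigma * rng.standard_normal(n)
    phi = trig_basis(t, max_m)
    theta_hat = phi.T @ y / n  # least squares coefficients (orthonormal design)
    # projection weights: keep the first m coefficients, m = 1..max_m
    family = [np.r_[np.ones(m), np.zeros(max_m - m)] for m in range(1, max_m + 1)]
    best = select_weights(theta_hat, sigma, n, family)
    s_hat = phi @ (family[best] * theta_hat)
    print("kept", best + 1, "coefficients")
```

Projection weights (0/1) are only the simplest family; any finite set of weight vectors, for example smoothly decaying weights, can be passed in the same way.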
Scalable Hierarchical Gaussian Process Models for Regression and Pattern Classification
Gaussian processes, which are distributions over functions, are powerful nonparametric tools for the two major machine learning tasks: regression and classification. Both tasks are concerned with learning input-output mappings from example input-output pairs. In Gaussian process (GP) regression and classification, such mappings are modeled by Gaussian processes. In GP regression, the likelihood is Gaussian for continuous outputs, and hence closed-form solutions for prediction and model selection can be obtained. In GP classification, the likelihood is non-Gaussian for discrete/categorical outputs, and hence closed-form solutions are not available and approximate inference methods must be resorted to.
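The closed-form solutions available in GP regression can be sketched with the usual Cholesky-based recipe. This is a minimal sketch assuming an isotropic squared exponential kernel with fixed hyper-parameters (the function names are hypothetical); it returns the predictive mean, the latent predictive variance, and the log marginal likelihood that is typically maximised for model selection.

```python
import numpy as np

def se_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Isotropic squared exponential covariance between the rows of a and b."""
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def gp_predict(x, y, x_star, noise_var=0.1, **kern):
    """Closed-form GP regression with a Gaussian likelihood."""
    K = se_kernel(x, x, **kern) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^-1 y
    k_star = se_kernel(x, x_star, **kern)
    mean = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    var = np.diag(se_kernel(x_star, x_star, **kern)) - np.sum(v**2, 0)
    # log p(y | x) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)
    log_ml = (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
              - 0.5 * len(x) * np.log(2 * np.pi))
    return mean, var, log_ml
```

For classification, the non-Gaussian likelihood breaks the step that gives alpha and log_ml in closed form, which is why approximations are needed there.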
Over-Fitting in Model Selection with Gaussian Process Regression
Model selection in Gaussian Process Regression (GPR) seeks to determine the optimal values of the hyper-parameters governing the covariance function, which allows flexible customization of the GP to the problem at hand. An oft-overlooked issue in the model selection process is over-fitting the model selection criterion, typically the marginal likelihood. Over-fitting in this sense refers to fitting the random noise present in the model selection criterion in addition to the features that improve the generalisation performance of the statistical model. In this paper, we construct several Gaussian process regression models for a range of high-dimensional datasets from the UCI machine learning repository. We then compare both the MSE on the test dataset and the negative log marginal likelihood (nlZ), used as the model selection criterion, to determine whether over-fitting in model selection also affects GPR. We found that the squared exponential covariance function with Automatic Relevance Determination (SEard) is better than other kernels, including the squared exponential covariance function with an isotropic distance measure (SEiso), according to nlZ, but it is clearly not the best according to MSE on the test data, which indicates over-fitting in model selection.
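The comparison of SEiso and SEard can be sketched as follows: hyper-parameters are chosen by minimising nlZ on the training data only, and the test MSE of that choice is reported alongside. This is a minimal sketch under assumptions: a crude random search over log hyper-parameters stands in for the gradient-based optimiser a real study would use, and the helper names and search ranges are hypothetical.

```python
import numpy as np

def ard_kernel(a, b, lengthscales, signal_var):
    """SEard: one lengthscale per input dimension. SEiso is the special case
    in which all lengthscales are equal (a scalar also works)."""
    a, b = a / lengthscales, b / lengthscales
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0))

def nlz_and_mse(x, y, x_test, y_test, lengthscales, signal_var, noise_var):
    """Negative log marginal likelihood on training data, MSE on test data."""
    K = ard_kernel(x, x, lengthscales, signal_var) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    nlz = (0.5 * y @ alpha + np.sum(np.log(np.diag(L)))
           + 0.5 * len(x) * np.log(2 * np.pi))
    pred = ard_kernel(x_test, x, lengthscales, signal_var) @ alpha
    return nlz, np.mean((pred - y_test) ** 2)

def fit(x, y, x_test, y_test, ard, n_trials=300, seed=0):
    """Keep the hyper-parameters with the lowest nlZ (selection never looks
    at the test set) and return (nlZ, test MSE) for that choice."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    best = None
    for _ in range(n_trials):
        ell = np.exp(rng.uniform(-2, 2, size=d if ard else 1)) * np.ones(d)
        sf2, sn2 = np.exp(rng.uniform(-2, 1, size=2))
        nlz, mse = nlz_and_mse(x, y, x_test, y_test, ell, sf2, sn2)
        if best is None or nlz < best[0]:
            best = (nlz, mse)
    return best
```

Calling fit once with ard=False and once with ard=True gives an (nlZ, MSE) pair per kernel; over-fitting of the selection criterion shows up when the kernel with the lower nlZ does not also have the lower test MSE.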
Evolutionary Inference for Function-valued Traits: Gaussian Process Regression on Phylogenies
Biological data objects often have both of the following features: (i) they
are functions rather than single numbers or vectors, and (ii) they are
correlated due to phylogenetic relationships. In this paper we give a flexible
statistical model for such data, by combining assumptions from phylogenetics
with Gaussian processes. We describe its use as a nonparametric Bayesian prior
distribution, both for prediction (placing posterior distributions on ancestral
functions) and model selection (comparing rates of evolution across a
phylogeny, or identifying the most likely phylogenies consistent with the
observed data). Our work is integrative, extending the popular phylogenetic
Brownian Motion and Ornstein-Uhlenbeck models to functional data and Bayesian
inference, and extending Gaussian Process regression to phylogenies. We provide
a brief illustration of the application of our method.
Comment: 7 pages, 1 figure
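One way to combine phylogenetic assumptions with a GP over function-valued traits is a separable covariance over (node, argument) pairs. This is a minimal sketch under assumptions, not necessarily the paper's model: the phylogenetic factor is a stationary Ornstein-Uhlenbeck kernel exp(-d_ij / ell_p) on patristic (path-length) distances, the function factor is a squared exponential kernel, and the toy tree, distances, and hyper-parameters are all hypothetical. It places a posterior distribution on the root (ancestral) function given observed leaf functions.

```python
import numpy as np

# Hypothetical toy tree: root R -> internal node X (length 1) and leaf C (length 2);
# X -> leaves A and B (length 1 each). Patristic distances, order A, B, C, R.
D = np.array([[0, 2, 4, 2],
              [2, 0, 4, 2],
              [4, 4, 0, 2],
              [2, 2, 2, 0]], dtype=float)

def phylo_ou_kernel(D, ell):
    """Stationary OU covariance between nodes: exp(-distance / ell)."""
    return np.exp(-D / ell)

def se_kernel_1d(s, t, ell):
    """Squared exponential covariance over the function's argument."""
    return np.exp(-0.5 * (s[:, None] - t[None, :]) ** 2 / ell**2)

def ancestral_posterior(y_leaves, t, D, leaves, ancestor,
                        ell_p=3.0, ell_t=0.2, noise_var=0.01):
    """Posterior mean and covariance of the ancestor's function at points t."""
    Kp = phylo_ou_kernel(D, ell_p)
    Kt = se_kernel_1d(t, t, ell_t)
    # separable covariance: Kronecker product of node and argument kernels
    K_ll = (np.kron(Kp[np.ix_(leaves, leaves)], Kt)
            + noise_var * np.eye(len(leaves) * len(t)))
    K_al = np.kron(Kp[np.ix_([ancestor], leaves)], Kt)
    K_aa = np.kron(Kp[np.ix_([ancestor], [ancestor])], Kt)
    y = np.concatenate(y_leaves)  # node-major stacking matches the kron ordering
    mean = K_al @ np.linalg.solve(K_ll, y)
    cov = K_aa - K_al @ np.linalg.solve(K_ll, K_al.T)
    return mean, cov
```

For example, ancestral_posterior([f_A, f_B, f_C], t, D, leaves=[0, 1, 2], ancestor=3) returns a mean curve and covariance for R; because the prior mean is zero, the ancestral mean is a smoothed, shrunk combination of the leaf curves.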