The Loss Rank Principle for Model Selection
We introduce a new principle for model selection in regression and
classification. Many regression models are controlled by some smoothness or
flexibility or complexity parameter c, e.g. the number of neighbors to be
averaged over in k nearest neighbor (kNN) regression or the polynomial degree
in regression with polynomials. Let f_D^c be the (best) regressor of complexity
c on data D. A more flexible regressor can fit more data D' well than a more
rigid one. If something (here: a small loss) is easy to achieve, it is typically
worth less. We define the loss rank of f_D^c as the number of other
(fictitious) data D' that are fitted better by f_D'^c than D is fitted by
f_D^c. We suggest selecting the model complexity c that has minimal loss rank
(LoRP). Unlike most penalized maximum likelihood variants (AIC, BIC, MDL), LoRP
only depends on the regression function and loss function. It works without a
stochastic noise model, and is directly applicable to any non-parametric
regressor, like kNN. In this paper we formalize, discuss, and motivate LoRP,
study it for specific regression problems, in particular linear ones, and
compare it to other model selection schemes.
Comment: 16 pages
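The counting idea in this abstract can be illustrated with a small Monte Carlo sketch for kNN regression. This is only an illustration of the principle, not the paper's algorithm: Hutter derives tractable expressions for the loss rank of linear regressors, whereas below the fictitious responses are simply sampled uniformly over the observed output range, and all function names are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def fitted_loss(k, X, y):
    """Empirical squared loss of the k-NN regressor fitted to (X, y)."""
    model = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    return np.mean((y - model.predict(X)) ** 2)

def loss_rank(k, X, y, n_fictitious=1000, seed=0):
    """Monte Carlo proxy for the loss rank: the fraction of fictitious
    response vectors y' (sampled uniformly over the observed output range)
    that the k-NN regressor fits at least as well as the actual data."""
    rng = np.random.default_rng(seed)
    observed = fitted_loss(k, X, y)
    lo, hi = y.min(), y.max()
    better = sum(
        fitted_loss(k, X, rng.uniform(lo, hi, size=y.shape)) <= observed
        for _ in range(n_fictitious)
    )
    return better / n_fictitious

# LoRP-style selection: choose the complexity (number of neighbors) with minimal loss rank.
def select_k(X, y, ks=(1, 2, 3, 5, 10, 20)):
    return min(ks, key=lambda k: loss_rank(k, X, y))
```

A very flexible regressor (small k) fits almost every fictitious response well, so its rank is large; an overly rigid one fits the actual data poorly, which also inflates the rank. Minimizing the loss rank trades off the two.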
Model Selection with the Loss Rank Principle
A key issue in statistics and machine learning is to automatically select the
"right" model complexity, e.g., the number of neighbors to be averaged over in
k nearest neighbor (kNN) regression or the polynomial degree in regression with
polynomials. We suggest a novel principle - the Loss Rank Principle (LoRP) -
for model selection in regression and classification. It is based on the loss
rank, which counts how many other (fictitious) data would be fitted better.
LoRP selects the model that has minimal loss rank. Unlike most penalized
maximum likelihood variants (AIC, BIC, MDL), LoRP depends only on the
regression functions and the loss function. It works without a stochastic noise
model, and is directly applicable to any non-parametric regressor, like kNN.
Comment: 31 LaTeX pages, 1 figure
Model Selection by Loss Rank for Classification and Unsupervised Learning
Hutter (2007) recently introduced the loss rank principle (LoRP) as a general-purpose principle for model selection. The LoRP enjoys many attractive properties and deserves further investigation. It has been well studied in the regression framework by Hutter and Tran (2010). In this paper, we study the LoRP in the classification framework and develop it further for model selection problems in unsupervised learning, where the main interest is to describe the associations between input measurements, as in cluster analysis or graphical modelling. Theoretical properties and simulation studies are presented.
The Loss Rank Criterion for Variable Selection in Linear Regression Analysis
Lasso and other regularization procedures are attractive methods for variable
selection, subject to a proper choice of shrinkage parameter. Given a set of
potential subsets produced by a regularization algorithm, a consistent model
selection criterion is proposed to select the best one among this preselected
set. The approach leads to a fast and efficient procedure for variable
selection, especially in high-dimensional settings. Model selection consistency
of the suggested criterion is proven when the number of covariates d is fixed.
Simulation studies suggest that the criterion still enjoys model selection
consistency when d is much larger than the sample size. The simulations also
show that our approach for variable selection works surprisingly well in
comparison with existing competitors. The method is also applied to a real data
set.
Comment: 18 pages, 1 figure
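A sketch of the two-stage procedure described above, under stated assumptions: the regularization algorithm is taken to be a lasso path (scikit-learn's lasso_path), and the scoring function uses BIC purely as a stand-in where the paper's loss rank criterion would be substituted; all function names are illustrative.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def candidate_supports(X, y, n_alphas=50):
    """Collect the distinct supports (active variable sets) along a lasso path."""
    _, coefs, _ = lasso_path(X, y, n_alphas=n_alphas)
    supports = {tuple(np.flatnonzero(coefs[:, j])) for j in range(coefs.shape[1])}
    return [s for s in supports if len(s) > 0]

def score(X, y, support):
    """Placeholder selection criterion (BIC for a Gaussian linear model);
    the paper's loss rank criterion would replace this function."""
    n = len(y)
    Xs = X[:, list(support)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = np.sum((y - Xs @ beta) ** 2)
    return n * np.log(rss / n) + len(support) * np.log(n)

def select_subset(X, y):
    """Two-stage variable selection: the lasso generates candidate subsets,
    a consistent criterion then picks the best one among them."""
    candidates = candidate_supports(X, y)
    return min(candidates, key=lambda s: score(X, y, s))
```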
Determining Principal Component Cardinality through the Principle of Minimum Description Length
PCA (Principal Component Analysis) and its variants are ubiquitous techniques
for matrix dimension reduction and reduced-dimension latent-factor extraction.
One significant challenge in using PCA is the choice of the number of principal
components. The information-theoretic MDL (Minimum Description Length) principle
gives objective compression-based criteria for model selection, but it is
difficult to analytically apply its modern definition - NML (Normalized Maximum
Likelihood) - to the problem of PCA. This work shows a general reduction of NML
problems to lower-dimension problems. Applying this reduction, it bounds the
NML of PCA in terms of the NML of linear regression, which is known.
Comment: LOD 201
Generalized SURE for Exponential Families: Applications to Regularization
Stein's unbiased risk estimate (SURE) was proposed by Stein for the
independent, identically distributed (iid) Gaussian model in order to derive
estimates that dominate least-squares (LS). In recent years, the SURE criterion
has been employed in a variety of denoising problems for choosing
regularization parameters that minimize an estimate of the mean-squared error
(MSE). However, its use has been limited to the iid case which precludes many
important applications. In this paper we begin by deriving a SURE counterpart
for general, not necessarily iid distributions from the exponential family.
This enables extending the SURE design technique to a much broader class of
problems. Based on this generalization we suggest a new method for choosing
regularization parameters in penalized LS estimators. We then demonstrate its
superior performance over the conventional generalized cross validation
approach and the discrepancy method in the context of image deblurring and
deconvolution. The SURE technique can also be used to design estimates without
predefining their structure. However, allowing for too many free parameters
impairs the performance of the resulting estimates. To address this inherent
tradeoff we propose a regularized SURE objective. Based on this design
criterion, we derive a wavelet denoising strategy that is similar in spirit to
the standard soft-threshold approach but can lead to improved MSE performance.
Comment: to appear in the IEEE Transactions on Signal Processing
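For context, the classical iid Gaussian SURE that this paper generalizes can be written in closed form for soft thresholding, since the divergence term is simply the number of coefficients above the threshold. The sketch below is that standard iid case with a known noise level sigma, not the exponential-family generalization developed in the paper; names and the threshold grid are assumptions.

```python
import numpy as np

def sure_soft_threshold(y, t, sigma):
    """Classical SURE for soft thresholding under an iid Gaussian model:
    SURE(t) = -n*sigma^2 + sum(min(y_i^2, t^2)) + 2*sigma^2 * #{|y_i| > t},
    an unbiased estimate of the MSE of the soft-threshold estimator."""
    n = y.size
    residual = np.sum(np.minimum(y ** 2, t ** 2))
    divergence = np.count_nonzero(np.abs(y) > t)
    return -n * sigma ** 2 + residual + 2 * sigma ** 2 * divergence

def choose_threshold(y, sigma, grid=None):
    """Pick the threshold that minimizes the SURE estimate of the MSE."""
    if grid is None:
        grid = np.linspace(0.0, sigma * np.sqrt(2 * np.log(y.size)), 200)
    scores = [sure_soft_threshold(y, t, sigma) for t in grid]
    return grid[int(np.argmin(scores))]
```

Minimizing this estimate over the regularization parameter is the same design pattern the abstract applies to penalized LS estimators under non-iid exponential-family noise.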
Hierarchical adaptive polynomial chaos expansions
Polynomial chaos expansions (PCE) are widely used in the framework of
uncertainty quantification. However, when dealing with high dimensional complex
problems, challenging issues need to be faced. For instance, high-order
polynomials may be required, which leads to a large polynomial basis whereas
usually only a few of the basis functions are in fact significant. Taking into
account the sparse structure of the model, advanced techniques such as sparse
PCE (SPCE), have been recently proposed to alleviate the computational issue.
In this paper, we propose a novel approach to SPCE, which allows one to exploit
the model's hierarchical structure. The proposed approach is based on the
adaptive enrichment of the polynomial basis using the so-called principle of
heredity. As a result, one can reduce the computational burden related to a
large pre-defined candidate set while obtaining higher accuracy with the same
computational budget.
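The heredity-based enrichment can be sketched on multi-index sets: a higher-degree term becomes a candidate only if its lower-degree parents are already active in the current sparse expansion. This is a generic illustration of the (strong) heredity principle with assumed function names, not the paper's specific algorithm.

```python
def parents(alpha):
    """Multi-indices obtained by decrementing one nonzero component of alpha."""
    return [alpha[:i] + (a - 1,) + alpha[i + 1:] for i, a in enumerate(alpha) if a > 0]

def enrich(active, dim, max_degree):
    """One enrichment step under strong heredity: a new multi-index is a
    candidate only if all of its parents already belong to the active set."""
    candidates = set()
    for alpha in active:
        for i in range(dim):
            child = alpha[:i] + (alpha[i] + 1,) + alpha[i + 1:]
            if sum(child) <= max_degree and all(p in active for p in parents(child)):
                candidates.add(child)
    return candidates - active

# Usage sketch: start from the constant and first-order terms kept by a sparse
# solver, then alternate enrichment and re-selection of significant coefficients.
active = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
new_terms = enrich(active, dim=3, max_degree=3)
# new_terms contains, e.g., (1, 1, 0) but not (1, 0, 1), whose parent (0, 0, 1) is inactive.
```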