The Loss Rank Principle for Model Selection
We introduce a new principle for model selection in regression and
classification. Many regression models are controlled by some smoothness or
flexibility or complexity parameter c, e.g. the number of neighbors to be
averaged over in k nearest neighbor (kNN) regression or the polynomial degree
in regression with polynomials. Let f_D^c be the (best) regressor of complexity
c on data D. A more flexible regressor can fit more data D' well than a more
rigid one. If something (here: a small loss) is easy to achieve, it is typically
worth less. We define the loss rank of f_D^c as the number of other
(fictitious) data D' that are fitted better by f_D'^c than D is fitted by
f_D^c. We suggest selecting the model complexity c that has minimal loss rank
(LoRP). Unlike most penalized maximum likelihood variants (AIC, BIC, MDL), LoRP
only depends on the regression function and loss function. It works without a
stochastic noise model, and is directly applicable to any non-parametric
regressor, like kNN. In this paper we formalize, discuss, and motivate LoRP,
study it for specific regression problems, in particular linear ones, and
compare it to other model selection schemes.
Comment: 16 pages
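The counting idea in this abstract can be illustrated with a small Monte Carlo sketch for kNN regression. This is only an illustration of the principle, not the paper's algorithm: Hutter derives tractable expressions for the loss rank of linear regressors, whereas below the fictitious responses are simply sampled uniformly over the observed output range, and all function names are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def fitted_loss(k, X, y):
    """Empirical squared loss of the k-NN regressor fitted to (X, y)."""
    model = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    return np.mean((y - model.predict(X)) ** 2)

def loss_rank(k, X, y, n_fictitious=1000, seed=0):
    """Monte Carlo proxy for the loss rank: the fraction of fictitious
    response vectors y' (sampled uniformly over the observed output range)
    that the k-NN regressor fits at least as well as the actual data."""
    rng = np.random.default_rng(seed)
    observed = fitted_loss(k, X, y)
    lo, hi = y.min(), y.max()
    better = sum(
        fitted_loss(k, X, rng.uniform(lo, hi, size=y.shape)) <= observed
        for _ in range(n_fictitious)
    )
    return better / n_fictitious

# LoRP-style selection: choose the complexity (number of neighbors) with minimal loss rank.
def select_k(X, y, ks=(1, 2, 3, 5, 10, 20)):
    return min(ks, key=lambda k: loss_rank(k, X, y))
```

A very flexible regressor (small k) fits almost every fictitious response well, so its rank is large; an overly rigid one fits the actual data poorly, which also inflates the rank. Minimizing the loss rank trades off the two.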
Model Selection with the Loss Rank Principle
A key issue in statistics and machine learning is to automatically select the
"right" model complexity, e.g., the number of neighbors to be averaged over in
k nearest neighbor (kNN) regression or the polynomial degree in regression with
polynomials. We suggest a novel principle - the Loss Rank Principle (LoRP) -
for model selection in regression and classification. It is based on the loss
rank, which counts how many other (fictitious) data would be fitted better.
LoRP selects the model that has minimal loss rank. Unlike most penalized
maximum likelihood variants (AIC, BIC, MDL), LoRP depends only on the
regression functions and the loss function. It works without a stochastic noise
model, and is directly applicable to any non-parametric regressor, like kNN.
Comment: 31 LaTeX pages, 1 figure
Model Selection by Loss Rank for Classification and Unsupervised Learning
Hutter (2007) recently introduced the loss rank principle (LoRP) as a general-purpose principle for model selection. The LoRP enjoys many attractive properties and deserves further investigation. It has been well studied in the regression framework by Hutter and Tran (2010). In this paper, we study the LoRP in the classification framework and develop it further for model selection problems in unsupervised learning, where the main interest is to describe the associations between input measurements, as in cluster analysis or graphical modelling. Theoretical properties and simulation studies are presented.
The Loss Rank Criterion for Variable Selection in Linear Regression Analysis
Lasso and other regularization procedures are attractive methods for variable
selection, subject to a proper choice of shrinkage parameter. Given a set of
potential subsets produced by a regularization algorithm, a consistent model
selection criterion is proposed to select the best one among this preselected
set. The approach leads to a fast and efficient procedure for variable
selection, especially in high-dimensional settings. Model selection consistency
of the suggested criterion is proven when the number of covariates d is fixed.
Simulation studies suggest that the criterion still enjoys model selection
consistency when d is much larger than the sample size. The simulations also
show that our approach for variable selection works surprisingly well in
comparison with existing competitors. The method is also applied to a real data
set.
Comment: 18 pages, 1 figure
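A sketch of the two-stage procedure described above, under stated assumptions: the regularization algorithm is taken to be a lasso path (scikit-learn's lasso_path), and the scoring function uses BIC purely as a stand-in where the paper's loss rank criterion would be substituted; all function names are illustrative.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def candidate_supports(X, y, n_alphas=50):
    """Collect the distinct supports (active variable sets) along a lasso path."""
    _, coefs, _ = lasso_path(X, y, n_alphas=n_alphas)
    supports = {tuple(np.flatnonzero(coefs[:, j])) for j in range(coefs.shape[1])}
    return [s for s in supports if len(s) > 0]

def score(X, y, support):
    """Placeholder selection criterion (BIC for a Gaussian linear model);
    the paper's loss rank criterion would replace this function."""
    n = len(y)
    Xs = X[:, list(support)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = np.sum((y - Xs @ beta) ** 2)
    return n * np.log(rss / n) + len(support) * np.log(n)

def select_subset(X, y):
    """Two-stage variable selection: the lasso generates candidate subsets,
    a consistent criterion then picks the best one among them."""
    candidates = candidate_supports(X, y)
    return min(candidates, key=lambda s: score(X, y, s))
```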
Determining Principal Component Cardinality through the Principle of Minimum Description Length
PCA (Principal Component Analysis) and its variants are ubiquitous techniques
for matrix dimension reduction and reduced-dimension latent-factor extraction.
One significant challenge in using PCA is the choice of the number of principal
components. The information-theoretic MDL (Minimum Description Length) principle
gives objective compression-based criteria for model selection, but it is
difficult to analytically apply its modern definition - NML (Normalized Maximum
Likelihood) - to the problem of PCA. This work shows a general reduction of NML
problems to lower-dimension problems. Applying this reduction, it bounds the
NML of PCA in terms of the NML of linear regression, which is known.
Comment: LOD 201
Generalized SURE for Exponential Families: Applications to Regularization
Stein's unbiased risk estimate (SURE) was proposed by Stein for the
independent, identically distributed (iid) Gaussian model in order to derive
estimates that dominate least-squares (LS). In recent years, the SURE criterion
has been employed in a variety of denoising problems for choosing
regularization parameters that minimize an estimate of the mean-squared error
(MSE). However, its use has been limited to the iid case which precludes many
important applications. In this paper we begin by deriving a SURE counterpart
for general, not necessarily iid distributions from the exponential family.
This enables extending the SURE design technique to a much broader class of
problems. Based on this generalization we suggest a new method for choosing
regularization parameters in penalized LS estimators. We then demonstrate its
superior performance over the conventional generalized cross validation
approach and the discrepancy method in the context of image deblurring and
deconvolution. The SURE technique can also be used to design estimates without
predefining their structure. However, allowing for too many free parameters
impairs the performance of the resulting estimates. To address this inherent
tradeoff we propose a regularized SURE objective. Based on this design
criterion, we derive a wavelet denoising strategy that is similar in spirit to
the standard soft-threshold approach but can lead to improved MSE performance.
Comment: to appear in the IEEE Transactions on Signal Processing
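For context, the classical iid Gaussian SURE that this paper generalizes can be written in closed form for soft thresholding, since the divergence term is simply the number of coefficients above the threshold. The sketch below is that standard iid case with a known noise level sigma, not the exponential-family generalization developed in the paper; names and the threshold grid are assumptions.

```python
import numpy as np

def sure_soft_threshold(y, t, sigma):
    """Classical SURE for soft thresholding under an iid Gaussian model:
    SURE(t) = -n*sigma^2 + sum(min(y_i^2, t^2)) + 2*sigma^2 * #{|y_i| > t},
    an unbiased estimate of the MSE of the soft-threshold estimator."""
    n = y.size
    residual = np.sum(np.minimum(y ** 2, t ** 2))
    divergence = np.count_nonzero(np.abs(y) > t)
    return -n * sigma ** 2 + residual + 2 * sigma ** 2 * divergence

def choose_threshold(y, sigma, grid=None):
    """Pick the threshold that minimizes the SURE estimate of the MSE."""
    if grid is None:
        grid = np.linspace(0.0, sigma * np.sqrt(2 * np.log(y.size)), 200)
    scores = [sure_soft_threshold(y, t, sigma) for t in grid]
    return grid[int(np.argmin(scores))]
```

Minimizing this estimate over the regularization parameter is the same design pattern the abstract applies to penalized LS estimators under non-iid exponential-family noise.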
Hierarchical adaptive polynomial chaos expansions
Polynomial chaos expansions (PCE) are widely used in the framework of
uncertainty quantification. However, when dealing with high dimensional complex
problems, challenging issues need to be faced. For instance, high-order
polynomials may be required, which leads to a large polynomial basis whereas
usually only a few of the basis functions are in fact significant. Taking into
account the sparse structure of the model, advanced techniques such as sparse
PCE (SPCE), have been recently proposed to alleviate the computational issue.
In this paper, we propose a novel approach to SPCE, which allows one to exploit
the model's hierarchical structure. The proposed approach is based on the
adaptive enrichment of the polynomial basis using the so-called principle of
heredity. As a result, one can reduce the computational burden related to a
large pre-defined candidate set while obtaining higher accuracy with the same
computational budget.
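The heredity-based enrichment can be sketched on multi-index sets: a higher-degree term becomes a candidate only if its lower-degree parents are already active in the current sparse expansion. This is a generic illustration of the (strong) heredity principle with assumed function names, not the paper's specific algorithm.

```python
def parents(alpha):
    """Multi-indices obtained by decrementing one nonzero component of alpha."""
    return [alpha[:i] + (a - 1,) + alpha[i + 1:] for i, a in enumerate(alpha) if a > 0]

def enrich(active, dim, max_degree):
    """One enrichment step under strong heredity: a new multi-index is a
    candidate only if all of its parents already belong to the active set."""
    candidates = set()
    for alpha in active:
        for i in range(dim):
            child = alpha[:i] + (alpha[i] + 1,) + alpha[i + 1:]
            if sum(child) <= max_degree and all(p in active for p in parents(child)):
                candidates.add(child)
    return candidates - active

# Usage sketch: start from the constant and first-order terms kept by a sparse
# solver, then alternate enrichment and re-selection of significant coefficients.
active = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
new_terms = enrich(active, dim=3, max_degree=3)
# new_terms contains, e.g., (1, 1, 0) but not (1, 0, 1), whose parent (0, 0, 1) is inactive.
```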