Model Selection with the Loss Rank Principle
A key issue in statistics and machine learning is to automatically select the
"right" model complexity, e.g., the number of neighbors to be averaged over in
k nearest neighbor (kNN) regression or the polynomial degree in regression with
polynomials. We suggest a novel principle - the Loss Rank Principle (LoRP) -
for model selection in regression and classification. It is based on the loss
rank, which counts how many other (fictitious) data would be fitted better.
LoRP selects the model that has minimal loss rank. Unlike most penalized
maximum likelihood variants (AIC, BIC, MDL), LoRP depends only on the
regression functions and the loss function. It works without a stochastic noise
model, and is directly applicable to any non-parametric regressor, like kNN.
The Loss Rank Principle for Model Selection
We introduce a new principle for model selection in regression and
classification. Many regression models are controlled by some smoothness or
flexibility or complexity parameter c, e.g. the number of neighbors to be
averaged over in k nearest neighbor (kNN) regression or the polynomial degree
in regression with polynomials. Let f_D^c be the (best) regressor of complexity
c on data D. A more flexible regressor can fit more data sets D' well than a
more rigid one can. If something (here, a small loss) is easy to achieve, it is
typically worth less. We define the loss rank of f_D^c as the number of other
(fictitious) data D' that are fitted better by f_D'^c than D is fitted by
f_D^c. We suggest selecting the model complexity c that has minimal loss rank
(LoRP). Unlike most penalized maximum likelihood variants (AIC,BIC,MDL), LoRP
only depends on the regression function and loss function. It works without a
stochastic noise model, and is directly applicable to any non-parametric
regressor, like kNN. In this paper we formalize, discuss, and motivate LoRP,
study it for specific regression problems, in particular linear ones, and
compare it to other model selection schemes.
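For linear smoothers, including kNN regression, the fitted values are y_hat = M y for a matrix M that does not depend on y, and the loss rank then admits a closed form: the fictitious data fitted better than D form an ellipsoid whose log-volume is, up to a model-independent constant, (n/2) ln ||(I - M) y||^2 - (1/2) ln det((I - M)^T (I - M)). The sketch below is a simplified reading of the principle rather than the authors' implementation; the helper names and the small regularizer eps (which guards the degenerate k = 1 case, where M = I) are our own.

import numpy as np

def knn_smoother_matrix(X, k):
    # (M y)_i is the mean of the y-values of the k nearest neighbors of x_i
    # (self included), so the fitted values y_hat = M y are linear in y.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    M = np.zeros((len(X), len(X)))
    for i in range(len(X)):
        M[i, np.argsort(d2[i])[:k]] = 1.0 / k
    return M

def log_loss_rank(y, M, eps=1e-8):
    # Log-volume of the ellipsoid of fictitious data y' whose empirical
    # squared loss under the smoother M is smaller than that of y,
    # up to an additive constant that does not depend on the model.
    n = len(y)
    A = np.eye(n) - M
    S = A.T @ A + eps * np.eye(n)   # regularized; exact form is (I-M)^T (I-M)
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * n * np.log(y @ S @ y) - 0.5 * logdet

# Select the number of neighbors by minimizing the loss rank.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 1))
y = np.sin(3.0 * X[:, 0]) + 0.2 * rng.standard_normal(60)
scores = {k: log_loss_rank(y, knn_smoother_matrix(X, k)) for k in range(1, 21)}
print("LoRP selects k =", min(scores, key=scores.get))

Minimizing this criterion trades the raw training loss against the volume of data sets the smoother could have fitted equally well, with no stochastic noise model involved.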
Kernelized Hashcode Representations for Relation Extraction
Kernel methods have produced state-of-the-art results for a number of NLP
tasks such as relation extraction, but suffer from poor scalability due to the
high cost of computing kernel similarities between natural language structures.
A recently proposed technique, kernelized locality-sensitive hashing (KLSH),
can significantly reduce the computational cost, but is only applicable to
classifiers operating on kNN graphs. Here we propose to use random subspaces of
KLSH codes for efficiently constructing an explicit representation of NLP
structures suitable for general classification methods. Further, we propose an
approach for optimizing the KLSH model for classification problems by
maximizing an approximation of mutual information between the KLSH codes
(feature vectors) and the class labels. We evaluate the proposed approach on
biomedical relation extraction datasets, and observe significant and robust
improvements in accuracy w.r.t. state-of-the-art classifiers, along with
drastic (orders-of-magnitude) speedup compared to conventional kernel methods.
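As a rough sketch of the representation-building step (a simplification in the spirit of Kulis and Grauman's KLSH, not the authors' code; the helper names are ours, and the mutual-information optimization of the hash functions is omitted), each hash bit thresholds a random weighting of kernel similarities to a set of reference points, and each random subset of bits is read off as one integer-valued feature:

import numpy as np

def klsh_codes(K_ref, n_bits, rng):
    # K_ref: (n_samples, n_ref) kernel matrix against a fixed reference set.
    # Bit j is the sign of a random weighting of the kernel similarities,
    # i.e. a random hyperplane in the kernel-induced feature space.
    W = rng.standard_normal((K_ref.shape[1], n_bits))
    return (K_ref @ W > 0).astype(np.int64)   # (n_samples, n_bits) in {0, 1}

def random_subspace_features(codes, n_subspaces, bits_per_subspace, rng):
    # Read each random subset of hash bits as one integer-valued categorical
    # feature, yielding an explicit (n_samples, n_subspaces) representation.
    weights = 1 << np.arange(bits_per_subspace)
    cols = [codes[:, rng.choice(codes.shape[1], bits_per_subspace, replace=False)] @ weights
            for _ in range(n_subspaces)]
    return np.stack(cols, axis=1)

A generic classifier such as a random forest trained on these subspace features then replaces the kNN-graph classifiers to which plain KLSH is restricted.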
Time series kernel similarities for predicting Paroxysmal Atrial Fibrillation from ECGs
We tackle the problem of classifying Electrocardiography (ECG) signals with
the aim of predicting the onset of Paroxysmal Atrial Fibrillation (PAF). Atrial
fibrillation is the most common type of arrhythmia, but in many cases PAF
episodes are asymptomatic. Therefore, in order to help diagnose PAF, it is
important to design procedures for detecting and, more importantly, predicting
PAF episodes. We propose a method for predicting PAF events whose first step
consists of a feature extraction procedure that represents each ECG as a
multi-variate time series. Next, we design a classification framework
based on kernel similarities for multi-variate time series, capable of handling
missing data. We consider different approaches to perform classification in the
original space of the multi-variate time series and in an embedding space,
defined by the kernel similarity measure. We achieve a classification accuracy
comparable with state-of-the-art methods, with the additional advantage of
detecting the PAF onset up to 15 minutes in advance.
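A minimal sketch of the two classification routes, assuming scikit-learn and a precomputed time-series kernel matrix (the paper's actual kernel, which additionally handles missing data, and its classifiers may differ; the function names are ours): an SVM with a precomputed Gram matrix works in the original kernel-induced space, while kernel PCA followed by kNN works in an explicit embedding.

from sklearn.svm import SVC
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KNeighborsClassifier

def classify_in_kernel_space(K_train, y_train, K_test_train):
    # Work directly in the space induced by the kernel: SVC accepts the
    # precomputed Gram matrix K(train, train) and, at prediction time,
    # the cross-kernel K(test, train).
    clf = SVC(kernel="precomputed").fit(K_train, y_train)
    return clf.predict(K_test_train)

def classify_in_embedding(K_train, y_train, K_test_train, dim=10, k=5):
    # Alternatively, embed the series via kernel PCA on the same Gram
    # matrix and classify with kNN in the resulting embedding space.
    kpca = KernelPCA(n_components=dim, kernel="precomputed").fit(K_train)
    z_train = kpca.transform(K_train)
    z_test = kpca.transform(K_test_train)
    return KNeighborsClassifier(n_neighbors=k).fit(z_train, y_train).predict(z_test)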