On the Consistency of Ordinal Regression Methods
Many of the ordinal regression models that have been proposed in the
literature can be seen as methods that minimize a convex surrogate of the
zero-one, absolute, or squared loss functions. A key property that makes it
possible to study the statistical implications of such approximations is Fisher
consistency. Fisher consistency is a desirable property for surrogate loss
functions and implies that in the population setting, i.e., if the probability
distribution that generates the data were available, then optimization of the
surrogate would yield the best possible model. In this paper we will
characterize the Fisher consistency of a rich family of surrogate loss
functions used in the context of ordinal regression, including support vector
ordinal regression, ORBoosting and least absolute deviation. We will see that,
for a family of surrogate loss functions that subsumes support vector ordinal
regression and ORBoosting, consistency can be fully characterized by the
derivative of a real-valued function at zero, as happens for convex
margin-based surrogates in binary classification. We also derive excess risk
bounds for a surrogate of the absolute error that generalize existing risk
bounds for binary classification. Finally, our analysis suggests a novel
surrogate of the squared error loss. We compare this novel surrogate with
competing approaches on 9 different datasets. Our method proves highly
competitive in practice, outperforming the least squares loss on 7 out of 9
datasets.

Comment: Journal of Machine Learning Research 18 (2017)
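The threshold-based surrogates the abstract refers to can be illustrated with a minimal sketch. The following is a generic all-threshold logistic surrogate for ordinal regression (the function name and parametrization are illustrative assumptions, not the paper's exact losses or its novel squared-error surrogate):

```python
import numpy as np

def all_threshold_logistic(f, y, theta):
    """Generic all-threshold logistic surrogate for ordinal regression.

    f     : real-valued score of a sample.
    y     : ordinal label in {0, ..., k} with k = len(theta).
    theta : sorted thresholds separating the k + 1 ordered classes.

    Sums a logistic margin loss over every threshold, with sign +1 when
    the true label lies above the threshold and -1 otherwise.
    """
    k = len(theta)
    s = np.where(np.arange(k) < y, 1.0, -1.0)   # side of each threshold
    margins = s * (f - theta)
    return np.sum(np.log1p(np.exp(-margins)))
```

A score that places the sample on the correct side of every threshold incurs a small loss; a score on the wrong side incurs a large one, which is what drives the consistency analysis.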
Calibration of One-Class SVM for MV set estimation
A general approach for anomaly detection or novelty detection consists in
estimating high density regions or Minimum Volume (MV) sets. The One-Class
Support Vector Machine (OCSVM) is a state-of-the-art algorithm for estimating
such regions from high dimensional data. Yet it suffers from practical
limitations. When applied to a limited number of samples, it can perform poorly
even with the best choice of hyperparameters. Moreover, the OCSVM solution is
very sensitive to the selection of hyperparameters, which makes it hard to tune
in an unsupervised setting. We present a new approach to
estimate MV sets using the OCSVM with a different choice of the parameter
controlling the proportion of outliers. The solution function of the OCSVM is
learnt on a training set and the desired probability mass is obtained by
adjusting the offset on a test set to prevent overfitting. Models learnt on
different train/test splits are then aggregated to reduce the variance induced
by such random splits. Our approach makes it possible to tune the
hyperparameters automatically and obtain nested set estimates. Experimental
results show that our approach outperforms the standard OCSVM formulation while
suffering less from the curse of dimensionality than kernel density estimates.
Results on actual data sets are also presented.

Comment: IEEE DSAA'2015, Oct 2015, Paris, France
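The calibration idea described above can be sketched with scikit-learn. This is a simplified reading, not the authors' implementation: the helper names are hypothetical, and the train/test offset adjustment and aggregation follow the abstract's description under assumed defaults:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

def calibrated_ocsvm_offsets(X, alpha=0.9, n_splits=3, seed=0):
    """Sketch: learn the OCSVM score on a train split, then set the offset
    as the empirical (1 - alpha)-quantile of scores on a held-out test
    split, so roughly a fraction alpha of the mass falls inside the set.
    Repeating over random splits lets us aggregate and reduce variance."""
    rng = np.random.RandomState(seed)
    models, offsets = [], []
    for _ in range(n_splits):
        X_tr, X_te = train_test_split(
            X, test_size=0.5, random_state=rng.randint(10**6))
        clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.5).fit(X_tr)
        scores = clf.decision_function(X_te)
        offsets.append(np.quantile(scores, 1 - alpha))
        models.append(clf)
    return models, offsets

def predict_inside(models, offsets, X):
    """Majority vote over calibrated models: is each point in the MV set?"""
    votes = np.mean(
        [m.decision_function(X) >= o for m, o in zip(models, offsets)], axis=0)
    return votes >= 0.5
```

Re-thresholding `decision_function` on held-out data is what decouples the estimated probability mass from the `nu` parameter chosen at training time.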
HRF estimation improves sensitivity of fMRI encoding and decoding models
Extracting activation patterns from functional Magnetic Resonance Images
(fMRI) datasets remains challenging in rapid-event designs due to the inherent
delay of the blood oxygen level-dependent (BOLD) signal. The general linear
model (GLM) makes it possible to estimate the activation from a design matrix and a fixed
hemodynamic response function (HRF). However, the HRF is known to vary
substantially between subjects and brain regions. In this paper, we propose a
model for jointly estimating the hemodynamic response function (HRF) and the
activation patterns via a low-rank representation of task effects. This model is
based on the linearity assumption behind the GLM and can be computed using
standard gradient-based solvers. We use the activation patterns computed by our
model as input data for encoding and decoding studies and report performance
improvement in both settings.

Comment: 3rd International Workshop on Pattern Recognition in NeuroImaging (2013)
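The fixed-HRF GLM that this joint model builds on can be sketched as follows. The double-gamma HRF parameters below are a common convention assumed for illustration, and the function names are hypothetical; the paper's contribution is estimating the HRF jointly rather than fixing it as done here:

```python
import math
import numpy as np

def gamma_pdf(t, shape, scale):
    """Gamma density evaluated on an array of time points (zero for t <= 0)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = (t[pos] ** (shape - 1) * np.exp(-t[pos] / scale)
                / (math.gamma(shape) * scale ** shape))
    return out

def canonical_hrf(tr, duration=32.0):
    """Double-gamma approximation of the canonical HRF: a peak near 5 s
    minus a scaled undershoot near 15 s (standard parameter choices)."""
    t = np.arange(0.0, duration, tr)
    return gamma_pdf(t, 6, 1.0) - 0.35 * gamma_pdf(t, 16, 1.0)

def glm_activation(bold, onsets, tr):
    """Estimate activation betas with a fixed canonical HRF.

    bold   : (n_scans,) BOLD time series.
    onsets : list of per-condition event-onset times in seconds.
    Builds one regressor per condition by convolving an event stick
    function with the HRF, adds an intercept, and solves least squares.
    """
    n = len(bold)
    h = canonical_hrf(tr)
    X = []
    for cond in onsets:
        stick = np.zeros(n)
        stick[np.round(np.array(cond) / tr).astype(int)] = 1.0
        X.append(np.convolve(stick, h)[:n])
    X.append(np.ones(n))  # intercept column
    X = np.column_stack(X)
    beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
    return beta
```

When the true HRF deviates from this fixed shape, the betas are biased, which is the sensitivity loss the joint HRF-and-activation model aims to recover.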
GAP Safe screening rules for sparse multi-task and multi-class models
High dimensional regression benefits from sparsity promoting regularizations.
Screening rules leverage the known sparsity of the solution by ignoring some
variables in the optimization, hence speeding up solvers. When the procedure is
proven not to discard features wrongly, the rules are said to be safe. In
this paper we derive new safe rules for generalized linear models regularized
with ℓ1 and ℓ1/ℓ2 norms. The rules are based on duality gap
computations and spherical safe regions whose diameters converge to zero. This
makes it possible to safely discard more variables, in particular for low regularization
parameters. The GAP Safe rule can cope with any iterative solver and we
illustrate its performance on coordinate descent for multi-task Lasso, binary
and multinomial logistic regression, demonstrating significant speed ups on all
tested datasets with respect to previous safe rules.

Comment: in Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS), 2015
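One GAP Safe screening step can be sketched for the plain Lasso (the simplest member of the family; the multi-task and multinomial cases follow the same pattern with group norms). This is a minimal illustration under the standard Lasso formulation, not the paper's full solver:

```python
import numpy as np

def gap_safe_screen(X, y, w, lam):
    """One GAP Safe screening step for the Lasso
    min_w 0.5 * ||y - Xw||^2 + lam * ||w||_1.

    Builds a dual-feasible point by rescaling the residual, computes the
    duality gap, and safely discards feature j whenever
    |x_j^T theta| + r * ||x_j|| < 1 with sphere radius r = sqrt(2*gap)/lam.
    Returns a boolean mask of features that can be set to zero.
    """
    resid = y - X @ w
    corr = X.T @ resid
    # rescale the residual into the dual feasible set {theta : |X^T theta| <= 1}
    theta = resid / max(lam, np.abs(corr).max())
    primal = 0.5 * resid @ resid + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)
    radius = np.sqrt(2 * gap) / lam
    col_norms = np.linalg.norm(X, axis=0)
    return np.abs(X.T @ theta) + radius * col_norms < 1.0
```

As the solver converges the gap shrinks, the safe sphere's diameter goes to zero, and more features are eliminated, which is why the rule works with any iterative solver.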
Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression
In high dimensional settings, sparse structures are crucial for efficiency,
both in terms of memory, computation, and performance. It is customary to
consider an ℓ1 penalty to enforce sparsity in such scenarios. Sparsity
enforcing methods, the Lasso being a canonical example, are popular candidates
to address high dimension. For efficiency, they rely on tuning a parameter
trading data fitting versus sparsity. For the Lasso theory to hold this tuning
parameter should be proportional to the noise level, yet the latter is often
unknown in practice. A possible remedy is to jointly optimize over the
regression parameter as well as over the noise level. This has been considered
under several names in the literature: Scaled-Lasso, Square-root Lasso,
Concomitant Lasso estimation for instance, and could be of interest for
confidence sets or uncertainty quantification. In this work, after illustrating
numerical difficulties for the Concomitant Lasso formulation, we
propose a modification we coined the Smoothed Concomitant Lasso, aimed at
increasing numerical stability. We propose an efficient and accurate solver
leading to a computational cost no more expensive than that of the Lasso.
We leverage standard ingredients behind the success of fast Lasso solvers: a
coordinate descent algorithm combined with safe screening rules, achieving
speed by eliminating irrelevant features early.
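The joint optimization over the regression parameter and the noise level can be sketched by alternating minimization. This is an illustrative reading of the smoothed formulation under assumed notation (objective ||y - Xw||^2 / (2 n sigma) + sigma/2 + lam ||w||_1 with the smoothing constraint sigma >= sigma0), without the screening rules:

```python
import numpy as np

def smoothed_concomitant_lasso(X, y, lam, sigma0=1e-2, n_iter=100):
    """Alternating-minimization sketch of joint (w, sigma) estimation.

    For fixed sigma, the w-step is a Lasso with effective penalty
    lam * n * sigma, solved by coordinate descent with soft-thresholding.
    For fixed w, the sigma-step is closed form: ||y - Xw|| / sqrt(n),
    clamped below at sigma0 (the smoothing that stabilizes the problem).
    """
    n, p = X.shape
    w = np.zeros(p)
    resid = y.copy()                      # maintained as y - Xw
    col_sq = (X ** 2).sum(axis=0)
    sigma = max(np.linalg.norm(y) / np.sqrt(n), sigma0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = w[j]
            rho = X[:, j] @ resid + col_sq[j] * old
            thr = lam * n * sigma         # effective soft-threshold level
            w[j] = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]
            if w[j] != old:
                resid -= X[:, j] * (w[j] - old)
        sigma = max(np.linalg.norm(resid) / np.sqrt(n), sigma0)
    return w, sigma
```

Because the sigma-step rescales the penalty, the tuning parameter `lam` no longer needs to be proportional to the unknown noise level, which is the practical point of the concomitant family.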