Sparse Support Vector Infinite Push
In this paper, we address the problem of embedded feature selection for
ranking on top of the list problems. We pose this problem as a regularized
empirical risk minimization with the ℓ∞-norm push loss function (the infinite push loss) and
sparsity-inducing regularizers. We address the difficulties raised by this
challenging optimization problem by considering an alternating direction method
of multipliers (ADMM) algorithm built upon the proximal operators of the loss
function and the regularizer. Our main technical contribution is thus to
provide a numerical scheme for computing the infinite push loss function
proximal operator. Experimental results on toy, DNA microarray and BCI problems
show how our novel algorithm compares favorably to competitors for ranking on
top while using fewer variables in the scoring function.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
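A minimal illustrative sketch of the two ingredients the abstract refers to, not the paper's actual numerical scheme: a hinge surrogate of the infinite push loss (pairwise hinge averaged over positives, maximized over negatives) and the soft-thresholding proximal operator of an ℓ1 sparsity regularizer, as they could appear inside an ADMM loop. The function names and the choice of ℓ1 are assumptions made for illustration.

    import numpy as np

    def infinite_push_hinge_loss(w, X_pos, X_neg):
        """Hinge surrogate of the infinite push loss: for each negative example,
        average the hinge penalties over all positives, then take the max over negatives."""
        s_pos = X_pos @ w                      # scores of positive examples
        s_neg = X_neg @ w                      # scores of negative examples
        # pairwise hinge: penalize positives that are not ranked above a given negative
        hinge = np.maximum(0.0, 1.0 - (s_pos[:, None] - s_neg[None, :]))
        return np.max(hinge.mean(axis=0))      # the worst negative drives the loss

    def prox_l1(v, t):
        """Proximal operator of t * ||.||_1 (soft thresholding), the usual prox of
        an l1 sparsity-inducing regularizer inside an ADMM scheme."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)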
Histogram of gradients of Time-Frequency Representations for Audio scene detection
This paper addresses the problem of audio scene classification and
contributes to the state of the art by proposing a novel feature. We build this
feature by considering the histogram of gradients (HOG) of a time-frequency
representation of an audio scene. Contrary to classical audio features like
MFCC, we make the hypothesis that histograms of gradients are able to encode
relevant information in a time-frequency representation: namely, the
local direction of variation (in time and frequency) of the signal spectral
power. In addition, in order to gain more invariance and robustness, histograms
of gradients are locally pooled. We have evaluated the relevance of the novel
feature by comparing its performance with state-of-the-art competitors on
several datasets, including a novel one that we provide as part of our
contribution. This dataset, which we make publicly available, involves
classes and contains about minutes of audio scene recording. We thus
believe that it may become the next standard dataset for evaluating audio scene
classification algorithms. Our comparison results clearly show that our
HOG-based features outperform their competitors.
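A rough sketch of the feature construction under simple assumptions (a log power spectrogram as the time-frequency representation and the off-the-shelf HOG of scikit-image, whose cells and blocks provide the local pooling); the paper's exact representation, gradient binning, and pooling scheme may differ.

    import numpy as np
    from scipy.signal import spectrogram
    from skimage.feature import hog

    def hog_of_tf(signal, fs=22050):
        # time-frequency representation (log power spectrogram)
        _, _, Sxx = spectrogram(signal, fs=fs, nperseg=1024, noverlap=512)
        tf = np.log(Sxx + 1e-10)
        # histogram of oriented gradients; cells/blocks perform the local pooling
        return hog(tf, orientations=8, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm='L2-Hys')

    # Example: features = hog_of_tf(np.random.randn(22050 * 5))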
DC Proximal Newton for Non-Convex Optimization Problems
We introduce a novel algorithm for solving learning problems where both the
loss function and the regularizer are non-convex but belong to the class of
difference of convex (DC) functions. Our contribution is a new general purpose
proximal Newton algorithm that is able to deal with such a situation. The
algorithm consists in obtaining a descent direction from an approximation of
the loss function and then in performing a line search to ensure sufficient
descent. A theoretical analysis is provided showing that the iterates of the
proposed algorithm admit as limit points stationary points of the DC
objective function. Numerical experiments show that our approach is more
efficient than the current state of the art for a problem with a convex loss
function and a non-convex regularizer. We have also illustrated the benefit of
our algorithm in a high-dimensional transductive learning problem where both the loss
function and the regularizer are non-convex.
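A schematic sketch of one iteration of a proximal Newton step for a DC objective f(w) = g(w) - h(w) with g and h convex, where the concave part -h is linearized at the current iterate, a diagonal Hessian approximation defines the metric, and a backtracking line search enforces sufficient descent. All callables are hypothetical placeholders; this is not the paper's exact scheme.

    import numpy as np

    def dc_prox_newton_step(w, grad_g, grad_h, hess_diag, prox_reg, f_obj,
                            beta=0.5, max_ls=20):
        H = hess_diag(w)                                  # diagonal metric (positive entries)
        grad = grad_g(w) - grad_h(w)                      # gradient of the DC surrogate
        d = prox_reg(w - grad / H, 1.0 / H) - w           # scaled proximal step gives a direction
        t, f0 = 1.0, f_obj(w)
        for _ in range(max_ls):                           # backtracking line search
            if f_obj(w + t * d) <= f0 - 1e-4 * t * np.dot(H * d, d):
                break                                     # sufficient descent reached
            t *= beta
        return w + t * d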
Large margin filtering for signal sequence labeling
Signal Sequence Labeling consists in predicting a sequence of labels given an
observed sequence of samples. A naive way is to filter the signal in order to
reduce the noise and to apply a classification algorithm on the filtered
samples. We propose in this paper to jointly learn the filter with the
classifier, leading to a large margin filtering for classification. This method
makes it possible to learn the optimal cutoff frequency and phase of the filter, which may
be different from zero. Two methods are proposed and tested on a toy dataset
and on a real-life BCI dataset from BCI Competition III.
Comment: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010, Dallas, United States (2010).
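A toy sketch of the joint filter-plus-classifier idea, written as an assumed stand-in for the paper's large margin formulation: a per-channel FIR filter and a linear classifier on the filtered samples, trained together with a multiclass hinge (margin) loss in PyTorch.

    import torch
    import torch.nn as nn

    class FilteredClassifier(nn.Module):
        def __init__(self, n_channels, filter_len, n_classes):
            super().__init__()
            # one FIR filter per channel, learned jointly with the classifier
            self.filt = nn.Conv1d(n_channels, n_channels, filter_len,
                                  padding=filter_len // 2, groups=n_channels, bias=False)
            self.clf = nn.Linear(n_channels, n_classes)

        def forward(self, x):                              # x: (batch, channels, time)
            return self.clf(self.filt(x).transpose(1, 2)) # (batch, time, classes)

    model = FilteredClassifier(n_channels=4, filter_len=11, n_classes=3)
    loss_fn = nn.MultiMarginLoss()                         # large-margin (hinge) criterion
    x = torch.randn(2, 4, 100)
    y = torch.randint(0, 3, (2, 100))
    scores = model(x).reshape(-1, 3)
    loss = loss_fn(scores, y.reshape(-1))
    loss.backward()                                        # gradients flow to filter and classifier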
Generalized conditional gradient: analysis of convergence and applications
The objective of this technical report is to provide additional results on
the generalized conditional gradient methods introduced by Bredies et al.
[BLM05]. Indeed, when the objective function is smooth, we provide a novel
certificate of optimality and we show that the algorithm has a linear
convergence rate. Applications of this algorithm are also discussed.
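A minimal sketch of a conditional gradient (Frank-Wolfe) iteration on an illustrative instance, least squares over an ℓ1 ball of radius tau; in the generalized variant studied in the report, the linear minimization oracle is replaced by a subproblem that also involves the non-smooth term.

    import numpy as np

    def conditional_gradient(A, b, tau, n_iter=200):
        x = np.zeros(A.shape[1])
        for k in range(n_iter):
            grad = A.T @ (A @ x - b)                 # gradient of 0.5 * ||Ax - b||^2
            i = np.argmax(np.abs(grad))              # linear minimization oracle on the l1 ball
            s = np.zeros_like(x)
            s[i] = -tau * np.sign(grad[i])
            gamma = 2.0 / (k + 2.0)                  # classical step size
            x = (1 - gamma) * x + gamma * s          # convex combination update
        return x

    # Example: x_hat = conditional_gradient(np.random.randn(50, 100), np.random.randn(50), tau=5.0)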
Importance sampling strategy for non-convex randomized block-coordinate descent
As the number of samples and the dimensionality of optimization problems related
to statistics and machine learning explode, block coordinate descent algorithms
have gained popularity since they reduce the original problem to several
smaller ones. Coordinates to be optimized are usually selected randomly
according to a given probability distribution. We introduce an importance
sampling strategy that helps randomized coordinate descent algorithms to focus
on blocks that are still far from convergence. The framework applies to
problems composed of the sum of two possibly non-convex terms, one being
separable and non-smooth. We have compared our algorithm to a full gradient
proximal approach as well as to a randomized block coordinate algorithm that
considers uniform sampling, and to cyclic block coordinate descent. Experimental
evidence shows the clear benefit of using an importance sampling strategy.
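An illustrative sketch of one way to bias block sampling towards blocks that are still far from convergence, drawing each block with probability proportional to the size of its last proximal update, on an ℓ1-regularized least squares instance; the paper's actual sampling criterion and its non-convex setting may differ.

    import numpy as np

    def importance_sampled_bcd(A, b, lam, block_size=10, n_iter=500, step=None):
        n = A.shape[1]
        blocks = [np.arange(i, min(i + block_size, n)) for i in range(0, n, block_size)]
        if step is None:
            step = 1.0 / np.linalg.norm(A, 2) ** 2           # conservative step size
        x = np.zeros(n)
        scores = np.ones(len(blocks))                        # initial uniform importance
        for _ in range(n_iter):
            p = scores / scores.sum()
            j = np.random.choice(len(blocks), p=p)           # importance-sampled block
            idx = blocks[j]
            grad = A[:, idx].T @ (A @ x - b)
            z = x[idx] - step * grad
            new = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # prox of l1 on the block
            scores[j] = np.linalg.norm(new - x[idx]) + 1e-12 # update this block's importance
            x[idx] = new
        return x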
Differentially Private Sliced Wasserstein Distance
Developing machine learning methods that are privacy preserving is today a
central topic of research, with huge practical impacts. Among the numerous ways
to address privacy-preserving learning, we here take the perspective of
computing the divergences between distributions under the Differential Privacy
(DP) framework -- being able to compute divergences between distributions is
pivotal for many machine learning problems, such as learning generative models
or domain adaptation problems. Instead of resorting to the popular
gradient-based sanitization method for DP, we tackle the problem at its roots
by focusing on the Sliced Wasserstein Distance and seamlessly making it
differentially private. Our main contribution is as follows: we analyze the
property of adding a Gaussian perturbation to the intrinsic randomized
mechanism of the Sliced Wasserstein Distance, and we establish the
sensitivity of the resulting differentially private mechanism. One of our
important findings is that this DP mechanism transforms the Sliced Wasserstein
distance into another distance, which we call the Smoothed Sliced Wasserstein
Distance. This new differentially private distribution distance can be plugged
into generative models and domain adaptation algorithms in a transparent way,
and we empirically show that it yields highly competitive performance compared
with gradient-based DP approaches from the literature, with almost no loss in
accuracy for the domain adaptation problems that we consider.
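A rough sketch of the mechanism described above: project both samples on random directions (the intrinsic randomness of the Sliced Wasserstein Distance), perturb the projections with Gaussian noise, and average the resulting one-dimensional distances. The noise scale sigma would have to be calibrated from the paper's sensitivity analysis; here it is a free parameter, and equal sample sizes are assumed.

    import numpy as np

    def smoothed_sliced_wasserstein(X, Y, n_proj=50, sigma=0.1, rng=None):
        rng = np.random.default_rng(rng)
        theta = rng.normal(size=(n_proj, X.shape[1]))
        theta /= np.linalg.norm(theta, axis=1, keepdims=True)      # directions on the unit sphere
        sw = 0.0
        for t in theta:
            px = np.sort(X @ t + sigma * rng.normal(size=len(X)))  # Gaussian-perturbed projections
            py = np.sort(Y @ t + sigma * rng.normal(size=len(Y)))
            sw += np.mean((px - py) ** 2)                          # squared 1D Wasserstein-2 (equal sizes)
        return np.sqrt(sw / n_proj)

    # Example (equal sample sizes assumed):
    # d = smoothed_sliced_wasserstein(np.random.randn(200, 5), np.random.randn(200, 5) + 1.0)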