5,953 research outputs found
Efficient Elastic Net Regularization for Sparse Linear Models
This paper presents an algorithm for efficient training of sparse linear
models with elastic net regularization. Extending previous work on delayed
updates, the new algorithm applies stochastic gradient updates to non-zero
features only, bringing weights current as needed with closed-form updates.
Closed-form delayed updates for the , , and rarely used
regularizers have been described previously. This paper provides
closed-form updates for the popular squared norm and elastic net
regularizers.
We provide dynamic programming algorithms that perform each delayed update in
constant time. The new and elastic net methods handle both fixed and
varying learning rates, and both standard {stochastic gradient descent} (SGD)
and {forward backward splitting (FoBoS)}. Experimental results show that on a
bag-of-words dataset with features, but only nonzero features on
average per training example, the dynamic programming method trains a logistic
regression classifier with elastic net regularization over times faster
than otherwise
A Unifying View of Multiple Kernel Learning
Recent research on multiple kernel learning has lead to a number of
approaches for combining kernels in regularized risk minimization. The proposed
approaches include different formulations of objectives and varying
regularization strategies. In this paper we present a unifying general
optimization criterion for multiple kernel learning and show how existing
formulations are subsumed as special cases. We also derive the criterion's dual
representation, which is suitable for general smooth optimization algorithms.
Finally, we evaluate multiple kernel learning in this framework analytically
using a Rademacher complexity bound on the generalization error and empirically
in a set of experiments
Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling
Conditional Random Fields (CRFs) constitute a popular and efficient approach
for supervised sequence labelling. CRFs can cope with large description spaces
and can integrate some form of structural dependency between labels. In this
contribution, we address the issue of efficient feature selection for CRFs
based on imposing sparsity through an L1 penalty. We first show how sparsity of
the parameter set can be exploited to significantly speed up training and
labelling. We then introduce coordinate descent parameter update schemes for
CRFs with L1 regularization. We finally provide some empirical comparisons of
the proposed approach with state-of-the-art CRF training strategies. In
particular, it is shown that the proposed approach is able to take profit of
the sparsity to speed up processing and hence potentially handle larger
dimensional models
Localized Lasso for High-Dimensional Regression
We introduce the localized Lasso, which is suited for learning models that
are both interpretable and have a high predictive power in problems with high
dimensionality and small sample size . More specifically, we consider a
function defined by local sparse models, one at each data point. We introduce
sample-wise network regularization to borrow strength across the models, and
sample-wise exclusive group sparsity (a.k.a., norm) to introduce
diversity into the choice of feature sets in the local models. The local models
are interpretable in terms of similarity of their sparsity patterns. The cost
function is convex, and thus has a globally optimal solution. Moreover, we
propose a simple yet efficient iterative least-squares based optimization
procedure for the localized Lasso, which does not need a tuning parameter, and
is guaranteed to converge to a globally optimal solution. The solution is
empirically shown to outperform alternatives for both simulated and genomic
personalized medicine data
A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing
The past years have witnessed many dedicated open-source projects that built
and maintain implementations of Support Vector Machines (SVM), parallelized for
GPU, multi-core CPUs and distributed systems. Up to this point, no comparable
effort has been made to parallelize the Elastic Net, despite its popularity in
many high impact applications, including genetics, neuroscience and systems
biology. The first contribution in this paper is of theoretical nature. We
establish a tight link between two seemingly different algorithms and prove
that Elastic Net regression can be reduced to SVM with squared hinge loss
classification. Our second contribution is to derive a practical algorithm
based on this reduction. The reduction enables us to utilize prior efforts in
speeding up and parallelizing SVMs to obtain a highly optimized and parallel
solver for the Elastic Net and Lasso. With a simple wrapper, consisting of only
11 lines of MATLAB code, we obtain an Elastic Net implementation that naturally
utilizes GPU and multi-core CPUs. We demonstrate on twelve real world data
sets, that our algorithm yields identical results as the popular (and highly
optimized) glmnet implementation but is one or several orders of magnitude
faster.Comment: 10 page
Sparsity-accuracy trade-off in MKL
We empirically investigate the best trade-off between sparse and
uniformly-weighted multiple kernel learning (MKL) using the elastic-net
regularization on real and simulated datasets. We find that the best trade-off
parameter depends not only on the sparsity of the true kernel-weight spectrum
but also on the linear dependence among kernels and the number of samples.Comment: 8pages, 2 figure
Task-Driven Dictionary Learning
Modeling data with linear combinations of a few elements from a learned
dictionary has been the focus of much recent research in machine learning,
neuroscience and signal processing. For signals such as natural images that
admit such sparse representations, it is now well established that these models
are well suited to restoration tasks. In this context, learning the dictionary
amounts to solving a large-scale matrix factorization problem, which can be
done efficiently with classical optimization tools. The same approach has also
been used for learning features from data for other purposes, e.g., image
classification, but tuning the dictionary in a supervised way for these tasks
has proven to be more difficult. In this paper, we present a general
formulation for supervised dictionary learning adapted to a wide variety of
tasks, and present an efficient algorithm for solving the corresponding
optimization problem. Experiments on handwritten digit classification, digital
art identification, nonlinear inverse image problems, and compressed sensing
demonstrate that our approach is effective in large-scale settings, and is well
suited to supervised and semi-supervised classification, as well as regression
tasks for data that admit sparse representations.Comment: final draft post-refereein
- …