Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection, but numerous extensions have now emerged, such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
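As a concrete illustration of the proximal methods surveyed above, the following is a minimal sketch of the iterative soft-thresholding algorithm (ISTA) for the $\ell_1$-regularized least-squares (lasso) problem. The step size, penalty weight, and synthetic data are illustrative assumptions, not taken from the paper.

    import numpy as np

    def soft_threshold(v, t):
        # Proximal operator of t * ||.||_1: shrink each coordinate toward zero.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista(A, b, lam, n_iters=500):
        # Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by proximal gradient descent.
        x = np.zeros(A.shape[1])
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant of the gradient
        for _ in range(n_iters):
            grad = A.T @ (A @ x - b)             # gradient of the smooth part
            x = soft_threshold(x - step * grad, step * lam)  # proximal step
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 50))
    x_true = np.zeros(50); x_true[:5] = 1.0      # sparse ground truth
    b = A @ x_true + 0.01 * rng.standard_normal(100)
    print(np.nonzero(np.abs(ista(A, b, lam=0.5)) > 1e-6)[0])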
GENO -- GENeric Optimization for Classical Machine Learning
Although optimization is the longstanding algorithmic backbone of machine
learning, new models still require the time-consuming implementation of new
solvers. As a result, there are thousands of implementations of optimization
algorithms for machine learning problems. A natural question is whether it is
always necessary to implement a new solver, or whether a single algorithm is
sufficient for most models. Common belief suggests that such a
one-algorithm-fits-all approach cannot work, because such an algorithm cannot
exploit model-specific structure and thus cannot be efficient and robust on a
wide variety of problems. Here, we challenge this common belief. We have
designed and implemented the optimization framework GENO (GENeric Optimization)
that combines a modeling language with a generic solver. GENO generates a
solver from the declarative specification of an optimization problem class. The
framework is flexible enough to encompass most of the classical machine
learning problems. We show on a wide variety of classical but also some
recently suggested problems that the automatically generated solvers are (1) as
efficient as well-engineered specialized solvers, (2) more efficient by a
decent margin than recent state-of-the-art solvers, and (3) orders of magnitude
more efficient than classical modeling language plus solver approaches.
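To make the one-solver-for-many-models question concrete, the following is a minimal sketch in which a single generic quasi-Newton solver (SciPy's L-BFGS-B, standing in here for GENO's generated solvers; this is not GENO's actual API) handles two different models given only their objective functions. The objectives, data, and regularization constants are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def solve(objective, dim):
        # One generic solver for any smooth objective:
        # L-BFGS-B with finite-difference gradients.
        return minimize(objective, np.zeros(dim), method="L-BFGS-B").x

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = np.sign(X @ rng.standard_normal(5))

    # Model 1: ridge regression, specified only by its objective.
    ridge = lambda w: 0.5 * np.sum((X @ w - y) ** 2) + 0.1 * np.sum(w ** 2)

    # Model 2: regularized logistic regression, handled by the same solver.
    logreg = lambda w: np.sum(np.logaddexp(0.0, -y * (X @ w))) + 0.1 * np.sum(w ** 2)

    print(solve(ridge, 5))
    print(solve(logreg, 5))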
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Subsequence clustering of multivariate time series is a useful tool for
discovering repeated patterns in temporal data. Once these patterns have been
discovered, seemingly complicated datasets can be interpreted as a temporal
sequence of only a small number of states, or clusters. For example, raw sensor
data from a fitness-tracking application can be expressed as a timeline of a
select few actions (e.g., walking, sitting, running). However, discovering
these patterns is challenging because it requires simultaneous segmentation and
clustering of the time series. Furthermore, interpreting the resulting clusters
is difficult, especially when the data is high-dimensional. Here we propose a
new method of model-based clustering, which we call Toeplitz Inverse
Covariance-based Clustering (TICC). Each cluster in the TICC method is defined
by a correlation network, or Markov random field (MRF), characterizing the
interdependencies between different observations in a typical subsequence of
that cluster. Based on this graphical representation, TICC simultaneously
segments and clusters the time series data. We solve the TICC problem through
alternating minimization, using a variation of the expectation maximization
(EM) algorithm. We derive closed-form solutions to efficiently solve the two
resulting subproblems in a scalable way, through dynamic programming and the
alternating direction method of multipliers (ADMM), respectively. We validate
our approach by comparing TICC to several state-of-the-art baselines in a
series of synthetic experiments, and we then demonstrate on an automobile
sensor dataset how TICC can be used to learn interpretable clusters in
real-world scenarios.
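The dynamic-programming subproblem in TICC assigns each time point to a cluster while penalizing switches between consecutive points. The following is a minimal Viterbi-style sketch of that assignment step, assuming the per-point negative log-likelihoods under each cluster's MRF have already been computed; the cost matrix and the switching penalty beta are illustrative assumptions.

    import numpy as np

    def assign_clusters(nll, beta):
        # nll[t, k]: negative log-likelihood of point t under cluster k.
        # beta: penalty added whenever consecutive points switch clusters.
        T, K = nll.shape
        cost = nll[0].copy()
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            # Staying in cluster k costs cost[k]; switching costs min(cost) + beta.
            stay, switch = cost, cost.min() + beta
            back[t] = np.where(stay <= switch, np.arange(K), cost.argmin())
            cost = np.minimum(stay, switch) + nll[t]
        # Trace back the minimum-cost path of assignments.
        path = np.empty(T, dtype=int)
        path[-1] = cost.argmin()
        for t in range(T - 1, 0, -1):
            path[t - 1] = back[t, path[t]]
        return path

    nll = np.abs(np.random.default_rng(0).standard_normal((10, 3)))
    print(assign_clusters(nll, beta=0.4))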
Scaling Algorithms for Unbalanced Transport Problems
This article introduces a new class of fast algorithms to approximate
variational problems involving unbalanced optimal transport. While classical
optimal transport considers only normalized probability distributions, it is
important for many applications to be able to compute some sort of relaxed
transportation between arbitrary positive measures. A generic class of such
"unbalanced" optimal transport problems has been recently proposed by several
authors. In this paper, we show how to extend the, now classical, entropic
regularization scheme to these unbalanced problems. This gives rise to fast,
highly parallelizable algorithms that operate by performing only diagonal
scaling (i.e. pointwise multiplications) of the transportation couplings. They
are generalizations of the celebrated Sinkhorn algorithm. We show how these
methods can be used to solve unbalanced transport, unbalanced gradient flows,
and to compute unbalanced barycenters. We showcase applications to 2-D shape
modification, color transfer, and growth models.
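In the common case where the marginal constraints are relaxed by Kullback-Leibler penalties, the diagonal-scaling iterations described above reduce to a simple generalization of the Sinkhorn updates. The following is a minimal numpy sketch; the cost matrix, regularization strength eps, marginal-relaxation weight rho, and test measures are illustrative assumptions.

    import numpy as np

    def unbalanced_sinkhorn(mu, nu, C, eps=0.05, rho=1.0, n_iters=500):
        # Entropic unbalanced OT: marginal constraints relaxed by KL penalties
        # of weight rho. Each iteration only rescales rows/columns of K.
        K = np.exp(-C / eps)
        a, b = np.ones_like(mu), np.ones_like(nu)
        p = rho / (rho + eps)              # exponent from the KL "proxdiv" operator
        for _ in range(n_iters):
            a = (mu / (K @ b)) ** p        # diagonal scaling of the rows
            b = (nu / (K.T @ a)) ** p      # diagonal scaling of the columns
        return a[:, None] * K * b[None, :] # approximate transport coupling

    x = np.linspace(0.0, 1.0, 50)
    C = (x[:, None] - x[None, :]) ** 2                         # squared-distance cost
    mu = np.exp(-(x - 0.3) ** 2 / 0.01); mu /= mu.sum()        # total mass 1.0
    nu = np.exp(-(x - 0.7) ** 2 / 0.01); nu *= 0.7 / nu.sum()  # total mass 0.7
    P = unbalanced_sinkhorn(mu, nu, C)
    print(P.sum())   # transported mass lies between the two totals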
Supervised classification and mathematical optimization
Data Mining techniques often call for solving optimization problems. Supervised Classification, and in particular Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful to address important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers, or dealing with vagueness/noise in the data.
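As one concrete example of the optimization problems behind these classifiers, the following is a minimal sketch of training a soft-margin linear SVM by subgradient descent on its regularized hinge-loss formulation (a Pegasos-style scheme); the data, regularization constant, and step-size schedule are illustrative assumptions.

    import numpy as np

    def train_svm(X, y, lam=0.1, n_iters=1000):
        # Minimize lam/2*||w||^2 + mean(max(0, 1 - y_i*<w, x_i>)) by subgradient descent.
        w = np.zeros(X.shape[1])
        for t in range(1, n_iters + 1):
            margins = y * (X @ w)
            active = margins < 1                      # points violating the margin
            grad = lam * w - (y[active] @ X[active]) / len(y)
            w -= grad / (lam * t)                     # decreasing step size
        return w

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(1, 1, (50, 2)), rng.normal(-1, 1, (50, 2))])
    y = np.hstack([np.ones(50), -np.ones(50)])
    w = train_svm(X, y)
    print(np.mean(np.sign(X @ w) == y))   # training accuracy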
Analysis of the Frank-Wolfe Method for Convex Composite Optimization involving a Logarithmically-Homogeneous Barrier
We present and analyze a new generalized Frank-Wolfe method for the composite
optimization problem $(P)\colon\ \min_x\, f(\mathsf{A}x) + h(x)$,
where $f$ is a $\theta$-logarithmically-homogeneous self-concordant barrier,
$\mathsf{A}$ is a linear operator and the function $h$ has bounded domain but
is possibly non-smooth. We show that our generalized Frank-Wolfe method
requires $O\big((\delta_0 + \theta + R_h)\ln(\delta_0) + (\theta + R_h)^2/\varepsilon\big)$ iterations to produce an $\varepsilon$-approximate
solution, where $\delta_0$ denotes the initial optimality gap and $R_h$ is the
variation of $h$ on its domain. This result establishes certain intrinsic
connections between $\theta$-logarithmically-homogeneous barriers and the
Frank-Wolfe method. When specialized to the $D$-optimal design problem, we
essentially recover the complexity obtained by Khachiyan using the Frank-Wolfe
method with exact line-search. We also study the (Fenchel) dual problem of
$(P)$, and we show that our new method is equivalent to an adaptive-step-size
mirror descent method applied to the dual problem. This enables us to provide
iteration complexity bounds for the mirror descent method even though
the dual objective function is non-Lipschitz and has unbounded domain. In
addition, we present computational experiments that point to the potential
usefulness of our generalized Frank-Wolfe method on Poisson image de-blurring
problems with TV regularization, and on simulated PET problem instances.
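For intuition about the Frank-Wolfe method itself, the following is a minimal sketch of its classical form on the $D$-optimal design problem, i.e., maximizing $\log\det$ of the weighted moment matrix over the probability simplex, where the linear minimization oracle is just a coordinate pick. This is the classical method with the standard step-size schedule, not the paper's generalized variant; the data and iteration count are illustrative assumptions.

    import numpy as np

    def frank_wolfe_d_optimal(X, n_iters=200):
        # Maximize log det(sum_i w_i x_i x_i^T) over the probability simplex.
        n, d = X.shape
        w = np.full(n, 1.0 / n)
        for t in range(n_iters):
            M = X.T @ (w[:, None] * X)                # weighted moment matrix
            # Gradient coordinates: x_i^T M^{-1} x_i.
            g = np.einsum("ij,jk,ik->i", X, np.linalg.inv(M), X)
            i = g.argmax()                            # LMO over the simplex: best vertex
            gamma = 2.0 / (t + 2)                     # classical step-size schedule
            w = (1 - gamma) * w
            w[i] += gamma                             # move toward vertex e_i
        return w

    rng = np.random.default_rng(0)
    w = frank_wolfe_d_optimal(rng.standard_normal((100, 5)))
    print(w.max(), (w > 1e-3).sum())   # a few design points carry most of the weight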