Convex Optimization without Projection Steps
For the general problem of minimizing a convex function over a compact convex
domain, we will investigate a simple iterative approximation algorithm based on
the method by Frank & Wolfe 1956, that does not need projection steps in order
to stay inside the optimization domain. Instead of a projection step, the
linearized problem defined by a current subgradient is solved, which gives a
step direction that will naturally stay in the domain. Our framework
generalizes the sparse greedy algorithm of Frank & Wolfe and its primal-dual
analysis by Clarkson 2010 (and the low-rank SDP approach by Hazan 2008) to
arbitrary convex domains. We give a convergence proof guaranteeing
{\epsilon}-small duality gap after O(1/{\epsilon}) iterations.
The method allows us to understand the sparsity of approximate solutions for
any l1-regularized convex optimization problem (and for optimization over the
simplex), expressed as a function of the approximation quality. We obtain
matching upper and lower bounds of {\Theta}(1/{\epsilon}) on the sparsity for
l1-problems. The same bounds apply to low-rank semidefinite optimization with
bounded trace, showing that rank O(1/{\epsilon}) is best possible here as well.
As another application, we obtain sparse matrices of O(1/{\epsilon}) non-zero
entries as {\epsilon}-approximate solutions when optimizing any convex function
over a class of diagonally dominant symmetric matrices.
We show that our proposed first-order method also applies to nuclear norm and
max-norm matrix optimization problems. For nuclear norm regularized
optimization, such as matrix completion and low-rank recovery, we demonstrate
the practical efficiency and scalability of our algorithm for large matrix
problems, such as the Netflix dataset. For general convex optimization over
bounded matrix max-norm, our algorithm is the first with a convergence
guarantee, to the best of our knowledge.
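As a concrete illustration (not the paper's code), here is a minimal Frank-Wolfe sketch in Python for minimizing a smooth convex function over the probability simplex; the quadratic objective and the standard 2/(k+2) step size are illustrative assumptions:

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, iters=1000, tol=1e-6):
    """Projection-free minimization of a smooth convex f over the simplex.

    Each step solves the linearized problem min_{s in domain} <grad f(x), s>;
    over the simplex the minimizer is a vertex (a coordinate vector), so the
    convex-combination update stays in the domain without any projection.
    """
    x = x0.copy()
    for k in range(iters):
        g = grad(x)
        i = np.argmin(g)                    # best vertex of the simplex
        s = np.zeros_like(x); s[i] = 1.0
        if g @ (x - s) < tol:               # Frank-Wolfe duality gap
            break
        x = (1 - 2.0 / (k + 2.0)) * x + (2.0 / (k + 2.0)) * s
    return x

# Illustrative use: minimize ||Ax - b||^2 over the simplex, starting at a vertex.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x0 = np.zeros(5); x0[0] = 1.0
x = frank_wolfe_simplex(lambda x: 2 * A.T @ (A @ x - b), x0)
```

Starting from a vertex, the iterate after k steps is a convex combination of at most k+1 vertices, i.e., it has at most k+1 non-zero entries, which is the sparsity-versus-accuracy trade-off quantified by the {\Theta}(1/{\epsilon}) bounds above.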
A Constrained Matrix-Variate Gaussian Process for Transposable Data
Transposable data represent interactions between two sets of entities and are
typically arranged as a matrix containing the known interaction values.
Additional side information may consist of feature vectors specific to entities
corresponding to the rows and/or columns of such a matrix. Further information
may also be available in the form of interactions or hierarchies among entities
along the same mode (axis). We propose a novel approach for modeling
transposable data with missing interactions given additional side information.
The interactions are modeled as noisy observations from a latent noise-free
matrix generated from a matrix-variate Gaussian process. The construction of
row and column covariances using side information provides a flexible mechanism
for specifying a priori knowledge of the row and column correlations in the
data. Further, the use of such a prior combined with the side information
enables predictions for new rows and columns not observed in the training data.
In this work, we combine the matrix-variate Gaussian process model with
low-rank constraints. The constrained Gaussian process approach is applied to the
prediction of hidden associations between genes and diseases using a small set
of observed associations as well as prior covariances induced by gene-gene
interaction networks and disease ontologies. The proposed approach is also
applied to recommender systems data which involves predicting the item ratings
of users using known associations as well as prior covariances induced by
social networks. We present experimental results that highlight the performance
of the constrained matrix-variate Gaussian process as compared to
state-of-the-art approaches in each domain.
Comment: 23 pages, preliminary version, accepted for publication in Machine Learning
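To make the prior construction concrete, the following sketch samples a latent matrix from a matrix-variate Gaussian whose row and column covariances are built from side-information features; the RBF kernel, noise level, and all sizes are illustrative assumptions, not the paper's exact modeling choices:

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential covariance between the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(0)
row_feats = rng.standard_normal((30, 4))          # side information for rows
col_feats = rng.standard_normal((20, 3))          # side information for columns

K_r = rbf_kernel(row_feats) + 1e-8 * np.eye(30)   # row covariance
K_c = rbf_kernel(col_feats) + 1e-8 * np.eye(20)   # column covariance

# Z ~ MN(0, K_r, K_c): rows AND columns are correlated a priori, which is
# what lets the model generalize to rows/columns unseen in training.
L_r, L_c = np.linalg.cholesky(K_r), np.linalg.cholesky(K_c)
Z = L_r @ rng.standard_normal((30, 20)) @ L_c.T

# Observed interactions are noisy readings of the latent noise-free matrix.
Y = Z + 0.1 * rng.standard_normal(Z.shape)
```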
Path Following in the Exact Penalty Method of Convex Programming
Classical penalty methods solve a sequence of unconstrained problems that put
greater and greater stress on meeting the constraints. In the limit as the
penalty constant tends to {\infty}, one recovers the constrained solution. In
the exact penalty method, squared penalties are replaced by absolute value
penalties, and the solution is recovered for a finite value of the penalty
constant. In practice, the kinks in the penalty and the unknown magnitude of
the penalty constant prevent wide application of the exact penalty method in
nonlinear programming. In this article, we examine a strategy of path following
consistent with the exact penalty method. Instead of performing optimization at
a single penalty constant, we trace the solution as a continuous function of
the penalty constant. Thus, path following starts at the unconstrained solution
and follows the solution path as the penalty constant increases. In the
process, the solution path hits, slides along, and exits from the various
constraints. For quadratic programming, the solution path is piecewise linear
and takes large jumps from constraint to constraint. For a general convex
program, the solution path is piecewise smooth, and path following operates by
numerically solving an ordinary differential equation segment by segment. Our
diverse applications to a) projection onto a convex set, b) nonnegative least
squares, c) quadratically constrained quadratic programming, d) geometric
programming, and e) semidefinite programming illustrate the mechanics and
potential of path following. The final detour to image denoising demonstrates
the relevance of path following to regularized estimation in inverse problems.
In regularized estimation, one follows the solution path as the penalty
constant decreases from a large value.
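A crude numerical stand-in for this strategy (the paper traces the path exactly by solving an ODE segment by segment; here we simply re-solve an exact penalty problem on an increasing grid of penalty constants with warm starts, on a made-up projection problem):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: project x_uncon onto {x : Ax <= b} via the exact penalty method.
A = np.array([[1.0, 1.0], [-1.0, 2.0]])
b = np.array([1.0, 2.0])
x_uncon = np.array([2.0, 2.0])          # unconstrained minimizer

def exact_penalty(x, rho):
    # Absolute-value penalty: exact for finite rho, but kinked at the constraints.
    violation = np.maximum(A @ x - b, 0.0).sum()
    return np.sum((x - x_uncon) ** 2) + rho * violation

# Start at the unconstrained solution and follow the path as rho increases,
# warm-starting each solve from the previous one.
x = x_uncon.copy()
for rho in [0.1, 0.5, 1.0, 5.0, 20.0]:
    x = minimize(exact_penalty, x, args=(rho,), method="Nelder-Mead").x
    print(f"rho={rho:5.1f}  x={np.round(x, 4)}  "
          f"max violation={max(np.max(A @ x - b), 0.0):.4f}")
```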
Optimal Experimental Design for Constrained Inverse Problems
In this paper, we address the challenging problem of optimal experimental
design (OED) of constrained inverse problems. We consider two OED formulations
that allow reducing the experimental costs by minimizing the number of
measurements. The first formulation assumes a fine discretization of the design
parameter space and uses sparsity promoting regularization to obtain an
efficient design. The second formulation parameterizes the design and seeks
optimal placement for these measurements by solving a small-dimensional
optimization problem. We consider both problems in a Bayes risk as well as an
empirical Bayes risk minimization framework. For the unconstrained inverse
state problem, we exploit the closed-form solution of the inner problem to
efficiently compute derivatives for the outer OED problem. The empirical
formulation does not require an explicit solution of the inverse problem and
therefore allows constraints to be integrated efficiently. A key contribution is an
efficient optimization method for solving the resulting, typically
high-dimensional, bilevel optimization problem using derivative-based methods.
To overcome the lack of differentiability in active set methods for
inequality-constrained problems, we use a relaxed interior point method. To
address the growing computational complexity of empirical Bayes OED, we
parallelize the computation over the training models. Numerical examples and
illustrations from tomographic reconstruction, for various data sets and under
different constraints, demonstrate the impact of constraints on the optimal
design and highlight the importance of OED for constrained problems.
Comment: 19 pages, 8 figures
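As a hedged sketch of the first, sparsity-promoting formulation, consider a linear Gaussian model where the Bayes risk is taken as the trace of the posterior covariance and an l1 penalty prices the candidate measurements; the projected-gradient loop and the problem instance below are illustrative assumptions, not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 10                       # candidate measurements, parameters
A = rng.standard_normal((m, n))      # rows = candidate measurement operators
R = np.eye(n)                        # prior precision of the parameters
alpha = 0.5                          # l1 weight: the price of a measurement

def risk_and_grad(w):
    """Bayes risk tr(M^{-1}) for posterior precision M, plus its gradient."""
    M = A.T @ (w[:, None] * A) + R
    Minv = np.linalg.inv(M)
    B = A @ Minv
    # d/dw_i tr(M^{-1}) = -a_i^T M^{-2} a_i = -||Minv a_i||^2
    return np.trace(Minv), -(B ** 2).sum(axis=1)

# Projected gradient on w >= 0; the l1 term contributes a constant +alpha.
w = np.full(m, 0.1)
for _ in range(500):
    risk, grad = risk_and_grad(w)
    w = np.maximum(w - 0.05 * (grad + alpha), 0.0)

print(f"kept {int((w > 1e-6).sum())} of {m} candidate measurements, "
      f"risk={risk:.3f}")
```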
Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset
Recent research on problem formulations based on decomposition into low-rank
plus sparse matrices provides a suitable framework for separating moving objects
from the background. The most representative problem formulation is the Robust
Principal Component Analysis (RPCA) solved via Principal Component Pursuit
(PCP), which decomposes a data matrix into a low-rank matrix and a sparse matrix.
However, similar robust implicit or explicit decompositions can be made in the
following problem formulations: Robust Non-negative Matrix Factorization
(RNMF), Robust Matrix Completion (RMC), Robust Subspace Recovery (RSR), Robust
Subspace Tracking (RST) and Robust Low-Rank Minimization (RLRM). The main goal
of these similar problem formulations is to obtain, explicitly or implicitly, a
decomposition into a low-rank matrix plus additive matrices. In this context,
this work aims to initiate a rigorous and comprehensive review of the similar
problem formulations in robust subspace learning and tracking based on
decomposition into low-rank plus additive matrices for testing and ranking
existing algorithms for background/foreground separation. For this, we first
provide a preliminary review of the recent developments in the different
problem formulations, which allows us to define a unified view that we call
Decomposition into Low-rank plus Additive Matrices (DLAM). Then, we carefully
examine each method in each robust subspace learning/tracking framework,
covering its decomposition, loss function, optimization problem, and solvers.
Furthermore, we investigate whether incremental algorithms and real-time
implementations can be achieved for background/foreground separation. Finally,
experimental results on a large-scale dataset called Background Models
Challenge (BMC 2012) show the comparative performance of 32 different robust
subspace learning/tracking methods.
Comment: 121 pages, 5 figures, submitted to Computer Science Review. arXiv
admin note: text overlap with arXiv:1312.7167, arXiv:1109.6297,
arXiv:1207.3438, arXiv:1105.2126, arXiv:1404.7592, arXiv:1210.0805,
arXiv:1403.8067 by other authors. Computer Science Review, November 201
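For reference, the most representative formulation above, RPCA solved via PCP, can be sketched in a few lines with the widely used inexact augmented Lagrangian scheme; the parameter defaults below are common choices, not the review's prescription:

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: soft-threshold the spectrum."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca_pcp(M, lam=None, iters=200, tol=1e-7):
    """Robust PCA via Principal Component Pursuit (inexact ALM sketch):
    min ||L||_* + lam * ||S||_1  subject to  M = L + S."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    S = np.zeros_like(M)
    Y = M / max(np.linalg.norm(M, 2), np.abs(M).max() / lam)
    mu, rho = 1.25 / np.linalg.norm(M, 2), 1.5
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)        # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)     # sparse update
        Y += mu * (M - L - S)                    # dual ascent
        mu *= rho
        if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
            break
    return L, S

# Background/foreground toy: rank-1 "background" plus sparse "foreground".
rng = np.random.default_rng(0)
M = np.outer(rng.standard_normal(60), rng.standard_normal(40))
M[rng.random(M.shape) < 0.05] += 10.0
L, S = rpca_pcp(M)
```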
Compressive Conjugate Directions: Linear Theory
We present a powerful and easy-to-implement iterative algorithm for solving
large-scale optimization problems that involve l1/total-variation (TV)
regularization. The method is based on combining the Alternating Direction
Method of Multipliers (ADMM) with a Conjugate Directions technique in a way
that allows reusing conjugate search directions constructed by the algorithm
across multiple iterations of the ADMM. The new method achieves fast
convergence by trading off multiple applications of the modeling operator for
the increased memory requirement of storing previous conjugate directions. We
illustrate the new method with a series of imaging and inversion applications.
Comment: 32 pages, 10 figures
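To show the splitting the method builds on, here is plain ADMM for a small 1D TV-denoising problem; the conjugate-directions reuse that distinguishes the paper's algorithm is deliberately omitted, and the factor-once x-update below is only viable at this toy scale:

```python
import numpy as np

def tv_denoise_admm(b, lam=1.0, rho=1.0, iters=200):
    """1D TV denoising, min 0.5*||x - b||^2 + lam*||D x||_1, via ADMM
    with the splitting z = D x (D = first-difference operator)."""
    n = len(b)
    D = np.diff(np.eye(n), axis=0)               # (n-1) x n differences
    # The x-update solves a fixed SPD system every iteration. Here we
    # factor it once; the paper instead reuses *conjugate directions*
    # across ADMM iterations, which scales to problems where an explicit
    # inverse is out of the question.
    Ainv = np.linalg.inv(np.eye(n) + rho * D.T @ D)
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)
    for _ in range(iters):
        x = Ainv @ (b + rho * D.T @ (z - u))     # quadratic x-subproblem
        v = D @ x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # shrinkage
        u = v - z                                # scaled dual update
    return x

# Piecewise-constant signal plus noise:
rng = np.random.default_rng(0)
signal = np.repeat([0.0, 2.0, -1.0], 50)
x = tv_denoise_admm(signal + 0.3 * rng.standard_normal(150), lam=2.0)
```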
Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications
Nonnegative matrix factorization (NMF) has become a workhorse for signal and
data analytics, triggered by its model parsimony and interpretability. Perhaps
a bit surprisingly, the understanding of its model identifiability---the major
reason behind the interpretability in many applications such as topic mining
and hyperspectral imaging---had been rather limited until recent years.
Beginning from the 2010s, the identifiability research of NMF has progressed
considerably: Many interesting and important results have been discovered by
the signal processing (SP) and machine learning (ML) communities. NMF
identifiability has a great impact on many aspects in practice, such as
ill-posed formulation avoidance and performance-guaranteed algorithm design. On
the other hand, there is no tutorial paper that introduces NMF from an
identifiability viewpoint. In this paper, we aim at filling this gap by
offering a comprehensive and deep tutorial on model identifiability of NMF as
well as the connections to algorithms and applications. This tutorial will help
researchers and graduate students grasp the essence and insights of NMF,
thereby avoiding typical `pitfalls' that are oftentimes due to unidentifiable
NMF formulations. This paper will also help practitioners pick/design suitable
factorization tools for their own problems.
Comment: accepted version, IEEE Signal Processing Magazine; supplementary
materials added. Some minor revisions implemented
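For orientation, the classical Lee-Seung multiplicative updates below are the simplest member of the algorithm families such a tutorial covers; as the identifiability viewpoint stresses, a vanilla factorization like this is identifiable only under extra conditions, so the recovered factors are illustrative:

```python
import numpy as np

def nmf_multiplicative(X, r, iters=500, eps=1e-10):
    """Lee-Seung multiplicative updates for min ||X - W H||_F^2
    with W >= 0 (n x r) and H >= 0 (r x m)."""
    rng = np.random.default_rng(0)
    n, m = X.shape
    W, H = rng.random((n, r)), rng.random((r, m))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # preserves nonnegativity
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Without conditions such as separability or sufficient scattering, the
# factors are recoverable only up to more than a permutation/scaling
# ambiguity -- the unidentifiability pitfall the tutorial warns about.
rng = np.random.default_rng(1)
X = rng.random((50, 5)) @ rng.random((5, 40))   # ground truth of rank 5
W, H = nmf_multiplicative(X, r=5)
print("relative fit error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```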
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Substantial progress has been made recently on developing provably accurate
and efficient algorithms for low-rank matrix factorization via nonconvex
optimization. While conventional wisdom often takes a dim view of nonconvex
optimization algorithms due to their susceptibility to spurious local minima,
simple iterative methods such as gradient descent have been remarkably
successful in practice. The theoretical footings, however, had been largely
lacking until recently.
In this tutorial-style overview, we highlight the important role of
statistical models in enabling efficient nonconvex optimization with
performance guarantees. We review two contrasting approaches: (1) two-stage
algorithms, which consist of a tailored initialization step followed by
successive refinement; and (2) global landscape analysis and
initialization-free algorithms. Several canonical matrix factorization problems
are discussed, including but not limited to matrix sensing, phase retrieval,
matrix completion, blind deconvolution, robust principal component analysis,
phase synchronization, and joint alignment. Special care is taken to illustrate
the key technical insights underlying their analyses. This article serves as a
testament that the integrated consideration of optimization and statistics
leads to fruitful research findings.
Comment: Invited overview article
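A minimal instance of the two-stage recipe, spectral initialization followed by gradient descent on a factored objective, here for symmetric low-rank matrix completion; the problem sizes, step size, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 100, 3, 0.3                      # dimension, rank, sampling rate
Ustar = rng.standard_normal((n, r))
M = Ustar @ Ustar.T                        # planted low-rank PSD matrix
mask = rng.random((n, n)) < p
mask = mask | mask.T                       # symmetric observation pattern
Mobs = np.where(mask, M, 0.0)

# Stage 1: spectral initialization from the rescaled observed matrix.
vals, vecs = np.linalg.eigh(Mobs / p)
U = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))

# Stage 2: gradient descent on the factored (nonconvex) objective
#   f(U) = 0.5 * || P_Omega(U U^T - M) ||_F^2.
eta = 0.2 / np.linalg.norm(U, 2) ** 2      # step size scaled by ||U||_2^2
for _ in range(300):
    R = np.where(mask, U @ U.T - M, 0.0)   # residual on observed entries
    U -= eta * (R + R.T) @ U               # grad f(U) = (R + R^T) U

print("relative error:",
      np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))
```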
Beating level-set methods for 3D seismic data interpolation: a primal-dual alternating approach
Acquisition cost is a crucial bottleneck for seismic workflows, and low-rank
formulations for data interpolation allow practitioners to `fill in' data
volumes from critically subsampled data acquired in the field. The tremendous
size of the seismic data volumes required for processing remains a major
challenge for these techniques.
We propose a new approach for solving residual-constrained formulations for
interpolation. We represent the data volume using matrix factors, and build a
block-coordinate algorithm with constrained convex subproblems that are solved
with a primal-dual splitting scheme. The new approach is competitive with
state-of-the-art level-set algorithms that interchange the roles of objectives
and constraints. We use the new algorithm to successfully interpolate a
large-scale 5D seismic data volume, generated from the geologically complex synthetic 3D
Compass velocity model, where 80% of the data has been removed.
Comment: 16 pages, 7 figures
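The factored, block-coordinate idea in miniature: alternating regularized least-squares updates of the two factors fit the observed entries of a subsampled matrix. This is a simplified stand-in for the paper's residual-constrained convex subproblems with primal-dual splitting, run on a made-up rank-3 "volume" with 80% of entries removed:

```python
import numpy as np

def interpolate_lowrank(Dobs, mask, rank=5, mu=1e-3, iters=50):
    """Alternating least squares on factors L, R to 'fill in' a matrix:
    min ||P_mask(L R^T - D)||_F^2 + mu * (||L||_F^2 + ||R||_F^2)."""
    n, m = Dobs.shape
    rng = np.random.default_rng(0)
    L = rng.standard_normal((n, rank))
    R = rng.standard_normal((m, rank))
    for _ in range(iters):
        # Each row of L (and of R) has its own small ridge subproblem.
        for i in range(n):
            Ri = R[mask[i]]                      # factors of observed columns
            L[i] = np.linalg.solve(Ri.T @ Ri + mu * np.eye(rank),
                                   Ri.T @ Dobs[i, mask[i]])
        for j in range(m):
            Lj = L[mask[:, j]]
            R[j] = np.linalg.solve(Lj.T @ Lj + mu * np.eye(rank),
                                   Lj.T @ Dobs[mask[:, j], j])
    return L @ R.T

# Remove 80% of the entries of a rank-3 matrix, then interpolate.
rng = np.random.default_rng(1)
D = rng.standard_normal((80, 3)) @ rng.standard_normal((3, 60))
mask = rng.random(D.shape) < 0.2
Dhat = interpolate_lowrank(np.where(mask, D, 0.0), mask, rank=3)
print("relative error:", np.linalg.norm(Dhat - D) / np.linalg.norm(D))
```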
Managing Randomization in the Multi-Block Alternating Direction Method of Multipliers for Quadratic Optimization
The Alternating Direction Method of Multipliers (ADMM) has gained a lot of
attention for solving large-scale, objective-separable constrained
optimization problems. However, the two-block variable structure of the ADMM
still limits the method's practical computational efficiency, because at least
one large matrix factorization is needed even for linear and convex
quadratic programming. This drawback may be overcome by enforcing a multi-block
structure of the decision variables in the original optimization problem.
Unfortunately, the multi-block ADMM, with more than two blocks, is not
guaranteed to be convergent. On the other hand, two positive developments have
been made. First, if in each cyclic loop one randomly permutes the updating
order of the multiple blocks, then the method converges in expectation for
solving any system of linear equations with any number of blocks. Second,
such a randomly permuted ADMM also works for equality-constrained convex
quadratic programming even when the objective function is not separable. The
goal of this paper is twofold. First, we add more randomness into the ADMM by
developing a randomly assembled cyclic ADMM (RAC-ADMM) where the decision
variables in each block are randomly assembled. We discuss the theoretical
properties of RAC-ADMM and show when random assembling helps and when it hurts,
and develop a criterion to guarantee that it converges almost surely. Second,
using the theoretical guidance on RAC-ADMM, we conduct multiple numerical tests
on solving both randomly generated and large-scale benchmark quadratic
optimization problems, including continuous problems, binary graph-partition
and quadratic assignment problems, and selected machine learning problems. Our numerical
tests show that the RAC-ADMM, with a variable-grouping strategy, can
significantly improve computational efficiency in solving most quadratic
optimization problems.
Comment: Expanded and streamlined theoretical sections. Added comparisons with
other multi-block ADMM variants. Updated Computational Studies Section on
continuous problems -- reporting primal and dual residuals instead of
objective value gap. Added selected machine learning problems
(ElasticNet/Lasso and Support Vector Machine) to Computational Studies Section
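A bare-bones sketch of the randomly assembled cyclic idea for an equality-constrained convex QP: each cycle randomly assembles the variables into blocks, exactly minimizes the augmented Lagrangian block by block, then updates the multipliers. The instance and parameters are illustrative, and none of the paper's convergence safeguards are included:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 30, 10, 5                       # variables, constraints, blocks
Q = rng.standard_normal((n, n))
H = Q.T @ Q + np.eye(n)                   # strongly convex quadratic objective
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

beta = 1.0                                # augmented Lagrangian parameter
M = H + beta * A.T @ A                    # Hessian of the aug. Lagrangian in x
x, y = np.zeros(n), np.zeros(m)

for sweep in range(200):
    # RAC step: randomly assemble the variables into p blocks each cycle.
    for idx in np.array_split(rng.permutation(n), p):
        # Exact minimization over this block of L(x, y) =
        # 0.5 x'Hx + c'x - y'(Ax - b) + 0.5 beta ||Ax - b||^2.
        g = H @ x + c - A.T @ y + beta * A.T @ (A @ x - b)
        x[idx] -= np.linalg.solve(M[np.ix_(idx, idx)], g[idx])
    y -= beta * (A @ x - b)               # multiplier update

print("primal residual:", np.linalg.norm(A @ x - b))
```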