A Smoothing SQP Framework for a Class of Composite Minimization over Polyhedron
The composite minimization problem over a general polyhedron
has found various applications in machine learning, wireless communications,
image restoration, signal reconstruction, etc. This paper aims to provide a
theoretical study of this problem. Firstly, we show that, for any fixed
problem parameter, finding the global minimizer of the problem, even of its
unconstrained counterpart, is strongly NP-hard. Secondly, we derive Karush-Kuhn-Tucker (KKT)
optimality conditions for local minimizers of the problem. Thirdly, we propose
a smoothing sequential quadratic programming framework for solving this
problem. The framework requires an (approximate) solution of a convex quadratic
program at each iteration. Finally, we analyze the worst-case iteration
complexity of the framework for returning an epsilon-KKT point, i.e., a
feasible point that satisfies a perturbed version of the derived KKT optimality
conditions. To the best of our knowledge, the proposed framework is the first
one with a worst-case iteration complexity guarantee for solving composite
minimization over a general polyhedron.
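The per-iteration workhorse of such a framework is a convex quadratic program over the polyhedron. Below is a minimal sketch of that kind of subproblem solve, with a placeholder model Hessian Q, gradient g, current iterate x, and polyhedron {x : A x <= b}; it is only an illustration of the subproblem structure, not the paper's actual smoothing-SQP model.

```python
# Minimal sketch of a per-iteration convex QP subproblem (illustrative only):
#   minimize over d   0.5 * d' Q d + g' d   subject to  A (x + d) <= b,
# where Q (positive semidefinite), g, A, b, and x are placeholders, not the
# paper's actual smoothing-SQP model.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m = 5, 8
Q = np.eye(n)                      # PSD model Hessian (placeholder)
g = rng.standard_normal(n)         # model gradient at the current iterate
A = rng.standard_normal((m, n))    # polyhedron {x : A x <= b}
x = np.zeros(n)                    # current (feasible) iterate
b = A @ x + 1.0                    # chosen so that x is strictly feasible

d = cp.Variable(n)
subproblem = cp.Problem(
    cp.Minimize(0.5 * cp.quad_form(d, Q) + g @ d),
    [A @ (x + d) <= b],
)
subproblem.solve()
x_next = x + d.value               # candidate next iterate
print("step norm:", np.linalg.norm(d.value))
```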
The proximal point method revisited
In this short survey, I revisit the role of the proximal point method in
large scale optimization. I focus on three recent examples: a proximally guided
subgradient method for weakly convex stochastic approximation, the prox-linear
algorithm for minimizing compositions of convex functions and smooth maps, and
Catalyst generic acceleration for regularized Empirical Risk Minimization.
Comment: 11 pages, submitted to SIAG/OPT Views and News.
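As background for the survey, here is a minimal sketch of the basic proximal point iteration, x_{k+1} = argmin_x { f(x) + (1/(2*lam)) * ||x - x_k||^2 }, with an illustrative smooth objective and SciPy's L-BFGS-B as the inner solver; the survey's three examples (proximally guided subgradient, prox-linear, Catalyst) build on this template rather than use it verbatim.

```python
# A minimal sketch of the proximal point iteration
#   x_{k+1} = argmin_x  f(x) + (1/(2*lam)) * ||x - x_k||^2,
# with an illustrative smooth f and SciPy's minimize as the inner solver.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Illustrative objective: a simple smooth, nonconvex test function.
    return np.sum(x**4 - 2.0 * x**2 + 0.5 * x)

def prox_point_step(x_k, lam):
    # One (approximate) proximal point step around x_k.
    obj = lambda x: f(x) + np.sum((x - x_k)**2) / (2.0 * lam)
    return minimize(obj, x_k, method="L-BFGS-B").x

x = np.full(3, 2.0)
for k in range(20):
    x = prox_point_step(x, lam=0.5)
print("final point:", x, "f(x) =", f(x))
```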
Graphical Convergence of Subgradients in Nonconvex Optimization and Learning
We investigate the stochastic optimization problem of minimizing population
risk, where the loss defining the risk is assumed to be weakly convex.
Compositions of Lipschitz convex functions with smooth maps are the primary
examples of such losses. We analyze the estimation quality of such nonsmooth
and nonconvex problems by their sample average approximations. Our main results
establish dimension-dependent rates on subgradient estimation in full
generality and dimension-independent rates when the loss is a generalized
linear model. As an application of the developed techniques, we analyze the
nonsmooth landscape of a robust nonlinear regression problem.
Comment: 36 pages.
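To make the setting concrete, the sketch below evaluates one subgradient of the sample average approximation of a robust nonlinear regression loss of the composite form h(c(x)) with h = |.| and smooth c; the synthetic data, the particular loss, and the chain-rule subgradient choice are illustrative assumptions, not the paper's estimators or rates.

```python
# Hedged sketch: a subgradient of the sample average approximation (SAA)
#   L_n(x) = (1/n) * sum_i | <a_i, x>^2 - b_i |
# of a robust (ell_1) nonlinear regression loss, a typical weakly convex
# composition h(c(x)) with h = |.| and smooth c. Data here are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 10
x_star = rng.standard_normal(d)
A = rng.standard_normal((n, d))
b = (A @ x_star) ** 2                  # noiseless phase-retrieval-style targets

def saa_loss(x):
    return np.mean(np.abs((A @ x) ** 2 - b))

def saa_subgradient(x):
    r = (A @ x) ** 2 - b               # residuals c_i(x)
    # Chain rule: sign(c_i(x)) * grad c_i(x) = sign(c_i) * 2 <a_i, x> a_i.
    return (np.sign(r) * 2.0 * (A @ x)) @ A / n

x = rng.standard_normal(d)
g = saa_subgradient(x)
print("loss:", saa_loss(x), "subgradient norm:", np.linalg.norm(g))
```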
Quartic First-Order Methods for Low Rank Minimization
We study a generalized nonconvex Burer-Monteiro formulation for low-rank
minimization problems. We use recent results on non-Euclidean first order
methods to provide efficient and scalable algorithms. Our approach uses
geometries induced by quartic kernels on matrix spaces; for unconstrained cases
we introduce a novel family of Gram kernels that considerably improves
numerical performance. Numerical experiments for Euclidean distance matrix
completion and symmetric nonnegative matrix factorization show that our
algorithms scale well and reach state-of-the-art performance when compared to
specialized methods.
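As an illustration of a non-Euclidean first-order step in a quartic geometry, the sketch below performs a Bregman gradient update with the simple kernel h(x) = (alpha/4)*||x||^4 + (sigma/2)*||x||^2, for which the mirror step reduces to a scalar cubic equation; the kernel, constants, and toy objective are assumptions for illustration and are not the paper's Gram kernels or its matrix-space setting.

```python
# Hedged sketch: one Bregman (mirror) gradient step with the simple quartic
# kernel  h(x) = (alpha/4)*||x||^4 + (sigma/2)*||x||^2, so that
#   grad h(x) = (alpha*||x||^2 + sigma) * x.
# The mirror update  grad h(x_next) = grad h(x) - t * grad f(x)  then reduces
# to a scalar cubic equation. Kernel, constants, and objective are
# illustrative placeholders, not the paper's Gram kernels.
import numpy as np

alpha, sigma = 1.0, 1.0

def grad_h(x):
    return (alpha * np.dot(x, x) + sigma) * x

def grad_h_inverse(v):
    # Solve (alpha*||x||^2 + sigma) x = v by writing x = c*v with c > 0:
    #   alpha*||v||^2 * c^3 + sigma * c - 1 = 0.
    vnorm2 = np.dot(v, v)
    if vnorm2 == 0.0:
        return np.zeros_like(v)
    roots = np.roots([alpha * vnorm2, 0.0, sigma, -1.0])
    c = max(r.real for r in roots if abs(r.imag) < 1e-10 and r.real > 0)
    return c * v

def bregman_gradient_step(x, grad_f, step):
    return grad_h_inverse(grad_h(x) - step * grad_f(x))

# Toy quartic-growth objective f(x) = 0.25*||x||^4 (gradient ||x||^2 * x).
grad_f = lambda x: np.dot(x, x) * x
x = np.array([2.0, -1.0, 0.5])
for _ in range(50):
    x = bregman_gradient_step(x, grad_f, step=0.2)
print("iterate:", x)
```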
A direct formulation for sparse PCA using semidefinite programming
We examine the problem of approximating, in the Frobenius-norm sense, a
positive semidefinite symmetric matrix by a rank-one matrix, with an upper
bound on the cardinality of its eigenvector. The problem arises in the
decomposition of a covariance matrix into sparse factors, and has wide
applications ranging from biology to finance. We use a modification of the
classical variational representation of the largest eigenvalue of a symmetric
matrix, where cardinality is constrained, and derive a semidefinite programming
based relaxation for our problem. We also discuss Nesterov's smooth
minimization technique applied to the SDP arising in the direct sparse PCA
method.
Comment: Final version, to appear in SIAM Review.
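A minimal sketch of a semidefinite relaxation of this flavor, written with CVXPY on a synthetic covariance, is shown below: maximize Tr(Sigma X) over unit-trace positive semidefinite X with an l1 bound standing in for the cardinality constraint. The exact relaxation, constants, and recovery step here are illustrative, not a verbatim transcription of the paper's formulation.

```python
# Hedged sketch of a semidefinite relaxation of the sparse PCA problem:
#   maximize  trace(Sigma @ X)
#   subject to  trace(X) = 1,  sum_ij |X_ij| <= k,  X >= 0 (PSD),
# where the l1 constraint is a convex surrogate for the cardinality bound.
# The exact relaxation, constants, and solver are illustrative; see the paper.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
p, k = 8, 4                              # dimension and sparsity budget
B = rng.standard_normal((p, p))
Sigma = B @ B.T                          # synthetic covariance matrix

X = cp.Variable((p, p), PSD=True)
relaxation = cp.Problem(
    cp.Maximize(cp.trace(Sigma @ X)),
    [cp.trace(X) == 1, cp.sum(cp.abs(X)) <= k],
)
relaxation.solve()

# Recover an approximately sparse direction from the leading eigenvector of X.
eigvals, eigvecs = np.linalg.eigh(X.value)
v = eigvecs[:, -1]
print("explained variance:", v @ Sigma @ v)
```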
Proximal-Like Incremental Aggregated Gradient Method with Linear Convergence under Bregman Distance Growth Conditions
We introduce a unified algorithmic framework, called proximal-like
incremental aggregated gradient (PLIAG) method, for minimizing the sum of a
convex function that consists of additive relatively smooth convex components
and a proper lower semi-continuous convex regularization function, over an
abstract feasible set whose geometry can be captured by using the domain of a
Legendre function. The PLIAG method includes many existing algorithms in the
literature as special cases such as the proximal gradient method, the Bregman
proximal gradient method (also called NoLips algorithm), the incremental
aggregated gradient method, the incremental aggregated proximal method, and the
proximal incremental aggregated gradient method. It also includes some novel and
interesting iteration schemes. First, we show that the PLIAG method is globally
sublinearly convergent without requiring a growth condition, which extends the
sublinear convergence result for the proximal gradient algorithm to incremental
aggregated-type first-order methods. Then, by embedding a so-called Bregman
distance growth condition into a descent-type lemma to construct a special
Lyapunov function, we show that the PLIAG method is globally linearly
convergent in terms of both function values and Bregman distances to the
optimal solution set, provided that the step size is not greater than some
positive constant. The convergence results derived in this paper are all
established without the standard assumptions in the literature (i.e., without
requiring strong convexity or Lipschitz continuity of the gradient of the
smooth part of the objective). When specialized to many existing algorithms,
our results recover or supplement their convergence results under strictly
weaker conditions.
Comment: 28 pages.
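For intuition about the Bregman geometry underlying PLIAG, the sketch below takes Bregman proximal gradient (NoLips-style) steps on the probability simplex with the entropy Legendre function and no extra regularizer, where the update has a closed multiplicative form; the toy objective and step size are assumptions, and the full incremental aggregated machinery of PLIAG is not implemented here.

```python
# Hedged sketch: one Bregman proximal gradient (NoLips-style) step on the
# probability simplex using the entropy Legendre function
#   h(x) = sum_i x_i * log(x_i),
# with no extra regularizer (g = 0). In that case the update
#   x_next = argmin_x  <grad f(x_k), x> + (1/t) * D_h(x, x_k)
# has the closed multiplicative form below. The objective f is a placeholder;
# the full PLIAG method additionally aggregates delayed component gradients.
import numpy as np

def bregman_step_simplex(x, grad, t):
    # Multiplicative (entropy) mirror update, renormalized onto the simplex.
    y = x * np.exp(-t * grad)
    return y / y.sum()

# Toy objective f(x) = 0.5 * ||A x - b||^2 over the simplex.
rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad_f = lambda x: A.T @ (A @ x - b)

x = np.full(5, 0.2)                      # start at the simplex barycenter
for _ in range(200):
    x = bregman_step_simplex(x, grad_f(x), t=0.05)
print("x:", np.round(x, 3), "objective:", 0.5 * np.linalg.norm(A @ x - b)**2)
```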
Escaping Saddle Points in Constrained Optimization
In this paper, we study the problem of escaping from saddle points in smooth
nonconvex optimization problems subject to a convex constraint set. We
propose a generic framework that yields convergence to a second-order
stationary point of the problem, if the convex set is simple for
a quadratic objective function. Specifically, our results hold if one can find
a constant-factor approximate solution of a quadratic program over the
constraint set in polynomial time, where the approximation constant depends on
the structure of the set. Under this condition, we show that the sequence of
iterates generated by the proposed framework reaches an approximate
second-order stationary point (SOSP) within an explicitly bounded number of
iterations. We
further characterize the overall complexity of reaching an SOSP when the convex
set can be written as a set of quadratic constraints and the
objective function Hessian has a specific structure over the convex set.
Finally, we extend our results to the stochastic setting and
characterize the number of stochastic gradient and Hessian evaluations to reach
an approximate SOSP.
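A hedged skeleton of the two-phase logic typical of such frameworks is sketched below: projected gradient steps while the projected gradient is large, and a call to an approximate quadratic-program oracle over the constraint set once it is small. The unit-ball constraint set and the sampling-based approx_quadratic_oracle helper are hypothetical placeholders, not the paper's construction or its guarantees.

```python
# Hedged skeleton of a generic escape framework over a convex set C (here the
# Euclidean unit ball, as an illustrative choice): take projected gradient
# steps while the gradient is large; once it is small, query an (approximate)
# quadratic-program oracle over C for a second-order descent direction.
# `approx_quadratic_oracle` is a hypothetical placeholder, not the paper's
# actual subproblem solver.
import numpy as np

def project_ball(x):
    nrm = np.linalg.norm(x)
    return x if nrm <= 1.0 else x / nrm

def approx_quadratic_oracle(grad, hess, x, trials=500, radius=0.1):
    # Crude stand-in: sample small feasible steps and keep the best value of
    # the quadratic model  <grad, d> + 0.5 * d' H d.  A real oracle would
    # (approximately) solve a QP over the constraint set instead.
    rng = np.random.default_rng(0)
    best_d, best_val = np.zeros_like(x), 0.0
    for _ in range(trials):
        d = project_ball(x + radius * rng.standard_normal(x.shape)) - x
        val = grad @ d + 0.5 * d @ hess @ d
        if val < best_val:
            best_d, best_val = d, val
    return best_d, best_val

def escape_framework(grad_f, hess_f, x, eps=1e-3, max_iter=200, step=0.05):
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(x - project_ball(x - g)) > eps:
            x = project_ball(x - step * g)          # first-order phase
        else:
            d, val = approx_quadratic_oracle(g, hess_f(x), x)
            if val > -eps:                          # approximate SOSP reached
                return x
            x = project_ball(x + d)                 # second-order escape step
    return x

# Toy saddle: f(x) = 0.5*x0^2 - 0.5*x1^2 has a saddle at the origin.
grad_f = lambda x: np.array([x[0], -x[1]])
hess_f = lambda x: np.diag([1.0, -1.0])
print("final point:", escape_framework(grad_f, hess_f, np.zeros(2)))
```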
High-Order Evaluation Complexity for Convexly-Constrained Optimization with Non-Lipschitzian Group Sparsity Terms
This paper studies high-order evaluation complexity for partially separable
convexly-constrained optimization involving non-Lipschitzian group sparsity
terms in a nonconvex objective function. We propose a partially separable
adaptive regularization algorithm using a p-th order Taylor model and show
that the algorithm can produce an (epsilon,delta)-approximate q-th-order
stationary point in at most O(epsilon^{-(p+1)/(p-q+1)}) evaluations of the
objective function and its first p derivatives (whenever they exist). Our model
uses the underlying rotational symmetry of the Euclidean norm function to build
a Lipschitzian approximation for the non-Lipschitzian group sparsity terms,
which are defined by the group ell_2-ell_a norm with a in (0,1). The new result
shows that the partially-separable structure and non-Lipschitzian group
sparsity terms in the objective function may not affect the worst-case
evaluation complexity order.
Comment: 27 pages.
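For orientation, the sketch below instantiates the generic adaptive-regularization template only in its smooth second-order form (p = 2, i.e., cubic regularization) on the Rosenbrock function; the partially separable model and the non-Lipschitzian group sparsity terms that are the paper's actual subject are not reproduced here.

```python
# Hedged sketch: the generic adaptive-regularization template, instantiated
# only for the smooth second-order case (p = 2, cubic regularization) on the
# Rosenbrock function. The paper's algorithm additionally exploits partial
# separability and handles non-Lipschitzian group sparsity terms, which this
# simple sketch does not attempt to reproduce.
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

def ar2(x, sigma=1.0, eta1=0.1, eta2=0.9, max_iter=200, tol=1e-6):
    for _ in range(max_iter):
        g, H = rosen_der(x), rosen_hess(x)
        if np.linalg.norm(g) <= tol:
            break
        # Regularized second-order Taylor model of the objective around x.
        model = lambda s: g @ s + 0.5 * s @ H @ s \
                          + (sigma / 3.0) * np.linalg.norm(s)**3
        s = minimize(model, np.zeros_like(x), method="L-BFGS-B").x
        predicted = -model(s)                      # model decrease
        actual = rosen(x) - rosen(x + s)           # true decrease
        rho = actual / predicted if predicted > 0 else -np.inf
        if rho >= eta1:                            # accept the trial step
            x = x + s
        if rho >= eta2:
            sigma = max(1e-8, 0.5 * sigma)         # very successful: shrink
        elif rho < eta1:
            sigma = 2.0 * sigma                    # unsuccessful: inflate
    return x

print("minimizer estimate:", ar2(np.array([-1.2, 1.0])))
```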
Parallel and Distributed Methods for Nonconvex Optimization--Part II: Applications
In Part I of this paper, we proposed and analyzed a novel algorithmic
framework for the minimization of a nonconvex (smooth) objective function,
subject to nonconvex constraints, based on inner convex approximations. This
Part II is devoted to the application of the framework to some resource
allocation problems in communication networks. In particular, we consider two
non-trivial case-study applications, namely: (generalizations of) i) the rate
profile maximization in MIMO interference broadcast networks; and ii) the
max-min fair multicast multigroup beamforming problem in a multi-cell
environment. We develop a new class of algorithms enjoying the following
distinctive features: i) they are distributed across the base stations
(with limited signaling) and lead to subproblems whose solutions are computable
in closed form; and ii) unlike current relaxation-based schemes
(e.g., semidefinite relaxation), they are proved to always converge to
d-stationary solutions of the aforementioned class of nonconvex problems.
Numerical results show that the proposed (distributed) schemes achieve larger
worst-case rates (resp. signal-to-interference-plus-noise ratios) than
state-of-the-art centralized ones while having comparable computational
complexity.
Comment: Part I of this paper can be found at http://arxiv.org/abs/1410.475
DCOOL-NET: Distributed cooperative localization for sensor networks
We present DCOOL-NET, a scalable distributed in-network algorithm for sensor
network localization based on noisy range measurements. DCOOL-NET operates by
parallel, collaborative message passing between single-hop neighbor sensors,
and involves simple computations at each node. It stems from an application of
the majorization-minimization (MM) framework to the nonconvex optimization
problem at hand, and capitalizes on a novel convex majorizer. The proposed
majorizer is endowed with several desirable properties and represents a key
contribution of this work. It is a more accurate match to the underlying
nonconvex cost function than popular MM quadratic majorizers, and is readily
amenable to distributed minimization via the alternating direction method of
multipliers (ADMM). Moreover, it allows for low-complexity, fast Nesterov
gradient methods to tackle the ADMM subproblems induced at each node. Computer
simulations show that DCOOL-NET achieves comparable or better sensor position
accuracy than a state-of-the-art method which, furthermore, is not parallel.
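For context on the "popular MM quadratic majorizers" that the abstract contrasts with, the sketch below applies a standard quadratic majorizer to the single-node range localization cost sum_j (||x - a_j|| - r_j)^2, which yields a closed-form MM update; this baseline majorizer and the synthetic anchors and ranges are illustrative assumptions and are not DCOOL-NET's majorizer or its distributed ADMM solution.

```python
# Hedged sketch: a standard MM quadratic majorizer (the baseline type the
# abstract contrasts with), for the single-node range localization cost
#   c(x) = sum_j ( ||x - a_j|| - r_j )^2,
# where a_j are anchor positions and r_j noisy range measurements. Using
# ||x - a_j|| >= <x - a_j, u_j> with u_j = (x_k - a_j)/||x_k - a_j|| gives a
# quadratic majorizer whose minimizer has the closed form below. This is not
# DCOOL-NET's (tighter) majorizer nor its distributed ADMM implementation.
import numpy as np

def mm_quadratic_update(x_k, anchors, ranges):
    # One MM step: x_{k+1} = mean_j ( a_j + r_j * u_j ), u_j = unit(x_k - a_j).
    diffs = x_k - anchors                              # (m, dim)
    units = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    return np.mean(anchors + ranges[:, None] * units, axis=0)

rng = np.random.default_rng(4)
anchors = rng.uniform(-1, 1, size=(6, 2))              # known anchor positions
x_true = np.array([0.3, -0.2])
ranges = np.linalg.norm(x_true - anchors, axis=1) + 0.01 * rng.standard_normal(6)

x = np.zeros(2)                                        # initial guess
for _ in range(100):
    x = mm_quadratic_update(x, anchors, ranges)
print("estimate:", x, "true:", x_true)
```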