Generalized Sparse and Low-Rank Optimization for Ultra-Dense Networks
Ultra-dense network (UDN) is a promising technology to further evolve
wireless networks and meet the diverse performance requirements of 5G networks.
With abundant access points, each with communication, computation and storage
resources, UDN brings unprecedented benefits, including significant improvement
in network spectral efficiency and energy efficiency, greatly reduced latency
to enable novel mobile applications, and the capability of providing massive
access for Internet of Things (IoT) devices. However, such great promises come
with formidable research challenges. To design and operate such complex
networks with various types of resources, efficient and innovative
methodologies will be needed. This motivates the recent introduction of highly
structured and generalizable models for network optimization. In this article,
we present some recently proposed large-scale sparse and low-rank frameworks
for optimizing UDNs, supported by various motivating applications. Special
attention is paid to algorithmic approaches for dealing with nonconvex
objective functions and constraints, as well as with computational scalability.
Comment: This paper has been accepted by IEEE Communications Magazine, Special Issue on Heterogeneous Ultra Dense Networks.
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Substantial progress has been made recently on developing provably accurate
and efficient algorithms for low-rank matrix factorization via nonconvex
optimization. While conventional wisdom often takes a dim view of nonconvex
optimization algorithms due to their susceptibility to spurious local minima,
simple iterative methods such as gradient descent have been remarkably
successful in practice. The theoretical footings, however, had been largely
lacking until recently.
In this tutorial-style overview, we highlight the important role of
statistical models in enabling efficient nonconvex optimization with
performance guarantees. We review two contrasting approaches: (1) two-stage
algorithms, which consist of a tailored initialization step followed by
successive refinement; and (2) global landscape analysis and
initialization-free algorithms. Several canonical matrix factorization problems
are discussed, including but not limited to matrix sensing, phase retrieval,
matrix completion, blind deconvolution, robust principal component analysis,
phase synchronization, and joint alignment. Special care is taken to illustrate
the key technical insights underlying their analyses. This article serves as a
testament that the integrated consideration of optimization and statistics
leads to fruitful research findings.
Comment: Invited overview article.
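To make the two-stage recipe concrete, here is a minimal NumPy sketch for one of the canonical problems above, low-rank matrix sensing: a spectral initialization followed by plain gradient descent on the factored objective. The Gaussian sensing model, problem sizes, and step size are illustrative assumptions, not the overview's exact setup.

```python
# Hedged sketch of the two-stage approach for low-rank matrix sensing:
# spectral initialization, then gradient descent on the factored objective
# f(X) = (1/4m) * sum_k (<A_k, X X^T> - y_k)^2.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 40, 2, 1500
X_star = rng.standard_normal((n, r))
M_star = X_star @ X_star.T                      # ground truth: PSD, rank r
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2              # symmetrized sensing matrices
y = np.einsum('kij,ij->k', A, M_star)           # measurements y_k = <A_k, M*>

# Stage 1: spectral initialization from the top-r eigenpairs of (1/m) sum_k y_k A_k.
Y = np.einsum('k,kij->ij', y, A) / m
vals, vecs = np.linalg.eigh(Y)
X = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))

# Stage 2: plain gradient descent; grad f(X) = (1/m) sum_k (<A_k, X X^T> - y_k) A_k X.
eta = 0.25 / np.linalg.norm(M_star, 2)          # heuristic step size ~ 1/sigma_1(M*)
for _ in range(200):
    resid = np.einsum('kij,ij->k', A, X @ X.T) - y
    X -= eta * np.einsum('k,kij->ij', resid, A) @ X / m

print(np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star))  # relative error
```

Under the statistical models the overview emphasizes, the initialization lands inside the basin where gradient descent contracts linearly, which is exactly the division of labor described above.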
Complete Dictionary Recovery over the Sphere II: Recovery by Riemannian Trust-region Method
We consider the problem of recovering a complete (i.e., square and
invertible) matrix $\mathbf{A}_0$, from $\mathbf{Y} \in \mathbb{R}^{n \times p}$
with $\mathbf{Y} = \mathbf{A}_0 \mathbf{X}_0$, provided $\mathbf{X}_0$ is
sufficiently sparse. This recovery problem is central to theoretical
understanding of dictionary learning, which seeks a sparse representation for a
collection of input signals and finds numerous applications in modern signal
processing and machine learning. We give the first efficient algorithm that
provably recovers $\mathbf{A}_0$ when $\mathbf{X}_0$ has $O(n)$ nonzeros per
column, under suitable probability model for $\mathbf{X}_0$.
Our algorithmic pipeline centers around solving a certain nonconvex
optimization problem with a spherical constraint, and hence is naturally
phrased in the language of manifold optimization. In a companion paper
(arXiv:1511.03607), we have shown that with high probability our nonconvex
formulation has no "spurious" local minimizers and around any saddle point the
objective function has a negative directional curvature. In this paper, we take
advantage of the particular geometric structure, and describe a Riemannian
trust region algorithm that provably converges to a local minimizer from
arbitrary initializations. Such minimizers give excellent approximations to
rows of $\mathbf{X}_0$. The rows are then recovered by linear programming
rounding and deflation.
Comment: The second of two papers based on the report arXiv:1504.06785. Accepted by IEEE Transactions on Information Theory; revised according to the reviewers' comments.
Complete Dictionary Recovery over the Sphere
We consider the problem of recovering a complete (i.e., square and
invertible) matrix $\mathbf{A}_0$, from $\mathbf{Y} \in \mathbb{R}^{n \times p}$
with $\mathbf{Y} = \mathbf{A}_0 \mathbf{X}_0$, provided $\mathbf{X}_0$ is
sufficiently sparse. This recovery problem is central to the theoretical
understanding of dictionary learning, which seeks a sparse representation for a
collection of input signals, and finds numerous applications in modern signal
processing and machine learning. We give the first efficient algorithm that
provably recovers $\mathbf{A}_0$ when $\mathbf{X}_0$ has $O(n)$ nonzeros per
column, under suitable probability model for $\mathbf{X}_0$. In contrast, prior
results based on efficient algorithms provide recovery guarantees when
$\mathbf{X}_0$ has only $O(n^{1-\delta})$ nonzeros per column for any constant
$\delta \in (0, 1)$.
Our algorithmic pipeline centers around solving a certain nonconvex
optimization problem with a spherical constraint, and hence is naturally
phrased in the language of manifold optimization. To show this apparently hard
problem is tractable, we first provide a geometric characterization of the
high-dimensional objective landscape, which shows that with high probability
there are no "spurious" local minima. This particular geometric structure
allows us to design a Riemannian trust region algorithm over the sphere that
provably converges to one local minimizer with an arbitrary initialization,
despite the presence of saddle points. The geometric approach we develop here
may also shed light on other problems arising from nonconvex recovery of
structured signals.
Comment: 104 pages, 5 figures. Due to the length constraint of publication, this long paper was subsequently divided into two papers (arXiv:1511.03607 and arXiv:1511.04777). Further updates will be made only to those two papers.
Efficient Dictionary Learning with Gradient Descent
Randomly initialized first-order optimization algorithms are the method of
choice for solving many high-dimensional nonconvex problems in machine
learning, yet general theoretical guarantees cannot rule out convergence to
critical points of poor objective value. For some highly structured nonconvex
problems however, the success of gradient descent can be understood by studying
the geometry of the objective. We study one such problem -- complete orthogonal
dictionary learning, and provide convergence guarantees for randomly initialized
gradient descent to the neighborhood of a global optimum. The resulting rates
scale as low order polynomials in the dimension even though the objective
possesses an exponential number of saddle points. This efficient convergence
can be viewed as a consequence of negative curvature normal to the stable
manifolds associated with saddle points, and we provide evidence that this
feature is shared by other nonconvex problems of importance as well.
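As a companion to this result, here is a minimal NumPy sketch of the procedure in question: randomly initialized Riemannian gradient descent over the sphere for a complete orthogonal dictionary. The log-cosh smoothing of the $\ell_1$ objective, the sparsity level, and the step size are our illustrative assumptions.

```python
# Hedged sketch: randomly initialized Riemannian gradient descent on the
# sphere for complete orthogonal dictionary learning.
import numpy as np

rng = np.random.default_rng(1)
n, p, mu, theta = 30, 3000, 0.1, 0.1
X0 = rng.standard_normal((n, p)) * (rng.random((n, p)) < theta)  # sparse coefficients
A0, _ = np.linalg.qr(rng.standard_normal((n, n)))                # orthogonal dictionary
Y = A0 @ X0

q = rng.standard_normal(n)
q /= np.linalg.norm(q)                        # random initialization on the sphere
eta = 0.05
for _ in range(1000):
    # Euclidean gradient of (1/p) * sum_k mu*log cosh(q^T y_k / mu)
    g = Y @ np.tanh(Y.T @ q / mu) / p
    g -= (g @ q) * q                          # project onto the tangent space at q
    q -= eta * g
    q /= np.linalg.norm(q)                    # retract back to the sphere

# Success means q aligns with one dictionary column, so max |A0^T q| is near 1.
print(np.max(np.abs(A0.T @ q)))
```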
High-Order Evaluation Complexity for Convexly-Constrained Optimization with Non-Lipschitzian Group Sparsity Terms
This paper studies high-order evaluation complexity for partially separable
convexly-constrained optimization involving non-Lipschitzian group sparsity
terms in a nonconvex objective function. We propose a partially separable
adaptive regularization algorithm using a $p$-th order Taylor model and show
that the algorithm can produce an $(\epsilon,\delta)$-approximate $q$-th-order
stationary point in at most $O(\epsilon^{-(p+1)/(p-q+1)})$ evaluations of the
objective function and its first p derivatives (whenever they exist). Our model
uses the underlying rotational symmetry of the Euclidean norm function to build
a Lipschitzian approximation for the non-Lipschitzian group sparsity terms,
which are defined by the group $\ell_2$-$\ell_a$ norm with $a \in (0,1)$. The new result
shows that the partially-separable structure and non-Lipschitzian group
sparsity terms in the objective function may not affect the worst-case
evaluation complexity order.
Comment: 27 pages.
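For readers unfamiliar with adaptive regularization methods, the $p$-th order model minimized at iterate $x_k$ takes, schematically, the form

$$ m_k(s) \;=\; f(x_k) + \sum_{j=1}^{p} \frac{1}{j!}\, \nabla^j f(x_k)[s]^j + \frac{\sigma_k}{p+1} \|s\|_2^{\,p+1}, $$

with the regularization weight $\sigma_k$ adapted across iterations. This display is our paraphrase of the standard AR$p$ construction; the paper's contribution is a Lipschitzian model of the non-Lipschitzian group terms that lets the same $O(\epsilon^{-(p+1)/(p-q+1)})$ bound go through.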
Proximal Gradient Method for Nonsmooth Optimization over the Stiefel Manifold
We consider optimization problems over the Stiefel manifold whose objective
function is the summation of a smooth function and a nonsmooth function.
Existing methods for solving this kind of problem can be classified into three
classes. Algorithms in the first class rely on information of the subgradients
of the objective function and thus tend to converge slowly in practice.
Algorithms in the second class are proximal point algorithms, which involve
subproblems that can be as difficult as the original problem. Algorithms in the
third class are based on operator-splitting techniques, but they usually lack
rigorous convergence guarantees. In this paper, we propose a retraction-based
proximal gradient method for solving this class of problems. We prove that the
proposed method globally converges to a stationary point. Iteration complexity
for obtaining an $\epsilon$-stationary solution is also analyzed. Numerical
results on solving sparse PCA and compressed modes problems are reported to
demonstrate the advantages of the proposed method.
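Schematically, one iteration of the retraction-based proximal gradient method solves a proximal subproblem restricted to the tangent space and then retracts (our paraphrase of the construction; $f$ is the smooth term, $h$ the nonsmooth one, e.g. $h = \lambda \|\cdot\|_1$ in sparse PCA):

$$ v_k = \operatorname*{arg\,min}_{v \in T_{X_k} \mathcal{M}} \; \langle \nabla f(X_k), v \rangle + \frac{1}{2t} \|v\|_F^2 + h(X_k + v), \qquad X_{k+1} = \mathrm{Retr}_{X_k}(\alpha_k v_k), $$

where $\mathcal{M}$ is the Stiefel manifold and $\mathrm{Retr}$ is a retraction such as the polar decomposition. Keeping $v$ in the tangent space is what distinguishes this step from the unconstrained splitting heuristics criticized above.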
A Unified Primal Dual Active Set Algorithm for Nonconvex Sparse Recovery
In this paper, we consider the problem of recovering a sparse signal based on
penalized least squares formulations. We develop a novel algorithm of
primal-dual active set type for a class of nonconvex sparsity-promoting
penalties, including $\ell_0$, bridge, smoothly clipped absolute deviation,
capped $\ell_1$, and minimax concavity penalty. First we establish the existence
of a global minimizer for the related optimization problems. Then we derive a
novel necessary optimality condition for the global minimizer using the
associated thresholding operator. The solutions to the optimality system are
coordinate-wise minimizers, and under minor conditions, they are also local
minimizers. Upon introducing the dual variable, the active set can be
determined using the primal and dual variables together. Further, this relation
lends itself to an iterative algorithm of active set type which at each step
involves first updating the primal variable only on the active set and then
updating the dual variable explicitly. When combined with a continuation
strategy on the regularization parameter, the primal dual active set method is
shown to converge globally to the underlying regression target under certain
regularity conditions. Extensive numerical experiments with both simulated and
real data demonstrate its superior performance in efficiency and accuracy
compared with existing sparse recovery methods.
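To illustrate the primal-dual interplay, here is a minimal sketch of the active set iteration for the $\ell_0$ penalty, with the continuation strategy omitted; the threshold $\sqrt{2\lambda}$ is the one induced by the hard-thresholding operator, and the function name and interface are ours:

```python
# Hedged sketch of a primal-dual active set iteration for the l0-penalized
# problem min_x 0.5*||Psi x - y||^2 + lam*||x||_0 (continuation over lam omitted).
import numpy as np

def pdas_l0(Psi, y, lam, max_iter=50):
    d = Psi.shape[1]
    x = np.zeros(d)
    dual = Psi.T @ y                          # dual variable d = Psi^T (y - Psi x)
    active_prev = None
    for _ in range(max_iter):
        active = np.abs(x + dual) > np.sqrt(2 * lam)   # new active set
        if active_prev is not None and np.array_equal(active, active_prev):
            break                             # active set stabilized: stop
        x = np.zeros(d)
        # primal update: least squares restricted to the active set
        x[active] = np.linalg.lstsq(Psi[:, active], y, rcond=None)[0]
        dual = Psi.T @ (y - Psi @ x)          # explicit dual update
        active_prev = active
    return x
```

In the full method, the regularization parameter is decreased along a continuation path, with each solution warm-starting the next; that is the regime in which the global convergence statement applies.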
Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent
We address the rectangular matrix completion problem by lifting the unknown
matrix to a positive semidefinite matrix in higher dimension, and optimizing a
nonconvex objective over the semidefinite factor using a simple gradient
descent scheme. With $O(\mu r^2 \kappa^2 n \max(\mu, \log n))$ random
observations of a $\mu$-incoherent $n_1 \times n_2$ matrix of rank $r$ and
condition number $\kappa$, where $n = \max(n_1, n_2)$, the algorithm linearly
converges to the global optimum with high probability.
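A hedged NumPy sketch of the overall scheme: the paper lifts the rectangular matrix to a positive semidefinite one, whereas the toy version below runs gradient descent directly on asymmetric factors with the usual balancing regularizer, a common equivalent reformulation. Sizes, sampling rate, and step size are illustrative.

```python
# Hedged sketch of Burer-Monteiro-style gradient descent for matrix completion,
# on factors U, V with the balancing term (1/16)*||U^T U - V^T V||_F^2.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r, rate = 100, 80, 3, 0.3
M = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))  # rank-r target
mask = rng.random((n1, n2)) < rate                               # observed entries

# Spectral initialization: rescaling by the sampling rate makes the
# zero-filled matrix an unbiased estimate of M.
U0, s, V0t = np.linalg.svd(np.where(mask, M, 0.0) / rate, full_matrices=False)
U = U0[:, :r] * np.sqrt(s[:r])
V = V0t[:r].T * np.sqrt(s[:r])

eta = 0.2 / s[0]                             # heuristic step size ~ 1/sigma_1
for _ in range(500):
    R = np.where(mask, U @ V.T - M, 0.0)     # residual on observed entries
    bal = U.T @ U - V.T @ V                  # balancing-term factor
    U, V = U - eta * (R @ V + 0.25 * U @ bal), V - eta * (R.T @ U - 0.25 * V @ bal)

print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))  # relative error
```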
How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?
When the linear measurements of an instance of low-rank matrix recovery
satisfy a restricted isometry property (RIP)---i.e. they are approximately
norm-preserving---the problem is known to contain no spurious local minima, so
exact recovery is guaranteed. In this paper, we show that moderate RIP is not
enough to eliminate spurious local minima, so existing results can only hold
for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that
every $x$ is the spurious local minimum of a rank-1 instance of matrix recovery
that satisfies RIP. One specific counterexample has RIP constant $\delta = 1/2$,
but causes randomly initialized stochastic gradient descent (SGD) to fail 12%
of the time. SGD is frequently able to avoid and escape spurious local minima,
but this empirical result shows that it can occasionally be defeated by their
existence. Hence, while exact recovery guarantees will likely require a proof
of no spurious local minima, arguments based solely on norm preservation will
only be applicable to a narrow set of nearly-isotropic instances.
Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018).
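For reference, the restricted isometry property invoked above requires, for all matrices $M$ of the relevant rank,

$$ (1 - \delta) \|M\|_F^2 \;\le\; \|\mathcal{A}(M)\|_2^2 \;\le\; (1 + \delta) \|M\|_F^2, $$

where $\mathcal{A}$ is the linear measurement operator and smaller $\delta$ means more nearly norm-preserving measurements; the counterexample above shows that even the moderate constant $\delta = 1/2$ fails to rule out spurious local minima.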