
    Generalized Sparse and Low-Rank Optimization for Ultra-Dense Networks

    Ultra-dense network (UDN) is a promising technology to further evolve wireless networks and meet the diverse performance requirements of 5G networks. With abundant access points, each with communication, computation, and storage resources, UDNs bring unprecedented benefits, including significant improvements in network spectral efficiency and energy efficiency, greatly reduced latency that enables novel mobile applications, and the capability of providing massive access for Internet of Things (IoT) devices. However, such great promise comes with formidable research challenges. Designing and operating such complex networks with various types of resources demands efficient and innovative methodologies. This motivates the recent introduction of highly structured and generalizable models for network optimization. In this article, we present some recently proposed large-scale sparse and low-rank frameworks for optimizing UDNs, supported by various motivating applications. Special attention is paid to algorithmic approaches for dealing with nonconvex objective functions and constraints, as well as computational scalability.
    Comment: This paper has been accepted by IEEE Communications Magazine, Special Issue on Heterogeneous Ultra Dense Networks.

    Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

    Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.
    Comment: Invited overview article.
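    As an illustration of the two-stage recipe this overview describes, the sketch below applies it to one of the canonical problems mentioned, real-valued phase retrieval: a spectral initialization followed by plain gradient descent on the quartic least-squares loss. The step-size scaling and iteration budget are illustrative assumptions, not the tuned constants from the literature.

```python
import numpy as np

def phase_retrieval_two_stage(A, y, iters=500, eta=0.1):
    """Recover x (up to sign) from y_i = (a_i^T x)^2 with Gaussian a_i.

    Stage 1: spectral initialization -- leading eigenvector of
             (1/m) sum_i y_i a_i a_i^T, rescaled to the signal energy.
    Stage 2: gradient descent on f(z) = (1/4m) sum_i ((a_i^T z)^2 - y_i)^2.
    """
    m, n = A.shape
    # Stage 1: spectral initialization.
    Y = (A.T * y) @ A / m                 # (1/m) sum_i y_i a_i a_i^T
    _, V = np.linalg.eigh(Y)              # eigenvalues in ascending order
    z = V[:, -1] * np.sqrt(y.mean())      # top direction times norm estimate
    # Stage 2: gradient descent; y.mean() ~ ||x||^2 gives a crude stable step.
    step = eta / y.mean()
    for _ in range(iters):
        Az = A @ z
        z = z - step * (A.T @ ((Az**2 - y) * Az)) / m
    return z

# Usage on synthetic data (sign ambiguity is inherent to the problem).
rng = np.random.default_rng(0)
n, m = 50, 600
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
x_hat = phase_retrieval_two_stage(A, (A @ x) ** 2)
err = min(np.linalg.norm(x_hat - x), np.linalg.norm(x_hat + x))
print(f"relative error: {err / np.linalg.norm(x):.2e}")
```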

    Complete Dictionary Recovery over the Sphere II: Recovery by Riemannian Trust-region Method

    We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$ from $\mathbf Y \in \mathbb{R}^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to the theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under a suitable probability model for $\mathbf X_0$. Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. In a companion paper (arXiv:1511.03607), we have shown that with high probability our nonconvex formulation has no "spurious" local minimizers and that around any saddle point the objective function has negative directional curvature. In this paper, we take advantage of this particular geometric structure and describe a Riemannian trust-region algorithm that provably converges to a local minimizer from arbitrary initializations. Such minimizers give excellent approximations to rows of $\mathbf X_0$. The rows are then recovered by linear programming rounding and deflation.
    Comment: The second of two papers based on the report arXiv:1504.06785. Accepted by IEEE Transactions on Information Theory; revised according to the reviewers' comments.
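    For reference, a Riemannian trust-region iteration on the sphere, written in the generic form standard in manifold optimization (after Absil et al.), minimizes a quadratic model of the pulled-back objective over a tangent ball and maps the step back with the exponential map; the paper specializes this template to the dictionary-recovery objective, so the display below is only the generic schematic:

```latex
% Generic Riemannian trust-region iteration at q on the sphere S^{n-1}:
% minimize a quadratic model over a tangent ball of radius Delta, then
% map the step back to the sphere with the exponential map.
\delta_\star \in \arg\min_{\delta \in T_q S^{n-1},\ \|\delta\|_2 \le \Delta}
  \; f(q) + \langle \operatorname{grad} f(q), \delta \rangle
          + \tfrac{1}{2}\,\langle \operatorname{Hess} f(q)[\delta], \delta \rangle,
\qquad
q_+ = \exp_q(\delta_\star)
    = q \cos\|\delta_\star\|_2 + \frac{\delta_\star}{\|\delta_\star\|_2}\,\sin\|\delta_\star\|_2 .
```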

    Complete Dictionary Recovery over the Sphere

    We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$ from $\mathbf Y \in \mathbb{R}^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to the theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under a suitable probability model for $\mathbf X_0$. In contrast, prior results based on efficient algorithms provide recovery guarantees only when $\mathbf X_0$ has $O(n^{1-\delta})$ nonzeros per column for some constant $\delta \in (0, 1)$. Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. To show that this apparently hard problem is tractable, we first provide a geometric characterization of the high-dimensional objective landscape, which shows that with high probability there are no "spurious" local minima. This particular geometric structure allows us to design a Riemannian trust-region algorithm over the sphere that provably converges to a local minimizer from an arbitrary initialization, despite the presence of saddle points. The geometric approach we develop here may also shed light on other problems arising from the nonconvex recovery of structured signals.
    Comment: 104 pages, 5 figures. Due to length constraints of publication, this long paper was subsequently divided into two papers (arXiv:1511.03607 and arXiv:1511.04777). Further updates will be made only to the two papers.
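    Concretely, this paper and its companion recover one sparse row of $\mathbf X_0$ at a time by seeking a sparse vector in the row space of $\mathbf Y$, minimizing a smooth surrogate of the $\ell^1$ norm over the sphere. A schematic of that formulation, with $h_\mu$ the log-cosh smoothing of $|\cdot|$ and $\bar{\mathbf y}_k$ the preprocessed data columns:

```latex
% Sphere-constrained program: find a sparse vector in the row space of Y.
% h_mu is a smooth surrogate for |.|; ybar_k are the preprocessed columns.
\min_{q \in \mathbb{R}^n} \;\; \frac{1}{p}\sum_{k=1}^{p} h_\mu\!\left(q^\top \bar{\mathbf y}_k\right)
\quad \text{subject to} \quad \|q\|_2 = 1,
\qquad h_\mu(z) = \mu \log\cosh(z/\mu).
```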

    Efficient Dictionary Learning with Gradient Descent

    Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of poor objective value. For some highly structured nonconvex problems, however, the success of gradient descent can be understood by studying the geometry of the objective. We study one such problem, complete orthogonal dictionary learning, and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. The resulting rates scale as low-order polynomials in the dimension, even though the objective possesses an exponential number of saddle points. This efficient convergence can be viewed as a consequence of negative curvature normal to the stable manifolds associated with saddle points, and we provide evidence that this feature is shared by other nonconvex problems of importance as well.
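    A minimal sketch of the kind of method analyzed here: randomly initialized Riemannian gradient descent on the sphere, applied to the log-cosh objective shown above. The step size, smoothing parameter, and iteration budget are illustrative assumptions.

```python
import numpy as np

def dict_learn_sphere_gd(Y, mu=0.01, eta=0.05, iters=2000, seed=0):
    """Randomly initialized Riemannian gradient descent on the sphere for
    f(q) = (1/p) * sum_k mu * log(cosh(q^T y_k / mu))."""
    n, p = Y.shape
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)                    # uniform random init on S^{n-1}
    for _ in range(iters):
        g = Y @ np.tanh(Y.T @ q / mu) / p     # Euclidean gradient of f
        g = g - (q @ g) * q                   # project onto the tangent space
        q = q - eta * g                       # Riemannian gradient step
        q /= np.linalg.norm(q)                # retraction: renormalize
    return q
```

One run approximates a single row of $\mathbf X_0$ (through $q^\top \mathbf Y$, up to sign and scale); deflation then recovers the remaining rows, as the abstracts above describe.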

    High-Order Evaluation Complexity for Convexly-Constrained Optimization with Non-Lipschitzian Group Sparsity Terms

    This paper studies high-order evaluation complexity for partially separable convexly-constrained optimization involving non-Lipschitzian group sparsity terms in a nonconvex objective function. We propose a partially separable adaptive regularization algorithm using a $p$-th order Taylor model and show that the algorithm can produce an $(\epsilon,\delta)$-approximate $q$-th-order stationary point in at most $O(\epsilon^{-(p+1)/(p-q+1)})$ evaluations of the objective function and its first $p$ derivatives (whenever they exist). Our model uses the underlying rotational symmetry of the Euclidean norm function to build a Lipschitzian approximation for the non-Lipschitzian group sparsity terms, which are defined by the group $\ell_2$-$\ell_a$ norm with $a \in (0,1)$. The new result shows that the partially separable structure and non-Lipschitzian group sparsity terms in the objective function may not affect the worst-case evaluation complexity order.
    Comment: 27 pages.
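    Schematically, with $\mathcal F$ the convex feasible set and $\mathcal G$ the groups, and leaving aside the paper's Lipschitzian handling of the group terms, the objective and the $p$-th order adaptively regularized Taylor model minimized at each iteration take the standard AR$p$ form ($\sigma_k$ is the adaptive regularization weight):

```latex
% Objective: smooth nonconvex f plus non-Lipschitzian group sparsity terms.
\min_{x \in \mathcal{F}} \; f(x) + \sum_{g \in \mathcal{G}} \|x_g\|_2^{a},
\qquad a \in (0,1),
% p-th order adaptively regularized Taylor model of the smooth part:
m_k(s) = \sum_{j=0}^{p} \frac{1}{j!}\, \nabla^j f(x_k)[s]^{j}
       + \frac{\sigma_k}{p+1}\,\|s\|_2^{p+1}.
```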

    Proximal Gradient Method for Nonsmooth Optimization over the Stiefel Manifold

    We consider optimization problems over the Stiefel manifold whose objective function is the sum of a smooth function and a nonsmooth function. Existing methods for solving such problems fall into three classes. Algorithms in the first class rely on information about the subgradients of the objective function and thus tend to converge slowly in practice. Algorithms in the second class are proximal point algorithms, which involve subproblems that can be as difficult as the original problem. Algorithms in the third class are based on operator-splitting techniques, but they usually lack rigorous convergence guarantees. In this paper, we propose a retraction-based proximal gradient method for solving this class of problems. We prove that the proposed method globally converges to a stationary point. The iteration complexity for obtaining an $\epsilon$-stationary solution is also analyzed. Numerical results on sparse PCA and compressed-modes problems are reported to demonstrate the advantages of the proposed method.
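    The building blocks are easy to state in code. The sketch below shows the soft-thresholding proximal operator for an $\ell_1$ regularizer and the polar retraction onto the Stiefel manifold, combined into a simplified split-then-retract step. Note that this simplification is of the operator-splitting flavor the abstract cautions about; the paper's method instead restricts the proximal step to the tangent space (solving that subproblem, e.g., by a semismooth Newton method) to obtain its guarantees.

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_1 (entrywise soft-thresholding)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def polar_retraction(X):
    """Map a full-rank n x k matrix back to the Stiefel manifold via its
    polar factor (the closest orthonormal matrix in Frobenius norm)."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

def prox_grad_stiefel_step(X, grad_f, t, lam):
    """One simplified step for min f(X) + lam*||X||_1 over the Stiefel
    manifold: Euclidean proximal-gradient step, then retract."""
    V = soft_threshold(X - t * grad_f, t * lam)  # unconstrained prox step
    return polar_retraction(V)
```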

    A Unified Primal Dual Active Set Algorithm for Nonconvex Sparse Recovery

    In this paper, we consider the problem of recovering a sparse signal based on penalized least squares formulations. We develop a novel algorithm of primal-dual active set type for a class of nonconvex sparsity-promoting penalties, including the $\ell^0$, bridge, smoothly clipped absolute deviation, capped $\ell^1$, and minimax concave penalties. First, we establish the existence of a global minimizer for the related optimization problems. Then we derive a novel necessary optimality condition for the global minimizer using the associated thresholding operator. The solutions to the optimality system are coordinate-wise minimizers, and under minor conditions, they are also local minimizers. Upon introducing the dual variable, the active set can be determined from the primal and dual variables together. Further, this relation lends itself to an iterative algorithm of active set type which, at each step, first updates the primal variable on the active set only and then updates the dual variable explicitly. When combined with a continuation strategy on the regularization parameter, the primal-dual active set method is shown to converge globally to the underlying regression target under certain regularity conditions. Extensive numerical experiments with both simulated and real data demonstrate its superior performance in efficiency and accuracy compared with existing sparse recovery methods.
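    For the $\ell^0$ penalty, the associated thresholding operator is hard thresholding at $\sqrt{2\lambda}$, and the active-set iteration takes a particularly simple form. The sketch below is a schematic of that iteration for $\min_x \tfrac12\|\Psi x - y\|^2 + \lambda\|x\|_0$, assuming normalized columns of $\Psi$ and omitting the continuation strategy on $\lambda$.

```python
import numpy as np

def pdas_l0(Psi, y, lam, max_iter=50):
    """Primal-dual active set sketch for
    min_x 0.5*||Psi x - y||^2 + lam*||x||_0
    (columns of Psi assumed normalized; threshold sqrt(2*lam))."""
    n = Psi.shape[1]
    x = np.zeros(n)
    d = Psi.T @ y                       # dual variable at x = 0
    active = np.zeros(n, dtype=bool)
    for _ in range(max_iter):
        new_active = np.abs(x + d) > np.sqrt(2.0 * lam)  # thresholding rule
        if np.array_equal(new_active, active):
            break                       # active set stabilized: done
        active = new_active
        x = np.zeros(n)
        # Primal update: least squares restricted to the active set.
        x[active] = np.linalg.lstsq(Psi[:, active], y, rcond=None)[0]
        d = Psi.T @ (y - Psi @ x)       # explicit dual update
    return x
```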

    Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent

    We address the rectangular matrix completion problem by lifting the unknown matrix to a positive semidefinite matrix in higher dimension and optimizing a nonconvex objective over the semidefinite factor using a simple gradient descent scheme. With $O(\mu r^2 \kappa^2 n \max(\mu, \log n))$ random observations of an $n_1 \times n_2$ $\mu$-incoherent matrix of rank $r$ and condition number $\kappa$, where $n = \max(n_1, n_2)$, the algorithm linearly converges to the global optimum with high probability.
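    Optimizing over the semidefinite lift is equivalent to working directly with the two factors. Below is a minimal sketch of factored (Burer-Monteiro-style) gradient descent for matrix completion, with a balancing regularizer of the kind commonly used in analyses; the step size, weight gamma, and initialization scale are illustrative assumptions.

```python
import numpy as np

def mc_burer_monteiro(M_obs, mask, r, eta=0.01, iters=1000, gamma=0.25):
    """Gradient descent on the factored matrix-completion loss
    0.5*||mask*(U V^T - M)||_F^2 + (gamma/4)*||U^T U - V^T V||_F^2,
    where the second term keeps the two factors balanced."""
    n1, n2 = M_obs.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n1, r)) / np.sqrt(n1)
    V = rng.standard_normal((n2, r)) / np.sqrt(n2)
    for _ in range(iters):
        R = mask * (U @ V.T - M_obs)     # residual on observed entries
        B = U.T @ U - V.T @ V            # factor imbalance
        gU = R @ V + gamma * U @ B       # gradient w.r.t. U
        gV = R.T @ U - gamma * V @ B     # gradient w.r.t. V
        U -= eta * gU
        V -= eta * gV
    return U, V
```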

    How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?

    When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP)---i.e., they are approximately norm-preserving---the problem is known to contain no spurious local minima, so exact recovery is guaranteed. In this paper, we show that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that every $x$ is the spurious local minimum of a rank-1 instance of matrix recovery that satisfies RIP. One specific counterexample has RIP constant $\delta = 1/2$ but causes randomly initialized stochastic gradient descent (SGD) to fail 12% of the time. SGD is frequently able to avoid and escape spurious local minima, but this empirical result shows that it can occasionally be defeated by their existence. Hence, while exact recovery guarantees will likely require a proof of no spurious local minima, arguments based solely on norm preservation will only be applicable to a narrow set of nearly isotropic instances.
    Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018).