
    On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions

    The alternating gradient descent (AGD) is a simple but popular algorithm which has been applied to problems in optimization, machine learning, data mining, signal processing, and other areas. The algorithm updates two blocks of variables in an alternating manner, taking a gradient step on one block while keeping the other block fixed. When the objective function is nonconvex, it is well known that AGD converges to a first-order stationary solution at a global sublinear rate. In this paper, we show that a variant of AGD-type algorithms will not be trapped by bad stationary solutions such as saddle points and local maximum points. In particular, we consider a smooth unconstrained optimization problem and propose a perturbed AGD (PA-GD) which converges (with high probability) to the set of second-order stationary solutions (SS2) at a global sublinear rate. To the best of our knowledge, this is the first alternating-type algorithm which takes O(polylog(d)/ϵ^{7/3}) iterations to achieve SS2 with high probability, where polylog(d) is a polynomial of the logarithm of the problem dimension d.
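
    The following is a minimal sketch of the basic alternating update described in this abstract: two blocks of variables, each updated by a gradient step while the other is held fixed. The objective, its block gradients, and the step size below are illustrative placeholders, not the paper's exact setting.

import numpy as np

def agd(grad_x, grad_y, x, y, eta=0.01, num_iters=1000, tol=1e-6):
    """Plain alternating gradient descent on two blocks of variables."""
    for _ in range(num_iters):
        x = x - eta * grad_x(x, y)            # gradient step on block x, y fixed
        y = y - eta * grad_y(x, y)            # gradient step on block y, updated x fixed
        g = np.concatenate([grad_x(x, y), grad_y(x, y)])
        if np.linalg.norm(g) < tol:           # approximate first-order stationarity
            break
    return x, y

# Toy nonconvex coupling f(x, y) = 0.5 * ||x||^2 * ||y||^2 - x.dot(y)
gx = lambda x, y: x * np.dot(y, y) - y        # block gradient w.r.t. x
gy = lambda x, y: y * np.dot(x, x) - x        # block gradient w.r.t. y
x0, y0 = np.random.randn(5), np.random.randn(5)
x_hat, y_hat = agd(gx, gy, x0, y0)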

    First-order methods of solving nonconvex optimization problems: Algorithms, convergence, and optimality

    First-order methods for solving large-scale nonconvex problems have been applied in many areas of machine learning, such as matrix factorization, dictionary learning, matrix sensing/completion, and training deep neural networks. For example, matrix factorization problems have many important applications in document clustering, community detection, and image segmentation. In this dissertation, we first study some novel nonconvex variable splitting methods for solving matrix factorization problems, mainly focusing on symmetric non-negative matrix factorization (SymNMF) and stochastic SymNMF. For the SymNMF problem, the proposed algorithm, called nonconvex splitting SymNMF (NS-SymNMF), is guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points of the nonconvex SymNMF problem. Furthermore, it achieves a global sublinear convergence rate. We also show that the algorithm can be efficiently implemented in a distributed manner. Further, sufficient conditions are provided which guarantee the global and local optimality of the obtained solutions. Extensive numerical results on both synthetic and real data sets suggest that the proposed algorithm converges quickly to a local minimum solution. Furthermore, we consider a stochastic SymNMF problem in which the observation matrix is generated in a random and sequential manner. The proposed stochastic nonconvex splitting method not only guarantees convergence to the set of stationary points of the problem (in the mean-square sense), but further achieves a sublinear convergence rate. Numerical results show that for clustering problems over both synthetic and real-world datasets, the proposed algorithm converges quickly to the set of stationary points.

    When the objective function is nonconvex, it is well known that most first-order algorithms converge to a first-order stationary solution (SS1) at a global sublinear rate. Whether a first-order algorithm can converge to the second-order stationary points (SS2) with some provable rate has attracted a lot of attention recently. In particular, we study the alternating gradient descent (AGD) algorithm as an example, which is a simple but popular algorithm that has been applied to problems in optimization, machine learning, data mining, and signal processing. The algorithm updates two blocks of variables in an alternating manner, taking a gradient step on one block while keeping the other block fixed. In this work, we show that a variant of AGD-type algorithms will not be trapped by "bad" stationary solutions such as saddle points and local maximum points. In particular, we consider a smooth unconstrained nonconvex optimization problem and propose a perturbed AGD (PA-GD) which converges (with high probability) to the set of SS2 at a global sublinear rate. To the best of our knowledge, this is the first alternating-type algorithm which is guaranteed to achieve SS2 points with high probability, and the corresponding convergence rate is O(polylog(d)/ϵ^{7/3}), where polylog(d) is a polynomial of the logarithm of the problem dimension d.
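
    As a point of reference for the SymNMF part of the abstract, the sketch below minimizes the SymNMF objective min_{X >= 0} ||A - X X^T||_F^2 with a plain projected gradient method. This is only an illustrative baseline under an assumed step size and rank; it is not the proposed NS-SymNMF splitting algorithm or its stochastic variant.

import numpy as np

def symnmf_pgd(A, r, eta=1e-3, num_iters=2000, seed=0):
    """Projected gradient descent for SymNMF: approximate A by X @ X.T with X >= 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = np.abs(rng.standard_normal((n, r)))       # nonnegative initialization
    for _ in range(num_iters):
        grad = 4.0 * (X @ X.T - A) @ X            # gradient of ||A - X X^T||_F^2 (A symmetric)
        X = np.maximum(X - eta * grad, 0.0)       # project onto the nonnegative orthant
    return X

# Example: approximate a synthetic symmetric nonnegative matrix
rng = np.random.default_rng(1)
B = np.abs(rng.standard_normal((20, 3)))
A = B @ B.T
X_hat = symnmf_pgd(A, r=3)
print(np.linalg.norm(A - X_hat @ X_hat.T))        # residual should be small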

    PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization

    Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. In this paper, we consider a smooth unconstrained nonconvex optimization problem and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points (SOSPs) at a global sublinear rate. Existing analyses of A-GD-type algorithms either guarantee only convergence to first-order solutions, or establish convergence to second-order solutions only asymptotically (without rates). To the best of our knowledge, this is the first alternating-type algorithm that takes O(polylog(d)/ϵ^2) iterations to achieve an (ϵ, √ϵ)-SOSP with high probability, where polylog(d) denotes a polynomial of the logarithm of the problem dimension d.
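
    The sketch below illustrates the perturbation mechanism the abstract refers to: run alternating gradient steps and, whenever the iterate looks first-order stationary (small gradient norm) and has not been perturbed recently, add a small random perturbation so the iterates can escape strict saddle points. The thresholds, perturbation radius, and step size are illustrative stand-ins, not the constants used in the paper's analysis.

import numpy as np

def pa_gd(grad_x, grad_y, x, y, eta=0.01, num_iters=2000,
          grad_tol=1e-3, radius=1e-2, cooldown=50, seed=0):
    """Alternating gradient descent with occasional random perturbations."""
    rng = np.random.default_rng(seed)
    last_perturb = -cooldown
    for t in range(num_iters):
        x = x - eta * grad_x(x, y)                    # gradient step on block x
        y = y - eta * grad_y(x, y)                    # gradient step on block y
        g = np.concatenate([grad_x(x, y), grad_y(x, y)])
        if np.linalg.norm(g) < grad_tol and t - last_perturb >= cooldown:
            # Near-stationary iterate: perturb to escape possible saddle points
            x = x + radius * rng.standard_normal(x.shape)
            y = y + radius * rng.standard_normal(y.shape)
            last_perturb = t
    return x, y

# f(x, y) = 0.25 * (||x||^2 + ||y||^2 - 1)^2 has a bad stationary point at 0;
# plain A-GD started there stays put, while the perturbed iterates move toward
# the minimizers on the unit sphere.
gx = lambda x, y: x * (np.dot(x, x) + np.dot(y, y) - 1)
gy = lambda x, y: y * (np.dot(x, x) + np.dot(y, y) - 1)
x_hat, y_hat = pa_gd(gx, gy, np.zeros(2), np.zeros(2))
print(np.dot(x_hat, x_hat) + np.dot(y_hat, y_hat))    # close to 1 after escaping 0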