25 research outputs found

    Stochastic Methods for Composite and Weakly Convex Optimization Problems

    We consider minimization of stochastic functionals that are compositions of a (potentially) non-smooth convex function $h$ and a smooth function $c$ and, more generally, stochastic weakly convex functionals. We develop a family of stochastic methods, including a stochastic prox-linear algorithm and a stochastic (generalized) subgradient procedure, and prove that, under mild technical conditions, each converges to first-order stationary points of the stochastic objective. We provide experiments further investigating our methods on non-smooth phase retrieval problems; the experiments indicate the practical effectiveness of the procedures.
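    For orientation, the composite problem and a prox-linear step of the kind this abstract describes can be written as follows. This is a sketch in standard notation, not taken from the abstract: the sample $S$, the constraint set $X$, the stepsize $\alpha_k$ and the weak-convexity constant $\lambda$ are assumed symbols.

```latex
% Composite stochastic objective: h convex (possibly non-smooth), c smooth.
\[
  \min_{x \in X} \; f(x) = \mathbb{E}\big[\, h\big(c(x; S); S\big) \big]
\]
% One stochastic prox-linear step: linearize c at x_k, keep h intact,
% and add a quadratic proximity term with stepsize \alpha_k > 0.
\[
  x_{k+1} = \operatorname*{argmin}_{x \in X}
  \Big\{ h\big( c(x_k; S_k) + \nabla c(x_k; S_k)^{\top} (x - x_k);\, S_k \big)
         + \tfrac{1}{2\alpha_k} \, \| x - x_k \|_2^2 \Big\}
\]
% Weak convexity: f is \lambda-weakly convex if
% x \mapsto f(x) + \tfrac{\lambda}{2} \| x \|_2^2 is convex.
```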

    Minimization of nonsmooth nonconvex functions using inexact evaluations and its worst-case complexity

    An adaptive regularization algorithm using inexact function and derivatives evaluations is proposed for the solution of composite nonsmooth nonconvex optimization. It is shown that this algorithm needs at most $O(|\log(\epsilon)|\,\epsilon^{-2})$ evaluations of the problem's functions and their derivatives for finding an $\epsilon$-approximate first-order stationary point. This complexity bound therefore generalizes that provided by [Bellavia, Gurioli, Morini and Toint, 2018] for inexact methods for smooth nonconvex problems, and is within a factor $|\log(\epsilon)|$ of the optimal bound known for smooth and nonsmooth nonconvex minimization with exact evaluations. A practically more restrictive variant of the algorithm with worst-case complexity $O(|\log(\epsilon)|+\epsilon^{-2})$ is also presented. Comment: 19 pages

    Convergence of a Stochastic Subgradient Method with Averaging for Nonsmooth Nonconvex Constrained Optimization

    We prove convergence of a single time-scale stochastic subgradient method with subgradient averaging for constrained problems with a nonsmooth and nonconvex objective function having the property of generalized differentiability. As a tool of our analysis, we also prove a chain rule on a path for such functions.

    Automatic Registration and Clustering of Time Series

    Clustering of time series data exhibits a number of challenges not present in other settings, notably the problem of registration (alignment) of observed signals. Typical approaches include pre-registration to a user-specified template or time warping approaches which attempt to optimally align series with a minimum of distortion. For many signals obtained from recording or sensing devices, these methods may be unsuitable as a template signal is not available for pre-registration, while the distortion of warping approaches may obscure meaningful temporal information. We propose a new method for automatic time series alignment within a clustering problem. Our approach, Temporal Registration using Optimal Unitary Transformations (TROUT), is based on a novel dissimilarity measure between time series that is easy to compute and automatically identifies optimal alignment between pairs of time series. By embedding our new measure in an optimization formulation, we retain well-known advantages of computational and statistical performance. We provide an efficient algorithm for TROUT-based clustering and demonstrate its superior performance over a range of competitors. Comment: To appear in ICASSP 202

    Zero Order Stochastic Weakly Convex Composite Optimization

    In this paper we consider stochastic weakly convex composite problems for which no stochastic subgradient oracle is available. We present a derivative-free algorithm that uses a two-point approximation to compute a gradient estimate of the smoothed function. We prove convergence at a rate similar to that of state-of-the-art methods, albeit with a larger constant, and report numerical results showing the effectiveness of the approach.
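    A two-point gradient estimate of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes Gaussian smoothing with a forward difference, and the names `two_point_gradient` and `zeroth_order_descent`, the smoothing radius `mu` and the constant step `lr` are all hypothetical.

```python
import numpy as np


def two_point_gradient(f, x, mu=1e-4, rng=None):
    """Estimate the gradient of a Gaussian-smoothed version of f at x.

    Uses only two function values, f(x + mu * u) and f(x), where u is a
    standard normal direction (an assumed smoothing scheme).
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)          # random search direction
    return (f(x + mu * u) - f(x)) / mu * u    # finite difference along u


def zeroth_order_descent(f, x0, steps=2000, lr=0.01, mu=1e-4, seed=0):
    """Derivative-free descent loop with a constant stepsize (assumed)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x -= lr * two_point_gradient(f, x, mu, rng)
    return x


if __name__ == "__main__":
    # Non-smooth test objective chosen for illustration only.
    objective = lambda x: float(np.sum(np.abs(x - 1.0)))
    x_final = zeroth_order_descent(objective, np.zeros(3))
    print(objective(np.zeros(3)), objective(x_final))
```

    The loop never evaluates a subgradient. Each iteration costs two function evaluations, and the larger noise of the estimate is one plausible source of the larger constant mentioned in the abstract.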

    ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems and GANs

    Despite remarkable empirical success, the training dynamics of generative adversarial networks (GANs), which involve solving a minimax game using stochastic gradients, are still poorly understood. In this work, we analyze last-iterate convergence of simultaneous gradient descent (simGD) and its variants under the assumption of convex-concavity, guided by a continuous-time analysis with differential equations. First, we show that simGD, as is, converges with stochastic subgradients under strict convexity in the primal variable. Second, we generalize optimistic simGD to accommodate an optimism rate separate from the learning rate and show its convergence with full gradients. Finally, we present anchored simGD, a new method, and show convergence with stochastic subgradients.

    Adaptive First- and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems

    In this paper, we design and analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) stochastic optimization problems. Adaptive methods that use exponential moving averages of past gradients to update search directions and learning rates have recently attracted a lot of attention for solving optimization problems that arise in machine learning. Nevertheless, their convergence analysis almost exclusively requires smoothness and/or convexity of the objective function. In contrast, we establish non-asymptotic rates of convergence of first- and zeroth-order adaptive methods and their proximal variants for a reasonably broad class of nonsmooth and nonconvex optimization problems. Experimental results indicate that the proposed algorithms empirically outperform stochastic gradient descent and its zeroth-order variant for solving such optimization problems.

    Learning Latent Features with Pairwise Penalties in Low-Rank Matrix Completion

    Low-rank matrix completion has achieved great success in many real-world data applications. A matrix factorization model that learns latent features is usually employed and, to improve prediction performance, the similarities between latent variables can be exploited by pairwise learning using the graph regularized matrix factorization (GRMF) method. However, existing GRMF approaches often use the squared loss to measure the pairwise differences, which may be overly influenced by dissimilar pairs and lead to inferior prediction. To fully empower pairwise learning for matrix completion, we propose a general optimization framework that allows a rich class of (non-)convex pairwise penalty functions. A new and efficient algorithm is developed to solve the proposed optimization problem, with a theoretical convergence guarantee under mild assumptions. In an important situation where the latent variables form a small number of subgroups, its statistical guarantee is also fully considered. In particular, we theoretically characterize the performance of the complexity-regularized maximum likelihood estimator, as a special case of our framework, which is shown to have smaller errors when compared to the standard matrix completion framework without pairwise penalties. We conduct extensive experiments on both synthetic and real datasets to demonstrate the superior performance of this general framework.

    A Manifold Proximal Linear Method for Sparse Spectral Clustering with Application to Single-Cell RNA Sequencing Data Analysis

    Spectral clustering is one of the fundamental unsupervised learning methods widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity on the spectral clustering, which improves the interpretability of the model. This paper considers a widely adopted model for SSC, which can be formulated as an optimization problem over the Stiefel manifold with a nonsmooth and nonconvex objective. Such an optimization problem is very challenging to solve. Existing methods usually solve its convex relaxation or need to smooth its nonsmooth part using certain smoothing techniques. In this paper, we propose a manifold proximal linear method (ManPL) that solves the original SSC formulation. We also extend the algorithm to solve the multiple-kernel SSC problems, for which an alternating ManPL algorithm is proposed. Convergence and iteration complexity results of the proposed methods are established. We demonstrate the advantage of our proposed methods over existing methods via the single-cell RNA sequencing data analysis.

    A Stochastic Subgradient Method for Nonsmooth Nonconvex Multi-Level Composition Optimization

    We propose a single time-scale stochastic subgradient method for constrained optimization of a composition of several nonsmooth and nonconvex functions. The functions are assumed to be locally Lipschitz and differentiable in a generalized sense. Only stochastic estimates of the values and generalized derivatives of the functions are used. The method is parameter-free. We prove convergence with probability one of the method, by associating with it a system of differential inclusions and devising a nondifferentiable Lyapunov function for this system. For problems with functions having Lipschitz continuous derivatives, the method finds a point satisfying an optimality measure with error of order $1/\sqrt{N}$, after executing $N$ iterations with constant stepsize.