
    Penalty Methods with Stochastic Approximation for Stochastic Nonlinear Programming

    In this paper, we propose a class of penalty methods with stochastic approximation for solving stochastic nonlinear programming problems. We assume that only noisy gradients or function values of the objective function are available via calls to a stochastic first-order or zeroth-order oracle. In each iteration of the proposed methods, we minimize an exact penalty function, which is nonsmooth and nonconvex, using only stochastic first-order or zeroth-order information. Stochastic approximation algorithms are presented for solving this subproblem. The worst-case complexity, in calls to the stochastic first-order (or zeroth-order) oracle, for the proposed penalty methods to obtain an ε-stochastic critical point is analyzed.
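    As a rough sketch of the kind of method described above, the snippet below minimizes a toy objective under a single equality constraint by applying stochastic gradient steps to an exact ℓ1 penalty, with gradients replaced by a two-point Gaussian-smoothing estimate built from noisy function values only. The toy problem, step sizes, and penalty schedule are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_f(x):
    # Stochastic zeroth-order oracle for the objective (toy example).
    return np.sum(x**2) + 0.01 * rng.standard_normal()

def c(x):
    # Single equality constraint c(x) = 0.
    return np.sum(x) - 1.0

def penalty(x, rho):
    # Exact (nonsmooth) l1 penalty function.
    return noisy_f(x) + rho * abs(c(x))

def zo_gradient(x, rho, mu=1e-4, samples=20):
    # Two-point Gaussian-smoothing estimate of the penalty gradient.
    g = np.zeros_like(x)
    for _ in range(samples):
        u = rng.standard_normal(x.shape)
        g += (penalty(x + mu * u, rho) - penalty(x, rho)) / mu * u
    return g / samples

x = rng.standard_normal(5)
rho, step = 1.0, 0.05
for k in range(500):
    x -= step / np.sqrt(k + 1) * zo_gradient(x, rho)
    rho = min(rho * 1.01, 100.0)  # gradually tighten the penalty

print("x:", np.round(x, 3), "constraint:", round(float(c(x)), 4))
```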

    Zeroth-Order Stochastic Block Coordinate Type Methods for Nonconvex Optimization

    We study (constrained) nonconvex (composite) optimization problems in which the vector of decision variables can be split into blocks. Random block projection is a popular technique for such problems because it markedly reduces the computational cost of the projection step. However, this technique has not previously been extended to the setting where first-order information is unavailable and only zeroth-order information is accessible. In this paper, we develop several classes of zeroth-order stochastic block coordinate methods. Zeroth-order stochastic block coordinate descent (ZS-BCD) is proposed for solving unconstrained nonconvex optimization problems. For composite optimization, we establish the zeroth-order stochastic block mirror descent (ZS-BMD) method and its associated two-phase method, which achieves a complexity bound for finding an (ε, Λ)-solution. Furthermore, we establish the zeroth-order stochastic block coordinate conditional gradient (ZS-BCCG) method for nonconvex (composite) optimization. In each iteration of ZS-BCCG, only an (approximate) linear programming subproblem needs to be solved on a random block, instead of a rather costly projection subproblem on the whole decision space, in contrast to existing stochastic approximation methods. We then propose an approximate ZS-BCCG method and a corresponding two-phase ZS-BCCG method. This is the first time a two-phase BCCG method has been developed to achieve an (ε, Λ)-solution of a nonconvex composite optimization problem. To the best of our knowledge, the results in this paper are new to the stochastic nonconvex (composite) optimization literature.
    Comment: 39 pages
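    A minimal sketch of the ZS-BCD idea, under assumptions of our own choosing (toy objective, two fixed blocks, diminishing steps): in each iteration only one random block of coordinates is perturbed and updated from a single pair of noisy function values.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_f(x):
    # Stochastic zeroth-order oracle (toy nonconvex objective).
    return np.sum(np.sin(x) + 0.5 * x**2) + 0.01 * rng.standard_normal()

def zs_bcd(x0, blocks, steps=2000, mu=1e-4, lr=0.1):
    # Zeroth-order stochastic block coordinate descent (illustrative).
    x = x0.copy()
    for k in range(steps):
        blk = blocks[rng.integers(len(blocks))]  # pick a random block
        u = np.zeros_like(x)
        u[blk] = rng.standard_normal(len(blk))   # perturb only that block
        g = (noisy_f(x + mu * u) - noisy_f(x)) / mu
        x[blk] -= lr / np.sqrt(k + 1) * g * u[blk]
    return x

x0 = rng.standard_normal(8)
blocks = [np.arange(0, 4), np.arange(4, 8)]      # two coordinate blocks
print(np.round(zs_bcd(x0, blocks), 3))
```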

    Zeroth-order Nonconvex Stochastic Optimization: Handling Constraints, High-Dimensionality and Saddle-Points

    In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, focusing on constrained optimization, high-dimensional settings, and saddle-point avoidance. To handle constraints, we first propose generalizations of the conditional gradient algorithm that achieve, using only zeroth-order information, rates similar to those of the standard stochastic gradient algorithm. To facilitate zeroth-order optimization in high dimensions, we explore the advantages of structural sparsity assumptions. Specifically, (i) we highlight an implicit regularization phenomenon whereby the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand simply by varying the step size, and (ii) we propose a truncated stochastic gradient algorithm with zeroth-order information whose rate of convergence depends only poly-logarithmically on the dimension. We next focus on avoiding saddle points in the nonconvex setting. To that end, we interpret the Gaussian smoothing technique for estimating gradients from zeroth-order information as an instantiation of the first-order Stein's identity. Building on this, we provide a novel linear-in-dimension-time estimator of the Hessian matrix of a function using only zeroth-order information, based on the second-order Stein's identity. We then provide an algorithm for avoiding saddle points, based on a zeroth-order cubic-regularized Newton's method, and discuss its convergence rates.
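    The two Stein-identity estimators mentioned above can be made concrete. The sketch below uses symmetric function-value differences along Gaussian directions to estimate the gradient (first-order Stein's identity) and the Hessian (second-order Stein's identity) of a toy quadratic whose Hessian is known, so the estimation error can be checked; the sample sizes and smoothing radius are arbitrary choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
A = np.diag(np.arange(1.0, d + 1))  # toy quadratic; true Hessian is A

def f(x):
    return 0.5 * x @ A @ x

def stein_gradient(x, mu=1e-2, n=5000):
    # First-order Stein identity via symmetric differences:
    # E[(f(x+mu*u) - f(x-mu*u)) / (2*mu) * u] -> grad f(x).
    g = np.zeros(d)
    for _ in range(n):
        u = rng.standard_normal(d)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n

def stein_hessian(x, mu=1e-2, n=20000):
    # Second-order Stein identity:
    # E[(f(x+mu*u) + f(x-mu*u) - 2 f(x)) / (2*mu^2) * (u u^T - I)] -> Hessian.
    H = np.zeros((d, d))
    fx = f(x)
    for _ in range(n):
        u = rng.standard_normal(d)
        H += (f(x + mu * u) + f(x - mu * u) - 2 * fx) / (2 * mu**2) \
             * (np.outer(u, u) - np.eye(d))
    return H / n

x = rng.standard_normal(d)
print("gradient error:", np.linalg.norm(stein_gradient(x) - A @ x))
print("Hessian error:", np.linalg.norm(stein_hessian(x) - A))
```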

    A Proximal Zeroth-Order Algorithm for Nonconvex Nonsmooth Problems

    In this paper, we focus on solving an important class of nonconvex optimization problems, which includes, for example, signal processing over a networked multi-agent system and distributed learning over networks. Motivated by applications in which the local objective function is the sum of a smooth but possibly nonconvex part and a nonsmooth but convex part, subject to a linear equality constraint, this paper proposes a proximal zeroth-order primal-dual algorithm (PZO-PDA) that accounts for the information structure of the problem. The algorithm uses only zeroth-order information (i.e., function values) of the smooth functions, yet it remains applicable when only noisy evaluations of the objective are accessible, where classical methods cannot be applied. We prove convergence of PZO-PDA and establish its rate of convergence. Numerical experiments are provided to validate the theoretical results.
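    A loose sketch of a proximal zeroth-order primal-dual iteration in the spirit described above, not the authors' exact PZO-PDA updates: the smooth part is accessed only through function values, the nonsmooth convex part (here an assumed ℓ1 term) is handled by its proximal operator, and a multiplier is updated for the linear equality constraint. All constants are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 10, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def f(x):
    # Smooth nonconvex part, accessed through function values only.
    return np.sum(np.log(1.0 + x**2))

def zo_grad(x, mu=1e-4, samples=20):
    # Two-point Gaussian-smoothing gradient estimate of f.
    g = np.zeros_like(x)
    for _ in range(samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / samples

def soft_threshold(v, t):
    # Prox of the (assumed) nonsmooth convex part h(x) = t * ||x||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x, lam = np.zeros(n), np.zeros(m)
alpha, beta, reg = 0.05, 0.05, 0.01
for _ in range(2000):
    # Primal: ZO gradient of f plus augmented-Lagrangian term, then prox.
    grad = zo_grad(x) + A.T @ (lam + beta * (A @ x - b))
    x = soft_threshold(x - alpha * grad, alpha * reg)
    lam += beta * (A @ x - b)  # dual ascent on the equality constraint

print("||Ax - b|| =", round(float(np.linalg.norm(A @ x - b)), 4))
```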

    Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization

    This paper considers a class of constrained stochastic composite optimization problems whose objective function is the sum of a differentiable (possibly nonconvex) component and a non-differentiable (but convex) component. To solve these problems, we propose a randomized stochastic projected gradient (RSPG) algorithm, in which a proper mini-batch of samples is taken at each iteration depending on the total budget of stochastic samples allowed. The RSPG algorithm also employs a general distance function, which allows it to exploit the geometry of the feasible region. The complexity of the algorithm is established in a unified setting, showing nearly optimal complexity for convex stochastic programming. A post-optimization phase is also proposed to significantly reduce the variance of the solutions returned by the algorithm. In addition, based on the RSPG algorithm, a stochastic gradient-free algorithm that uses only stochastic zeroth-order information is also discussed. Some preliminary numerical results are provided.
    Comment: 32 pages
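    A compact sketch of the RSPG pattern under toy assumptions (a box feasible set, so the general distance function reduces to a Euclidean projection): mini-batch stochastic gradients, a projected step, and a uniformly random iterate returned at the end. The objective, batch size, and step size are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)

def stochastic_grad(x, batch):
    # Mini-batch of noisy gradients of the toy objective sin(x) + x^2/2.
    g = np.cos(x) + x
    noise = rng.standard_normal((batch, x.size)).mean(axis=0)
    return g + 0.1 * noise

def project_box(x, lo=-2.0, hi=2.0):
    # Euclidean projection onto a box feasible region.
    return np.clip(x, lo, hi)

def rspg(x0, iters=400, batch=16, lr=0.1):
    x, iterates = x0.copy(), []
    for _ in range(iters):
        x = project_box(x - lr * stochastic_grad(x, batch))
        iterates.append(x.copy())
    # Return a uniformly random iterate, the hallmark of randomized schemes.
    return iterates[rng.integers(iters)]

print(np.round(rspg(rng.standard_normal(6)), 3))
```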

    Adaptive First- and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems

    In this paper, we design and analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) stochastic optimization problems. Adaptive methods that use exponential moving averages of past gradients to update search directions and learning rates have recently attracted much attention for optimization problems arising in machine learning, but their convergence analysis almost exclusively requires smoothness and/or convexity of the objective function. In contrast, we establish non-asymptotic rates of convergence for first- and zeroth-order adaptive methods and their proximal variants on a reasonably broad class of nonsmooth and nonconvex optimization problems. Experimental results indicate that the proposed algorithms empirically outperform stochastic gradient descent and its zeroth-order variant on such problems.
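    To make the exponential moving averages concrete, here is a minimal Adam-style zeroth-order variant on a weakly convex toy objective. The moment constants, smoothing radius, and objective are illustrative assumptions; the paper's methods and guarantees are more general.

```python
import numpy as np

rng = np.random.default_rng(5)

def noisy_f(x):
    # Weakly convex toy objective with a nonsmooth kink, noisy oracle.
    return np.sum(np.abs(x) + 0.1 * x**2) + 0.01 * rng.standard_normal()

def zo_grad(x, mu=1e-4):
    # Single two-point zeroth-order gradient estimate.
    u = rng.standard_normal(x.shape)
    return (noisy_f(x + mu * u) - noisy_f(x)) / mu * u

def adaptive_zo(x0, iters=3000, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    # Adam-style exponential moving averages driven by ZO estimates.
    x, m, v = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
    for t in range(1, iters + 1):
        g = zo_grad(x)
        m = b1 * m + (1 - b1) * g      # first-moment average
        v = b2 * v + (1 - b2) * g**2   # second-moment average
        mhat, vhat = m / (1 - b1**t), v / (1 - b2**t)
        x -= lr * mhat / (np.sqrt(vhat) + eps)
    return x

print(np.round(adaptive_zo(rng.standard_normal(5)), 3))
```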

    Gradient-Free Multi-Agent Nonconvex Nonsmooth Optimization

    In this paper, we consider the problem of minimizing the sum of nonconvex and possibly nonsmooth functions over a connected multi-agent network, where the agents have partial knowledge of the global cost function and can access only zeroth-order information (i.e., function values) of their local cost functions. We propose and analyze a distributed primal-dual gradient-free algorithm for this challenging problem. We show that, with appropriately chosen parameters, the proposed algorithm converges to the set of first-order stationary solutions at a provable global sublinear rate. Numerical experiments demonstrate the effectiveness of our proposed method for optimizing nonconvex and nonsmooth problems over a network.
    Comment: Long version of CDC paper
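    A loose sketch, of our own construction rather than the paper's algorithm, of a distributed primal-dual iteration in which each agent sees only local function values: agents mix local copies through a doubly stochastic matrix, descend along zeroth-order gradient estimates, and update dual variables that penalize disagreement.

```python
import numpy as np

rng = np.random.default_rng(6)
n_agents, d = 4, 3
targets = rng.standard_normal((n_agents, d))

def local_f(i, x):
    # Agent i's private nonconvex, nonsmooth cost; value oracle only.
    return np.sum(np.abs(x - targets[i])) + 0.1 * np.sum(np.sin(x))

def zo_grad(i, x, mu=1e-4):
    u = rng.standard_normal(d)
    return (local_f(i, x + mu * u) - local_f(i, x)) / mu * u

# Doubly stochastic mixing matrix for a 4-agent ring.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

X = rng.standard_normal((n_agents, d))  # one local copy per agent
lam = np.zeros((n_agents, d))           # duals penalizing disagreement
for k in range(2000):
    G = np.array([zo_grad(i, X[i]) for i in range(n_agents)])
    X = W @ X - 0.05 / np.sqrt(k + 1) * (G + lam)  # mix, then descend
    lam += 0.1 * (X - W @ X)  # ascend on the consensus violation

print("disagreement:", round(float(np.linalg.norm(X - X.mean(axis=0))), 4))
```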

    Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization

    The alternating direction method of multipliers (ADMM) is a popular optimization tool for composite and constrained problems in machine learning. However, in many machine learning problems, such as black-box attacks and bandit feedback, ADMM can fail because explicit gradients are difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can effectively solve these problems because they require only objective function values. Although a few zeroth-order ADMM methods exist, they rely on convexity of the objective function and are therefore limited in many applications. In this paper, we thus propose a class of fast zeroth-order stochastic ADMM methods (ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems with multiple nonsmooth penalties, based on the coordinate smoothing gradient estimator. Moreover, we prove that both ZO-SVRG-ADMM and ZO-SAGA-ADMM have a convergence rate of O(1/T), where T denotes the number of iterations. In particular, our methods not only attain the best-known O(1/T) convergence rate for nonconvex optimization, but can also effectively solve many complex machine learning problems with multiple regularized penalties and constraints. Finally, we conduct experiments on black-box binary classification and structured adversarial attacks on black-box deep neural networks to validate the efficiency of our algorithms.
    Comment: To appear in IJCAI 2019. Supplementary materials are added.
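    The coordinate smoothing gradient estimator and a linearized zeroth-order ADMM step can be sketched as follows. The splitting x = z with an assumed ℓ1 penalty, and all step-size constants, are illustrative choices rather than the ZO-SVRG-ADMM/ZO-SAGA-ADMM updates, which additionally use variance reduction.

```python
import numpy as np

rng = np.random.default_rng(7)
d = 8
target = rng.standard_normal(d)

def f(x):
    # Smooth nonconvex loss, available through function values only.
    return np.sum((x - target)**2 / (1.0 + (x - target)**2))

def coord_zo_grad(x, mu=1e-5):
    # Coordinate smoothing estimator: central differences along each e_j.
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = mu
        g[j] = (f(x + e) - f(x - e)) / (2 * mu)
    return g

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# ADMM for min_x f(x) + reg * ||z||_1  subject to  x = z.
x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
rho, reg, eta = 1.0, 0.05, 0.2
for _ in range(500):
    # Linearized x-update: one ZO gradient step plus the quadratic terms.
    x = (rho * (z - u) + x / eta - coord_zo_grad(x)) * eta / (1 + rho * eta)
    z = soft_threshold(x + u, reg / rho)  # exact prox of the l1 penalty
    u += x - z                            # scaled dual update

print("nonzeros in z:", int(np.sum(z != 0)), "of", d)
```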

    Zeroth Order Nonconvex Multi-Agent Optimization over Networks

    In this paper, we consider distributed optimization problems over a multi-agent network in which each agent can only partially evaluate the objective function and may exchange messages with its immediate neighbors. Unlike existing works on distributed optimization, we focus on a class of nonconvex problems under the challenging setting where each agent can access only zeroth-order information (i.e., function values) of its local functions. For different network topologies, such as undirected connected networks and star networks, we develop efficient distributed algorithms and rigorously analyze their convergence and rate of convergence (to the set of stationary solutions). Numerical results are provided to demonstrate the efficiency of the proposed algorithms.
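    As a small illustration of the star-network case, the sketch below runs decentralized zeroth-order descent with two-point gradient estimates, where leaves communicate only with the hub through a doubly stochastic mixing matrix. The local costs, weights, and step sizes are invented for the demo, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(8)
n_agents, d = 5, 2
anchors = rng.standard_normal((n_agents, d))

def local_f(i, x):
    # Agent i's private nonconvex cost, exposed only through its values.
    return -np.exp(-np.sum((x - anchors[i])**2))

def two_point_grad(i, x, mu=1e-4):
    # Symmetric two-point zeroth-order gradient estimate.
    u = rng.standard_normal(d)
    return (local_f(i, x + mu * u) - local_f(i, x - mu * u)) / (2 * mu) * u

# Star network: agent 0 is the hub, agents 1..4 are leaves.
W = np.zeros((n_agents, n_agents))
W[0, :] = 1.0 / n_agents
W[:, 0] = 1.0 / n_agents
np.fill_diagonal(W, 0.0)
W += np.diag(1.0 - W.sum(axis=1))  # self-weights so rows sum to one

X = rng.standard_normal((n_agents, d))
for k in range(3000):
    G = np.array([two_point_grad(i, X[i]) for i in range(n_agents)])
    X = W @ X - 0.1 / np.sqrt(k + 1) * G  # mix with neighbors, descend

print("consensus point:", np.round(X.mean(axis=0), 3))
```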

    Semantics, Representations and Grammars for Deep Learning

    Deep learning is currently the subject of intensive study. However, fundamental concepts such as representations are not formally defined -- researchers "know them when they see them" -- and there is no common language for describing and analyzing algorithms. This essay proposes an abstract framework that identifies the essential features of current practice and may provide a foundation for future developments. The backbone of almost all deep learning algorithms is backpropagation, which is simply a gradient computation distributed over a neural network. The main ingredients of the framework are thus, unsurprisingly: (i) game theory, to formalize distributed optimization; and (ii) communication protocols, to track the flow of zeroth- and first-order information. The framework allows natural definitions of semantics (the meaning encoded in functions), representations (functions whose semantics are chosen to optimize a criterion), and grammars (communication protocols equipped with first-order convergence guarantees). Much of the essay is spent discussing examples taken from the literature. The ultimate aim is to develop a graphical language for describing the structure of deep learning algorithms that backgrounds the details of the optimization procedure and foregrounds how the components interact. Inspiration is taken from probabilistic graphical models and factor graphs, which capture the essential structural features of multivariate distributions.
    Comment: 20 pages, many diagrams