
    Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond

    Full text link
    In this paper, we provide near-optimal accelerated first-order methods for minimizing a broad class of smooth nonconvex functions that are strictly unimodal on all lines through a minimizer. This function class, which we call the class of smooth quasar-convex functions, is parameterized by a constant $\gamma \in (0,1]$, where $\gamma = 1$ encompasses the classes of smooth convex and star-convex functions, and smaller values of $\gamma$ indicate that the function can be "more nonconvex." We develop a variant of accelerated gradient descent that computes an $\epsilon$-approximate minimizer of a smooth $\gamma$-quasar-convex function with at most $O(\gamma^{-1} \epsilon^{-1/2} \log(\gamma^{-1} \epsilon^{-1}))$ total function and gradient evaluations. We also derive a lower bound of $\Omega(\gamma^{-1} \epsilon^{-1/2})$ on the number of gradient evaluations required by any deterministic first-order method in the worst case, showing that, up to a logarithmic factor, no deterministic first-order algorithm can improve upon ours.
    Comment: 37 pages
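    The class described in the abstract is usually specified by a single first-order inequality. As an illustration (stated here from the standard literature, not quoted from the paper), a smooth function $f$ is $\gamma$-quasar-convex with respect to a minimizer $x^\star$ if
        \[
          f(x^\star) \;\ge\; f(x) + \frac{1}{\gamma}\,\nabla f(x)^\top (x^\star - x)
          \qquad \text{for all } x, \quad \gamma \in (0,1],
        \]
    which holds with $\gamma = 1$ for smooth star-convex functions (and, in particular, for smooth convex functions), matching the parameterization above.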

    Quasiconvex Programming

    Full text link
    We define quasiconvex programming, a form of generalized linear programming in which one seeks the point minimizing the pointwise maximum of a collection of quasiconvex functions. We survey algorithms for solving quasiconvex programs either numerically or via generalizations of the dual simplex method from linear programming, and describe varied applications of this geometric optimization technique in meshing, scientific computation, information visualization, automated algorithm analysis, and robust statistics.
    Comment: 33 pages, 14 figures
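    As a toy illustration of the problem class (not of the generalized dual-simplex or numerical algorithms surveyed in the paper), the following Python sketch minimizes the pointwise maximum of a few one-dimensional quasiconvex functions by ternary search; the example functions are hypothetical.

        # Toy 1-D quasiconvex program: minimize x -> max_i f_i(x) on an interval.
        # The pointwise maximum of quasiconvex functions is again quasiconvex;
        # the examples below are convex, so the maximum is unimodal and ternary
        # search applies.

        def pointwise_max(fs, x):
            """Evaluate the objective max_i f_i(x)."""
            return max(f(x) for f in fs)

        def ternary_search(fs, lo, hi, tol=1e-9):
            """Minimize the unimodal function x -> max_i f_i(x) on [lo, hi]."""
            while hi - lo > tol:
                m1 = lo + (hi - lo) / 3.0
                m2 = hi - (hi - lo) / 3.0
                if pointwise_max(fs, m1) < pointwise_max(fs, m2):
                    hi = m2   # a minimizer lies in [lo, m2]
                else:
                    lo = m1   # a minimizer lies in [m1, hi]
            return 0.5 * (lo + hi)

        if __name__ == "__main__":
            fs = [lambda x: (x - 1.0) ** 2,      # convex, hence quasiconvex
                  lambda x: abs(x + 2.0),
                  lambda x: (x - 0.5) ** 4]
            x_star = ternary_search(fs, -10.0, 10.0)
            print(x_star, pointwise_max(fs, x_star))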

    A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

    Full text link
    Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting, where the elements to be combined are not centrally located but spread over a network. We address the key challenge of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error $\epsilon$ and the communication cost, neither of which depends on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication required to construct an $\epsilon$-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust.
    Comment: Extended version of the SIAM Data Mining 2015 paper
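    For context, the sketch below shows the classical (centralized) Frank-Wolfe update on an $\ell_1$ ball, a standard Lasso-type instance; it only illustrates why each iteration adds a single sparse atom, the property that a distributed variant can exploit to keep communication low. The distributed selection step of dFW and its guarantees are in the paper and are not reproduced here; the names and toy data are hypothetical.

        import numpy as np

        def frank_wolfe_l1(grad_f, w0, radius, n_iters=200):
            """Classical Frank-Wolfe on the L1 ball {w : ||w||_1 <= radius}.

            The linear minimization oracle over the L1 ball returns a signed,
            scaled basis vector, so each iteration touches one coordinate --
            in a distributed setting only an index and a coefficient would
            need to be exchanged per round.
            """
            w = w0.copy()
            for t in range(n_iters):
                g = grad_f(w)
                i = int(np.argmax(np.abs(g)))      # coordinate of the atom
                s = np.zeros_like(w)
                s[i] = -radius * np.sign(g[i])     # atom on the L1 ball boundary
                gamma = 2.0 / (t + 2.0)            # standard step-size schedule
                w = (1.0 - gamma) * w + gamma * s
            return w

        if __name__ == "__main__":
            # Toy least squares: min_w 0.5*||X w - y||^2  s.t.  ||w||_1 <= 1.
            rng = np.random.default_rng(0)
            X = rng.standard_normal((50, 20))
            w_true = np.zeros(20)
            w_true[:3] = [0.5, -0.3, 0.2]
            y = X @ w_true
            grad = lambda w: X.T @ (X @ w - y)
            print(np.round(frank_wolfe_l1(grad, np.zeros(20), radius=1.0), 3))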

    Approximating gradients with continuous piecewise polynomial functions

    Get PDF
    Motivated by conforming finite element methods for second-order elliptic problems, we analyze the approximation of the gradient of a target function by continuous piecewise polynomial functions over a simplicial mesh. The main result is that the global best approximation error is equivalent to an appropriate sum of the local best approximation errors on elements. Thus, requiring continuity does not downgrade local approximability, and discontinuous piecewise polynomials essentially offer no additional approximation power, even for a fixed mesh. This result implies error bounds in terms of piecewise regularity over the whole admissible smoothness range. Moreover, it allows for simple local error functionals in adaptive tree approximation of gradients.
    Comment: 21 pages, 1 figure
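    Schematically, with notation assumed here for illustration (the precise local functionals and constants are in the paper), the main result reads as an equivalence, up to constants depending only on shape regularity and the polynomial degree, between the global best error and the sum of local best errors:
        \[
          \inf_{v \in S^1_k(\mathcal{T})^d} \|\nabla u - v\|_{L^2(\Omega)}^2
          \;\eqsim\;
          \sum_{T \in \mathcal{T}} \inf_{p \in \mathbb{P}_k(T)^d} \|\nabla u - p\|_{L^2(T)}^2,
        \]
    where $\mathcal{T}$ is the simplicial mesh on $\Omega \subset \mathbb{R}^d$, $S^1_k(\mathcal{T})$ denotes the continuous piecewise polynomials of degree at most $k$, and $\mathbb{P}_k(T)$ the polynomials of degree at most $k$ on a single element $T$.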