
    Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond

    Full text link
    In this paper, we provide near-optimal accelerated first-order methods for minimizing a broad class of smooth nonconvex functions that are strictly unimodal on all lines through a minimizer. This function class, which we call the class of smooth quasar-convex functions, is parameterized by a constant $\gamma \in (0,1]$, where $\gamma = 1$ encompasses the classes of smooth convex and star-convex functions, and smaller values of $\gamma$ indicate that the function can be "more nonconvex." We develop a variant of accelerated gradient descent that computes an $\epsilon$-approximate minimizer of a smooth $\gamma$-quasar-convex function with at most $O(\gamma^{-1} \epsilon^{-1/2} \log(\gamma^{-1} \epsilon^{-1}))$ total function and gradient evaluations. We also derive a lower bound of $\Omega(\gamma^{-1} \epsilon^{-1/2})$ on the number of gradient evaluations required by any deterministic first-order method in the worst case, showing that, up to a logarithmic factor, no deterministic first-order algorithm can improve upon ours.
    Comment: 37 pages
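    The class described in the abstract is usually specified by a single first-order inequality. As an illustration (stated here from the standard literature, not quoted from the paper), a smooth function $f$ is $\gamma$-quasar-convex with respect to a minimizer $x^\star$ if
        \[
          f(x^\star) \;\ge\; f(x) + \frac{1}{\gamma}\,\nabla f(x)^\top (x^\star - x)
          \qquad \text{for all } x, \quad \gamma \in (0,1],
        \]
    which holds with $\gamma = 1$ for smooth star-convex functions (and, in particular, for smooth convex functions), matching the parameterization above.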

    Quasiconvex Programming

    Full text link
    We define quasiconvex programming, a form of generalized linear programming in which one seeks the point minimizing the pointwise maximum of a collection of quasiconvex functions. We survey algorithms for solving quasiconvex programs either numerically or via generalizations of the dual simplex method from linear programming, and describe varied applications of this geometric optimization technique in meshing, scientific computation, information visualization, automated algorithm analysis, and robust statistics.
    Comment: 33 pages, 14 figures
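    As a toy illustration of the problem class (not of the generalized dual-simplex or numerical algorithms surveyed in the paper), the following Python sketch minimizes the pointwise maximum of a few one-dimensional quasiconvex functions by ternary search; the example functions are hypothetical.

        # Toy 1-D quasiconvex program: minimize x -> max_i f_i(x) on an interval.
        # The pointwise maximum of quasiconvex functions is again quasiconvex;
        # the examples below are convex, so the maximum is unimodal and ternary
        # search applies.

        def pointwise_max(fs, x):
            """Evaluate the objective max_i f_i(x)."""
            return max(f(x) for f in fs)

        def ternary_search(fs, lo, hi, tol=1e-9):
            """Minimize the unimodal function x -> max_i f_i(x) on [lo, hi]."""
            while hi - lo > tol:
                m1 = lo + (hi - lo) / 3.0
                m2 = hi - (hi - lo) / 3.0
                if pointwise_max(fs, m1) < pointwise_max(fs, m2):
                    hi = m2   # a minimizer lies in [lo, m2]
                else:
                    lo = m1   # a minimizer lies in [m1, hi]
            return 0.5 * (lo + hi)

        if __name__ == "__main__":
            fs = [lambda x: (x - 1.0) ** 2,      # convex, hence quasiconvex
                  lambda x: abs(x + 2.0),
                  lambda x: (x - 0.5) ** 4]
            x_star = ternary_search(fs, -10.0, 10.0)
            print(x_star, pointwise_max(fs, x_star))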

    A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

    Full text link
    Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting, where the elements to be combined are not centrally located but spread over a network. We address the key challenge of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error $\epsilon$ and the communication cost, neither of which depends on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication required to construct an $\epsilon$-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust.
    Comment: Extended version of the SIAM Data Mining 2015 paper
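    For context, the sketch below shows the classical (centralized) Frank-Wolfe update on an $\ell_1$ ball, a standard Lasso-type instance; it only illustrates why each iteration adds a single sparse atom, the property that a distributed variant can exploit to keep communication low. The distributed selection step of dFW and its guarantees are in the paper and are not reproduced here; the names and toy data are hypothetical.

        import numpy as np

        def frank_wolfe_l1(grad_f, w0, radius, n_iters=200):
            """Classical Frank-Wolfe on the L1 ball {w : ||w||_1 <= radius}.

            The linear minimization oracle over the L1 ball returns a signed,
            scaled basis vector, so each iteration touches one coordinate --
            in a distributed setting only an index and a coefficient would
            need to be exchanged per round.
            """
            w = w0.copy()
            for t in range(n_iters):
                g = grad_f(w)
                i = int(np.argmax(np.abs(g)))      # coordinate of the atom
                s = np.zeros_like(w)
                s[i] = -radius * np.sign(g[i])     # atom on the L1 ball boundary
                gamma = 2.0 / (t + 2.0)            # standard step-size schedule
                w = (1.0 - gamma) * w + gamma * s
            return w

        if __name__ == "__main__":
            # Toy least squares: min_w 0.5*||X w - y||^2  s.t.  ||w||_1 <= 1.
            rng = np.random.default_rng(0)
            X = rng.standard_normal((50, 20))
            w_true = np.zeros(20)
            w_true[:3] = [0.5, -0.3, 0.2]
            y = X @ w_true
            grad = lambda w: X.T @ (X @ w - y)
            print(np.round(frank_wolfe_l1(grad, np.zeros(20), radius=1.0), 3))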

    Approximating gradients with continuous piecewise polynomial functions

    Get PDF
    Motivated by conforming finite element methods for second-order elliptic problems, we analyze the approximation of the gradient of a target function by continuous piecewise polynomial functions over a simplicial mesh. The main result is that the global best approximation error is equivalent to an appropriate sum of the local best approximation errors on elements. Thus, requiring continuity does not downgrade local approximability, and discontinuous piecewise polynomials essentially offer no additional approximation power, even for a fixed mesh. This result implies error bounds in terms of piecewise regularity over the whole admissible smoothness range. Moreover, it allows for simple local error functionals in adaptive tree approximation of gradients.
    Comment: 21 pages, 1 figure
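    Schematically, with notation assumed here for illustration (the precise local functionals and constants are in the paper), the main result reads as an equivalence, up to constants depending only on shape regularity and the polynomial degree, between the global best error and the sum of local best errors:
        \[
          \inf_{v \in S^1_k(\mathcal{T})^d} \|\nabla u - v\|_{L^2(\Omega)}^2
          \;\eqsim\;
          \sum_{T \in \mathcal{T}} \inf_{p \in \mathbb{P}_k(T)^d} \|\nabla u - p\|_{L^2(T)}^2,
        \]
    where $\mathcal{T}$ is the simplicial mesh on $\Omega \subset \mathbb{R}^d$, $S^1_k(\mathcal{T})$ denotes the continuous piecewise polynomials of degree at most $k$, and $\mathbb{P}_k(T)$ the polynomials of degree at most $k$ on a single element $T$.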