
    Beyond Convexity: Stochastic Quasi-Convex Optimization

    Stochastic convex optimization is a basic and well-studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent that updates according to the direction of the gradients rather than the gradients themselves. In this paper we analyze a stochastic version of NGD and prove its convergence to a global minimum for a wider class of functions: we require the functions to be quasi-convex and locally-Lipschitz. Quasi-convexity broadens the concept of unimodality to multiple dimensions and allows for certain types of saddle points, which are a known hurdle for first-order optimization methods such as gradient descent. Locally-Lipschitz functions are only required to be Lipschitz in a small region around the optimum. This assumption circumvents gradient explosion, which is another known hurdle for gradient descent variants. Interestingly, unlike the vanilla SGD algorithm, the stochastic normalized gradient descent algorithm provably requires a minimal minibatch size.
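
    A minimal sketch of the normalized-gradient update described above, in the stochastic (minibatch) setting; the toy objective, the oracle and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sngd(grad_oracle, x0, lr=0.1, batch_size=128, steps=2000, seed=0):
    """Stochastic normalized gradient descent: step along the *direction*
    of a minibatch gradient estimate, ignoring its magnitude."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_oracle(x, batch_size, rng)   # noisy minibatch gradient
        norm = np.linalg.norm(g)
        if norm > 1e-12:
            x = x - lr * g / norm             # normalized (unit-length) step
    return x

# Toy quasi-convex objective f(x) = log(1 + ||x - x_star||): its sublevel
# sets are balls, so it is quasi-convex but not convex.
x_star = np.array([1.0, -2.0])

def grad_oracle(x, batch_size, rng):
    d = x - x_star
    dist = np.linalg.norm(d) + 1e-12
    g = d / (dist * (1.0 + dist))             # exact gradient of the toy objective
    noise = rng.normal(scale=0.1 / np.sqrt(batch_size), size=x.shape)
    return g + noise

print(sngd(grad_oracle, x0=np.array([5.0, 5.0])))  # approaches [1, -2]
```

    Because only the direction of the gradient is used, the step length is not dragged down on plateaus where the raw gradient is tiny, which is the intuition behind the quasi-convex analysis.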

    Deterministic global optimization using space-filling curves and multiple estimates of Lipschitz and Hölder constants

    In this paper, the global optimization problem $\min_{y\in S} F(y)$ with $S$ being a hyperinterval in $\Re^N$ and $F(y)$ satisfying the Lipschitz condition with an unknown Lipschitz constant is considered. It is supposed that the function $F(y)$ can be multiextremal, non-differentiable, and given as a `black-box'. To attack the problem, a new global optimization algorithm based on the following two ideas is proposed and studied both theoretically and numerically. First, the new algorithm uses numerical approximations to space-filling curves to reduce the original Lipschitz multi-dimensional problem to a univariate one satisfying the H\"{o}lder condition. Second, at each iteration the algorithm applies a new geometric technique working with a number of possible H\"{o}lder constants chosen from a set of values varying from zero to infinity, showing that ideas introduced in the popular DIRECT method can be used in H\"{o}lder global optimization. Convergence conditions of the resulting deterministic global optimization method are established. Numerical experiments carried out on several hundred test functions show quite a promising performance of the new algorithm in comparison with its direct competitors.
    Comment: 26 pages, 10 figures, 4 tables
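
    A hedged sketch of the reduction the abstract relies on: if $y(t)$ denotes an (approximated) Peano space-filling curve filling the hyperinterval $S$, the multidimensional Lipschitz problem becomes a univariate Hölder one, to which DIRECT-style geometric bounds built from several trial Hölder constants can be applied.

```latex
% Sketch only: y(t) is a space-filling curve on S, H an (unknown) Hölder constant.
\[
  \min_{y \in S} F(y) \;=\; \min_{t \in [0,1]} F\bigl(y(t)\bigr),
  \qquad
  \bigl|F(y(t')) - F(y(t''))\bigr| \;\le\; H\,|t' - t''|^{1/N},
  \quad t', t'' \in [0,1],
\]
% so the method can work with univariate lower bounds of the form
% F(y(t_i)) - \tilde{H} |t - t_i|^{1/N} for several trial values \tilde{H}.
```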

    Solving non-monotone equilibrium problems via a DIRECT-type approach

    A global optimization approach for solving non-monotone equilibrium problems (EPs) is proposed. The class of (regularized) gap functions is used to reformulate any EP as a constrained global optimization program, and some bounds on the Lipschitz constant of such functions are provided. The proposed global optimization approach is a combination of an improved version of the \texttt{DIRECT} algorithm, which exploits local bounds of the Lipschitz constant of the objective function, with local minimizations. Unlike most existing solution methods for EPs, no monotonicity-type condition is assumed in this paper. Preliminary numerical results on several classes of EPs show the effectiveness of the approach.
    Comment: Technical Report of the Department of Computer Science, University of Pisa, Italy
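
    For context, a hedged sketch of the standard (regularized) gap-function reformulation the abstract refers to, where $f(x,y)$ is the equilibrium bifunction with $f(x,x)=0$, $C$ the feasible set and $\alpha>0$ a regularization parameter; the equivalence below uses the usual convexity assumption on $f(x,\cdot)$ and generic notation, not necessarily that of the report.

```latex
% Sketch of the reformulation; notation is generic.
\[
  \text{EP: find } x^* \in C \text{ such that } f(x^*, y) \ge 0 \quad \forall y \in C,
\]
\[
  \varphi_\alpha(x) \;=\; \max_{y \in C}\Bigl\{ -f(x,y) - \tfrac{\alpha}{2}\,\lVert y - x\rVert^2 \Bigr\}
  \;\ge\; 0 \quad \forall x \in C,
\]
% x* solves the EP exactly when \varphi_\alpha(x*) = 0, i.e. when it globally
% minimizes \varphi_\alpha over C with optimal value 0 -- a constrained global
% optimization program that a DIRECT-type method can attack.
```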

    Extended cutting angle method of global optimization

    Methods of Lipschitz optimization allow one to find and confirm the global minimum of multivariate Lipschitz functions using a finite number of function evaluations. This paper extends the Cutting Angle method, in which the optimization problem is solved by building a sequence of piecewise linear underestimates of the objective function. We use a more flexible set of support functions, which yields a better underestimate of a Lipschitz objective function. An efficient algorithm for enumeration of all local minima of the underestimate is presented, along with the results of numerical experiments. The one-dimensional Pijavski-Shubert method arises as a special case of the proposed approach.
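
    Since the abstract points to the one-dimensional Pijavski-Shubert method as a special case, here is a minimal sketch of that method (a sawtooth underestimate built from a known Lipschitz constant); the test function and the constant are illustrative assumptions.

```python
import numpy as np

def pijavski_shubert(f, a, b, L, n_iter=200):
    """Pijavski-Shubert method in 1D: repeatedly evaluate f at the minimizer of
    the current piecewise-linear (sawtooth) underestimate with slopes +/- L."""
    xs, fs = [a, b], [f(a), f(b)]
    for _ in range(n_iter):
        order = np.argsort(xs)
        x, y = np.array(xs)[order], np.array(fs)[order]
        # Over each interval [x_i, x_{i+1}] the underestimate attains its minimum
        # r_i = (y_i + y_{i+1})/2 - L (x_{i+1} - x_i)/2 at the point t_i below.
        t = 0.5 * (x[:-1] + x[1:]) + (y[:-1] - y[1:]) / (2.0 * L)
        r = 0.5 * (y[:-1] + y[1:]) - 0.5 * L * (x[1:] - x[:-1])
        i = int(np.argmin(r))            # interval with the lowest bound
        xs.append(float(t[i]))
        fs.append(f(t[i]))
    best = int(np.argmin(fs))
    return xs[best], fs[best]

# Multiextremal test function on [-3, 3]; |f'| <= 7 there, so L = 8 is valid.
f = lambda x: np.sin(3.0 * x) + 0.5 * (x - 1.0) ** 2
print(pijavski_shubert(f, -3.0, 3.0, L=8.0))
```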

    Index Information Algorithm with Local Tuning for Solving Multidimensional Global Optimization Problems with Multiextremal Constraints

    Multidimensional optimization problems are considered in which the objective function and the constraints are multiextremal non-differentiable Lipschitz functions (with unknown Lipschitz constants) and the feasible region is a finite collection of robust nonconvex subregions. Both the objective function and the constraints may be partially defined. To solve such problems, an algorithm is proposed that uses Peano space-filling curves and the index scheme to reduce the original problem to a one-dimensional H\"{o}lder one. Local tuning on the behaviour of the objective function and constraints is used during the work of the global optimization procedure in order to accelerate the search. The method uses neither penalty coefficients nor additional variables. Convergence conditions are established. Numerical experiments confirm the good performance of the technique.
    Comment: 29 pages, 5 figures
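
    The "local tuning" mentioned above can be sketched, under assumptions about the exact formulas (which vary across papers in this line of work), as an adaptive per-interval estimate of the Hölder constant that blends local and global information:

```latex
% Hedged sketch; z_j = F(y(x_j)) are values along the one-dimensional reduction,
% r > 1 is a reliability parameter, \xi > 0 a small technical constant, and
% M = \max_j \lambda_j the current global estimate.
\[
  \mu_i \;=\; r \cdot \max\{\lambda_i,\; \gamma_i,\; \xi\},
  \qquad
  \lambda_i \;=\; \max_{j \in \{i-1,\,i,\,i+1\}}
    \frac{|z_j - z_{j-1}|}{(x_j - x_{j-1})^{1/N}},
  \qquad
  \gamma_i \;=\; M\,\frac{(x_i - x_{i-1})^{1/N}}{\max_j\,(x_j - x_{j-1})^{1/N}},
\]
% so wide intervals lean on the global estimate \gamma_i while narrow ones are
% driven by the locally observed slopes \lambda_i.
```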

    Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

    In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm, called multi-step primal-dual (MSPD), and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS), based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
    Comment: 17 pages
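
    The "local smoothing" behind DRS can be illustrated with a generic randomized-smoothing gradient estimator; this Gaussian version is a sketch under assumed notation, not the paper's distributed algorithm (the specific smoothing distribution and all communication machinery are omitted).

```python
import numpy as np

def smoothed_grad(f, x, gamma=0.1, n_samples=4096, seed=0):
    """Monte-Carlo gradient of f_gamma(x) = E[f(x + gamma * Z)], Z ~ N(0, I).
    f_gamma is differentiable even when f is merely Lipschitz, which is the
    mechanism smoothing-based methods for non-smooth objectives rely on."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, x.shape[0]))
    # Unbiased estimator: E[(f(x + gamma Z) - f(x)) Z] / gamma; subtracting f(x)
    # does not change the expectation but lowers the variance.
    vals = np.array([f(x + gamma * zi) for zi in z]) - f(x)
    return (vals[:, None] * z).mean(axis=0) / gamma

# Example with the non-smooth convex function f(x) = ||x||_1.
f = lambda v: np.abs(v).sum()
x = np.array([1.0, -2.0, 0.5])
print(smoothed_grad(f, x))   # roughly sign(x) away from the kinks
```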