
    Adaptive Regularization Algorithms with Inexact Evaluations for Nonconvex Optimization

    A regularization algorithm using inexact function values and inexact derivatives is proposed and its evaluation complexity analyzed. This algorithm is applicable to unconstrained problems and to problems with inexpensive constraints (that is, constraints whose evaluation and enforcement have negligible cost) under the assumption that the derivative of highest degree is $\beta$-Hölder continuous. It features a very flexible adaptive mechanism for determining the inexactness which is allowed, at each iteration, when computing objective function values and derivatives. The complexity analysis covers arbitrary optimality order and arbitrary degree of available approximate derivatives. It extends the results of Cartis, Gould and Toint (2018) on evaluation complexity to the inexact case: if a $q$th-order minimizer is sought using approximations to the first $p$ derivatives, it is proved that a suitable approximate minimizer within $\epsilon$ is computed by the proposed algorithm in at most $O(\epsilon^{-\frac{p+\beta}{p-q+\beta}})$ iterations and at most $O(|\log(\epsilon)|\,\epsilon^{-\frac{p+\beta}{p-q+\beta}})$ approximate evaluations. An algorithmic variant, although more rigid in practice, can be proved to find such an approximate minimizer in $O(|\log(\epsilon)|+\epsilon^{-\frac{p+\beta}{p-q+\beta}})$ evaluations. While the proposed framework remains so far conceptual for high degrees and orders, it is shown to yield simple and computationally realistic inexact methods when specialized to the unconstrained and bound-constrained first- and second-order cases. The deterministic complexity results are finally extended to the stochastic context, yielding adaptive sample-size rules for subsampling methods typical of machine learning.
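    The adaptive sample-size rules mentioned at the end of the abstract can be illustrated, in the simplest first-order subsampled setting, by the sketch below; the accuracy test, constants, and function names are illustrative assumptions and do not reproduce the paper's algorithm.

```python
import numpy as np

def subsampled_gradient_descent(grad_i, n, x0, steps=100, alpha=0.1,
                                kappa=0.5, s0=32, rng=None):
    """Illustrative first-order method with an adaptive sample-size rule.

    grad_i(x, i) returns the gradient of the i-th component function; the
    full gradient is the average over i = 0..n-1.  The sample size is
    doubled until the subsampled-gradient norm dominates a crude estimate
    of its own sampling error (a stand-in for an adaptive accuracy test,
    not the test used in the paper).
    """
    rng = rng or np.random.default_rng(0)
    x, s = np.asarray(x0, dtype=float), s0
    for _ in range(steps):
        while True:
            idx = rng.choice(n, size=min(s, n), replace=False)
            grads = np.array([grad_i(x, i) for i in idx])
            g = grads.mean(axis=0)
            # standard error of the sample mean as an inexactness estimate
            err = np.linalg.norm(grads.std(axis=0)) / np.sqrt(len(idx))
            if err <= kappa * np.linalg.norm(g) or s >= n:
                break
            s *= 2                      # request a more accurate gradient
        x = x - alpha * g               # plain gradient step
    return x
```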

    Estudo de métodos de minimização para um problema black box / Study of minimization methods for a black box problem

    This article studies a deterministic optimization method, Steepest Descent, and two heuristic methods, Differential Evolution and Particle Swarm, on a generic black-box problem with two variables in its objective function. The deterministic method showed a strong dependence on the chosen starting values, converging to several different local minima, so that multiple starting points had to be used. The Particle Swarm and Differential Evolution methods produced reasonable results, but the nature of these heuristic algorithms makes it impossible to certify that the point found is the global minimum.
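    A minimal sketch of the two strategies compared in the article is given below, assuming SciPy is available; the Himmelblau function stands in for the unspecified two-variable black-box objective, and BFGS stands in for the gradient-based local method, so none of the specifics here are taken from the article itself.

```python
import numpy as np
from scipy.optimize import minimize, differential_evolution

# Stand-in two-variable test function with several minima; the article's
# actual black-box objective is not reproduced here.
def f(x):
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2  # Himmelblau

bounds = [(-5.0, 5.0), (-5.0, 5.0)]

# Multi-start local descent: the deterministic method's dependence on the
# starting point is mitigated by sampling several initial guesses.
rng = np.random.default_rng(0)
starts = rng.uniform(-5.0, 5.0, size=(20, 2))
local_best = min((minimize(f, x0, method="BFGS") for x0 in starts),
                 key=lambda r: r.fun)

# Population-based heuristic: no starting point required, but the result
# is not certified to be the global minimum.
heuristic_best = differential_evolution(f, bounds, seed=0)

print(local_best.fun, heuristic_best.fun)
```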

    A note about the complexity of minimizing Nesterov's smooth Chebyshev-Rosenbrock function

    This short note considers and resolves the apparent contradiction between known worst-case complexity results for first- and second-order methods for solving unconstrained smooth nonconvex optimization problems and a recent note by Jarre [On Nesterov's smooth Chebyshev-Rosenbrock function, Optim. Methods Softw. (2011)], which implies a very large lower bound on the number of iterations required to reach a neighbourhood of the solution for a specific problem with variable dimension.
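    For context, the smooth Chebyshev-Rosenbrock function discussed in Jarre's note is commonly stated in the form sketched below; the exact scaling used in the note may differ, so this should be read as an assumed, commonly cited variant rather than a quotation from either paper.

```python
import numpy as np

def chebyshev_rosenbrock(x):
    """Smooth Chebyshev-Rosenbrock-type function (commonly cited variant):
    f(x) = (x_1 - 1)^2 / 4 + sum_{i=1}^{n-1} (x_{i+1} - 2 x_i^2 + 1)^2.
    The minimizer is x = (1, ..., 1) with f = 0, but descent paths from the
    usual starting point are extremely long, which is the source of the
    large iteration counts discussed in the note."""
    x = np.asarray(x, dtype=float)
    return 0.25 * (x[0] - 1.0) ** 2 + np.sum((x[1:] - 2.0 * x[:-1] ** 2 + 1.0) ** 2)
```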

    Deterministic Nonsmooth Nonconvex Optimization

    We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing $(\delta,\epsilon)$-stationary points. Several recent works have presented randomized algorithms that produce such points using $\tilde O(\delta^{-1}\epsilon^{-3})$ first-order oracle calls, independent of the dimension $d$. It has been an open problem whether a similar result can be obtained via a deterministic algorithm. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of $\Omega(d)$ for any deterministic algorithm. Moreover, we show that, unlike smooth or convex optimization, access to function values is required for any deterministic algorithm to halt within any finite time. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of $\tilde O(\delta^{-1}\epsilon^{-3})$ can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to study the complexity of deterministically smoothing Lipschitz functions. Though there are efficient black-box randomized smoothings, we start by showing that no such deterministic procedure can smooth functions in a meaningful manner, resolving an open question. We then bypass this impossibility result for the structured case of ReLU neural networks. To that end, in a practical white-box setting in which the optimizer is granted access to the network's architecture, we propose a simple, dimension-free, deterministic smoothing that provably preserves $(\delta,\epsilon)$-stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our algorithm, this yields the first deterministic dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound. Comment: This work supersedes arXiv:2209.12463 and arXiv:2209.10346 [Section 3], with major additional results.
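    The black-box randomized smoothings referred to above are, in their standard form, ball averages $f_\delta(x) = \mathbb{E}_{u}[f(x+\delta u)]$ whose gradients can be estimated from function values alone; the sketch below shows that standard construction (not the paper's deterministic white-box smoothing of ReLU networks), with all names and constants chosen purely for illustration.

```python
import numpy as np

def randomized_smoothing_grad(f, x, delta=1e-2, samples=256, rng=None):
    """Zeroth-order gradient estimate for the randomized smoothing
    f_delta(x) = E_{u ~ Unif(unit ball)}[f(x + delta*u)], using the identity
    grad f_delta(x) = (d/delta) * E_{v ~ Unif(unit sphere)}[f(x + delta*v) v].
    A central-difference form of the estimator is used to reduce variance.
    """
    rng = rng or np.random.default_rng(0)
    d = x.shape[0]
    v = rng.standard_normal((samples, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform on the unit sphere
    diffs = np.array([f(x + delta * vi) - f(x - delta * vi) for vi in v])
    return (d / (2.0 * delta)) * (diffs[:, None] * v).mean(axis=0)

# example: smoothed gradient of a nonsmooth function at a kink
g = randomized_smoothing_grad(lambda z: np.abs(z).sum(), np.zeros(3))
```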