68 research outputs found
PDFO: A Cross-Platform Package for Powell's Derivative-Free Optimization Solvers
The late Professor M. J. D. Powell devised five trust-region derivative-free
optimization methods, namely COBYLA, UOBYQA, NEWUOA, BOBYQA, and LINCOA. He
also carefully implemented them into publicly available solvers, which are
renowned for their robustness and efficiency. However, the solvers were
implemented in Fortran 77 and hence may not be easily accessible to some users.
We introduce the PDFO package, which provides user-friendly Python and MATLAB
interfaces to Powell's code. With PDFO, users of such languages can call
Powell's Fortran solvers easily without dealing with the Fortran code.
Moreover, PDFO includes bug fixes and improvements, which are particularly
important for handling problems that suffer from ill-conditioning or failures
of function evaluations. In addition to the PDFO package, we provide an
overview of Powell's methods, sketching them from a uniform perspective,
summarizing their main features, and highlighting the similarities and
interconnections among them. We also present experiments on PDFO to demonstrate
its stability under noise, tolerance of failures in function evaluations, and
potential in solving certain hyperparameter optimization problems.
Probabilistic Approaches to Stochastic Optimization
Optimization is a cardinal concept in the sciences, and viable algorithms are of utmost importance as tools
for finding the solution to an optimization problem. Empirical risk minimization is a major workhorse,
in particular in machine learning applications, where an input-target relation is learned in a supervised
manner. Empirical risks with high-dimensional inputs are mostly optimized by greedy, gradient-based,
and possibly stochastic optimization routines, such as stochastic gradient descent.
Though popular and practically successful, this setup has major downsides that often make it
finicky to work with, or at least the bottleneck in a larger chain of learning procedures. For instance,
typical issues are:
• Overfitting of a parametrized model to the data. This generally leads to poor generalization performance
on unseen data.
• Tuning of algorithmic parameters, such as learning rates, is tedious, inefficient, and costly.
• Stochastic losses and gradients occur due to sub-sampling of a large dataset. They only yield
incomplete, or corrupted information about the empirical risk, and are thus difficult to handle from a
decision making point of view.
This thesis consists of four conceptual parts.
In the first one, we argue that conditional distributions of local full and mini-batch evaluations of
losses and gradients can be well approximated by Gaussian distributions, since the losses themselves
are sums of independently and identically distributed random variables. We then provide a way of
estimating the corresponding sufficient statistics, i. e., variances and means, with low computational
overhead. This yields an analytic likelihood for the loss and gradient at every point of the input space,
which subsequently can be incorporated into active decision making at run-time of the optimizer.
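As a rough illustration of the statistics described in this first part, the sketch below estimates the mean of a mini-batch loss and the variance of that mean from per-example losses, then evaluates a Gaussian log-likelihood. The function names and the sample data are hypothetical and are not taken from the thesis; the thesis's own estimators may differ.

```python
import math
import random


def minibatch_loss_statistics(per_example_losses):
    """Return (mean, variance_of_mean) for one mini-batch of per-example losses."""
    b = len(per_example_losses)
    if b < 2:
        raise ValueError("need at least two examples to estimate a variance")
    mean = sum(per_example_losses) / b
    # Unbiased sample variance of the individual losses.
    sample_var = sum((l - mean) ** 2 for l in per_example_losses) / (b - 1)
    # The mean of b i.i.d. terms has variance sample_var / b.
    return mean, sample_var / b


def gaussian_log_likelihood(candidate_loss, mean, var_of_mean):
    """Log-density of the observed mini-batch mean under N(candidate_loss, var_of_mean)."""
    return -0.5 * (math.log(2.0 * math.pi * var_of_mean)
                   + (mean - candidate_loss) ** 2 / var_of_mean)


# Hypothetical per-example losses for a mini-batch of 64 examples.
rng = random.Random(0)
losses = [rng.gauss(1.0, 0.5) ** 2 for _ in range(64)]
batch_mean, batch_var_of_mean = minibatch_loss_statistics(losses)
```

The variance is divided by the batch size because the mini-batch loss is an average of independent terms, which is also what motivates the Gaussian approximation.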
The second part focuses on estimating generalization performance, not by monitoring a validation
loss, but by assessing if stochastic gradients can be fully explained by noise that occurs due to the
finiteness of the training dataset, and not due to an informative gradient direction of the expected loss
(risk). This yields a criterion for early-stopping where no validation set is needed, and the full dataset
can be used for training.
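The idea of checking whether a stochastic gradient is explainable by sampling noise alone can be sketched with a simple signal-to-noise test. This is a simplified stand-in under stated assumptions, not necessarily the exact criterion of the thesis: it compares each squared mean gradient component with the estimated variance of that mean, and the function name and threshold of 1 are illustrative choices.

```python
def gradient_explained_by_noise(per_example_grads):
    """Return True if the mini-batch gradient looks like pure sampling noise.

    per_example_grads: list of B gradient vectors, each a list of D floats.
    """
    b = len(per_example_grads)
    d = len(per_example_grads[0])
    ratio_sum = 0.0
    for k in range(d):
        column = [g[k] for g in per_example_grads]
        mean = sum(column) / b
        var = sum((v - mean) ** 2 for v in column) / (b - 1)
        var_of_mean = var / b
        if var_of_mean == 0.0:
            if mean != 0.0:
                return False  # noise-free, non-zero gradient: clearly informative
            continue  # zero gradient with zero variance contributes nothing
        # If the true gradient were zero, this ratio would be about 1 on average.
        ratio_sum += mean ** 2 / var_of_mean
    return ratio_sum / d < 1.0


# Consistent gradients (informative) versus gradients that cancel out (noise).
informative = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0]]
noisy = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]
keep_training = not gradient_explained_by_noise(informative)
stop_training = gradient_explained_by_noise(noisy)
```

Because the test uses only training gradients, no data has to be held out for validation.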
The third part is concerned with fully automated learning rate adaptation for stochastic gradient descent
(SGD). Global learning rates are arguably the most exposed manual tuning parameters of stochastic
optimization routines. We propose a cheap and self-contained sub-routine, called a ‘probabilistic
line search’, that automatically adapts the learning rate in every step, based on a local probability of
descent. The result is an entirely parameter-free stochastic optimizer that reaches generalization
performance comparable to or better than SGD with a carefully hand-tuned learning rate on the tested problems.
The last part deals with noise-robust search directions. Inspired by classic first- and second-order
methods, we model the unknown dynamics of the gradient or Hessian function on the optimization
path. The approach has strong connections to classic filtering frameworks and can incorporate noise-corrupted
evaluations of the gradient at successive locations. The benefits are twofold. Firstly, we gain
valuable insight into less accessible or ad hoc design choices of classic optimizers as special cases. Secondly,
we provide the basis for a flexible, self-contained, and easy-to-use class of stochastic optimizers that
exhibit a higher degree of robustness and automation.
Méthodes sans factorisation pour l’optimisation non linéaire (Factorization-Free Methods for Nonlinear Optimization)
This thesis focuses on the mathematical formulation, analysis and implementation of two factorization-free methods for nonlinear constrained optimization. In large-scale optimization, the Jacobian of the constraints may not be available in matrix form; only its action and that of its transpose on a vector are. Factorization-free optimization employs abstract linear operators representing the Jacobian or Hessian matrices. Therefore, only operator-vector products are allowed and direct linear algebra is replaced by iterative methods. Besides these implementation restrictions, a difficulty inherent to factorization-free methods in optimization algorithms is controlling the inexactness of the linear system solves: the computed direction must be accurate enough to ensure convergence. We first describe a factorization-free implementation of a classical augmented Lagrangian method that may use quasi-Newton approximations of the second derivatives.
This method is applied to problems with thousands of variables and constraints coming from aircraft structural design optimization, for which methods based on factorizations fail. Results show that it is a viable approach for these problems. In order to obtain a method with a faster convergence rate, we present an algorithm that uses a proximal augmented Lagrangian as merit function and that asymptotically turns into a stabilized sequential quadratic programming method. The use of limited-memory BFGS approximations of the Hessian of the Lagrangian combined with regularization of the constraints leads to symmetric quasi-definite linear systems. Because such systems may be interpreted as the KKT conditions of linear least-squares problems, they can be efficiently solved using an appropriate Krylov method. The inexactness of their solutions is controlled by a stopping criterion that is easy to implement. Numerical tests demonstrate the effectiveness and robustness of our method, which compares very favorably with IPOPT, especially for degenerate problems for which LICQ is not satisfied at the optimal solution or during the minimization process. Finally, an ecosystem for optimization algorithm development in Python, code-named NLP.py, is presented. This environment is aimed at researchers in optimization and students eager to discover or strengthen their knowledge in optimization. NLP.py provides access to a set of building blocks constituting the most important elements of continuous optimization methods. With these blocks, users are able to implement their own algorithm, focusing on the logic of the algorithm rather than on the technicalities of its implementation.
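To make the notion of operator-vector products concrete, here is a minimal sketch of a Krylov solver that touches the operator only through a `matvec` callable. It uses plain conjugate gradients on a symmetric positive definite operator as a stand-in; the thesis itself deals with symmetric quasi-definite systems and its own choice of Krylov method, and this code does not use the NLP.py API. The tridiagonal operator is a hypothetical example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve A x = b for SPD A, given only the action v -> A v."""
    n = len(b)
    max_iter = max_iter or 10 * n
    x = [0.0] * n
    r = list(b)  # residual b - A x for the starting point x = 0
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 <= tol:
            break  # stopping criterion controls how inexact the solve is
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_matvec(v):
    """Apply the tridiagonal (-1, 2, -1) operator without storing a matrix."""
    n = len(v)
    out = []
    for i in range(n):
        value = 2.0 * v[i]
        if i > 0:
            value -= v[i - 1]
        if i < n - 1:
            value -= v[i + 1]
        out.append(value)
    return out


solution = conjugate_gradient(laplacian_matvec, [1.0, 0.0, 0.0, 0.0, 1.0])
```

The tolerance `tol` plays the role of the stopping criterion mentioned in the abstract: loosening it makes each solve cheaper but less accurate.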
Towards Reduced-order Model Accelerated Optimization for Aerodynamic Design
The adoption of mathematically formal simulation-based optimization approaches within aerodynamic design depends upon a delicate balance of affordability and accessibility. Techniques are needed to accelerate the simulation-based optimization process, but they must remain approachable enough for the implementation time to not eliminate the cost savings or act as a barrier to adoption.
This dissertation introduces a reduced-order model technique for accelerating fixed-point iterative solvers (such as those employed to solve primal equations, sensitivity equations, design equations, and their combination). The reduced-order model-based acceleration technique collects snapshots of early-iteration (pre-convergence) solutions and residuals and then uses them to project to significantly more accurate solutions, i.e., solutions with smaller residuals. The technique can be combined with other convergence schemes like multigrid and adaptive time stepping. The technique is generalizable and in this work is demonstrated to accelerate steady and unsteady flow solutions; continuous and discrete adjoint sensitivity solutions; and one-shot design optimization solutions. This final application, the reduced-order model accelerated one-shot optimization approach, in particular represents a step towards more efficient aerodynamic design optimization.
Through this series of applications, different basis vectors were considered and best practices for snapshot collection procedures were outlined. The major outcome of this dissertation is the development and demonstration of this reduced-order model acceleration technique. This work includes the first application of the reduced-order model-based acceleration method to an explicit one-shot iterative optimization process.
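The snapshot-projection idea can be illustrated in its smallest form with two snapshots of a fixed-point iteration. The sketch below picks the affine combination of the two iterates that minimizes the norm of the correspondingly combined residual. It is a simplified stand-in and not the dissertation's reduced-order model; the linear map and all names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_snapshots(x1, r1, x2, r2):
    """Combine two snapshots so that the combined residual has minimal norm."""
    dx = [b - a for a, b in zip(x1, x2)]
    dr = [b - a for a, b in zip(r1, r2)]
    denom = dot(dr, dr)
    if denom == 0.0:
        return list(x2)  # identical residuals: nothing to project onto
    # One-dimensional least squares: minimize ||r1 + alpha * dr||.
    alpha = -dot(r1, dr) / denom
    return [a + alpha * d for a, d in zip(x1, dx)]


def fixed_point_map(x):
    """Hypothetical contractive linear map G(x) = M x + c."""
    return [0.5 * x[0] + 0.2 * x[1] + 1.0,
            0.1 * x[0] + 0.6 * x[1] + 2.0]


def residual(x):
    return [g - xi for g, xi in zip(fixed_point_map(x), x)]


def norm(v):
    return dot(v, v) ** 0.5


x_first = [0.0, 0.0]
x_second = fixed_point_map(x_first)  # one plain fixed-point step
x_accelerated = project_snapshots(x_first, residual(x_first),
                                  x_second, residual(x_second))
```

For a linear map the residual is an affine function of the iterate, so the projected point is guaranteed to have a residual no larger than either snapshot; for nonlinear solvers this holds only approximately.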
The preliminary SOL (Sizing and Optimization Language) reference manual
The Sizing and Optimization Language (SOL), a high-level, special-purpose computer language, has been developed to expedite the application of numerical optimization to design problems and to make the process less error-prone. This document is a reference manual for those wishing to write SOL programs. SOL is presently available for DEC VAX/VMS systems. A SOL package is available which includes the SOL compiler and runtime library routines. An overview of SOL appears in NASA TM 100565.
Thermal/Structural Tailoring of Engine Blades (T/STAEBL) User's manual
The Thermal/Structural Tailoring of Engine Blades (T/STAEBL) system is a computer code that is able to perform numerical optimizations of cooled jet engine turbine blades and vanes. These optimizations seek an airfoil design of minimum operating cost that satisfies realistic design constraints. This report documents the organization of the T/STAEBL computer program, its design and analysis procedure, and its optimization procedure, and provides an overview of the input required to run the program, as well as the computer resources required for its effective use. Additionally, usage of the program is demonstrated through a validation test case.
Study of hybrid strategies for multi-objective optimization using gradient based methods and evolutionary algorithms
Most of the optimization problems encountered in engineering have conflicting objectives. In order to solve these problems, genetic algorithms (GAs) and gradient-based methods are widely used. GAs are relatively easy to implement, because these algorithms only require first-order information of the objectives and constraints. However, GAs do not have a standard termination condition and therefore may not converge to the exact solutions. Gradient-based methods, in contrast, are based on first- and higher-order information of the objectives and constraints. These algorithms converge faster to the exact solutions in solving single-objective optimization problems, but are inefficient for multi-objective optimization problems (MOOPs) and unable to solve those with non-convex objective spaces.
The work in this dissertation focuses on developing a hybrid strategy for solving MOOPs based on feasible sequential quadratic programming (FSQP) and nondominated sorting genetic algorithm II (NSGA-II). The hybrid algorithms developed in this dissertation are tested using benchmark problems and evaluated based on solution distribution, solution accuracy, and execution time. Based on these performance factors, the best hybrid strategy is determined and found to be generally efficient with good solution distributions in most of the cases studied. The best hybrid algorithm is applied to the design of a crushing tube and is shown to have relatively well-distributed solutions and good efficiency compared to solutions obtained by NSGA-II and FSQP alone.
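The concept of nondominated solutions that underlies NSGA-II can be sketched in a few lines. The code below only extracts the first nondominated front for a minimization problem; it is not the full NSGA-II algorithm or the hybrid strategy of the dissertation, and the sample objective vectors are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective values for five candidate designs.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = nondominated_front(candidates)
```

Here `(3, 4)` and `(5, 5)` are both dominated by `(2, 3)`, so the front consists of the remaining three trade-off designs.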
On the use of Neural Networks to solve Differential Equations
Artificial neural networks are parametric models, generally fitted to solve regression and classification problems. For a long time, the question has lingered of whether these models can be used to approximate the solutions of initial and boundary value problems, as a means of numerical integration. Recent advances in deep learning have made this approach much more attainable, and integration methods based on training (fitting) artificial neural networks have begun to emerge, motivated mostly by their mesh-free nature and scalability to high dimensions. In this work, we go all the way from the most basic elements, such as the definition of artificial neural networks and the well-posedness of the problems, to solving several linear and quasi-linear PDEs using this approach. Throughout this work we explain general theory concerning artificial neural networks, including topics such as vanishing gradients, non-convex optimization, and regularization, and we adapt them to better suit the nature of initial and boundary value problems. Some of the original contributions in this work include: an analysis of the vanishing gradient problem with respect to the input derivatives, a custom regularization technique based on the derivatives of the network's parameters, and a method to rescale the subgradients of the multi-objective loss function used to optimize the network.
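As a small illustration of how a network can be fitted to a differential equation, the sketch below builds a multi-objective loss from an equation residual at collocation points plus an initial-condition penalty. The test equation u'(x) = -u(x) with u(0) = 1, the one-hidden-layer tanh model, and all names are assumptions chosen for illustration; they are not taken from the work, and training is omitted.

```python
import math


def network(x, params):
    """One hidden tanh layer; returns (u(x), du/dx) for scalar input x.

    params is a list of (input_weight, bias, output_weight) triples.
    """
    u, du = 0.0, 0.0
    for w, b, v in params:
        t = math.tanh(w * x + b)
        u += v * t
        # Chain rule: d/dx tanh(w x + b) = w * (1 - tanh^2).
        du += v * w * (1.0 - t * t)
    return u, du


def ode_loss(params, points):
    """Mean squared residual of u' + u = 0 plus the penalty for u(0) = 1."""
    residual_term = 0.0
    for x in points:
        u, du = network(x, params)
        residual_term += (du + u) ** 2
    u0, _ = network(0.0, params)
    return residual_term / len(points) + (u0 - 1.0) ** 2


# Hypothetical untrained parameters and collocation points on [0, 1].
example_params = [(1.0, 0.0, 0.5), (-2.0, 0.5, 1.0)]
collocation_points = [i / 10.0 for i in range(11)]
example_loss = ode_loss(example_params, collocation_points)
```

The two terms of the loss are the separate objectives whose relative scaling matters during training, and the `(1 - t * t)` factor in the input derivative is where saturation of the activation can make gradients vanish.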