38 research outputs found

    Discrepancy-based Evolutionary Diversity Optimization

    Get PDF
    Diversity plays a crucial role in evolutionary computation. While diversity has been mainly used to prevent the population of an evolutionary algorithm from premature convergence, the use of evolutionary algorithms to obtain a diverse set of solutions has gained increasing attention in recent years. Diversity optimization in terms of features on the underlying problem allows to obtain a better understanding of possible solutions to the problem at hand and can be used for algorithm selection when dealing with combinatorial optimization problems such as the Traveling Salesperson Problem. We explore the use of the star-discrepancy measure to guide the diversity optimization process of an evolutionary algorithm. In our experimental investigations, we consider our discrepancy-based diversity optimization approaches for evolving diverse sets of images as well as instances of the Traveling Salesperson problem where a local search is not able to find near optimal solutions. Our experimental investigations comparing three diversity optimization approaches show that a discrepancy-based diversity optimization approach using a tie-breaking rule based on weighted differences to surrounding feature points provides the best results in terms of the star discrepancy measure

    Efficient local search for Pseudo Boolean Optimization

    Get PDF
    Algorithms and the Foundations of Software technolog

    Understanding Deep Learning Optimization via Benchmarking and Debugging

    Get PDF
    Das zentrale Prinzip des maschinellen Lernens (ML) ist die Vorstellung, dass Computer die notwendigen Strategien zur Lösung einer Aufgabe erlernen können, ohne explizit dafür programmiert worden zu sein. Die Hoffnung ist, dass Computer anhand von Daten die zugrunde liegenden Muster erkennen und selbst feststellen, wie sie Aufgaben erledigen können, ohne dass sie dabei von Menschen geleitet werden müssen. Um diese Aufgabe zu erfüllen, werden viele Probleme des maschinellen Lernens als Minimierung einer Verlustfunktion formuliert. Daher sind Optimierungsverfahren ein zentraler Bestandteil des Trainings von ML-Modellen. Obwohl das maschinelle Lernen und insbesondere das tiefe Lernen oft als innovative Spitzentechnologie wahrgenommen wird, basieren viele der zugrunde liegenden Optimierungsalgorithmen eher auf simplen, fast archaischen Verfahren. Um moderne neuronale Netze erfolgreich zu trainieren, bedarf es daher häufig umfangreicher menschlicher Unterstützung. Ein Grund für diesen mühsamen, umständlichen und langwierigen Trainingsprozess ist unser mangelndes Verständnis der Optimierungsmethoden im anspruchsvollen Rahmen des tiefen Lernens. Auch deshalb hat das Training neuronaler Netze bis heute den Ruf, eher eine Kunstform als eine echte Wissenschaft zu sein und erfordert ein Maß an menschlicher Beteiligung, welche dem Kernprinzip des maschinellen Lernens widerspricht. Obwohl bereits Hunderte Optimierungsverfahren für das tiefe Lernen vorgeschlagen wurden, gibt es noch kein allgemein anerkanntes Protokoll zur Beurteilung ihrer Qualität. Ohne ein standardisiertes und unabhängiges Bewertungsprotokoll ist es jedoch schwierig, die Nützlichkeit neuartiger Methoden zuverlässig nachzuweisen. In dieser Arbeit werden Strategien vorgestellt, mit denen sich Optimierer für das tiefe Lernen quantitativ, reproduzierbar und aussagekräftig vergleichen lassen. Dieses Protokoll berücksichtigt die einzigartigen Herausforderungen des tiefen Lernens, wie etwa die inhärente Stochastizität oder die wichtige Unterscheidung zwischen Lernen und reiner Optimierung. Die Erkenntnisse sind im Python-Paket DeepOBS formalisiert und automatisiert, wodurch gerechtere, schnellere und überzeugendere empirische Vergleiche von Optimierern ermöglicht werden. Auf der Grundlage dieses Benchmarking-Protokolls werden anschließend fünfzehn populäre Deep-Learning-Optimierer verglichen, um einen Überblick über den aktuellen Entwicklungsstand in diesem Bereich zu gewinnen. Um fundierte Entscheidungshilfen für die Auswahl einer Optimierungsmethode aus der wachsenden Liste zu erhalten, evaluiert der Benchmark sie umfassend anhand von fast 50 000 Trainingsprozessen. Unser Benchmark zeigt, dass der vergleichsweise traditionelle Adam-Optimierer eine gute, aber nicht dominierende Methode ist und dass neuere Algorithmen ihn nicht kontinuierlich übertreffen können. Neben dem verwendeten Optimierer können auch andere Ursachen das Training neuronaler Netze erschweren, etwa ineffiziente Modellarchitekturen oder Hyperparameter. Herkömmliche Leistungsindikatoren, wie etwa die Verlustfunktion auf den Trainingsdaten oder die erreichte Genauigkeit auf einem separaten Validierungsdatensatz, können zwar zeigen, ob das Modell lernt oder nicht, aber nicht warum. Um dieses Verständnis und gleichzeitig einen Blick in die Blackbox der neuronalen Netze zu liefern, wird in dieser Arbeit Cockpit präsentiert, ein Debugging-Tool speziell für das tiefe Lernen. Es kombiniert neuartige und bewährte Observablen zu einem Echtzeit-Überwachungswerkzeug für das Training neuronaler Netze. Cockpit macht unter anderem deutlich,dass gut getunte Trainingsprozesse konsequent über das lokale Minimum hinausgehen, zumindest für wesentliche Phasen des Trainings. Der Einsatz von sorgfältigen Benchmarking-Experimenten und maßgeschneiderten Debugging-Tools verbessert unser Verständnis des Trainings neuronaler Netze. Angesichts des Mangels an theoretischen Erkenntnissen sind diese empirischen Ergebnisse und praktischen Instrumente unerlässlich für die Unterstützung in der Praxis. Vor allem aber zeigen sie auf, dass es einen Bedarf und einen klaren Weg für grundlegend neuartigen Optimierungsmethoden gibt, um das tiefe Lernen zugänglicher, robuster und ressourcenschonender zu machen.The central paradigm of machine learning (ML) is the idea that computers can learn the strategies needed to solve a task without being explicitly programmed to do so. The hope is that given data, computers can recognize underlying patterns and figure out how to perform tasks without extensive human oversight. To achieve this, many machine learning problems are framed as minimizing a loss function, which makes optimization methods a core part of training ML models. Machine learning and in particular deep learning is often perceived as a cutting-edge technology, the underlying optimization algorithms, however, tend to resemble rather simplistic, even archaic methods. Crucially, they rely on extensive human intervention to successfully train modern neural networks. One reason for this tedious, finicky, and lengthy training process lies in our insufficient understanding of optimization methods in the challenging deep learning setting. As a result, training neural nets, to this day, has the reputation of being more of an art form than a science and requires a level of human assistance that runs counter to the core principle of ML. Although hundreds of optimization algorithms for deep learning have been proposed, there is no widely agreed-upon protocol for evaluating their performance. Without a standardized and independent evaluation protocol, it is difficult to reliably demonstrate the usefulness of novel methods. In this thesis, we present strategies for quantitatively and reproducibly comparing deep learning optimizers in a meaningful way. This protocol considers the unique challenges of deep learning such as the inherent stochasticity or the crucial distinction between learning and pure optimization. It is formalized and automatized in the Python package DeepOBS and allows fairer, faster, and more convincing empirical comparisons of deep learning optimizers. Based on this benchmarking protocol, we compare fifteen popular deep learning optimizers to gain insight into the field’s current state. To provide evidence-backed heuristics for choosing among the growing list of optimization methods, we extensively evaluate them with roughly 50,000 training runs. Our benchmark indicates that the comparably traditional Adam optimizer remains a strong but not dominating contender and that newer methods fail to consistently outperform it. In addition to the optimizer, other causes can impede neural network training, such as inefficient model architectures or hyperparameters. Traditional performance metrics, such as training loss or validation accuracy, can show if a model is learning or not, but not why. To provide this understanding and a glimpse into the black box of neural networks, we developed Cockpit, a debugging tool specifically for deep learning. It combines novel and proven observables into a live monitoring tool for practitioners. Among other findings, Cockpit reveals that well-tuned training runs consistently overshoot the local minimum, at least for significant portions of the training. The use of thorough benchmarking experiments and tailored debugging tools improves our understanding of neural network training. In the absence of theoretical insights, these empirical results and practical tools are essential for guiding practitioners. More importantly, our results show that there is a need and a clear path for fundamentally different optimization methods to make deep learning more accessible, robust, and resource-efficient

    METAMODEL-BASED GLOBAL OPTIMIZATION AND GENERALIZED NASH EQUILIBRIUM PROBLEMS

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Probabilistic Linear Algebra for Stochastic Optimization

    Get PDF
    The emergent field of machine learning has by now become the main proponent of data-driven discovery. Yet, with ever more data, it is also faced with new computational challenges. To make machines "learn", the desired task is oftentimes phrased as an empirical risk minimization problem that needs to be solved by numerical optimization routines. Optimization in ML deviates from the scope of traditional optimization in two regards. First, ML deals with large datasets that need to be subsampled to reduce the computational burden, inadvertently introducing noise into the optimization procedure. The second distinction is the sheer size of the parameter space which severely limits the amount of information that optimization algorithms store. Both aspects together have made first-order optimization routines a prevalent choice for model training in ML. First-order algorithms use only gradient information to determine a step direction and step length to update the parameters. Inclusion of second-order information about the local curvature has a great potential to improve the performance of the optimizer if done efficiently. Probabilistic curvature estimation for use in optimization is a recurring theme of this thesis and the problem is explored in three different directions that are relevant to ML training. By iteratively adapting the scale of an arbitrary curvature estimate it is possible to circumvent the tedious work of manually tuning the optimizer’s step length during model training. The general form of the curvature estimate naturally extends its applicability to various popular optimization algorithms. Curvature can also be inferred with matrix-variate distributions by projections of the curvature matrix. Noise can then be captured by a likelihood with non-vanishing width, leading to a novel update strategy that uses the inherent uncertainty to estimate the curvature. Finally, a new form of curvature estimate is derived from gradient observations of a nonparametric model. It expands the family of viable curvature estimates used in optimization. An important outcome of the research is to highlight the benefit of utilizing curvature information in stochastic optimization. By considering multiple ways of efficiently leveraging second-order information, the thesis advances the frontier of stochastic optimization and unlocks new avenues for research on the training of large scale ML models

    A Rigorous Runtime Analysis for Quasi-Random Restarts and Decreasing Stepsize

    Get PDF
    International audienceMulti-Modal Optimization (MMO) is ubiquitous in engineer- ing, machine learning and artificial intelligence applications. Many algo- rithms have been proposed for multimodal optimization, and many of them are based on restart strategies. However, only few works address the issue of initialization in restarts. Furthermore, very few comparisons have been done, between different MMO algorithms, and against simple baseline methods. This paper proposes an analysis of restart strategies, and provides a restart strategy for any local search algorithm for which theoretical guarantees are derived. This restart strategy is to decrease some 'step-size', rather than to increase the population size, and it uses quasi-random initialization, that leads to a rigorous proof of improve- ment with respect to random restarts or restarts with constant initial step-size. Furthermore, when this strategy encapsulates a (1+1)-ES with 1/5th adaptation rule, the resulting algorithm outperforms state of the art MMO algorithms while being computationally faster

    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    Get PDF
    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program. It contains, in particular, the scientific program in survey style as well as with all details, and information on the social program, the venue, special meetings, and more

    Acceleration Methods

    Full text link
    This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on the knowledge of some of the regularity parameters in the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters.Comment: Published in Foundation and Trends in Optimization (see https://www.nowpublishers.com/article/Details/OPT-036
    corecore