
    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program. It contains the scientific program both in overview form and in full detail, together with information on the social program, the venue, special meetings, and more.

    Computational issues in process optimisation using historical data.

    This thesis presents a new generic approach to improving the computational efficiency of neural-network training algorithms and investigates the applicability of their 'learning from examples' feature to improving the performance of a current intelligent diagnostic system. The contribution of this thesis is summarised in the following two points: for the first time in the literature, it is shown that significant improvements in the computational efficiency of neural-network algorithms can be achieved using the proposed methodology based on adaptive-gain variation; and the capabilities of the current Knowledge Hyper-surface method (Meghana R. Ransing, 2002) are enhanced to overcome its existing limitations in modelling an exponential increase in the shape of the hyper-surface. Neural-network techniques, particularly back-propagation algorithms, have been widely used as a tool for discovering a mapping function between a known set of input and output examples. Neural networks learn from the known example set by adjusting their internal parameters, referred to as weights, using an optimisation procedure based on the least-squares principle. The optimisation procedure normally involves thousands of iterations to converge to an acceptable solution; hence, improving the computational efficiency of neural-network algorithms is an active area of research. Various options for improving the computational efficiency of neural networks are reviewed in this thesis. The existing literature shows that varying the gain parameter improves the learning efficiency of the gradient-descent method, and previous researchers attributed this improvement to an effectively increased learning rate. It is shown in this thesis, however, that the gain variation has no influence on the learning rate; it actually influences the search direction. This insight made it possible to develop a novel approach that modifies the gradient-search direction by introducing adaptive-gain variation. The proposed method is robust, can easily be implemented in all commonly used gradient-based optimisation algorithms, and significantly improves computational efficiency compared with existing neural-network training algorithms. Computer simulations on a number of benchmark problems are used throughout to illustrate the proposed improvements. In a foundry, a large amount of data is generated every time a casting is poured. With the increasing availability of computing tools and power, there is a need to develop an efficient, intelligent diagnostic tool that can learn from these historical data to gain further insight into cause-and-effect relationships. In this study, the performance of the current Knowledge Hyper-surface method is reviewed and its mathematical formulation is analysed to identify its limitations. An enhancement is proposed by introducing mid-points into the existing shape formulation. It is shown that the mid-point shape function can successfully constrain the shape of the decision hyper-surface to become more realistic, with acceptable results in the multi-dimensional case. This is a novel and original approach and is of direct relevance to the foundry industry.
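
    As a rough illustration of the central observation, that a gain parameter folded into the sigmoid activations alters the effective search direction of backpropagation rather than its learning rate, the sketch below trains a tiny two-layer network with per-neuron gains in the hidden layer. The XOR data, network size, learning rate, and the heuristic gain-update rule are assumptions made for illustration; this is not the thesis's actual algorithm.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy XOR data (assumed for illustration only).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))   # input -> hidden weights
b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(3, 1))   # hidden -> output weights
b2 = np.zeros(1)
g = np.array([0.5, 1.0, 1.5])             # per-neuron gains of the hidden sigmoids
eta = 0.5                                 # fixed learning rate, never adapted

prev_loss = np.inf
for epoch in range(5000):
    a1 = X @ W1 + b1
    h = sigmoid(g * a1)                   # hidden activations with gains
    y = sigmoid(h @ W2 + b2)              # network output
    err = y - t
    loss = 0.5 * np.mean(err ** 2)

    # Backpropagation. Non-uniform per-neuron gains rescale the hidden deltas
    # differently for each neuron, so the overall weight update points in a
    # different direction than plain backprop, not merely a rescaled one.
    d_out = err * y * (1.0 - y)
    d_hid = (d_out @ W2.T) * g * h * (1.0 - h)
    W2 -= eta * (h.T @ d_out) / len(X)
    b2 -= eta * d_out.mean(axis=0)
    W1 -= eta * (X.T @ d_hid) / len(X)
    b1 -= eta * d_hid.mean(axis=0)

    # Heuristic adaptive-gain rule (an assumption, not the thesis's update):
    # grow the gains slightly while the loss keeps decreasing, back off otherwise.
    g = np.clip(g * (1.01 if loss < prev_loss else 0.95), 0.1, 10.0)
    prev_loss = loss

print("final loss:", float(loss), "gains:", g)
```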

    Improving time efficiency of feedforward neural network learning

    Feedforward neural networks have been widely studied and used in many applications in science and engineering. The training of this type of network is mainly undertaken using the well-known backpropagation-based learning algorithms. One major problem with these algorithms is their slow convergence during training, which hinders their application. To improve the training convergence speed, many researchers have developed various improvements and enhancements; however, the slow-convergence problem has not been fully addressed. This thesis makes several contributions by proposing new backpropagation learning algorithms based on the terminal-attractor concept to improve existing algorithms such as gradient descent and Levenberg-Marquardt. These new algorithms enable fast convergence both far from and close to the ideal weights. In particular, a new fast-convergence mechanism is proposed based on the fast terminal-attractor concept. Comprehensive simulation studies demonstrate the effectiveness of the proposed backpropagation algorithms with terminal attractors. Finally, three practical applications, time-series forecasting, character recognition, and image interpolation, are chosen to show the practicality and usefulness of the proposed learning algorithms, with comprehensive comparative studies against existing algorithms.
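
    The terminal-attractor concept underlying these algorithms can be illustrated in isolation: an ordinary linear attractor converges only asymptotically, whereas a terminal attractor reaches its equilibrium in finite time. The sketch below integrates both dynamics for a scalar variable; it demonstrates only the concept, not the thesis's backpropagation or Levenberg-Marquardt variants, and the rate constant, step size, and tolerance are assumptions.

```python
import numpy as np

def simulate(rate_fn, x0=1.0, dt=1e-4, t_max=10.0, tol=1e-5):
    """Forward-Euler integration of dx/dt = rate_fn(x) until |x| <= tol."""
    x, t = x0, 0.0
    while t < t_max and abs(x) > tol:
        x += dt * rate_fn(x)
        t += dt
    return t, x

k = 2.0
linear = lambda x: -k * x                                     # ordinary attractor
terminal = lambda x: -k * np.sign(x) * abs(x) ** (1.0 / 3.0)  # terminal attractor

# The linear dynamic decays exponentially and needs several seconds to fall
# below the tolerance; the terminal attractor reaches it in finite time.
t_lin, _ = simulate(linear)
t_ter, _ = simulate(terminal)
print(f"ordinary attractor reaches |x| <= 1e-5 at t = {t_lin:.3f}")
print(f"terminal attractor reaches |x| <= 1e-5 at t = {t_ter:.3f}")
```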

    Probabilistic Approaches to Stochastic Optimization

    Optimization is a cardinal concept in the sciences, and viable algorithms are of utmost importance as tools for finding the solution to an optimization problem. Empirical risk minimization is a major workhorse, in particular in machine learning applications, where an input-target relation is learned in a supervised manner. Empirical risks with high-dimensional inputs are mostly optimized by greedy, gradient-based, and possibly stochastic optimization routines, such as stochastic gradient descent. Though popular and practically successful, this setup has major downsides which often make it finicky to work with, or at least the bottleneck in a larger chain of learning procedures. Typical issues are:
    • Overfitting of a parametrized model to the data, which generally leads to poor generalization performance on unseen data.
    • Tuning of algorithmic parameters, such as learning rates, which is tedious, inefficient, and costly.
    • Stochastic losses and gradients that occur due to sub-sampling of a large dataset. They yield only incomplete or corrupted information about the empirical risk and are thus difficult to handle from a decision-making point of view.
    This thesis consists of four conceptual parts. In the first one, we argue that conditional distributions of local full-batch and mini-batch evaluations of losses and gradients can be well approximated by Gaussian distributions, since the losses themselves are sums of independently and identically distributed random variables. We then provide a way of estimating the corresponding sufficient statistics, i.e., variances and means, with low computational overhead. This yields an analytic likelihood for the loss and gradient at every point of the input space, which can subsequently be incorporated into active decision making at run-time of the optimizer. The second part focuses on estimating generalization performance, not by monitoring a validation loss, but by assessing whether stochastic gradients can be fully explained by noise that occurs due to the finiteness of the training dataset, rather than by an informative gradient direction of the expected loss (risk). This yields a criterion for early stopping where no validation set is needed and the full dataset can be used for training. The third part is concerned with fully automated learning-rate adaptation for stochastic gradient descent (SGD). Global learning rates are arguably the most exposed manual tuning parameters of stochastic optimization routines. We propose a cheap and self-contained sub-routine, called a ‘probabilistic line search’, that automatically adapts the learning rate in every step, based on a local probability of descent. The result is an entirely parameter-free stochastic optimizer that reaches comparable or better generalization performance than SGD with a carefully hand-tuned learning rate on the tested problems. The last part deals with noise-robust search directions. Inspired by classic first- and second-order methods, we model the unknown dynamics of the gradient or Hessian function along the optimization path. The approach has strong connections to classic filtering frameworks and can incorporate noise-corrupted evaluations of the gradient at successive locations. The benefits are twofold. Firstly, we gain valuable insight into less accessible or ad hoc design choices of classic optimizers as special cases.
Secondly, we provide the basis for a flexible, self-contained, and easy-to-use class of stochastic optimizers that exhibit a higher degree of robustness and automation.
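
    The second part's idea, deciding whether a mini-batch gradient is statistically distinguishable from pure sampling noise, can be sketched as follows. The form of the test statistic, the synthetic per-sample gradients, and the zero threshold are illustrative assumptions; the thesis derives its criterion from the estimated Gaussian likelihood of the gradient and may differ in detail.

```python
import numpy as np

def noise_signal_statistic(per_sample_grads):
    """Heuristic statistic for validation-free early stopping (an illustrative
    form, not the thesis's exact criterion). It is strongly negative while the
    mini-batch gradient carries a clear descent signal and hovers around zero
    once the gradient is consistent with pure sampling noise; stopping when it
    turns positive requires no held-out validation set.

    per_sample_grads: array of shape (batch_size, num_params), one gradient
    per training example in the mini-batch."""
    B, _ = per_sample_grads.shape
    g_mean = per_sample_grads.mean(axis=0)          # mini-batch gradient
    g_var = per_sample_grads.var(axis=0, ddof=1)    # per-dimension variance
    g_var = np.maximum(g_var, 1e-12)                # numerical safeguard
    # Under a Gaussian model, each coordinate of g_mean has variance
    # g_var / B when the true expected gradient is zero.
    return float(np.mean(1.0 - B * g_mean ** 2 / g_var))

# Synthetic per-sample gradients (assumed for illustration).
rng = np.random.default_rng(0)
informative = rng.normal(loc=0.5, scale=1.0, size=(128, 10))   # clear signal
pure_noise = rng.normal(loc=0.0, scale=1.0, size=(128, 10))    # no signal

print(noise_signal_statistic(informative))   # strongly negative: keep training
print(noise_signal_statistic(pure_noise))    # near zero: candidate for stopping
```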

    Stochastic, distributed and federated optimization for machine learning

    We study optimization algorithms for the finite-sum problems frequently arising in machine learning applications. First, we propose novel variants of stochastic gradient descent with a variance-reduction property that enables linear convergence for strongly convex objectives. Second, we study the distributed setting, in which the data describing the optimization problem does not fit into a single computing node. In this case, traditional methods are inefficient, as the communication costs inherent in distributed optimization become the bottleneck. We propose a communication-efficient framework that iteratively forms local subproblems which can be solved with arbitrary local optimization algorithms. Finally, we introduce the concept of Federated Optimization/Learning, where we try to solve machine learning problems without the data being stored in any centralized manner. The main motivation comes from industrial settings that handle user-generated data. The current prevalent practice is that companies collect vast amounts of user data and store them in datacenters. The alternative we propose is not to collect the data in the first place, and instead to occasionally use the computational power of users' devices to solve the very same optimization problems, while alleviating privacy concerns at the same time. In such a setting, minimization of communication rounds is the primary goal, and we demonstrate that solving the optimization problems in these circumstances is conceptually tractable.
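
    A standard representative of the variance-reduction idea behind the first contribution is SVRG, sketched below on a toy least-squares problem; the objective, step size, and epoch counts are assumptions chosen only to keep the example self-contained, and the thesis's own variants may differ.

```python
import numpy as np

def svrg(grad_i, n, w0, eta=0.01, epochs=30, m=None, rng=None):
    """Stochastic variance reduced gradient (SVRG) for min (1/n) sum_i f_i(w).
    grad_i(w, i) returns the gradient of the i-th component function."""
    rng = np.random.default_rng() if rng is None else rng
    m = n if m is None else m                 # inner-loop length
    w_snap = np.asarray(w0, dtype=float)
    for _ in range(epochs):
        # Full gradient at the snapshot point (the variance-reduction anchor).
        full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        w = w_snap.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Control-variate correction keeps the update unbiased while
            # shrinking its variance as w and w_snap approach the optimum.
            v = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w = w - eta * v
        w_snap = w
    return w_snap

# Toy least-squares problem: f_i(w) = 0.5 * (a_i . w - b_i)^2 (data assumed).
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
b = A @ w_true + 0.01 * rng.normal(size=n)

grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_hat = svrg(grad_i, n, w0=np.zeros(d), rng=rng)
print("parameter error:", np.linalg.norm(w_hat - w_true))
```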

    Large-scale learning and applications

    This thesis presents my main research activities in statistical machine learning after my PhD, starting from my post-doc at UC Berkeley to my present research position at Inria Grenoble. The first chapter introduces the context, gives a summary of my scientific contributions, and emphasizes the importance of pluri-disciplinary research. For instance, mathematical optimization has become central in machine learning, and the interplay between signal processing, statistics, bioinformatics, and computer vision is stronger than ever. With many scientific and industrial fields producing massive amounts of data, the impact of machine learning is potentially huge and diverse. However, dealing with massive data also raises many challenges. In this context, the manuscript presents different contributions, which are organized in three main topics. Chapter 2 is devoted to large-scale optimization in machine learning, with a focus on algorithmic methods. We start with majorization-minimization algorithms for structured problems, including block-coordinate, incremental, and stochastic variants. These algorithms are analyzed in terms of convergence rates for convex problems and in terms of convergence to stationary points for non-convex ones. We also introduce fast schemes for minimizing large sums of convex functions and principles to accelerate gradient-based approaches, based on Nesterov's acceleration and on quasi-Newton approaches. Chapter 3 presents the paradigm of deep kernel machines, which is an alliance between kernel methods and multilayer neural networks. In the context of visual recognition, we introduce a new invariant image model called convolutional kernel networks, which is a new type of convolutional neural network with a reproducing-kernel interpretation. The network comes with simple and effective principles for unsupervised learning, and is compatible with supervised learning via backpropagation rules. Chapter 4 is devoted to sparse estimation, that is, the automatic selection of model variables for explaining observed data; in particular, this chapter presents the results of pluri-disciplinary collaborations in bioinformatics and neuroscience, where the sparsity principle is key to building interpretable predictive models. Finally, the last chapter concludes the manuscript and suggests future perspectives.
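
    A minimal instance of the majorization-minimization principle applied to a sparse-estimation problem is the classic ISTA scheme for the lasso, sketched below: the smooth term is majorized by a quadratic whose minimizer is a soft-thresholding step. The data and constants are assumptions for illustration, and the manuscript's own algorithms are more general than this sketch.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (closed form for the l1 penalty)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def mm_lasso(A, b, lam, iters=500):
    """Majorization-minimization for the lasso  0.5*||Aw - b||^2 + lam*||w||_1.
    At each iterate the smooth term is majorized by a quadratic with curvature
    L = ||A||_2^2, whose minimizer has the closed soft-thresholding form
    (the classic ISTA scheme, viewed here as an MM instance)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ w - b)
        w = soft_threshold(w - grad / L, lam / L)   # minimize the majorizer
    return w

# Toy sparse-recovery problem (all quantities assumed for illustration).
rng = np.random.default_rng(0)
n, d = 100, 50
A = rng.normal(size=(n, d)) / np.sqrt(n)
w_true = np.zeros(d)
w_true[:5] = 3.0 * rng.normal(size=5)
b = A @ w_true + 0.01 * rng.normal(size=n)

w_hat = mm_lasso(A, b, lam=0.01)
print("non-zeros in the estimate:", np.count_nonzero(np.abs(w_hat) > 1e-3))
```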

    Single-channel source separation using non-negative matrix factorization


    Probabilistic Linear Algebra for Stochastic Optimization

    The emergent field of machine learning has by now become the main driver of data-driven discovery. Yet, with ever more data, it is also faced with new computational challenges. To make machines "learn", the desired task is oftentimes phrased as an empirical risk minimization problem that needs to be solved by numerical optimization routines. Optimization in ML deviates from the scope of traditional optimization in two regards. First, ML deals with large datasets that need to be subsampled to reduce the computational burden, inadvertently introducing noise into the optimization procedure. The second distinction is the sheer size of the parameter space, which severely limits the amount of information that optimization algorithms can store. Both aspects together have made first-order optimization routines the prevalent choice for model training in ML. First-order algorithms use only gradient information to determine a step direction and step length for updating the parameters. Including second-order information about the local curvature has great potential to improve the performance of the optimizer if done efficiently. Probabilistic curvature estimation for use in optimization is a recurring theme of this thesis, and the problem is explored in three directions relevant to ML training. By iteratively adapting the scale of an arbitrary curvature estimate, it is possible to circumvent the tedious work of manually tuning the optimizer's step length during model training. The general form of the curvature estimate naturally extends its applicability to various popular optimization algorithms. Curvature can also be inferred with matrix-variate distributions from projections of the curvature matrix. Noise can then be captured by a likelihood with non-vanishing width, leading to a novel update strategy that uses the inherent uncertainty to estimate the curvature. Finally, a new form of curvature estimate is derived from gradient observations of a nonparametric model, expanding the family of viable curvature estimates used in optimization. An important outcome of the research is to highlight the benefit of utilizing curvature information in stochastic optimization. By considering multiple ways of efficiently leveraging second-order information, the thesis advances the frontier of stochastic optimization and unlocks new avenues for research on the training of large-scale ML models.
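
    A classical, non-probabilistic stand-in for the idea of setting the step length from a cheap curvature estimate is the Barzilai-Borwein rule, which rescales the step using only successive gradient observations. The sketch below applies it to an assumed toy quadratic; the thesis's matrix-variate and nonparametric curvature estimators are different and, unlike this sketch, are designed to cope with noisy gradients.

```python
import numpy as np

def bb_gradient_descent(grad_fn, w0, iters=60, alpha0=1e-3):
    """Gradient descent whose scalar step length is set from a secant-style
    curvature estimate built from successive gradient observations
    (the classical Barzilai-Borwein rule, used here only as a stand-in for
    the probabilistic curvature estimates developed in the thesis)."""
    w = np.asarray(w0, dtype=float)
    g = grad_fn(w)
    alpha = alpha0                          # initial step length (assumed)
    for _ in range(iters):
        w_new = w - alpha * g
        g_new = grad_fn(w_new)
        s, y = w_new - w, g_new - g         # parameter and gradient differences
        sy = float(s @ y)
        if sy > 1e-12:
            # <s, y> / <y, y> approximates the inverse curvature along the last
            # step, so no manual step-length schedule is needed.
            alpha = sy / float(y @ y)
        w, g = w_new, g_new
    return w

# Toy quadratic  f(w) = 0.5 * w' H w - c' w  (H and c assumed for illustration).
H = np.diag([1.0, 10.0, 100.0])
c = np.array([1.0, 1.0, 1.0])
grad = lambda w: H @ w - c

w_hat = bb_gradient_descent(grad, w0=np.zeros(3))
print("gradient norm at solution:", np.linalg.norm(grad(w_hat)))
```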