International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains the scientific program both in survey form and in full detail, as well as information on the social program, the venue, special meetings, and more.
Computational issues in process optimisation using historical data.
This thesis presents a new generic approach to improve the computational efficiency of neural-network-training algorithms and investigates the applicability of its 'learning from examples' feature to improving the performance of a current intelligent diagnostic system. The contribution of this thesis is summarised in the following two points: For the first time in the literature, it has been shown that significant improvements in the computational efficiency of neural-network algorithms can be achieved using the proposed methodology based on adaptive-gain variation. The capabilities of the current Knowledge Hyper-surface method (Meghana R. Ransing, 2002) are enhanced to overcome its existing limitations in modelling an exponential increase in the shape of the hyper-surface. Neural-network techniques, particularly back-propagation algorithms, have been widely used as a tool for discovering a mapping function between a known set of input and output examples. Neural networks learn from the known example set by adjusting their internal parameters, referred to as weights, using an optimisation procedure based on the 'least square fit' principle. The optimisation procedure normally involves thousands of iterations to converge to an acceptable solution. Hence, improving the computational efficiency of a neural-network algorithm is an active area of research. Various options for improving the computational efficiency of neural networks have been reviewed in this thesis. Previous researchers claimed that adaptive-gain variation improves the learning rate of the gradient-descent method and hence its efficiency. This thesis discovered, however, that the gain variation has no influence on the learning rate; it actually influences the search direction.
This made it possible to develop a novel approach that modifies the gradient-search direction by introducing adaptive-gain variation. The proposed method is robust, and it has been shown that it can easily be implemented in all commonly used gradient-based optimisation algorithms. It has also been shown to significantly improve computational efficiency compared to existing neural-network training algorithms. Computer simulations on a number of benchmark problems are used throughout to illustrate the improvement proposed in this thesis. A large amount of data is generated within a foundry every time a casting is poured. Furthermore, with the increased number of computing tools and increased computing power, there is a need to develop an efficient, intelligent diagnostic tool that can learn from historical data to gain further insight into cause-and-effect relationships. In this study, the performance of the current Knowledge Hyper-surface method was reviewed and its mathematical formulation was analysed to identify its limitations. An enhancement is proposed by introducing mid-points in the existing shape formulation. It is shown that the mid-points' shape function can successfully constrain the shape of the decision hyper-surface to become more realistic, with acceptable results in the multi-dimensional case. This is a novel and original approach and is of direct relevance to the foundry industry.
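The role of the gain parameter can be seen in a minimal sketch of a single sigmoid neuron with an explicit gain (the neuron, data, and values here are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

def sigmoid(z, gain=1.0):
    """Logistic activation with an explicit gain c: f(z) = 1 / (1 + exp(-c*z))."""
    return 1.0 / (1.0 + np.exp(-gain * z))

def neuron_gradient(w, x, t, gain):
    """Gradient of E = 0.5*(y - t)^2 for a single sigmoid neuron y = f(w @ x).

    The gain enters the chain rule multiplicatively:
        dE/dw = (y - t) * gain * y * (1 - y) * x,
    and because y itself depends on the gain, changing it does more than
    rescale a step length; across the layers of a deeper network it alters
    the search direction, which is the effect the thesis exploits.
    """
    y = sigmoid(w @ x, gain)
    return (y - t) * gain * y * (1.0 - y) * x

x = np.array([1.0, -0.5])
w = np.array([0.3, 0.7])
g_low = neuron_gradient(w, x, 0.0, gain=1.0)
g_high = neuron_gradient(w, x, 0.0, gain=2.0)
# g_high is not simply 2*g_low: the gain also reshapes the y*(1-y) factor.
```

Doubling the gain does not simply double the gradient, which is why adaptive-gain variation acts on the search geometry rather than on the learning rate.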
Improving time efficiency of feedforward neural network learning
Feedforward neural networks have been widely studied and used in many applications in science and engineering. The training of this type of network is mainly undertaken using the well-known backpropagation-based learning algorithms. One major problem with this type of algorithm is the slow training convergence speed, which hinders their applications. In order to improve the training convergence speed of this type of algorithm, many researchers have developed different improvements and enhancements. However, the slow convergence problem has not been fully addressed. This thesis makes several contributions by proposing new backpropagation learning algorithms based on the terminal attractor concept to improve existing backpropagation learning algorithms such as the gradient descent and Levenberg-Marquardt algorithms. These new algorithms enable fast convergence both at a distance from and in close range of the ideal weights. In particular, a new fast convergence mechanism is proposed which is based on the fast terminal attractor concept. Comprehensive simulation studies are undertaken to demonstrate the effectiveness of the proposed backpropagation algorithms with terminal attractors. Finally, three practical application cases of time series forecasting, character recognition and image interpolation are chosen to show the practicality and usefulness of the proposed learning algorithms, with comprehensive comparative studies against existing algorithms.
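The terminal-attractor idea can be sketched on a toy quadratic objective (this update rule and its constants are an illustrative assumption in the spirit of Zak-style terminal attractors, not the thesis's exact algorithm):

```python
import numpy as np

def terminal_attractor_step(w, grad, E, eta=0.5, beta=2.0/3.0, eps=1e-12):
    """One illustrative terminal-attractor update.

    Plain gradient flow w' = -grad gives dE/dt = -||grad||^2, which decays
    only asymptotically. Choosing w' = -E**beta * grad / ||grad||^2 with
    0 < beta < 1 instead gives dE/dt = -E**beta, whose solution reaches
    E = 0 in finite time -- the 'terminal attractor' property. The min(..., 1)
    cap guards the discrete update against overshooting near the attractor.
    """
    coef = min(eta * (E ** beta) / (np.dot(grad, grad) + eps), 1.0)
    return w - coef * grad

# Minimise E(w) = 0.5*||w||^2 (its gradient is w itself) from w0 = (1, 1):
w = np.array([1.0, 1.0])
for _ in range(200):
    E = 0.5 * np.dot(w, w)
    if E < 1e-16:
        break
    w = terminal_attractor_step(w, w, E)
```

The step size grows as the error shrinks, so the iterates do not stall near the minimum the way plain gradient descent does.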
Probabilistic Approaches to Stochastic Optimization
Optimization is a cardinal concept in the sciences, and viable algorithms are of utmost importance as tools
for finding the solution to an optimization problem. Empirical risk minimization is a major workhorse,
in particular in machine learning applications, where an input-target relation is learned in a supervised
manner. Empirical risks with high-dimensional inputs are mostly optimized by greedy, gradient-based,
and possibly stochastic optimization routines, such as stochastic gradient descent.
Though popular and practically successful, this setup has major downsides which often make it
finicky to work with, or at least the bottleneck in a larger chain of learning procedures. For instance,
typical issues are:
• Overfitting of a parametrized model to the data. This generally leads to poor generalization performance on unseen data.
• Tuning of algorithmic parameters, such as learning rates, is tedious, inefficient, and costly.
• Stochastic losses and gradients occur due to sub-sampling of a large dataset. They yield only incomplete or corrupted information about the empirical risk, and are thus difficult to handle from a decision-making point of view.
This thesis consists of four conceptual parts.
In the first one, we argue that conditional distributions of local full and mini-batch evaluations of
losses and gradients can be well approximated by Gaussian distributions, since the losses themselves
are sums of independently and identically distributed random variables. We then provide a way of
estimating the corresponding sufficient statistics, i.e., variances and means, with low computational
overhead. This yields an analytic likelihood for the loss and gradient at every point of the input space,
which subsequently can be incorporated into active decision making at run-time of the optimizer.
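The estimation of these sufficient statistics can be sketched for a toy linear model, where per-example gradients are available in closed form (data and model here are synthetic illustrations, not the thesis's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model with squared loss: each example contributes its own
# gradient, so the mini-batch mean and its variance are obtained at
# essentially no extra cost beyond the gradient computation itself.
X = rng.normal(size=(256, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=256)

def per_example_grads(w, Xb, yb):
    """Gradient of 0.5*(x @ w - y)^2 for every example in the batch, shape (B, d)."""
    return (Xb @ w - yb)[:, None] * Xb

batch = rng.choice(256, size=32, replace=False)
G = per_example_grads(np.zeros(5), X[batch], y[batch])
g_mean = G.mean(axis=0)                     # mini-batch gradient estimate
g_var = G.var(axis=0, ddof=1) / len(batch)  # CLT variance of that estimate
# (g_mean, g_var) parametrize a Gaussian likelihood for the true gradient.
```

Because the loss is a sum over i.i.d. examples, the central limit theorem justifies treating `g_mean` as Gaussian with the empirical variance `g_var`, which is exactly the likelihood the text refers to.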
The second part focuses on estimating generalization performance, not by monitoring a validation
loss, but by assessing if stochastic gradients can be fully explained by noise that occurs due to the
finiteness of the training dataset, and not due to an informative gradient direction of the expected loss
(risk). This yields a criterion for early-stopping where no validation set is needed, and the full dataset
can be used for training.
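A simplified version of such a noise test can be written down directly from per-example gradients (a hedged sketch of the idea; the thesis's actual criterion may weight and threshold the statistic differently):

```python
import numpy as np

def noise_test_statistic(G):
    """Test whether a mini-batch gradient is explainable by sampling noise.

    G holds per-example gradients, shape (B, d). Under the null hypothesis
    'the expected gradient is zero', each coordinate of the batch mean g_d
    is approximately N(0, var_d / B), so B * g_d**2 / var_d has expectation 1.
    An averaged statistic near (or below) 1 says the observed gradient is
    fully explainable by the finiteness of the sample -- the basis for
    early stopping without a validation set.
    """
    B, _ = G.shape
    g = G.mean(axis=0)
    var = G.var(axis=0, ddof=1) + 1e-12
    return float(B * np.mean(g ** 2 / var))

rng = np.random.default_rng(1)
pure_noise = rng.normal(size=(128, 10))   # expected gradient is zero
informative = pure_noise + 0.5            # a clear mean direction remains
```

On the noise-only sample the statistic hovers around 1, while an informative gradient direction pushes it far above 1, so thresholding it yields a stopping rule that never touches a validation set.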
The third part is concerned with fully automated learning rate adaption for stochastic gradient descent
(SGD). Global learning rates are arguably the most exposed manual tuning parameters of stochastic
optimization routines. We propose a cheap and self-contained sub-routine, called a "probabilistic
line search", that automatically adapts the learning rate in every step, based on a local probability of
descent. The result is an entirely parameter-free stochastic optimizer that reaches comparable or better
generalization performance than SGD with a carefully hand-tuned learning rate on the tested problems.
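The central quantity, a local probability of descent, reduces to a Gaussian tail probability (a sketch only; the full method additionally models the loss along the search line with an integrated Wiener process and probabilistic Wolfe conditions):

```python
import math

def probability_of_descent(m, s):
    """P(true directional derivative < 0) under a Gaussian belief N(m, s^2).

    m is the noisy estimate of the slope along the search direction and s its
    standard deviation (both obtainable from mini-batch statistics). This is
    the quantity a probabilistic line search thresholds when deciding whether
    to accept a candidate step.
    """
    return 0.5 * (1.0 + math.erf(-m / (s * math.sqrt(2.0))))

# A noisy slope estimate of -1 with unit uncertainty is likely a descent step:
p = probability_of_descent(-1.0, 1.0)
```

When the belief is centered at zero the probability is exactly 0.5, i.e., the search has no evidence either way; confident negative slopes push it toward 1.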
The last part deals with noise-robust search directions. Inspired by classic first- and second-order
methods, we model the unknown dynamics of the gradient or Hessian function along the optimization
path. The approach has strong connections to classic filtering frameworks and can incorporate noise-corrupted
evaluations of the gradient at successive locations. The benefits are twofold. Firstly, we gain
valuable insight into less accessible or ad-hoc design choices of classic optimizers as special cases. Secondly,
we provide the basis for a flexible, self-contained, and easy-to-use class of stochastic optimizers that
exhibit a higher degree of robustness and automation.
Continuous learning of analytical and machine learning rate of penetration (ROP) models for real-time drilling optimization
Oil and gas operators strive to reach hydrocarbon reserves by drilling wells in the safest and fastest possible manner, providing indispensable energy to society at reduced costs while maintaining environmental sustainability. Real-time drilling optimization consists of selecting operational drilling parameters that maximize a desirable measure of drilling performance. Drilling optimization efforts often aspire to improve drilling speed, commonly referred to as rate of penetration (ROP). ROP is a function of the forces and moments applied to the bit, in addition to mud, formation, bit, and hydraulic properties. Three operational drilling parameters may be constantly adjusted at surface to influence ROP towards a drilling objective: weight on bit (WOB), drillstring rotational speed (RPM), and drilling fluid (mud) flow rate. In the traditional, analytical approach to ROP modeling, inflexible equations relate WOB, RPM, flow rate, and/or other measurable drilling parameters to ROP, and empirical model coefficients are computed for each rock formation to best fit field data. Over the last decade, enhanced data acquisition technology and widespread, cheap computational power have driven a surge in applications of machine learning (ML) techniques to ROP prediction. Machine learning algorithms leverage statistics to uncover relations between any prescribed inputs (features/predictors) and the quantity of interest (response). The biggest advantage of ML algorithms over analytical models is their flexibility in model form. With no set equation, ML models permit segmentation of the drilling operational parameter space. However, increased model complexity diminishes interpretability of how an adjustment to the inputs will affect the output. There is no single ROP model applicable in every situation. This study investigates all stages of the drilling optimization workflow, with emphasis on real-time continuous model learning.
Sensors constantly record data as wells are drilled, and it is postulated that ROP models can be retrained in real-time to adapt to changing drilling conditions. Cross-validation is assessed as a methodology to select the best performing ROP model for each drilling optimization interval in real-time. Constrained by rig equipment and operational limitations, drilling parameters are optimized in intervals using the most accurate ROP model determined by cross-validation. Dynamic range and full range training data segmentation techniques contest the classical lithology-dependent approach to ROP modeling. Spatial proximity and parameter similarity sample weighting expand data partitioning capabilities during model training. The prescribed ROP modeling and drilling parameter optimization scenarios are evaluated according to model performance, ROP improvements, and computational expense.
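Selecting the best ROP model for an interval via cross-validation can be sketched as follows (the data ranges, the linear ground truth, and both candidate models are synthetic illustrations, not the study's actual models or measurements):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical drilling interval: WOB, RPM, and mud flow rate -> ROP.
X = rng.uniform([5.0, 50.0, 300.0], [35.0, 200.0, 900.0], size=(120, 3))
rop = 0.8 * X[:, 0] + 0.05 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(0.0, 2.0, 120)

def kfold_mse(fit, X, y, k=5):
    """Mean squared cross-validation error of a model factory over k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        predict = fit(X[train], y[train])
        errs.append(np.mean((predict(X[fold]) - y[fold]) ** 2))
    return float(np.mean(errs))

def linear_model(Xtr, ytr):
    """Least-squares fit of ROP against the drilling parameters."""
    A = np.c_[Xtr, np.ones(len(ytr))]
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Xq: np.c_[Xq, np.ones(len(Xq))] @ coef

def baseline_model(Xtr, ytr):
    """Constant-ROP baseline: the interval mean."""
    m = float(ytr.mean())
    return lambda Xq: np.full(len(Xq), m)

candidates = {"linear": linear_model, "baseline": baseline_model}
best = min(candidates, key=lambda name: kfold_mse(candidates[name], X, rop))
```

In the real-time workflow the same comparison would be re-run as each new interval of sensor data arrives, so the winning model can change with drilling conditions.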
Stochastic, distributed and federated optimization for machine learning
We study optimization algorithms for the finite-sum problems frequently arising in machine
learning applications. First, we propose novel variants of stochastic gradient descent with
a variance reduction property that enables linear convergence for strongly convex objectives.
Second, we study the distributed setting, in which the data describing the optimization problem
does not fit into a single computing node. In this case, traditional methods are inefficient,
as the communication costs inherent in distributed optimization become the bottleneck. We
propose a communication-efficient framework which iteratively forms local subproblems that can
be solved with arbitrary local optimization algorithms. Finally, we introduce the concept of
Federated Optimization/Learning, where we try to solve the machine learning problems without
having data stored in any centralized manner. The main motivation comes from industry when
handling user-generated data. The current prevalent practice is that companies collect vast
amounts of user data and store them in datacenters. An alternative we propose is not to collect
the data in the first place, and instead occasionally use the computational power of users' devices to
solve the very same optimization problems, while alleviating privacy concerns at the same time.
In such a setting, minimization of communication rounds is the primary goal, and we demonstrate
that solving the optimization problems in such circumstances is conceptually tractable.
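The variance-reduction idea can be sketched with an SVRG-style scheme on a toy least-squares sum (one representative member of the family of methods referred to above; the thesis proposes its own variants, and the problem data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)              # consistent least-squares system

def grad_i(w, i):
    """Gradient of the i-th summand f_i(w) = 0.5 * (a_i @ w - b_i)^2."""
    return (A[i] @ w - b[i]) * A[i]

def svrg(w, eta=0.005, epochs=30, m=1000):
    """SVRG-style variance-reduced stochastic gradient descent.

    Each epoch anchors a snapshot w_snap and its full gradient mu; the inner
    stochastic steps use grad_i(w) - grad_i(w_snap) + mu, an unbiased
    gradient estimate whose variance vanishes as w and w_snap approach the
    optimum -- this is what restores linear convergence for strongly convex
    objectives, unlike plain SGD.
    """
    for _ in range(epochs):
        w_snap = w.copy()
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = int(rng.integers(n))
            w = w - eta * (grad_i(w, i) - grad_i(w_snap, i) + mu)
    return w

w_opt = svrg(np.zeros(d))
```

The full gradient is recomputed only once per epoch, so the per-step cost stays close to that of SGD while the noise floor that forces SGD to decay its step size is removed.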
Apprentissage à grande échelle et applications (Large-scale learning and applications)
This thesis presents my main research activities in statistical machine learning after my PhD, starting from my post-doc at UC Berkeley to my present research position at Inria Grenoble. The first chapter introduces the context, gives a summary of my scientific contributions, and emphasizes the importance of pluri-disciplinary research. For instance, mathematical optimization has become central in machine learning, and the interplay between signal processing, statistics, bioinformatics, and computer vision is stronger than ever. With many scientific and industrial fields producing massive amounts of data, the impact of machine learning is potentially huge and diverse. However, dealing with massive data also raises many challenges. In this context, the manuscript presents different contributions, which are organized in three main topics.
Chapter 2 is devoted to large-scale optimization in machine learning, with a focus on algorithmic methods. We start with majorization-minimization algorithms for structured problems, including block-coordinate, incremental, and stochastic variants. These algorithms are analyzed in terms of convergence rates for convex problems and in terms of convergence to stationary points for non-convex ones. We also introduce fast schemes for minimizing large sums of convex functions and principles to accelerate gradient-based approaches, based on Nesterov's acceleration and on Quasi-Newton approaches.
Chapter 3 presents the paradigm of deep kernel machines, an alliance between kernel methods and multilayer neural networks. In the context of visual recognition, we introduce a new invariant image model called convolutional kernel networks, a new type of convolutional neural network with a reproducing kernel interpretation. The network comes with simple and effective principles for unsupervised learning, and is compatible with supervised learning via backpropagation rules.
Chapter 4 is devoted to sparse estimation, that is, the automatic selection of model variables for explaining observed data; in particular, this chapter presents the result of pluri-disciplinary collaborations in bioinformatics and neuroscience, where the sparsity principle is key to building interpretable predictive models.
Finally, the last chapter concludes the manuscript and suggests future perspectives.
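The acceleration principle mentioned for Chapter 2 can be illustrated with the textbook Nesterov scheme for an L-smooth convex function (a generic sketch, not code from the manuscript):

```python
import numpy as np

def nesterov(grad, w0, L, iters=400):
    """Nesterov's accelerated gradient method for an L-smooth convex f.

    The extrapolation point y mixes the last two iterates; the momentum
    weight follows the classical t-sequence that yields the O(1/k^2)
    convergence rate, versus O(1/k) for plain gradient descent.
    """
    w = y = np.asarray(w0, dtype=float)
    t = 1.0
    for _ in range(iters):
        w_next = y - grad(y) / L          # gradient step from the look-ahead point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = w_next + ((t - 1.0) / t_next) * (w_next - w)
        w, t = w_next, t_next
    return w

# Ill-conditioned quadratic f(w) = 0.5 * w @ H @ w with smoothness L = 25:
H = np.diag([1.0, 25.0])
w_star = nesterov(lambda v: H @ v, [1.0, 1.0], L=25.0)
```

The only problem-dependent constant is the smoothness L; everything else, including the momentum schedule, is parameter-free.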
Probabilistic Linear Algebra for Stochastic Optimization
The emerging field of machine learning has by now become the main driver of data-driven discovery. Yet, with ever more data, it is also faced with new computational challenges. To make machines "learn", the desired task is oftentimes phrased as an empirical risk minimization problem that needs to be solved by numerical optimization routines. Optimization in ML deviates from the scope of traditional optimization in two regards. First, ML deals with large datasets that need to be subsampled to reduce the computational burden, inadvertently introducing noise into the optimization procedure. The second distinction is the sheer size of the parameter space, which severely limits the amount of information that optimization algorithms can store. Both aspects together have made first-order optimization routines a prevalent choice for model training in ML. First-order algorithms use only gradient information to determine a step direction and step length to update the parameters. Inclusion of second-order information about the local curvature has great potential to improve the performance of the optimizer if done efficiently.
Probabilistic curvature estimation for use in optimization is a recurring theme of this thesis and the problem is explored in three different directions that are relevant to ML training.
By iteratively adapting the scale of an arbitrary curvature estimate, it is possible to circumvent the tedious work of manually tuning the optimizer's step length during model training. The general form of the curvature estimate naturally extends its applicability to various popular optimization algorithms.
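A classic analogue of such scale adaptation is the Barzilai-Borwein rule, which rescales a fixed curvature estimate from observed secant pairs (an illustrative stand-in for the idea described above, not the thesis's exact estimator; the quadratic test problem is an assumption):

```python
import numpy as np

def bb_scale(s, y, eps=1e-12):
    """Barzilai-Borwein-type scale for rescaling a curvature estimate.

    From the parameter step s = w_k - w_{k-1} and the gradient change
    y = g_k - g_{k-1}, the scalar (s @ y) / (y @ y) makes a fixed
    (e.g. identity) curvature estimate consistent with the observed
    secant pair, removing the need to hand-tune a step length.
    """
    return max(float(np.dot(s, y)) / (float(np.dot(y, y)) + eps), eps)

# Gradient descent on f(w) = 0.5 * w @ H @ w with self-tuned step sizes:
H = np.diag([1.0, 10.0])
w = np.array([1.0, 1.0])
g = H @ w
alpha = 0.05                    # only the very first step uses a manual value
for _ in range(80):
    w_new = w - alpha * g
    g_new = H @ w_new
    alpha = bb_scale(w_new - w, g_new - g)
    w, g = w_new, g_new
```

After the first iteration the step length is determined entirely by observed gradient differences, which is the practical payoff of scale-adapted curvature estimates.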
Curvature can also be inferred with matrix-variate distributions by projections of the curvature matrix. Noise can then be captured by a likelihood with non-vanishing width, leading to a novel update strategy that uses the inherent uncertainty to estimate the curvature.
Finally, a new form of curvature estimate is derived from gradient observations of a nonparametric model. It expands the family of viable curvature estimates used in optimization.
An important outcome of the research is to highlight the benefit of utilizing curvature information in stochastic optimization. By considering multiple ways of efficiently leveraging second-order information, the thesis advances the frontier of stochastic optimization and unlocks new avenues for research on the training of large-scale ML models.