International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains the scientific program both in survey form and in full detail, as well as information on the social program, the venue, special meetings, and more.
Computational issues in process optimisation using historical data.
This thesis presents a new generic approach to improve the computational efficiency of neural-network-training algorithms and investigates the applicability of its 'learning from examples' feature to improving the performance of a current intelligent diagnostic system. The contribution of this thesis is summarised in the following two points: For the first time in the literature, it has been shown that significant improvements in the computational efficiency of neural-network algorithms can be achieved using the proposed methodology based on adaptive-gain variation. The capabilities of the current Knowledge Hyper-surface method (Meghana R. Ransing, 2002) are enhanced to overcome its existing limitations in modelling an exponential increase in the shape of the hyper-surface. Neural-network techniques, particularly back-propagation algorithms, have been widely used as a tool for discovering a mapping function between a known set of input and output examples. Neural networks learn from the known example set by adjusting their internal parameters, referred to as weights, using an optimisation procedure based on the 'least square fit' principle. The optimisation procedure normally involves thousands of iterations to converge to an acceptable solution. Hence, improving the computational efficiency of a neural-network algorithm is an active area of research. Various options for improving the computational efficiency of neural networks have been reviewed in this thesis. Previous researchers claimed that adaptive-gain variation improves the learning rate of the gradient-descent method and hence its efficiency. This thesis discovered, however, that the gain variation has no influence on the learning rate; it actually influences the search direction.
This made it possible to develop a novel approach that modifies the gradient-search direction by introducing adaptive-gain variation. The proposed method is robust, and it has been shown that it can easily be implemented in all commonly used gradient-based optimisation algorithms. It has also been shown to significantly improve computational efficiency compared to existing neural-network training algorithms. Computer simulations on a number of benchmark problems are used throughout to illustrate the improvement proposed in this thesis. A large amount of data is generated within a foundry every time a casting is poured. Furthermore, with the increased number of computing tools and increased computing power, there is a need to develop an efficient, intelligent diagnostic tool that can learn from historical data to gain further insight into cause-and-effect relationships. In this study, the performance of the current Knowledge Hyper-surface method was reviewed and its mathematical formulation was analysed to identify its limitations. An enhancement is proposed by introducing mid-points in the existing shape formulation. It is shown that the mid-points' shape function can successfully constrain the shape of the decision hyper-surface to become more realistic, with acceptable results in the multi-dimensional case. This is a novel and original approach and is of direct relevance to the foundry industry.
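The role of the gain parameter can be seen in a minimal sketch of a single sigmoid neuron with an explicit gain (the neuron, data, and values here are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

def sigmoid(z, gain=1.0):
    """Logistic activation with an explicit gain c: f(z) = 1 / (1 + exp(-c*z))."""
    return 1.0 / (1.0 + np.exp(-gain * z))

def neuron_gradient(w, x, t, gain):
    """Gradient of E = 0.5*(y - t)^2 for a single sigmoid neuron y = f(w @ x).

    The gain enters the chain rule multiplicatively:
        dE/dw = (y - t) * gain * y * (1 - y) * x,
    and because y itself depends on the gain, changing it does more than
    rescale a step length; across the layers of a deeper network it alters
    the search direction, which is the effect the thesis exploits.
    """
    y = sigmoid(w @ x, gain)
    return (y - t) * gain * y * (1.0 - y) * x

x = np.array([1.0, -0.5])
w = np.array([0.3, 0.7])
g_low = neuron_gradient(w, x, 0.0, gain=1.0)
g_high = neuron_gradient(w, x, 0.0, gain=2.0)
# g_high is not simply 2*g_low: the gain also reshapes the y*(1-y) factor.
```

Doubling the gain does not simply double the gradient, which is why adaptive-gain variation acts on the search geometry rather than on the learning rate.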
Improving time efficiency of feedforward neural network learning
Feedforward neural networks have been widely studied and used in many applications in science and engineering. The training of this type of network is mainly undertaken using the well-known backpropagation-based learning algorithms. One major problem with this type of algorithm is the slow training convergence speed, which hinders their applications. In order to improve the training convergence speed of this type of algorithm, many researchers have developed different improvements and enhancements. However, the slow convergence problem has not been fully addressed. This thesis makes several contributions by proposing new backpropagation learning algorithms based on the terminal attractor concept to improve existing backpropagation learning algorithms such as the gradient descent and Levenberg-Marquardt algorithms. These new algorithms enable fast convergence both at a distance from and in close range of the ideal weights. In particular, a new fast convergence mechanism is proposed which is based on the fast terminal attractor concept. Comprehensive simulation studies are undertaken to demonstrate the effectiveness of the proposed backpropagation algorithms with terminal attractors. Finally, three practical application cases of time series forecasting, character recognition and image interpolation are chosen to show the practicality and usefulness of the proposed learning algorithms, with comprehensive comparative studies against existing algorithms.
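The terminal-attractor idea can be sketched on a toy quadratic objective (this update rule and its constants are an illustrative assumption in the spirit of Zak-style terminal attractors, not the thesis's exact algorithm):

```python
import numpy as np

def terminal_attractor_step(w, grad, E, eta=0.5, beta=2.0/3.0, eps=1e-12):
    """One illustrative terminal-attractor update.

    Plain gradient flow w' = -grad gives dE/dt = -||grad||^2, which decays
    only asymptotically. Choosing w' = -E**beta * grad / ||grad||^2 with
    0 < beta < 1 instead gives dE/dt = -E**beta, whose solution reaches
    E = 0 in finite time -- the 'terminal attractor' property. The min(..., 1)
    cap guards the discrete update against overshooting near the attractor.
    """
    coef = min(eta * (E ** beta) / (np.dot(grad, grad) + eps), 1.0)
    return w - coef * grad

# Minimise E(w) = 0.5*||w||^2 (its gradient is w itself) from w0 = (1, 1):
w = np.array([1.0, 1.0])
for _ in range(200):
    E = 0.5 * np.dot(w, w)
    if E < 1e-16:
        break
    w = terminal_attractor_step(w, w, E)
```

The step size grows as the error shrinks, so the iterates do not stall near the minimum the way plain gradient descent does.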
Probabilistic Approaches to Stochastic Optimization
Optimization is a cardinal concept in the sciences, and viable algorithms are of utmost importance as tools
for finding the solution to an optimization problem. Empirical risk minimization is a major workhorse,
in particular in machine learning applications, where an input-target relation is learned in a supervised
manner. Empirical risks with high-dimensional inputs are mostly optimized by greedy, gradient-based,
and possibly stochastic optimization routines, such as stochastic gradient descent.
Though popular and practically successful, this setup has major downsides which often make it
finicky to work with, or at least the bottleneck in a larger chain of learning procedures. For instance,
typical issues are:
• Overfitting of a parametrized model to the data. This generally leads to poor generalization performance on unseen data.
• Tuning of algorithmic parameters, such as learning rates, is tedious, inefficient, and costly.
• Stochastic losses and gradients occur due to sub-sampling of a large dataset. They yield only incomplete or corrupted information about the empirical risk, and are thus difficult to handle from a decision-making point of view.
This thesis consists of four conceptual parts.
In the first one, we argue that conditional distributions of local full and mini-batch evaluations of
losses and gradients can be well approximated by Gaussian distributions, since the losses themselves
are sums of independently and identically distributed random variables. We then provide a way of
estimating the corresponding sufficient statistics, i.e., variances and means, with low computational
overhead. This yields an analytic likelihood for the loss and gradient at every point of the input space,
which subsequently can be incorporated into active decision making at run-time of the optimizer.
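The estimation of these sufficient statistics can be sketched for a toy linear model, where per-example gradients are available in closed form (data and model here are synthetic illustrations, not the thesis's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model with squared loss: each example contributes its own
# gradient, so the mini-batch mean and its variance are obtained at
# essentially no extra cost beyond the gradient computation itself.
X = rng.normal(size=(256, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=256)

def per_example_grads(w, Xb, yb):
    """Gradient of 0.5*(x @ w - y)^2 for every example in the batch, shape (B, d)."""
    return (Xb @ w - yb)[:, None] * Xb

batch = rng.choice(256, size=32, replace=False)
G = per_example_grads(np.zeros(5), X[batch], y[batch])
g_mean = G.mean(axis=0)                     # mini-batch gradient estimate
g_var = G.var(axis=0, ddof=1) / len(batch)  # CLT variance of that estimate
# (g_mean, g_var) parametrize a Gaussian likelihood for the true gradient.
```

Because the loss is a sum over i.i.d. examples, the central limit theorem justifies treating `g_mean` as Gaussian with the empirical variance `g_var`, which is exactly the likelihood the text refers to.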
The second part focuses on estimating generalization performance, not by monitoring a validation
loss, but by assessing if stochastic gradients can be fully explained by noise that occurs due to the
finiteness of the training dataset, and not due to an informative gradient direction of the expected loss
(risk). This yields a criterion for early-stopping where no validation set is needed, and the full dataset
can be used for training.
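A simplified version of such a noise test can be written down directly from per-example gradients (a hedged sketch of the idea; the thesis's actual criterion may weight and threshold the statistic differently):

```python
import numpy as np

def noise_test_statistic(G):
    """Test whether a mini-batch gradient is explainable by sampling noise.

    G holds per-example gradients, shape (B, d). Under the null hypothesis
    'the expected gradient is zero', each coordinate of the batch mean g_d
    is approximately N(0, var_d / B), so B * g_d**2 / var_d has expectation 1.
    An averaged statistic near (or below) 1 says the observed gradient is
    fully explainable by the finiteness of the sample -- the basis for
    early stopping without a validation set.
    """
    B, _ = G.shape
    g = G.mean(axis=0)
    var = G.var(axis=0, ddof=1) + 1e-12
    return float(B * np.mean(g ** 2 / var))

rng = np.random.default_rng(1)
pure_noise = rng.normal(size=(128, 10))   # expected gradient is zero
informative = pure_noise + 0.5            # a clear mean direction remains
```

On the noise-only sample the statistic hovers around 1, while an informative gradient direction pushes it far above 1, so thresholding it yields a stopping rule that never touches a validation set.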
The third part is concerned with fully automated learning rate adaption for stochastic gradient descent
(SGD). Global learning rates are arguably the most exposed manual tuning parameters of stochastic
optimization routines. We propose a cheap and self-contained sub-routine, called a "probabilistic
line search", that automatically adapts the learning rate in every step, based on a local probability of
descent. The result is an entirely parameter-free stochastic optimizer that reaches comparable or better
generalization performance than SGD with a carefully hand-tuned learning rate on the tested problems.
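The central quantity, a local probability of descent, reduces to a Gaussian tail probability (a sketch only; the full method additionally models the loss along the search line with an integrated Wiener process and probabilistic Wolfe conditions):

```python
import math

def probability_of_descent(m, s):
    """P(true directional derivative < 0) under a Gaussian belief N(m, s^2).

    m is the noisy estimate of the slope along the search direction and s its
    standard deviation (both obtainable from mini-batch statistics). This is
    the quantity a probabilistic line search thresholds when deciding whether
    to accept a candidate step.
    """
    return 0.5 * (1.0 + math.erf(-m / (s * math.sqrt(2.0))))

# A noisy slope estimate of -1 with unit uncertainty is likely a descent step:
p = probability_of_descent(-1.0, 1.0)
```

When the belief is centered at zero the probability is exactly 0.5, i.e., the search has no evidence either way; confident negative slopes push it toward 1.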
The last part deals with noise-robust search directions. Inspired by classic first- and second-order
methods, we model the unknown dynamics of the gradient or Hessian function along the optimization
path. The approach has strong connections to classic filtering frameworks and can incorporate noise-corrupted
evaluations of the gradient at successive locations. The benefits are twofold. Firstly, we gain
valuable insight into less accessible or ad-hoc design choices of classic optimizers as special cases. Secondly,
we provide the basis for a flexible, self-contained, and easy-to-use class of stochastic optimizers that
exhibit a higher degree of robustness and automation.
Continuous learning of analytical and machine learning rate of penetration (ROP) models for real-time drilling optimization
Oil and gas operators strive to reach hydrocarbon reserves by drilling wells in the safest and fastest possible manner, providing indispensable energy to society at reduced costs while maintaining environmental sustainability. Real-time drilling optimization consists of selecting operational drilling parameters that maximize a desirable measure of drilling performance. Drilling optimization efforts often aspire to improve drilling speed, commonly referred to as rate of penetration (ROP). ROP is a function of the forces and moments applied to the bit, in addition to mud, formation, bit, and hydraulic properties. Three operational drilling parameters may be constantly adjusted at surface to influence ROP towards a drilling objective: weight on bit (WOB), drillstring rotational speed (RPM), and drilling fluid (mud) flow rate. In the traditional, analytical approach to ROP modeling, inflexible equations relate WOB, RPM, flow rate, and/or other measurable drilling parameters to ROP, and empirical model coefficients are computed for each rock formation to best fit field data. Over the last decade, enhanced data acquisition technology and widespread, cheap computational power have driven a surge in applications of machine learning (ML) techniques to ROP prediction. Machine learning algorithms leverage statistics to uncover relations between any prescribed inputs (features/predictors) and the quantity of interest (response). The biggest advantage of ML algorithms over analytical models is their flexibility in model form. With no set equation, ML models permit segmentation of the drilling operational parameter space. However, increased model complexity diminishes interpretability of how an adjustment to the inputs will affect the output. There is no single ROP model applicable in every situation. This study investigates all stages of the drilling optimization workflow, with emphasis on real-time continuous model learning.
Sensors constantly record data as wells are drilled, and it is postulated that ROP models can be retrained in real-time to adapt to changing drilling conditions. Cross-validation is assessed as a methodology to select the best performing ROP model for each drilling optimization interval in real-time. Constrained by rig equipment and operational limitations, drilling parameters are optimized in intervals using the most accurate ROP model determined by cross-validation. Dynamic range and full range training data segmentation techniques contest the classical lithology-dependent approach to ROP modeling. Spatial proximity and parameter similarity sample weighting expand data partitioning capabilities during model training. The prescribed ROP modeling and drilling parameter optimization scenarios are evaluated according to model performance, ROP improvements, and computational expense.
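Selecting the best ROP model for an interval via cross-validation can be sketched as follows (the data ranges, the linear ground truth, and both candidate models are synthetic illustrations, not the study's actual models or measurements):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical drilling interval: WOB, RPM, and mud flow rate -> ROP.
X = rng.uniform([5.0, 50.0, 300.0], [35.0, 200.0, 900.0], size=(120, 3))
rop = 0.8 * X[:, 0] + 0.05 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(0.0, 2.0, 120)

def kfold_mse(fit, X, y, k=5):
    """Mean squared cross-validation error of a model factory over k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        predict = fit(X[train], y[train])
        errs.append(np.mean((predict(X[fold]) - y[fold]) ** 2))
    return float(np.mean(errs))

def linear_model(Xtr, ytr):
    """Least-squares fit of ROP against the drilling parameters."""
    A = np.c_[Xtr, np.ones(len(ytr))]
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Xq: np.c_[Xq, np.ones(len(Xq))] @ coef

def baseline_model(Xtr, ytr):
    """Constant-ROP baseline: the interval mean."""
    m = float(ytr.mean())
    return lambda Xq: np.full(len(Xq), m)

candidates = {"linear": linear_model, "baseline": baseline_model}
best = min(candidates, key=lambda name: kfold_mse(candidates[name], X, rop))
```

In the real-time workflow the same comparison would be re-run as each new interval of sensor data arrives, so the winning model can change with drilling conditions.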
Stochastic, distributed and federated optimization for machine learning
We study optimization algorithms for the finite-sum problems frequently arising in machine
learning applications. First, we propose novel variants of stochastic gradient descent with
a variance reduction property that enables linear convergence for strongly convex objectives.
Second, we study the distributed setting, in which the data describing the optimization problem
does not fit into a single computing node. In this case, traditional methods are inefficient,
as the communication costs inherent in distributed optimization become the bottleneck. We
propose a communication-efficient framework which iteratively forms local subproblems that can
be solved with arbitrary local optimization algorithms. Finally, we introduce the concept of
Federated Optimization/Learning, where we try to solve the machine learning problems without
having data stored in any centralized manner. The main motivation comes from industry when
handling user-generated data. The current prevalent practice is that companies collect vast
amounts of user data and store them in datacenters. An alternative we propose is not to collect
the data in the first place, and instead occasionally use the computational power of users' devices to
solve the very same optimization problems, while alleviating privacy concerns at the same time.
In such a setting, minimization of communication rounds is the primary goal, and we demonstrate
that solving the optimization problems in such circumstances is conceptually tractable.
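The variance-reduction idea can be sketched with an SVRG-style scheme on a toy least-squares sum (one representative member of the family of methods referred to above; the thesis proposes its own variants, and the problem data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)              # consistent least-squares system

def grad_i(w, i):
    """Gradient of the i-th summand f_i(w) = 0.5 * (a_i @ w - b_i)^2."""
    return (A[i] @ w - b[i]) * A[i]

def svrg(w, eta=0.005, epochs=30, m=1000):
    """SVRG-style variance-reduced stochastic gradient descent.

    Each epoch anchors a snapshot w_snap and its full gradient mu; the inner
    stochastic steps use grad_i(w) - grad_i(w_snap) + mu, an unbiased
    gradient estimate whose variance vanishes as w and w_snap approach the
    optimum -- this is what restores linear convergence for strongly convex
    objectives, unlike plain SGD.
    """
    for _ in range(epochs):
        w_snap = w.copy()
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = int(rng.integers(n))
            w = w - eta * (grad_i(w, i) - grad_i(w_snap, i) + mu)
    return w

w_opt = svrg(np.zeros(d))
```

The full gradient is recomputed only once per epoch, so the per-step cost stays close to that of SGD while the noise floor that forces SGD to decay its step size is removed.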
Apprentissage à grande échelle et applications (Large-scale learning and applications)
This thesis presents my main research activities in statistical machine learning after my PhD, starting from my post-doc at UC Berkeley to my present research position at Inria Grenoble. The first chapter introduces the context, gives a summary of my scientific contributions, and emphasizes the importance of pluri-disciplinary research. For instance, mathematical optimization has become central in machine learning, and the interplay between signal processing, statistics, bioinformatics, and computer vision is stronger than ever. With many scientific and industrial fields producing massive amounts of data, the impact of machine learning is potentially huge and diverse. However, dealing with massive data also raises many challenges. In this context, the manuscript presents different contributions, which are organized in three main topics.
Chapter 2 is devoted to large-scale optimization in machine learning, with a focus on algorithmic methods. We start with majorization-minimization algorithms for structured problems, including block-coordinate, incremental, and stochastic variants. These algorithms are analyzed in terms of convergence rates for convex problems and in terms of convergence to stationary points for non-convex ones. We also introduce fast schemes for minimizing large sums of convex functions and principles to accelerate gradient-based approaches, based on Nesterov's acceleration and on Quasi-Newton approaches.
Chapter 3 presents the paradigm of deep kernel machines, an alliance between kernel methods and multilayer neural networks. In the context of visual recognition, we introduce a new invariant image model called convolutional kernel networks, a new type of convolutional neural network with a reproducing kernel interpretation. The network comes with simple and effective principles for unsupervised learning, and is compatible with supervised learning via backpropagation rules.
Chapter 4 is devoted to sparse estimation, that is, the automatic selection of model variables for explaining observed data; in particular, this chapter presents the result of pluri-disciplinary collaborations in bioinformatics and neuroscience, where the sparsity principle is key to building interpretable predictive models.
Finally, the last chapter concludes the manuscript and suggests future perspectives.
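The acceleration principle mentioned for Chapter 2 can be illustrated with the textbook Nesterov scheme for an L-smooth convex function (a generic sketch, not code from the manuscript):

```python
import numpy as np

def nesterov(grad, w0, L, iters=400):
    """Nesterov's accelerated gradient method for an L-smooth convex f.

    The extrapolation point y mixes the last two iterates; the momentum
    weight follows the classical t-sequence that yields the O(1/k^2)
    convergence rate, versus O(1/k) for plain gradient descent.
    """
    w = y = np.asarray(w0, dtype=float)
    t = 1.0
    for _ in range(iters):
        w_next = y - grad(y) / L          # gradient step from the look-ahead point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = w_next + ((t - 1.0) / t_next) * (w_next - w)
        w, t = w_next, t_next
    return w

# Ill-conditioned quadratic f(w) = 0.5 * w @ H @ w with smoothness L = 25:
H = np.diag([1.0, 25.0])
w_star = nesterov(lambda v: H @ v, [1.0, 1.0], L=25.0)
```

The only problem-dependent constant is the smoothness L; everything else, including the momentum schedule, is parameter-free.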
Probabilistic Linear Algebra for Stochastic Optimization
The emerging field of machine learning has by now become the main driver of data-driven discovery. Yet, with ever more data, it is also faced with new computational challenges. To make machines "learn", the desired task is oftentimes phrased as an empirical risk minimization problem that needs to be solved by numerical optimization routines. Optimization in ML deviates from the scope of traditional optimization in two regards. First, ML deals with large datasets that need to be subsampled to reduce the computational burden, inadvertently introducing noise into the optimization procedure. The second distinction is the sheer size of the parameter space, which severely limits the amount of information that optimization algorithms can store. Both aspects together have made first-order optimization routines a prevalent choice for model training in ML. First-order algorithms use only gradient information to determine a step direction and step length to update the parameters. Inclusion of second-order information about the local curvature has great potential to improve the performance of the optimizer if done efficiently.
Probabilistic curvature estimation for use in optimization is a recurring theme of this thesis and the problem is explored in three different directions that are relevant to ML training.
By iteratively adapting the scale of an arbitrary curvature estimate, it is possible to circumvent the tedious work of manually tuning the optimizer's step length during model training. The general form of the curvature estimate naturally extends its applicability to various popular optimization algorithms.
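A classic analogue of such scale adaptation is the Barzilai-Borwein rule, which rescales a fixed curvature estimate from observed secant pairs (an illustrative stand-in for the idea described above, not the thesis's exact estimator; the quadratic test problem is an assumption):

```python
import numpy as np

def bb_scale(s, y, eps=1e-12):
    """Barzilai-Borwein-type scale for rescaling a curvature estimate.

    From the parameter step s = w_k - w_{k-1} and the gradient change
    y = g_k - g_{k-1}, the scalar (s @ y) / (y @ y) makes a fixed
    (e.g. identity) curvature estimate consistent with the observed
    secant pair, removing the need to hand-tune a step length.
    """
    return max(float(np.dot(s, y)) / (float(np.dot(y, y)) + eps), eps)

# Gradient descent on f(w) = 0.5 * w @ H @ w with self-tuned step sizes:
H = np.diag([1.0, 10.0])
w = np.array([1.0, 1.0])
g = H @ w
alpha = 0.05                    # only the very first step uses a manual value
for _ in range(80):
    w_new = w - alpha * g
    g_new = H @ w_new
    alpha = bb_scale(w_new - w, g_new - g)
    w, g = w_new, g_new
```

After the first iteration the step length is determined entirely by observed gradient differences, which is the practical payoff of scale-adapted curvature estimates.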
Curvature can also be inferred with matrix-variate distributions by projections of the curvature matrix. Noise can then be captured by a likelihood with non-vanishing width, leading to a novel update strategy that uses the inherent uncertainty to estimate the curvature.
Finally, a new form of curvature estimate is derived from gradient observations of a nonparametric model. It expands the family of viable curvature estimates used in optimization.
An important outcome of the research is to highlight the benefit of utilizing curvature information in stochastic optimization. By considering multiple ways of efficiently leveraging second-order information, the thesis advances the frontier of stochastic optimization and unlocks new avenues for research on the training of large-scale ML models.