Search CORE

10 research outputs found

Parallelizable sparse inverse formulation Gaussian processes (SpInGP)

Author: Grigorievskiy Alexander
Lawrence Neil
Särkkä Simo
Publication date: 27/09/2017

We propose a parallelizable sparse inverse formulation Gaussian process (SpInGP) for temporal models. It uses a sparse precision GP formulation and sparse matrix routines to speed up the computations. Due to the state-space formulation used in the algorithm, the time complexity of the basic SpInGP is linear, and because all the computations are parallelizable, the parallel form of the algorithm is sublinear in the number of data points. We provide example algorithms to implement the sparse matrix routines and experimentally test the method using both simulated and real data. Comment: Presented at Machine Learning in Signal Processing (MLSP2017).
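To illustrate why a sparse precision matrix gives linear-time computations, here is a minimal sketch for the simplest state-space GP, assuming an exponential (Ornstein-Uhlenbeck, Matérn-1/2) kernel σ² exp(−|t − t′|/ℓ) and Gaussian observation noise. For sorted inputs this process is Markov, so its precision matrix is tridiagonal and the posterior mean reduces to a single sparse solve. The function names are hypothetical, and this is not the SpInGP algorithm itself, which covers more general state-space kernels and parallel sparse routines.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def ou_precision(t, sigma2, ell):
    """Tridiagonal precision of an OU (Matern-1/2) GP with kernel
    sigma2 * exp(-|t - t'| / ell), at sorted, distinct times t."""
    a = np.exp(-np.diff(t) / ell)      # AR(1) coefficient between neighbours
    q = sigma2 * (1.0 - a**2)          # conditional (innovation) variances
    diag = np.zeros(len(t))
    diag[0] = 1.0 / sigma2             # stationary prior on the first state
    diag[1:] += 1.0 / q                # each state given its predecessor
    diag[:-1] += a**2 / q              # each state's role in predicting the next
    off = -a / q                       # coupling between neighbours only
    return sp.diags([off, diag, off], [-1, 0, 1], format="csc")


def sparse_posterior_mean(t, y, sigma2, ell, noise):
    """Posterior mean of f given y = f + eps, eps ~ N(0, noise * I),
    using the sparse posterior precision Q + I / noise."""
    Q = ou_precision(t, sigma2, ell)
    P = Q + sp.identity(len(t), format="csc") / noise
    return spsolve(P, y / noise)
```

Because factorizing a tridiagonal matrix produces no fill-in, the solve costs O(n), compared with O(n³) for the dense formula K(K + σ_n² I)⁻¹ y, which gives the same posterior mean.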

arXiv.org e-Print Archive

Crossref

Forecasting of commercial sales with large scale Gaussian Processes

Author: Carmen Marsit (334042)
Jia Chen (8203)
Ke Hao (50181)
Luca Lambertini (72724)
Maya Deyssenroth (4238833)
Shouneng Peng (493132)
Publication date: 01/01/2017

This paper argues that applications of Gaussian processes in the fast-moving consumer goods industry have received too little discussion. Yet the technique can be valuable: for example, it provides automatic relevance determination of features, and the posterior mean can unlock insights into the data. Significant challenges are the large size and high dimensionality of commercial point-of-sale data. The study reviews approaches to Gaussian process modeling for large data sets, evaluates their performance on commercial sales data, and shows the value of this type of model as a decision-making tool for management. Comment: 10 pages, 5 figures
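The automatic feature relevance determination mentioned above usually refers to automatic relevance determination (ARD) kernels, which give each input feature its own lengthscale. A minimal NumPy sketch follows; the function name is hypothetical, and in practice the lengthscales would be learned, for example by maximizing the marginal likelihood, which is not shown here.

```python
import numpy as np


def ard_se_kernel(X1, X2, lengthscales, variance=1.0):
    """Squared-exponential kernel with one lengthscale per input feature (ARD).

    A very large lengthscale makes the kernel almost insensitive to that
    feature, which is how ARD flags an input as irrelevant."""
    A = np.asarray(X1, dtype=float) / lengthscales
    B = np.asarray(X2, dtype=float) / lengthscales
    # Pairwise squared distances via broadcasting: shape (n1, n2)
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist)
```

Reading off the fitted lengthscales (small means influential, very large means ignorable) is what makes ARD usable as a feature-relevance tool.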

arXiv.org e-Print Archive

Crossref

FigShare

Ultra-fast Deep Mixtures of Gaussian Process Experts

Author: Etienam Clement
Law Kody
Wade Sara
Publication date: 11/06/2020

Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, and sparse Gaussian processes (GPs) have shown promise as a leading candidate for the experts in such models. In the present article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). This combination provides a flexible, robust, and efficient model which is able to significantly outperform competing models. We furthermore consider efficient approaches to computing maximum a posteriori (MAP) estimators of these models by iteratively maximizing the distribution of experts given allocations and allocations given experts. We also show that a recently introduced method called Cluster-Classify-Regress (CCR) is capable of providing a good approximation of the optimal solution extremely quickly. This approximation can then be further refined with the iterative algorithm.
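The three stages of Cluster-Classify-Regress can be sketched with scikit-learn, assuming hard assignments, k-means on the joint (x, y) data for the clustering step, a small MLP as a stand-in for the DNN gating network, and one full (not sparse) GP per expert. This is an illustrative approximation of the structure named in the abstract, not the authors' implementation; the function names are hypothetical and the iterative MAP refinement is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.neural_network import MLPClassifier


def fit_ccr(X, y, n_experts=2, seed=0):
    """Cluster (x, y) jointly, train a gate on x, fit one GP per cluster."""
    # 1. Cluster: allocate each data point to an expert
    labels = KMeans(n_clusters=n_experts, n_init=10,
                    random_state=seed).fit_predict(np.column_stack([X, y]))
    # 2. Classify: the gating network predicts the allocation from x alone
    gate = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=seed).fit(X, labels)
    # 3. Regress: an independent GP expert on each cluster's data
    experts = {}
    for k in np.unique(labels):
        kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=0.01)
        experts[k] = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(
            X[labels == k], y[labels == k])
    return gate, experts


def predict_ccr(gate, experts, X_new):
    """Hard gating: route each input to the expert chosen by the gate."""
    assigned = gate.predict(X_new)
    out = np.empty(len(X_new))
    for k, gp in experts.items():
        mask = assigned == k
        if mask.any():
            out[mask] = gp.predict(X_new[mask])
    return out
```

Because each stage is a standard fit, the whole pipeline runs quickly, which is consistent with using CCR as an initialization that an iterative MAP procedure can then refine.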

arXiv.org e-Print Archive

Human motion estimation and controller learning

Author: Joukov Vladimir
Publication venue: 'University of Waterloo'
Publication date: 08/07/2021

Humans are capable of complex manipulation and locomotion tasks. They are able to achieve energy-efficient gait, reject disturbances, handle changing loads, and adapt to environmental constraints. Using inspiration from the human body, robotics researchers aim to develop systems with similar capabilities. Research suggests that humans minimize a task-specific cost function when performing movements. To learn this cost function from demonstrations and incorporate it into a controller, it is first imperative to estimate the expert motion accurately. The captured motions can then be analyzed to extract the objective function the expert was minimizing. We propose a framework for human motion estimation from wearable sensors. Human body joints are modeled by matrix Lie groups, using the special orthogonal groups SO(2) and SO(3) for joint pose and the special Euclidean group SE(3) for base link pose representation. To estimate the human joint pose, velocity, and acceleration, we provide the equations for employing the extended Kalman filter on Lie groups, thus explicitly accounting for the non-Euclidean geometry of the state space. Incorporating interaction constraints with respect to the environment or within the participant allows us to track global body position without an absolute reference and to ensure viable pose estimates. The algorithms are extensively validated in both simulation and real-world experiments. Next, to learn the underlying expert control strategies from the demonstrations, we present a novel fast approximate multivariate Gaussian process regression. The method estimates the underlying cost function without making assumptions about its structure. The computational efficiency of the approach allows for real-time forward-horizon prediction. Using a linear model predictive control framework, we then reproduce the demonstrated movements on a robot.
The learned cost function captures the variability in expert motion as well as the correlations between states, leading to a controller that both produces motions and reacts to disturbances in a human-like manner. The model predictive control formulation allows the controller to satisfy task and joint space constraints, avoiding obstacles and self-collisions, as well as torque constraints, ensuring operational feasibility. The approach is validated on the Franka Emika robot using real human motion exemplars.
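An extended Kalman filter on a matrix Lie group keeps the estimate on the group by composing it with the exponential of a tangent-space increment instead of adding vectors. A minimal sketch of that mean-propagation step for SO(3) follows, using Rodrigues' formula; the function names are hypothetical, a body-frame angular velocity is assumed, and covariance propagation (via the group adjoint) is not shown.

```python
import numpy as np


def hat(w):
    """Map a 3-vector to its skew-symmetric matrix, an element of so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def exp_so3(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        # Second-order Taylor expansion near the identity
        return np.eye(3) + W + 0.5 * W @ W
    return (np.eye(3)
            + (np.sin(theta) / theta) * W
            + ((1.0 - np.cos(theta)) / theta**2) * W @ W)


def propagate(R, omega, dt):
    """Propagate an orientation with body angular velocity omega over dt.

    Right-multiplying by exp(omega * dt) keeps R exactly on SO(3),
    which a Euclidean update R + dt * R @ hat(omega) would not."""
    return R @ exp_so3(np.asarray(omega, dtype=float) * dt)
```

Repeated propagation stays orthonormal with unit determinant up to rounding error, which is the property that motivates filtering directly on the group.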

University of Waterloo's Institutional Repository

Inference using Gaussian processes in animal movement modelling

Author: Paun Ionut Alexandru
Publication date: 01/01/2022

In recent years, the field of movement ecology has been changed dramatically by the capacity to collect accurate high-frequency telemetry data. In this thesis I present new statistical methods that scale to the very large volumes of data being generated, because most popular animal movement models suffer from a problem of scale dependence. Popular, widely used movement models in ecology are discrete-time movement models, in which animals’ positions are observed at discrete times. However, discrete-time models do not perform well when the data are missing or irregular. A remedy for this inefficiency is to use continuous-time movement models; however, continuous-time models are often difficult to formulate and hard to interpret. In this thesis, I first focus on discrete-time movement models and, through a study, illustrate one of the problems they pose: the discretisation time-step must be specified in advance. I then turn to Gaussian processes (GPs), probabilistic methods widely used in the machine learning community, and show that they are equivalent to many continuous-time movement models. Given that the primary goal of machine learning methods is to learn from large-scale datasets, using robust continuous-time movement models such as Gaussian processes is highly advantageous for multiple reasons. These include the flexibility to choose among various covariance functions, scalability to large datasets, and the ability to analyse data, infer parameters of interest, and quantify uncertainty within a nonparametric Bayesian approach. I extend the standard GP into a non-stationary hierarchical Gaussian process, in which both the movement process and the dynamic parameters of the movement model are Gaussian processes; this allows increased flexibility across the wide range of behaviour modes that animals can exhibit.
Throughout this thesis, I implement Gaussian processes on simulated and real tracking data using statistical libraries such as TensorFlow, which provide an accessible way to implement the model and give access to GPU/HPC-accelerated machine learning libraries. I perform inference on both synthetic and real datasets using optimisation methods such as maximum-a-posteriori (MAP) estimation, approximate sampling-based inference methods such as Markov chain Monte Carlo (MCMC), and variational inference methods.
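Because a GP is defined in continuous time, positions can be predicted, with uncertainty, at any time, including inside gaps between irregular fixes. A minimal NumPy sketch for one coordinate of a track follows, assuming a zero-mean Matérn-3/2 kernel with fixed, hand-picked hyperparameters; the function names are hypothetical, and the thesis's hierarchical, non-stationary model and TensorFlow-based inference are not reproduced here.

```python
import numpy as np


def matern32(t1, t2, variance=1.0, ell=1.0):
    """Matern-3/2 covariance between two sets of (possibly irregular) times."""
    r = np.abs(np.asarray(t1)[:, None] - np.asarray(t2)[None, :]) / ell
    return variance * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def gp_interpolate(t_obs, x_obs, t_new, variance=1.0, ell=1.0, noise=1e-4):
    """Posterior mean and variance of one track coordinate at query times,
    conditioned on noisy fixes observed at irregular times t_obs."""
    K = matern32(t_obs, t_obs, variance, ell) + noise * np.eye(len(t_obs))
    Ks = matern32(t_obs, t_new, variance, ell)
    L = np.linalg.cholesky(K)                     # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, x_obs))
    mean = Ks.T @ alpha
    V = np.linalg.solve(L, Ks)
    var = variance - np.sum(V**2, axis=0)         # diagonal of the posterior covariance
    return mean, var
```

No discretisation time-step is chosen anywhere: the posterior variance simply grows inside long gaps between fixes and shrinks near observed ones, which is the behaviour that makes continuous-time models attractive for missing or irregular data.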

Glasgow Theses Service

Probabilistic Ordinary Differential Equation Solvers - Theory and Applications

Author: Schober Michael
Publication venue: Universität Tübingen
Publication date: 01/01/2018

Ordinary differential equations are ubiquitous in science and engineering, as they provide mathematical models for many physical processes. However, most practical purposes require the temporal evolution of a particular solution. Many relevant ordinary differential equations are known to lack closed-form solutions in terms of simple analytic functions, so users rely on numerical algorithms to compute discrete approximations. Numerical methods replace the intractable, and thus inaccessible, solution by an approximating model with known computational strategies. This is akin to a process in statistics where an unknown true relationship is modeled from observed instances of that relationship. One branch of statistics, Bayesian modeling, expresses degrees of uncertainty with probability distributions. In recent years, this idea has gained traction in the design and study of numerical algorithms, establishing probabilistic numerics as a research field in its own right. The theory part of this thesis is concerned with bridging the gap between classical numerical methods for ordinary differential equations and probabilistic numerics. To this end, an algorithm is presented based on Gaussian processes, a general and versatile model for Bayesian regression. This algorithm is compared to two standard frameworks for the solution of initial value problems. It is shown that the maximum a posteriori estimator of certain Gaussian process regressors coincides with certain multistep formulae. Furthermore, a particular initialization scheme based on an improper prior model coincides with a Runge-Kutta method for the first discretization step. This analysis yields a higher-order probabilistic numerical algorithm for initial value problems. Based on the probabilistic description, an estimator of the local integration error is presented, which is used in a step size adaptation scheme.
The completed algorithm is evaluated on a benchmark of initial value problems, empirically confirming the theoretically predicted error rates and displaying particularly efficient performance in domains with low accuracy requirements. To establish the practical benefit of the probabilistic solution, a probabilistic boundary value problem solver is applied to a medical imaging problem. In tractography, diffusion-weighted magnetic resonance imaging data are used to infer the connectivity of neural fibers. The first application of the probabilistic solver shows how the quantification of the discretization error can be used in subsequent estimation of fiber density. The second application additionally incorporates the measurement noise of the imaging data into the tract estimation model. These two extensions of the shortest-path tractography method give more faithful representations of data, modeling, and algorithmic uncertainty in neural connectivity studies.
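To make the idea of a Gaussian-process ODE solver concrete, here is a minimal sketch of a Gaussian filtering solver for a scalar initial value problem y′ = f(y). It assumes a once-integrated Wiener process prior on the state [y, y′], an exact initial state, noise-free derivative "observations", and a zeroth-order linearization (sometimes called EK0); the thesis studies a broader family of priors and solvers, so this is an illustration only, and the function name is hypothetical. With these particular choices the gain applied to y works out to h/2 at every step, so the mean update reads y_{n+1} = y_n + (h/2)(g_n + f(y_n + h g_n)), a trapezoidal-type rule in which g_n is the filter's current derivative estimate.

```python
import numpy as np


def ek0_iwp1_solve(f, y0, t_end, h, sigma2=1.0):
    """Gaussian-filtering solver for y' = f(y), y(0) = y0, on [0, t_end].

    Prior: once-integrated Wiener process on X = [y, y'] with diffusion
    sigma2 (fixed by hand here, so the returned stds are not calibrated).
    Each step conditions on y' - f(y) = 0, with f evaluated at the
    predicted mean (zeroth-order linearization)."""
    n_steps = int(round(t_end / h))
    A = np.array([[1.0, h], [0.0, 1.0]])              # IWP(1) transition
    Q = sigma2 * np.array([[h**3 / 3, h**2 / 2],
                           [h**2 / 2, h]])            # IWP(1) process noise
    H = np.array([0.0, 1.0])                          # we "observe" the derivative
    m = np.array([y0, f(y0)])                         # exact initial state
    P = np.zeros((2, 2))
    means, stds = [m[0]], [0.0]
    for _ in range(n_steps):
        m_pred = A @ m                                # predict
        P_pred = A @ P @ A.T + Q
        r = f(m_pred[0]) - H @ m_pred                 # ODE residual at the prediction
        S = H @ P_pred @ H                            # scalar innovation variance
        K = P_pred @ H / S                            # Kalman gain
        m = m_pred + K * r                            # update
        P = P_pred - np.outer(K, K) * S
        means.append(m[0])
        stds.append(np.sqrt(max(P[0, 0], 0.0)))
    return np.array(means), np.array(stds)
```

For example, `ek0_iwp1_solve(lambda y: -y, 1.0, 1.0, 0.01)` returns 101 mean values approximating exp(−t) together with standard deviations that grow along the trajectory.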

Publikationsserver der Universität Tübingen

MPG.PuRe

Probabilistic Approaches to Stochastic Optimization

Author: Mahsereci Maren
Publication venue: Universität Tübingen
Publication date: 23/07/2018

Optimization is a cardinal concept in the sciences, and viable algorithms are of utmost importance as tools for finding the solution to an optimization problem. Empirical risk minimization is a major workhorse, in particular in machine learning applications, where an input-target relation is learned in a supervised manner. Empirical risks with high-dimensional inputs are mostly optimized by greedy, gradient-based, and possibly stochastic optimization routines, such as stochastic gradient descent. Though popular and practically successful, this setup has major downsides that often make it finicky to work with, or at least turn it into the bottleneck in a larger chain of learning procedures. For instance, typical issues are:
• Overfitting of a parametrized model to the data. This generally leads to poor generalization performance on unseen data.
• Tuning of algorithmic parameters, such as learning rates, is tedious, inefficient, and costly.
• Stochastic losses and gradients occur due to sub-sampling of a large dataset. They yield only incomplete or corrupted information about the empirical risk, and are thus difficult to handle from a decision-making point of view.
This thesis consists of four conceptual parts. In the first one, we argue that conditional distributions of local full and mini-batch evaluations of losses and gradients can be well approximated by Gaussian distributions, since the losses themselves are sums of independently and identically distributed random variables. We then provide a way of estimating the corresponding sufficient statistics, i.e., variances and means, with low computational overhead. This yields an analytic likelihood for the loss and gradient at every point of the input space, which can subsequently be incorporated into active decision making at run-time of the optimizer.
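Following the Gaussian argument above, a mini-batch gradient is an average of B per-example gradients, so its distribution can be summarized by a mean and a per-coordinate variance estimated from the same batch. A minimal sketch follows, assuming a linear least-squares model so that per-example gradients are available in closed form; the function name is hypothetical, and for neural networks per-example gradients need extra machinery, so the thesis's low-overhead estimator is not reproduced here.

```python
import numpy as np


def minibatch_gradient_stats(X, y, w):
    """Gradient mean and variance estimates for the squared loss
    l_i(w) = 0.5 * (x_i . w - y_i)**2 of a linear model on one mini-batch.

    Returns (g_mean, grad_var, mean_var): the mini-batch gradient, the
    sample variance of individual gradients per coordinate, and the
    estimated variance of g_mean itself."""
    residuals = X @ w - y                       # shape (B,)
    per_example = residuals[:, None] * X        # gradient of each l_i, shape (B, D)
    g_mean = per_example.mean(axis=0)
    grad_var = per_example.var(axis=0, ddof=1)  # unbiased sample variance
    mean_var = grad_var / X.shape[0]            # variance of an average of B terms
    return g_mean, grad_var, mean_var
```

Under the Gaussian approximation, `g_mean` and `mean_var` define an approximate likelihood N(g_mean, mean_var) for the true gradient at `w`, which is the kind of quantity an optimizer can then reason with at run-time.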
The second part focuses on estimating generalization performance, not by monitoring a validation loss, but by assessing whether stochastic gradients can be fully explained by noise that occurs due to the finiteness of the training dataset, rather than by an informative gradient direction of the expected loss (risk). This yields an early-stopping criterion that needs no validation set, so the full dataset can be used for training. The third part is concerned with fully automated learning rate adaptation for stochastic gradient descent (SGD). Global learning rates are arguably the most exposed manual tuning parameters of stochastic optimization routines. We propose a cheap and self-contained sub-routine, called a ‘probabilistic line search’, that automatically adapts the learning rate in every step based on a local probability of descent. The result is an entirely parameter-free stochastic optimizer that reaches comparable or better generalization performance than SGD with a carefully hand-tuned learning rate on the tested problems. The last part deals with noise-robust search directions. Inspired by classic first- and second-order methods, we model the unknown dynamics of the gradient or Hessian function on the optimization path. The approach has strong connections to classic filtering frameworks and can incorporate noise-corrupted evaluations of the gradient at successive locations. The benefits are twofold. First, we gain valuable insight into less accessible or ad hoc design choices of classic optimizers as special cases. Second, we provide the basis for a flexible, self-contained, and easy-to-use class of stochastic optimizers that exhibit a higher degree of robustness and automation.
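The early-stopping idea in the second part can be illustrated with a simple statistic. If the expected gradient were zero, each coordinate g_k of a mini-batch gradient would be approximately N(0, σ_k²/B), so B·g_k²/σ_k² would average about 1 over the D coordinates. The sketch below, assuming per-example gradients are available, computes 1 − (B/D)·Σ_k g_k²/σ̂_k², which hovers near zero when gradients are noise-dominated and is strongly negative while there is a real descent direction. The function names and the zero threshold are illustrative assumptions; the thesis's exact criterion may differ.

```python
import numpy as np


def noise_dominated_statistic(per_example_grads):
    """Signal-versus-noise test for a mini-batch gradient.

    per_example_grads: array of shape (B, D), one gradient per example.
    Returns 1 - (B / D) * sum_k g_k**2 / var_k, where g is the batch-mean
    gradient and var_k the sample variance of coordinate k."""
    B, D = per_example_grads.shape
    g = per_example_grads.mean(axis=0)
    var = per_example_grads.var(axis=0, ddof=1)
    return 1.0 - (B / D) * np.sum(g**2 / var)


def should_stop(per_example_grads):
    """Hypothetical rule: stop once the gradient looks like pure noise."""
    return noise_dominated_statistic(per_example_grads) > 0.0
```

Because σ̂_k is itself estimated from the batch, the statistic sits slightly below zero on pure noise for finite B, so in practice one would check it over several batches rather than rely on a single evaluation.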

Publikationsserver der Universität Tübingen

MPG.PuRe