10 research outputs found

    Parallelizable sparse inverse formulation Gaussian processes (SpInGP)

    We propose a parallelizable sparse inverse formulation Gaussian process (SpInGP) for temporal models. It uses a sparse precision GP formulation and sparse matrix routines to speed up the computations. Due to the state-space formulation used in the algorithm, the time complexity of the basic SpInGP is linear, and because all the computations are parallelizable, the parallel form of the algorithm is sublinear in the number of data points. We provide example algorithms to implement the sparse matrix routines and experimentally test the method using both simulated and real data. Comment: Presented at Machine Learning for Signal Processing (MLSP 2017).
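    A minimal, illustrative sketch (not the paper's SpInGP algorithm) of the core idea of exploiting a sparse precision matrix arising from a temporal state-space prior: for a random-walk prior the precision is tridiagonal, so the posterior mean follows from a single sparse solve whose cost is effectively linear in the number of data points. All function and variable names below are invented for this example.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def sparse_precision_gp_mean(t, y, process_var=1.0, noise_var=0.1):
        """Posterior mean under a random-walk GP prior via one sparse solve."""
        n = len(t)
        w = 1.0 / (process_var * np.diff(t))          # precision of each increment
        main = np.zeros(n)
        main[:-1] += w                                # assemble the tridiagonal
        main[1:] += w                                 # prior precision matrix Q
        Q = sp.diags([-w, main, -w], offsets=[-1, 0, 1], format="csc")
        A = Q + sp.eye(n, format="csc") / noise_var   # posterior precision
        return spla.spsolve(A, y / noise_var)         # sparse (banded) solve

    t = np.linspace(0.0, 10.0, 500)
    y = np.sin(t) + 0.3 * np.random.randn(t.size)
    posterior_mean = sparse_precision_gp_mean(t, y)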

    Forecasting of commercial sales with large scale Gaussian Processes

    This paper argues that there has not been enough discussion of applications of Gaussian Processes in the fast-moving consumer goods industry. Yet the technique can be important: it can, for example, provide automatic feature relevance determination, and the posterior mean can unlock insights into the data. Significant challenges are the large size and high dimensionality of commercial point-of-sale data. The study reviews approaches to Gaussian Process modeling for large data sets, evaluates their performance on commercial sales, and shows the value of this type of model as a decision-making tool for management. Comment: 10 pages, 5 figures.
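    As a small illustration of the automatic relevance determination mentioned above, the sketch below (synthetic data, not the paper's setup) fits a GP with an anisotropic RBF kernel so that every input feature gets its own length scale; features whose learned length scales end up very large contribute little to the fit and can be flagged as irrelevant.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))                        # three candidate features
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)     # only feature 0 matters

    kernel = RBF(length_scale=[1.0, 1.0, 1.0]) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    # Per-feature length scales after hyperparameter optimisation: the two
    # irrelevant features typically end up with much larger length scales.
    print(gp.kernel_.k1.length_scale)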

    Ultra-fast Deep Mixtures of Gaussian Process Experts

    Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, and sparse Gaussian processes (GPs) have shown promise as a leading candidate for the experts in such models. In the present article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). This combination provides a flexible, robust, and efficient model which is able to significantly outperform competing models. We furthermore consider efficient approaches to computing maximum a posteriori (MAP) estimators of these models by iteratively maximizing the distribution of experts given allocations and of allocations given experts. We also show that a recently introduced method called Cluster-Classify-Regress (CCR) is capable of providing a good approximation of the optimal solution extremely quickly. This approximation can then be further refined with the iterative algorithm.
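    The following is a rough sketch, not the authors' implementation, of a cluster-classify-regress-style loop: inputs are first clustered, GP experts are fit to their allocated points, points are re-allocated to the expert that predicts them best, and finally a neural-network gate is trained to reproduce the allocations. The function names and the use of scikit-learn are assumptions of this example.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neural_network import MLPClassifier
    from sklearn.gaussian_process import GaussianProcessRegressor

    def fit_mixture_of_gp_experts(X, y, n_experts=2, n_iter=5):
        # "Cluster": initial allocations from clustering the inputs.
        z = KMeans(n_clusters=n_experts, n_init=10, random_state=0).fit_predict(X)
        for _ in range(n_iter):
            # "Regress": one GP expert per allocation (this sketch assumes
            # every expert keeps at least one point).
            experts = [GaussianProcessRegressor().fit(X[z == k], y[z == k])
                       for k in range(n_experts)]
            # Re-allocate each point to the expert that predicts it best.
            residuals = np.stack([np.abs(e.predict(X) - y) for e in experts], axis=1)
            z = residuals.argmin(axis=1)
        # "Classify": train the gating network to reproduce the allocations.
        gate = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, z)
        return gate, experts

    def predict_mixture(gate, experts, X_new):
        z_new = gate.predict(X_new)
        return np.array([experts[k].predict(x[None, :])[0]
                         for k, x in zip(z_new, X_new)])

    # Toy usage on a piecewise function with two regimes:
    X = np.linspace(-3.0, 3.0, 300)[:, None]
    y = np.where(X[:, 0] < 0, np.sin(3 * X[:, 0]), 0.5 * X[:, 0])
    gate, experts = fit_mixture_of_gp_experts(X, y)
    y_hat = predict_mixture(gate, experts, X)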

    Human motion estimation and controller learning

    Humans are capable of complex manipulation and locomotion tasks. They are able to achieve energy-efficient gait, reject disturbances, handle changing loads, and adapt to environmental constraints. Taking inspiration from the human body, robotics researchers aim to develop systems with similar capabilities. Research suggests that humans minimize a task-specific cost function when performing movements. In order to learn this cost function from demonstrations and incorporate it into a controller, it is first imperative to accurately estimate the expert motion. The captured motions can then be analyzed to extract the objective function the expert was minimizing. We propose a framework for human motion estimation from wearable sensors. Human body joints are modeled by matrix Lie groups, using the special orthogonal groups SO(2) and SO(3) for joint poses and the special Euclidean group SE(3) for the base-link pose. To estimate the human joint pose, velocity, and acceleration, we provide the equations for employing the extended Kalman filter on Lie groups, thus explicitly accounting for the non-Euclidean geometry of the state space. Incorporating interaction constraints with respect to the environment, or within the participant, allows us to track global body position without an absolute reference and ensures a viable pose estimate. The algorithms are extensively validated in both simulation and real-world experiments. Next, to learn the underlying expert control strategies from demonstrations, we present a novel fast approximate multivariate Gaussian process regression. The method estimates the underlying cost function without making assumptions about its structure. The computational efficiency of the approach allows for real-time forward-horizon prediction. Using a linear model predictive control framework, we then reproduce the demonstrated movements on a robot. The learned cost function captures the variability in expert motion as well as the correlations between states, leading to a controller that both produces motions and reacts to disturbances in a human-like manner. The model predictive control formulation allows the controller to satisfy task- and joint-space constraints, avoiding obstacles and self-collisions, as well as torque constraints, ensuring operational feasibility. The approach is validated on the Franka Emika robot using real human motion exemplars.
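    As a minimal illustration of the Lie-group machinery referred to above, the numpy sketch below (illustrative only, not the thesis code) propagates an SO(3) joint orientation with the exponential map via Rodrigues' formula, which is the kind of prediction step an extended Kalman filter on Lie groups builds on.

    import numpy as np

    def hat(w):
        """Map a 3-vector to its skew-symmetric matrix in so(3)."""
        return np.array([[0.0, -w[2], w[1]],
                         [w[2], 0.0, -w[0]],
                         [-w[1], w[0], 0.0]])

    def so3_exp(w):
        """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
        theta = np.linalg.norm(w)
        if theta < 1e-12:
            return np.eye(3) + hat(w)                # first-order approximation
        K = hat(w / theta)
        return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

    def propagate_orientation(R, omega, dt):
        """Prediction step: right-multiply by the exponential of omega * dt."""
        return R @ so3_exp(omega * dt)

    R = np.eye(3)                                    # initial joint orientation
    omega = np.array([0.0, 0.0, np.pi / 2])          # angular velocity (rad/s)
    R = propagate_orientation(R, omega, dt=0.01)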

    Inference using Gaussian processes in animal movement modelling

    In recent years, the field of movement ecology has been changed dramatically by the capacity to collect accurate high-frequency telemetry data. In this thesis I present new statistical methods that scale to the very large volumes of data now being generated and that address the problem of scale dependence present in the most popular animal movement models. Popular and widely used movement models in ecology are discrete-time models, in which animals' positions are observed at discrete times. However, discrete-time models do not perform well when problems such as missing or irregular data are present. A remedy to this inefficiency is to use continuous-time movement models; however, continuous-time models are often difficult to formulate and hard to interpret. In this thesis, I first focus on discrete-time movement models, and through a study I illustrate one of the problems they pose: the need to specify the discretisation time-step in advance. I then move on to Gaussian processes (GPs), probabilistic methods widely used in the machine learning community, and show that they are equivalent to many continuous-time movement models. Given that a primary goal of machine learning methods is to learn from large-scale datasets, using robust continuous-time movement models such as Gaussian processes is highly advantageous for multiple reasons. These include their flexibility in the choice of covariance function, their scalability to large datasets, and their ability to analyse data, infer parameters of interest, and quantify uncertainty within a nonparametric Bayesian approach. I extend the standard Gaussian process (GP) to a non-stationary hierarchical Gaussian process, in which both the movement process and the dynamic parameters of the movement model are Gaussian processes, allowing increased flexibility across the wide range of behavioural modes that animals can exhibit. Throughout this thesis, I implement Gaussian processes on simulated and real tracking data using statistical libraries such as TensorFlow, which provide an accessible way to implement the models and give access to GPU/HPC-accelerated machine learning libraries. I perform inference using optimisation methods such as maximum a posteriori (MAP) estimation, approximate sampling-based methods such as Markov chain Monte Carlo (MCMC), and variational inference, on both synthetic and real datasets.
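    As a small illustration of the continuous-time view advocated here, the sketch below fits a GP with a Matern covariance independently to each coordinate of a simulated, irregularly sampled track and predicts the path, with uncertainty, at arbitrary times. The data, kernel choice, and use of scikit-learn are assumptions of this example rather than details from the thesis.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern, WhiteKernel

    rng = np.random.default_rng(1)
    t_obs = np.sort(rng.uniform(0.0, 24.0, size=120))[:, None]   # irregular fix times (h)
    track = np.column_stack([np.cos(t_obs[:, 0] / 4.0),
                             np.sin(t_obs[:, 0] / 4.0)])          # true 2-D path
    track += 0.05 * rng.normal(size=track.shape)                  # telemetry noise

    kernel = Matern(length_scale=2.0, nu=1.5) + WhiteKernel(noise_level=0.01)
    gps = [GaussianProcessRegressor(kernel=kernel).fit(t_obs, track[:, d])
           for d in range(2)]                                     # one GP per coordinate

    t_new = np.linspace(0.0, 24.0, 200)[:, None]                  # prediction times
    path_mean = np.column_stack([gp.predict(t_new) for gp in gps])
    path_sd = np.column_stack([gp.predict(t_new, return_std=True)[1] for gp in gps])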

    Probabilistic Ordinary Differential Equation Solvers - Theory and Applications

    Ordinary differential equations are ubiquitous in science and engineering, as they provide mathematical models for many physical processes. However, most practical purposes require the temporal evolution of a particular solution. Many relevant ordinary differential equations are known to lack closed-form solutions in terms of simple analytic functions. Thus, users rely on numerical algorithms to compute discrete approximations. Numerical methods replace the intractable, and thus inaccessible, solution by an approximating model with known computational strategies. This is akin to a process in statistics where an unknown true relationship is modeled with access to instances of said relationship. One branch of statistics, Bayesian modeling, expresses degrees of uncertainty with probability distributions. In recent years, this idea has gained traction for the design and study of numerical algorithms, which established probabilistic numerics as a research field in its own right. The theory part of this thesis is concerned with bridging the gap between classical numerical methods for ordinary differential equations and probabilistic numerics. To this end, an algorithm is presented based on Gaussian processes, a general and versatile model for Bayesian regression. This algorithm is compared to two standard frameworks for the solution of initial value problems. It is shown that the maximum a posteriori estimators of certain Gaussian process regressors coincide with certain multistep formulae. Furthermore, a particular initialization scheme based on an improper prior model coincides with a Runge-Kutta method for the first discretization step. This analysis provides a higher-order probabilistic numerical algorithm for initial value problems. Based on the probabilistic description, an estimator of the local integration error is presented, which is used in a step size adaptation scheme. The completed algorithm is evaluated on a benchmark of initial value problems, confirming empirically the theoretically predicted error rates and displaying particularly efficient performance on domains with low accuracy requirements. To establish the practical benefit of the probabilistic solution, a probabilistic boundary value problem solver is applied to a medical imaging problem. In tractography, diffusion-weighted magnetic resonance imaging data are used to infer the connectivity of neural fibers. The first application of the probabilistic solver shows how the quantification of the discretization error can be used in the subsequent estimation of fiber density. The second application additionally incorporates the measurement noise of the imaging data into the tract estimation model. These two extensions of the shortest-path tractography method give more faithful representations of data, modeling, and algorithmic uncertainty in neural connectivity studies.
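    To make the filtering view of probabilistic ODE solvers concrete, here is a compact, illustrative sketch (not the thesis algorithm, and with a fixed rather than adaptive step size): a Kalman filter with a once-integrated Wiener-process prior on the solution, in which the ODE vector field supplies a noise-free "observation" of the derivative at every step.

    import numpy as np

    def ode_filter(f, t0, y0, h, n_steps, q=1.0):
        """Solve y' = f(t, y) for scalar y with an integrated-Wiener-process prior."""
        A = np.array([[1.0, h], [0.0, 1.0]])                      # state transition
        Q = q * np.array([[h**3 / 3, h**2 / 2], [h**2 / 2, h]])   # process noise
        H = np.array([[0.0, 1.0]])                                # observe the derivative
        m, P = np.array([y0, f(t0, y0)]), np.zeros((2, 2))
        ts, means, stds = [t0], [y0], [0.0]
        for i in range(n_steps):
            t = t0 + (i + 1) * h
            m_pred, P_pred = A @ m, A @ P @ A.T + Q               # predict
            z = f(t, m_pred[0])                                   # "data" from the ODE
            S = H @ P_pred @ H.T                                  # innovation covariance
            K = P_pred @ H.T / S                                  # Kalman gain
            m = m_pred + (K * (z - H @ m_pred)).ravel()           # update mean
            P = P_pred - K @ S @ K.T                              # update covariance
            ts.append(t); means.append(m[0]); stds.append(np.sqrt(P[0, 0]))
        return np.array(ts), np.array(means), np.array(stds)

    # Exponential decay y' = -0.5 y with y(0) = 1 on [0, 5]:
    ts, means, stds = ode_filter(lambda t, y: -0.5 * y, 0.0, 1.0, h=0.05, n_steps=100)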

    Probabilistic Approaches to Stochastic Optimization

    Optimization is a cardinal concept in the sciences, and viable algorithms are of utmost importance as tools for finding the solution to an optimization problem. Empirical risk minimization is a major workhorse, in particular in machine learning applications, where an input-target relation is learned in a supervised manner. Empirical risks with high-dimensional inputs are mostly optimized by greedy, gradient-based, and possibly stochastic optimization routines, such as stochastic gradient descent. Though popular and practically successful, this setup has major downsides which often make it finicky to work with, or at least make it the bottleneck in a larger chain of learning procedures. For instance, typical issues are:
    • Overfitting of a parametrized model to the data. This generally leads to poor generalization performance on unseen data.
    • Tuning of algorithmic parameters, such as learning rates, which is tedious, inefficient, and costly.
    • Stochastic losses and gradients, which occur due to sub-sampling of a large dataset. They only yield incomplete, or corrupted, information about the empirical risk and are thus difficult to handle from a decision-making point of view.
    This thesis consists of four conceptual parts. In the first one, we argue that conditional distributions of local full and mini-batch evaluations of losses and gradients can be well approximated by Gaussian distributions, since the losses themselves are sums of independently and identically distributed random variables. We then provide a way of estimating the corresponding sufficient statistics, i.e., variances and means, with low computational overhead. This yields an analytic likelihood for the loss and gradient at every point of the input space, which can subsequently be incorporated into active decision making at run time of the optimizer. The second part focuses on estimating generalization performance, not by monitoring a validation loss, but by assessing whether stochastic gradients can be fully explained by noise that occurs due to the finiteness of the training dataset, rather than by an informative gradient direction of the expected loss (risk). This yields a criterion for early stopping where no validation set is needed and the full dataset can be used for training. The third part is concerned with fully automated learning rate adaptation for stochastic gradient descent (SGD). Global learning rates are arguably the most exposed manual tuning parameters of stochastic optimization routines. We propose a cheap and self-contained sub-routine, called a 'probabilistic line search', that automatically adapts the learning rate in every step, based on a local probability of descent. The result is an entirely parameter-free stochastic optimizer that reaches comparable or better generalization performance than SGD with a carefully hand-tuned learning rate on the tested problems. The last part deals with noise-robust search directions. Inspired by classic first- and second-order methods, we model the unknown dynamics of the gradient or Hessian function along the optimization path. The approach has strong connections to classic filtering frameworks and can incorporate noise-corrupted evaluations of the gradient at successive locations. The benefits are twofold. Firstly, we gain valuable insight into less accessible or ad hoc design choices of classic optimizers as special cases. Secondly, we provide the basis for a flexible, self-contained, and easy-to-use class of stochastic optimizers that exhibit a higher degree of robustness and automation.
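    To illustrate the gradient-noise reasoning in the second part, the sketch below computes a simple per-parameter signal-to-noise statistic from per-example gradients: if the mini-batch gradient means are, on average, no larger than what their sampling variance alone would produce, the gradient is considered noise-dominated and training can be stopped. This is a simplified rendering of the idea, not necessarily the exact criterion used in the thesis.

    import numpy as np

    def gradient_signal(per_example_grads):
        """per_example_grads: array of shape (batch_size, n_params)."""
        B = per_example_grads.shape[0]
        g = per_example_grads.mean(axis=0)               # mini-batch gradient
        var = per_example_grads.var(axis=0, ddof=1)      # per-parameter variance
        # Under a "no signal" hypothesis, B * g_d^2 / var_d has expectation 1,
        # so an average near (or below) 1 suggests the gradient is pure noise.
        return float(np.mean(B * g**2 / (var + 1e-12)))

    rng = np.random.default_rng(0)
    noise_only = rng.normal(size=(128, 50))              # gradients that are pure noise
    informative = noise_only + 0.5                       # noise plus a real signal
    print(gradient_signal(noise_only))                   # close to 1
    print(gradient_signal(informative))                  # much larger than 1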