
    Large-scale Heteroscedastic Regression via Gaussian Process

    Heteroscedastic regression, which accounts for varying noise levels across observations, has many applications in fields such as machine learning and statistics. Here we focus on heteroscedastic Gaussian process (HGP) regression, which integrates the latent function and the noise function in a unified non-parametric Bayesian framework. Despite its remarkable performance, HGP suffers from cubic time complexity, which strictly limits its application to big data. To improve scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets. Furthermore, two variants are developed to improve the scalability and capability of VSHGP. The first is stochastic VSHGP (SVSHGP), which derives a factorized evidence lower bound and thus enables efficient stochastic variational inference. The second is distributed VSHGP (DVSHGP), which (i) follows the Bayesian committee machine formalism to distribute computations over multiple local VSHGP experts with many inducing points, and (ii) adopts hybrid parameters for the experts to guard against over-fitting and capture local variety. The superiority of DVSHGP and SVSHGP over existing scalable heteroscedastic/homoscedastic GPs is then extensively verified on various datasets. (Comment: 14 pages, 15 figures)
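    The core modeling change relative to a standard GP, namely replacing the constant noise term sigma^2 * I with an input-dependent diagonal, can be illustrated with a minimal NumPy sketch. The RBF kernel, the toy noise function, and the data below are illustrative assumptions rather than the paper's VSHGP inference, which additionally places a GP prior on the noise and uses sparse variational approximations.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix (illustrative choice)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def hgp_predict(X, y, Xs, noise_var):
    """Exact GP posterior with heteroscedastic (input-dependent) noise.

    noise_var holds per-training-point noise variances r(x_i); a homoscedastic
    GP would use a single sigma^2 for every entry of this vector.
    """
    K = rbf(X, X) + np.diag(noise_var)   # K + diag(r(X)) instead of K + sigma^2 * I
    Ks = rbf(X, Xs)
    Kss = rbf(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v
    return mean, cov

# Toy data whose noise grows with |x|
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
r = 0.05 + 0.2 * np.abs(X[:, 0])                  # heteroscedastic noise variances
y = np.sin(X[:, 0]) + rng.normal(0.0, np.sqrt(r))
Xs = np.linspace(-3, 3, 100)[:, None]
mu, cov = hgp_predict(X, y, Xs, r)
```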

    Coupling conditionally independent submaps for large-scale 2.5D mapping with Gaussian Markov Random Fields

    © 2017 IEEE. Building large-scale 2.5D maps that account for spatial correlations can be quite expensive, but doing so offers clear advantages when fusing data. While optimal submapping strategies have previously been explored in covariance form using Gaussian processes for large-scale mapping, this paper focuses on transferring these concepts into information form. By exploiting the conditional independence property of Gaussian Markov Random Field (GMRF) models, we propose a submapping approach to build a nearly optimal global 2.5D map. In the proposed approach, data are fused by first fitting a GMRF to one sensor dataset; conditionally independent submaps are then inferred from this model and updated individually as new data arrive. Finally, information is propagated from submap to submap to recover the fully updated map. This is achieved efficiently by exploiting the inherent structure of the GMRF, with fusion and propagation carried out entirely in information form. The key contribution of this paper is the derivation of an algorithm that optimally propagates information through submaps by updating only the parts shared between submaps. Our results show that the proposed method reduces the computational complexity of the full mapping process while maintaining accuracy. Performance is evaluated on synthetic data from the Canadian Digital Elevation Data.
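    The fusion step that information form makes cheap can be sketched as follows: a Gaussian kept in canonical parameters (precision matrix and information vector) is updated with a linear-Gaussian observation, and only the entries touched by the observation change, which is what keeps per-submap updates local for a sparse GMRF. The observation model H, the noise covariance R, and the toy 4-cell map below are illustrative placeholders, not the paper's elevation sensor model.

```python
import numpy as np

def fuse_information_form(Lam, eta, H, R, z):
    """Fuse observation z = H x + noise (covariance R) into a Gaussian stored
    in information form (precision Lam, information vector eta).

    Only the rows/columns of Lam (and entries of eta) touched by H change,
    which is what makes submap-wise updates cheap for a sparse GMRF.
    """
    Rinv = np.linalg.inv(R)
    return Lam + H.T @ Rinv @ H, eta + H.T @ Rinv @ z

def to_moment_form(Lam, eta):
    """Recover mean and covariance (only needed when a map estimate is read out)."""
    Sigma = np.linalg.inv(Lam)
    return Sigma @ eta, Sigma

# Toy 4-cell map; the observation touches cells 1 and 2 only
Lam = 2.0 * np.eye(4)
eta = np.zeros(4)
H = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
R = 0.1 * np.eye(2)
z = np.array([1.2, 0.8])
Lam, eta = fuse_information_form(Lam, eta, H, R, z)
mean, _ = to_moment_form(Lam, eta)
```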

    On Negative Transfer and Structure of Latent Functions in Multi-output Gaussian Processes

    The multi-output Gaussian process (MGP) is based on the assumption that outputs share commonalities; however, if this assumption does not hold, negative transfer will lead to decreased performance relative to learning the outputs independently or in subsets. In this article, we first define negative transfer in the context of an MGP and then derive necessary conditions for an MGP model to avoid it. Specifically, under the convolution construction, we show that avoiding negative transfer mainly depends on having a sufficient number of latent functions Q, regardless of the flexibility of the kernel or the inference procedure used. However, a slight increase in Q leads to a large increase in the number of parameters to be estimated. To this end, we propose two latent structures that scale to arbitrarily large datasets, can avoid negative transfer, and allow any kernel or sparse approximation to be used within them. These structures also allow regularization, which can provide consistent and automatic selection of related outputs.
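    The role of Q can be made concrete with a linear-model-of-coregionalization style covariance, a simple special case of the convolution construction. The mixing weights, kernels, and inputs below are illustrative assumptions, not the structures proposed in the article.

```python
import numpy as np

def rbf(X1, X2, lengthscale):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def mgp_covariance(X, A, lengthscales):
    """Joint covariance over P outputs sharing Q latent GPs.

    A is a (P, Q) mixing matrix: output p is sum_q A[p, q] * u_q(x), so the
    cross-covariance between outputs i and j is
        K_ij(x, x') = sum_q A[i, q] * A[j, q] * k_q(x, x').
    With Q = 1 every output is forced to share a single latent function, which
    is where negative transfer can arise; a larger Q lets unrelated outputs
    decouple, at the price of more parameters in A.
    """
    P, Q = A.shape
    n = X.shape[0]
    K = np.zeros((P * n, P * n))
    for q in range(Q):
        K += np.kron(np.outer(A[:, q], A[:, q]), rbf(X, X, lengthscales[q]))
    return K

X = np.linspace(0.0, 1.0, 30)[:, None]
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.7]])   # three outputs, Q = 2 latent functions
K = mgp_covariance(X, A, lengthscales=[0.1, 0.4])
```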

    Entry Dependent Expert Selection in Distributed Gaussian Processes Using Multilabel Classification

    By distributing the training process, local approximations reduce the cost of the standard Gaussian process. An ensemble technique then combines local predictions from Gaussian experts trained on different partitions of the data. Ensemble methods typically aggregate the models' predictions by assuming perfect diversity among the local predictors. Although this assumption keeps the aggregation tractable, it is often violated in practice. Ensemble methods that instead assume dependencies between experts provide consistent results, but they have a high computational cost, cubic in the number of experts involved. By implementing an expert selection strategy, the final aggregation step uses fewer experts and becomes more efficient. However, a selection approach that assigns a fixed set of experts to every new data point cannot encode the specific properties of each individual point. This paper proposes a flexible expert selection approach based on the characteristics of entry data points. To this end, we cast the selection task as a multi-label classification problem in which the experts define the labels and each entry point is assigned to a subset of experts. The proposed solution's prediction quality, efficiency, and asymptotic properties are discussed in detail. We demonstrate the efficacy of our method through extensive numerical experiments using synthetic and real-world data sets. (Comment: A condensed version of this work has been accepted at the Gaussian Processes, Spatiotemporal Modeling, and Decision-making Systems workshop during NeurIPS 202)
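    A minimal sketch of the entry-dependent selection idea, under stated assumptions: the multi-label targets are built here by marking, for each training point, the two experts whose partition centers are nearest (a stand-in for whatever relevance criterion the paper actually uses), and a one-vs-rest logistic model then predicts the expert subset for a new query point.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(400, 2))
centers = rng.uniform(-5, 5, size=(8, 2))       # one center per local GP expert

# Multi-label targets: expert m is "relevant" for x_i if its center is among
# the two closest centers to x_i (an illustrative relevance criterion).
d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
nearest2 = np.argsort(d, axis=1)[:, :2]
Y = np.zeros((X.shape[0], centers.shape[0]), dtype=int)
Y[np.arange(X.shape[0])[:, None], nearest2] = 1

# One binary classifier per expert (label) learns where that expert is useful.
selector = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

x_new = np.array([[1.5, -2.0]])
active = np.flatnonzero(selector.predict(x_new)[0])
# Only the local GP experts indexed by `active` would be evaluated and
# aggregated for x_new; the remaining experts are skipped entirely.
```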

    Aggregation Strategies for Distributed Gaussian Processes

    Gaussian processes are robust and flexible non-parametric statistical models that build on Bayes' theorem by assigning a Gaussian prior distribution to the unknown function. Despite their capability to provide high-accuracy predictions, they suffer from high computational costs. Various solutions have been proposed in the literature to deal with this computational complexity. The main idea is to reduce the training cost, which is cubic in the size of the training set. A distributed Gaussian process is a divide-and-conquer approach that divides the entire training data set into several partitions and employs a local approximation to train a Gaussian process on each partition. An ensemble technique then combines the local Gaussian experts to provide the final aggregated predictions. Available baselines aggregate local predictions assuming perfect diversity between experts. However, this assumption is often violated in practice and leads to sub-optimal solutions.
    This thesis deals with dependency issues between experts. Aggregation based on the experts' interactions improves accuracy and can lead to statistically consistent results. Few works have considered modeling dependencies between experts; despite their theoretical advantages, their prediction steps are costly and depend cubically on the number of experts. We exploit the experts' interactions in both dependence-based and independence-based aggregations. In conventional aggregation methods that combine experts under a conditional independence assumption, we transform the set of available experts into clusters of highly correlated experts using spectral clustering; the final aggregation uses these clusters instead of the original experts, which reduces the effect of the independence assumption in the ensemble technique. Moreover, we develop a novel aggregation method for dependent experts using a latent-variable graphical model, defining the target function as a latent variable in a connected undirected graph. In addition, we propose two novel expert selection strategies for distributed learning that improve the efficiency and accuracy of the prediction step by excluding weak experts from the ensemble. The first is a static selection method that assigns a fixed set of experts to all new entry points in the prediction step using a Markov random field model. The second increases the flexibility of the selection step by converting it into a multi-label classification problem, providing an entry-dependent selection model that assigns the most relevant experts to each data point. We address all related theoretical and practical aspects of the proposed solutions. The findings provide valuable insights for distributed learning models and advance the state of the art in several directions. Indeed, the proposed solutions do not require restrictive assumptions and can easily be extended to non-Gaussian experts in distributed and federated learning.
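    A minimal sketch of two ingredients described above, under illustrative assumptions: experts are first grouped by spectral clustering on a precomputed expert-affinity matrix, and Gaussian predictions are then combined with a standard product-of-experts rule in precision form. The affinity values and predictions are toy numbers, and the product-of-experts rule stands in for the aggregation methods developed in the thesis.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def poe_aggregate(means, variances):
    """Product-of-experts combination of Gaussian predictions at one test point:
    precisions add, and the mean is the precision-weighted average."""
    prec = 1.0 / np.asarray(variances)
    var = 1.0 / prec.sum()
    return var * (prec * np.asarray(means)).sum(), var

# Illustrative predictions of five local experts at a single test input
means = np.array([0.9, 1.1, 1.0, 2.4, 2.6])
variances = np.array([0.20, 0.25, 0.22, 0.30, 0.28])

# Group correlated experts first; in practice the affinity would be derived
# from between-expert covariances rather than hand-written.
affinity = np.array([[1.0, 0.9, 0.8, 0.1, 0.1],
                     [0.9, 1.0, 0.8, 0.1, 0.1],
                     [0.8, 0.8, 1.0, 0.1, 0.1],
                     [0.1, 0.1, 0.1, 1.0, 0.9],
                     [0.1, 0.1, 0.1, 0.9, 1.0]])
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)

# Aggregate within each cluster, then combine the fewer, less redundant
# cluster-level predictions into the final estimate.
cluster_preds = [poe_aggregate(means[labels == c], variances[labels == c])
                 for c in np.unique(labels)]
mean, var = poe_aggregate([m for m, _ in cluster_preds],
                          [v for _, v in cluster_preds])
```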