104,959 research outputs found

    Clustering based on Mixtures of Sparse Gaussian Processes

    Creating low-dimensional representations of a high-dimensional data set is an important component in many machine learning applications. How to cluster data using their low-dimensional embedded space is still a challenging problem in machine learning. In this article, we propose a joint formulation for both clustering and dimensionality reduction. When a probabilistic model is desired, one possible solution is to use mixture models in which both the cluster indicator and the low-dimensional space are learned. Our algorithm is based on a mixture of sparse Gaussian processes and is called Sparse Gaussian Process Mixture Clustering (SGP-MIC). The main advantages of our approach are that the model is probabilistic, unlike existing deterministic methods, that non-linear generalizations of the model are straightforward to construct, and that a sparse model combined with an efficient variational EM approximation helps to speed up the algorithm.

    Primordial non-Gaussianity in the Bispectrum of the Halo Density Field

    The bispectrum vanishes for linear Gaussian fields and is thus a sensitive probe of non-linearities and non-Gaussianities in the cosmic density field. Hence, a detection of the bispectrum in the halo density field would enable tight constraints on non-Gaussian processes in the early Universe and allow inference of the dynamics driving inflation. We present a tree-level derivation of the halo bispectrum arising from non-linear clustering, non-linear biasing and primordial non-Gaussianity. A diagrammatic description is developed to provide an intuitive understanding of the contributing terms and their dependence on scale, shape and the non-Gaussianity parameter fNL. We compute the terms based on a multivariate bias expansion and the peak-background split method and show that non-Gaussian modifications to the bias parameters lead to amplifications of the tree-level bispectrum that were ignored in previous studies. Our results are in good agreement with published simulation measurements of the halo bispectrum. Finally, we estimate the expected signal-to-noise on fNL and show that the constraint obtainable from the bispectrum analysis is significantly tighter than the one obtainable from the power spectrum analysis. Comment: 34 pages, 15 figures, (v3): matches JCAP published version

    Cluster-Specific Predictions with Multi-Task Gaussian Processes

    A model involving Gaussian processes (GPs) is introduced to simultaneously handle multi-task learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as well as a learning step for subsequent predictions for new tasks. The model is instantiated as a mixture of multi-task GPs with common mean processes. A variational EM algorithm is derived to optimise the hyper-parameters and to estimate the hyper-posteriors of the latent variables and processes. We establish explicit formulas for integrating the mean processes and the latent clustering variables within a predictive distribution, accounting for uncertainty on both aspects. This distribution is defined as a mixture of cluster-specific GP predictions, which improves performance on group-structured data. The model handles irregular grids of observations and offers different hypotheses on the covariance structure for sharing additional information across tasks. Performance on both clustering and prediction tasks is assessed through various simulated scenarios and real datasets. The overall algorithm, called MagmaClust, is publicly available as an R package. Comment: 40 pages

    Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

    In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and find that users prefer ranking most. Our experiments further show that utilising monotonicity information in GPs, by using a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam. Comment: AAMAS 2018, Source code at https://github.com/lmzintgraf/gp_pref_elici

    Bayesian approach to Spatio-temporally Consistent Simulation of Daily Monsoon Rainfall over India

    Simulation of rainfall over a region for long time-sequences can be very useful for planning and policy-making, especially in India, where the economy is heavily reliant on monsoon rainfall. However, such simulations should preserve the known spatial and temporal characteristics of rainfall over India. General Circulation Models (GCMs) are unable to do so, and various rainfall generators designed by hydrologists using stochastic processes such as Gaussian Processes are also difficult to apply over the vast and highly diverse landscape of India. In this paper, we explore a series of Bayesian models based on conditional distributions of latent variables that describe weather conditions at specific locations and over the whole country. During parameter estimation from observed data, we apply spatio-temporal smoothing with a Markov Random Field so that the learnt parameters are spatially and temporally coherent. We also use a nonparametric spatial clustering based on the Chinese Restaurant Process to identify homogeneous regions, which some of the proposed models use to improve the spatial correlations of the simulated rainfall. The models can simulate daily rainfall across India for years, and can also use contextual information for conditional simulation. We use two datasets of different spatial resolutions over India, and focus on the period 2000-2015. We propose a large number of metrics to study the spatio-temporal properties of the simulations produced by the models, and compare them with the observed data to evaluate the strengths and weaknesses of the models.
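The Chinese Restaurant Process mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic sampler for the CRP prior only, not the paper's spatial clustering model; the function name `sample_crp` and the concentration parameter `alpha` are illustrative assumptions.

```python
import random

def sample_crp(n_items, alpha, seed=0):
    """Draw one partition of n_items from a Chinese Restaurant Process.

    alpha > 0 is the concentration parameter (hypothetical name);
    larger values tend to produce more clusters.
    """
    rng = random.Random(seed)
    assignments = []  # cluster index chosen for each item
    sizes = []        # current number of items in each cluster
    for _ in range(n_items):
        # Join existing cluster k with weight sizes[k],
        # or open a new cluster with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # a new cluster is opened
        else:
            sizes[k] += 1     # an existing cluster grows
        assignments.append(k)
    return assignments, sizes

assignments, sizes = sample_crp(n_items=50, alpha=1.0)
print(len(sizes), "clusters with sizes", sizes)
```

The number of clusters is not fixed in advance, which is why such a prior suits the identification of an unknown number of homogeneous regions.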

    An Efficient Quality-Related Fault Diagnosis Method for Real-Time Multimode Industrial Process

    This paper proposes a novel multimode process monitoring method for quality-related performance monitoring of complex industrial processes. Firstly, principal component space clustering is implemented under the guidance of quality variables. Through extraction of model tags, clustering information of the original training data can be acquired. Secondly, according to the multimode characteristics of the process data, a monitoring model that integrates a Gaussian mixture model with total projection to latent structures becomes effective once the covariance description form is built. The multimode total projection to latent structures (MTPLS) model is the foundation for solving quality-related monitoring problems in multimode processes. Then, a comprehensive statistics index is defined, based on the Bayesian posterior probability of the monitored samples belonging to each Gaussian component. After that, a combined index is constructed for process monitoring. Finally, motivated by the application of the traditional contribution plot in fault diagnosis, a gradient contribution rate is applied to analyze the variation of the variable contribution rate along samples. Our method supports online fault monitoring and diagnosis for multimode processes. The performance of the whole proposed scheme is verified on a real industrial hot strip mill process (HSMP) and compared with some existing methods.
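The posterior probability of a sample belonging to each Gaussian component, which the abstract above uses as the basis of its statistics index, follows from Bayes' rule. The sketch below is a univariate simplification for illustration only, not the MTPLS model; the names `gaussian_pdf` and `component_posteriors` are assumptions.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def component_posteriors(x, weights, means, variances):
    """Posterior probability that x was generated by each mixture component.

    Bayes' rule: p(k | x) = w_k N(x | m_k, v_k) / sum_j w_j N(x | m_j, v_j).
    """
    joint = [w * gaussian_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

# Two operating modes centred at 0 and 5 with equal prior weight.
print(component_posteriors(0.0, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]))
```

A sample near one mode's mean gets a posterior close to 1 for that mode, so per-mode monitoring statistics can be weighted by these probabilities.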