Clustering based on Mixtures of Sparse Gaussian Processes
Creating low-dimensional representations of a high-dimensional data set is an
important component in many machine learning applications. How to cluster data
using their low-dimensional embedding is still a challenging problem in
machine learning. In this article, we propose a joint formulation
for both clustering and dimensionality reduction. When a probabilistic model is
desired, one possible solution is to use mixture models in which both the
cluster indicators and the low-dimensional space are learned. Our algorithm,
called Sparse Gaussian Process Mixture Clustering (SGP-MIC), is based on a
mixture of sparse Gaussian processes. The main advantages of our approach over
existing methods are that its probabilistic nature offers advantages over
deterministic methods, that it is straightforward to construct non-linear
generalizations of the model, and that the sparse model together with an
efficient variational EM approximation speeds up the algorithm.
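The abstract does not spell out SGP-MIC's sparse construction, so as a generic illustration of why a sparse GP is cheaper, the sketch below compares an exact GP posterior mean with the subset-of-regressors (SoR) approximation, which solves only an M x M system for M inducing inputs. The RBF kernel, the noise variance, the inducing inputs `z` and all function names are assumptions for illustration; this is not the SGP-MIC algorithm.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def full_gp_mean(x_train, y_train, x_test, noise_var=0.1):
    """Exact GP posterior mean K_*n (K_nn + s^2 I)^-1 y; O(N^3) in N training points."""
    k_nn = rbf_kernel(x_train, x_train)
    k_sn = rbf_kernel(x_test, x_train)
    alpha = np.linalg.solve(k_nn + noise_var * np.eye(len(x_train)), y_train)
    return k_sn @ alpha


def sor_gp_mean(x_train, y_train, x_test, z, noise_var=0.1):
    """Subset-of-regressors mean K_*m (s^2 K_mm + K_mn K_nm)^-1 K_mn y.

    Only an M x M system is solved, so the cost is O(N M^2) for M inducing inputs z.
    """
    k_mm = rbf_kernel(z, z)
    k_mn = rbf_kernel(z, x_train)
    k_sm = rbf_kernel(x_test, z)
    a = noise_var * k_mm + k_mn @ k_mn.T
    return k_sm @ np.linalg.solve(a, k_mn @ y_train)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 7.0, 8)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
x_star = np.array([0.5, 3.3, 6.1])

# With the inducing inputs equal to the training inputs, SoR recovers the exact mean.
print(np.allclose(sor_gp_mean(x, y, x_star, z=x), full_gp_mean(x, y, x_star)))  # prints True
# Three inducing inputs give a cheaper approximation of the same mean.
print(sor_gp_mean(x, y, x_star, z=np.array([1.0, 3.5, 6.0])))
```

The equality check holds because, when `z` equals the training inputs, (s^2 K + K K)^-1 K y reduces to (K + s^2 I)^-1 y; the savings appear only when M is much smaller than N.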
Primordial non-Gaussianity in the Bispectrum of the Halo Density Field
The bispectrum vanishes for linear Gaussian fields and is thus a sensitive
probe of non-linearities and non-Gaussianities in the cosmic density field.
Hence, a detection of the bispectrum in the halo density field would enable
tight constraints on non-Gaussian processes in the early Universe and allow
inference of the dynamics driving inflation. We present a tree-level derivation
of the halo bispectrum arising from non-linear clustering, non-linear biasing
and primordial non-Gaussianity. A diagrammatic description is developed to
provide an intuitive understanding of the contributing terms and their
dependence on scale, shape and the non-Gaussianity parameter fNL. We compute
the terms based on a multivariate bias expansion and the peak-background split
method and show that non-Gaussian modifications to the bias parameters lead to
amplifications of the tree-level bispectrum that were ignored in previous
studies. Our results are in good agreement with published simulation
measurements of the halo bispectrum. Finally, we estimate the expected
signal-to-noise ratio for fNL and show that the constraint obtainable from the
bispectrum analysis significantly exceeds that obtainable from the power
spectrum analysis.
Comment: 34 pages, 15 figures, (v3): matches JCAP published version
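The one-point third moment of a homogeneous field equals its bispectrum integrated over all triangle configurations, so the opening claim (the bispectrum vanishes for Gaussian fields) can be checked numerically without Fourier transforms. The sketch assumes the standard local-type toy model g + fNL (g^2 - <g^2>) with Gaussian g of unit variance, for which the analytic third moment is 6 fNL + 8 fNL^3. It is not the paper's halo-bispectrum calculation, and the function names are hypothetical.

```python
import numpy as np


def local_ng_field(n, f_nl, sigma=1.0, seed=0):
    """Local-type transform of Gaussian samples: g + f_nl * (g^2 - <g^2>)."""
    rng = np.random.default_rng(seed)
    g = sigma * rng.standard_normal(n)
    return g + f_nl * (g**2 - sigma**2)


def third_moment(field):
    """Sample third central moment (the bispectrum integrated over all triangles)."""
    d = field - field.mean()
    return np.mean(d**3)


for f_nl in (0.0, 0.5):
    sample = third_moment(local_ng_field(1_000_000, f_nl))
    theory = 6 * f_nl + 8 * f_nl**3  # analytic value for sigma = 1
    print(f"f_nl={f_nl}: sample {sample:.2f}, theory {theory:.2f}")
```

The Gaussian case (f_nl = 0) gives a third moment consistent with zero, while any non-zero f_nl produces a signal that grows with f_nl.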
Cluster-Specific Predictions with Multi-Task Gaussian Processes
A model involving Gaussian processes (GPs) is introduced to simultaneously
handle multi-task learning, clustering, and prediction for multiple functional
data. This procedure acts as a model-based clustering method for functional
data as well as a learning step for subsequent predictions for new tasks. The
model is instantiated as a mixture of multi-task GPs with common mean
processes. A variational EM algorithm is derived for optimising the
hyper-parameters while estimating the hyper-posteriors of the latent variables
and processes. We establish explicit formulas for integrating the mean
processes and the latent clustering variables within a predictive
distribution, accounting for uncertainty in both. This distribution is defined
as a mixture of cluster-specific GP predictions, which improves performance on
group-structured data. The model handles irregular grids of observations and
offers different hypotheses on the covariance structure for sharing additional
information across tasks. Performance on both clustering and prediction tasks
is assessed through various simulated scenarios and real datasets. The overall
algorithm, called MagmaClust, is publicly available as an R package.
Comment: 40 pages
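The predictive distribution described above is a weighted mixture of cluster-specific GP predictions. Assuming each cluster k contributes a Gaussian prediction with mean mu_k and variance sigma_k^2 at a given input, weighted by a membership probability tau_k, the sketch below summarises the mixture by its overall mean and variance (the mixture itself is generally not Gaussian). The function name `mixture_prediction` is hypothetical and is not part of the MagmaClust R package.

```python
import numpy as np


def mixture_prediction(weights, means, variances):
    """Collapse a mixture of Gaussian predictions into its overall mean and variance.

    weights: membership probabilities tau_k (must sum to 1)
    means, variances: cluster-specific predictive means and variances
    """
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1")
    mean = np.sum(w * mu)
    # Law of total variance: E[Var] + Var[E], written as E[var + mu^2] - mean^2
    variance = np.sum(w * (var + mu**2)) - mean**2
    return float(mean), float(variance)


mean, var = mixture_prediction([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
print(mean, var)  # prints 1.0 2.0
```

The extra variance beyond the average within-cluster variance (here 2.0 versus 1.0) comes from disagreement between the clusters, which is how uncertainty about cluster membership propagates into the prediction.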
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
In multi-objective decision planning and learning, much attention is paid to
producing optimal solution sets that contain an optimal policy for every
possible user preference profile. We argue that the step that follows, i.e.,
determining which policy to execute by maximising the user's intrinsic utility
function over this (possibly infinite) set, is under-studied. This paper aims
to fill this gap. We build on previous work on Gaussian processes and pairwise
comparisons for preference modelling, extend it to the multi-objective decision
support scenario, and propose new ordered preference elicitation strategies
based on ranking and clustering. Our main contribution is an in-depth
evaluation of these strategies using computer and human-based experiments. We
show that our proposed elicitation strategies outperform the currently used
pairwise methods, and find that users prefer ranking most. Our experiments
further show that utilising monotonicity information in GPs, by using a linear
prior mean at the start and virtual comparisons to the nadir and ideal points,
increases performance. We demonstrate our decision support framework in a
real-world study on traffic regulation, conducted with the city of Amsterdam.
Comment: AAMAS 2018, Source code at
https://github.com/lmzintgraf/gp_pref_elici
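A ranking over k items can be expanded into k(k-1)/2 pairwise comparisons, which is one way a ranking query can be fed into a GP preference model that learns from pairwise data. The sketch also appends the virtual comparisons to the nadir and ideal points mentioned above. The function name and the (winner, loser) tuple format are assumptions for illustration, not the interface of the authors' released code.

```python
from itertools import combinations


def ranking_to_comparisons(ranking, nadir=None, ideal=None):
    """Expand a best-first ranking into (winner, loser) pairs.

    Every item beats every item ranked below it. If a nadir or ideal point is
    given, add virtual comparisons: each ranked item beats the nadir point and
    loses to the ideal point.
    """
    # combinations() keeps input order, so the first element is the better one.
    pairs = list(combinations(ranking, 2))
    if nadir is not None:
        pairs += [(item, nadir) for item in ranking]
    if ideal is not None:
        pairs += [(ideal, item) for item in ranking]
    return pairs


ranking = ["policy_b", "policy_a", "policy_c"]  # hypothetical policies, best first
pairs = ranking_to_comparisons(ranking, nadir="nadir", ideal="ideal")
print(len(pairs))  # prints 9
```

Three ranked items yield three real comparisons plus three virtual comparisons against each reference point.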
Bayesian approach to Spatio-temporally Consistent Simulation of Daily Monsoon Rainfall over India
Simulation of rainfall over a region for long time-sequences can be very
useful for planning and policy-making, especially in India where the economy is
heavily reliant on monsoon rainfall. However, such simulations should be able
to preserve the known spatial and temporal characteristics of rainfall over
India. General Circulation Models (GCMs) are unable to do so, and various
rainfall generators designed by hydrologists using stochastic processes like
Gaussian Processes are also difficult to apply over the vast and highly diverse
landscape of India. In this paper, we explore a series of Bayesian models based
on conditional distributions of latent variables that describe weather
conditions at specific locations and over the whole country. During parameter
estimation from observed data, we apply spatio-temporal smoothing with a Markov
Random Field so that the learnt parameters are spatially and temporally
coherent. We also use a nonparametric spatial clustering based on the Chinese
Restaurant Process to identify homogeneous regions, which some of the proposed
models use to improve the spatial correlations of the simulated rainfall. The
models are able to simulate daily rainfall across India for years, and can
also use contextual information for conditional simulation.
We use two datasets of different spatial resolutions over India, and focus on
the period 2000-2015. We propose a large number of metrics to study the
spatio-temporal properties of the simulations by the models, and compare them
with the observed data to evaluate the strengths and weaknesses of the models.
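The Chinese Restaurant Process lets the number of clusters grow with the data instead of being fixed in advance. The sketch below draws a random partition from the CRP prior with concentration parameter `alpha`; it shows only the prior over partitions, not how the paper fits regions to rainfall observations, and the function name is hypothetical.

```python
import random


def sample_crp(n, alpha, seed=None):
    """Sample a partition of n items from a Chinese Restaurant Process.

    With i items already seated, the next item joins existing cluster k with
    probability n_k / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha). Requires alpha > 0.
    """
    rng = random.Random(seed)
    labels, counts = [], []
    for _ in range(n):
        weights = counts + [alpha]  # existing clusters, then a new one
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = sample_crp(20, alpha=1.0, seed=42)
print(counts)  # cluster sizes; they always sum to 20
```

Larger values of `alpha` tend to open more clusters, which is the knob that controls how finely regions would be split under this prior.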
An Efficient Quality-Related Fault Diagnosis Method for Real-Time Multimode Industrial Process
Focusing on performance monitoring of quality-related complex industrial processes, this paper proposes a novel multimode process monitoring method. First, principal component space clustering is performed under the guidance of quality variables; by extracting model tags, the clustering information of the original training data is obtained. Second, to account for the multimode characteristics of the process data, a covariance description form is built and a monitoring model integrating a Gaussian mixture model with total projection to latent structures is established. This multimode total projection to latent structures (MTPLS) model is the foundation for quality-related monitoring of multimode processes. Then, a comprehensive statistical index is defined based on the Bayesian posterior probability that a monitored sample belongs to each Gaussian component. After that, a combined index is constructed for process monitoring. Finally, motivated by the use of traditional contribution plots in fault diagnosis, a gradient contribution rate is applied to analyze how the variable contribution rate varies across samples. The method supports online fault monitoring and diagnosis for multimode processes. The performance of the whole scheme is verified on a real industrial hot strip mill process (HSMP) and compared with several existing methods.
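The Bayesian step described above can be illustrated as follows: given a Gaussian mixture whose weights, means and covariances are assumed already estimated from normal operating data, Bayes' rule gives the posterior probability that a new sample belongs to each operating mode. The posterior-weighted squared Mahalanobis distance returned alongside is a hypothetical stand-in for the paper's comprehensive statistics index, whose exact definition the abstract does not give; the sketch does not implement the MTPLS model, and all names are illustrative.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal at x, and the squared Mahalanobis distance."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha), maha


def posterior_weighted_index(x, weights, means, covs):
    """Posterior probability of each Gaussian component for sample x (Bayes' rule),
    plus a posterior-weighted squared Mahalanobis distance used here as a
    hypothetical combined monitoring statistic."""
    x = np.asarray(x, dtype=float)
    log_terms, mahas = [], []
    for w, m, c in zip(weights, means, covs):
        logpdf, maha = gaussian_logpdf(x, np.asarray(m, float), np.asarray(c, float))
        log_terms.append(np.log(w) + logpdf)  # log prior + log likelihood
        mahas.append(maha)
    log_terms = np.array(log_terms)
    post = np.exp(log_terms - log_terms.max())  # subtract max for numerical stability
    post /= post.sum()
    return post, float(post @ np.array(mahas))


# Two hypothetical operating modes; a sample halfway between them.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
covs = [np.eye(2), np.eye(2)]
post, index = posterior_weighted_index([5.0, 5.0], weights, means, covs)
print(post, index)
```

A sample close to one mode gets a posterior near 1 for that mode and a small index, while a sample far from every mode gets a large index regardless of how the posterior splits.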
- …