Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions
We propose a novel method for multiple clustering that assumes a
co-clustering structure (partitions in both rows and columns of the data
matrix) in each view. The new method is applicable to high-dimensional data. It
is based on a nonparametric Bayesian approach in which the number of views and
the number of feature-/subject clusters are inferred in a data-driven manner.
We simultaneously model different distribution families, such as Gaussian,
Poisson, and multinomial distributions in each cluster block. This makes our
method applicable to datasets consisting of both numerical and categorical
variables, as is typical of biomedical data. Clustering solutions are based
on variational inference with mean field approximation. We apply the proposed
method to synthetic and real data, and show that our method outperforms other
multiple clustering methods both in recovering true cluster structures and in
computation time. Finally, we apply our method to a depression dataset with no
true cluster structure available, from which useful inferences are drawn about
possible clustering structures of the data.
Bayesian Joint Matrix Decomposition for Data Integration with Heterogeneous Noise
Matrix decomposition is a popular and fundamental approach in machine
learning and data mining. It has been successfully applied in various fields.
Most matrix decomposition methods focus on decomposing a data matrix from one
single source. However, it is common that data are from different sources with
heterogeneous noise. A few matrix decomposition methods have been extended
for such multi-view data integration and pattern discovery. However, only a few
methods explicitly account for the heterogeneity of noise in such multi-view
data during integration. To this end, we propose a joint matrix
decomposition framework (BJMD), which models the heterogeneity of noise with
Gaussian distributions in a Bayesian framework. We develop two algorithms to
solve this model: one is a variational Bayesian inference algorithm, which
makes full use of the posterior distribution; the other is a maximum a
posteriori algorithm, which is more scalable and can be easily parallelized.
Extensive experiments on synthetic and real-world datasets demonstrate that
BJMD, by accounting for the heterogeneity of noise, is superior to or
competitive with state-of-the-art methods. Comment: 14 pages, 7 figures, 8 tables
Optimal Continuous State POMDP Planning with Semantic Observations: A Variational Approach
This work develops novel strategies for optimal planning with semantic
observations using continuous state partially observable Markov decision
processes (CPOMDPs). Two major innovations are presented in relation to
Gaussian mixture (GM) CPOMDP policy approximation methods. While existing
methods have many desirable theoretical properties, they are unable to
efficiently represent and reason over hybrid continuous-discrete probabilistic
models. The first major innovation is the derivation of closed-form variational
Bayes GM approximations of Point-Based Value Iteration Bellman policy backups,
using softmax models of continuous-discrete semantic observation probabilities.
A key benefit of this approach is that dynamic decision-making tasks can be
performed with complex non-Gaussian uncertainties, while also exploiting
continuous dynamic state space models (thus avoiding cumbersome and costly
discretization). The second major innovation is a new clustering-based
technique for mixture condensation that scales well to very large GM policy
functions and belief functions. Simulation results for a target search and
interception task with semantic observations show that the GM policies
resulting from these innovations are more effective than those produced by
other state-of-the-art policy approximations, while requiring significantly less
modeling overhead and online runtime cost. Additional results show the
robustness of this approach to model errors and scaling to higher dimensions. Comment: Final version accepted to IEEE Transactions on Robotics (in press as
of August 2019)
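As a rough illustration of the softmax semantic observation models mentioned in the abstract above, the following minimal sketch computes the probability of each discrete semantic label given a continuous state. The function name, the weight and bias values, and the three labels are hypothetical and are not taken from the paper; only the standard softmax form is assumed.

```python
import math

def softmax_observation_probs(state, weights, biases):
    """Return p(label j | state) under a softmax (multinomial logistic) model.

    state   : list of floats, the continuous state x
    weights : one weight vector w_j per semantic label (hypothetical values)
    biases  : one bias b_j per semantic label (hypothetical values)
    """
    # Linear score w_j . x + b_j for each label.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, state)) + b
              for w, b in zip(weights, biases)]
    # Subtract the maximum logit for numerical stability.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical labels for a 2-D state: "east", "north", "west".
probs = softmax_observation_probs([5.0, 0.0],
                                  [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                                  [0.0, 0.0, 0.0])
print(probs)  # the first label ("east") receives the largest probability
```

The softmax form lets a handful of parameters carve the continuous state space into soft regions, one per semantic label, without discretizing the state.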
Reconciling meta-learning and continual learning with online mixtures of tasks
Learning-to-learn or meta-learning leverages data-driven inductive bias to
increase the efficiency of learning on a novel task. This approach encounters
difficulty when transfer is not advantageous, for instance, when tasks are
considerably dissimilar or change over time. We use the connection between
gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet
process mixture of hierarchical Bayesian models over the parameters of an
arbitrary parametric model such as a neural network. In contrast to
consolidating inductive biases into a single set of hyperparameters, our
approach of task-dependent hyperparameter selection better handles latent
distribution shift, as demonstrated on a set of evolving, image-based, few-shot
learning benchmarks. Comment: updated experimental results
Vectorial Dimension Reduction for Tensors Based on Bayesian Inference
Dimensionality reduction for high-order tensors is a challenging problem. In
conventional approaches, higher-order tensors are either "vectorized" or reduced
to lower-order tensors via Tucker decomposition. This destroys the inherent
high-order structures or results in undesired tensors, respectively. This
paper introduces a probabilistic vectorial dimensionality reduction model for
tensorial data. The model represents a tensor by employing a linear combination
of same-order basis tensors; it thus offers a mechanism to directly reduce a
tensor to a vector. Under this expression, the projection base of the model is
based on the tensor CANDECOMP/PARAFAC (CP) decomposition, and the number of free
parameters in the model grows only linearly with the number of modes rather
than exponentially. Bayesian inference is established via the
variational EM approach. A criterion to set the parameters (factor number of CP
decomposition and the number of extracted features) is empirically given. The
model outperforms several existing PCA-based methods and CP decomposition on
several publicly available databases in terms of classification and clustering
accuracy. Comment: Submitting to TNNL
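The linear-versus-exponential claim in the abstract above can be checked with simple arithmetic. This is a sketch under the standard assumption that a rank-R CP representation stores one factor matrix of size I_n x R per mode; the exact parameter count of the paper's model may differ, and the function names are illustrative only.

```python
from math import prod

def cp_param_count(dims, rank):
    """Parameters in a rank-`rank` CP factorization: one (I_n x rank) factor per mode."""
    return rank * sum(dims)

def dense_param_count(dims):
    """Parameters in an unstructured tensor with the same mode sizes."""
    return prod(dims)

# Growth with the number of modes, each of size 10, at CP rank 5.
for num_modes in (3, 4, 5, 6):
    dims = [10] * num_modes
    print(num_modes, cp_param_count(dims, 5), dense_param_count(dims))
```

With every mode of size 10, each extra mode adds a fixed 50 parameters to the CP count but multiplies the dense count by 10.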
Disease Trajectory Maps
Medical researchers are coming to appreciate that many diseases are in fact
complex, heterogeneous syndromes composed of subpopulations that express
different variants of a related complication. Time series data extracted from
individual electronic health records (EHR) offer an exciting new way to study
subtle differences in the way these diseases progress over time. In this paper,
we focus on answering two questions that can be asked using these databases of
time series. First, we want to understand whether there are individuals with
similar disease trajectories and whether there are a small number of degrees of
freedom that account for differences in trajectories across the population.
Second, we want to understand how important clinical outcomes are associated
with disease trajectories. To answer these questions, we propose the Disease
Trajectory Map (DTM), a novel probabilistic model that learns low-dimensional
representations of sparse and irregularly sampled time series. We propose a
stochastic variational inference algorithm for learning the DTM that allows the
model to scale to large modern medical datasets. To demonstrate the DTM, we
analyze data collected on patients with the complex autoimmune disease,
scleroderma. We find that DTM learns meaningful representations of disease
trajectories and that the representations are significantly associated with
important clinical outcomes.
Clustering Airbnb Reviews
In the last decade, online customer reviews have increasingly influenced
consumers' decisions when booking accommodation online. The renewed importance
of the concept of word-of-mouth is reflected in the growing interest in
investigating consumers' experience by analyzing their online reviews through
the process of text mining and sentiment analysis. A clustering approach is
developed for Boston Airbnb reviews submitted in the English language and
collected from 2009 to 2016. This approach is based on a mixture of latent
variable models, which provides an appealing framework for handling clustered
binary data. We address here the problem of discovering meaningful segments of
consumers that are coherent in terms of both the underlying topics and the sentiment
behind the reviews. A penalized mixture of latent traits approach is developed
to reduce the number of parameters and identify variables that are not
informative for clustering. The introduction of component-specific rate
parameters avoids the over-penalization that can occur when inferring a shared
rate parameter on clustered data. We divided the guests into four groups:
property-driven guests, host-driven guests, guests with a recent overall negative
stay, and guests with some negative experiences.
Probabilistic Combination of Classifier and Cluster Ensembles for Non-transductive Learning
Unsupervised models can provide supplementary soft constraints to help
classify new target data under the assumption that similar objects in the
target set are more likely to share the same class label. Such models can also
help detect possible differences between training and target distributions,
which is useful in applications where concept drift may take place. This paper
describes a Bayesian framework that takes as input class labels from existing
classifiers (designed based on labeled data from the source domain), as well as
cluster labels from a cluster ensemble operating solely on the target data to
be classified, and yields a consensus labeling of the target data. This
framework is particularly useful when the statistics of the target data drift
or change from those of the training data. We also show that the proposed
framework is privacy-aware and allows performing distributed learning when
data/models have sharing restrictions. Experiments show that our framework can
yield superior results to those provided by applying classifier ensembles only.
Model-Based Clustering of Time-Evolving Networks through Temporal Exponential-Family Random Graph Models
Dynamic networks are a general language for describing time-evolving complex
systems, and discrete time network models provide an emerging statistical
technique for various applications. It is a fundamental research question to
detect the community structure in time-evolving networks. However, due to
significant computational challenges and difficulties in modeling communities
of time-evolving networks, there has been little progress in the current literature
on effectively finding communities in time-evolving networks. In this work, we
propose a novel model-based clustering framework for time-evolving networks
based on discrete time exponential-family random graph models. To choose the
number of communities, we use conditional likelihood to construct an effective
model selection criterion. Furthermore, we propose an efficient variational
expectation-maximization (EM) algorithm to find approximate maximum likelihood
estimates of network parameters and mixing proportions. By using variational
methods and minorization-maximization (MM) techniques, our method has appealing
scalability for large-scale time-evolving networks. The power of our method is
demonstrated in simulation studies and empirical applications to international
trade networks and the collaboration networks of a large American research
university. Comment: 30 pages, 4 figures
A Truncated EM Approach for Spike-and-Slab Sparse Coding
We study inference and learning based on a sparse coding model with
a `spike-and-slab' prior. As in standard sparse coding, the model used assumes
independent latent sources that linearly combine to generate data points.
However, instead of using a standard sparse prior such as a Laplace
distribution, we study the application of a more flexible `spike-and-slab'
distribution which models the absence or presence of a source's contribution
independently of its strength if it contributes. We investigate two approaches
to optimize the parameters of spike-and-slab sparse coding: a novel truncated
EM approach and, for comparison, an approach based on standard factored
variational distributions. The truncated approach can be regarded as a
variational approach with truncated posteriors as variational distributions. In
applications to source separation we find that both approaches improve the
state-of-the-art in a number of standard benchmarks, which argues for the use
of `spike-and-slab' priors for the corresponding data domains. Furthermore, we
find that the truncated EM approach improves on the standard factored approach
in source separation tasks, which hints at biases introduced by assuming
posterior independence in the factored variational approach. Likewise, on a
standard benchmark for image denoising, we find that the truncated EM approach
improves on the factored variational approach. While the performance of the
factored approach saturates with increasing numbers of hidden dimensions, the
performance of the truncated approach improves the state-of-the-art for higher
noise levels. Comment: To appear in JMLR (2014)
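To make the `spike-and-slab' idea in the abstract above concrete, here is a minimal generative sketch, assuming a Bernoulli "spike" that switches each source on or off and a Gaussian "slab" that sets its strength when active. The function names, the parameters `pi`, `mu`, `sigma`, and `noise_std`, and the example weight matrix are illustrative assumptions, not the paper's notation or implementation.

```python
import random

def sample_spike_and_slab(num_sources, pi, mu=0.0, sigma=1.0, rng=None):
    """Draw latent sources: each is exactly 0 with probability 1 - pi,
    otherwise a Gaussian draw with mean mu and standard deviation sigma."""
    rng = rng or random.Random()
    sources = []
    for _ in range(num_sources):
        active = rng.random() < pi        # spike: presence or absence of the source
        strength = rng.gauss(mu, sigma)   # slab: strength, used only if active
        sources.append(strength if active else 0.0)
    return sources

def generate_data_point(weights, sources, noise_std=0.1, rng=None):
    """Linearly combine the sources (one row of weights per observed dimension)
    and add independent Gaussian noise to each dimension."""
    rng = rng or random.Random()
    return [sum(w * s for w, s in zip(row, sources)) + rng.gauss(0.0, noise_std)
            for row in weights]

rng = random.Random(0)
sources = sample_spike_and_slab(4, pi=0.25, rng=rng)
weights = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]  # hypothetical dictionary
print(sources, generate_data_point(weights, sources, rng=rng))
```

Unlike a Laplace prior, which only concentrates mass near zero, this prior places probability mass exactly at zero, so whether a source contributes is modeled separately from how strongly it contributes.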