Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

Author: Doya Kenji
Okada Go
Okamoto Yasumasa
Shimizu Yu
Takamura Masahiro
Toki Shigeru
Tokuda Tomoki
Yamamoto Tetsuya
Yamawaki Shigeto
Yoshimoto Junichiro
Yoshimura Shinpei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 21/10/2015

We propose a novel method for multiple clustering that assumes a co-clustering structure (partitions in both rows and columns of the data matrix) in each view. The new method is applicable to high-dimensional data. It is based on a nonparametric Bayesian approach in which the number of views and the number of feature and subject clusters are inferred in a data-driven manner. We simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions, in each cluster block. This makes our method applicable to datasets consisting of both numerical and categorical variables, as biomedical data typically do. Clustering solutions are based on variational inference with a mean-field approximation. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.

arXiv.org e-Print Archive

Bayesian Joint Matrix Decomposition for Data Integration with Heterogeneous Noise

Author: Zhang Chihao
Zhang Shihua
Publication date: 08/12/2017

Matrix decomposition is a popular and fundamental approach in machine learning and data mining. It has been successfully applied in various fields. Most matrix decomposition methods focus on decomposing a data matrix from a single source. However, data commonly come from different sources with heterogeneous noise. A few matrix decomposition methods have been extended to such multi-view data integration and pattern discovery, but only a few of them were designed to explicitly account for the heterogeneity of noise in multi-view data. To this end, we propose a joint matrix decomposition framework (BJMD), which models the heterogeneity of noise with Gaussian distributions in a Bayesian framework. We develop two algorithms to solve this model: one is a variational Bayesian inference algorithm, which makes full use of the posterior distribution; the other is a maximum a posteriori algorithm, which is more scalable and can be easily parallelized. Extensive experiments on synthetic and real-world datasets demonstrate that BJMD, by considering the heterogeneity of noise, is superior to or competitive with state-of-the-art methods. Comment: 14 pages, 7 figures, 8 tables

arXiv.org e-Print Archive

Optimal Continuous State POMDP Planning with Semantic Observations: A Variational Approach

Author: Ahmed Nisar
Burks Luke
Loefgren Ian
Publication date: 07/08/2019

This work develops novel strategies for optimal planning with semantic observations using continuous state partially observable Markov decision processes (CPOMDPs). Two major innovations are presented in relation to Gaussian mixture (GM) CPOMDP policy approximation methods. While existing methods have many desirable theoretical properties, they are unable to efficiently represent and reason over hybrid continuous-discrete probabilistic models. The first major innovation is the derivation of closed-form variational Bayes GM approximations of Point-Based Value Iteration Bellman policy backups, using softmax models of continuous-discrete semantic observation probabilities. A key benefit of this approach is that dynamic decision-making tasks can be performed with complex non-Gaussian uncertainties, while also exploiting continuous dynamic state space models (thus avoiding cumbersome and costly discretization). The second major innovation is a new clustering-based technique for mixture condensation that scales well to very large GM policy functions and belief functions. Simulation results for a target search and interception task with semantic observations show that the GM policies resulting from these innovations are more effective than those produced by other state-of-the-art policy approximations, while requiring significantly less modeling overhead and online runtime cost. Additional results show the robustness of this approach to model errors and its scaling to higher dimensions. Comment: Final version accepted to IEEE Transactions on Robotics (in press as of August 2019)
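
As a rough illustration of the softmax observation models mentioned in this abstract, the sketch below computes the probability of each discrete semantic observation given a continuous state. The function name, weights, biases, and class labels are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def softmax_observation_probs(state, weights, biases):
    """Return P(o = j | state) for each semantic class j under a softmax model.

    state   : list of floats (continuous state vector)
    weights : one weight vector per class (hypothetical values)
    biases  : one bias per class (hypothetical values)
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, state)) + b
              for w, b in zip(weights, biases)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 1-D example with three semantic classes:
# index 0 = "left of target", 1 = "near target", 2 = "right of target".
weights = [[-1.0], [0.0], [1.0]]
biases = [0.0, 1.0, 0.0]
probs = softmax_observation_probs([2.0], weights, biases)
```

For a state at x = 2.0, the "right of target" class receives the largest probability, and the probabilities sum to one.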

arXiv.org e-Print Archive

Reconciling meta-learning and continual learning with online mixtures of tasks

Author: Grant Erin
Griffiths Thomas L.
Heller Katherine
Jerfel Ghassen
Publication date: 19/06/2019

Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not advantageous, for instance, when tasks are considerably dissimilar or change over time. We use the connection between gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet process mixture of hierarchical Bayesian models over the parameters of an arbitrary parametric model such as a neural network. In contrast to consolidating inductive biases into a single set of hyperparameters, our approach of task-dependent hyperparameter selection better handles latent distribution shift, as demonstrated on a set of evolving, image-based, few-shot learning benchmarks. Comment: updated experimental results
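
To make the Dirichlet process mixture idea concrete, here is a minimal sketch of the Chinese restaurant process, a standard way to draw cluster assignments from a Dirichlet process prior. It shows only the prior over task-to-cluster assignments, not the inference procedure of the paper; the function name and the `alpha` value are assumptions for illustration.

```python
import random

def crp_assignments(num_tasks, alpha, seed=0):
    """Sample cluster labels for tasks from a Chinese restaurant process.

    Each new task joins existing cluster k with probability proportional
    to that cluster's size, or opens a new cluster with probability
    proportional to the concentration parameter alpha (must be > 0).
    """
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = number of tasks currently in cluster k
    for _ in range(num_tasks):
        weights = counts + [alpha]  # last entry is the "new cluster" option
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments

labels = crp_assignments(num_tasks=20, alpha=1.0)
```

The number of clusters is not fixed in advance: it grows with the data, which is the property that lets a new mixture component be added when a task differs from those seen so far.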

arXiv.org e-Print Archive

Vectorial Dimension Reduction for Tensors Based on Bayesian Inference

Author: Gao Junbin
Hu Yongli
Ju Fujiao
Sun Yanfeng
Yin Baocai
Publication date: 02/07/2017

Dimensionality reduction for high-order tensors is a challenging problem. In conventional approaches, higher-order tensors are "vectorized" or reduced via Tucker decomposition to obtain lower-order tensors. This either destroys the inherent high-order structure or results in undesired tensors. This paper introduces a probabilistic vectorial dimensionality reduction model for tensorial data. The model represents a tensor as a linear combination of basis tensors of the same order, and thus offers a mechanism to directly reduce a tensor to a vector. Under this expression, the projection base of the model is based on the tensor CANDECOMP/PARAFAC (CP) decomposition, and the number of free parameters in the model grows only linearly with the number of modes rather than exponentially. Bayesian inference is established via the variational EM approach. A criterion for setting the parameters (the factor number of the CP decomposition and the number of extracted features) is given empirically. The model outperforms several existing PCA-based methods and CP decomposition on several publicly available databases in terms of classification and clustering accuracy. Comment: Submitting to TNNLS
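
The linear-versus-exponential growth mentioned in this abstract can be illustrated with simple arithmetic on CP-structured tensors in general. The sketch below is not the paper's exact parameter count; the helper names are hypothetical. It compares the number of entries in a dense tensor with the number of factor entries in a rank-R CP representation of the same shape.

```python
import math

def dense_param_count(dims):
    """Entries in a dense tensor: the product of all mode sizes."""
    return math.prod(dims)

def cp_param_count(dims, rank):
    """Entries in a rank-`rank` CP representation: one factor matrix
    of shape (mode size x rank) per mode, so rank * sum of mode sizes."""
    return rank * sum(dims)

dims = [10, 10, 10]          # a hypothetical 3-mode tensor
dense = dense_param_count(dims)
cp = cp_param_count(dims, rank=5)
```

Adding a fourth mode of size 10 multiplies the dense count by 10 but adds only 50 entries to the rank-5 CP count.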

arXiv.org e-Print Archive

Disease Trajectory Maps

Author: Arora Raman
Schulam Peter
Publication date: 29/06/2016

Medical researchers are coming to appreciate that many diseases are in fact complex, heterogeneous syndromes composed of subpopulations that express different variants of a related complication. Time series data extracted from individual electronic health records (EHR) offer an exciting new way to study subtle differences in the way these diseases progress over time. In this paper, we focus on answering two questions that can be asked using these databases of time series. First, we want to understand whether there are individuals with similar disease trajectories and whether a small number of degrees of freedom account for differences in trajectories across the population. Second, we want to understand how important clinical outcomes are associated with disease trajectories. To answer these questions, we propose the Disease Trajectory Map (DTM), a novel probabilistic model that learns low-dimensional representations of sparse and irregularly sampled time series. We propose a stochastic variational inference algorithm for learning the DTM that allows the model to scale to large modern medical datasets. To demonstrate the DTM, we analyze data collected on patients with the complex autoimmune disease scleroderma. We find that the DTM learns meaningful representations of disease trajectories and that the representations are significantly associated with important clinical outcomes.

arXiv.org e-Print Archive

Clustering Airbnb Reviews

Author: McNicholas Paul D.
Tang Yang
Publication date: 27/06/2019

In the last decade, online customer reviews have increasingly influenced consumers' decisions when booking accommodation online. The renewed importance of the concept of word-of-mouth is reflected in the growing interest in investigating consumers' experiences by analyzing their online reviews through text mining and sentiment analysis. A clustering approach is developed for Boston Airbnb reviews submitted in the English language and collected from 2009 to 2016. This approach is based on a mixture of latent variable models, which provides an appealing framework for handling clustered binary data. We address here the problem of discovering meaningful segments of consumers that are coherent in both the underlying topics and the sentiment behind the reviews. A penalized mixture of latent traits approach is developed to reduce the number of parameters and identify variables that are not informative for clustering. The introduction of component-specific rate parameters avoids the over-penalization that can occur when inferring a shared rate parameter on clustered data. We divided the guests into four groups: property-driven guests, host-driven guests, guests with a recent overall negative stay, and guests with some negative experiences.

arXiv.org e-Print Archive

Probabilistic Combination of Classifier and Cluster Ensembles for Non-transductive Learning

Author: Acharya Ayan
Ghosh Joydeep
Hruschka Eduardo R.
Ruvini Jean-David
Sarwar Badrul
Publication date: 10/11/2012

Unsupervised models can provide supplementary soft constraints to help classify new target data under the assumption that similar objects in the target set are more likely to share the same class label. Such models can also help detect possible differences between training and target distributions, which is useful in applications where concept drift may take place. This paper describes a Bayesian framework that takes as input class labels from existing classifiers (designed based on labeled data from the source domain), as well as cluster labels from a cluster ensemble operating solely on the target data to be classified, and yields a consensus labeling of the target data. This framework is particularly useful when the statistics of the target data drift or change from those of the training data. We also show that the proposed framework is privacy-aware and allows distributed learning when data or models have sharing restrictions. Experiments show that our framework can yield results superior to those obtained by applying classifier ensembles only.

arXiv.org e-Print Archive

Model-Based Clustering of Time-Evolving Networks through Temporal Exponential-Family Random Graph Models

Author: Hunter David R.
Lee Kevin H.
Xue Lingzhou
Publication date: 20/12/2017

Dynamic networks are a general language for describing time-evolving complex systems, and discrete time network models provide an emerging statistical technique for various applications. Detecting the community structure in time-evolving networks is a fundamental research question. However, due to significant computational challenges and difficulties in modeling communities of time-evolving networks, there has been little progress in the current literature on effectively finding communities in such networks. In this work, we propose a novel model-based clustering framework for time-evolving networks based on discrete time exponential-family random graph models. To choose the number of communities, we use conditional likelihood to construct an effective model selection criterion. Furthermore, we propose an efficient variational expectation-maximization (EM) algorithm to find approximate maximum likelihood estimates of network parameters and mixing proportions. By using variational methods and minorization-maximization (MM) techniques, our method has appealing scalability for large-scale time-evolving networks. The power of our method is demonstrated in simulation studies and empirical applications to international trade networks and the collaboration networks of a large American research university. Comment: 30 pages, 4 figures

arXiv.org e-Print Archive

A Truncated EM Approach for Spike-and-Slab Sparse Coding

Author: Lücke Jörg
Sheikh Abdul-Saboor
Shelton Jacquelyn A.
Publication date: 03/09/2014

We study inference and learning based on a sparse coding model with a "spike-and-slab" prior. As in standard sparse coding, the model assumes independent latent sources that linearly combine to generate data points. However, instead of using a standard sparse prior such as a Laplace distribution, we study the application of a more flexible "spike-and-slab" distribution, which models the absence or presence of a source's contribution independently of its strength if it contributes. We investigate two approaches to optimize the parameters of spike-and-slab sparse coding: a novel truncated EM approach and, for comparison, an approach based on standard factored variational distributions. The truncated approach can be regarded as a variational approach with truncated posteriors as variational distributions. In applications to source separation we find that both approaches improve the state-of-the-art in a number of standard benchmarks, which argues for the use of "spike-and-slab" priors for the corresponding data domains. Furthermore, we find that the truncated EM approach improves on the standard factored approach in source separation tasks, which hints at biases introduced by assuming posterior independence in the factored variational approach. Likewise, on a standard benchmark for image denoising, we find that the truncated EM approach improves on the factored variational approach. While the performance of the factored approach saturates with increasing numbers of hidden dimensions, the performance of the truncated approach improves the state-of-the-art for higher noise levels. Comment: To appear in JMLR (2014)
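
The generative side of a spike-and-slab sparse coding model, as described in this abstract, can be sketched as follows. This is a minimal illustration, assuming a Bernoulli "spike" per source, a Gaussian "slab" for its strength, and isotropic Gaussian observation noise; the function name and all parameter values are hypothetical, and the sketch does not implement the truncated EM or variational learning procedures.

```python
import random

def sample_spike_and_slab(W, pi, mu, sigma, noise_std, seed=0):
    """Draw one data point from a spike-and-slab sparse coding model.

    W         : D x H mixing matrix as a list of rows (hypothetical values)
    pi        : probability that a source is active (the "spike")
    mu, sigma : mean and std of an active source's strength (the "slab")
    noise_std : std of the Gaussian observation noise (assumed isotropic)
    """
    rng = random.Random(seed)
    H = len(W[0])
    # Spike: is each source present? Slab: how strong is it if present?
    s = [1 if rng.random() < pi else 0 for _ in range(H)]
    z = [rng.gauss(mu, sigma) for _ in range(H)]
    latent = [s_h * z_h for s_h, z_h in zip(s, z)]
    # Sources combine linearly, then observation noise is added.
    y = [sum(w * l for w, l in zip(row, latent)) + rng.gauss(0.0, noise_std)
         for row in W]
    return s, latent, y

W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
s, latent, y = sample_spike_and_slab(W, pi=0.3, mu=0.0, sigma=1.0, noise_std=0.1)
```

Because presence (`s`) and strength (`z`) are drawn separately, a source can be exactly zero with probability `1 - pi`, which a Laplace prior cannot express.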

arXiv.org e-Print Archive