Infinite factorization of multiple non-parametric views
Combined analysis of multiple data sources is of increasing interest in applications, in particular for distinguishing shared and source-specific aspects. We extend this rationale of classical canonical correlation analysis into a flexible, generative, and non-parametric clustering setting by introducing a novel non-parametric hierarchical
mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities of the sources. The lower-level clusters arise from hierarchical Dirichlet Processes, inducing an infinite-dimensional contingency table between the views. The commonalities between the sources are modeled by an infinite block
model of the contingency table, interpretable as non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views.
Cluster analysis of co-expression is a standard, simple way of screening for co-regulation, and the two-view analysis extends this approach to distinguishing between pre- and post-translational regulation.
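As a concrete, finite-dimensional illustration of the contingency-table idea, the sketch below cross-tabulates per-gene cluster labels from two views. The labels z_mrna and z_protein are hypothetical stand-ins for assignments the lower-level HDP mixtures would infer, and the cluster counts are assumptions chosen for the toy example; the actual model places an infinite block prior on this table.

```python
import numpy as np

# Toy sketch: cross-tabulate cluster assignments from two views of the same
# genes. In the model the table is infinite-dimensional; here it is finite.
rng = np.random.default_rng(0)
n_genes = 200
z_mrna = rng.integers(0, 4, size=n_genes)     # view-1 cluster label per gene
z_protein = rng.integers(0, 3, size=n_genes)  # view-2 cluster label per gene

table = np.zeros((4, 3), dtype=int)
np.add.at(table, (z_mrna, z_protein), 1)      # count genes per cluster pair

# Dense blocks in this table correspond to gene groups co-regulated in both
# views; unstructured rows or columns suggest view-specific regulation.
print(table)
```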
Mixtures of Hierarchical Topics with Pachinko Allocation
The four-level pachinko allocation model (PAM) (Li & McCallum, 2006) represents correlations among topics using a DAG structure. It does not, however, represent a nested hierarchy of topics, with some topical word distributions representing the vocabulary that is shared among several more specific topics. This paper presents hierarchical PAM, an enhancement that explicitly represents a topic hierarchy. This model can be seen as combining the advantages of hLDA's topical hierarchy representation with PAM's ability to mix multiple leaves of the topic hierarchy. Experimental results show improvements in likelihood of held-out documents, as well as mutual information between automatically-discovered topics and human-generated categories such as journals.
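To make the DAG structure concrete, here is a minimal generative sketch of the four-level PAM path (root, super-topic, sub-topic, word). The sizes V, S, T and the symmetric Dirichlet priors are illustrative assumptions; hierarchical PAM would additionally attach word distributions to the root and super-topic nodes rather than only to the leaves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): vocabulary, super-topics, sub-topics.
V, S, T = 50, 3, 6
phi = rng.dirichlet(np.ones(V), size=T)         # word distribution per sub-topic

def generate_document(length):
    # Per-document multinomials over children: one at the root over
    # super-topics, and one per super-topic over sub-topics.
    theta_root = rng.dirichlet(np.ones(S))
    theta_super = rng.dirichlet(np.ones(T), size=S)
    words = []
    for _ in range(length):
        s = rng.choice(S, p=theta_root)          # step 1: pick a super-topic
        t = rng.choice(T, p=theta_super[s])      # step 2: pick a sub-topic
        words.append(int(rng.choice(V, p=phi[t])))  # step 3: emit a word
    return words

print(generate_document(20))
```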
The supervised hierarchical Dirichlet process
We propose the supervised hierarchical Dirichlet process (sHDP), a
nonparametric generative model for the joint distribution of a group of
observations and a response variable directly associated with that whole group.
We compare the sHDP with another leading method for regression on grouped data,
the supervised latent Dirichlet allocation (sLDA) model. We evaluate our method
on two real-world classification problems and two real-world regression
problems. Bayesian nonparametric regression models based on the Dirichlet
process, such as Dirichlet process generalised linear models (DP-GLM), have
previously been explored; these models allow flexibility in modelling nonlinear
relationships. However, until now, Hierarchical Dirichlet Process (HDP)
mixtures have not seen significant use in supervised problems with grouped data
since a straightforward application of the HDP on the grouped data results in
learnt clusters that are not predictive of the responses. The sHDP solves this
problem by allowing for clusters to be learnt jointly from the group structure
and from the label assigned to each group.
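A rough forward simulation of this generative story might look like the following sketch. The truncation level, the Gaussian likelihood, and the linear-Gaussian response linking the group label to empirical cluster frequencies are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(concentration, k_max):
    """Truncated stick-breaking weights for a Dirichlet process."""
    betas = rng.beta(1.0, concentration, size=k_max)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

# Illustrative sizes and hyperparameters (assumptions).
K, G, N = 10, 5, 30                    # truncation, groups, items per group
gamma, alpha0 = 1.0, 1.0               # top- and group-level concentrations
beta = stick_breaking(gamma, K)        # global weights shared by all groups
atoms = rng.normal(0.0, 3.0, size=K)   # Gaussian cluster means
eta = rng.normal(0.0, 1.0, size=K)     # per-cluster regression weights

for g in range(G):
    pi_g = rng.dirichlet(alpha0 * beta)   # group weights centred on beta
    z = rng.choice(K, size=N, p=pi_g)     # cluster assignment per item
    x = rng.normal(atoms[z], 1.0)         # observations in group g
    zbar = np.bincount(z, minlength=K) / N  # empirical cluster frequencies
    y = rng.normal(eta @ zbar, 0.1)       # group response depends on clusters
    print(g, round(float(y), 2))
```

The key point the sketch conveys is that the response y is generated from the same cluster frequencies that explain the observations x, so inference must learn clusters that serve both.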
Distance Dependent Chinese Restaurant Processes
We develop the distance dependent Chinese restaurant process (CRP), a
flexible class of distributions over partitions that allows for
non-exchangeability. This class can be used to model many kinds of dependencies
between data in infinite clustering models, including dependencies across time
or space. We examine the properties of the distance dependent CRP, discuss its
connections to Bayesian nonparametric mixture models, and derive a Gibbs
sampler for both observed and mixture settings. We study its performance with
three text corpora. We show that relaxing the assumption of exchangeability
with distance dependent CRPs can provide a better fit to sequential data. We
also show that its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
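The prior itself is easy to sample: each customer links to another customer with probability proportional to a decay of their distance, or to itself with probability proportional to alpha, and tables are the connected components of the resulting link graph. The sketch below implements this; the exponential decay and the data sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ddcrp_links(distances, alpha, decay):
    """Draw customer links c_i from a distance dependent CRP prior.

    Customer i links to customer j with probability proportional to
    decay(d_ij), or to itself with probability proportional to alpha.
    """
    n = distances.shape[0]
    links = np.empty(n, dtype=int)
    for i in range(n):
        weights = decay(distances[i]).astype(float)
        weights[i] = alpha                       # self-link mass
        links[i] = rng.choice(n, p=weights / weights.sum())
    return links

def links_to_tables(links):
    """Tables (clusters) are connected components of the link graph."""
    parent = list(range(len(links)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]        # path halving
            x = parent[x]
        return x
    for i, j in enumerate(links):
        parent[find(i)] = find(int(j))           # union linked customers
    roots = [find(i) for i in range(len(links))]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]

# Example: sequential data, with affinity decaying in time as exp(-d).
times = np.arange(15, dtype=float)
d = np.abs(times[:, None] - times[None, :])
links = sample_ddcrp_links(d, alpha=1.0, decay=lambda row: np.exp(-row))
print(links_to_tables(links))
```

Setting decay to a constant recovers an exchangeable partition distribution, which is the sense in which the ddCRP generalises the traditional CRP.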