A joint finite mixture model for clustering genes from independent Gaussian and beta distributed data

Author: A Kobayashi
C Biernacki
C Fraley
DX Jiang
G McLachlan
G Schwarz
GD Jr
H Akaike
H Björkbacka
H Bozdogan
H Lähdesmäki
Harri Lähdesmäki
LA O'Neill
Olli Yli-Harja
P Smyth
R Durbin
SA Ramsey
T Oyake
TI Lee
Timo Erkkilä
W Pan
Xiaofeng Dai
Y Ji
Y Okada
Publication venue: BioMed Central
Publication date: 29/05/2009

Crossref

Springer - Publisher Connector

PubMed Central

Integrative Model-based clustering of microarray methylation and expression data

Author: Booth James G.
Figueroa Maria E.
Kormaksson Matthias
Melnick Ari
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2012

In many fields, researchers are interested in large and complex biological processes. Two important examples are gene expression and DNA methylation in genetics. One key problem is to identify aberrant patterns of these processes and discover biologically distinct groups. In this article we develop a model-based method for clustering such data. The basis of our method involves the construction of a likelihood for any given partition of the subjects. We introduce cluster-specific latent indicators that, along with some standard assumptions, impose a specific mixture distribution on each cluster. Estimation is carried out using the EM algorithm. The methods extend naturally to multiple data types of a similar nature, which leads to an integrated analysis over multiple data platforms, resulting in higher discriminating power. Comment: Published at http://dx.doi.org/10.1214/11-AOAS533 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
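
The abstract above says only that estimation uses the EM algorithm. As a rough illustration of how EM alternates between soft cluster assignments and parameter updates, here is a minimal sketch for a generic two-component, one-dimensional Gaussian mixture. It is not the partition likelihood or the latent-indicator model of the article; the function names, the initialisation, and the iteration count are assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only)."""
    # Crude initialisation (an assumption): start the means at the data extremes.
    mu = [min(data), max(data)]
    sd = [1.0, 1.0]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [weight[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = p[0] + p[1]
            if total == 0.0:  # guard against numerical underflow
                resp.append([0.5, 0.5])
            else:
                resp.append([p[0] / total, p[1] / total])
        # M-step: responsibility-weighted updates of weights, means and spreads.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
            weight[k] = nk / len(data)
    return weight, mu, sd, resp


# Synthetic data: two well-separated groups (hypothetical, for illustration).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weight, mu, sd, resp = em_two_gaussians(data)
# Hard cluster labels follow from the larger responsibility for each point.
labels = [0 if r[0] >= r[1] else 1 for r in resp]
```

On well-separated synthetic data like this, the fitted means should land near the two group centres; real expression or methylation data would need a model suited to its distribution.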

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Miami: Scholarship Miami

Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data

Author: Banfield
Bao
Battle
Bezdek
Fernández
Fraley
Frühwirth-Schnatter
Green
Kuan
McLachlan
Mo
Nagalakshmi
Pettitt
Raftery
Ramos
Spiegelhalter
Stephens
Thomas
Xing
Zhang
Publication date: 12/05/2016

Model-based clustering is a technique widely used to group a collection of units into mutually exclusive groups. There are, however, situations in which an observation could in principle belong to more than one cluster. In the context of Next-Generation Sequencing (NGS) experiments, for example, the signal observed in the data might be produced by two (or more) different biological processes operating together and a gene could participate in both (or all) of them. We propose a novel approach to cluster NGS discrete data, coming from a ChIP-Seq experiment, with a mixture model, allowing each unit to belong potentially to more than one group: these multiple allocation clusters can be flexibly defined via a function combining the features of the original groups without introducing new parameters. The formulation naturally gives rise to a 'zero-inflation group' in which values close to zero can be allocated, acting as a correction for the abundance of zeros that manifest in this type of data. We take into account the spatial dependency between observations, which is described through a latent Conditional Auto-Regressive process that can reflect different dependency patterns. We assess the performance of our model within a simulation environment and then we apply it to real ChIP-Seq data. Comment: 25 pages; 3 tables, 6 figure

arXiv.org e-Print Archive

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Dissertations of the University of Groningen

Sparse covariance estimation in heterogeneous samples

Author: Dobra Adrian
Lenkoski Alex
Rodriguez Abel
Publication date: 23/01/2010

Standard Gaussian graphical models (GGMs) implicitly assume that the conditional independence among variables is common to all observations in the sample. However, in practice, observations are usually collected from heterogeneous populations where this assumption is not satisfied, leading in turn to nonlinear relationships among variables. To tackle these problems we explore mixtures of GGMs; in particular, we consider both infinite mixture models of GGMs and infinite hidden Markov models with GGM emission distributions. Such models allow us to divide a heterogeneous population into homogeneous groups, with each cluster having its own conditional independence structure. The main advantage of considering infinite mixtures is that they allow us to easily estimate the number of subpopulations in the sample. As an illustration, we study the trends in exchange rate fluctuations in the pre-Euro era. This example demonstrates that the models are very flexible while providing extremely interesting insights into real-life applications.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Unsupervised Bayesian linear unmixing of gene expression microarrays

Author: A Hyvärinen
Aimee K Zaas
AK Zaas
Alfred O Hero III
B Chen
CM Carvalho
CP Robert
Cécile Bazot
D Dueck
DD Lee
EJ Fertig
Geoffrey S Ginsburg
GJ McLachlan
J Baek
J Paisley
Jean-Yves Tourneret
JM Nascimento
KY Yeung
M West
ME Winter
N Dobigeon
Nicolas Dobigeon
P Fogel
PJ Green
RO Duda
TD Moloshok
TF Cox
V Nikulin
WR Gilks
Y Huang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high dimensional assays like gene expression microarrays. The basis for uBLU is a Bayesian model for the data samples which are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The particularity of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters. Results: Firstly, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non-negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Secondly, we illustrate the application of uBLU on a real time-evolving gene expression dataset from a recent viral challenge study in which individuals have been inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real data sets considered here. Conclusions: The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF). The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.
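
The constraints described in this abstract (non-negative factors, factor scores that form a probability distribution over the factors) can be illustrated with a small sketch of the noise-free mixing step only. This is not the uBLU Gibbs sampler; the function names and the toy numbers are hypothetical.

```python
def normalize_to_simplex(raw_scores):
    """Rescale non-negative numbers so they sum to one (a probability vector)."""
    if any(s < 0 for s in raw_scores):
        raise ValueError("scores must be non-negative")
    total = sum(raw_scores)
    if total == 0:
        raise ValueError("at least one score must be positive")
    return [s / total for s in raw_scores]


def mix_sample(factors, scores):
    """Noise-free sample profile: sum over factors r of scores[r] * factors[r]."""
    n_genes = len(factors[0])
    return [
        sum(scores[r] * factors[r][g] for r in range(len(factors)))
        for g in range(n_genes)
    ]


# Two hypothetical non-negative gene signatures measured over three genes.
factors = [[4.0, 0.0, 2.0], [0.0, 6.0, 2.0]]
# Factor scores for one sample, constrained to sum to one.
scores = normalize_to_simplex([1.0, 3.0])
sample = mix_sample(factors, scores)
print(scores)  # [0.25, 0.75]
print(sample)  # [1.0, 4.5, 2.0]
```

Because the scores sum to one, each sample is a convex combination of the signatures, which is what lets a score be read as the relative contribution of each factor.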

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Springer - Publisher Connector

Open Archive Toulouse Archive Ouverte

PubMed Central

Deep Blue Documents at the University of Michigan

Gamma-based clustering via ordered means with application to gene-expression analysis

Author: Chung Lisa M.
Newton Michael A.
Publication venue: Institute of Mathematical Statistics
Publication date: 09/11/2012

Discrete mixture models provide a well-known basis for effective clustering algorithms, although technical challenges have limited their scope. In the context of gene-expression data analysis, a model is presented that mixes over a finite catalog of structures, each one representing equality and inequality constraints among latent expected values. Computations depend on the probability that independent gamma-distributed variables attain each of their possible orderings. Each ordering event is equivalent to an event in independent negative-binomial random variables, and this finding guides a dynamic-programming calculation. The structuring of mixture-model components according to constraints among latent means leads to strict concavity of the mixture log likelihood. In addition to its beneficial numerical properties, the clustering method shows promising results in an empirical study. Comment: Published at http://dx.doi.org/10.1214/10-AOS805 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
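
The link between gamma orderings and negative-binomial events can be checked in the simplest case of two variables with integer shapes. Viewing X as the a-th arrival of one Poisson process and Y as the b-th arrival of another, X < Y holds exactly when fewer than b arrivals of the second process precede the a-th arrival of the first, and that count is negative-binomial. The sketch below covers only this two-variable case, not the article's dynamic-programming calculation over all orderings; the function names and parameter choices are assumptions.

```python
import math
import random


def prob_gamma_less(a, b, rate_x=1.0, rate_y=1.0):
    """P(X < Y) for independent X ~ Gamma(shape a, rate_x), Y ~ Gamma(shape b, rate_y).

    Integer shapes only. Equals P(K <= b - 1), where K is negative-binomial:
    the number of Y-process arrivals before the a-th X-process arrival.
    """
    p = rate_x / (rate_x + rate_y)  # chance that any given arrival is from X's process
    return sum(math.comb(a + k - 1, k) * p ** a * (1.0 - p) ** k for k in range(b))


def monte_carlo_gamma_less(a, b, rate_x=1.0, rate_y=1.0, n=20000, seed=0):
    """Simulation estimate of P(X < Y); random.gammavariate takes shape and scale."""
    rng = random.Random(seed)
    hits = sum(
        rng.gammavariate(a, 1.0 / rate_x) < rng.gammavariate(b, 1.0 / rate_y)
        for _ in range(n)
    )
    return hits / n


exact = prob_gamma_less(3, 2)
approx = monte_carlo_gamma_less(3, 2)
print(exact)  # 0.3125
```

The simulated estimate should agree with the closed-form sum up to Monte Carlo error, which is a quick sanity check on the equivalence in this special case.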

arXiv.org e-Print Archive

Crossref