637 research outputs found
Discovering transcriptional modules by Bayesian data integration
Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets.
Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs.
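The hierarchical Dirichlet process mixture at the heart of this model builds on the Dirichlet process prior over partitions. As a rough illustration only (a plain Chinese restaurant process draw, not the authors' fused hierarchical model), the partition distribution a DP mixture induces over cluster labels can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(4)

def crp_assignments(n, alpha):
    """Draw cluster labels for n items from a Chinese-restaurant-process
    prior: item i joins an existing cluster with probability proportional
    to its size, or opens a new cluster with probability prop. to alpha."""
    labels = [0]
    counts = [1]                     # sizes of the clusters seen so far
    for _ in range(1, n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):         # new cluster opened
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels

z = crp_assignments(100, alpha=1.0)
```

The concentration parameter alpha controls how readily new clusters appear; the number of clusters is not fixed in advance, which is what lets the model infer the number of transcriptional modules from the data.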
Mixed membership stochastic blockmodels
Observations consisting of measurements on relationships for pairs of objects
arise in many settings, such as protein interaction and gene regulatory
networks, collections of author-recipient email, and social networks. Analyzing
such data with probabilistic models can be delicate because the simple
exchangeability assumptions underlying many boilerplate models no longer hold.
In this paper, we describe a latent variable model of such data called the
mixed membership stochastic blockmodel. This model extends blockmodels for
relational data to ones which capture mixed membership latent relational
structure, thus providing an object-specific low-dimensional representation. We
develop a general variational inference algorithm for fast approximate
posterior inference. We explore applications to social and protein interaction
networks.
Comment: 46 pages, 14 figures, 3 tables
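The generative process the abstract describes can be sketched compactly (the sizes, Dirichlet hyperparameter, and block matrix below are illustrative, not values from the paper): each node draws a mixed-membership vector over latent blocks, each directed pair draws sender and receiver roles from those vectors, and the edge is Bernoulli under a block interaction matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 30, 3                          # nodes, latent blocks (hypothetical)
alpha = np.full(K, 0.1)               # Dirichlet prior on memberships
B = 0.05 + 0.9 * np.eye(K)            # assortative block interaction matrix

# Each node p gets a mixed-membership vector pi_p over the K blocks.
pi = rng.dirichlet(alpha, size=N)

# For each directed pair (p, q): sample roles, then an edge indicator.
Y = np.zeros((N, N), dtype=int)
for p in range(N):
    for q in range(N):
        if p == q:
            continue
        z_pq = rng.choice(K, p=pi[p])     # role p takes toward q
        z_qp = rng.choice(K, p=pi[q])     # role q takes toward p
        Y[p, q] = rng.binomial(1, B[z_pq, z_qp])
```

Note how the per-pair role indicators are what break simple exchangeability: a node may interact as a member of different blocks with different partners, which is the "mixed membership" the model adds over classical blockmodels.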
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma"; of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
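The EM alternation the survey revolves around — an E-step computing posterior responsibilities under current parameters, an M-step re-estimating parameters from those responsibilities — is easiest to see on a toy problem. A minimal sketch for a two-component Gaussian mixture on synthetic data (not one of the paper's biology applications; initial values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data from two well-separated Gaussians.
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Arbitrary starting parameters: means, std devs, mixture weights.
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])
w = np.array([0.5, 0.5])

def normal_pdf(v, m, s):
    return np.exp(-0.5 * ((v - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibility of each component for each data point.
    dens = w[None, :] * normal_pdf(x[:, None], mu[None, :], sigma[None, :])
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and std devs from responsibilities.
    n_k = r.sum(axis=0)
    w = n_k / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    sigma = np.sqrt((r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / n_k)
```

After the loop, mu should sit near the true means (-2 and 3). The same two-step structure underlies the motif-discovery and alignment applications the survey covers, with the Gaussian densities replaced by sequence likelihoods.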
Computing the likelihood of sequence segmentation under Markov modelling
I tackle the problem of partitioning a sequence into homogeneous segments,
where homogeneity is defined by a set of Markov models. The problem is to study
the likelihood that a sequence is divided into a given number of segments.
Here, the moments of this likelihood are computed through an efficient
algorithm. Unlike methods involving Hidden Markov Models, this algorithm does
not require probability transitions between the models. Among many possible
usages of the likelihood, I present a maximum \textit{a posteriori} probability
criterion to predict the number of homogeneous segments into which a sequence
can be divided, and an application of this method to find CpG islands.
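Segment likelihoods under competing Markov models are the basic ingredient here. A minimal sketch (the transition probabilities below are invented for illustration, not estimated values) scores a DNA segment by its log-odds under a CpG-island model versus a uniform background:

```python
import numpy as np

bases = "ACGT"
idx = {b: i for i, b in enumerate(bases)}

# Hypothetical first-order models: a uniform background, and a
# "CpG island" model whose C row enriches the C->G transition.
background = np.full((4, 4), 0.25)
cpg = np.full((4, 4), 0.25)
cpg[idx["C"]] = [0.2, 0.2, 0.4, 0.2]   # row still sums to 1

def markov_loglik(seq, P):
    """Log-likelihood of the transitions in seq under transition matrix P."""
    return sum(np.log(P[idx[a], idx[b]]) for a, b in zip(seq, seq[1:]))

seq = "ACGCGCGCGTA"
log_odds = markov_loglik(seq, cpg) - markov_loglik(seq, background)
# A positive log-odds favours the CpG-island model for this segment.
```

Because the segment is scored by each model independently, no transition probabilities between the models are needed, which is the contrast with HMM-based approaches the abstract draws.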
Model selection and sensitivity analysis for sequence pattern models
In this article we propose a maximum a posteriori (MAP) criterion for model
selection in the motif discovery problem and investigate conditions under which
the MAP asymptotically gives a correct prediction of model size. We also
investigate robustness of the MAP to prior specification and provide guidelines
for choosing prior hyper-parameters for motif models based on sensitivity
considerations.
Comment: Published at http://dx.doi.org/10.1214/193940307000000301 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
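The flavour of a model-size criterion like this can be shown on a simpler question — choosing the order of a Markov chain — by maximising a penalized log-likelihood. This sketch uses a BIC-style penalty as a stand-in for the paper's MAP criterion, on invented binary data:

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(3)

# Synthetic binary sequence from a first-order Markov chain.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
seq = [0]
for _ in range(999):
    seq.append(int(rng.choice(2, p=P[seq[-1]])))

def penalized_loglik(seq, order):
    """Maximized log-likelihood of an order-k Markov model, minus a
    BIC-style penalty on the number of free transition parameters."""
    counts = Counter(tuple(seq[i:i + order + 1])
                     for i in range(len(seq) - order))
    ctx = Counter()
    for key, c in counts.items():
        ctx[key[:-1]] += c
    ll = sum(c * np.log(c / ctx[key[:-1]]) for key, c in counts.items())
    n = len(seq) - order
    free_params = 2 ** order     # binary alphabet: one free prob per context
    return ll - 0.5 * free_params * np.log(n)

scores = {k: penalized_loglik(seq, k) for k in range(3)}
best_order = max(scores, key=scores.get)
```

The penalty grows with model size while the likelihood gain flattens once the true order is reached, so the criterion favours the correct order — the asymptotic correctness of this trade-off is what the article investigates for motif models.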