Laplace Approximated EM Microarray Analysis: An Empirical Bayes Approach for Comparative Microarray Experiments
A two-groups mixed-effects model for the comparison of (normalized)
microarray data from two treatment groups is considered. Most competing
parametric methods that have appeared in the literature are obtained as special
cases or by minor modification of the proposed model. Approximate maximum
likelihood fitting is accomplished via a fast and scalable algorithm, which we
call LEMMA (Laplace approximated EM Microarray Analysis). The posterior odds of
treatment-gene interactions, derived from the model, involve shrinkage
estimates of both the interactions and the gene-specific error variances.
Genes are classified as being associated with treatment based on the posterior
odds and the local false discovery rate (f.d.r.) with a fixed cutoff. Our
model-based approach also allows one to declare the non-null status of a gene
by controlling the false discovery rate (FDR). It is shown in a detailed
simulation study that the approach outperforms well-known competitors. We also
apply the proposed methodology to two previously analyzed microarray examples.
Extensions of the proposed method to paired treatments and multiple treatments
are also discussed.

Comment: Published in Statistical Science (http://www.imstat.org/sts/), http://dx.doi.org/10.1214/10-STS339, by the Institute of Mathematical Statistics (http://www.imstat.org)
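The classification rule the abstract describes, thresholding the local f.d.r. (the posterior probability of the null) at a fixed cutoff, can be sketched in a few lines. This is an illustrative sketch only, not LEMMA's interface; the input posterior null probabilities are assumed to come from some empirical Bayes two-groups fit:

```python
import numpy as np

def classify_by_local_fdr(post_null_prob, cutoff=0.2):
    """Flag genes as treatment-associated when the local f.d.r.
    (posterior probability of the null) falls below a fixed cutoff.

    post_null_prob : per-gene P(null | data), assumed to come from an
    empirical Bayes two-groups fit (hypothetical inputs here).
    """
    post_null_prob = np.asarray(post_null_prob, dtype=float)
    # Posterior odds of a treatment-gene interaction: P(non-null) / P(null)
    posterior_odds = (1.0 - post_null_prob) / post_null_prob
    is_non_null = post_null_prob < cutoff
    return is_non_null, posterior_odds

flags, odds = classify_by_local_fdr([0.9, 0.05, 0.5, 0.1], cutoff=0.2)
```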
Partial mixture model for tight clustering of gene expression time-course
Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to
this area of research. On the other hand, there has been extensive use of maximum likelihood techniques for model parameter estimation. By contrast, the minimum distance estimator has been largely ignored.
Results: In this paper we show the inherent robustness of the minimum distance estimator that makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate
information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a
simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores top in Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms.
Conclusion: For the first time, the partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm proves the suitability of the combination of the partial mixture model and the minimum distance estimator in this field. We show that tight clustering is not only capable of generating a more profound understanding of the dataset
under study, in accordance with established biological knowledge, but also presents interesting new hypotheses during interpretation of clustering results. In particular, we provide biological evidence that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion.
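The core idea of a partial mixture fitted by minimum distance, a single component whose weight need not absorb all the data, so scattered points are simply left unexplained, can be sketched with Scott's L2E criterion for one normal component. This is a generic one-dimensional sketch under that assumption, not the paper's regression-based algorithm:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def l2e_partial_normal(x, mu0=0.0, sigma0=1.0, w0=0.5):
    """Fit a single 'partial' normal component w * N(mu, sigma^2), with
    weight w in [0, 1], by minimizing an L2 distance to the data density
    (Scott's L2E criterion). Scattered points need not be explained."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def crit(theta):
        mu, log_sigma, w = theta
        sigma = np.exp(log_sigma)        # keep sigma positive
        w = np.clip(w, 0.0, 1.0)         # partial weight in [0, 1]
        # L2E = w^2 / (2 sigma sqrt(pi)) - (2w/n) * sum_i phi(x_i; mu, sigma)
        return (w ** 2 / (2 * sigma * np.sqrt(np.pi))
                - 2 * w / n * norm.pdf(x, mu, sigma).sum())

    res = minimize(crit, x0=[mu0, np.log(sigma0), w0], method="Nelder-Mead")
    mu, log_sigma, w = res.x
    return mu, np.exp(log_sigma), float(np.clip(w, 0.0, 1.0))

rng = np.random.default_rng(0)
# One tight cluster at 0 plus scattered points far away
data = np.concatenate([rng.normal(0, 1, 200), rng.uniform(10, 30, 50)])
mu, sigma, w = l2e_partial_normal(data)
```

Unlike a maximum likelihood fit, which would be dragged toward the scattered points, the L2E fit recovers the tight cluster and reports a weight near the cluster's true proportion of the data.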
The latent process decomposition of cDNA microarray data sets
We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called latent process decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, the ability to objectively assess the optimal number of sample clusters. Second, the ability to represent samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately which amounts to two distinct reduced space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster and, thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. 
To illustrate its wider applicability, we also demonstrate its performance on a microarray data set for yeast.
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.

Comment: Published in Statistical Science (http://www.imstat.org/sts/), http://dx.doi.org/10.1214/09-STS312, by the Institute of Mathematical Statistics (http://www.imstat.org)
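As a concrete reminder of the algorithm this survey centers on, here is a minimal EM fit of a two-component one-dimensional Gaussian mixture, the textbook case underlying several of the listed applications. This is a generic sketch, not any surveyed method's implementation:

```python
import numpy as np

def em_gaussian_mixture(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture: alternate posterior
    responsibilities (E-step) with weighted parameter updates (M-step)."""
    x = np.asarray(x, dtype=float)
    # Crude initialization from the data quantiles
    mu = np.quantile(x, [0.25, 0.75])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = (pi / (sigma * np.sqrt(2 * np.pi))
                * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of the parameters
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return pi, mu, sigma

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])
pi, mu, sigma = em_gaussian_mixture(data)
```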
MENGA: a new comprehensive tool for the integration of neuroimaging data and the Allen human brain transcriptome atlas
Brain-wide mRNA mappings offer a great potential for neuroscience research as they can provide information about system proteomics. In a previous work we have correlated mRNA maps with the binding patterns of radioligands targeting specific molecular systems and imaged with positron emission tomography (PET) in unrelated control groups. This approach is potentially applicable to any imaging modality as long as an efficient procedure of imaging-genomic matching is provided. In the original work we considered mRNA brain maps of the whole human genome derived from the Allen human brain database (ABA) and we performed the analysis with a specific region-based segmentation with a resolution that was limited by the PET data parcellation. There we identified the need for a platform for imaging-genomic integration that should be usable with any imaging modalities and fully exploit the high resolution mapping of the ABA dataset. In this work we present MENGA (Multimodal Environment for Neuroimaging and Genomic Analysis), a software platform that allows the investigation of the correlation patterns between neuroimaging data of any sort (both functional and structural) with mRNA gene expression profiles derived from the ABA database at high resolution. We applied MENGA to six different imaging datasets from three modalities (PET, single photon emission tomography and magnetic resonance imaging) targeting the dopamine and serotonin receptor systems and the myelin molecular structure. We further investigated imaging-genomic correlations in the case of mismatch between selected proteins and imaging targets.
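The basic imaging-genomic matching step, correlating one regional imaging map against many regional mRNA profiles, reduces to one Pearson correlation per gene. A minimal sketch of that computation follows; the function name and inputs are illustrative, not MENGA's actual API:

```python
import numpy as np

def imaging_genomic_correlation(imaging_map, expression_maps):
    """Correlate a regional neuroimaging map with regional mRNA profiles,
    producing one Pearson correlation per gene (illustrative names only).

    imaging_map     : (n_regions,) imaging values, e.g. PET binding
    expression_maps : (n_genes, n_regions) regional mRNA expression
    """
    img = np.asarray(imaging_map, dtype=float)
    expr = np.asarray(expression_maps, dtype=float)
    # Center both, then form the normalized cross-products
    img_c = img - img.mean()
    expr_c = expr - expr.mean(axis=1, keepdims=True)
    num = expr_c @ img_c
    den = np.sqrt((expr_c ** 2).sum(axis=1) * (img_c ** 2).sum())
    return num / den  # one correlation coefficient per gene

pet = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
genes = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],    # tracks the PET map
                  [5.0, 4.0, 3.0, 2.0, 1.0]])   # anti-correlated
r = imaging_genomic_correlation(pet, genes)
```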
Developmental constraints on vertebrate genome evolution
Constraints in embryonic development are thought to bias the direction of
evolution by making some changes less likely, and others more likely, depending
on their consequences on ontogeny. Here, we characterize the constraints acting
on genome evolution in vertebrates. We used gene expression data from two
vertebrates: zebrafish, using a microarray experiment spanning 14 stages of
development, and mouse, using EST counts for 26 stages of development. We show
that, in both species, genes expressed early in development (1) show more
dramatic effects when knocked out or mutated and (2) are more likely to revert to
single copy after whole genome duplication, relative to genes expressed late.
This supports high constraints on early stages of vertebrate development,
making them less open to innovations (gene gain or gene loss). Results are
robust to different sources of data: gene expression from microarrays, ESTs, or
in situ hybridizations; and mutants from directed KO, transgenic insertions,
point mutations, or morpholinos. We determine the pattern of these constraints,
which differs from the model used to describe vertebrate morphological
conservation ("hourglass" model). While morphological constraints reach a
maximum at mid-development (the "phylotypic" stage), genomic constraints appear
to decrease monotonically over developmental time.
Weighted-Lasso for Structured Network Inference from Time Course Data
We present a weighted-Lasso method to infer the parameters of a first-order
vector auto-regressive model that describes time course expression data
generated by directed gene-to-gene regulation networks. These networks are
assumed to have a prior internal structure of connectivity, which drives the
inference method. This prior structure can be either derived from prior
biological knowledge or inferred by the method itself. We illustrate the
performance of this structure-based penalization both on synthetic data and on
two canonical regulatory networks: first, the yeast cell-cycle regulation
network, by analyzing Spellman et al.'s dataset, and second, the E. coli S.O.S.
DNA repair network, by analyzing data from U. Alon's lab.
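A weighted Lasso on a first-order VAR model penalizes each candidate edge by its own weight, so prior structure can favor or discourage specific connections. The sketch below implements this via the standard column-rescaling trick on top of an ordinary Lasso; it is a minimal illustration under that reduction, not the paper's solver, and the simulated two-gene network is invented for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso_var1(X, weights, alpha=0.05):
    """Infer a VAR(1) network x_{t+1} ~ A x_t with the weighted Lasso
    penalty alpha * sum_ij weights[i, j] * |A[i, j]|.  Prior structure
    enters through the weights (small weight = edge encouraged).

    X       : (T, p) time-course expression matrix
    weights : (p, p) strictly positive penalty weights
    """
    past, future = X[:-1], X[1:]
    p = X.shape[1]
    A = np.zeros((p, p))
    for i in range(p):
        w = weights[i]                    # penalty weights for target gene i
        scaled = past / w                 # fold weights into the features...
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
        model.fit(scaled, future[:, i])
        A[i] = model.coef_ / w            # ...and undo the rescaling
    return A

rng = np.random.default_rng(2)
true_A = np.array([[0.8, 0.0],
                   [0.5, 0.3]])          # toy two-gene regulatory network
X = np.zeros((200, 2))
for t in range(199):
    X[t + 1] = true_A @ X[t] + rng.normal(0, 1.0, 2)
W = np.ones((2, 2))                      # uninformative prior structure
A_hat = weighted_lasso_var1(X, W, alpha=0.05)
```

With uniform weights this reduces to the ordinary Lasso; shrinking `weights[i, j]` toward zero for edges supported by prior biological knowledge makes those edges cheaper to include.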