Probabilities of spurious connections in gene networks: Application to expression time series
Motivation: The reconstruction of gene networks from gene expression
microarrays is gaining popularity as methods improve and as more data become
available. The reliability of such networks could be judged by the probability
that a connection between genes is spurious, resulting from chance fluctuations
rather than from a true biological relationship. Results: Unlike the false
discovery rate and positive false discovery rate, the decisive false discovery
rate (dFDR) is exactly equal to a conditional probability without assuming
independence or the randomness of hypothesis truth values. This property is
useful not only in the common application to the detection of differential gene
expression, but also in determining the probability of a spurious connection in
a reconstructed gene network. Estimators of the dFDR can estimate each of three
probabilities: 1. The probability that two genes that appear to be associated
with each other lack such association. 2. The probability that a time ordering
observed for two associated genes is misleading. 3. The probability that a time
ordering observed for two genes is misleading, either because they are not
associated or because they are associated without a lag in time. The first
probability applies to both static and dynamic gene networks, and the other two
only apply to dynamic gene networks. Availability: Cross-platform software for
network reconstruction, probability estimation, and plotting is free from
http://www.davidbickel.com as R functions and a Java application.
Comment: Like q-bio.GN/0404032, this was rejected in March 2004 because it was
submitted to the math archive. The only modification is a corrected reference
to q-bio.GN/0404032, which was not modified at all.
Empirical Bayes estimation of posterior probabilities of enrichment
To interpret differentially expressed genes or other discovered features,
researchers conduct hypothesis tests to determine which biological categories
such as those of the Gene Ontology (GO) are enriched in the sense of having
differential representation among the discovered features. We study the
application of better estimators of the local false discovery rate (LFDR), a probability
that the biological category has equivalent representation among the
preselected features.
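As a rough illustration of what an LFDR is, the sketch below computes it under a textbook two-group mixture with a standard normal null and a shifted normal alternative. The function names, the normality assumption, and the known values of `p0` and `alt_mean` are all assumptions made for illustration; this is not the SPE, NMLE, or MLE studied in the abstract.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(x, p0, alt_mean):
    """Posterior probability that the null hypothesis holds, given statistic x.

    Assumed (hypothetical) model: with prior probability p0 the statistic
    follows N(0, 1) (equivalent representation); otherwise it follows
    N(alt_mean, 1) (differential representation).
    """
    null_part = p0 * normal_pdf(x, 0.0)
    alt_part = (1.0 - p0) * normal_pdf(x, alt_mean)
    return null_part / (null_part + alt_part)


# A statistic near the alternative mean gets a smaller LFDR than one near zero.
lfdr_near_null = local_fdr(0.0, p0=0.9, alt_mean=3.0)
lfdr_near_alt = local_fdr(3.0, p0=0.9, alt_mean=3.0)
```

In practice `p0` and the alternative distribution are unknown and must be estimated, which is the problem the three estimators address.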
We identified three promising estimators of the LFDR for detecting
differential representation: a semiparametric estimator (SPE), a normalized
maximum likelihood estimator (NMLE), and a maximum likelihood estimator (MLE).
We found that the MLE performs at least as well as the SPE when the number of
GO categories is on the order of 100, even when the ideal number of components
in its underlying mixture model is unknown. However, the MLE is unreliable when the number of GO
categories is small compared to the number of PMM components. Thus, if the
number of categories is on the order of 10, the SPE is a more reliable LFDR
estimator. The NMLE depends not only on the data but also on a specified value
of the prior probability of differential representation. It is therefore an
appropriate LFDR estimator only when the number of GO categories is too small
for application of the other methods.
For enrichment detection, we recommend estimating the LFDR by the MLE given
at least a medium number (~100) of GO categories, by the SPE given a small
number of GO categories (~10), and by the NMLE given a very small number (~1)
of GO categories.
Comment: exhaustive revision of Zhenyu Yang and David R. Bickel, "Minimum
Description Length Measures of Evidence for Enrichment" (December 2010).
COBRA Preprint Series. Article 76. http://biostats.bepress.com/cobra/ps/art7
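The recommendation in the abstract above can be sketched as a simple selection rule. The abstract gives only orders of magnitude (~1, ~10, ~100), so the cutoffs of 3 and 30 below are hypothetical values chosen roughly midway between them on a log scale, and the function name is likewise an assumption.

```python
def choose_lfdr_estimator(n_categories, small_cutoff=3, medium_cutoff=30):
    """Pick an LFDR estimator name from the number of GO categories.

    The cutoffs are illustrative assumptions, not values from the paper:
    the abstract only says ~1 (NMLE), ~10 (SPE), and ~100 or more (MLE).
    """
    if n_categories < 1:
        raise ValueError("n_categories must be at least 1")
    if n_categories <= small_cutoff:
        return "NMLE"  # very small number of categories
    if n_categories <= medium_cutoff:
        return "SPE"   # small number of categories
    return "MLE"       # medium or large number of categories


choices = {n: choose_lfdr_estimator(n) for n in (1, 10, 100)}
```

Because the NMLE also requires a specified prior probability of differential representation, a caller choosing it would need to supply that value separately.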