
    Empirical Bayes estimation of posterior probabilities of enrichment

    To interpret differentially expressed genes or other discovered features, researchers conduct hypothesis tests to determine which biological categories, such as those of the Gene Ontology (GO), are enriched in the sense of having differential representation among the discovered features. We study the application of improved estimators of the local false discovery rate (LFDR), the probability that a biological category has equivalent representation among the preselected features. We identified three promising estimators of the LFDR for detecting differential representation: a semiparametric estimator (SPE), a normalized maximum likelihood estimator (NMLE), and a maximum likelihood estimator (MLE). We found that the MLE performs at least as well as the SPE for on the order of 100 GO categories, even when the ideal number of components in its underlying mixture model (PMM) is unknown. However, the MLE is unreliable when the number of GO categories is small compared to the number of PMM components. Thus, if the number of categories is on the order of 10, the SPE is a more reliable LFDR estimator. The NMLE depends not only on the data but also on a specified value of the prior probability of differential representation. It is therefore an appropriate LFDR estimator only when the number of GO categories is too small for application of the other methods. For enrichment detection, we recommend estimating the LFDR by the MLE given at least a medium number (~100) of GO categories, by the SPE given a small number (~10) of GO categories, and by the NMLE given a very small number (~1) of GO categories.
    Comment: exhaustive revision of Zhenyu Yang and David R. Bickel, "Minimum Description Length Measures of Evidence for Enrichment" (December 2010). COBRA Preprint Series. Article 76. http://biostats.bepress.com/cobra/ps/art7
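    As a rough illustration of MLE-based LFDR estimation, the sketch below fits a generic two-group mixture by expectation-maximization and returns, for each category's test statistic, the estimated probability of equivalent (non-differential) representation. It is only an assumed, simplified stand-in for the methods compared in the abstract: the z-statistics, the single Gaussian alternative component, and the function name lfdr_mle are illustrative choices, not the paper's PMM-based estimators.

```python
# Hedged sketch: generic two-group mixture MLE for the local false discovery
# rate (LFDR), not the paper's specific estimator. Each GO category is assumed
# to yield a z-statistic; the null component is N(0, 1) and a single
# alternative component N(mu, sigma^2) is fit by EM.
import numpy as np
from scipy.stats import norm

def lfdr_mle(z, n_iter=200):
    """Estimate the LFDR of each z under a two-component Gaussian mixture."""
    pi0, mu, sigma = 0.8, np.mean(z), max(np.std(z), 1.0)  # crude starting values
    for _ in range(n_iter):
        f0 = norm.pdf(z, 0.0, 1.0)                      # theoretical null density
        f1 = norm.pdf(z, mu, sigma)                     # alternative density
        lfdr = pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)   # E-step: P(null | z)
        w = 1.0 - lfdr                                  # weight on the alternative
        pi0 = np.mean(lfdr)                             # M-step updates
        mu = np.sum(w * z) / np.sum(w)
        sigma = max(np.sqrt(np.sum(w * (z - mu) ** 2) / np.sum(w)), 1e-3)
    return lfdr

# Example: 100 categories, the last 10 truly enriched
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 90), rng.normal(3, 1, 10)])
print(np.round(lfdr_mle(z)[-10:], 3))  # small LFDRs flag likely enrichment
```

    Categories with small estimated LFDRs would then be reported as enriched; the abstract's comparison concerns how reliably such estimates behave as the number of categories shrinks.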

    A prior-free framework of coherent inference and its derivation of simple shrinkage estimators

    The reasoning behind uses of confidence intervals and p-values in scientific practice may be made coherent by modeling the inferring statistician or scientist as an idealized intelligent agent. Other things being equal, such an agent regards a hypothesis coinciding with a confidence interval of a higher confidence level as more certain than a hypothesis coinciding with a confidence interval of a lower confidence level. The agent uses different methods of constructing confidence intervals depending on what information is available. The coherence requirement means that all levels of certainty of hypotheses about the parameter agree with a single distribution of certainty over the parameter space. The result is a unique and coherent fiducial distribution that encodes the post-data certainty levels of the agent. While many coherent fiducial distributions coincide with confidence distributions or Bayesian posterior distributions, there is a general class of coherent fiducial distributions that equates the two-sided p-value with the probability that the null hypothesis is true. The use of that class leads to point estimators and interval estimators that can be derived neither from the dominant frequentist theory nor from Bayesian theories that rule out data-dependent priors. These simple estimators shrink toward the parameter value of the null hypothesis without relying on asymptotics or on prior distributions.
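    To make the shrinkage concrete, here is a hedged sketch under one plausible reading of the abstract: if the fiducial distribution assigns probability equal to the two-sided p-value to the null parameter value and the remaining probability to the point estimate, the resulting estimate is a p-value-weighted average pulled toward the null. The known-variance normal setting and the name shrinkage_estimate are assumptions for illustration only, not the paper's exact estimator.

```python
# Hedged sketch of a p-value-weighted shrinkage estimator: weak evidence
# against the null (large p) pulls the estimate strongly toward theta0.
import numpy as np
from scipy.stats import norm

def shrinkage_estimate(x, theta0=0.0, sigma=1.0):
    """Shrink the sample mean toward theta0 by the two-sided p-value weight."""
    n = len(x)
    xbar = np.mean(x)
    z = (xbar - theta0) / (sigma / np.sqrt(n))
    p = 2 * norm.sf(abs(z))             # two-sided p-value for H0: theta = theta0
    return p * theta0 + (1 - p) * xbar  # null value weighted by its fiducial probability

rng = np.random.default_rng(1)
x = rng.normal(0.3, 1.0, size=20)       # true mean close to the null value 0
print(shrinkage_estimate(x))            # pulled toward 0 when the evidence is weak
```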