29 research outputs found

    Testing a Large Number of Composite Null Hypotheses Using Conditionally Symmetric Multidimensional Gaussian Mixtures in Genome-Wide Studies

    Causal mediation analysis, pleiotropy analysis, and replication analysis are three highly popular genetic study designs. Although these analyses address different scientific questions, the underlying inference problems all involve large-scale testing of composite null hypotheses. The goal is to determine whether all null hypotheses - as opposed to at least one - in a set of individual tests should simultaneously be rejected. Various recent methods have been proposed for these settings, and an appealing empirical Bayes strategy is to apply the popular two-group model, calculating a local false discovery rate (lfdr) for each set of hypotheses. However, such a strategy is difficult to implement because it requires multivariate density estimation. Furthermore, the multiple testing rules for the empirical Bayes lfdr approach and conventional frequentist z-statistics can disagree, which is troubling for a field that ubiquitously relies on summary statistics. This work proposes a framework to unify two-group testing in genetic association composite null settings: the conditionally symmetric multidimensional Gaussian mixture model (csmGmm). The csmGmm is shown to demonstrate more robust operating characteristics than recently proposed alternatives. Crucially, the csmGmm also offers strong interpretability guarantees by harmonizing the lfdr and z-statistic testing rules. We extend the base csmGmm to cover each of the mediation, pleiotropy, and replication settings, and we prove that the lfdr and z-statistic agreement holds in each situation. We apply the model to a collection of translational lung cancer genetic association studies that motivated this work.
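    The two-group model underlying this abstract can be illustrated with a minimal univariate sketch. This is not the csmGmm itself; the mixture weight and the alternative-component scale below are arbitrary illustrative assumptions:

    ```python
    import math

    def lfdr(z, pi0=0.9, alt_sd=3.0):
        """Local false discovery rate under a simple two-group model:
        z ~ pi0 * N(0, 1) + (1 - pi0) * N(0, alt_sd^2).
        Returns the posterior probability that z came from the null component.
        """
        def phi(x, sd=1.0):
            # univariate normal density with mean 0 and standard deviation sd
            return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

        null = pi0 * phi(z)
        alt = (1 - pi0) * phi(z, alt_sd)
        return null / (null + alt)
    ```

    Under this sketch, lfdr is near 1 for z-statistics close to 0 and shrinks as |z| grows; composite-null methods like the csmGmm extend this idea to vectors of z-statistics, where the difficulty of multivariate density estimation arises.
    
    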

    Sample size calculation for randomized selection trials with a time‐to‐event endpoint and a margin of practical equivalence

    Selection trials are used to compare potentially active experimental treatments without a control arm. While sample size calculation methods exist for binary endpoints, no such methods are available for time-to-event endpoints, even though these are ubiquitous in clinical trials. Recent selection trials have begun using progression-free survival as their primary endpoint, but have dichotomized it at a specific time point for sample size calculation and analysis. This changes the clinical question and may reduce power to detect a difference between the arms. In this article, we develop the theory for sample size calculation in selection trials where the time-to-event endpoint is assumed to follow an exponential or Weibull distribution. We provide a free web application for sample size calculation, as well as an R package, that researchers can use in the design of their studies.
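    The operating characteristic such a design targets - the probability of correctly selecting the better arm - can be checked by Monte Carlo simulation even without the article's closed-form theory. The sketch below is a generic illustration, not the article's method; the arm medians, per-arm sample size, and mean-based selection rule are illustrative assumptions:

    ```python
    import math
    import random
    import statistics

    def correct_selection_prob(n_per_arm=40, median_a=6.0, median_b=9.0,
                               n_sims=2000, seed=1):
        """Monte Carlo estimate of P(select the better arm) in a two-arm
        selection trial with exponential time-to-event endpoints and no
        censoring. Selection rule: pick the arm with the larger observed
        mean event time. Arm B (median_b > median_a) is truly better."""
        rng = random.Random(seed)
        rate_a = math.log(2) / median_a  # exponential rate = ln(2) / median
        rate_b = math.log(2) / median_b
        correct = 0
        for _ in range(n_sims):
            a = [rng.expovariate(rate_a) for _ in range(n_per_arm)]
            b = [rng.expovariate(rate_b) for _ in range(n_per_arm)]
            if statistics.mean(b) > statistics.mean(a):
                correct += 1
        return correct / n_sims
    ```

    A sample size calculation inverts this relationship: the smallest n_per_arm is sought such that the correct selection probability exceeds a prespecified threshold (e.g., 90%) at the margin of practical equivalence.
    
    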

    Fitting Gaussian mixture models on incomplete data

    Background: Bioinformatics investigators often gain insights by combining information across multiple and disparate data sets. Merging data from multiple sources frequently results in data sets that are incomplete or contain missing values. Although missing data are ubiquitous, existing implementations of Gaussian mixture models (GMMs) either cannot accommodate missing data, or do so by imposing simplifying assumptions that limit the applicability of the model. In the presence of missing data, a standard ad hoc practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates.
    Results: Here we present missingness-aware Gaussian mixture models (MGMM), an R package for fitting GMMs in the presence of missing data. Unlike existing GMM implementations that can accommodate missing data, MGMM places no restrictions on the form of the covariance matrix. Using three case studies on real and simulated ’omics data sets, we demonstrate that, when the underlying data distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than either the existing GMM implementations that accommodate missing data or fitting a standard GMM after state-of-the-art imputation. Moreover, MGMM provides an accurate assessment of cluster assignment uncertainty, even when the generative distribution is not a GMM.
    Conclusion: Compared to state-of-the-art competitors, MGMM demonstrates a better ability to recover the true cluster assignments for a wide variety of data sets and a large range of missingness rates. MGMM provides the bioinformatics community with a powerful, easy-to-use, and statistically sound tool for performing clustering and density estimation in the presence of missing data. MGMM is publicly available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM
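    The core idea - marginalizing a Gaussian over missing coordinates rather than dropping or imputing the observation - can be sketched for the E-step of EM. This is a generic illustration, not MGMM's implementation; it restricts attention to diagonal covariances purely to keep the sketch short (MGMM itself places no such restriction), and uses None to mark a missing coordinate:

    ```python
    import math

    def responsibilities(x, weights, means, variances):
        """E-step cluster responsibilities for one observation x, where
        None marks a missing coordinate. With a diagonal covariance,
        marginalizing over a missing dimension simply drops its factor
        from the product of univariate normal densities."""
        def phi(v, mu, var):
            return math.exp(-0.5 * (v - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

        dens = []
        for w, mu, var in zip(weights, means, variances):
            p = w
            for xi, mi, vi in zip(x, mu, var):
                if xi is not None:  # skip (marginalize out) missing dims
                    p *= phi(xi, mi, vi)
            dens.append(p)
        total = sum(dens)
        return [d / total for d in dens]
    ```

    For example, an observation [0.1, None] under two components centered at (0, 0) and (5, 5) is assigned almost entirely to the first component, using only the observed coordinate; complete case analysis would have discarded the observation entirely.
    
    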

    MGMM: an R package for fitting Gaussian Mixture Models on Incomplete Genomics Data

    Motivation: Although missing data are prevalent in genetic and genomics data sets, existing implementations of Gaussian mixture models (GMMs) require complete data. Standard practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. Results: Here we present MGMM, an R package for fitting GMMs in the presence of missing data. Using three case studies on real and simulated data sets, we demonstrate that MGMM is more effective at recovering true cluster assignments than a standard GMM following state-of-the-art imputation.