4 research outputs found

    Comparison of the <i>p</i>-values of association between somatic mutations and survival time for TCGA glioblastoma (GBM) and ovarian (OV) datasets.

    <p>Each data point represents a gene. (A) Comparison of the R survdiff <i>p</i>-values and the exact permutational <i>p</i>-values for the GBM dataset. (B) Comparison of the R survdiff <i>p</i>-values and the exact permutational <i>p</i>-values for the OV dataset.</p>

    Difference between survival analysis in a clinical setting, with balanced populations, and in a genomics setting, with unbalanced populations.

    <p>(A) In a typical clinical study, two pre-selected groups of similar size are compared. Because the groups are balanced and each has a suitable number of patients, the asymptotic approximation (normal distribution) used in common implementations of the log-rank test gives an accurate approximation of the exact distribution, resulting in accurate <i>p</i>-values. (B) In a genomics study, the two groups are defined by a genetic variant. In many cases, the sizes of the groups are unbalanced, with one group being much larger than the other. In this situation, the asymptotic distribution does not accurately approximate the exact distribution of the log-rank statistic, and the resulting <i>p</i>-values computed from the tail of the distribution (see inset) are inaccurate.</p>
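
The contrast between the balanced and unbalanced settings can be illustrated with a small sketch. The Python code below is neither ExaLT nor R's survdiff. It is a standard-library illustration on simulated data, and the function names (`logrank_z`, `asymptotic_p`, `permutation_p`) are hypothetical. It compares the normal-approximation p-value of the log-rank statistic with a Monte Carlo permutation p-value when one group is much smaller than the other.

```python
import math
import random


def logrank_z(times, events, group):
    """Standardized two-group log-rank statistic (group labels are 0/1)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(times)          # patients currently at risk (both groups)
    n1 = sum(group)         # patients currently at risk in group 1
    o_minus_e = 0.0
    var = 0.0
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = d1 = leaving = leaving1 = 0
        j = i
        # Gather everyone whose time equals t (events and censorings).
        while j < len(order) and times[order[j]] == t:
            k = order[j]
            leaving += 1
            leaving1 += group[k]
            if events[k]:
                d += 1
                d1 += group[k]
            j += 1
        if d > 0 and n > 1:
            frac = n1 / n
            o_minus_e += d1 - d * frac                       # observed - expected
            var += d * frac * (1 - frac) * (n - d) / (n - 1)  # hypergeometric variance
        n -= leaving
        n1 -= leaving1
        i = j
    return o_minus_e / math.sqrt(var) if var > 0 else 0.0


def asymptotic_p(z):
    """Two-sided p-value from the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


def permutation_p(times, events, group, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value (an estimate, not an exact test)."""
    rng = random.Random(seed)
    z_obs = abs(logrank_z(times, events, group))
    labels = list(group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(logrank_z(times, events, labels)) >= z_obs - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction avoids p = 0


if __name__ == "__main__":
    # Assumed toy data: 200 patients, only 5 carry the variant, no true effect.
    random.seed(1)
    group = [1] * 5 + [0] * 195
    times = [random.expovariate(1.0) for _ in group]
    events = [1] * len(group)          # every death observed, no censoring
    z = logrank_z(times, events, group)
    print("asymptotic p :", asymptotic_p(z))
    print("permutation p:", permutation_p(times, events, group))
```

A Monte Carlo estimate of this kind cannot resolve p-values much smaller than 1 / `n_perm`, which is one reason a genome-wide analysis needs an exact or otherwise more accurate computation.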

    Differences between observed and expected <i>p</i>-values from different forms of the log-rank test on a randomized cancer dataset consisting of somatic mutations in 6184 genes.

    <p>The <i>p</i>-values for the genes should be distributed uniformly (green line), since there is no association between mutations and survival in this random data. Asymptotic approximations of the log-rank statistic (purple and blue) yield <i>p</i>-values that deviate significantly from the uniform distribution, incorrectly reporting many genes whose mutations are significantly associated with survival. In particular, the asymptotic log-rank test in R reports 110 genes with significant association, using a Bonferroni-corrected <i>p</i>-value < 0.05 (black line), or 291 genes with significant association using a less conservative FDR = 0.05. In contrast, the exact test makes no false discoveries.</p>
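
The two significance criteria mentioned in this caption can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names are hypothetical, the FDR control is assumed to be the Benjamini-Hochberg procedure, and the p-values are simulated uniform draws standing in for a well-calibrated test on random data. Only the gene count, 6184, comes from the caption.

```python
import random


def bonferroni_count(pvals, alpha=0.05):
    """Number of tests whose Bonferroni-corrected p-value is below alpha."""
    m = len(pvals)
    return sum(p < alpha / m for p in pvals)


def benjamini_hochberg_count(pvals, q=0.05):
    """Number of rejections made by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    k_max = 0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k * q / m:      # largest rank k with p_(k) <= k * q / m
            k_max = k
    return k_max


if __name__ == "__main__":
    random.seed(0)
    m = 6184                                    # genes in the randomized dataset
    null_pvals = [random.random() for _ in range(m)]   # assumed calibrated null
    print("Bonferroni discoveries:", bonferroni_count(null_pvals))
    print("BH (FDR 0.05) discoveries:", benjamini_hochberg_count(null_pvals))
```

With calibrated p-values both counts should be at or near zero on random data, so the 110 and 291 discoveries reported for the asymptotic test point to miscalibrated p-values rather than real signal.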

    Accurate Computation of Survival Statistics in Genome-Wide Studies

    <div><p>A key challenge in genomics is to identify genetic variants that distinguish patients with different <i>survival time</i> following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. There are two reasons for this: the two populations determined by a genetic variant may have very different sizes, and the evaluation of many possible variants demands highly accurate computation of very small <i>p</i>-values. We demonstrate this problem for cancer genomics data where the standard log-rank test leads to many false positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the <i>p</i>-value of the log-rank statistic under an exact distribution that is appropriate for populations of any size. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported <i>p</i>-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens to hundreds of likely false positive associations as more significant than these known associations.</p></div>