26 research outputs found

    EAMA: Empirically adjusted meta-analysis for large-scale simultaneous hypothesis testing in genomic experiments

    No full text
    Recent developments in high-throughput genomic assays have made it possible to test hundreds or thousands of genes simultaneously. However, adhering to the usual statistical assumptions about the null distributions of test statistics in such large-scale multiple testing frameworks can lead to incorrect significance testing results and biased inference. The problem worsens when one combines results from different independent genomic experiments, which can produce gross false discoveries of significant genes. In this article, we develop a meta-analysis method for combining p-values from different independent experiments involving large-scale multiple testing frameworks, through empirical adjustments of the individual test statistics and p-values. Even though it is based on various existing ideas, this specific combination is novel and potentially useful. Through simulation studies and real genomic datasets, we show that our method outperforms the standard meta-analysis approach to significance testing in accurately identifying the truly significant set of genes.
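The idea in the abstract above (adjust each experiment's test statistics against an empirical null, then combine the adjusted p-values across experiments) can be illustrated with a small sketch. This is not the authors' EAMA implementation: the median/MAD estimate of the null and the use of Fisher's method for the combination step are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
import statistics


def empirical_null(z_values):
    """Estimate the null centre and scale from the bulk of the z-values.

    Assumption: most genes are null, so the median and the scaled median
    absolute deviation (MAD) approximate the null mean and standard deviation.
    """
    center = statistics.median(z_values)
    mad = statistics.median([abs(z - center) for z in z_values])
    scale = 1.4826 * mad  # makes the MAD consistent with a normal SD
    return center, scale


def adjusted_p_values(z_values):
    """Two-sided p-values under N(center, scale^2) instead of N(0, 1)."""
    center, scale = empirical_null(z_values)
    return [math.erfc(abs(z - center) / (scale * math.sqrt(2.0)))
            for z in z_values]


def fisher_combine(p_values):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under the null.

    For even degrees of freedom the chi-square survival function has a
    closed form, so no external library is needed.
    """
    k = len(p_values)
    half = -sum(math.log(max(p, 1e-300)) for p in p_values)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def meta_analysis(z_by_experiment):
    """Combine per-gene evidence across experiments.

    z_by_experiment: one list of z-values per experiment, genes in the
    same order in every list.
    """
    adjusted = [adjusted_p_values(z) for z in z_by_experiment]
    n_genes = len(z_by_experiment[0])
    return [fisher_combine([p[g] for p in adjusted]) for g in range(n_genes)]
```

Calling `meta_analysis` with one list of z-values per experiment returns one combined p-value per gene. A naive analysis would skip `empirical_null` and compute the p-values against the theoretical N(0, 1) null.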

    Temporal prediction of future state occupation in a multistate model from high-dimensional baseline covariates via pseudo-value regression

    No full text
    In many complex diseases such as cancer, a patient passes through various disease stages before reaching a terminal state (say, disease-free or death). This fits a multistate model framework in which a prognosis may be equivalent to predicting the state occupation at a future time t. With the advent of high-throughput genomic and proteomic assays, a clinician may intend to use such high-dimensional covariates to better predict state occupation. In this article, we offer a practical solution to this problem by combining a useful technique, called pseudo-value (PV) regression, with a latent factor or a penalized regression method such as partial least squares (PLS) or the least absolute shrinkage and selection operator (LASSO), or their variants. We explore the predictive performance of these combinations in various high-dimensional settings via extensive simulation studies. Overall, this strategy works fairly well provided the models are tuned properly. PLS turns out to be slightly better than LASSO in most of the settings we investigated for the purpose of temporal prediction of future state occupation. We illustrate the utility of these PV-based high-dimensional regression methods using a lung cancer data set in which we use the patients' baseline gene expression values.
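To make the pseudo-value step in the abstract above concrete, here is a minimal sketch for the simplest case, a two-state (alive/dead) model, where the probability of occupying the "alive" state at time t is the survival probability. It is an illustration under that simplifying assumption, not the authors' code: the Kaplan-Meier estimator stands in for a general multistate estimator, and the function names are hypothetical.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of the survival probability S(t).

    times:  observed follow-up times
    events: 1 if the event was observed at that time, 0 if censored
    """
    surv = 1.0
    event_times = sorted({ti for ti, ei in zip(times, events)
                          if ei == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events)
                     if ti == u and ei == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(times, events, t):
    """Jackknife pseudo-values: n * theta_hat - (n - 1) * theta_hat_without_i."""
    n = len(times)
    full = kaplan_meier(times, events, t)
    values = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]    # leave subject i out
        loo_events = events[:i] + events[i + 1:]
        values.append(n * full - (n - 1) * kaplan_meier(loo_times, loo_events, t))
    return values
```

Each subject's pseudo-value then serves as the response in a regression on that subject's baseline covariates, which is the point at which a method such as PLS or LASSO would be applied. Without censoring, the pseudo-values reduce to the 0/1 indicators of being in the state at time t.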

    The number of patients in each of the two lung cancer types within each dataset.

    No full text

    Histogram of the original z-values along with the empirical null distribution.

    No full text

    The violin plots of the gene with ID 472 for the two cancer types in each of the five datasets.

    No full text

    Performance assessment with 10 experiments, 1000 uncorrelated genes and absolute differences in differential expressions as 8.

    No full text

    The performances of EAMA and that of the naïve method using the simulated count datasets.

    No full text

    Performance assessment of the two methods where a hidden variable does not act as a confounder.

    No full text

    Incorporation of biological knowledge into distance for clustering genes-0

    No full text
    Copyright information: Taken from "Incorporation of biological knowledge into distance for clustering genes". Bioinformation 2007;1(10):396-405. Published online 10 Apr 2007. PMCID: PMC1896054.
    …stered with the UPGMA method with (triangles) and without (circles) functional informatio…

    Incorporation of biological knowledge into distance for clustering genes-1

    No full text
    Copyright information: Taken from "Incorporation of biological knowledge into distance for clustering genes". Bioinformation 2007;1(10):396-405. Published online 10 Apr 2007. PMCID: PMC1896054.
    …stered with DIANA with (triangles) and without (circles) functional informatio…