63 research outputs found

    A direct approach to estimating false discovery rates conditional on covariates

    Modern scientific studies from many diverse areas of research abound with multiple hypothesis testing concerns. The false discovery rate (FDR) is one of the most commonly used approaches for measuring and controlling error rates when performing multiple tests. Adaptive FDRs rely on an estimate of the proportion of null hypotheses among all the hypotheses being tested. This proportion is typically estimated once for each collection of hypotheses. Here, we propose a regression framework to estimate the proportion of null hypotheses conditional on observed covariates. This may then be used as a multiplication factor with the Benjamini-Hochberg adjusted p-values, leading to a plug-in FDR estimator. We apply our method to a genome-wide association meta-analysis for body mass index. In our framework, we are able to use the sample sizes for the individual genomic loci and the minor allele frequencies as covariates. We further evaluate our approach via a number of simulation scenarios. We provide an implementation of this novel method for estimating the proportion of null hypotheses in a regression framework as part of the Bioconductor package swfdr.
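
The plug-in estimator described in this abstract can be illustrated with a minimal Python sketch. It is not the swfdr implementation. The threshold `lam`, the use of ordinary least squares on the indicator `p > lam`, and all function names are assumptions made for illustration; the abstract itself only states that the null proportion is estimated by regression on covariates and then multiplied with the Benjamini-Hochberg adjusted p-values.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def estimate_pi0_given_covariates(pvalues, covariates, lam=0.8):
    """Assumed estimator: regress 1(p > lam) on covariates, rescale by 1 - lam."""
    p = np.asarray(pvalues, dtype=float)
    x = np.asarray(covariates, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    design = np.column_stack([np.ones(p.size), x])  # intercept + covariates
    y = (p > lam).astype(float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    pi0 = design @ coef / (1.0 - lam)
    return np.clip(pi0, 0.0, 1.0)  # a proportion must lie in [0, 1]


def plug_in_fdr(pvalues, covariates, lam=0.8):
    """Covariate-specific null proportion times the BH-adjusted p-value."""
    pi0 = estimate_pi0_given_covariates(pvalues, covariates, lam)
    return pi0 * bh_adjust(pvalues)
```

Because the estimated null proportion is clipped to [0, 1], the plug-in value in this sketch never exceeds the corresponding Benjamini-Hochberg adjusted p-value.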

    A DECISION-THEORY APPROACH TO INTERPRETABLE SET ANALYSIS FOR HIGH-DIMENSIONAL DATA

    A ubiquitous problem in high-dimensional analysis is the identification of pre-defined sets that are enriched for features showing an association of interest. In this situation, inference is performed on sets, not individual features. We propose an approach which focuses on estimating the fraction of non-null features in a set. We search for unions of disjoint sets (atoms), using as the loss function a weighted average of the number of false and missed discoveries. We prove that the solution is equivalent to thresholding the atomic false discovery rate and that our approach results in a more interpretable set analysis.
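
The link between the weighted loss and thresholding can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the authors' procedure: the fraction of non-null features in each atom is treated as known, the loss is taken to be `false_weight * false discoveries + (1 - false_weight) * missed discoveries`, and the function and parameter names are hypothetical.

```python
def select_atoms_by_loss(atom_sizes, nonnull_fractions, false_weight=0.5):
    """Select each atom whose inclusion has lower expected loss than exclusion."""
    selected = []
    for idx, (n, theta) in enumerate(zip(atom_sizes, nonnull_fractions)):
        atomic_fdr = 1.0 - theta  # expected share of null features in the atom
        cost_include = false_weight * n * atomic_fdr       # false discoveries
        cost_exclude = (1.0 - false_weight) * n * theta    # missed discoveries
        if cost_include < cost_exclude:
            selected.append(idx)
    return selected


def select_atoms_by_threshold(nonnull_fractions, false_weight=0.5):
    """Same selection written as a threshold on the atomic FDR."""
    threshold = 1.0 - false_weight
    return [
        idx
        for idx, theta in enumerate(nonnull_fractions)
        if (1.0 - theta) < threshold
    ]
```

Under these assumptions the atom size cancels from the comparison, so including an atom is preferable exactly when its atomic FDR falls below `1 - false_weight`, and the two functions return the same atoms.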

    Patient-oriented gene set analysis for cancer mutation data

    Recent research has revealed complex heterogeneous genomic landscapes in human cancers. However, mutations tend to occur within a core group of pathways and biological processes that can be grouped into gene sets. To better understand the significance of these pathways, we have developed an approach that initially scores each gene set at the patient rather than the gene level. In mutation analysis, these patient-oriented methods are more transparent, interpretable, and statistically powerful than traditional gene-oriented methods.
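
The contrast between patient-level and gene-level scoring can be shown with a toy Python sketch. The abstract does not specify the scoring rules, so both functions below are hypothetical stand-ins: the patient-level score is taken to be the fraction of patients with at least one mutated gene in the set, and the gene-level score is the average per-gene mutation frequency. The patient identifiers are made up for illustration.

```python
def patient_level_score(mutations_by_patient, gene_set):
    """Fraction of patients with at least one mutated gene in the set."""
    if not mutations_by_patient:
        raise ValueError("at least one patient is required")
    genes = set(gene_set)
    hits = sum(1 for mutated in mutations_by_patient.values() if genes & set(mutated))
    return hits / len(mutations_by_patient)


def gene_level_score(mutations_by_patient, gene_set):
    """Average, over genes in the set, of the fraction of patients mutated."""
    if not mutations_by_patient or not gene_set:
        raise ValueError("patients and gene set must be non-empty")
    n_patients = len(mutations_by_patient)
    frequencies = [
        sum(1 for mutated in mutations_by_patient.values() if gene in mutated) / n_patients
        for gene in gene_set
    ]
    return sum(frequencies) / len(frequencies)


# Toy cohort: two different genes of the same set are hit in different patients.
cohort = {
    "P1": {"KRAS"},
    "P2": {"BRAF"},
    "P3": {"TP53"},
    "P4": set(),
}
pathway = ["KRAS", "BRAF"]
```

In this toy cohort each gene is mutated in only one of four patients, yet half of the patients carry a mutation somewhere in the set, which is the kind of signal a patient-level score is meant to surface.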

    STATISTICAL METHODS FOR THE ANALYSIS OF CANCER GENOME SEQUENCING DATA

    The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of data generated in these studies. We place special emphasis on a two-stage study design introduced by Sjoblom et al. [1]. In this context, we describe statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and to assess the statistical significance of the candidates thus identified.

    Quantification and expert evaluation of evidence for chemopredictive biomarkers to personalize cancer treatment.

    Predictive biomarkers have the potential to facilitate cancer precision medicine by guiding the optimal choice of therapies for patients. However, clinicians are faced with an enormous volume of often-contradictory evidence regarding the therapeutic context of chemopredictive biomarkers. We extensively surveyed public literature to systematically review the predictive effect of 7 biomarkers claimed to predict response to various chemotherapy drugs: ERCC1-platinums, RRM1-gemcitabine, TYMS-5-fluorouracil/Capecitabine, TUBB3-taxanes, MGMT-temozolomide, TOP1-irinotecan/topotecan, and TOP2A-anthracyclines. We focused on studies that investigated changes in gene or protein expression as predictors of drug sensitivity or resistance. We considered an evidence framework that ranked studies from high level I evidence for randomized controlled trials to low level IV evidence for pre-clinical studies and patient case studies. We found that further in-depth analysis will be required to explore methodological issues, inconsistencies between studies, and tumor-specific effects present even within high evidence level studies. Some of these nuances will lend themselves to automation; others will require manual curation. However, the comprehensive cataloging and analysis of dispersed public data utilizing an evidence framework provides a high-level perspective on clinical actionability of these protein biomarkers. This framework and perspective will ultimately facilitate clinical trial design as well as therapeutic decision-making for individual patients.

    An Open-Publishing Response to the COVID-19 Infodemic

    The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis combined with the need for social distancing measures present unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.