    Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines

    The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.