
    MDSINE2 Dataset Inference Analysis [Full Inference]

    MDSINE2 Inference Analysis files. This archive contains the output from the inference stage of the dataset (MCMC samples) using ten different seeds. Each ZIP archive has been split due to technical limitations while uploading large files through Zenodo's web API.
    For the full project/source pipeline, refer to https://github.com/gerberlab/MDSINE2_Paper.
    To download the files, we recommend either our in-house tool (https://github.com/gibsonlab/zenodo_download) or Zenodo's endorsed tool (https://github.com/dvolgyes/zenodo_get).
    To unzip a multi-part ZIP file, we recommend using 7-zip. For example, to unpack seed 10 from the command line:
    7z x healthy-seed10.zip
    NOTE: this Zenodo record does not contain replicate-only inference, nor the cross-validation runs used for the manuscript. This is due to file-size limitations; please refer to https://doi.org/10.5281/zenodo.8006853 for those files.
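    The unpacking step described in this record can also be scripted. The following is a minimal Python sketch, not part of the MDSINE2 pipeline: the helper names `build_extract_command` and `extract` and the output directory are hypothetical, and it assumes that the `7z` executable is on the PATH and that all parts of the split archive sit in the same directory. It wraps the same `7z x` command quoted above and adds 7-Zip's `-o` switch to choose an output directory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_extract_command(archive: str, out_dir: str) -> List[str]:
    """Return the 7z command line that extracts `archive` into `out_dir`."""
    # `x` extracts with full paths; `-o<dir>` (no space) sets the output directory.
    return ["7z", "x", str(archive), f"-o{out_dir}"]


def extract(archive: str, out_dir: str) -> None:
    """Run 7z on the given archive; every part of a split archive
    is assumed to be in the same directory as `archive`."""
    if shutil.which("7z") is None:
        raise RuntimeError("7z executable not found on PATH")
    if not Path(archive).is_file():
        raise FileNotFoundError(archive)
    # check=True raises CalledProcessError if 7z reports a failure.
    subprocess.run(build_extract_command(archive, out_dir), check=True)
```

    For example, `extract("healthy-seed10.zip", "healthy-seed10")` would unpack seed 10 into a directory of the same name (the directory name is an illustrative choice).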

    MDSINE2 Dataset Inference Analysis

    MDSINE2 Inference Analysis files. This archive contains:
    1) Replicates for negative binomial fitting.
    2) Cross-Validation inference with comparator analysis included.
    For the full project/source pipeline, refer to https://github.com/gerberlab/MDSINE2_Paper. Each ZIP archive has been split due to technical limitations while uploading large files through Zenodo's web API.
    To download the files, we recommend either our in-house tool (https://github.com/gibsonlab/zenodo_download) or Zenodo's endorsed tool (https://github.com/dvolgyes/zenodo_get).
    To unzip a multi-part ZIP file, we recommend using 7-zip. For example, to unpack the cross-validation MDSINE2 default runs, do the following from the command line:
    7z x mdsine2-modules.zip
    NOTE: this updated Zenodo record does not contain the full inference samples, due to file size limitations. (Version 1 of this upload has invalid tar.xf archives due to an erroneous file transfer.) Please refer to https://doi.org/10.5281/zenodo.8208502 for those files.

    How Many Subpopulations Is Too Many? Exponential Lower Bounds for Inferring Population Histories

    © Copyright 2020, Mary Ann Liebert, Inc., publishers 2020. Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, such as the seminal work of Li and Durbin, attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to correctly reconstruct population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing population structure - the history of multiple subpopulations that merge, split, and change sizes over time. Our lower bounds are exponential in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations. Along the way and of independent interest, we essentially determine the optimal number of samples needed to learn an exponential mixture distribution information-theoretically, proving the upper bound by analyzing natural (and efficient) algorithms for this problem.

    Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes

    Cancers exhibit extensive mutational heterogeneity, and the resulting long-tail phenomenon complicates the discovery of genes and pathways that are significantly mutated in cancer. We perform a pan-cancer analysis of mutated networks in 3,281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a new algorithm to find mutated subnetworks that overcomes the limitations of existing single-gene, pathway and network approaches. We identify 16 significantly mutated subnetworks that comprise well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer, including cohesin, condensin and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, pan-cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types. This work is supported by US National Science Foundation (NSF) grant IIS-1016648 and US National Institutes of Health (NIH) grants R01HG005690, R01HG007069 and R01CA180776 to B.J.R. and by National Human Genome Research Institute (NHGRI) grant U01HG006517 to L.D. B.J.R. is supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund, an Alfred P. Sloan Research Fellowship and an NSF CAREER Award (CCF-1053753). M.D.M.L. is supported by NSF fellowship GRFP DGE 022824.
