16 research outputs found

    Efficient sampling for Bayesian inference of conjunctive Bayesian networks

    Motivation: Cancer development is driven by the accumulation of advantageous mutations and subsequent clonal expansion of cells harbouring these mutations, but the order in which mutations occur remains poorly understood. Advances in genome sequencing and the soon-arriving flood of cancer genome data produced by large cancer sequencing consortia hold the promise to elucidate cancer progression. However, new computational methods are needed to analyse these large datasets. Results: We present a Bayesian inference scheme for Conjunctive Bayesian Networks, a probabilistic graphical model in which mutations accumulate according to partial order constraints and cancer genotypes are observed subject to measurement noise. We develop an efficient MCMC sampling scheme specifically designed to overcome local optima induced by dependency structures. We demonstrate the performance advantage of our sampler over traditional approaches on simulated data and show the advantages of adopting a Bayesian perspective when reanalyzing cancer datasets and comparing our results to previous maximum-likelihood-based approaches. Availability: An R package including the sampler and examples is available at http://www.cbg.ethz.ch/software/bayes-cbn.

    BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies.

    Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways. Using a full Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. We validate our approach in the controlled setting of a simulation study and compare it against several competing methods. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm. KY and FM would like to acknowledge the support of the University of Cambridge, Cancer Research UK and Hutchison Whampoa Limited. This is the final published version. It first appeared at http://genomebiology.com/2015/16/1/36

    Whole genome phylogenies reflect the distributions of recombination rates for many bacterial species

    Although recombination is accepted to be common in bacteria, for many species robust phylogenies with well-resolved branches can be reconstructed from whole genome alignments of strains, and these are generally interpreted to reflect clonal relationships. Using new methods based on the statistics of single-nucleotide polymorphism (SNP) splits, we show that this interpretation is incorrect. For many species, each locus has recombined many times along its line of descent, and instead of many loci supporting a common phylogeny, the phylogeny changes many thousands of times along the genome alignment. Analysis of the patterns of allele sharing among strains shows that bacterial populations cannot be approximated as either clonal or freely recombining but are structured such that recombination rates between lineages vary over several orders of magnitude, with a unique pattern of rates for each lineage. Thus, rather than reflecting clonal ancestry, whole genome phylogenies reflect distributions of recombination rates. ISSN:2050-084

    Predicting cancer type from tumour DNA signatures

    Background: Establishing the cancer type and site of origin is important in determining the most appropriate course of treatment for cancer patients. Patients with cancer of unknown primary, where the site of origin cannot be established from an examination of the metastatic cancer cells, typically have poor survival. Here, we evaluate the potential and limitations of utilising gene alteration data from tumour DNA to identify cancer types. Methods: Using sequenced tumour DNA downloaded via the cBioPortal for Cancer Genomics, we collected the presence or absence of calls for gene alterations for 6640 tumour samples spanning 28 cancer types, as predictive features. We employed three machine-learning techniques, namely linear support vector machines with recursive feature selection, L1-regularised logistic regression and random forest, to select a small subset of gene alterations that are most informative for cancer-type prediction. We then evaluated the predictive performance of the models in a comparative manner. Results: We found the linear support vector machine to be the most predictive model of cancer type from gene alterations. Using only 100 somatic point-mutated genes for prediction, we achieved an overall accuracy of 49.4 ± 0.4% (95% confidence interval). We observed a marked increase in the accuracy when copy number alterations are included as predictors. With a combination of somatic point mutations and copy number alterations, a mere 50 genes are enough to yield an overall accuracy of 77.7 ± 0.3%. Conclusions: A general cancer diagnostic tool that utilises either only somatic point mutations or only copy number alterations is not sufficient for distinguishing a broad range of cancer types. The combination of both gene alteration types can dramatically improve the performance. ISSN:1756-994

    Performance of mutation timing and frequency-based gene ranking in simulation studies on three fixed networks.

    The area under the ROC curve (AUC) values shown were computed from 100 simulations for each setting. The AUCs for mutation timing ranking and frequency-based ranking are always shown next to each other, with the left one (white) being the mutation timing classifier and the right one (grey) being the frequency-based classifier. (A, B and C) For networks A, B and C, respectively, the sample size was varied, and the error rate was set to 0.01. (D, E and F) For networks A, B and C, respectively, the error rate was varied, and the sample size was set to 500. (G, H and I) For networks A, B and C, respectively, the variation of the passenger rates (corresponding to the independent nodes) was varied, while the error rate was constant at 0.01, and the sample size was 500.
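    As a rough illustration of the metric reported in this caption, the AUC of a gene ranking can be computed as the probability that a randomly chosen dependent node scores higher than a randomly chosen independent node, with ties counted as one half. This is a generic sketch, not the study's code: the function name `auc` and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: ranking scores, higher meaning "more likely a dependent node".
    labels: truthy for dependent (positive) nodes, falsy for independent ones.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy ranking: two dependent and two independent nodes.
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
mixed = auc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0])
```

    An AUC of 1 means every dependent node outranks every independent one, while 0.5 corresponds to a ranking no better than chance.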

    Identification of Constrained Cancer Driver Genes Based on Mutation Timing

    Cancer drivers are genomic alterations that provide cells containing them with a selective advantage over their local competitors, whereas neutral passengers do not change the somatic fitness of cells. Cancer-driving mutations are usually discriminated from passenger mutations by their higher degree of recurrence in tumor samples. However, there is increasing evidence that many additional driver mutations may exist that occur at very low frequencies among tumors. This observation has prompted alternative methods for driver detection, including finding groups of mutually exclusive mutations and incorporating prior biological knowledge about gene function or network structure. Dependencies among drivers due to epistatic interactions can also result in low mutation frequencies, but this effect has been ignored in driver detection so far. Here, we present a new computational approach for identifying genomic alterations that occur at low frequencies because they depend on other events. Unlike passengers, these constrained mutations display punctuated patterns of occurrence in time. We test this driver–passenger discrimination approach based on mutation timing in extensive simulation studies, and we apply it to cross-sectional copy number alteration (CNA) data from ovarian cancer, CNA and single-nucleotide variant (SNV) data from breast tumors and SNV data from colorectal cancer. Among the top ranked predicted drivers, we find low-frequency genes that have already been shown to be involved in carcinogenesis, as well as many new candidate drivers. The mutation timing approach is orthogonal and complementary to existing driver prediction methods. It will help identify, from cancer genome data, the alterations that drive tumor progression.

    The three mutation dependency networks used for simulation.

    Nodes represent genes (or other mutatable entities, for example, pathways) and arrows represent mutational dependencies. Evolutionary rate parameters (parameterizing exponential waiting time distributions for modeling the time until an event happens) for each mutation are given next to the respective node. The evolutionary rate parameters for the independent nodes were drawn from various distributions in the different simulation settings. The mutation processes are stopped at the time of observation after an exponentially distributed waiting time with rate parameter 1 for all simulations. (A) Dependency network used for simulation with ten dependent nodes. (B) Dependency network used for simulation with 11 dependent nodes. (C) Linear dependency network with ten dependent nodes.
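    The simulation setup described in this caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes that the waiting time of a dependent event starts once all of its parents have occurred, and the four-node network, its rates and the name `simulate_genotype` are hypothetical. Only the exponential waiting times and the observation time with rate parameter 1 come from the caption.

```python
import random


def simulate_genotype(parents, rates, rng, obs_rate=1.0):
    """Return the set of events that occurred before the observation time.

    parents: dict mapping each event to the list of events it depends on,
             inserted in topological order (parents before children).
    rates:   dict mapping each event to its exponential rate parameter.
    """
    # The process is stopped after an exponentially distributed waiting time.
    t_obs = rng.expovariate(obs_rate)
    occurrence = {}
    for event in parents:  # dicts preserve insertion order
        # Assumption: the clock of an event starts when its last parent occurs.
        start = max((occurrence[p] for p in parents[event]), default=0.0)
        occurrence[event] = start + rng.expovariate(rates[event])
    return {e for e, t in occurrence.items() if t < t_obs}


# Hypothetical toy network: a -> b -> c are dependent, p is independent.
toy_parents = {"a": [], "b": ["a"], "c": ["b"], "p": []}
toy_rates = {"a": 2.0, "b": 1.0, "c": 1.0, "p": 0.5}
rng = random.Random(0)
genotypes = [simulate_genotype(toy_parents, toy_rates, rng) for _ in range(1000)]
```

    By construction, every simulated genotype respects the dependency structure: `c` never appears without `b`, and `b` never appears without `a`, whereas `p` can appear in any combination. Measurement noise is not modeled in this sketch.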

    Schematic overview of the mutation accumulation process and the mutation timing approach to separate dependent from independent events.

    (A) The occurrence of drivers is subject to hidden constraints, represented by a dependency structure (blue), whereas passengers are independent (orange). (B) Every cancer sample is an independent realization of the common underlying oncogenesis process. (C) Noisy cross-sectional mutation data from a set of tumor samples is the basis for discrimination of dependent from independent events. (D) Conditional mutation probabilities P_m(k) = Pr(mutation m has occurred given that k or fewer mutations have occurred so far) of dependent mutations (blue) and (E) unconstrained mutations (orange) under the assumption of identical evolutionary rates have different characteristic shapes. (F) Schematic representation of a sigmoidal curve (black) used to approximate P_m(k) and the slope of this curve at the inflection point (red); this slope is used for ranking the genes or loci of interest.
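    One simple way to read the conditional mutation probability defined in panel (D) is as an empirical frequency over cross-sectional samples. The sketch below is an assumption about how such an estimate could be formed, not the paper's estimator; the function name and the toy samples are hypothetical, and the sigmoid fit of panel (F) is not included.

```python
def conditional_mutation_probability(samples, m, k):
    """Fraction of samples with at most k mutations that carry mutation m.

    samples: list of sets, each holding the mutations observed in one tumor.
    Returns None when no sample has k or fewer mutations.
    """
    eligible = [s for s in samples if len(s) <= k]
    if not eligible:
        return None
    return sum(m in s for s in eligible) / len(eligible)


# Hypothetical toy data: four tumor samples over mutations a, b, c, d.
toy_samples = [{"a"}, {"a", "b"}, {"b", "c", "d"}, set()]
curve_for_a = [conditional_mutation_probability(toy_samples, "a", k) for k in range(4)]
```

    Evaluating the estimate over increasing k gives one curve per mutation, which is the kind of shape that panels (D) and (E) contrast for dependent and unconstrained mutations.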