159 research outputs found

    Bayesian Cox Regression for Population-scale Inference in Electronic Health Records

    The Cox model is an indispensable tool for time-to-event analysis, particularly in biomedical research. However, medicine is undergoing a profound transformation, generating data at an unprecedented scale, which opens new frontiers to study and understand diseases. With the wealth of data collected, new challenges for statistical inference arise, as datasets are often high dimensional, exhibit an increasing number of measurements at irregularly spaced time points, and are simply too large to fit in memory. Many current implementations for time-to-event analysis are ill-suited for these problems as inference is computationally demanding and requires access to the full data at once. Here we propose a Bayesian version for the counting process representation of Cox's partial likelihood for efficient inference on large-scale datasets with millions of data points and thousands of time-dependent covariates. Through the combination of stochastic variational inference and a reweighting of the log-likelihood, we obtain an approximation for the posterior distribution that factorizes over subsamples of the data, enabling the analysis in big data settings. Crucially, the method produces viable uncertainty estimates for large-scale and high-dimensional datasets. We show the utility of our method through a simulation study and an application to myocardial infarction in the UK Biobank. Comment: 35 pages, 5 figures, 4 tables
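
    For orientation, the quantity at the heart of this abstract, Cox's partial likelihood, can be sketched in a few lines. The block below is only the classical version for static covariates with no tied event times; it is not the authors' Bayesian, counting-process, or stochastic variational method. The function name and argument layout are illustrative assumptions.

```python
import numpy as np

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Classical Cox negative log partial likelihood.

    Assumptions (not from the abstract): static covariates, right-censored
    data, and no tied event times.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event).astype(bool)

    eta = X @ np.asarray(beta, dtype=float)   # linear predictor per subject
    order = np.argsort(-time)                 # sort by decreasing follow-up time
    eta_sorted = eta[order]
    event_sorted = event[order]

    # Running log-sum-exp: entry i covers all subjects still at risk at time_i.
    log_risk = np.logaddexp.accumulate(eta_sorted)

    # Only observed events contribute a term to the partial likelihood.
    return -np.sum(eta_sorted[event_sorted] - log_risk[event_sorted])
```

    With all coefficients at zero and three uncensored subjects, the risk sets have sizes 3, 2 and 1, so the function returns log(6).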

    Evolutionary Games with Affine Fitness Functions: Applications to Cancer

    We analyze the dynamics of evolutionary games in which fitness is defined as an affine function of the expected payoff and a constant contribution. The resulting inhomogeneous replicator equation has a homogeneous equivalent with modified payoffs. The affine terms also influence the stochastic dynamics of a two-strategy Moran model of a finite population. We then apply the affine fitness function in a model for tumor-normal cell interactions to determine which are the most successful tumor strategies. In order to analyze the dynamics of concurrent strategies within a tumor population, we extend the model to a three-strategy game involving distinct tumor cell types as well as normal cells. In this model, interaction with normal cells, in combination with an increased constant fitness, is the most effective way of establishing a population of tumor cells in normal tissue. Comment: The final publication is available at http://www.springerlink.com, http://dx.doi.org/10.1007/s13235-011-0029-
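
    The claim that the inhomogeneous replicator equation has a homogeneous equivalent can be checked numerically. The sketch below assumes the fitness takes the form f = A x + b, with payoff matrix A and constant vector b (this notation is an assumption, not taken from the abstract). Because the frequencies x sum to one, b can be absorbed into the payoffs as A' = A + b 1^T.

```python
import numpy as np

def replicator_rhs(x, A, b=None):
    """Right-hand side of the replicator equation dx_i/dt = x_i (f_i - mean fitness).

    Fitness is f = A x, plus an optional constant contribution b (assumed form).
    """
    f = A @ x
    if b is not None:
        f = f + b
    return x * (f - x @ f)

def absorb_constant(A, b):
    """Modified payoffs A' = A + b 1^T, which reproduce A x + b when sum(x) == 1."""
    return A + np.outer(b, np.ones(A.shape[1]))

# Compare both formulations at a random point on the simplex.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
b = rng.normal(size=3)
x = rng.dirichlet(np.ones(3))

inhomogeneous = replicator_rhs(x, A, b)
homogeneous = replicator_rhs(x, absorb_constant(A, b))
```

    The two right-hand sides agree to floating-point precision for any x whose entries sum to one; the equivalence does not hold off the simplex.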

    Quantifying cancer progression with conjunctive Bayesian networks

    Motivation: Cancer is an evolutionary process characterized by accumulating mutations. However, the precise timing and the order of genetic alterations that drive tumor progression remain enigmatic. Results: We present a specific probabilistic graphical model for the accumulation of mutations and their interdependencies. The Bayesian network models cancer progression by an explicit unobservable accumulation process in time that is separated from the observable but error-prone detection of mutations. Model parameters are estimated by an Expectation-Maximization algorithm and the underlying interaction graph is obtained by a simulated annealing procedure. Applying this method to cytogenetic data for different cancer types, we find multiple complex oncogenetic pathways deviating substantially from simplified models, such as linear pathways or trees. We further demonstrate how the inferred progression dynamics can be used to improve genetics-based survival predictions which could support diagnostics and prognosis. Availability: The software package ct-cbn is available under a GPL license on the web site cbg.ethz.ch/software/ct-cbn Contact: [email protected]

    Waiting time models of cancer progression

    Cancer progression is an evolutionary process that is driven by mutation and selection in a population of tumor cells. We discuss mathematical models of cancer progression, starting from traditional multistage theory. Each stage is associated with the occurrence of genetic alterations and their fixation in the population. We describe the accumulation of mutations using conjunctive Bayesian networks, an exponential family of waiting time models in which the occurrence of mutations is constrained to a partial temporal order. Two opposing limit cases arise if mutations either follow a linear order or occur independently. We derive exact analytical expressions for the waiting time until a specific number of mutations have accumulated in these limit cases as well as for the general conjunctive Bayesian network. Finally, we analyze a stochastic population genetics model that explicitly accounts for mutation and selection. In this model, waves of clonal expansions sweep through the population at equidistant intervals. We present an approximate analytical expression for the waiting time in this model and compare it to the results obtained for the conjunctive Bayesian networks.
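
    The two limit cases mentioned above can be illustrated with a minimal sketch. It makes a simplifying assumption that goes beyond the abstract: every mutation has an exponentially distributed waiting time with the same rate, and in the linear case the clock for a mutation starts only once its predecessor has occurred. The function names are hypothetical.

```python
def expected_wait_linear(k, rate):
    """Expected time until k mutations have accumulated in a strict linear order.

    Each step waits Exp(rate) after the previous one, so the means add up.
    """
    return k / rate

def expected_wait_independent(k, n, rate):
    """Expected time until k of n independent mutations have occurred.

    This is the mean of the k-th order statistic of n i.i.d. Exp(rate) times:
    while i mutations are present, the next one arrives at rate (n - i) * rate.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(1.0 / ((n - i) * rate) for i in range(k))
```

    With rate 1, three mutations take 3 time units on average in the linear case, and 1/3 + 1/2 + 1 = 11/6 time units when all three occur independently.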

    The Temporal Order of Genetic and Pathway Alterations in Tumorigenesis

    Cancer evolves through the accumulation of mutations, but the order in which mutations occur is poorly understood. Inference of a temporal ordering on the level of genes is challenging because clinically and histologically identical tumors often have few mutated genes in common. This heterogeneity may at least in part be due to mutations in different genes having similar phenotypic effects by acting in the same functional pathway. We estimate the constraints on the order in which alterations accumulate during cancer progression from cross-sectional mutation data using a probabilistic graphical model termed Hidden Conjunctive Bayesian Network (H-CBN). The possible orders are analyzed on the level of genes and, after mapping genes to functional pathways, also on the pathway level. We find stronger evidence for pathway order constraints than for gene order constraints, indicating that temporal ordering results from selective pressure acting at the pathway level. The accumulation of changes in core pathways differs among cancer types, yet a common feature is that progression appears to begin with mutations in genes that regulate apoptosis pathways and to conclude with mutations in genes involved in invasion pathways. H-CBN models provide a quantitative and intuitive model of tumorigenesis showing that the genetic events can be linked to the phenotypic progression on the level of pathways.

    Estimation of the test to test distribution as a proxy for generation interval distribution for the Omicron variant in England (Preprint)

    Background: Early estimates from South Africa indicated that the Omicron COVID-19 variant may be both more transmissible and have greater immune escape than the previously dominant Delta variant. The rapid turnover of the latest epidemic wave in South Africa as well as initial evidence from contact tracing and household infection studies has prompted speculation that the generation time of the Omicron variant may be shorter in comparable settings than the generation time of the Delta variant. Methods: We estimated daily growth rates for the Omicron and Delta variants in each UKHSA region from the 23rd of November to the 23rd of December 2021 using surveillance case counts by date of specimen and S-gene target failure status with an autoregressive model that allowed for time-varying differences in the transmission advantage of the Delta variant where the evidence supported this. By assuming a gamma-distributed generation time distribution we then estimated the generation time distribution and transmission advantage of the Omicron variant that would be required to explain this time-varying advantage. We repeated this estimation process using two different prior estimates for the generation time of the Delta variant, first based on household transmission and then based on its intrinsic generation time. Results: Visualising our growth rate estimates provided initial evidence for a difference in generation time distributions. Assuming a generation time distribution for Delta with a mean of 2.5-4 days (90% credible interval) and a standard deviation of 1.9-3 days we estimated a shorter generation time distribution for Omicron with a mean of 1.5-3.2 days and a standard deviation of 1.3-4.6 days. This implied a transmission advantage for Omicron in this setting of 160%-210% compared to Delta. We found similar relative results using an estimate of the intrinsic generation time for Delta though all estimates increased in magnitude due to the longer assumed generation time. Conclusions: We found that a reduction in the generation time of Omicron compared to Delta was able to explain the observed variation over time in the transmission advantage of the Omicron variant. However, this analysis cannot rule out the role of other factors such as differences in the populations the variants were mixing in, differences in immune escape between variants or bias due to using the test to test distribution as a proxy for the generation time distribution.
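
    The reason the generation time matters for interpreting growth rates can be seen from the standard Euler-Lotka relation between the exponential growth rate r and the reproduction number R. For a gamma-distributed generation time this gives R = (1 + r * scale) ** shape. The sketch below illustrates that textbook relation only; it is not the autoregressive model described in the abstract, and the function name and example values are illustrative assumptions.

```python
def reproduction_number_gamma(r, mean, sd):
    """Reproduction number implied by growth rate r (per day) under a
    gamma generation time with the given mean and standard deviation (days).

    Uses R = 1 / M(-r), where M is the gamma moment generating function.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    base = 1.0 + r * scale
    if base <= 0:
        raise ValueError("growth rate too negative for this generation time")
    return base ** shape

# Same observed growth rate, two assumed generation times (illustrative values).
r_long = reproduction_number_gamma(0.1, mean=4.0, sd=2.0)
r_short = reproduction_number_gamma(0.1, mean=2.0, sd=1.0)
```

    For the same positive growth rate, the shorter generation time implies a smaller reproduction number, which is why an observed growth advantage can be attributed partly to faster generations rather than to higher transmissibility alone.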

    Mutational signatures of DNA mismatch repair deficiency in C. elegans and human cancers

    Throughout their lifetime, cells are subject to extrinsic and intrinsic mutational processes leaving behind characteristic signatures in the genome. DNA mismatch repair (MMR) deficiency leads to hypermutation and is found in different cancer types. Although it is possible to associate mutational signatures extracted from human cancers with possible mutational processes, the exact causation is often unknown. Here, we use C. elegans genome sequencing of pms-2 and mlh-1 knockouts to reveal the mutational patterns linked to C. elegans MMR deficiency and their dependency on endogenous replication errors and errors caused by deletion of the polymerase epsilon subunit pole-4. Signature extraction from 215 human colorectal and 289 gastric adenocarcinomas revealed three MMR-associated signatures, one of which closely resembles the C. elegans MMR spectrum and strongly discriminates microsatellite stable and unstable tumors (AUC = 98%). A characteristic difference between human and C. elegans MMR deficiency is the lack of elevated levels of NCG > NTG mutations in C. elegans, likely caused by the absence of cytosine (CpG) methylation in worms. The other two human MMR signatures may reflect the interaction between MMR deficiency and other mutagenic processes, but their exact cause remains unknown. In summary, combining information from genetically defined models and cancer samples allows for better aligning mutational signatures to causal mutagenic processes.

    The evolutionary history of 2,658 cancers

    Cancer develops through a process of somatic evolution(1,2). Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes(3). Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA)(4), we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection. Peer reviewed

    The BET protein FSH functionally interacts with ASH1 to orchestrate global gene activity in Drosophila

    BACKGROUND: The question of how cells re-establish gene expression states after cell division is still poorly understood. Genetic and molecular analyses have indicated that Trithorax group (TrxG) proteins are critical for the long-term maintenance of active gene expression states in many organisms. A generally accepted model suggests that TrxG proteins contribute to maintenance of transcription by protecting genes from inappropriate Polycomb group (PcG)-mediated silencing, instead of directly promoting transcription. RESULTS AND DISCUSSION: Here we report a physical and functional interaction in Drosophila between two members of the TrxG, the histone methyltransferase ASH1 and the bromodomain and extraterminal family protein FSH. We investigated this interface at the genome level, uncovering a widespread co-localization of both proteins at promoters and PcG-bound intergenic elements. Our integrative analysis of chromatin maps and gene expression profiles revealed that the observed ASH1-FSH binding pattern at promoters is a hallmark of active genes. Inhibition of FSH-binding to chromatin resulted in global down-regulation of transcription. In addition, we found that genes displaying marks of robust PcG-mediated repression also have ASH1 and FSH bound to their promoters. CONCLUSIONS: Our data strongly favor a global coactivator function of ASH1 and FSH during transcription, as opposed to the notion that TrxG proteins impede inappropriate PcG-mediated silencing, but are dispensable elsewhere. Instead, our results suggest that PcG repression needs to overcome the transcription-promoting function of ASH1 and FSH in order to silence genes

    C. elegans genome-wide analysis reveals DNA repair pathways that act cooperatively to preserve genome integrity upon ionizing radiation.

    Ionizing radiation (IR) is widely used in cancer therapy and accidental or environmental exposure is a major concern. However, little is known about the genome-wide effects IR exerts on germ cells and the relative contribution of DNA repair pathways for mending IR-induced lesions. Here, using C. elegans as a model system and using primary sequencing data from our recent high-level overview of the mutagenic consequences of 11 genotoxic agents, we investigate in detail the genome-wide mutagenic consequences of exposing wild-type and 43 DNA repair and damage response defective C. elegans strains to a Caesium (Cs-137) source, emitting γ-rays. Cs-137 radiation induced single nucleotide variants (SNVs) at a rate of ~1 base substitution per 3 Gy, affecting all nucleotides equally. In nucleotide excision repair mutants, this frequency increased 2-fold concurrently with increased dinucleotide substitutions. As observed for DNA damage induced by bulky DNA adducts, small deletions were increased in translesion polymerase mutants, while base changes decreased. Structural variants (SVs) were augmented with dose, but did not arise with significantly higher frequency in any DNA repair mutants tested. Moreover, 6% of all mutations occurred in clusters, but clustering was not significantly altered in any DNA repair mutant background. Our data is relevant for better understanding how DNA repair pathways modulate IR-induced lesions