32 research outputs found

    Stability analysis of mixtures of mutagenetic trees

    Background: Mixture models of mutagenetic trees are evolutionary models that capture several pathways of ordered accumulation of genetic events observed in different subsets of patients. They have been used to model HIV progression by accumulation of resistance mutations in the viral genome under drug pressure, and cancer progression by accumulation of chromosomal aberrations in tumor cells. From the mixture models, a genetic progression score (GPS) can be derived that estimates the genetic status of single patients according to their progression along the tree models. GPS values were shown to have predictive power for estimating drug resistance in HIV or survival time in cancer. Still, the reliability of the exact values of such complex markers derived from graphical models can be questioned. Results: In a simulation study, we analyzed various aspects of the stability of estimated mutagenetic trees mixture models. The induced probability distributions and the tree topologies are recovered with high precision by an EM-like learning algorithm. However, GPS values of single patients can be reliably estimated only for models with just one major model component. Conclusion: It is encouraging that the estimation process of mutagenetic trees mixture models can be performed with high confidence regarding induced probability distributions and the general shape of the tree topologies. For a model with only one major disease progression process, even genetic progression scores for single patients can be reliably estimated. However, for models with more than one relevant component, alternative measures should be introduced for estimating the stage of disease progression.

    Efficient sampling for Bayesian inference of conjunctive Bayesian networks

    Motivation: Cancer development is driven by the accumulation of advantageous mutations and subsequent clonal expansion of cells harbouring these mutations, but the order in which mutations occur remains poorly understood. Advances in genome sequencing and the soon-arriving flood of cancer genome data produced by large cancer sequencing consortia hold the promise to elucidate cancer progression. However, new computational methods are needed to analyse these large datasets. Results: We present a Bayesian inference scheme for Conjunctive Bayesian Networks, a probabilistic graphical model in which mutations accumulate according to partial order constraints and cancer genotypes are observed subject to measurement noise. We develop an efficient MCMC sampling scheme specifically designed to overcome local optima induced by dependency structures. We demonstrate the performance advantage of our sampler over traditional approaches on simulated data and show the advantages of adopting a Bayesian perspective when reanalyzing cancer datasets and comparing our results to previous maximum-likelihood-based approaches. Availability: An R package including the sampler and examples is available at http://www.cbg.ethz.ch/software/bayes-cbn. Contacts: [email protected]
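As a minimal illustration of the partial order constraints mentioned in the abstract above: in a conjunctive Bayesian network, a mutation can occur only after all of its parent mutations have occurred, so a noise-free genotype is compatible with the partial order only if it contains the parents of every mutation it carries. The sketch below checks that condition. The function name `is_compatible`, the mutation labels, and the example poset are hypothetical and are not taken from the paper or its R package.

```python
from typing import Dict, Set


def is_compatible(genotype: Set[str], parents: Dict[str, Set[str]]) -> bool:
    """Return True if every mutation in `genotype` has all its parents present."""
    return all(parents.get(m, set()) <= genotype for m in genotype)


# Hypothetical poset: A precedes C, and both A and B precede D.
example_parents = {"C": {"A"}, "D": {"A", "B"}}

compatible = is_compatible({"A", "C"}, example_parents)    # C's parent A is present
incompatible = is_compatible({"A", "D"}, example_parents)  # D's parent B is missing
```

Checking direct parents is enough, because closure under direct parents implies closure under all ancestors. Observed genotypes that fail this check would, in the model described above, have to be explained by measurement noise.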

    Modeling HIV Drug Resistance

    Despite the development of antiviral drugs and the optimization of therapies, the emergence of drug resistance remains one of the most challenging issues for successful treatment of HIV-infected patients. The availability of massive HIV drug resistance data provides not only exciting opportunities for HIV research but also the curse of high dimensionality. In this thesis, we provide several statistical learning methods to analyze sequence data from different perspectives. We propose a hierarchical random graph approach to identify possible covariation among residue-specific mutations. Viral progression pathways have been inferred in the literature using an EM-like algorithm, and we present a normalization method to improve the accuracy of the parameter estimates. To predict drug resistance from genotypic data, we also build a novel regression model that uses information from the progression pathways. Finally, we introduce a computational approach to determine viral fitness, for which our initial computational results closely agree with experimental results. Work on two other topics is presented in the Appendices. Latent class models find applications in several areas, including the social and biological sciences. Finding explicit maximum likelihood estimates has been elusive. We present a positive solution to a conjecture on a special latent class model proposed by Bernd Sturmfels from UC Berkeley. Monomial ideals provide ubiquitous links between combinatorics and commutative algebra. Irreducible decomposition of monomial ideals is a basic computational problem with applications in several areas. We present two algorithms for finding irreducible decompositions of monomial ideals.

    Quantifying cancer progression with conjunctive Bayesian networks

    Motivation: Cancer is an evolutionary process characterized by accumulating mutations. However, the precise timing and the order of genetic alterations that drive tumor progression remain enigmatic. Results: We present a specific probabilistic graphical model for the accumulation of mutations and their interdependencies. The Bayesian network models cancer progression by an explicit unobservable accumulation process in time that is separated from the observable but error-prone detection of mutations. Model parameters are estimated by an Expectation-Maximization algorithm and the underlying interaction graph is obtained by a simulated annealing procedure. Applying this method to cytogenetic data for different cancer types, we find multiple complex oncogenetic pathways deviating substantially from simplified models, such as linear pathways or trees. We further demonstrate how the inferred progression dynamics can be used to improve genetics-based survival predictions which could support diagnostics and prognosis. Availability: The software package ct-cbn is available under a GPL license on the web site cbg.ethz.ch/software/ct-cbn Contact: [email protected]

    Efficient Algorithm for Finding Minimal Spanning Tree in Directed Graphs With Integer-Valued Weights

    In this paper, the task of finding a minimal spanning tree in a weighted directed graph is considered. A short survey of existing algorithms that solve this problem with various complexities is given. A comparatively simple algorithm is also developed that solves the problem for graphs with integer-valued arc weights in time O(m + n log n). This result is obtained by using radix sort instead of comparison-based sorting.
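The abstract above gives no algorithmic detail beyond the use of radix sort, so the following is only a sketch of that one ingredient: sorting arcs by integer weight without comparisons. It is not the paper's spanning tree algorithm. The `Edge` layout, the function name `radix_sort_edges`, the `base` parameter, and the restriction to non-negative weights are all assumptions made for illustration.

```python
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (source, target, weight); hypothetical layout


def radix_sort_edges(edges: List[Edge], base: int = 256) -> List[Edge]:
    """Sort edges by non-negative integer weight with a stable LSD radix sort."""
    if not edges:
        return []
    if any(w < 0 for _, _, w in edges):
        raise ValueError("this sketch assumes non-negative weights")
    max_weight = max(w for _, _, w in edges)
    result = list(edges)
    place = 1
    # One bucketing pass per base-`base` digit, least significant digit first.
    while place <= max_weight:
        buckets: List[List[Edge]] = [[] for _ in range(base)]
        for edge in result:
            buckets[(edge[2] // place) % base].append(edge)
        result = [edge for bucket in buckets for edge in bucket]
        place *= base
    return result


example_edges = [(0, 1, 300), (1, 2, 5), (2, 0, 70000), (0, 2, 5)]
sorted_edges = radix_sort_edges(example_edges)
```

Each pass is linear in the number of edges plus the base, and the number of passes depends on the number of digits in the largest weight, which is why integer-valued weights matter here.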

    Robust unmixing of tumor states in array comparative genomic hybridization data

    Motivation: Tumorigenesis is an evolutionary process by which tumor cells acquire sequences of mutations leading to increased growth, invasiveness and eventually metastasis. It is hoped that by identifying the common patterns of mutations underlying major cancer sub-types, we can better understand the molecular basis of tumor development and identify new diagnostics and therapeutic targets. This goal has motivated several attempts to apply evolutionary tree reconstruction methods to assays of tumor state. Inference of tumor evolution is in principle aided by the fact that tumors are heterogeneous, retaining remnant populations of different stages along their development along with contaminating healthy cell populations. In practice, though, this heterogeneity complicates interpretation of tumor data because distinct cell types are conflated by common methods for assaying the tumor state. We previously proposed a method to computationally infer cell populations from measures of tumor-wide gene expression through a geometric interpretation of mixture type separation, but this approach deals poorly with noisy and outlier data

    Bioinformatical approaches to ranking of anti-HIV combination therapies and planning of treatment schedules

    The human immunodeficiency virus (HIV) pandemic is one of the most serious health challenges humanity is facing today. Combination therapy comprising multiple antiretroviral drugs resulted in a dramatic decline in HIV-related mortality in the developed countries. However, the emergence of drug resistant HIV variants during treatment remains a problem for permanent treatment success and seriously hampers the composition of new active regimens. In this thesis we use statistical learning for developing novel methods that rank combination therapies according to their chance of achieving treatment success. These depend on information regarding the treatment composition, the viral genotype, features of viral evolution, and the patient's therapy history. Moreover, we investigate different definitions of response to antiretroviral therapy and their impact on prediction performance of our method. We address the problem of extending purely data-driven approaches to support novel drugs with little available data. In addition, we explore the prospect of prediction systems that are centered on the patient's treatment history instead of the viral genotype. We present a framework for rapidly simulating resistance development during combination therapy that will eventually allow application of combination therapies in the best order. Finally, we analyze surface proteins of HIV regarding their susceptibility to neutralizing antibodies with the aim of supporting HIV vaccine development.

    Learning Monotonic Genotype-Phenotype Maps

    Evolutionary escape of pathogens from the selective pressure of immune responses and from medical interventions is driven by the accumulation of mutations. We introduce a statistical model for jointly estimating the dynamics and dependencies among genetic alterations and the associated phenotypic changes. The model integrates conjunctive Bayesian networks, which define a partial order on the occurrences of genetic events, with isotonic regression. The resulting genotype-phenotype map is non-decreasing in the lattice of genotypes. It describes evolutionary escape as a directed process following a phenotypic gradient, such as a monotonic fitness landscape. We present efficient algorithms for parameter estimation and model selection. The model is validated using simulated data and applied to HIV drug resistance data. We find that the effect of many resistance mutations is non-linear and depends on the genetic background in which they occur.
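To make the isotonic regression ingredient concrete, here is a minimal sketch of the pool-adjacent-violators algorithm for the simplest case: a least-squares non-decreasing fit along a single chain of observations with unit weights. The model in the abstract above works on a lattice of genotypes, which is a partial order and needs a more general method; this chain version, the function name `isotonic_fit`, and the example values are illustrative assumptions only.

```python
from typing import List


def isotonic_fit(values: List[float]) -> List[float]:
    """Least-squares non-decreasing fit to `values` (unit weights) via PAVA."""
    blocks: List[List[float]] = []  # each block stores [sum, count]
    for v in values:
        blocks.append([v, 1.0])
        # Pool adjacent blocks while the earlier block's mean exceeds the later one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: List[float] = []
    for total, count in blocks:
        # Every observation in a pooled block receives the block mean.
        fitted.extend([total / count] * int(count))
    return fitted


# Hypothetical phenotype values along one mutational pathway.
fit = isotonic_fit([1.0, 3.0, 2.0, 4.0])
```

In this example the second and third values violate monotonicity, so they are pooled and both replaced by their mean.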