    A New Implementation and Detailed Study of Breakpoint Analysis

    Phylogenies derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Yet very few techniques are available for such phylogenetic reconstructions. One method is breakpoint analysis, developed by Blanchette and Sankoff [2] for solving the "breakpoint phylogeny." Our earlier studies [5, 6] confirmed the usefulness of this approach, but also found that BPAnalysis, the implementation developed by Sankoff and Blanchette, was too slow to use on all but very small datasets. We report here on a reimplementation of BPAnalysis using the principles of algorithmic engineering. Our flexible implementation, faster by two to three orders of magnitude, allowed us to study the characteristics of breakpoint analysis in terms of running time, quality, and robustness, and to analyze datasets that had so far been considered out of reach. We report on these findings and also discuss future directions for our new implementation.
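Breakpoint analysis scores gene orders by counting breakpoints between them. As a minimal sketch of that building block, assuming two signed linear genomes over the same genes labelled 1..n and framed by the sentinels 0 and n+1 (a common convention, not necessarily the one BPAnalysis uses), the breakpoint distance counts adjacencies of one genome that the other lacks in either orientation. The function names are hypothetical, and this is not the reimplementation described above.

```python
from typing import Sequence, Set, Tuple

Adjacency = Tuple[int, int]


def adjacencies(genome: Sequence[int]) -> Set[Adjacency]:
    """Ordered neighbour pairs of a signed linear genome framed by 0 and n+1."""
    n = len(genome)
    framed = [0, *genome, n + 1]
    return {(framed[i], framed[i + 1]) for i in range(len(framed) - 1)}


def breakpoint_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of adjacencies of `a` that are absent from `b`.

    An adjacency (x, y) is conserved if `b` contains (x, y) or, read on
    the opposite strand, (-y, -x).
    """
    expected = list(range(1, len(a) + 1))
    if sorted(map(abs, a)) != expected or sorted(map(abs, b)) != expected:
        raise ValueError("genomes must be signed permutations of 1..n")
    adj_b = adjacencies(b)
    return sum(
        1
        for x, y in adjacencies(a)
        if (x, y) not in adj_b and (-y, -x) not in adj_b
    )


# A single inversion of genes 2..3 creates two breakpoints.
print(breakpoint_distance([1, 2, 3, 4], [1, -3, -2, 4]))  # prints 2
```

The breakpoint phylogeny problem then asks for a tree and ancestral gene orders that minimize the sum of these distances over all tree edges, so this routine is evaluated many times during a search.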

    Molecular signals of arms race evolution between RNA viruses and their hosts

    Viruses are intracellular parasites that hijack their hosts’ cellular machinery to replicate themselves. This creates an evolutionary “arms race” between hosts and viruses, where the former develop mechanisms to restrict viral infection and the latter evolve ways to circumvent these molecular barriers. In this thesis, I explore examples of this virus-host molecular interplay, focusing on events in the evolutionary histories of both viruses and hosts. The thesis begins by examining how recombination, the exchange of genetic material between related viruses, expands the genomic diversity of the Sarbecovirus subgenus, which includes SARS-CoV, responsible for the 2002 SARS epidemic, and SARS-CoV-2, responsible for the COVID-19 pandemic. On the host side, I examine the evolutionary interaction between RNA viruses and two interferon-stimulated genes expressed in hosts. First, I show how the 2′-5′-oligoadenylate synthetase 1 (OAS1) gene of horseshoe bats (Rhinolophoidea), the reservoir host of sarbecoviruses, lost its anti-coronaviral activity at the base of this bat superfamily. By reconstructing the OAS1 protein of the Rhinolophoidea common ancestor, I validate the loss of antiviral function and highlight the implications of this event for the virus-host association between sarbecoviruses and horseshoe bats. Second, I focus on the evolution of the human butyrophilin subfamily 3 member A3 (BTN3A3) gene, which restricts infection by avian influenza A viruses (IAV). The evolutionary analysis reveals that BTN3A3’s anti-IAV function was gained within the primates and that IAVs must acquire specific amino acid substitutions in their NP protein to evade human BTN3A3 activity. The gain of BTN3A3-evasion-conferring substitutions correlates with all major human IAV pandemics and epidemics, making these NP residues key markers of the potential of IAVs to transmit to humans.
In the final part of the thesis, I present a novel approach for evaluating dinucleotide compositional biases in virus genomes. Applying my metric to the Flaviviridae virus family uncovers how ancestral host shifts of these viruses correlate with adaptive shifts in their genomes’ dinucleotide representation. Collectively, the contents of this thesis extend our understanding of how viruses interact with their hosts over their intertwined evolution and provide insights into virus host switching and pandemic preparedness.
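The thesis's own metric is not described in this abstract. For orientation only, the conventional baseline for dinucleotide bias is the relative abundance (odds ratio) rho_XY = f(XY) / (f(X) f(Y)), where f(X) is a mononucleotide frequency and f(XY) a dinucleotide frequency; values well below 1, as for CpG in many RNA viruses, indicate under-representation. A minimal sketch, assuming a single-stranded sequence with U read as T and ambiguous bases skipped; the function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict

NUCLEOTIDES = "ACGT"


def dinucleotide_odds_ratios(sequence: str) -> Dict[str, float]:
    """Relative abundance rho_XY = f(XY) / (f(X) * f(Y)) for all 16 dinucleotides.

    RNA input is accepted (U is read as T); characters other than A, C, G, T
    are ignored, and dinucleotides spanning them are not counted.
    """
    seq = sequence.upper().replace("U", "T")
    mono = Counter(base for base in seq if base in NUCLEOTIDES)
    di = Counter(
        seq[i : i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in NUCLEOTIDES and seq[i + 1] in NUCLEOTIDES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence needs at least one unambiguous dinucleotide")

    ratios: Dict[str, float] = {}
    for x, y in product(NUCLEOTIDES, repeat=2):
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        f_xy = di[x + y] / n_di
        # Undefined (NaN) when a constituent base never occurs.
        ratios[x + y] = f_xy / (f_x * f_y) if f_x and f_y else float("nan")
    return ratios


print(round(dinucleotide_odds_ratios("ACGUACGU")["CG"], 3))  # prints 4.571
```

The toy sequence is a tandem repeat, so its CpG value says nothing biological; the example only shows the arithmetic (2/7 observed CG frequency divided by 0.25 × 0.25 expected).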

    Festparameter-Algorithmen fuer die Konsens-Analyse Genomischer Daten (Fixed-Parameter Algorithms for the Consensus Analysis of Genomic Data)

    Fixed-parameter algorithms offer a constructive and powerful approach to efficiently solving NP-hard problems, combining two important goals: they compute optimal solutions, and they do so within provable time bounds despite the (almost inevitable) computational intractability of NP-hard problems. The essential idea is to identify one or more aspects of the input to a problem as the parameters, and to confine the combinatorial explosion of computational difficulty to a function of the parameters such that the costs are polynomial in the non-parameterized part of the input (a running time of the form f(k)·poly(n) for parameter k and input size n). This makes sense especially for parameters that take small values in applications. Fixed-parameter algorithms have become an established algorithmic tool in a variety of application areas, among them computational biology, where small values for problem parameters are often observed. A number of design techniques for fixed-parameter algorithms have been proposed, and bounded search trees are one of them. In computational biology, however, examples of bounded search tree algorithms have so far been rare. This thesis investigates the use of bounded search tree algorithms for consensus problems in the analysis of DNA and RNA data. More precisely, we investigate consensus problems in the contexts of sequence analysis, of quartet methods for phylogenetic reconstruction, of gene order analysis, and of RNA secondary structure comparison. In all cases, we present new efficient algorithms that incorporate the bounded search tree paradigm in novel ways. Along the way, we also obtain parameterized hardness results, showing that the respective problems are unlikely to admit a fixed-parameter algorithm, and we introduce integer linear programs (ILPs) as a tool for classifying problems as fixed-parameter tractable, i.e., as having fixed-parameter algorithms.
Most of our algorithms were implemented and tested on practical data.
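To make the bounded search tree idea concrete, here is a minimal sketch for Closest String, one consensus problem from sequence analysis: given k strings of equal length and a distance d, find a string within Hamming distance d of all of them. It follows the well-known search tree of depth at most d and at most d + 1 children per node for this problem; whether it matches the thesis's exact algorithm is an assumption, and the function names are illustrative. The tree has at most (d+1)^d leaves, so the exponential part of the cost depends on d alone.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings: Sequence[str], d: int) -> Optional[str]:
    """Return a string within Hamming distance d of every input, or None."""
    if d < 0 or not strings:
        raise ValueError("need d >= 0 and at least one string")
    if len({len(s) for s in strings}) != 1:
        raise ValueError("strings must have equal length")

    def search(candidate: str, budget: int) -> Optional[str]:
        # If some solution lies within `budget` of `candidate` (true at the
        # root, where budget = d), the triangle inequality bounds every
        # input's distance to `candidate` by d + budget.
        distances = [hamming(candidate, s) for s in strings]
        if any(dist > d + budget for dist in distances):
            return None
        far = next((s for s, dist in zip(strings, distances) if dist > d), None)
        if far is None:
            return candidate  # every input is within distance d
        # A solution differs from `far` in at most d positions, so it agrees
        # with `far` on at least one of any d + 1 positions where `candidate`
        # and `far` differ; branch on each of them.
        diff = [p for p, (x, y) in enumerate(zip(candidate, far)) if x != y]
        for p in diff[: d + 1]:
            child = candidate[:p] + far[p] + candidate[p + 1 :]
            found = search(child, budget - 1)
            if found is not None:
                return found
        return None

    return search(strings[0], d)


print(closest_string(["AAAA", "TTTT"], 2))  # prints TTAA
print(closest_string(["AAAA", "TTTT"], 1))  # prints None
```

Starting from the first input string is safe because any solution is within d of it; each branching step moves the candidate one position closer to some solution, which is what bounds the depth.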

    Understanding the evolutionary history of the papillomaviruses

    This thesis focuses on the evolutionary history of the papillomaviruses (PVs) using phylogenetic approaches. Two aspects have been examined: the first is the level of phylogenetic compatibility among PV genes, and the second is the ancestral diversification mechanisms of the PVs, in order to explain the origin of the observed associations with host species. Bayesian phylogenetic analysis has been used to make evolutionary inferences. Phylogenetic compatibility among genes was examined by estimating constrained and unconstrained phylogenies for pairs of PV genes. The Bayes factor derived from comparing the constrained and unconstrained models indicated significant evidence against identical phylogenies for any pair of the six PV genes investigated, which may indicate ancestral recombination events. New host-virus associations can form through 'codivergence', where, following host speciation, the ancestral virus association is effectively inherited by the descendant host species; through 'prior divergence' of the virus, which results in multiple virus associations with the host; and through 'host transfer', in which the virus lineage is transferred between contemporaneous host species. To distinguish between these mechanisms of virus diversification, an approach based on temporal comparisons of host and virus divergence times was devised. Difficulties associated with the direct estimation of PV divergence times led to the incorporation of a biased sampling approach into Bayesian phylogenetic estimation. This biased viral divergence times in favour of codivergence while still allowing the sampling of times that violate this assumption and therefore indicate either prior divergence or host transfer.
Statistical evaluation of the proportion of violations at each viral divergence identified significant evidence of prior divergence events behind many of the observed PV-host associations, and of one ancestral host transfer event.
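Two calculations in this abstract can be sketched under stated assumptions. First, a Bayes factor comparing constrained and unconstrained models is commonly reported as 2 ln BF from the two log marginal likelihoods and read on the Kass and Raftery scale (above 10 is conventionally "very strong"); how the thesis estimated the marginal likelihoods is not stated here. Second, assuming posterior samples of a viral node age and a point estimate of the matching host divergence age, the share of samples older than the host split points to prior divergence and the share younger to host transfer. The function names, the `tolerance` parameter and all numbers are hypothetical.

```python
from typing import Dict, Sequence


def two_ln_bayes_factor(log_ml_unconstrained: float, log_ml_constrained: float) -> float:
    """2 ln BF in favour of the unconstrained model (natural-log marginal likelihoods)."""
    return 2.0 * (log_ml_unconstrained - log_ml_constrained)


def kass_raftery_label(two_ln_bf: float) -> str:
    """Verbal evidence category for 2 ln BF (Kass and Raftery, 1995)."""
    if two_ln_bf < 0:
        return "favours the constrained model"
    if two_ln_bf > 10:
        return "very strong"
    if two_ln_bf > 6:
        return "strong"
    if two_ln_bf > 2:
        return "positive"
    return "not worth more than a bare mention"


def violation_proportions(
    viral_node_ages: Sequence[float], host_split_age: float, tolerance: float = 0.0
) -> Dict[str, float]:
    """Classify posterior samples of one viral divergence time against a host split.

    Ages count backwards from the present, so a larger age is older.
    """
    n = len(viral_node_ages)
    if n == 0:
        raise ValueError("need at least one posterior sample")
    older = sum(age > host_split_age + tolerance for age in viral_node_ages)
    younger = sum(age < host_split_age - tolerance for age in viral_node_ages)
    return {
        "prior_divergence": older / n,  # virus lineages split before the hosts did
        "host_transfer": younger / n,  # virus lineages split after the hosts did
        "codivergence": (n - older - younger) / n,
    }


bf = two_ln_bayes_factor(-1000.0, -1012.5)
print(bf, kass_raftery_label(bf))  # prints 25.0 very strong
print(violation_proportions([10.0, 12.0, 8.0, 10.0], host_split_age=10.0))
```

A real analysis would also need a significance criterion for the violation proportions; the sketch only shows how samples are tallied.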

    Evolutionary Genomics: Statistical and Computational Methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    EVOLUTION, COMPARATIVE GENOMICS AND GENOMIC EPIDEMIOLOGY OF BACTERIA OF PUBLIC HEALTH IMPORTANCE

    The present thesis focuses on the genomic epidemiology of bacterial hospital infections. The hospital environment is unique, as it concentrates a high number of bacterial agents, frequent antibiotic use, and patients with weak immune systems. This combination favours the development and selection of antibiotic-resistant strains and the spread of opportunistic infections: in general, the thriving of nosocomial pathogens. Genomic and evolutionary approaches have emerged as cutting-edge tools for studying these infections, making it possible to characterize the genomic features of bacterial strains and to reconstruct their evolution.
As DNA sequencing becomes ever cheaper, research projects are supported by a growing number of genomes, and a considerable amount of genomic data is available in public databases, expanding the range of possible investigations. The first work presented here describes the evolution of Clonal Complex 258 (CC258) of Klebsiella pneumoniae. Single nucleotide polymorphisms (SNPs) made it possible to reconstruct the global phylogeny of the entire species and to place CC258 in its evolutionary context. Furthermore, a 1.3 Mb recombination was detected in the genomes of the clade under analysis. A molecular clock approach made it possible to date this and other previously discovered recombination events. These findings were used to complete the picture of the evolutionary history of CC258, which is characterized by frequent macro-recombination events. A rapid evolutionary strategy characterized by the exchange of large amounts of genetic information is a feature shared by other nosocomial pathogens that develop “superbug” phenotypes. Although common, the macro-recombination evolution model is not shared by all bacteria responsible for nosocomial infections. One exception is the SMAL strain of Acinetobacter baumannii, presented in another subproject of this thesis. In this work, the genomes of Sequence Type (ST) 78 of A. baumannii were analyzed. Phylogeny and comparative genomics revealed two different clades within the ST, with different evolutionary “lifestyles”. One group (containing the SMAL genomes) was characterized by lower gene content variability and a higher copy number of insertion sequences (ISs). One IS interrupts the comEC/rec2 gene in all the SMAL genomes. This gene codes for a protein involved in the uptake of exogenous DNA, so its inactivation limits gene exchange, suggesting an explanation for the low genomic plasticity.
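The molecular clock step can be illustrated with the simplest strict-clock check, a root-to-tip regression: regressing each isolate's root-to-tip distance on its sampling date gives the substitution rate as the slope and the date of the root (the most recent common ancestor) as the x-intercept. This is a sketch assuming a strict clock and an already rooted tree; the abstract does not state which clock method the thesis used, so treat this as a simpler stand-in. The function name and the input numbers are hypothetical.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(
    sampling_years: Sequence[float], root_to_tip: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_year): the slope, in distance units per year, and
    the x-intercept, i.e. the date at which the fitted line reaches zero.
    """
    n = len(sampling_years)
    if n < 2 or n != len(root_to_tip):
        raise ValueError("need at least two paired observations")
    mean_x = sum(sampling_years) / n
    mean_y = sum(root_to_tip) / n
    sxx = sum((x - mean_x) ** 2 for x in sampling_years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sampling_years, root_to_tip))
    if sxx == 0:
        raise ValueError("sampling dates must not all be equal")
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal")
    intercept = mean_y - rate * mean_x
    return rate, -intercept / rate


# Hypothetical isolates: root-to-tip distances counted in SNPs per genome.
rate, root_year = root_to_tip_regression([2000, 2005, 2010], [10, 20, 30])
print(rate, root_year)  # prints 2.0 1995.0
```

Dating a specific event such as a recombination would additionally require locating it on a branch of the dated tree; the regression only supplies the clock.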
In another work presented here, genomic epidemiology was applied to reconstruct the transmission routes of a K. pneumoniae outbreak in a hospital intensive care unit. First, a phylogenetic approach was used to separate the isolates belonging to the outbreak from the sporadic ones. The isolation dates and genomic SNPs were then used to build a genomic network modelling the chain of infection events in the ward. The reconstruction suggested a star-like spread of the pathogen from patient zero to all the other infected patients, revealing a systematic error in the hospital's biosafety procedures. This almost forensic application of genomic epidemiology was also used in two other works presented here, both concerning the reconstruction of food-borne infections. In one, focused on Salmonella enterica, only synonymous SNPs were used as input to a phylogeny-based investigation, in order to filter out pathoadaptive mutations. In the other, epidemiological data, molecular typing and SNP-based phylogeny were used to investigate nine Listeria monocytogenes isolates that were believed to be part of the same outbreak but in the end proved to be genomically unrelated. Lastly, a review paper on genomic epidemiology is also presented. It focuses on the latest high-impact publications analyzing the genome evolution of bacterial pathogens and the propagation dynamics of epidemic outbreaks over very short periods of time. The article also describes the latest historical epidemiological studies, made possible by modern DNA isolation and sequencing technologies.
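The outbreak reconstruction combined isolation dates with SNP differences. A minimal sketch of one simple way to do that, assuming aligned core-genome sequences and exact isolation dates, links every isolate to the genetically closest isolate sampled earlier, with ties going to the earliest one; a star-like result, with most links pointing at one early case, is the pattern described in the abstract. This heuristic, the function names and the data are illustrative assumptions, not the method used in the thesis.

```python
from datetime import date
from typing import Dict, Optional, Tuple

BASES = frozenset("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Differences between two aligned sequences, skipping gaps and ambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x in BASES and y in BASES and x != y)


def infer_sources(isolates: Dict[str, Tuple[date, str]]) -> Dict[str, Optional[str]]:
    """Map each isolate to its most likely source among earlier isolates.

    The source is the strictly earlier isolate with the fewest SNP
    differences; ties are broken by the earlier isolation date. Isolates
    with no earlier candidate are treated as index cases (source None).
    """
    order = sorted(isolates, key=lambda name: isolates[name][0])
    sources: Dict[str, Optional[str]] = {}
    for name in order:
        when, seq = isolates[name]
        earlier = [other for other in order if isolates[other][0] < when]
        sources[name] = (
            min(earlier, key=lambda o: (snp_distance(seq, isolates[o][1]), isolates[o][0]))
            if earlier
            else None
        )
    return sources


ward = {
    "P0": (date(2020, 1, 1), "AAAAAAAA"),
    "P1": (date(2020, 1, 5), "AAAAAAAC"),
    "P2": (date(2020, 1, 7), "AAAAAACA"),
    "P3": (date(2020, 1, 9), "AAAAACAA"),
}
print(infer_sources(ward))
```

On this toy ward every later case points back to P0, the star-like shape; the sketch ignores within-host diversity and unsampled cases, both of which matter on real data.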