    Evolutionary systems biology of virus-host interactions

    The evolution of virus-host interactions occurs at multiple levels of biological complexity, including the organismal, genetic, and molecular levels. In the first part of this study, the evolution of associations between herpesviruses (HVs) and their hosts is examined across more than 400 million years. Recent studies have demonstrated that cospeciations are not always the main event driving HV evolution, as intrahost speciations and host switches also play important roles. The present study shows that, more than topological incongruences, mismatches in divergence times are the main source of disagreement between host and viral phylogenies, revealing host switches, intrahost speciations and viral losses along the evolution of HVs. Herpesviruses have large genomes encoding dozens of proteins. Apart from amino acid substitutions, these viruses also evolve by acquiring, duplicating and losing protein domains. Although the domain repertoires of HVs differ across species, a core set of domains is shared among all of them. The second part of this study reveals that 28 out of 41 core domains encoded by HV ancestors are still found in present-day repertoires, which were expanded over time by domain gains and duplications. Distinct evolutionary strategies led HVs to develop very specific domain repertoires, which may explain their host range and tissue tropism and provide hints about the origins of herpesviruses. Although most mutations in proteins are deleterious, a few of them end up improving viral fitness and defining how viruses interact with their hosts. Using an integrative approach, the third part of this study investigates the evolution of protein-protein interactions (PPIs) involving the membrane proteins Nectins and the herpesviral envelope glycoproteins D/G. 
By means of ancestral sequence reconstruction and homology modelling, ancestral structures of these protein complexes were generated, and analysis of their interaction energies revealed important differences in binding affinity along their evolution.

    Methods and algorithms for improving the inference of the evolutionary history of genomes

    Gene trees offer an ideal framework for comparative genomics. Not only do they reflect species evolution through speciation events, but they also capture gene family expansion and contraction through gene gains and losses. Determining the order and nature of these events amounts to inferring the evolutionary history of gene families, a prerequisite for many comparative genomics analyses. In particular, it is needed to accurately predict orthology relationships between genes, which matter for protein structure and function prediction and for phylogenetic analyses, among other applications. Methods for inferring gene family evolution explicitly assume that gene trees are known without error. However, gene trees, usually reconstructed from amino acid or nucleotide sequences, are only estimates of the true trees, and standard phylogenetic methods based on sequence data are well documented as error-prone. Gene trees built with these methods usually introduce biases into the inference of gene family histories. In this thesis, we present new methods that aim to improve the quality of phylogenetic gene trees and thereby the accuracy of the evolutionary histories inferred for the corresponding gene families. 
We start by providing a framework for studying deviations from the standard genetic code, one possible cause of annotation errors that can then propagate into phylogeny reconstruction. Our framework analyses coding sequences and tRNAs and is centred on a codon reassignment prediction algorithm called CoreTracker. We first show its efficiency, then apply it to green plant mitochondrial genomes to demonstrate the evolution of the genetic code in green algae. The second part of this thesis focuses on the development of efficient species-tree-aware methods for gene tree correction and construction. We present ProfileNJ, a fast and deterministic correction method that exclusively targets weakly supported branches of a gene tree. When applied to the gene families of the Ensembl Compara database, ProfileNJ produces an arguably better set of gene trees than the ones available in Ensembl Compara. We then use a different strategy, based on a genetic algorithm, that allows both construction and correction of gene trees. This second method, called GATC, treats the problem as a multi-objective optimisation of gene tree topology, in which we look for the set of gene trees that are optimal with respect to both sequence data and gene family evolution through gene gain and loss. We show that this approach yields accurate trees and is suitable for constructing reference datasets to benchmark other methods.
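The multi-objective formulation mentioned in this abstract can be illustrated with a small sketch. It is not GATC itself: it only shows how a set of non-dominated (Pareto-optimal) candidates is selected when two objectives are minimised at once. The `Candidate` type, the tree labels and all scores are hypothetical values invented for the illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str           # hypothetical label for a candidate gene tree
    neg_log_lik: float  # sequence-fit objective (lower is better)
    dl_cost: float      # duplication/loss reconciliation cost (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.neg_log_lik <= b.neg_log_lik and a.dl_cost <= b.dl_cost
    strictly_better = a.neg_log_lik < b.neg_log_lik or a.dl_cost < b.dl_cost
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up scores for four candidate trees.
candidates = [
    Candidate("t1", 1200.0, 9.0),
    Candidate("t2", 1210.0, 4.0),
    Candidate("t3", 1215.0, 6.0),  # worse than t2 on both objectives
    Candidate("t4", 1250.0, 2.0),
]
front = pareto_front(candidates)
front_names = [c.name for c in front]
print(front_names)
```

Here `t3` is discarded because `t2` beats it on both objectives, while `t1`, `t2` and `t4` each represent a different trade-off and are all kept. A genetic algorithm would evolve the candidate pool between such selection steps; that part is omitted.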

    Evolutionary genomics: statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistical and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Evaluating, Accelerating and Extending the Multispecies Coalescent Model of Evolution

    So much research builds on evolutionary histories of species and genes. They are used in genomics to infer synteny, in ecology to describe and predict biodiversity, and in molecular biology to transfer knowledge acquired in model organisms to humans and crops. Beyond downstream applications, expanding our knowledge of life on Earth is important in its own right. From Naturalis Historia to On the Origin of Species, the acquisition of this knowledge has been a part of human development. Evolutionary histories are commonly represented as trees, where a common ancestor progressively splits into descendant species or alleles. Time trees add more information by using height to represent genetic distance or elapsed time. Species and gene trees can be inferred from molecular sequences using methods which are explicitly model-based, or implicitly assume or are statistically consistent with a particular model of evolution. One such model, the multispecies coalescent (MSC), is the topic of my thesis. Under this model, separate trees are inferred for the species history and for each gene’s history. Gene trees are embedded within the species tree according to a coalescent process. Researchers often avoid the MSC when reconstructing time trees because of claims that available implementations are too computationally demanding. Instead, the species history is inferred using a single tree by concatenating the sequences from each gene. I began my thesis research by evaluating the effect of this approximation. In a realistic simulation based on parameters inferred from empirical data, concatenation was grossly inaccurate, especially when estimating recent species divergence times. In a later simulation study I demonstrated that when using concatenation, credible intervals often excluded the true values. To address reluctance towards using the MSC, I developed a faster implementation of the model. 
StarBEAST2 is a Markov chain Monte Carlo (MCMC) method, meaning it characterizes the probability distribution over trees by randomly walking the parameter space. I improved computational performance by developing more efficient proposals used to traverse the space, and by reducing the number of parameters in the model through analytical integration of population sizes. Despite its sophistication, the MSC has theoretical limitations. One is that the substitution rate is assumed to stay constant, or uncorrelated between lineages of different genes. However, substitution rates do vary and are associated with species traits like body size. I addressed this assumption in StarBEAST2 by extending the MSC to estimate substitution rates for each species. Another assumption is that genetic material cannot be transferred horizontally, but a more general model called the multispecies network coalescent (MSNC) permits introgression of alleles across species boundaries. My collaborators and I have developed and evaluated an MCMC implementation of the MSNC. My final thesis project was to combine the MSC with the fossilized birth-death (FBD) process, which models how species are fossilized and sampled through time. To demonstrate the utility of the FBD-MSC model, I used it to reconstruct the evolutionary history of Caninae (dogs and foxes) using fossil data and molecular sequences.
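The idea of characterizing a distribution by randomly walking the parameter space can be sketched with a toy Metropolis sampler. This is not StarBEAST2 and does not operate on trees: it assumes a single positive parameter (think of one divergence time) whose posterior is, purely for illustration, a Gamma(shape=3, scale=1) density. The function names, step size and seed are assumptions made for the example.

```python
import math
import random


def log_target(t: float) -> float:
    """Unnormalised log-density of the toy posterior Gamma(shape=3, scale=1)."""
    if t <= 0.0:
        return float("-inf")  # zero density outside the support
    return 2.0 * math.log(t) - t


def metropolis(n_steps: int, step_size: float, start: float, seed: int = 1):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # a rejected move repeats the current state
    return samples


samples = metropolis(n_steps=50000, step_size=1.5, start=1.0)
kept = samples[5000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

The sample mean should land close to 3, the mean of the toy target. The proposal is the part that the abstract describes improving: a better proposal explores the space with fewer rejected or tiny moves, so fewer steps are needed for the same accuracy.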

    Graph-based modeling and evolutionary analysis of microbial metabolism

    Microbial organisms are responsible for most of the metabolic innovations on Earth. Understanding microbial metabolism helps shed light on questions that are central to biology, biomedicine, energy and the environment. Graph-based modeling is a powerful tool that has been used extensively for elucidating the organising principles of microbial metabolism and the underlying evolutionary forces that act upon it. Nevertheless, various graph-theoretic representations and techniques have been applied to metabolic networks, making the modeling ad hoc and producing conflicting conclusions that depend on the representation used. The contribution of this dissertation is two-fold. In the first half, I revisit the modeling aspect of metabolic networks, and present novel techniques for their representation and analysis. In particular, I explore the limitations of standard graph representations, and the utility of a more appropriate model, the hypergraph, for capturing metabolic network properties. Further, I address the task of metabolic pathway inference and the necessity to account for chemical symmetries and alternative tracings in this crucial task. In the second part of the dissertation, I focus on two evolutionary questions. First, I investigate the evolutionary underpinnings of the formation of communities in metabolic networks, a phenomenon that has been reported in the literature and implicated in an organism's adaptation to its environment. I find that the metabolome size better explains the observed community structures. Second, I correlate evolution at the genome level with emergent properties at the metabolic network level. In particular, I quantify the various evolutionary events (e.g., gene duplication, loss, transfer, fusion, and fission) in a group of proteobacteria, and analyze their role in shaping the metabolic networks and determining the organismal fitness. 
As metabolism gains an increasingly prominent role in biomedical, energy, and environmental research, understanding how to model this process and how it came about during evolution becomes more crucial. My dissertation provides important insights in both directions.
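One limitation of standard graph representations can be shown with a toy example. The sketch below is an illustration under stated assumptions, not the dissertation's method: the metabolites `A` to `D`, the reactions `r1` and `r2`, and both reachability functions are hypothetical. A reaction is stored as a hyperedge (a set of substrates mapped to a set of products), and reachability is compared with a plain substrate-to-product graph projection.

```python
from collections import namedtuple

# A reaction as a directed hyperedge: all substrates jointly yield all products.
Reaction = namedtuple("Reaction", ["name", "substrates", "products"])

reactions = [
    Reaction("r1", frozenset({"A", "B"}), frozenset({"C"})),  # A + B -> C
    Reaction("r2", frozenset({"C"}), frozenset({"D"})),       # C -> D
]


def graph_reachable(reactions, seeds):
    """Graph projection: an edge from each substrate to each product,
    so a single available substrate is enough to 'reach' the products."""
    reached = set(seeds)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            if r.substrates & reached and not r.products <= reached:
                reached |= r.products
                changed = True
    return reached


def hypergraph_reachable(reactions, seeds):
    """Hypergraph semantics: a reaction fires only if all substrates are available."""
    reached = set(seeds)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            if r.substrates <= reached and not r.products <= reached:
                reached |= r.products
                changed = True
    return reached


from_a_graph = graph_reachable(reactions, {"A"})
from_a_hyper = hypergraph_reachable(reactions, {"A"})
from_ab_hyper = hypergraph_reachable(reactions, {"A", "B"})
print(sorted(from_a_graph), sorted(from_a_hyper), sorted(from_ab_hyper))
```

Starting from `A` alone, the graph projection reports that `C` and `D` can be produced, which is chemically wrong because `r1` also needs `B`. The hypergraph version produces nothing new from `A` alone and reaches `C` and `D` only when both `A` and `B` are supplied.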

    The role of visual adaptation in cichlid fish speciation

    D. Shane Wright (1), Ole Seehausen (2), Ton G.G. Groothuis (1), Martine E. Maan (1). (1) University of Groningen; GELIFES; EGDB. (2) Department of Fish Ecology & Evolution, EAWAG Centre for Ecology, Evolution and Biogeochemistry, Kastanienbaum, and Institute of Ecology and Evolution, Aquatic Ecology, University of Bern.
In less than 15,000 years, Lake Victoria cichlid fishes have radiated into as many as 500 different species. Ecological and sexual selection are thought to contribute to this ongoing speciation process, but genetic differentiation remains low. However, recent work on visual pigment genes (opsins) has revealed more diversity. Unlike neighboring Lakes Malawi and Tanganyika, Lake Victoria is highly turbid, resulting in a shift of the light spectrum towards longer wavelengths with increasing depth and providing an environmental gradient for exploring divergent coevolution in sensory systems and colour signals via sensory drive. Pundamilia pundamilia and Pundamilia nyererei are two sympatric species found at rocky islands across southern portions of Lake Victoria, differing in male colouration and in the depth at which they reside. Previous work has shown species differentiation in colour discrimination, corresponding to divergent female preferences for conspecific male colouration. A mechanistic link between colour vision and preference would provide a rapid route to reproductive isolation between divergently adapting populations. This link is tested by experimental manipulation of colour vision: raising both species and their hybrids under light conditions mimicking shallow and deep habitats. We quantify the expression of retinal opsins and test behaviours important for speciation: mate choice, habitat preference, and foraging performance.
