6,080 research outputs found

    Phylodynamic Reconstruction Reveals Norovirus GII.4 Epidemic Expansions and their Molecular Determinants

    Noroviruses are the most common cause of viral gastroenteritis. An increase in the number of globally reported norovirus outbreaks was seen in the past decade, especially for outbreaks caused by successive genogroup II genotype 4 (GII.4) variants. Whether this observed increase was due to an upswing in the number of infections, or to a surveillance artifact caused by heightened awareness and concomitant improved reporting, remained unclear. Therefore, we set out to study the population structure, and changes thereof, of GII.4 strains detected through systematic outbreak surveillance since the early 1990s. We collected 1383 partial polymerase and 194 full capsid GII.4 sequences. A Bayesian MCMC coalescent analysis revealed an increase in the number of GII.4 infections during the last decade. The GII.4 strains included in our analyses evolved at a rate of 4.3–9.0×10⁻³ mutations per site per year, and share a most recent common ancestor in the early 1980s. Determinants of adaptation in the capsid protein were studied using different maximum likelihood approaches to identify sites subject to diversifying or directional selection and sites that co-evolved. While a number of the computationally determined adaptively evolving sites were on the surface of the capsid and possibly subject to immune selection, we also detected sites that were subject to constrained or compensatory evolution due to secondary RNA structures relevant in virus replication. We highlight codons that may prove useful in identifying emerging novel variants and, using these, indicate that the novel 2008 variant is more likely to cause a future epidemic than the 2007 variant. While norovirus infections are generally mild and self-limiting, more severe outcomes of infection frequently occur in elderly and immunocompromised people, and no treatment is available. The observed pattern of continually emerging novel variants of GII.4, causing elevated numbers of infections, is therefore a cause for concern.

    Quantifying evolutionary constraints on B cell affinity maturation

    The antibody repertoire of each individual is continuously updated by the evolutionary process of B cell receptor mutation and selection. It has recently become possible to gain detailed information concerning this process through high-throughput sequencing. Here, we develop modern statistical molecular evolution methods for the analysis of B cell sequence data, and then apply them to a very deep short-read data set of B cell receptors. We find that the substitution process is conserved across individuals but varies significantly across gene segments. We investigate selection on B cell receptors using a novel method that side-steps the difficulties encountered by previous work in differentiating between selection and motif-driven mutation; this is done through stochastic mapping and empirical Bayes estimators that compare the evolution of in-frame and out-of-frame rearrangements. We use this new method to derive a per-residue map of selection, which provides a more nuanced view of the constraints on framework and variable regions. Comment: Previously entitled "Substitution and site-specific selection driving B cell affinity maturation is consistent across individuals".

    Molecular Variation at a Candidate Gene Implicated in the Regulation of Fire Ant Social Behavior

    The fire ant Solenopsis invicta and its close relatives display an important social polymorphism involving differences in colony queen number. Colonies are headed by either a single reproductive queen (monogyne form) or multiple queens (polygyne form). This variation in social organization is associated with variation at the gene Gp-9, with monogyne colonies harboring only B-like allelic variants and polygyne colonies always containing b-like variants as well. We describe naturally occurring variation at Gp-9 in fire ants based on 185 full-length sequences, 136 of which were obtained from S. invicta collected over much of its native range. While there is little overall differentiation between most of the numerous alleles observed, a surprising amount is found in the coding regions of the gene, with such substitutions usually causing amino acid replacements. This elevated coding-region variation may result from a lack of negative selection acting to constrain amino acid replacements over much of the protein, different mutation rates or biases in coding and non-coding sequences, negative selection acting with greater strength on non-coding than coding regions, and/or positive selection acting on the protein. Formal selection analyses provide evidence that the latter force played an important role in the basal b-like lineages coincident with the emergence of polygyny. While our data set reveals considerable paraphyly and polyphyly of S. invicta sequences with respect to those of other fire ant species, the b-like alleles of the socially polymorphic species are monophyletic. An expanded analysis of colonies containing alleles of this clade confirmed the invariant link between their presence and expression of polygyny. 
Finally, our discovery of several unique alleles bearing various combinations of b-like and B-like codons allows us to conclude that no single b-like residue is completely predictive of polygyne behavior and, thus, potentially causally involved in its expression. Rather, all three typical b-like residues appear to be necessary.

    Prevalence of Epistasis in the Evolution of Influenza A Surface Proteins

    The surface proteins of human influenza A viruses experience positive selection to escape both human immunity and, more recently, antiviral drug treatments. In bacteria and viruses, immune-escape and drug-resistant phenotypes often appear through a combination of several mutations that have epistatic effects on pathogen fitness. However, the extent and structure of epistasis in influenza viral proteins have not been systematically investigated. Here, we develop a novel statistical method to detect positive epistasis between pairs of sites in a protein, based on the observed temporal patterns of sequence evolution. The method rests on the simple idea that a substitution at one site should rapidly follow a substitution at another site if the sites are positively epistatic. We apply this method to the surface proteins hemagglutinin and neuraminidase of influenza A virus subtypes H3N2 and H1N1. Compared to a non-epistatic null distribution, we detect substantial amounts of epistasis and determine the identities of putatively epistatic pairs of sites. In particular, using sequence data alone, our method identifies epistatic interactions between specific sites in neuraminidase that have recently been demonstrated, in vitro, to confer resistance to the drug oseltamivir; these epistatic interactions are responsible for widespread drug resistance among H1N1 viruses circulating today. This experimental validation demonstrates the predictive power of our method to identify epistatic sites of importance for viral adaptation and public health. We conclude that epistasis plays a large role in shaping the molecular evolution of influenza viruses. In particular, sites with , which would normally not be identified as positively selected, can facilitate viral adaptation through epistatic interactions with their partner sites. 
Knowledge of specific interactions among sites in influenza proteins may help us to predict the course of antigenic evolution and, consequently, to select more appropriate vaccines and drugs.

    Bayesian codon models for detecting convergent molecular adaptation

    Modeling the interplay between mutation and selection at the molecular level is one of the primary goals in molecular evolution. Massive acquisition of genetic sequence data in recent years has provided a wealth of information for such empirically driven studies. Codon-based models are increasingly used to give a realistic description of the substitution process in protein-coding genes. Among them, the mechanistic codon-based modeling approach separately parameterizes the mutational and selective effects that combine in the overall substitution process. These mechanistic approaches characterize the selective pressure by relying on an explicit model of the amino acid fitness landscape over the sequence. Thus far, a constant fitness landscape has generally been assumed. 
Yet there are situations in which the fitness landscape fluctuates with the environment over time. When empirical knowledge is available about systematic differences in selective pressure across these environmental conditions, it is possible to explicitly model condition-specific amino acid fitness modulations. In this thesis, we developed a codon-based model to capture these differential condition-specific selective effects on coding sequences. This model was implemented in a Bayesian framework and was first applied to HIV, which evolves under the selection pressure of the host immune system. Our Differential Selection (DS) model describes the detailed mechanisms of evolution of HIV under the constraints defined by host genetic backgrounds (e.g., Human Leukocyte Antigen, HLA). It is therefore possible to find associations between specific viral adaptations and specific HLA alleles of the hosts. Ultimately, our approach will enable us to understand better how the virus escapes from the host immune response, which will, in turn, provide a useful guideline for designing an efficient vaccine against AIDS. We also applied the DS model to Rubisco, an enzyme responsible for a major step in photosynthesis. The evolution of Rubisco has been shown to be different in C3 and C4 plants, as a consequence of differing environmental conditions. We used the DS model to reveal consistent patterns of convergent adaptation in Rubisco in C4 plants, compared to C3 plants. Finally, we contrasted our results from the DS model with those obtained under classical codon models based on the estimation of dN/dS. This comparative analysis allows us to illustrate a fundamental conceptual difference between these two types of codon models, which are meant to detect different selective regimes: directional selection versus ongoing adaptation.

    A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity

    Ortholog detection (OD) is a critical step for comparative genomic analysis of protein-coding sequences. In this paper, we begin with a comprehensive comparison of four popular, methodologically diverse OD methods: MultiParanoid, Blat, Multiz, and OMA. In head-to-head comparisons, these methods are shown to significantly outperform one another 12-30% of the time. This high complementarity motivates the presentation of the first tool for integrating methodologically diverse OD methods. We term this program MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization. Relative to component and competing methods, we demonstrate that MOSAIC more than quintuples the number of alignments for which all species are present, while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, we demonstrate that this improvement in alignment quality yields 40-280% more confidently aligned sites. Combined, these factors translate to higher estimated levels of overall conservation, while at the same time allowing for the detection of up to 180% more positively selected sites. MOSAIC is available as a Python package. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC

    Codon pairs of the HIV-1 vif gene correlate with CD4+T cell count

    Background: The activity of the human APOBEC3G (A3G) protein is associated with innate immunity against HIV-1, as it induces high rates of guanosine-to-adenosine (G-to-A) mutations (viz., hypermutation) in the viral DNA. If hypermutation is not enough to disrupt the reading frames of viral genes, it is likely to increase HIV-1 diversity. To counteract host innate immunity, HIV-1 encodes the Vif protein, which binds the A3G protein and forms complexes that are degraded by cellular proteolysis. Methods: Here we studied the pattern of substitutions in the vif gene and its association with the clinical status of HIV-1 infected individuals. To perform the study, unique vif gene sequences were generated from 400 antiretroviral-naive individuals. Results: The codon pairs 78-154, 85-154, 101-157, 105-157, and 105-176 of the vif gene were associated with a CD4+ T cell count lower than 500 cells per mm³. Some of these codons were located in the (81)LGQGVSIEW(89) region and within the BC-Box. We also identified codons under positive selection clustered in the N-terminal region of the Vif protein, between the (21)WKSLVK(26) and (40)YRHHY(44) regions (i.e., 31, 33, 37, 39), within the BC-Box (i.e., 155, 159) and the Cullin5-Box (i.e., 168) of the vif gene. All these regions are involved in the Vif-induced degradation of A3G/F complexes, and the N-terminal region of the Vif protein binds to viral and cellular RNA. Conclusions: Adaptive evolution of the vif gene acted mostly to optimize viral RNA binding and A3G/F recognition. Additionally, since there is no fully resolved structure of the Vif protein, codon pairs associated with CD4+ T cell count may elucidate key regions that interact with host cell factors. 
Here we identified and discriminated codons under positive selection and codons under functional constraint in the vif gene of HIV-1. Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), grant 06/50109-5; Japan Society for the Promotion of Science (JSPS KAKENHI), grant 19300094. Affiliations: Universidade Federal de São Paulo, Dept Med, São Paulo, Brazil; Univ Tokyo, Grad Sch Agr & Life Sci, Tokyo, Japan; Fed Univ Para, Inst Biotechnol, BR-66059 Belem, Para, Brazil; Univ Vigo, Dept Biochem Genet & Immunol, Bioinformat & Mol Evolut Lab, Vigo 36310, Spain.

    The molecular phylogeny of placental mammals and its application to uncovering signatures of molecular adaptation.

    Considerable conflict remains in the literature as to the position of the root of placental mammals, and the placement of several intra-ordinal groups. Debate continues over the use of DNA or amino acid datasets and over the use of Supertree or Supermatrix approaches. Known phenomena exist within mammal data that complicate the reconstruction of phylogeny. These include (but are not limited to) variation in longevity, body size, metabolic rates, and germ-line generation time that result in variation in mutation rates and composition biases. Previous attempts to resolve the placental mammal phylogeny have used homogeneous evolutionary models that cannot capture and adequately describe these features across the species sampled. In this thesis I explore the properties of different datasets and data types and their suitability to the resolution of the mammal phylogeny at different depths: (i) the position of the root of the placental mammals, and (ii) the intra-ordinal placements within the Laurasiatheria. The datasets tested were (i) mitochondrial and nuclear data types, (ii) previously published datasets for mammals, and (iii) datasets I assembled specifically for analyses at different phylogenetic depths. I propose heterogeneous models and apply them to these datasets to resolve the position of the root of the placental mammal phylogeny. Reconstruction of a robust mammal phylogeny provides us with an essential framework for understanding the molecular underpinnings of adaptation to environment. Since they emerged ~100 million years ago, the placental mammals have come to display huge variation in life traits such as longevity, body size and DNA repair efficiency. With this robust phylogeny, I set out to determine the level of adaptive and non-adaptive processes acting on a set of mammal genes that are linked with longevity and cancer. 
The results of these analyses yield important insights into data and model suitability, and provide strong evidence for a single hypothesis for the rooting of placental mammals. These results also show that Laurasiatheria intra-ordinal placements are not fully resolved and that additional sampling from this diverse clade is required. Using this resolved phylogeny, specific molecular adaptations and non-adaptive mechanisms were identified in the Mammalia for a set of telomere-associated genes.

    Phylogenetic influence of complex, evolutionary models: a Bayesian approach

    Molecular evolution recovers the history of living species by comparing genetic information, exploring genome structure and function from an evolutionary perspective. Here we infer substitution rates and ancestral reconstructions, to better understand mutation responses to some known biochemical phenomena. Mutation processes are commonly inferred using parsimony, maximum likelihood and Bayesian methods. Parsimony is not explicitly model-based, and is statistically biased due to unrealistic assumptions. The model-based maximum likelihood approaches become computationally inefficient when analyzing large or high-dimensional datasets, leaving little opportunity to incorporate complex evolutionary models. We implemented a posterior probability (Bayesian) approach that evaluates evolutionary models, applying it to primate mitochondrial genomes. The species nucleotide sequence data were augmented with ancestral states at the internal nodes of the phylogeny. We simplified probability calculations for substitution events along the branches by assuming that only up to one or two substitution events occurred per branch per site. These conditional pathway calculations introduce very little bias into the inferred reconstructions, while increasing the feasibility of incorporating complex evolutionary models with higher dimensions. Compositional bias tests, including functional predictions of ancestral tRNAs, show that ancestral sequences from the Bayesian approach are more biologically realistic than those reconstructed by maximum likelihood. To explore further model complexity, we allowed substitution rates to vary among sites by having a different model at each site. With a strand-symmetric model as the base model, asymmetric substitution probabilities for specific substitution types were varied among sites. This model would not be feasible with standard matrix exponentiation methods, particularly maximum likelihood. 
We observed almost linear responses for A→G substitutions and almost asymptotic responses for C→T substitutions (with some regional deviations). Note that the HMM models had no a priori response built into them. Observed responses fitted predictions from earlier gene-by-gene likelihood analyses. For A→G substitutions, deviations from the expected linear response correlated positively with the loop-forming propensity of the corresponding site in the mRNA secondary structure. In the COI region, C→T substitutions have a prominent dip, suggesting protection against mutations. The C→T substitution responses differed significantly between primate sub-groups defined based on their single-genome A→G responses.

    HIV-1 infected monozygotic twins: a tale of two outcomes

    Background: Replicate experiments are often difficult to find in evolutionary biology, as this field is inherently a historical science. However, viruses, bacteria and phages provide opportunities to study evolution in both natural and experimental contexts, due to their accelerated rates of evolution and short generation times. Here we investigate HIV-1 evolution by using a natural model represented by monozygotic twins infected synchronically at birth with an HIV-1 population from a shared blood transfusion source. We explore the evolutionary processes and population dynamics that shape viral diversity of HIV in these monozygotic twins. Results: Despite the identical host genetic backdrop of monozygotic twins and the identical source and timing of the HIV-1 inoculation, the resulting HIV populations differed in genetic diversity, growth rate, recombination rate, and selection pressure between the two infected twins. Conclusions: Our study shows that the outcome of evolution is strikingly different between these two "replicates" of viral evolution. Given the identical starting points at infection, our results support the impact of random epigenetic selection in early infection dynamics. Our data also emphasize the need for a better understanding of the impact of host-virus interactions in viral evolution.