58 research outputs found

    HIV-Specific Probabilistic Models of Protein Evolution

    Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model-fitting procedure, applied it to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV, and that are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error.
We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.
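
The abstract above mentions turning a substitution model into a scoring matrix without saying how. One common way to do this, which may differ from the authors' procedure, is to take log-odds scores of the form log(P_ij(t) / pi_j), where P(t) = exp(Qt). The following is a minimal sketch on a hypothetical 3-letter alphabet; the exchangeabilities, frequencies, divergence time `t` and the `scale` factor are all illustrative assumptions, not values from the paper.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, terms=30):
    """Approximate exp(Q*t) with a truncated Taylor series (fine for small toy matrices)."""
    n = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # running term (Q*t)^k / k!
    qt = [[x * t for x in row] for row in q]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def reversible_rate_matrix(exch, pi):
    """Build Q with Q_ij = r_ij * pi_j (i != j) and rows summing to zero."""
    n = len(pi)
    q = [[exch[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q

def log_odds_scores(q, pi, t, scale=2.0):
    """Score S_ij = scale * log2(P_ij(t) / pi_j); scale is an arbitrary unit choice."""
    p = mat_exp(q, t)
    n = len(pi)
    return [[scale * math.log2(p[i][j] / pi[j]) for j in range(n)] for i in range(n)]

# Hypothetical toy inputs (not estimated from any real alignment).
PI = [0.5, 0.3, 0.2]
EXCH = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 0.5],
        [2.0, 0.5, 0.0]]
Q = reversible_rate_matrix(EXCH, PI)
SCORES = log_odds_scores(Q, PI, t=0.1)
```

Because the toy model is time-reversible, the resulting score matrix is symmetric. At short divergence times, identities score positive and substitutions score negative.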

    Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution

    Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models.
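
The "weighted sum of basis matrices" step in the abstract above can be illustrated with a minimal sketch. It assumes the basis matrices hold non-negative off-diagonal substitution rates and that the diagonal is filled in afterwards so each row sums to zero; the abstract does not spell out the exact parametrization, so this is an assumption. The function name, the 3-state alphabet and all numbers are hypothetical.

```python
def combine_basis(weights, bases):
    """Return Q = sum_k w_k * B_k on the off-diagonal, with rows summing to zero."""
    if len(weights) != len(bases):
        raise ValueError("need exactly one weight per basis matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(bases[0])
    q = [[0.0] * n for _ in range(n)]
    for w, basis in zip(weights, bases):
        for i in range(n):
            for j in range(n):
                if i != j:
                    q[i][j] += w * basis[i][j]
    # Fill the diagonal so every row of the rate matrix sums to zero.
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q

# Two hypothetical basis matrices learned from a large, general dataset.
BASES = [
    [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
    [[0.0, 0.2, 2.0], [0.2, 0.0, 0.5], [2.0, 0.5, 0.0]],
]
# Hypothetical alignment-specific weights, the only parameters fitted per alignment.
WEIGHTS = [0.7, 0.3]
Q_SPECIFIC = combine_basis(WEIGHTS, BASES)
```

In this sketch, only `len(WEIGHTS)` numbers are fitted per alignment rather than a full rate matrix. This reflects the abstract's point that the alignment-specific model needs far fewer parameters than a specialist model.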

    Evidence for Positive Selection in Putative Virulence Factors within the Paracoccidioides brasiliensis Species Complex

    Paracoccidioides brasiliensis is a dimorphic fungus that is the causative agent of paracoccidioidomycosis, the most important prevalent systemic mycosis in Latin America. Recently, the existence of three genetically isolated groups in P. brasiliensis was demonstrated, enabling comparative studies of molecular evolution among P. brasiliensis lineages. Thirty-two gene sequences coding for putative virulence factors were analyzed to determine whether they were under positive selection. Our maximum likelihood–based approach yielded evidence for selection in 12 genes that are involved in different cellular processes. An in-depth analysis of four of these genes showed them to be either antigenic or involved in pathogenesis. Here, we present evidence indicating that several replacement mutations in gp43 are under positive balancing selection. The other three genes (fks, cdc42 and p27) show very little variation among the P. brasiliensis lineages and appear to be under positive directional selection. Our results are consistent with the more general observations that selective constraints are variable across the genome, and that even in the genes under positive selection, only a few sites are altered. We present our results within an evolutionary framework that may be applicable for studying adaptation and pathogenesis in P. brasiliensis and other pathogenic fungi.

    Evolution and Phylogenetic Analysis of Full-Length VP3 Genes of Eastern Mediterranean Bluetongue Virus Isolates

    Bluetongue virus (BTV) is the ‘type’ species of the genus Orbivirus within the family Reoviridae. The BTV genome is composed of ten linear segments of double-stranded RNA (dsRNA), each of which codes for one of ten distinct viral proteins. Previous phylogenetic comparisons have evaluated variations in genome segment 3 (Seg-3) nucleotide sequence as a way to identify the geographical origin (different topotypes) of BTV isolates. The full-length nucleotide sequence of genome Seg-3 was determined for thirty BTV isolates recovered in the eastern Mediterranean region, the Balkans and other geographic areas (Spain, India, Malaysia and Africa). These data were compared, based on molecular variability, positive-selection analysis and maximum-likelihood phylogenetic reconstructions (using appropriate substitution models), to 24 previously published sequences, revealing their evolutionary relationships. These analyses indicate that negative selection is a major force in the evolution of BTV, restricting nucleotide variability, reducing the evolutionary rate of Seg-3 and potentially of other regions of the BTV genome. Phylogenetic analysis of the BTV-4 strains isolated over a relatively long time interval (1979–2000), in a single geographic area (Greece), showed a low level of nucleotide diversity, indicating that the virus can circulate almost unchanged for many years. These analyses also show that the recent incursions into south-eastern Europe were caused by BTV strains belonging to two different major lineages, representing an ‘eastern’ (BTV-9, -16 and -1) and a ‘western’ (BTV-4) group/topotype. Epidemiological and phylogenetic analyses indicate that these viruses originated from a geographic area to the east and southeast of Greece (including Cyprus and the Middle East), which appears to represent an important ecological niche for the virus that is likely to represent a continuing source of future BTV incursions into Europe.

    Shifting patterns of natural variation in the nuclear genome of Caenorhabditis elegans

    BACKGROUND: Genome wide analysis of variation within a species can reveal the evolution of fundamental biological processes such as mutation, recombination, and natural selection. We compare genome wide sequence differences between two independent isolates of the nematode Caenorhabditis elegans (CB4856 and CB4858) and the reference genome (N2). RESULTS: The base substitution pattern when comparing N2 against CB4858 reveals a transition over transversion bias (1.32:1) that is not present in CB4856. In CB4856, there is a significant bias in the direction of base substitution. The frequency of A or T bases in N2 that are G or C bases in CB4856 outnumber the opposite frequencies for transitions as well as transversions. These differences were not observed in the N2/CB4858 comparison. Similarly, we observed a strong bias for deletions over insertions in CB4856 (1.44:1) that is not present in CB4858. In both CB4856 and CB4858, there is a significant correlation between SNP rate and recombination rate on the autosomes but not on the X chromosome. Furthermore, we identified numerous significant hotspots of variation in the CB4856-N2 comparison. In both CB4856 and CB4858, based on a measure of the strength of selection (ka/ks), all the chromosomes are under negative selection and in CB4856, there is no difference in the strength of natural selection in either the autosomes versus X or between any of the chromosomes. By contrast, in CB4858, ka/ks values are smaller in the autosomes than in the X chromosome. In addition, in CB4858, ka/ks values differ between chromosomes. CONCLUSIONS: The clear bias of deletions over insertions in CB4856 suggests that either the CB4856 genome is becoming smaller or the N2 genome is getting larger. We hypothesize the hotspots found represent alleles that are shared between CB4856 and CB4858 but not N2.
Because the ka/ks ratio in the X chromosome is higher than the autosomes on average in CB4858, purifying selection is reduced on the X chromosome.
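
The transition-to-transversion ratio reported above is a simple count ratio. The sketch below shows how such a ratio could be computed from a list of (reference base, isolate base) pairs. The function names and the example substitutions are hypothetical and are not taken from the study's data or pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """A transition swaps purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T)."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES

def ts_tv_ratio(substitutions):
    """Return (#transitions / #transversions) for an iterable of (ref, alt) base pairs."""
    transitions = transversions = 0
    for ref, alt in substitutions:
        if ref == alt:
            continue  # not a substitution
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions

# Hypothetical substitutions between a reference and an isolate.
EXAMPLE = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
RATIO = ts_tv_ratio(EXAMPLE)  # 3 transitions, 2 transversions
```

Each base has two possible transversions and only one possible transition, so if all substitutions were equally likely the ratio would be 0.5. This is why a value such as 1.32:1 is read as a bias towards transitions.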

    Parps: Rapidly Evolving Weapons in the War against Viral Infection

    Post-translational protein modifications such as phosphorylation and ubiquitinylation are common molecular targets of conflict between viruses and their hosts. However, the role of other post-translational modifications, such as ADP-ribosylation, in host-virus interactions is less well characterized. ADP-ribosylation is carried out by proteins encoded by the PARP (also called ARTD) gene family. The majority of the 17 human PARP genes are poorly characterized. However, one PARP protein, PARP13/ZAP, has broad antiviral activity and has evolved under positive (diversifying) selection in primates. Such evolution is typical of domains that are locked in antagonistic 'arms races' with viral factors. To identify additional PARP genes that may be involved in host-virus interactions, we performed evolutionary analyses on all primate PARP genes to search for signatures of rapid evolution. Contrary to expectations that most PARP genes are involved in 'housekeeping' functions, we found that nearly one-third of PARP genes are evolving under strong recurrent positive selection. We identified a >300 amino acid disordered region of PARP4, a component of cytoplasmic vault structures, to be rapidly evolving in several mammalian lineages, suggesting this region serves as an important host-pathogen specificity interface. We also found positive selection of PARP9, 14 and 15, the only three human genes that contain both PARP domains and macrodomains. Macrodomains uniquely recognize, and in some cases can reverse, protein mono-ADP-ribosylation, and we observed strong signatures of recurrent positive selection throughout the macro-PARP macrodomains. Furthermore, PARP14 and PARP15 have undergone repeated rounds of gene birth and loss during vertebrate evolution, consistent with recurrent gene innovation. 
Together with previous studies that implicated several PARPs in immunity, as well as those that demonstrated a role for virally encoded macrodomains in host immune evasion, our evolutionary analyses suggest that addition, recognition and removal of ADP-ribosylation is a critical, underappreciated currency in host-virus conflicts.

    The Development of Three Long Universal Nuclear Protein-Coding Locus Markers and Their Application to Osteichthyan Phylogenetics with Nested PCR

    BACKGROUND: Universal nuclear protein-coding locus (NPCL) markers that are applicable across diverse taxa and show good phylogenetic discrimination have broad applications in molecular phylogenetic studies. For example, RAG1, a representative NPCL marker, has been successfully used to make phylogenetic inferences within all major osteichthyan groups. However, such markers with broad working range and high phylogenetic performance are still scarce. It is necessary to develop more universal NPCL markers comparable to RAG1 for osteichthyan phylogenetics. METHODOLOGY/PRINCIPAL FINDINGS: We developed three long universal NPCL markers (>1.6 kb each) based on single-copy nuclear genes (KIAA1239, SACS and TTN) that possess large exons and exhibit the appropriate evolutionary rates. We then compared their phylogenetic utilities with that of the reference marker RAG1 in 47 jawed vertebrate species. In comparison with RAG1, each of the three long universal markers yielded similar topologies and branch supports, all in congruence with the currently accepted osteichthyan phylogeny. To compare their phylogenetic performance visually, we also estimated the phylogenetic informativeness (PI) profile for each of the four long universal NPCL markers. The PI curves indicated that SACS performed best over the whole timescale, while RAG1, KIAA1239 and TTN exhibited similar phylogenetic performances. In addition, we compared the success of nested PCR and standard PCR when amplifying NPCL marker fragments. The amplification success rate and efficiency of the nested PCR were overwhelmingly higher than those of standard PCR. CONCLUSIONS/SIGNIFICANCE: Our work clearly demonstrates the superiority of nested PCR over the conventional PCR in phylogenetic studies and develops three long universal NPCL markers (KIAA1239, SACS and TTN) with the nested PCR strategy. 
The three markers exhibit high phylogenetic utilities in osteichthyan phylogenetics and can be widely used as pilot genes for phylogenetic questions of osteichthyans at different taxonomic levels.

    Accelerated Evolution of Mitochondrial but Not Nuclear Genomes of Hymenoptera: New Evidence from Crabronid Wasps

    Mitochondrial genes in animals are especially useful as molecular markers for the reconstruction of phylogenies among closely related taxa, due to the generally high substitution rates. Several insect orders, notably Hymenoptera and Phthiraptera, show exceptionally high rates of mitochondrial molecular evolution, which has been attributed to the parasitic lifestyle of current or ancestral members of these taxa. Parasitism has been hypothesized to entail frequent population bottlenecks that increase rates of molecular evolution by reducing the efficiency of purifying selection. This effect should result in elevated substitution rates of both nuclear and mitochondrial genes, but to date no extensive comparative study has tested this hypothesis in insects. Here we report the mitochondrial genome of a crabronid wasp, the European beewolf (Philanthus triangulum, Hymenoptera, Crabronidae), and we use it to compare evolutionary rates among the four largest holometabolous insect orders (Coleoptera, Diptera, Hymenoptera, Lepidoptera) based on phylogenies reconstructed with whole mitochondrial genomes as well as four single-copy nuclear genes (18S rRNA, arginine kinase, wingless, phosphoenolpyruvate carboxykinase). The mt-genome of P. triangulum is 16,029 bp in size with a mean A+T content of 83.6%, and it encodes the 37 genes typically found in arthropod mt genomes (13 protein-coding, 22 tRNA, and two rRNA genes). Five translocations of tRNA genes were discovered relative to the putative ancestral genome arrangement in insects, and the unusual start codon TTG was predicted for cox2. Phylogenetic analyses revealed significantly longer branches leading to the apocritan Hymenoptera as well as the Orussoidea, to a lesser extent the Cephoidea, and, possibly, the Tenthredinoidea than any of the other holometabolous insect orders for all mitochondrial but none of the four nuclear genes tested. 
Thus, our results suggest that the ancestral parasitic lifestyle of Apocrita is unlikely to be the major cause for the elevated substitution rates observed in hymenopteran mitochondrial genomes.

    Origin and Evolution of TRIM Proteins: New Insights from the Complete TRIM Repertoire of Zebrafish and Pufferfish

    Tripartite motif proteins (TRIM) constitute a large family of proteins containing a RING-Bbox-Coiled Coil motif followed by different C-terminal domains. Involved in ubiquitination, TRIM proteins participate in many cellular processes including antiviral immunity. The TRIM family is ancient and has been greatly diversified in vertebrates and especially in fish. We analyzed the complete sets of trim genes of the large zebrafish genome and of the compact pufferfish genome. Both contain three large multigene subsets - adding the hsl5/trim35-like genes (hltr) to the ftr and the btr that we previously described - all containing a B30.2 domain that evolved under positive selection. These subsets are conserved among teleosts. By contrast, most human trim genes of the other classes have only one or two orthologues in fish. Loss or gain of C-terminal exons generated proteins with different domain organizations, either by the deletion of the ancestral domain or, remarkably, by the acquisition of a new C-terminal domain. Our survey of fish trim genes identifies subsets with different evolutionary dynamics. trims encoding RBCC-B30.2 proteins show the same evolutionary trends in fish and tetrapods: they evolve fast, often under positive selection, and they duplicate to create multigenic families. We could identify new combinations of domains, which epitomize how new trim classes appear by domain insertion or exon shuffling. Notably, we found that a cyclophilin-A domain replaces the B30.2 domain of a zebrafish fintrim gene, as reported in the macaque and owl monkey antiretroviral TRIM5α. Finally, trim genes encoding RBCC-B30.2 proteins are preferentially located in the vicinity of MHC or MHC gene paralogues, which suggests that such trim genes may have been part of the ancestral MHC.

    Inferring selection in the Anopheles gambiae species complex: an example from immune-related serine protease inhibitors

    BACKGROUND: Mosquitoes of the Anopheles gambiae species complex are the primary vectors of human malaria in sub-Saharan Africa. Many host genes have been shown to affect Plasmodium development in the mosquito, and so are expected to engage in an evolutionary arms race with the pathogen. However, there is little conclusive evidence that any of these mosquito genes evolve rapidly, or show other signatures of adaptive evolution. METHODS: Three serine protease inhibitors have previously been identified as candidate immune system genes mediating mosquito-Plasmodium interaction, and serine protease inhibitors have been identified as hot-spots of adaptive evolution in other taxa. Population-genetic tests for selection, including a recent multi-gene extension of the McDonald-Kreitman test, were applied to 16 serine protease inhibitors and 16 other genes sampled from the An. gambiae species complex in both East and West Africa. RESULTS: Serine protease inhibitors were found to show a marginally significant trend towards higher levels of amino acid diversity than other genes, and display extensive genetic structuring associated with the 2La chromosomal inversion. However, although serpins are candidate targets for strong parasite-mediated selection, no evidence was found for rapid adaptive evolution in these genes. CONCLUSION: It is well known that phylogenetic and population history in the An. gambiae complex can present special problems for the application of standard population-genetic tests for selection, and this may explain the failure of this study to detect selection acting on serine protease inhibitors. The pitfalls of uncritically applying these tests in this species complex are highlighted, and the future prospects for detecting selection acting on the An. gambiae genome are discussed.
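
For orientation, the classic single-gene McDonald-Kreitman test compares nonsynonymous and synonymous counts for fixed differences (Dn, Ds) and for polymorphisms (Pn, Ps). The sketch below computes two standard summaries of that 2x2 table: the neutrality index and alpha, an estimate of the proportion of adaptive substitutions. It does not implement the multi-gene extension used in the study, and the counts are hypothetical.

```python
def mk_summary(dn, ds, pn, ps):
    """Return (neutrality_index, alpha) for one gene's McDonald-Kreitman table.

    dn, ds: nonsynonymous / synonymous fixed differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within species
    """
    if dn == 0 or ds == 0 or ps == 0:
        raise ValueError("dn, ds and ps must be non-zero for the ratios to be defined")
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1.0 - neutrality_index  # > 0 suggests an excess of adaptive fixations
    return neutrality_index, alpha

# Hypothetical counts for a single gene (not data from the study).
NI, ALPHA = mk_summary(dn=10, ds=20, pn=5, ps=40)
```

A neutrality index below 1 (alpha above 0) points to an excess of nonsynonymous fixed differences, as expected under recurrent adaptive evolution. Values near 1 are consistent with neutrality. As the abstract cautions, population structure and shared history can distort these counts.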