314 research outputs found

    Somatic genome architecture and molecular evolution are decoupled in "young" linage-specific gene families in ciliates.

    The evolution of lineage-specific gene families remains poorly studied across the eukaryotic tree of life, with most analyses focusing on the recent evolution of de novo genes in model species. Here we explore the origins of lineage-specific genes in ciliates, a ~1 billion year old clade of microeukaryotes that are defined by their division of somatic and germline functions into distinct nuclei. Previous analyses of conserved gene families have shown the effect of ciliates' unusual genome architecture on gene family evolution: extensive genome processing (the generation of thousands of gene-sized somatic chromosomes from canonical germline chromosomes) is associated with larger and more diverse gene families. To further study the relationship between ciliate genome architecture and gene family evolution, we analyzed lineage-specific gene families from a set of 46 transcriptomes and 12 genomes representing x species from eight ciliate classes. We assess how the evolution of lineage-specific gene families occurs among four groups of ciliates: extensive fragmenters with gene-size somatic chromosomes, non-extensive fragmenters with "large" multi-gene somatic chromosomes, Heterotrichea with highly polyploid somatic genomes and Karyorelictea with 'paradiploid' somatic genomes. Our analyses demonstrate that: 1) most lineage-specific gene families are found at shallow taxonomic scales; 2) extensive genome processing (i.e., gene unscrambling) during development likely influences the size and number of young lineage-specific gene families; and 3) the influence of somatic genome architecture on molecular evolution is increasingly apparent in older gene families. Altogether, these data highlight the influences of genome architecture on the evolution of lineage-specific gene families in eukaryotes.

    Assigning and visualizing germline genes in antibody repertoires.

    Identifying the germline genes involved in immunoglobulin rearrangements is an essential first step in the analysis of antibody repertoires. Based on our prior work in analysing diverse recombinant viruses, we present IgSCUEAL (Immunoglobulin Subtype Classification Using Evolutionary ALgorithms), a phylogenetic approach to assign V and J regions of immunoglobulin sequences to their corresponding germline alleles, with D regions assigned using a simple pairwise alignment algorithm. We also develop an interactive web application for viewing the results, allowing the user to explore the frequency distribution of sequence assignments and CDR3 region length statistics, which is useful for summarizing repertoires, as well as a detailed viewer of rearrangements and region alignments for individual query sequences. We demonstrate the accuracy and utility of our method compared with sequence similarity-based approaches and other non-phylogenetic model-based approaches, using both simulated data and a set of evaluation datasets of human immunoglobulin heavy chain sequences. IgSCUEAL demonstrates the highest accuracy of V and J assignment amongst existing approaches, even when the reassorted sequence is highly mutated, and can successfully cluster sequences on the basis of shared V/J germline alleles.
    S.K.L.P. and B.M. were supported in part by the U.S. National Institutes of Health (AI110181, AI90970, AI100665, DA34978, GM93939, HL108460, GM110749, LM7092, MH97520, MH83552), the UCSD Center for AIDS Research (Developmental Grant, AI36214, Bioinformatics and Information Technologies Core), the International AIDS Vaccine Initiative (through AI90970), and the UC Laboratory Fees Research Program (grant no. 12-LR-236617). G.J.S. was supported in part by the U.S. National Institutes of Health (AI90118, AI68063, AI40305, and NIAID HHS N272201400019C) and a grant from the Lupus Research Institute. A.S.M.M.H. was supported by an Islamic Development Bank Scholarship, and S.D.W.F. was supported in part by the UK MRC Methodology Research Programme (grant no. MR/J013862/1).
    This is the final published version. It first appeared at http://rstb.royalsocietypublishing.org/content/370/1676/20140240

    Evolutionary Interactions between N-Linked Glycosylation Sites in the HIV-1 Envelope

    The addition of asparagine (N)-linked polysaccharide chains (i.e., glycans) to the gp120 and gp41 glycoproteins of human immunodeficiency virus type 1 (HIV-1) envelope is not only required for correct protein folding, but also may provide protection against neutralizing antibodies as a “glycan shield.” As a result, strong host-specific selection is frequently associated with codon positions where nonsynonymous substitutions can create or disrupt potential N-linked glycosylation sites (PNGSs). Moreover, empirical data suggest that the individual contribution of PNGSs to the neutralization sensitivity or infectivity of HIV-1 may be critically dependent on the presence or absence of other PNGSs in the envelope sequence. Here we evaluate how glycan–glycan interactions have shaped the evolution of HIV-1 envelope sequences by analyzing the distribution of PNGSs in a large-sequence alignment. Using a “covarion”-type phylogenetic model, we find that the rates at which individual PNGSs are gained or lost vary significantly over time, suggesting that the selective advantage of having a PNGS may depend on the presence or absence of other PNGSs in the sequence. Consequently, we identify specific interactions between PNGSs in the alignment using a new paired-character phylogenetic model of evolution, and a Bayesian graphical model. Despite the fundamental differences between these two methods, several interactions are jointly identified by both. Mapping these interactions onto a structural model of HIV-1 gp120 reveals that negative (exclusive) interactions occur significantly more often between colocalized glycans, while positive (inclusive) interactions are restricted to more distant glycans. Our results imply that the adaptive repertoire of alternative configurations in the HIV-1 glycan shield is limited by functional interactions between the N-linked glycans. This represents a potential vulnerability of rapidly evolving HIV-1 populations that may provide useful glycan-based targets for neutralizing antibodies.
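
    For readers unfamiliar with how PNGSs are located in a sequence, the following is a minimal sketch, not the authors' pipeline. It assumes the commonly used sequon definition N-X-S/T, where X is any residue except proline, and an ungapped amino acid string. The function name `find_pngs` and the example peptide are hypothetical.

```python
import re

# Sequon for a potential N-linked glycosylation site (assumed definition):
# asparagine (N), then any residue except proline, then serine or threonine.
# The lookahead lets overlapping sequons (e.g. "NNTS") be reported separately.
PNGS_PATTERN = re.compile(r"N(?=[^P][ST])")


def find_pngs(protein: str) -> list[int]:
    """Return 0-based positions of the asparagine in each N-X-[ST] sequon."""
    return [match.start() for match in PNGS_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    peptide = "MNGTANPSNKS"  # hypothetical example sequence
    print(find_pngs(peptide))
```

    Applied column by column to an alignment (after removing gap characters per sequence), the presence or absence of such sites yields the kind of binary character data that the abstract describes analyzing.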

    An Evolutionary-Network Model Reveals Stratified Interactions in the V3 Loop of the HIV-1 Envelope

    The third variable loop (V3) of the human immunodeficiency virus type 1 (HIV-1) envelope is a principal determinant of antibody neutralization and progression to AIDS. Although it is undoubtedly an important target for vaccine research, extensive genetic variation in V3 remains an obstacle to the development of an effective vaccine. Comparative methods that exploit the abundance of sequence data can detect interactions between residues of rapidly evolving proteins such as the HIV-1 envelope, revealing biological constraints on their variability. However, previous studies have relied implicitly on two biologically unrealistic assumptions: (1) that founder effects in the evolutionary history of the sequences can be ignored; and (2) that statistical associations between residues occur exclusively in pairs. We show that comparative methods that neglect the evolutionary history of extant sequences are susceptible to a high rate of false positives (20%–40%). Therefore, we propose a new method to detect interactions that relaxes both of these assumptions. First, we reconstruct the evolutionary history of extant sequences by maximum likelihood, shifting focus from extant sequence variation to the underlying substitution events. Second, we analyze the joint distribution of substitution events among positions in the sequence as a Bayesian graphical model, in which each branch in the phylogeny is a unit of observation. We perform extensive validation of our models using both simulations and a control case of known interactions in HIV-1 protease, and apply this method to detect interactions within V3 from a sample of 1,154 HIV-1 envelope sequences. Our method greatly reduces the number of false positives due to founder effects, while capturing several higher-order interactions among V3 residues. By mapping these interactions to a structural model of the V3 loop, we find that the loop is stratified into distinct evolutionary clusters. We extend our model to detect interactions between the V3 and C4 domains of the HIV-1 envelope, and account for the uncertainty in mapping substitutions to the tree with a parametric bootstrap.

    Modeling HIV-1 Drug Resistance as Episodic Directional Selection

    The evolution of substitutions conferring drug resistance to HIV-1 is both episodic, occurring when patients are on antiretroviral therapy, and strongly directional, with site-specific resistant residues increasing in frequency over time. While methods exist to detect episodic diversifying selection and continuous directional selection, no evolutionary model combining these two properties has been proposed. We present two models of episodic directional selection (MEDS and EDEPS) which allow the a priori specification of lineages expected to have undergone directional selection. The models infer the sites and target residues that were likely subject to directional selection, using either codon or protein sequences. Compared to its null model of episodic diversifying selection, MEDS provides a superior fit to most sites known to be involved in drug resistance, and neither a test for episodic diversifying selection nor one for constant directional selection detects as many true positives as MEDS and EDEPS while maintaining acceptable levels of false positives. This suggests that episodic directional selection is a better description of the process driving the evolution of drug resistance.

    CodonTest: Modeling Amino Acid Substitution Preferences in Coding Sequences

    Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene- and organism-specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.
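
    To make concrete what "amino acid substitution pairs" means in a codon model, here is a small illustrative sketch. It is not the CodonTest implementation and contains no genetic algorithm. It assumes the standard genetic code, and the helper names `classify_change` and `one_step_amino_acid_pairs` are hypothetical. It classifies single-nucleotide codon changes and enumerates the amino acid pairs that such changes can connect, which are the units a multi-rate model would assign to rate classes.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_a: str, codon_b: str) -> str:
    """Classify a codon change by the number of differing bases and the encoded residues."""
    differences = sum(x != y for x, y in zip(codon_a, codon_b))
    if differences == 0:
        return "identical"
    if differences > 1:
        return "multiple"  # more than one nucleotide differs
    same_residue = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return "synonymous" if same_residue else "nonsynonymous"


def one_step_amino_acid_pairs() -> set[tuple[str, str]]:
    """Amino acid pairs connected by one nucleotide change between sense codons."""
    pairs = set()
    for codon_a, codon_b in product(CODON_TABLE, repeat=2):
        aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
        if "*" in (aa_a, aa_b):
            continue  # skip changes to or from stop codons
        if classify_change(codon_a, codon_b) == "nonsynonymous":
            pairs.add(tuple(sorted((aa_a, aa_b))))
    return pairs


if __name__ == "__main__":
    print(classify_change("AAA", "AAG"))  # lysine to lysine
    print(classify_change("AAA", "AGA"))  # lysine to arginine
    print(len(one_step_amino_acid_pairs()), "one-step amino acid pairs")
```

    A single-rate codon model treats every pair returned by `one_step_amino_acid_pairs` identically; the approach described in the abstract instead partitions these pairs into rate classes.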

    The biological and clinical significance of emerging SARS-CoV-2 variants.

    The past several months have witnessed the emergence of SARS-CoV-2 variants with novel spike protein mutations that are influencing the epidemiological and clinical aspects of the COVID-19 pandemic. These variants can increase rates of virus transmission and/or increase the risk of reinfection and reduce the protection afforded by neutralizing monoclonal antibodies and vaccination. These variants can therefore enable SARS-CoV-2 to continue its spread in the face of rising population immunity while maintaining or increasing its replication fitness. The identification of four rapidly expanding virus lineages since December 2020, designated variants of concern, has ushered in a new stage of the pandemic. The four variants of concern, the Alpha variant (originally identified in the UK), the Beta variant (originally identified in South Africa), the Gamma variant (originally identified in Brazil) and the Delta variant (originally identified in India), share several mutations with one another as well as with an increasing number of other recently identified SARS-CoV-2 variants. Collectively, these SARS-CoV-2 variants complicate the COVID-19 research agenda and necessitate additional avenues of laboratory, epidemiological and clinical research.

    HIV-Specific Probabilistic Models of Protein Evolution

    Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model-fitting procedure, applied it to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV, and that are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error. We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.
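
    As background on how a substitution model can yield a scoring matrix, the standard log-odds construction is sketched below. Whether the authors use exactly this form and scaling is an assumption; the symbols $Q$ (instantaneous rate matrix), $\pi$ (equilibrium amino acid frequencies), $t$ (evolutionary distance) and $\lambda$ (scaling constant) are introduced here for illustration only.

```latex
P(t) = e^{Qt}, \qquad
S_{ij}(t) = \frac{1}{\lambda}\,\log\frac{\pi_i\,P_{ij}(t)}{\pi_i\,\pi_j}
          = \frac{1}{\lambda}\,\log\frac{P_{ij}(t)}{\pi_j}
```

    Here $\pi_i P_{ij}(t)$ is the probability of observing residues $i$ and $j$ aligned at distance $t$ under the model, and $\pi_i \pi_j$ is the probability of the same pair arising by chance, so positive scores mark pairs seen more often than expected.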

    Adaptation to Different Human Populations by HIV-1 Revealed by Codon-Based Analyses

    Several codon-based methods are available for detecting adaptive evolution in protein-coding sequences, but to date none specifically identify sites that are selected differentially in two populations, although such comparisons between populations have been historically useful in identifying the action of natural selection. We have developed two fixed effects maximum likelihood methods: one for identifying codon positions showing selection patterns that persist in a population and another for detecting whether selection is operating differentially on individual codons of a gene sampled from two different populations. Applying these methods to two HIV populations infecting genetically distinct human hosts, we have found that few of the positively selected amino acid sites persist in the population; the other changes are detected only at the tips of the phylogenetic tree and appear deleterious in the long term. Additionally, we have identified seven amino acid sites in protease and reverse transcriptase that are selected differentially in the two samples, demonstrating specific population-level adaptation of HIV to human populations.

    Synonymous Substitution Rates Predict HIV Disease Progression as a Result of Underlying Replication Dynamics

    Upon HIV transmission, some patients develop AIDS in only a few months, while others remain disease free for 20 or more years. This variation in the rate of disease progression is poorly understood and has been attributed to host genetics, host immune responses, co-infection, viral genetics, and adaptation. Here, we develop a new “relaxed-clock” phylogenetic method to estimate absolute rates of synonymous and nonsynonymous substitution through time. We identify an unexpected association between the synonymous substitution rate of HIV and disease progression parameters. Since immune activation is the major determinant of HIV disease progression, we propose that this process can also determine viral generation times, by creating favourable conditions for HIV replication. These conclusions may apply more generally to HIV evolution, since we also observed an overall low synonymous substitution rate for HIV-2, which is known to be less pathogenic than HIV-1 and capable of tempering the detrimental effects of immune activation. Humoral immune responses, on the other hand, are the major determinant of nonsynonymous rate changes through time in the envelope gene, and our relaxed-clock estimates support a decrease in selective pressure as a consequence of immune system collapse.