176 research outputs found

    Phylogenetic estimates of HIV-1 gp120 indel rates across the group M subtypes

    Insertions and deletions (indels) in the HIV-1 envelope glycoprotein gp120 play a significant role in the evolution of HIV pathogenesis and transmission fitness. While substitution rates in HIV-1 are well characterized by phylogenetic models, there is a lack of quantitative measures of indel rates in HIV-1. Here we use a dated-tip phylogenetic analysis of gp120 sequences to estimate indel rates for 7 subtypes and CRFs of HIV-1 group M. We obtained and processed 26,359 HIV-1 gp120 sequences from the Los Alamos National Laboratory HIV Sequence database. After filtering these sequences, we extracted the conserved and variable regions from the remaining 6,605 sequences by pairwise alignment. We used FastTree2 to reconstruct phylogenies from the alignment of concatenated conserved regions, and used least-squares dating (LSD) to rescale these trees in time. We estimated variable region indel rates by fitting a binomial-Poisson model to length discordance in sequences related by cherries. Indel rate estimates ranged from 3e-5 to 1.5e-3/nt/year and varied significantly among variable regions and subtypes; e.g., rates were significantly lower for subtype B. Variable regions V1, V2 and V4 accumulated significantly longer indels irrespective of subtype, and we found evidence of positive selection for indels affecting N-linked glycosylation sites in V1/V2. Further, we observed that indel sequences were enriched for G and depleted for T relative to the flanking sequences. Our results comprise the first phylogenetic measures of indel rates in HIV-1 gp120 across subtypes and variable regions, and identify novel and unexpected patterns for further investigation into HIV-1 evolution

    An Application of the Modifiable Areal Unit Problem: Optimizing Cluster Method Parameters to Produce Predictive Data for HIV Outbreaks

    Background: A popular approach to studying HIV outbreaks is to cluster cases based on genetic similarity. However, there is no widely used statistical criterion for optimizing the parameters of sequence-based clustering methods. The relationship between a cluster-defining similarity threshold and its associated set of clusters is analogous to the aggregation level in the Modifiable Areal Unit Problem (MAUP). Hypothesis: Based on the selection of aggregation level for study partitions in MAUP, we present a statistical framework to optimize the similarity threshold for the pairwise distance algorithm TN93 (http://github.com/veg/tn93). We hypothesize that a threshold defined this way includes the case connections that yield the most predictive clusters for the purposes of public health. Methods: We obtained 1,653 published HIV-1 pol sequences from Seattle, USA. The sequences were aligned using MAFFT and coupled with sampling dates from GenBank. Years ranged from 2000 to 2013, with 2013 cases reflecting cluster growth. TN93 computed pairwise distances between sequences, and an R script interpreted these distances as an annotated, undirected network. Edges between cases were included in this network based on a cutoff d, which was varied from 0 to 0.06 in steps of 0.001. Based on a Poisson-linked linear model with the cluster growth outcome predicted by cluster size, we calculated the Generalized Akaike Information Criterion (GAIC) for the network at each value of d. Results: GAIC was minimized at d = 0.036, notably larger than the values often used in the literature. Common values in the literature fall within maximum deviance peaks.
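The thresholding step described in this abstract can be illustrated with a minimal sketch. It assumes that an edge joins two cases whenever their pairwise distance is at most d (whether the study used a strict or non-strict comparison is not stated), and the function name `cluster_by_threshold` and the toy distances are hypothetical, not taken from the study or from TN93.

```python
def cluster_by_threshold(distances, n_cases, d):
    """Group cases into connected components, linking pairs with distance <= d.

    distances: dict mapping (i, j) index pairs to a pairwise genetic distance.
    The <= comparison is an assumption made for this illustration.
    """
    parent = list(range(n_cases))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), dist in distances.items():
        if dist <= d:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    clusters = {}
    for i in range(n_cases):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical distances among four cases (not real data).
toy_distances = {
    (0, 1): 0.010, (1, 2): 0.030, (0, 2): 0.035,
    (2, 3): 0.050, (0, 3): 0.070, (1, 3): 0.060,
}

# Sweep the cutoff from 0 to 0.06 in steps of 0.001, as in the abstract.
thresholds = [round(k * 0.001, 3) for k in range(61)]
n_clusters = {d: len(cluster_by_threshold(toy_distances, 4, d)) for d in thresholds}
```

In a full analysis, each threshold's cluster set would then be scored by a model of cluster growth; that scoring step is omitted here because the abstract does not give enough detail to reproduce it.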

    Using Amino Acid Correlation and Community Detection Algorithms to Identify Functional Determinants in Protein Families

    Correlated mutation analysis has a long history of interesting applications, mostly in the detection of contact pairs in protein structures. Based on previous observations that, if properly assessed, amino acid correlation data can also provide insights about functional sub-classes in a protein family, we provide a complete framework devoted to this purpose. An amino acid specific correlation measure is proposed, which can be used to build networks summarizing all correlation and anti-correlation patterns in a protein family. These networks can be submitted to community structure detection algorithms, resulting in subsets of correlated amino acids which can be further assessed by specific parameters and procedures that provide insight into the relationship between different communities, the individual importance of community members and the adherence of a given amino acid sequence to a given community. By applying this framework to three protein families with contrasting characteristics (the Fe/Mn-superoxide dismutases, the peroxidase-catalase family and the C-type lysozyme/α-lactalbumin family), we show how our method and the proposed parameters and procedures are related to biological characteristics observed in these protein families, highlighting their potential use in protein characterization and gene annotation

    Atlantic Cod Piscidin and Its Diversification through Positive Selection

    Piscidins constitute a family of cationic antimicrobial peptides that are thought to play an important role in the innate immune response of teleosts. On the one hand they show a remarkable diversity, which indicates that they are shaped by positive selection, but on the other hand they are ancient and have specific targets, suggesting that they are constrained by purifying selection. Until now piscidins had only been found in fish species from the superorder Acanthopterygii but we have recently identified a piscidin gene in Atlantic cod (Gadus morhua), thus showing that these antimicrobial peptides are not restricted to evolutionarily modern teleosts. Nucleotide diversity was much higher in the regions of the piscidin gene that code for the mature peptide and its pro domain than in the signal peptide. Maximum likelihood analyses with different evolution models revealed that the piscidin gene is under positive selection. Charge or hydrophobicity-changing amino acid substitutions observed in positively selected sites within the mature peptide influence its amphipathic structure and can have a marked effect on its function. This diversification might be associated with adaptation to new habitats or rapidly evolving pathogens

    A novel codon insert in protease of clade B HIV type 1.

    A novel combination of three codon inserts in the pol coding region of HIV-1 RNA was identified in a highly antiretroviral-experienced study subject with HIV-1 infection. A one-codon insert was observed in the protease region between codons 40 and 41, simultaneously with a two-codon insert in the reverse transcriptase region at codon 69.

    Evolutionary Interactions between N-Linked Glycosylation Sites in the HIV-1 Envelope

    The addition of asparagine (N)-linked polysaccharide chains (i.e., glycans) to the gp120 and gp41 glycoproteins of human immunodeficiency virus type 1 (HIV-1) envelope is not only required for correct protein folding, but also may provide protection against neutralizing antibodies as a “glycan shield.” As a result, strong host-specific selection is frequently associated with codon positions where nonsynonymous substitutions can create or disrupt potential N-linked glycosylation sites (PNGSs). Moreover, empirical data suggest that the individual contribution of PNGSs to the neutralization sensitivity or infectivity of HIV-1 may be critically dependent on the presence or absence of other PNGSs in the envelope sequence. Here we evaluate how glycan–glycan interactions have shaped the evolution of HIV-1 envelope sequences by analyzing the distribution of PNGSs in a large-sequence alignment. Using a “covarion”-type phylogenetic model, we find that the rates at which individual PNGSs are gained or lost vary significantly over time, suggesting that the selective advantage of having a PNGS may depend on the presence or absence of other PNGSs in the sequence. Consequently, we identify specific interactions between PNGSs in the alignment using a new paired-character phylogenetic model of evolution, and a Bayesian graphical model. Despite the fundamental differences between these two methods, several interactions are jointly identified by both. Mapping these interactions onto a structural model of HIV-1 gp120 reveals that negative (exclusive) interactions occur significantly more often between colocalized glycans, while positive (inclusive) interactions are restricted to more distant glycans. Our results imply that the adaptive repertoire of alternative configurations in the HIV-1 glycan shield is limited by functional interactions between the N-linked glycans. 
    This represents a potential vulnerability of rapidly evolving HIV-1 populations that may provide useful glycan-based targets for neutralizing antibodies.
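As a small illustration of what counts as a PNGS, the sketch below scans an amino acid sequence for the standard N-linked glycosylation sequon N-X-S/T, where X is any residue except proline. It assumes an ungapped, upper-case sequence; the function name `find_pngs` and the toy peptide are hypothetical, and this is not the phylogenetic analysis described in the abstract.

```python
import re

# N followed by any non-proline residue, then S or T. The lookahead lets
# overlapping sequons (e.g. "NNTS") each be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_pngs(protein):
    """Return 0-based positions of asparagines that start an N-X-S/T sequon.

    Assumes an ungapped, upper-case amino acid string.
    """
    return [match.start() for match in SEQUON.finditer(protein)]


# Toy peptide (not a real gp120 fragment): N-N-T, N-T-S and N-G-T qualify,
# while N-P-S is excluded because X is proline.
toy_peptide = "NNTSANPSNGT"
pngs_positions = find_pngs(toy_peptide)
```

Applying such a scan to every sequence in an alignment yields the presence/absence pattern of PNGSs on which co-evolution analyses of this kind operate.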

    Complete-Proteome Mapping of Human Influenza A Adaptive Mutations: Implications for Human Transmissibility of Zoonotic Strains

    BACKGROUND: There is widespread concern that H5N1 avian influenza A viruses will emerge as a pandemic threat, if they become capable of human-to-human (H2H) transmission. Avian strains lack this capability, which suggests that it requires important adaptive mutations. We performed a large-scale comparative analysis of proteins from avian and human strains, to produce a catalogue of mutations associated with H2H transmissibility, and to detect their presence in avian isolates. METHODOLOGY/PRINCIPAL FINDINGS: We constructed a dataset of influenza A protein sequences from 92,343 public database records. Human and avian sequence subsets were compared, using a method based on mutual information, to identify characteristic sites where human isolates present conserved mutations. The resulting catalogue comprises 68 characteristic sites in eight internal proteins. Subtype variability prevented the identification of adaptive mutations in the hemagglutinin and neuraminidase proteins. The high number of sites in the ribonucleoprotein complex suggests interdependence between mutations in multiple proteins. Characteristic sites are often clustered within known functional regions, suggesting their functional roles in cellular processes. By isolating and concatenating characteristic site residues, we defined adaptation signatures, which summarize the adaptive potential of specific isolates. Most adaptive mutations emerged within three decades after the 1918 pandemic, and have remained remarkably stable thereafter. Two lineages with stable internal protein constellations have circulated among humans without reassorting. On the contrary, H5N1 avian and swine viruses reassort frequently, causing both gains and losses of adaptive mutations. CONCLUSIONS: Human host adaptation appears to be complex and systemic, involving nearly all influenza proteins. 
    Adaptation signatures suggest that the ability of H5N1 strains to infect humans is related to the presence of an unusually high number of adaptive mutations. However, these mutations appear unstable, suggesting low pandemic potential of H5N1 in its current form. In addition, adaptation signatures indicate that the pandemic H1N1/09 strain possesses multiple human-transmissibility mutations, though not an unusually high number with respect to swine strains that infected humans in the past. Adaptation signatures provide a novel tool for identifying zoonotic strains with the potential to infect humans.
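The abstract says only that characteristic sites were found with "a method based on mutual information". A minimal sketch of that general idea, computing the mutual information between the residue at one alignment column and the host label, might look as follows. The function name, the toy data, and the use of plain (uncorrected) mutual information in bits are assumptions for illustration, not the paper's procedure.

```python
from collections import Counter
from math import log2


def mutual_information(residues, hosts):
    """Mutual information (in bits) between residues at one site and host labels.

    residues: one amino acid per sequence at a single alignment column.
    hosts: the host label (e.g. "human" or "avian") of each sequence.
    """
    n = len(residues)
    joint = Counter(zip(residues, hosts))
    residue_counts = Counter(residues)
    host_counts = Counter(hosts)
    mi = 0.0
    for (residue, host), count in joint.items():
        p_joint = count / n
        p_indep = (residue_counts[residue] / n) * (host_counts[host] / n)
        mi += p_joint * log2(p_joint / p_indep)
    return mi


# Toy columns (not real influenza data).
toy_hosts = ["human", "human", "avian", "avian"]
separating_site = mutual_information("KKEE", toy_hosts)     # residue tracks host
uninformative_site = mutual_information("KEKE", toy_hosts)  # residue independent of host
```

A site whose residue perfectly tracks the host label scores the full entropy of the host label, while a site whose residues are distributed identically in both hosts scores zero.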

    Transmitted Drug Resistance in the CFAR Network of Integrated Clinical Systems Cohort: Prevalence and Effects on Pre-Therapy CD4 and Viral Load

    Human immunodeficiency virus type 1 (HIV-1) genomes often carry one or more mutations associated with drug resistance upon transmission into a therapy-naïve individual. We assessed the prevalence and clinical significance of transmitted drug resistance (TDR) in chronically-infected therapy-naïve patients enrolled in a multi-center cohort in North America. Pre-therapy clinical significance was quantified by plasma viral load (pVL) and CD4+ cell count (CD4) at baseline. Naïve bulk sequences of HIV-1 protease and reverse transcriptase (RT) were screened for resistance mutations as defined by the World Health Organization surveillance list. The overall prevalence of TDR was 14.2%. We used a Bayesian network to identify co-transmission of TDR mutations in clusters associated with specific drugs or drug classes. Aggregate effects of mutations by drug class were estimated by fitting linear models of pVL and CD4 on weighted sums over TDR mutations according to the Stanford HIV Database algorithm. Transmitted resistance to both classes of reverse transcriptase inhibitors was significantly associated with lower CD4, but had opposing effects on pVL. In contrast, position-specific analyses of TDR mutations revealed substantial effects on CD4 and pVL at several residue positions that were being masked in the aggregate analyses, and significant interaction effects as well. Residue positions in RT with predominant effects on CD4 or pVL (D67 and M184) were re-evaluated in causal models using an inverse probability-weighting scheme to address the problem of confounding by other mutations and demographic or risk factors. We found that causal effect estimates of mutations M184V/I ( pVL) and D67N/G ( and pVL) were compensated by K103N/S and K219Q/E/N/R. 
    As TDR becomes an increasing dilemma in this modern era of highly-active antiretroviral therapy, these results have immediate significance for the clinical management of HIV-1 infections and our understanding of the ongoing adaptation of HIV-1 to human populations.

    Immune-driven recombination and loss of control after HIV superinfection

    After acute HIV infection, CD8+ T cells are able to control viral replication to a set point. This control is often lost after superinfection, although the mechanism behind this remains unclear. In this study, we illustrate in an HLA-B27+ subject that loss of viral control after HIV superinfection coincides with rapid recombination events within two narrow regions of Gag and Env. Screening for CD8+ T cell responses revealed that each of these recombination sites (∼50 aa) encompassed distinct regions containing two immunodominant CD8 epitopes (B27-KK10 in Gag and Cw1-CL9 in Env). Viral escape and the subsequent development of variant-specific de novo CD8+ T cell responses against both epitopes were illustrative of the significant immune selection pressures exerted by both responses. Comprehensive analysis of the kinetics of CD8 responses and viral evolution indicated that the recombination events quickly facilitated viral escape from both dominant WT- and variant-specific responses. These data suggest that the ability of a superinfecting strain of HIV to overcome preexisting immune control may be related to its ability to rapidly recombine in critical regions under immune selection pressure. These data also support a role for cellular immune pressures in driving the selection of new recombinant forms of HIV