
    A phylogenetic method for detecting positive epistasis in gene sequences and its application to RNA virus evolution

    RNA virus genomes are compact, often containing multiple overlapping reading frames and functional secondary structure. Consequently, it is thought that evolutionary interactions between nucleotide sites are commonplace in the genomes of these infectious agents. However, the role of epistasis in natural populations of RNA viruses remains unclear. To investigate the pervasiveness of epistasis in RNA viruses, we used a parsimony-based computational method to identify pairs of co-occurring mutations along phylogenies of 177 RNA virus genes. This analysis revealed widespread evidence for positive epistatic interactions at both synonymous and nonsynonymous nucleotide sites and in both clonal and recombining viruses, with the majority of these interactions spanning very short sequence regions. These findings have important implications for understanding key aspects of RNA virus evolution, including the dynamics of adaptation. Additionally, many comparative analyses that utilize the phylogenetic relationships among gene sequences assume that mutations represent independent, uncorrelated events. Our results show that this assumption may often be invalid.
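The abstract describes a parsimony-based search for mutations that co-occur along a phylogeny but does not spell out the procedure. The sketch below shows one plausible counting step under stated assumptions: Fitch parsimony maps each site's changes onto the branches of a rooted binary tree (written as nested tuples of leaf names), and for every pair of sites it counts the branches on which both change. Ties in the top-down pass are broken alphabetically, which picks just one of several equally parsimonious reconstructions, and no test of whether a count exceeds chance expectation is attempted. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Union

# A rooted binary tree: a leaf is its sequence name, an internal node is a pair of subtrees.
Tree = Union[str, tuple]


def _fitch_up(node: Tree, path: tuple, seqs: dict, site: int, sets: dict) -> set:
    """Bottom-up Fitch pass: store the candidate state set of every node, keyed by its path from the root."""
    if isinstance(node, str):
        states = {seqs[node][site]}
    else:
        left = _fitch_up(node[0], path + (0,), seqs, site, sets)
        right = _fitch_up(node[1], path + (1,), seqs, site, sets)
        # Intersection if non-empty, otherwise union (one change is needed below this node).
        states = (left & right) or (left | right)
    sets[path] = states
    return states


def _fitch_down(node: Tree, path: tuple, parent_state, sets: dict, changed: set) -> None:
    """Top-down pass: keep the parent's state when allowed, otherwise take the smallest candidate."""
    candidates = sets[path]
    state = parent_state if parent_state in candidates else min(candidates)
    if parent_state is not None and state != parent_state:
        changed.add(path)  # the branch leading to this node carries a mutation
    if not isinstance(node, str):
        _fitch_down(node[0], path + (0,), state, sets, changed)
        _fitch_down(node[1], path + (1,), state, sets, changed)


def mutated_branches(tree: Tree, seqs: dict, site: int) -> set:
    """Branches (identified by the path to their child node) on which `site` changes."""
    sets: dict = {}
    _fitch_up(tree, (), seqs, site, sets)
    changed: set = set()
    _fitch_down(tree, (), None, sets, changed)
    return changed


def co_mutation_counts(tree: Tree, seqs: dict) -> Counter:
    """For each pair of alignment sites, count the branches on which both sites mutate."""
    length = len(next(iter(seqs.values())))
    sites_by_branch = defaultdict(set)
    for site in range(length):
        for branch in mutated_branches(tree, seqs, site):
            sites_by_branch[branch].add(site)
    counts: Counter = Counter()
    for sites in sites_by_branch.values():
        for pair in combinations(sorted(sites), 2):
            counts[pair] += 1
    return counts
```

For example, with the tree `(("a", "b"), ("c", "d"))` and sequences `a = b = "AA"`, `c = d = "GT"`, both sites change on the branch leading to the `("c", "d")` clade, so the pair of sites `(0, 1)` receives a count of 1.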

    Phylogenetic surveillance of viral genetic diversity and the evolving molecular epidemiology of human immunodeficiency virus type 1

    With ongoing generation of viral genetic diversity and increasing levels of migration, the global human immunodeficiency virus type 1 (HIV-1) epidemic is becoming increasingly heterogeneous. In this study, we investigate the epidemiological characteristics of 5,675 HIV-1 pol gene sequences sampled from distinct infections in the United Kingdom. These sequences were phylogenetically analyzed in conjunction with 976 complete-genome and 3,201 pol gene reference sequences sampled globally and representing the broad range of HIV-1 genetic diversity, allowing us to estimate the probable geographic origins of the various strains present in the United Kingdom. A statistical analysis of phylogenetic clustering in this data set identified several independent transmission chains within the United Kingdom involving recently introduced strains and indicated that strains more commonly associated with infections acquired heterosexually in East Africa are spreading among men who have sex with men. Coalescent approaches were also used and indicated that the transmission chains that we identify originated in the late 1980s to early 1990s. Similar changes in the epidemiological structuring of HIV epidemics are likely to be taking place in other industrialized nations with large immigrant populations. The framework implemented here takes advantage of the vast amount of routinely generated HIV-1 sequence data and can provide epidemiological insights not readily obtainable through standard surveillance methods.

    Disease-associated XMRV sequences are consistent with laboratory contamination

    BACKGROUND: Xenotropic murine leukaemia viruses (MLV-X) are endogenous gammaretroviruses that infect cells from many species, including humans. Xenotropic murine leukaemia virus-related virus (XMRV) is a retrovirus that has been the subject of intense debate since its detection in samples from humans with prostate cancer (PC) and chronic fatigue syndrome (CFS). Controversy has arisen from the failure of some studies to detect XMRV in PC or CFS patients and from inconsistent detection of XMRV in healthy controls. RESULTS: Here we demonstrate that Taqman PCR primers previously described as XMRV-specific can amplify common murine endogenous viral sequences from mouse, suggesting that mouse DNA can contaminate patient samples and confound specific XMRV detection. To consider the provenance of XMRV, we sequenced XMRV from the cell line 22Rv1, which is infected with an MLV-X that is indistinguishable from patient-derived XMRV. Bayesian phylogenies clearly show that XMRV sequences reportedly derived from unlinked patients form a monophyletic clade with interspersed 22Rv1 clones (posterior probability >0.99). The cell line-derived sequences are ancestral to the patient-derived sequences (posterior probability >0.99). Furthermore, pol sequences apparently amplified from PC patient material (VP29 and VP184) are recombinants of XMRV and Moloney MLV (MoMLV), a virus with an envelope that lacks tropism for human cells. Considering the diversity of XMRV, we show that the mean pairwise genetic distance among env and pol 22Rv1-derived sequences exceeds that of patient-associated sequences (Wilcoxon rank sum test: p = 0.005 and p < 0.001 for pol and env, respectively). Thus, XMRV sequences acquire diversity in a cell line but not in patient samples. These observations are difficult to reconcile with the hypothesis that published XMRV sequences are related by a process of infectious transmission.
CONCLUSIONS: We provide several independent lines of evidence that XMRV detected by sensitive PCR methods in patient samples is the likely result of PCR contamination with mouse DNA and that the described clones of XMRV arose from the tumour cell line 22Rv1, which was probably infected with XMRV during xenografting in mice. We propose that XMRV might not be a genuine human pathogen.
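The diversity comparison above (mean pairwise genetic distance within two groups of sequences, compared with a Wilcoxon rank-sum test) can be sketched as follows. The abstract does not state the distance model or software used, so this is a minimal illustration under assumptions: distances are uncorrected p-distances over ungapped alignment columns, and the p-value comes from the normal approximation without tie or continuity correction. Function names are hypothetical. Note that pairwise distances from one group share sequences and are therefore not independent observations; the sketch does not adjust for this.

```python
from itertools import combinations
from math import erfc, sqrt


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites, skipping columns with a gap ('-') in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def pairwise_distances(seqs: list[str]) -> list[float]:
    """All pairwise p-distances within one group of aligned sequences."""
    return [p_distance(a, b) for a, b in combinations(seqs, 2)]


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Rank-sum statistic W (sum of x's ranks) and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return w, erfc(abs(z) / sqrt(2))
```

In use, one would pass `pairwise_distances(cell_line_seqs)` and `pairwise_distances(patient_seqs)` (hypothetical inputs) to `rank_sum_test`; a large W for the first group indicates that its pairwise distances tend to be larger.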

    Viral phylogeny in court: the unusual case of the Valencian anesthetist

    A large and complex outbreak of hepatitis C virus in Valencia, Spain, that began 25 years ago led to the prosecution and conviction of an anesthetist who was accused of infecting hundreds of his patients. Evolutionary analyses of viral gene sequences were presented as evidence in the trial, and these are now described in detail by González-Candelas and colleagues in a paper published in BMC Biology. Their study illustrates the challenges and opportunities that arise from the use of phylogenetic inference in criminal trials concerning virus transmission.

    Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)

    Gene sequences sampled at different points in time can be used to infer molecular phylogenies on a natural timescale of months or years, provided that the sequences in question undergo measurable amounts of evolutionary change between sampling times. Data sets with this property are termed heterochronous and have become increasingly common in several fields of biology, most notably the molecular epidemiology of rapidly evolving viruses. Here we introduce the cross-platform software tool TempEst (formerly known as Path-O-Gen) for the visualization and analysis of temporally sampled sequence data. Given a molecular phylogeny and the dates of sampling for each sequence, TempEst uses an interactive regression approach to explore the association between root-to-tip genetic divergence and sampling date. TempEst can be used to (1) assess whether there is sufficient temporal signal in the data to proceed with phylogenetic molecular clock analysis, and (2) identify sequences whose genetic divergence and sampling date are incongruent. Examination of the latter can help identify data quality problems, including errors in data annotation, sample contamination, sequence recombination, or alignment error. We recommend that all users of the molecular clock models implemented in BEAST first check their data using TempEst prior to analysis.
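The regression idea described above can be illustrated with a plain ordinary least-squares fit of root-to-tip divergence against sampling date. This is a minimal sketch, not TempEst itself: it assumes root-to-tip distances have already been extracted from a rooted tree, and it does not reproduce TempEst's interface or how it chooses or optimizes the root position. In this kind of plot the slope serves as a crude rate estimate, the x-intercept as a crude estimate of the root date, R² as a rough gauge of temporal signal, and large residuals point to tips whose divergence and date are incongruent. The names `ClockFit`, `root_to_tip_regression`, and `flag_outliers`, and the outlier threshold `k`, are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ClockFit:
    slope: float        # divergence per unit time: a crude substitution-rate estimate
    intercept: float
    x_intercept: float  # date at which fitted divergence reaches zero: a crude root-date estimate
    r_squared: float    # share of divergence variance explained by sampling date
    residuals: list[float]


def root_to_tip_regression(dates: list[float], divergences: list[float]) -> ClockFit:
    """Ordinary least-squares fit of root-to-tip divergence against sampling date."""
    n = len(dates)
    if n != len(divergences) or n < 3:
        raise ValueError("need at least three tips with matching dates and divergences")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all tips share one sampling date: no temporal spread to regress on")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(dates, divergences)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in divergences)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    x_intercept = -intercept / slope if slope != 0 else float("nan")
    return ClockFit(slope, intercept, x_intercept, r_squared, residuals)


def flag_outliers(fit: ClockFit, names: list[str], k: float = 2.0) -> list[str]:
    """Names of tips whose residual exceeds k residual standard deviations (k is an arbitrary choice)."""
    n = len(fit.residuals)
    sd = sqrt(sum(r * r for r in fit.residuals) / (n - 2))
    if sd < 1e-12:  # essentially perfect fit: nothing stands out
        return []
    return [name for name, r in zip(names, fit.residuals) if abs(r) > k * sd]
```

For tips sampled in 2000 to 2003 with divergences 0.01 to 0.04, the fit gives a slope of 0.01 per year, an x-intercept of 1999, and R² of 1. A tip whose divergence sits well above the line would be returned by `flag_outliers` as a candidate for the data quality checks listed above.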

    Model Selection and the Molecular Clock

    A brief overview of the methods used to determine phylogenetic distances sets the stage for understanding new research published in PLoS Biology.

    Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences

    BACKGROUND: More than 2 million SARS-CoV-2 genome sequences have been generated and shared since the start of the COVID-19 pandemic and constitute a vital information source that informs outbreak control, disease surveillance, and public health policy. The Pango dynamic nomenclature is a popular system for classifying and naming genetically distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. It is therefore important to understand how much information about Pango lineage status is contained in spike-only nucleotide sequences. Here we explore how Pango lineages might be reliably designated and assigned to spike-only nucleotide sequences. We survey the genetic diversity of such sequences and investigate the information they contain about Pango lineage status. RESULTS: Although many lineages, including the main variants of concern, can be identified clearly using spike-only sequences, some spike-only sequences are shared among tens or hundreds of Pango lineages. To facilitate the classification of SARS-CoV-2 lineages using subgenomic sequences, we introduce the notion of designating such sequences to a “lineage set”, which represents the range of Pango lineages that are consistent with the observed mutations in a given spike sequence. CONCLUSIONS: We find that many lineages, including the main variants of concern, can be reliably identified by spike alone, and we define lineage sets to represent the lineage precision that can be achieved using spike-only nucleotide sequences. These data provide a foundation for the development of software tools that can assign newly generated spike nucleotide sequences to Pango lineage sets. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08358-2
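The lineage-set idea can be illustrated with a toy grouping step. This is a minimal sketch under simplifying assumptions that the abstract does not confirm: each lineage is summarized by a single spike mutation profile (real lineages vary internally), and "consistent with the observed mutations" is modeled as an exact match between a query's spike mutations and a lineage's profile. The function names, lineage names (`lin_A` and so on), and mutation labels (`mutA` and so on) are hypothetical placeholders, not Pango data.

```python
from collections import defaultdict
from typing import Iterable


def build_lineage_sets(spike_profiles: dict[str, Iterable[str]]) -> dict[frozenset, set[str]]:
    """Group lineages that share an identical spike-only mutation profile."""
    lineage_sets: dict[frozenset, set[str]] = defaultdict(set)
    for lineage, mutations in spike_profiles.items():
        lineage_sets[frozenset(mutations)].add(lineage)
    return dict(lineage_sets)


def assign_lineage_set(query_mutations: Iterable[str],
                       lineage_sets: dict[frozenset, set[str]]) -> set[str]:
    """Lineages consistent with a spike-only sequence; empty if its profile is not in the reference."""
    # frozenset makes the lookup independent of the order in which mutations are listed.
    return set(lineage_sets.get(frozenset(query_mutations), set()))
```

With toy profiles `{"lin_A": ["mutA"], "lin_B": ["mutA"], "lin_C": ["mutA", "mutB"]}`, a spike sequence carrying only `mutA` maps to the lineage set `{"lin_A", "lin_B"}`, while one carrying both mutations resolves to `lin_C` alone. This mirrors the observation above that some spike-only sequences identify a lineage clearly while others are shared among several lineages.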