Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2
The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. Although previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.
N.G., C.W., and N.D.M. were supported by the European Molecular Biology Laboratory (EMBL). R.C.-D. was supported by R35GM128932 and by an Alfred P. Sloan Foundation fellowship. R.L. was funded by Australian Research Council grant DP200103151, and by a Chan-Zuckerberg Initiative grant. We are very grateful to GISAID and all the groups who shared their sequencing data.
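The abstract above refers to per-type mutation counts (such as G→U and C→U) and to mutations recurring at the same positions. A minimal Python sketch of that bookkeeping follows. It is not the authors' pipeline: the function names and the toy events are hypothetical, and the input is assumed to be a list of independent mutation events already inferred along a phylogeny.

```python
from collections import Counter


def mutation_spectrum(events):
    """Count each substitution type (e.g. 'C>U') among (position, ref, alt) events."""
    return Counter(f"{ref}>{alt}" for _pos, ref, alt in events)


def recurrent_positions(events, min_events=2):
    """Return sorted positions hit by at least `min_events` independent events."""
    hits = Counter(pos for pos, _ref, _alt in events)
    return sorted(pos for pos, n in hits.items() if n >= min_events)


# Hypothetical toy data: each tuple is one inferred mutation event.
events = [
    (100, "C", "U"), (100, "C", "U"), (250, "G", "U"),
    (250, "G", "U"), (250, "G", "U"), (400, "A", "G"),
]

spectrum = mutation_spectrum(events)
recurrent = recurrent_positions(events)
print(spectrum)
print(recurrent)
```

Raw counts like these are not rates. Turning them into rates would at least require normalizing by the number of available sites of each reference base, which is one of the pitfalls the abstract names.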
A lung-specific mutational signature enables inference of viral and bacterial respiratory niche
Exposure to different mutagens leaves distinct mutational patterns that can allow inference of pathogen replication niches. We therefore investigated whether SARS-CoV-2 mutational spectra might show lineage-specific differences, dependent on the dominant site(s) of replication and onwards transmission, and could therefore rapidly infer virulence of emergent variants of concern (VOCs). Through mutational spectrum analysis, we found a significant reduction in G>T mutations in the Omicron variant, which replicates in the upper respiratory tract (URT), compared to other lineages, which replicate in both the URT and lower respiratory tract (LRT). Mutational analysis of other viruses and bacteria indicates a robust, generalizable association of high G>T mutations with replication within the LRT. Monitoring G>T mutation rates over time, we found early separation of Omicron from Beta, Gamma and Delta, while mutational patterns in Alpha varied consistent with changes in transmission source as social restrictions were lifted. Mutational spectra may be a powerful tool to infer niches of established and emergent pathogens.
Affiliation: Ruis, Christopher. University of Cambridge; United Kingdom
Affiliation: Peacock, Thomas P. Imperial College London; United Kingdom
Affiliation: Polo Ilacqua, Luis Mariano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos. Universidad Nacional de Cuyo. Facultad de Ciencias Médicas. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos; Argentina
Affiliation: Masone, Diego Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos. Universidad Nacional de Cuyo. Facultad de Ciencias Médicas. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos; Argentina
Affiliation: Alvarez, Maria Soledad. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Medicina y Biología Experimental de Cuyo; Argentina
Affiliation: Hinrichs, Angie S. University of California at Santa Cruz; United States
Affiliation: Turakhia, Yatish. University of California at San Diego; United States
Affiliation: Cheng, Ye. University of California at San Diego; United States
Affiliation: McBroome, Jakob. University of California at Santa Cruz; United States
Affiliation: Corbett Detig, Russell. University of California at Santa Cruz; United States
Affiliation: Parkhill, Julian. University of Cambridge; United Kingdom
Affiliation: Floto, R. Andres. University of Cambridge; United Kingdom
Stability of SARS-CoV-2 phylogenies.
Funder: Alfred P. Sloan Foundation; funder-id: http://dx.doi.org/10.13039/100000879
Funder: European Molecular Biology Laboratory (EMBL)
The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes. Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.
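The screening idea in the abstract above (recurrent mutations reported predominantly or exclusively by single labs, followed by removal of problematic variants) can be illustrated with a small Python sketch. It is an illustration only, not the authors' toolkit: the function names, the thresholds `min_count` and `max_labs`, and the toy observations are all assumptions made here.

```python
from collections import defaultdict


def flag_single_lab_sites(observations, min_count=3, max_labs=1):
    """Flag positions seen at least `min_count` times but from at most `max_labs` labs.

    `observations` is an iterable of (position, lab) pairs, one per reported mutation.
    """
    counts = defaultdict(int)
    labs = defaultdict(set)
    for pos, lab in observations:
        counts[pos] += 1
        labs[pos].add(lab)
    return sorted(p for p in counts
                  if counts[p] >= min_count and len(labs[p]) <= max_labs)


def mask_sites(sequence, sites, mask_char="N"):
    """Return `sequence` with the given 1-based positions replaced by `mask_char`."""
    chars = list(sequence)
    for pos in sites:
        if 1 <= pos <= len(chars):
            chars[pos - 1] = mask_char
    return "".join(chars)


# Hypothetical toy data: position 5 recurs but only in labA; position 9 is seen by three labs.
observations = [(5, "labA"), (5, "labA"), (5, "labA"),
                (9, "labA"), (9, "labB"), (9, "labC")]
suspect = flag_single_lab_sites(observations)
masked = mask_sites("ACGTACGTAC", suspect)
print(suspect, masked)
```

Masking replaces the suspect base with an ambiguity character rather than deleting it, so alignment coordinates stay unchanged.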
Ultrafast Sample placement on Existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic
As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of ‘genomic contact tracing’, that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large (and will undoubtedly grow many fold), placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and data visualization, making it possible to complete the placements under the constraints of real-time contact tracing. Thus, our method addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for SARS-CoV-2, specifically for laboratories worldwide.
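To make the idea of placing a new sample on a tree that stores mutations on its branches more concrete, here is a deliberately simplified Python sketch. It is not UShER's algorithm or data format: the dictionary layout, the function names, and the toy tree are assumptions, back mutations are ignored, and the score is just the size of the symmetric difference between a node's accumulated mutations and the sample's mutations.

```python
def node_genotypes(tree):
    """Accumulate branch mutations from the root down to every node.

    `tree` maps node name -> (parent name or None, set of mutations on the branch).
    """
    genotypes = {}

    def resolve(node):
        if node not in genotypes:
            parent, muts = tree[node]
            inherited = resolve(parent) if parent is not None else frozenset()
            genotypes[node] = frozenset(inherited | muts)
        return genotypes[node]

    for node in tree:
        resolve(node)
    return genotypes


def place_sample(tree, sample_mutations):
    """Return (node, score) for the node whose genotype differs least from the sample."""
    sample = frozenset(sample_mutations)
    scores = {node: len(geno ^ sample)
              for node, geno in node_genotypes(tree).items()}
    best = min(sorted(scores), key=scores.get)  # sorted() makes ties deterministic
    return best, scores[best]


# Hypothetical toy tree; mutation labels are illustrative strings.
toy_tree = {
    "root": (None, set()),
    "A": ("root", {"C241T"}),
    "B": ("A", {"A23403G"}),
    "C": ("root", {"G11083T"}),
}
placement = place_sample(toy_tree, {"C241T", "A23403G", "C14408T"})
print(placement)
```

Because each node only stores the mutations on its own branch, the tree stays compact, and scoring a new sample needs only its mutation list rather than a full sequence alignment.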
Tracking and curating putative SARS-CoV-2 recombinants with RIVET.
MOTIVATION: Identifying and tracking recombinant strains of SARS-CoV-2 is critical to understanding the evolution of the virus and controlling its spread. But confidently identifying SARS-CoV-2 recombinants among the thousands of new genome sequences shared online every day is challenging, so many recombinants are missed or wait weeks for formal identification while undergoing expert curation. RESULTS: We present RIVET, a software pipeline and visual platform that takes advantage of recent algorithmic advances in recombination inference to comprehensively and sensitively search for potential SARS-CoV-2 recombinants and organize the relevant information in a web interface that would greatly accelerate the process of identifying and tracking recombinants. AVAILABILITY AND IMPLEMENTATION: A RIVET-based web interface displaying the most up-to-date analysis of potential SARS-CoV-2 recombinants is available at https://rivet.ucsd.edu/. RIVET's frontend and backend code is freely available under the MIT license at https://github.com/TurakhiaLab/rivet and the documentation for RIVET is available at https://turakhialab.github.io/rivet/. The inputs necessary for running RIVET's backend workflow for SARS-CoV-2 are available through a public database maintained and updated daily by UCSC (https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/).