115 research outputs found

    Mitogenome Phylogenetics: The Impact of Using Single Regions and Partitioning Schemes on Topology, Substitution Rate and Divergence Time Estimation

    The availability of mitochondrial genome sequences is growing as a result of recent technological advances in molecular biology. In phylogenetic analyses, the complete mitogenome is increasingly becoming the marker of choice, usually providing better phylogenetic resolution and precision relative to traditional markers such as cytochrome b (CYTB) and the control region (CR). In some cases, the differences in phylogenetic estimates between mitogenomic and single-gene markers have yielded incongruent conclusions. By comparing phylogenetic estimates made from different genes, we identified the most informative mitochondrial regions and evaluated the minimum amount of data necessary to reproduce the same results as the mitogenome. We compared results among individual genes and the mitogenome for recently published complete mitogenome datasets of selected delphinids (Delphinidae) and killer whales (genus Orcinus). Using Bayesian phylogenetic methods, we investigated differences in estimation of topologies, divergence dates, and clock-like behavior among genes for both datasets. Although the most informative regions were not the same for each taxonomic group (COX1, CYTB, ND3 and ATP6 for Orcinus, and ND1, COX1 and ND4 for Delphinidae), in both cases they were equivalent to less than a quarter of the complete mitogenome. This suggests that gene information content can vary among groups, but can be adequately represented by a portion of the complete sequence. Although our results indicate that complete mitogenomes provide the highest phylogenetic resolution and most precise date estimates, a minimum amount of data can be selected using our approach when the complete sequence is unavailable. Studies based on single genes can benefit from the addition of a few more mitochondrial markers, producing topologies and date estimates similar to those obtained using the entire mitogenome

    SNPPar: identifying convergent evolution and other homoplasies from microbial whole-genome alignments

    Homoplasic single nucleotide polymorphisms (SNPs) are considered important signatures of strong (positive) selective pressure, and hence of adaptive evolution for clinically relevant traits such as antibiotic resistance and virulence. Here we present a new tool, SNPPar, for efficient detection and analysis of homoplasic SNPs from large WGS datasets (>1,000 isolates and/or >100,000 SNPs). SNPPar takes as input a SNP alignment, tree and annotated reference genome, and uses a combination of simple monophyly tests and ancestral state reconstruction (ASR, via TreeTime) to assign mutation events to branches and identify homoplasies. Mutations are annotated at the level of codon and gene, to facilitate analysis of convergent evolution. Testing on simulated data (120 Mycobacterium tuberculosis alignments representing local and global samples) showed SNPPar can detect homoplasic SNPs with very high specificity (zero false-positives in all tests) and high sensitivity (zero false-negatives in 89% of tests). SNPPar analysis of three empirically sampled datasets (E. anophelis, B. dolosa and M. tuberculosis) produced results that were in concordance with previous studies, in terms of both individual homoplasies and evidence of convergence at the codon and gene levels. SNPPar analysis of a simulated alignment of ∼64,000 genome-wide SNPs from 2000 M. tuberculosis genomes took ∼23 minutes and ∼2.6 GB of RAM to generate complete annotated results on a laptop. This analysis required ASR be conducted for only 1.25% of SNPs, and the ASR step took ∼23 seconds and 0.4 GB RAM. SNPPar automates the detection and annotation of homoplasic SNPs efficiently and accurately from large SNP alignments. As demonstrated by the examples included here, this information can be readily used to explore the role of homoplasy in parallel and/or convergent evolution at the level of nucleotide, codon and/or gene.
    Impact statement: DNA sequences of bacterial pathogens are mutating all the time; most changes are deleterious or neutral, but sometimes a mutation leads to functional change that allows the pathogen to evade a potential threat. These random mutational changes (single nucleotide polymorphisms, or SNPs) are so rarely beneficial that when they do arise in parallel in distantly related isolates (known as homoplasic SNPs) this indicates that the change may be positively selected because it confers an adaptive advantage to the bacteria. Finding homoplasic SNPs in large sets of bacterial genomes is challenging as current tools require substantial time and computational resources to run. Here we present SNPPar, a software program to efficiently and accurately automate the detection and annotation of homoplasic SNPs from large whole-genome sequence data sets. We use simulated data to demonstrate accuracy of the program, and re-analyse published datasets using SNPPar to illustrate how the results can be used to gain insights into the evolution of antibiotic resistance and other traits. We envisage SNPPar will help facilitate the undertaking of long-term, real-time surveillance of bacterial pathogens, and their adaptive evolutionary response to interventions and control measures such as new drugs or vaccines.
    Data summary: The authors confirm all supporting data, code and protocols have been provided within the article, through supplementary data files or other online sources as indicated in the article. New content generated for this paper is: SNPPar code is available from https://github.com/d-j-e/SNPPar. The version described here is v1.0. A GitHub repository containing the full protocol, ‘in-house’ code and data used to carry out the validation and performance testing is available at https://github.com/d-j-e/SNPPar_test. This repository includes all the simulated and real data sets used here.
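The "simple monophyly test" mentioned in the SNPPar abstract can be illustrated with a small sketch. This is not SNPPar's code: the nested-tuple tree, the tip names and the helper functions `clades` and `is_monophyletic` are all hypothetical, chosen only to show the idea. If the isolates carrying an allele form exactly one clade, a single mutation on the branch leading to that clade explains the pattern; if not, the SNP is a candidate homoplasy and would need ancestral state reconstruction to resolve.

```python
def clades(tree):
    """Return (tips of this subtree, list of tip sets for every clade inside it).

    A tree is either a tip name (str) or a tuple of child subtrees.
    """
    if isinstance(tree, str):
        tips = frozenset([tree])
        return tips, [tips]
    tips = frozenset()
    found = []
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips          # accumulate tips below this node
        found.extend(child_clades)  # keep every clade seen in the child
    found.append(tips)              # the node itself defines a clade
    return tips, found


def is_monophyletic(tree, carriers):
    """True if the carriers are exactly the tips of one clade in the rooted tree."""
    _, all_clades = clades(tree)
    return frozenset(carriers) in all_clades


# Hypothetical rooted tree with five isolates.
tree = ((("A", "B"), "C"), ("D", "E"))

print(is_monophyletic(tree, {"A", "B"}))  # carriers form one clade
print(is_monophyletic(tree, {"A", "D"}))  # carriers are scattered across the tree
```

In this toy tree the allele shared by A and B is explained by one mutation event, whereas an allele shared only by A and D cannot be, which is the situation a homoplasy screen is looking for.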

    Cross-validation to select Bayesian hierarchical models in phylogenetics.

    BACKGROUND: Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. RESULTS: We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. CONCLUSIONS: Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult

    Data for Millennia of genomic stability within the invasive Para C Lineage of Salmonella enterica: date estimation 1

    Salmonella enterica serovar Paratyphi C is the causative agent of enteric (paratyphoid) fever. While today a potentially lethal infection of humans that occurs in Africa and Asia, early 20th century observations in Eastern Europe suggest it may once have had a wider-ranging impact on human societies. We recovered a draft Paratyphi C genome from the 800-year-old skeleton of a young woman in Trondheim, Norway, who likely died of enteric fever. Analysis of this genome against a new, significantly expanded database of related modern genomes demonstrated that Paratyphi C is descended from the ancestors of swine pathogens, serovars Choleraesuis and Typhisuis, together forming the Para C Lineage. Our results indicate that Paratyphi C has been a pathogen of humans for at least 1,000 years, and may have evolved after zoonotic transfer from swine during the Neolithic period

    Turnip mosaic potyvirus probably first spread to Eurasian brassica crops from wild orchids about 1000 years ago

    Turnip mosaic potyvirus (TuMV) is probably the most widespread and damaging virus that infects cultivated brassicas worldwide. Previous work has indicated that the virus originated in western Eurasia, with all of its closest relatives being viruses of monocotyledonous plants. Here we report that we have identified a sister lineage of TuMV-like potyviruses (TuMV-OM) from European orchids. The isolates of TuMV-OM form a monophyletic sister lineage to the brassica-infecting TuMVs (TuMV-BIs), and are nested within a clade of monocotyledon-infecting viruses. Extensive host-range tests showed that all of the TuMV-OMs are biologically similar to, but distinct from, TuMV-BIs and do not readily infect brassicas. We conclude that it is more likely that TuMV evolved from a TuMV-OM-like ancestor than the reverse. We performed Bayesian coalescent analyses using a combination of novel and published sequence data from four TuMV genes [helper component-proteinase protein (HC-Pro), protein 3 (P3), nuclear inclusion b protein (NIb), and coat protein (CP)]. Three genes (HC-Pro, P3, and NIb), but not the CP gene, gave results indicating that the TuMV-BI viruses diverged from TuMV-OMs around 1000 years ago. Only 150 years later, the four lineages of the present global population of TuMV-BIs diverged from one another. These dates are congruent with historical records of the spread of agriculture in Western Europe. From about 1200 years ago, there was a warming of the climate, and agriculture and the human population of the region greatly increased. Farming replaced woodlands, fostering viruses and aphid vectors that could invade the crops, which included several brassica cultivars and weeds. Later, starting 500 years ago, inter-continental maritime trade probably spread the TuMV-BIs to the remainder of the world.

    Atlas of group A streptococcal vaccine candidates compiled using large-scale comparative genomics.

    Group A Streptococcus (GAS; Streptococcus pyogenes) is a bacterial pathogen for which a commercial vaccine for humans is not available. Employing the advantages of high-throughput DNA sequencing technology to vaccine design, we have analyzed 2,083 globally sampled GAS genomes. The global GAS population structure reveals extensive genomic heterogeneity driven by homologous recombination and overlaid with high levels of accessory gene plasticity. We identified the existence of more than 290 clinically associated genomic phylogroups across 22 countries, highlighting challenges in designing vaccines of global utility. To determine vaccine candidate coverage, we investigated all of the previously described GAS candidate antigens for gene carriage and gene sequence heterogeneity. Only 15 of 28 vaccine antigen candidates were found to have both low naturally occurring sequence variation and high (>99%) coverage across this diverse GAS population. This technological platform for vaccine coverage determination is equally applicable to prospective GAS vaccine antigens identified in future studies

    Pan-genome Analysis of Ancient and Modern Salmonella enterica Demonstrates Genomic Stability of the Invasive Para C Lineage for Millennia.

    Salmonella enterica serovar Paratyphi C causes enteric (paratyphoid) fever in humans. Its presentation can range from asymptomatic infections of the blood stream to gastrointestinal or urinary tract infection or even a fatal septicemia [1]. Paratyphi C is very rare in Europe and North America except for occasional travelers from South and East Asia or Africa, where the disease is more common [2, 3]. However, early 20th-century observations in Eastern Europe [3, 4] suggest that Paratyphi C enteric fever may once have had a wide-ranging impact on human societies. Here, we describe a draft Paratyphi C genome (Ragna) recovered from the 800-year-old skeleton (SK152) of a young woman in Trondheim, Norway. Paratyphi C sequences were recovered from her teeth and bones, suggesting that she died of enteric fever and demonstrating that these bacteria have long caused invasive salmonellosis in Europeans. Comparative analyses against modern Salmonella genome sequences revealed that Paratyphi C is a clade within the Para C lineage, which also includes serovars Choleraesuis, Typhisuis, and Lomita. Although Paratyphi C only infects humans, Choleraesuis causes septicemia in pigs and boar [5] (and occasionally humans), and Typhisuis causes epidemic swine salmonellosis (chronic paratyphoid) in domestic pigs [2, 3]. These different host specificities likely evolved in Europe over the last ∼4,000 years since the time of their most recent common ancestor (tMRCA) and are possibly associated with the differential acquisitions of two genomic islands, SPI-6 and SPI-7. The tMRCAs of these bacterial clades coincide with the timing of pig domestication in Europe [6]

    BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis

    Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release

    The first wave of the COVID-19 epidemic in Spain was associated with early introductions and fast spread of a dominating genetic variant

    SeqCOVID-Spain consortium: Álvaro Chiner-Oms, Irving Cancino-Muñoz, Mariana G. López, Manuela Torres-Puente, Inmaculada Gómez-Navarro, Santiago Jiménez-Serrano, Jordi Pérez-Tur, Darío García de Viedma, Laura Pérez-Lago, Marta Herranz, Jon Sicilia, Pilar Catalán-Alonso, Julia Suárez González, Patricia Muñoz, Mireia Coscolla, Paula Ruiz-Rodríguez, Fernando González-Candelas, Iñaki Comas, Lidia Ruiz-Roldán, María Alma Bracho, Neris García-González, Llúcia Martínez Priego, Inmaculada Galán-Vendrell, Paula Ruiz-Hueso, Griselda De Marco, María Loreto Ferrús-Abad, Sandra Carbó-Ramírez, Giuseppe D’Auria, Galo Adrian Goig, Juan Alberola, Jose Miguel Nogueira, Juan José Camarena, David Navarro, Eliseo Albert, Ignacio Torres, Maitane Aranzamendi Zaldumbide, Óscar Martínez Expósito, Nerea Antona Urieta, María de Toro, María Pilar Bea-Escudero, Jose Antonio Boga, Cristian Castelló-Abietar, Susana Rojo-Alba, Marta Elena Álvarez-Argüelles, Santiago Melón, Elisa Martró, Antoni E. Bordoy, Anna Not, Adrián Antuori, Anabel Fernández-Navarro, Andrés Canut-Blasco, Silvia Hernáez Crespo, Maria Luz Cordón Rodríguez, Maria Concepción Lecaroz Agara, Carmen Gómez-González, Amaia Aguirre-Quiñonero, José Israel López-Mirones, Marina Fernández-Torres, Maria Rosario Almela-Ferrer, Ana Carvajal, Juan Miguel Fregeneda-Grandes, Héctor Argüello, Gustavo Cilla Eguiluz, Milagrosa Montes Ros, Luis Piñeiro Vázquez, Ane Sorarrain, José María Marimón, José J. 
Costa-Alcalde, Rocío Trastoy, Gema Barbeito Castiñeiras, Amparo Coira, María Luisa Pérez del Molino, Antonio Aguilera, Begoña Palop-Borrás, Inmaculada de Toro Peinado, Maria Concepción Mediavilla Gradolph, Mercedes Pérez-Ruiz, Mirian Fernández-Alonso, Jose Luis del Pozo, Oscar González-Recio, Mónica Gutiérrez-Rivas, Jovita Fernández-Pinero, Miguel Ángel Jiménez Clavero, Begoña Fuster Escrivá, Concepción Gimeno Cardona, María Dolores Ocete Mochón, Rafael Medina-Gonzalez, José Antonio Lepe, Verónica González Galán, Ángel Rodríguez-Villodres, Nieves Gonzalo Jiménez, Jordi Reina, Carla López-Causapé, Maria Dolores Gómez-Ruiz, Eva M. Gonzalez-Barbera, José Luis López-Hontangas, Vicente Martín, Antonio J. Molina, Tania Fernandez-Villa, Ana Milagro Beamonte, Nieves Felisa Martínez-Cameo, Yolanda Gracia-Grataloup, Rosario Moreno-Muñoz, Maria Dolores Tirado Balaguer, José María Navarro-Marí, Irene Pedrosa-Corral, Sara Sanbonmatsu-Gámez, Antonio Oliver, Mónica Parra Grande, Bárbara Gómez Alonso, Francisco José Arjona Zaragozí, Maria Carmen Pérez González, Francisco Javier Chamizo López, Ana Bordes-Benítez, Núria Rabella, Ferran Navarro, Elisenda Miró, Antonio Rezusta, Alexander Tristancho, Encarnación Simarro Córdoba, Julia Lozano-Serra, Lorena Robles Fonseca, Álex Soriano, Francisco Javier Roig Sena, Hermelinda Vanaclocha Luna, Isabel Sanmartín, Daniel García-Souto, Ana Pequeño-Valtierra, Jose M. C. Tubio, Javier Temes, Jorge Rodríguez-Castro, Martín Santamarina García, Manuel Rodríguez-Iglesias, Fátima Galán-Sanchez, Salud Rodríguez-Pallares, José Manuel Azcona-Gutiérrez, Miriam Blasco-Alberdi, Alfredo Mayor, Alberto L. García-Basteiro, Gemma Moncunill, Carlota Dobaño, Pau Cisteró, Oriol Mitjà, Camila González-Beiras, Martí Vall-Mayans, Marc Corbacho-Monné, Andrea Alemany, Cristina Muñoz-Cuevas, Guadalupe Rodríguez-Rodríguez, Rafael Benito, Sonia Algarate, Jessica Bueno, Andrea Vergara-Gómez, Miguel J. 
Martínez, Jordi Vila, Elisa Rubio, Aida Peiró-Mestres, Jessica Navero-Castillejos, David Posada, Diana Valverde, Nuria Estévez, Iria Fernández-Silva, Loretta de Chiara, Pilar Gallego-García, Nair Varela, Ulises Gómez-Pinedo, Mónica Gozalo-Margüello, Maria Eliecer Cano García, José Manuel Méndez-Legaza, Jesus Rodríguez-Lozano, María Siller, Daniel Pablo-Marcos, Maria Montserrat Ruiz-García, Antonio Galiana, Judith Sánchez-Almendro, Maria Isabel Gascón Ros, Cristina Juana Torregrosa-Hetland, Eva María Pastor Boix, Paloma Cascales Ramos, Pedro Luis Garcinuño Enríquez, Salvador Raga Borja, Julia González Cantó, Olalla Martínez Macias, Adolfo de Salazar, Laura Viñuela González, Natalia Chueca, Federico García, Cristina Gómez-Camarasa, Amparo Farga Martí, Rocío Falcón, Victoria Domínguez-Márquez, Anna M. Planas, Israel Fernández-Cádenas, Maria Ángeles Marcos, Carmen Ezpeleta, Ana Navascués, Ana Miqueleiz Zapatero, Manuel Segovia, Antonio Moreno-Docón, Esther Viedma, Raúl Recio Martínez, Irene Muñoz-Gallego, Sara Gonzalez-Bodi, Maria Dolores Folgueira, Jesús Mingorance, Elias Dahdouh, Fernando Lázaro-Perona, María Rodríguez-Tejedor, María Pilar Romero-Gómez, Julio García-Rodríguez, Juan Carlos Galán, Mario Rodríguez-Dominguez, Laura Martínez-García, Melanie Abreu Di Berardino, Manuel Ponce-Alonso, Jose Maria González-Alba, Ivan Sanz-Muñoz, Diana Pérez San José, Maria Gil Fortuño, Juan B. 
Bellido-Blasco, Alberto Yagüe Muñoz, Noelia Hernández Pérez, Helena Buj Jordá, Óscar Pérez Olaso, Alejandro González Praetorius, Nora Mariela Martínez Ramírez, Aida Ramírez Marinero, Eduardo Padilla León, Alba Vilas Basil, Mireia Canal Aranda, Albert Bernet Sánchez, Alba Bellés Bellés, Eric López González, Iván Prats Sánchez, Mercè García-González, Miguel José Martínez-Lirola, Manuel Ángel Rodríguez Maresca, Maria Teresa Cabezas Fernández, María Eugenia Carrillo Gil, Maria Paz Ventero Martín, Carmen Molina Pardines, Nieves Orta Mira, María Navarro Cots, Inmaculada Vidal Catalá, Isabel García Nava, Soledad Illescas Fernández-Bermejo, José Martínez-Alarcón, Marta Torres-Narbona, Cristina Colmenarejo, Lidia García-Agudo, Jorge A. Pérez García, Martín Yago López, María Ángeles Goberna Bravo, Victoria Simón García, Gonzalo Llop Furquet, Agustín Iranzo Tatay, Sandra Moreno-Marro, Noelia Lozano Rodríguez, Amparo Broseta Tamarit, Juan José Badiola Díez, Amparo Martínez-Ramírez, Ana Dopazo, Sergio Callejas, Alberto Benguría, Begoña Aguado, Antonio Alcamí, Marta Bermejo Bermejo, Ricardo Ramos-Ruíz, Víctor Manuel Fernández Soria, Fernando Simón Soria & Mercedes Roig Cardells
    The coronavirus disease 2019 (COVID-19) pandemic has affected the world radically since 2020. Spain was one of the European countries with the highest incidence during the first wave. As a part of a consortium to monitor and study the evolution of the epidemic, we sequenced 2,170 samples, diagnosed mostly before lockdown measures. Here, we identified at least 500 introductions from multiple international sources and documented the early rise of two dominant Spanish epidemic clades (SECs), probably amplified by superspreading events. Both SECs were related closely to the initial Asian variants of SARS-CoV-2 and spread widely across Spain. We inferred a substantial reduction in the effective reproductive number of both SECs due to public-health interventions (Re < 1), also reflected in the replacement of SECs by a new variant over the summer of 2020. In summary, we reveal a notable difference in the initial genetic makeup of SARS-CoV-2 in Spain compared with other European countries and show evidence to support the effectiveness of lockdown measures in controlling virus spread, even for the most successful genetic variants.
    This work was mainly funded by the Instituto de Salud Carlos III project COV20/00140, with additional funding by Spanish National Research Council project CSIC-COV19-021, Ministerio de Ciencia project PID2019-104477RB-100, ERC StG 638553 and ERC CoG 101001038 to I.C., and BFU2017-89594R to F.G.C. M.C. is supported by Ramón y Cajal program from Ministerio de Ciencia and grants RTI2018-094399-A-I00 and Generalitat Valenciana (Regional Government) project SEJI/2019/011. We gratefully acknowledge Hospital Universitari Vall d’Hebron, Instituto de Salud Carlos III, IrsiCaixa AIDS Research Lab and all the international researchers and institutions that submitted sequenced SARS-CoV-2 genomes to the GISAID’s EpiCov Database (Supplementary Table 1), as an important part of our analyses has been made possible by the sharing of their work. We also thank Unidad de Bioinformática y Estadística, Centro de Investigación Príncipe Felipe, for allowing us to use the Computer Cluster to perform some of the bioinformatic analysis. Peer reviewed.