178 research outputs found

    Efficient Inference of Recent and Ancestral Recombination within Bacterial Populations

    Prokaryotic evolution is affected by horizontal transfer of genetic material through recombination. Inference of an evolutionary tree of bacteria thus relies on accurate identification of the population genetic structure and recombination-derived mosaicism. Rapidly growing databases represent a challenge for computational methods to detect recombinations in bacterial genomes. We introduce a novel algorithm called fastGEAR which identifies lineages in diverse microbial alignments, and recombinations between them and from external origins. The algorithm detects both recent recombinations (affecting a few isolates) and ancestral recombinations between detected lineages (affecting entire lineages), thus providing insight into recombinations affecting deep branches of the phylogenetic tree. In simulations, fastGEAR had comparable power to detect recent recombinations and outstanding power to detect the ancestral ones, compared with state-of-the-art methods, often with a fraction of computational cost. We demonstrate the utility of the method by analyzing a collection of 616 whole genomes of a recombinogenic pathogen Streptococcus pneumoniae, for which the method provided a high-resolution view of recombination across the genome. We examined in detail the penicillin-binding genes across the Streptococcus genus, demonstrating previously undetected genetic exchanges between different species at these three loci. Hence, fastGEAR can be readily applied to investigate mosaicism in bacterial genes across multiple species. Finally, fastGEAR correctly identified many known recombination hotspots and pointed to potential new ones. Matlab code and Linux/Windows executables are available at https://users.ics.aalto.fi/~pemartti/fastGEAR/ (last accessed February 6, 2017).

    Bayesian modeling of recombination events in bacterial populations

    Background: We consider the discovery of recombinant segments jointly with their origins within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available methods for recombination detection capable of probabilistic characterization of uncertainty have limited applicability in practice as the number of strains in a data set increases. Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which is capable of identifying the posterior optimal structure and estimating the marginal posterior probabilities of putative origins over the sites. Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at URL http://web.abo.fi/fak/mnf//mate/jc/software/brat.html

    The impact of host metapopulation structure on the population genetics of colonizing bacteria

    Many key bacterial pathogens are frequently carried asymptomatically, and the emergence and spread of these opportunistic pathogens can be driven, or mitigated, via demographic changes within the host population. These inter-host transmission dynamics combine with basic evolutionary parameters such as rates of mutation and recombination, population size and selection, to shape the genetic diversity within bacterial populations. Whilst many studies have focused on how molecular processes underpin bacterial population structure, the impact of host migration and the connectivity of the local populations has received far less attention. A stochastic neutral model incorporating heightened local transmission has been previously shown to fit closely with genetic data for several bacterial species. However, this model did not incorporate transmission-limiting population stratification, nor the possibility of migration of strains between subpopulations, which we address here by presenting an extended model. We study the consequences of migration in terms of shared genetic variation and show by simulation that the previously used summary statistic, the allelic mismatch distribution, can be insensitive to even large changes in microepidemic and migration rates. Using likelihood-free inference with genotype network topological summaries, we fit a simpler model to commensal and hospital samples from the common nosocomial pathogens Staphylococcus aureus, Staphylococcus epidermidis, Enterococcus faecalis and Enterococcus faecium. Only the hospital data for E. faecium display clearly marked deviations from the model predictions, which may be attributable to its adaptation to the hospital environment.

    Plasmids Shaped the Recent Emergence of the Major Nosocomial Pathogen Enterococcus faecium

    Enterococcus faecium is a gut commensal of humans and animals but is also listed on the WHO global priority list of multidrug-resistant pathogens. Many of its antibiotic resistance traits reside on plasmids and have the potential to be disseminated by horizontal gene transfer. Here, we present the first comprehensive population-wide analysis of the pan-plasmidome of a clinically important bacterium, by whole-genome sequence analysis of 1,644 isolates from hospital, commensal, and animal sources of E. faecium. Long-read sequencing on a selection of isolates resulted in the completion of 305 plasmids that exhibited high levels of sequence modularity. We further investigated the entirety of all plasmids of each isolate (plasmidome) using a combination of short-read sequencing and machine-learning classifiers. Clustering of the plasmid sequences unraveled different E. faecium populations with a clear association with hospitalized patient isolates, suggesting different optimal configurations of plasmids in the hospital environment. The characterization of these populations allowed us to identify common mechanisms of plasmid stabilization such as toxin-antitoxin systems and genes exclusively present in particular plasmidome populations exemplified by copper resistance, phosphotransferase systems, or bacteriocin genes potentially involved in niche adaptation. Based on the distribution of k-mer distances between isolates, we concluded that plasmidomes rather than chromosomes are most informative for source specificity of E. faecium. IMPORTANCE Enterococcus faecium is one of the most frequent nosocomial pathogens of hospital-acquired infections. E. faecium has gained resistance against most commonly available antibiotics, most notably, against ampicillin, gentamicin, and vancomycin, which renders infections difficult to treat. 
Many antibiotic resistance traits, in particular, vancomycin resistance, can be encoded in autonomous and extrachromosomal elements called plasmids. These sequences can be disseminated to other isolates by horizontal gene transfer and confer novel mechanisms to source specificity. In our study, we elucidated the total plasmid content, referred to as the plasmidome, of 1,644 E. faecium isolates by using short- and long-read whole-genome technologies in combination with a machine-learning classifier. This was fundamental to investigate the full collection of plasmid sequences present in our collection (pan-plasmidome) and to observe the potential transfer of plasmid sequences between E. faecium hosts. We observed that E. faecium isolates from hospitalized patients carried a larger number of plasmid sequences compared to those from other sources, and they elucidated different configurations of plasmidome populations in the hospital environment. We assessed the contribution of different genomic components and observed that plasmid sequences have the highest contribution to source specificity. Our study suggests that E. faecium plasmids are regulated by complex ecological constraints rather than physical interaction between hosts.

    Population gene introgression and high genome plasticity for the zoonotic pathogen Streptococcus agalactiae

    The influence that bacterial adaptation (or niche partitioning) within species has on gene spillover and transmission among bacterial populations occupying different niches is not well understood. Streptococcus agalactiae is an important bacterial pathogen that has a taxonomically diverse host range, making it an excellent model system to study these processes. Here we analyze a global set of 901 genome sequences from nine diverse host species to advance our understanding of these processes. Bayesian clustering analysis delineated twelve major populations that closely aligned with niches. Comparative genomics revealed extensive gene gain/loss among populations and a large pan-genome of 9,527 genes, which remained open and was strongly partitioned among niches. As a result, the biochemical characteristics of eleven populations were highly distinctive (significantly enriched). Positive selection was detected and biochemical characteristics of the dispensable genes under selection were enriched in ten populations. Despite the strong gene partitioning, phylogenomics detected gene spillover. In particular, tetracycline resistance (which likely evolved in the human-associated population) spread from humans to bovines, canines, seals, and fish, demonstrating how a gene selected in one host can ultimately be transmitted into another; biased transmission from humans to bovines was confirmed with a Bayesian migration analysis. Our findings show high bacterial genome plasticity acting in balance with selection pressure from distinct functional requirements of niches that is associated with an extensive and highly partitioned dispensable genome, likely facilitating continued and expansive adaptation.

    International genomic definition of pneumococcal lineages, to contextualise disease, antibiotic resistance and vaccine impact

    Background: Pneumococcal conjugate vaccines have reduced the incidence of invasive pneumococcal disease caused by vaccine serotypes, but non-vaccine serotypes remain a concern. We used whole genome sequencing to study pneumococcal serotype, antibiotic resistance and invasiveness, in the context of genetic background. / Methods: Our dataset of 13,454 genomes, combined with four published genomic datasets, represented Africa (40%), Asia (25%), Europe (19%), North America (12%), and South America (5%). These 20,027 pneumococcal genomes were clustered into lineages using PopPUNK, and named Global Pneumococcal Sequence Clusters (GPSCs). From our dataset, we additionally derived serotype and sequence type, and predicted antibiotic sensitivity. We then measured invasiveness using odds ratios relating prevalence in invasive pneumococcal disease to carriage. / Findings: The combined collections (n = 20,027) were clustered into 621 GPSCs. Thirty-five GPSCs observed in our dataset were represented by >100 isolates, and subsequently classed as dominant-GPSCs. In 22/35 (63%) of dominant-GPSCs both non-vaccine serotypes and vaccine serotypes were observed in the years up until, and including, the first year of pneumococcal conjugate vaccine introduction. Penicillin and multidrug resistance were higher (p < .05) in a subset of dominant-GPSCs (14/35 and 9/35, respectively), and resistance to an increasing number of antibiotic classes was associated with increased recombination (R² = 0.27, p < .0001). In 28/35 dominant-GPSCs, the country of isolation was a significant predictor (p < .05) of its antibiogram (mean misclassification error 0.28, SD ± 0.13). We detected increased invasiveness of six genetic backgrounds, when compared to other genetic backgrounds expressing the same serotype. Up to 1.6-fold changes in invasiveness odds ratio were observed. / Interpretation: We define GPSCs that can be assigned to any pneumococcal genomic dataset, to aid international comparisons. 
Existing non-vaccine serotypes in most GPSCs preclude the removal of these lineages by pneumococcal conjugate vaccines, leaving potential for serotype replacement. A subset of GPSCs have increased resistance and/or serotype-independent invasiveness.

    Ensemble approach to predict specificity determinants: benchmarking and validation

    Background: It is extremely important and challenging to identify the sites that are responsible for functional specification or diversification in protein families. In this study, a rigorous comparative benchmarking protocol was employed to provide a reliable evaluation of methods which predict the specificity determining sites. Subsequently, the three best performing methods were applied to identify new potential specificity determining sites through an ensemble approach and common agreement of their prediction results. Results: It was shown that the analysis of structural characteristics of predicted specificity determining sites might provide the means to validate their prediction accuracy. For example, we found that for smaller distances it holds true that the more reliable the prediction method is, the closer predicted specificity determining sites are to each other and to the ligand. Conclusion: We observed certain similarities of structural features between predicted and actual subsites which might point to their functional relevance. We speculate that the majority of the identified potential specificity determining sites might be indirectly involved in specific interactions and could be ideal targets for mutagenesis experiments.