144 research outputs found

    Patchy promiscuity: machine learning applied to predict the host specificity of Salmonella enterica and Escherichia coli

    Supporting data for Patchy promiscuity: machine learning applied to predict the host specificity of Salmonella enterica and Escherichia coli, as published in Microbial Genomics.

    Comparison of Shiga toxin-encoding bacteriophages in highly pathogenic strains of Shiga toxin-producing Escherichia coli O157:H7 in the UK

    Over the last 35 years in the UK, the burden of Shiga toxin-producing Escherichia coli (STEC) O157:H7 infection has, during different periods of time, been associated with five different sub-lineages (1983-1995, Ia, I/IIa and I/IIb; 1996-2014, Ic; and 2015-2018, IIb). The acquisition of a stx2a-encoding bacteriophage by these five sub-lineages appears to have coincided with their respective emergences. The Oxford Nanopore Technologies (ONT) system was used to sequence, characterize and compare the stx-encoding prophages harboured by each sub-lineage to investigate the integration of this key virulence factor. The stx2a-encoding prophages from each of the lineages causing clinical disease in the UK were all different, including the two UK sub-lineages (Ia and I/IIa) circulating concurrently and causing severe disease in the early 1980s. Comparisons between the stx2a-encoding prophage in sub-lineages I/IIb and IIb revealed similarity to the prophage commonly found to encode stx2c, and the same site of bacteriophage integration (sbcB) as stx2c-encoding prophage. These data suggest that independent acquisition of previously unobserved stx2a-encoding phage is more likely to have contributed to the emergence of STEC O157:H7 sub-lineages in the UK than intra-UK lineage-to-lineage phage transmission. In contrast, the stx2c-encoding prophage showed a high level of similarity across lineages and time, consistent with the model of stx2c being present in the common ancestor to extant STEC O157:H7 and maintained by vertical inheritance in the majority of the population. Studying the nature of the stx-encoding bacteriophage contributes to our understanding of the emergence of highly pathogenic strains of STEC O157:H7.

    The advantage of intergenic regions as genomic features for machine-learning-based host attribution of Salmonella Typhimurium from the USA

    Salmonella enterica is a taxonomically diverse pathogen with over 2600 serovars associated with a wide variety of animal hosts including humans, other mammals, birds and reptiles. Some serovars are host-specific or host-restricted and cause disease in distinct host species, while others, such as serovar S. Typhimurium (STm), are generalists and have the potential to colonize a wide variety of species. However, even within generalist serovars such as STm it is becoming clear that pathovariants exist that differ in tropism and virulence. Identifying the genetic factors underlying host specificity is complex, but the availability of thousands of genome sequences and advances in machine learning have made it possible to build specific host prediction models to aid outbreak control and predict the human pathogenic potential of isolates from animals and other reservoirs. We have advanced this area by building host-association prediction models trained on a wide range of genomic features and comparing them with predictions based on nearest-neighbour phylogeny. SNPs, protein variants (PVs), antimicrobial resistance (AMR) profiles and intergenic regions (IGRs) were extracted from 3883 high-quality STm assemblies collected from humans, swine, bovine and poultry in the USA, and used to construct Random Forest (RF) machine learning models. An additional 244 recent STm assemblies from farm animals were used as a test set for further validation. The models based on PVs and IGRs had the best performance in terms of predicting the host of origin of isolates and outperformed nearest-neighbour phylogenetic host prediction as well as models based on SNPs or AMR data. However, the models did not yield reliable predictions when tested with isolates that were phylogenetically distinct from the training set. The IGR and PV models were often able to differentiate human isolates in clusters where the majority of isolates were from a single animal source. Notably, IGRs were the feature with the best performance across multiple models, which may be because IGRs act both as a representation of their flanking genes, equivalent to PVs, and as a record of genomic regulatory variation, such as altered promoter regions. The IGR and PV models predict that ~45 % of the human infections with STm in the USA originate from bovine, ~40 % from poultry and ~14.5 % from swine, although sequences of isolates from other sources were not used for training. In summary, the research demonstrates a significant gain in accuracy for models with IGRs and PVs as features compared to SNP-based and core genome phylogeny predictions when applied within the existing population structure. This article contains data hosted by Microreact.
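
The idea of attributing an isolate to a host from its genomic features can be illustrated with a deliberately simplified sketch. It is not the Random Forest models or the phylogenetic nearest-neighbour method described in the abstract: it assigns a host by majority vote among the closest training isolates, using Hamming distance on presence/absence vectors. The function names, the `k` parameter and the toy vectors are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def predict_host(query, training, k=1):
    """Return the majority host among the k training isolates closest to query.

    training is a list of (feature_vector, host_label) pairs.
    """
    ranked = sorted(training, key=lambda item: hamming(query, item[0]))
    votes = Counter(host for _, host in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy presence/absence vectors (hypothetical, not real data); each position
# stands for one feature, such as a particular IGR or protein variant.
TRAINING = [
    ((1, 1, 0, 0, 1), "bovine"),
    ((1, 1, 0, 1, 1), "bovine"),
    ((0, 0, 1, 1, 0), "poultry"),
    ((0, 1, 1, 1, 0), "poultry"),
    ((1, 0, 1, 0, 0), "swine"),
]

if __name__ == "__main__":
    print(predict_host((1, 1, 0, 0, 0), TRAINING))
```

A distance-based vote like this has no information about isolates unlike anything in the training set, which is consistent with the abstract's caution about phylogenetically distinct isolates.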

    Re-analysis of an outbreak of Shiga toxin-producing Escherichia coli O157:H7 associated with raw drinking milk using Nanopore sequencing

    The aim of this study was to compare Illumina and Oxford Nanopore Technologies (ONT) sequencing data to quantify genetic variation, assess within-outbreak strain relatedness and characterise microevolutionary events in the accessory genomes of a cluster of 23 genetically and epidemiologically linked isolates related to an outbreak of Shiga toxin-producing Escherichia coli O157:H7 caused by the consumption of raw drinking milk. There were seven discrepant variants called between the two technologies: five were false-negative or false-positive variants in the Illumina data and two were false-negative calls in the ONT data. After masking horizontally acquired sequences such as prophages, analysis of both short- and long-read sequences revealed that the 20 isolates linked to the outbreak in 2017 had a maximum SNP distance of one SNP between each other, and a maximum of five SNPs when including three additional strains identified in 2019. Analysis of the ONT data revealed a 47 kbp deletion event in a terminal compound prophage within one sample relative to the remaining samples, and a large 0.65 Mbp chromosomal rearrangement (inversion) within one sample relative to the remaining samples. Furthermore, we detected two bacteriophages encoding the highly pathogenic Shiga toxin (Stx) subtype, Stx2a. One was typical of Stx2a-phage in this sub-lineage (Ic), the other was atypical and inserted into a site usually occupied by Stx2c-encoding phage. Finally, we observed an increase in the size of the pO157 IncFIB plasmid (1.6 kbp) in isolates from 2019 compared to those from 2017, due to the duplication of insertion elements within the plasmids from the more recently isolated strains. The ability to characterize the accessory genome in this way is the first step to understanding the significance of these microevolutionary events and their impact on the genome plasticity and virulence between strains of this zoonotic, foodborne pathogen.
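
The "maximum SNP distance after masking" measure in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the study's pipeline: it takes sequences that are already aligned, treats masked regions as 0-based half-open intervals, and ignores positions where either sequence has `N` or `-`. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, masked=()):
    """Count differing positions between two aligned sequences.

    masked holds (start, end) half-open, 0-based intervals to skip, for
    example prophage regions. Positions where either sequence has an
    ambiguous base (N) or a gap (-) are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = set()
    for start, end in masked:
        skip.update(range(start, end))
    return sum(
        1
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if i not in skip and a != b and a not in "N-" and b not in "N-"
    )


def max_pairwise_distance(alignment, masked=()):
    """Largest SNP distance over all pairs of isolates in the alignment."""
    return max(
        snp_distance(a, b, masked)
        for a, b in combinations(alignment.values(), 2)
    )


# Toy alignment (hypothetical); positions 8-9 stand in for a masked prophage.
ALIGNMENT = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAC",
    "iso3": "ACGAACGTTT",
}

if __name__ == "__main__":
    print(max_pairwise_distance(ALIGNMENT))
    print(max_pairwise_distance(ALIGNMENT, masked=[(8, 10)]))
```

Masking the horizontally acquired region lowers the maximum distance in the toy data, which is why such regions are excluded before judging outbreak relatedness.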

    Evolution of a zoonotic pathogen: investigating prophage diversity in enterohaemorrhagic Escherichia coli O157 by long-read sequencing

    Enterohaemorrhagic Escherichia coli (EHEC) O157 is a zoonotic pathogen for which colonization of cattle and virulence in humans is associated with multiple horizontally acquired genes, the majority present in active or cryptic prophages. Our understanding of the evolution and phylogeny of EHEC O157 continues to develop primarily based on core genome analyses; however, such short-read sequences have limited value for the analysis of prophage content and its chromosomal location. In this study, we applied Single Molecule Real Time (SMRT) sequencing, using the Pacific Biosciences long-read sequencing platform, to isolates selected from the main sub-clusters of this clonal group. Prophage regions were extracted from these sequences and from published reference strains. Genome position and prophage diversity were analysed along with genetic content. Prophages could be assigned to clusters, with smaller prophages generally exhibiting less diversity and preferential loss of structural genes. Prophages encoding Shiga toxin (Stx) 2a and Stx1a were the most diverse, and more variable compared to prophages encoding Stx2c, further supporting the hypothesis that Stx2c-prophage integration was ancestral to acquisition of other Stx types. The concept that phage type (PT) 21/28 (Stx2a+, Stx2c+) strains evolved from PT32 (Stx2c+) was supported by analysis of strains with excised Stx-encoding prophages. Insertion sequence elements were over-represented in prophage sequences compared to the rest of the genome, showing integration in key genes such as stx and an excisionase, the latter potentially acting to capture the bacteriophage into the genome. Prophage profiling should allow more accurate prediction of the pathogenic potential of isolates.

    Dataset of Escherichia coli O157:H7 genes enriched in adherence to spinach root tissue

    A high-throughput positive-selection approach was taken to generate a dataset of Shigatoxigenic Escherichia coli (STEC) O157:H7 genes enriched in adherence to plant tissue. The approach generates a differential dataset based on BAC clones enriched in the output, after adherence, compared to the inoculum used as the input. A BAC clone library derived from STEC isolate 'Sakai' was used since this isolate is associated with a very large-scale outbreak of human disease from consumption of contaminated fresh produce (white radish sprouts). Spinach was used for the screen since it is associated with STEC outbreaks, and the roots provide a suitable site for bacterial colonisation. Four successive rounds of Sakai BAC clone selection and amplification were applied for spinach root adherence, in parallel to a no-plant control. Genomic DNA was obtained from a total of 7.17 x 10^8 cfu/ml of bacteria from the plant treatment and 1.13 x 10^9 cfu/ml of bacteria from the no-plant control. Relative gene abundance of the output compared to the input pools was obtained using an established E. coli DNA microarray chip for STEC. The dataset enables screening for genes enriched under the treatment condition and informs on genes that may play a role in plant-microbe interactions.

    Identification of bacteriophage-encoded anti-sRNAs in pathogenic Escherichia coli

    In bacteria, Hfq is a core RNA chaperone that catalyzes the interaction of mRNAs with regulatory small RNAs (sRNAs). To determine in vivo RNA sequence requirements for Hfq interactions, and to study riboregulation in a bacterial pathogen, Hfq was UV crosslinked to RNAs in enterohemorrhagic Escherichia coli (EHEC). Hfq bound repeated trinucleotide motifs of A-R-N (A-A/G-any nucleotide) often associated with the Shine-Dalgarno translation initiation sequence in mRNAs. These motifs overlapped or were adjacent to the mRNA sequences bound by sRNAs. In consequence, sRNA-mRNA duplex formation will displace Hfq, promoting recycling. Fifty-five sRNAs were identified within bacteriophage-derived regions of the EHEC genome, including some of the most abundant Hfq-interacting sRNAs. One of these (AgvB) antagonized the function of the core genome regulatory sRNA, GcvB, by mimicking its mRNA substrate sequence. This bacteriophage-encoded "anti-sRNA" provided EHEC with a growth advantage specifically in bovine rectal mucus recovered from its primary colonization site in cattle.
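
The A-R-N motif named in this abstract (A, then A or G, then any nucleotide) can be searched for with a regular expression. The sketch below is only an illustration: it assumes the repeats are consecutive and in a fixed triplet frame, and the `min_repeats` threshold and the function name are hypothetical choices, not values from the study.

```python
import re


def find_arn_repeats(rna, min_repeats=3):
    """Find runs of at least min_repeats consecutive A-R-N triplets.

    Returns a list of (start_index, matched_text) tuples. DNA input is
    accepted by converting T to U before searching.
    """
    sequence = rna.upper().replace("T", "U")
    # A, then a purine (A or G), then any nucleotide, repeated back to back.
    pattern = re.compile(r"(?:A[AG][ACGU]){%d,}" % min_repeats)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    print(find_arn_repeats("CCAAUAGCAACGG"))
```

Lowering `min_repeats` makes the search more permissive; real binding sites may tolerate interruptions that this strict pattern would miss.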