11 research outputs found

    Bacterial host attribution and bioinformatic characterisation of enteric bacteria Salmonella enterica and Escherichia coli from different hosts and environments

    With the advent of relatively low cost whole genome sequencing (WGS), it is now possible to obtain sequences from large numbers of bacterial strains and interrogate their core and accessory genomes in relation to associated metadata. While there are some bacterial species with preferred hosts, especially in terms of disease, there has been no real systematic genomic investigation of host and niche specificity of ’generalist’ bacteria, i.e., those that can be isolated from multiple hosts and environments. The main aim of this research was to determine if host and/or niche-specific proteins can be identified for ’multi-host adapted’ bacteria such as E. coli and Salmonella Typhimurium (STm) in order to predict the ’origin’ of a strain and its zoonotic potential from its sequence. Two datasets of ’multi-host’ bacteria were analysed: 1,203 STm isolates from 4 hosts (avian, bovine, human and swine) and E. coli from 6 hosts (avian, bovine, canine, environmental, human and swine). Based on classical core genome analysis such as core phylogeny, multilocus sequence typing and phylo-grouping, no strong correlations with host were identified. The accessory genome was also investigated for host-based associations, and accessory host associated proteins (HAP) were identified for each of the bacteria/ host groups. These proteins were used to build a machine learning (ML) classifier - support vector machine (SVM) - to predict the isolation host of the bacterial isolates. The majority of the isolates from both species were predicted correctly with prediction accuracy ranging from 67% to 90%. For both bacterial species the most challenging were bovine and swine host groups as these two had many features in common. The approach allowed not only prediction of host based on WGS but also an assessment of how much the genome of particular isolates resembled the features of the genomes of the same species isolated from other hosts. 
This allowed ’generalist’ and ’specialist’ strains from each host group to be estimated as well as the sequences that indicate successful transmission potential between hosts. This work also showed that diverse collections of E. coli or STm can be used as a baseline for prediction and quantification of zoonotic potential as was demonstrated with E. coli O157 and Salmonella serovar Typhi. Overall this part of the research indicated marked host restriction for both STm and E. coli, with only limited isolate subsets exhibiting host promiscuity based on predicted protein content. ML can be successfully applied to interrogate source attribution of bacterial isolates and has the capacity to predict zoonotic potential. Using the same ML approach, a further question was how similar the known zoonotic pathogens are. When studied apart, E. coli O157 can be classified further into human and bovine isolates with only a small proportion of bovine isolates predicted as ’human’, pointing to the specific cattle strains that are potentially a more serious threat to human health. This approach was tested with 2 independent sets of O157 human outbreak strains with traced-back isolates from animals and food. The outbreak strains, independent of origin, were scored as ’human’. This finding has profound implications for public health management of disease because interventions in cattle, such as vaccination, could be targeted at herds carrying strains of high zoonotic potential. The final section of the thesis research was based on the STm dataset and compared different ML approaches to test which algorithm performed best for host prediction. Dimensionality reduction techniques as well as unsupervised and supervised ML were applied to HAP. 
Dimensionality reduction techniques and unsupervised ML were not able to split the dataset by host and produced different results which could be challenging to interpret correctly in terms of biological significance of the factors that influenced clustering. On the other hand, all three supervised classifiers resulted in very comparable high levels of prediction (over 95%). Thus, the choice of supervised classifier for host prediction should be based on the knowledge of the end-user as well as on requirements for any further analysis. To conclude, accessory genomes were successfully used for extraction of host associated proteins as well as for prediction of source host and quantification of zoonotic potential for bacteria species that can be isolated from multiple hosts. The methods described here can be applied to other bacteria and overall have implications for monitoring, identification and targeted interventions associated with potentially zoonotic infections. The results are completely dependent on the dataset quality which should be as large and diverse as possible. The research highlights the predictive potential of such algorithms but also the need for bacterial sequences to be gathered with as much useful metadata as possible, including isolation host

    Patchy promiscuity: machine learning applied to predict the host specificity of Salmonella enterica and Escherichia coli

    Supporting data for Patchy promiscuity: machine learning applied to predict the host specificity of Salmonella enterica and Escherichia coli, as published in Microbial Genomics

    Support Vector Machine applied to predict the zoonotic potential of E. coli O157 cattle isolates

    Sequence analyses of pathogen genomes facilitate the tracking of disease outbreaks and allow relationships between strains to be reconstructed and virulence factors to be identified. However, these methods are generally used after an outbreak has happened. Here, we show that support vector machine analysis of bovine E. coli O157 isolate sequences can be applied to predict their zoonotic potential, identifying cattle strains more likely to be a serious threat to human health. Notably, only a minor subset (less than 10%) of bovine E. coli O157 isolates analyzed in our datasets were predicted to have the potential to cause human disease; this is despite the fact that the majority are within previously defined pathogenic lineages I or I/II and encode key virulence factors. The predictive capacity was retained when tested across datasets. The major differences between human and bovine E. coli O157 isolates were due to the relative abundances of hundreds of predicted prophage proteins. This finding has profound implications for public health management of disease because interventions in cattle, such as vaccination, can be targeted at herds carrying strains of high zoonotic potential. Machine-learning approaches should be applied broadly to further our understanding of pathogen biology
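
The abstract above describes training a support vector machine on predicted protein content to separate human from bovine isolates. As a rough illustration of that idea only, the sketch below fits a linear SVM (hinge loss with L2 regularisation, optimised by plain subgradient descent) to a tiny presence/absence matrix. The data, labels, function names and hyperparameters are all invented for illustration; the authors' actual features, kernel, software and settings are not given in this listing.

```python
# Toy presence/absence matrix: one row per isolate, one column per
# hypothetical accessory protein (1 = present, 0 = absent). Invented data.
X = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
# Invented labels: +1 stands for "human", -1 for "bovine".
y = [1, 1, 1, -1, -1, -1]


def decision_score(w, b, x):
    """Signed score w.x + b; the sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lr=0.05, lam=0.01, epochs=500):
    """Minimise lam/2 * |w|^2 + mean hinge loss by full-batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the hinge subgradient.
            if yi * decision_score(w, b, xi) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision score."""
    return 1 if decision_score(w, b, x) >= 0 else -1


w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In a setting like the one described, the decision score for a bovine isolate could be read as a rough measure of how "human-like" its protein content is, which is one way a classifier output might be turned into a ranking of zoonotic potential. That reading is an interpretation of the abstract, not a description of the published method.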

    Genome structural variation in Escherichia coli O157:H7

    The human zoonotic pathogen Escherichia coli O157:H7 is defined by its extensive prophage repertoire including those that encode Shiga toxin, the factor responsible for inducing life-threatening pathology in humans. As well as introducing genes that can contribute to the virulence of a strain, prophage can enable the generation of large-chromosomal rearrangements (LCRs) by homologous recombination. This work examines the types and frequencies of LCRs across the major lineages of the O157:H7 serotype. We demonstrate that LCRs are a major source of genomic variation across all lineages of E. coli O157:H7 and by using both optical mapping and Oxford Nanopore long-read sequencing prove that LCRs are generated in laboratory cultures started from a single colony and that these variants can be recovered from colonized cattle. LCRs are biased towards the terminus region of the genome and are bounded by specific prophages that share large regions of sequence homology associated with the recombinational activity. RNA transcriptional profiling and phenotyping of specific structural variants indicated that important virulence phenotypes such as Shiga-toxin production, type-3 secretion and motility can be affected by LCRs. In summary, E. coli O157:H7 has acquired multiple prophage regions over time that act to continually produce structural variants of the genome. These findings raise important questions about the significance of this prophage-mediated genome contingency to enhance adaptability between environments

    Whole Genome Sequence Analysis Reveals Lower Diversity and Frequency of Acquired Antimicrobial Resistance (AMR) Genes in E. coli From Dairy Herds Compared With Human Isolates From the Same Region of Central Zambia

    Antibiotic treatment of sick dairy cattle is critical for the sustainability of this production system which is vital for food security and societal prosperity in many low and middle-income countries. Given the increasingly high levels of antibiotic resistance worldwide and the challenge this presents for the treatment of bacterial infections, the rational use of antibiotics in humans and animals has been emphatically recommended in the spirit of a “One Health” approach. The aim of this study was to characterize antimicrobial resistance (AMR) genes and their frequencies from whole genome sequences of Escherichia coli isolated from both dairy cattle and human patients in central Zambia. Whole genome sequences of E. coli isolates from dairy cattle (n = 224) and from patients at a local hospital (n = 73) were compared for the presence of acquired AMR genes. In addition we analyzed the publicly available genomes of 317 human E. coli isolates from over the wider African continent. Both acquired antibiotic resistance genes and phylogroups were identified from de novo assemblies and SNP based phylogenetic analyses were used to visualize the distribution of resistance genes in E. coli isolates from the two hosts. Greater acquired AMR gene diversity was detected in human compared to bovine E. coli isolates across multiple classes of antibiotics with particular resistance genes for extended-spectrum beta lactamases (ESBL), quinolones, macrolides and fosfomycin only detected in E. coli genomes of human origin. The striking difference was that the Zambian or wider African human isolates were significantly more likely to possess multiple acquired AMR genes compared to the Zambian dairy cattle isolates. The median number of resistance genes in the Zambian cattle cohort was 0 (0–1 interquartile range), while in the Zambian human and wider African cohorts the medians and interquartile ranges were 6 (4–9) and 6 (0–8), respectively. 
The lower frequency and reduced diversity of acquired AMR genes in the dairy cattle isolates are concordant with the relatively limited antibiotic use that we have documented in this region, especially among smallholder farmers. The relatively distinct resistance profiles in the two host populations also indicate limited sharing of strains or genes

    Analysis of Escherichia coli O157 strains in cattle and humans between Scotland and England & Wales: implications for human health.

    For the last two decades, the human infection frequency of Escherichia coli O157 (O157) in Scotland has been 2.5-fold higher than in England and Wales. Results from national cattle surveys conducted in Scotland and England and Wales in 2014/2015 were combined with data on reported human clinical cases from the same time frame to determine if strain differences in national populations of O157 in cattle could be associated with higher human infection rates in Scotland. Shiga toxin subtype (Stx) and phage type (PT) were examined within and between host (cattle vs human) and nation (Scotland vs England and Wales). For a subset of the strains, whole genome sequencing (WGS) provided further insights into geographical and host association. All three major O157 lineages (I, II, I/II) and most sub-lineages (Ia, Ib, Ic, IIa, IIb, IIc) were represented in cattle and humans in both nations. While the relative contribution of different reservoir hosts to human infection is unknown, WGS analysis indicated that the majority of O157 diversity in human cases was captured by isolates from cattle. Despite comparable cattle O157 prevalence between nations, strain types were localized. PT21/28 (sub-lineage Ic, Stx2a+) was significantly more prevalent in Scottish cattle [odds ratio (OR) 8.7 (2.3-33.7); P<0.001] and humans [OR 2.2 (1.5-3.2); P<0.001]. In England and Wales, cattle had a significantly higher association with sub-lineage IIa strains [PT54, Stx2c; OR 5.6 (1.27-33.3); P=0.011] while humans were significantly more closely associated with sub-lineage IIb [PT8, Stx1 and Stx2c; OR 29 (4.9-1161); P<0.001]. Therefore, cattle farms in Scotland were more likely to harbour Stx2a+O157 strains compared to farms in England and Wales (P<0.001). 
There was evidence of limited cattle strain migration between nations and clinical isolates from one nation were more similar to cattle isolates from the same nation, with sub-lineage Ic (mainly PT21/28) exhibiting clear national association and evidence of local transmission in Scotland. While we propose the higher rate of O157 clinical cases in Scotland, compared to England and Wales, is a consequence of the nationally higher level of Stx2a+O157 strains in Scottish cattle, we discuss the multiple additional factors that may also contribute to the different infection rates between these nations
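
The abstract above reports odds ratios with intervals in parentheses. For readers unfamiliar with the statistic, the sketch below shows one common way to compute an odds ratio and an approximate 95% confidence interval (the Wald interval on the log scale) from a 2x2 table. The counts and the function name are hypothetical, and the study itself may have used a different interval method (for example an exact one), so this is not expected to reproduce the published figures.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: group 1 with the strain type    b: group 1 without it
    c: group 2 with the strain type    d: group 2 without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        # The log-scale standard error is undefined when any cell is zero.
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, not taken from the study.
estimate, lower, upper = odds_ratio_wald(20, 10, 5, 15)
```

An interval that excludes 1 suggests an association between group and strain type at roughly the 5% level. With small cell counts the Wald approximation becomes unreliable, which is one reason very wide intervals in the literature are often computed by exact methods instead.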
