11 research outputs found

    Patchy promiscuity: machine learning applied to predict the host specificity of Salmonella enterica and Escherichia coli

    Supporting data for Patchy promiscuity: machine learning applied to predict the host specificity of Salmonella enterica and Escherichia coli, as published in Microbial Genomics.

    The advantage of intergenic regions as genomic features for machine-learning-based host attribution of Salmonella Typhimurium from the USA

    Salmonella enterica is a taxonomically diverse pathogen with over 2600 serovars associated with a wide variety of animal hosts including humans, other mammals, birds and reptiles. Some serovars are host-specific or host-restricted and cause disease in distinct host species, while others, such as serovar S. Typhimurium (STm), are generalists and have the potential to colonize a wide variety of species. However, even within generalist serovars such as STm it is becoming clear that pathovariants exist that differ in tropism and virulence. Identifying the genetic factors underlying host specificity is complex, but the availability of thousands of genome sequences and advances in machine learning have made it possible to build specific host prediction models to aid outbreak control and predict the human pathogenic potential of isolates from animals and other reservoirs. We have advanced this area by building host-association prediction models trained on a wide range of genomic features and compared them with predictions based on nearest-neighbour phylogeny. SNPs, protein variants (PVs), antimicrobial resistance (AMR) profiles and intergenic regions (IGRs) were extracted from 3883 high-quality STm assemblies collected from humans, swine, bovine and poultry in the USA, and used to construct Random Forest (RF) machine learning models. An additional 244 recent STm assemblies from farm animals were used as a test set for further validation. The models based on PVs and IGRs had the best performance in terms of predicting the host of origin of isolates and outperformed nearest-neighbour phylogenetic host prediction as well as models based on SNPs or AMR data. However, the models did not yield reliable predictions when tested with isolates that were phylogenetically distinct from the training set. The IGR and PV models were often able to differentiate human isolates in clusters where the majority of isolates were from a single animal source. 
Notably, IGRs were the best-performing feature across multiple models, which may be because IGRs both represent their flanking genes, equivalent to PVs, and capture genomic regulatory variation, such as altered promoter regions. The IGR and PV models predict that ~45 % of human STm infections in the USA originate from bovine sources, ~40 % from poultry and ~14.5 % from swine, although sequences of isolates from other sources were not used for training. In summary, the research demonstrates a significant gain in accuracy for models with IGRs and PVs as features compared to SNP-based and core genome phylogeny predictions when applied within the existing population structure. This article contains data hosted by Microreact.

    Acquisition and loss of CTX-M plasmids in Shigella species associated with MSM transmission in the UK

    Shigellosis in men who have sex with men (MSM) is caused by multidrug-resistant Shigellae, exhibiting resistance to antimicrobials including azithromycin, ciprofloxacin and, more recently, the third-generation cephalosporins. We sequenced four blaCTX-M-27-positive MSM Shigella isolates (2018–20) using Oxford Nanopore Technologies: three S. sonnei (identified as two MSM clade 2, one MSM clade 5) and one S. flexneri 3a, to explore AMR context. All S. sonnei isolates harboured Tn7/Int2 chromosomal integrons, whereas S. flexneri 3a contained the Shigella Resistance Locus. All strains harboured IncFII pKSR100-like plasmids (67–83 kbp); where present, blaCTX-M-27 was located on these plasmids flanked by IS26 and IS903B. However, blaCTX-M-27 was lost in S. flexneri 3a during storage between Illumina and Nanopore sequencing. IncFII AMR regions were mosaic and likely reorganised by IS26; three of the four plasmids contained azithromycin-resistance genes erm(B) and mph(A) and one harboured the pKSR100 integron. Additionally, all S. sonnei isolates possessed a large IncB/O/K/Z plasmid, two of which carried aph(3’)-Ib/aph(6)-Id/sul2 and tet(A). Monitoring the transmission of mobile genetic elements with co-located AMR determinants is necessary to inform empirical treatment guidance and clinical management of MSM-associated shigellosis.

    Enteroaggregative Escherichia coli have evolved independently as distinct complexes within the E. coli population with varying ability to cause disease

    Enteroaggregative E. coli (EAEC) is an established diarrhoeagenic pathotype. The association between virulence gene content and ability to cause disease has been studied, but little is known about the population structure of EAEC and how this pathotype evolved. Analysis by multilocus sequence typing of 564 EAEC isolates from cases and controls in Bangladesh, Nigeria and the UK spanning the past 29 years revealed multiple successful lineages of EAEC. The population structure of EAEC indicates some clusters are statistically associated with disease or carriage, further highlighting the heterogeneous nature of this group of organisms. Different clusters have evolved independently as a result of both mutational and recombination events; the EAEC phenotype is distributed throughout the population of E. coli.

    MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island

    Short-read, high-throughput sequencing technology cannot identify the chromosomal position of repetitive insertion sequences that typically flank horizontally acquired genes such as bacterial virulence genes and antibiotic resistance genes. The MinION nanopore sequencer can produce long sequencing reads on a device similar in size to a USB memory stick. Here we apply a MinION sequencer to resolve the structure and chromosomal insertion site of a composite antibiotic resistance island in Salmonella Typhi Haplotype 58. Nanopore sequencing data from a single 18-h run was used to create a scaffold for an assembly generated from short-read Illumina data. Our results demonstrate the potential of the MinION device in clinical laboratories to fully characterize the epidemic spread of bacterial pathogens.

    Analysis of whole genome sequencing for the Escherichia coli O157:H7 typing phages

    Background: Shiga toxin-producing Escherichia coli O157 can cause severe bloody diarrhoea and haemolytic uraemic syndrome. Phage typing of E. coli O157 facilitates public health surveillance and outbreak investigations; certain phage types are more likely to occupy specific niches and are associated with specific age groups and disease severity. The aim of this study was to analyse the genome sequences of 16 (fourteen T4 and two T7) E. coli O157 typing phages and to determine the genes responsible for the subtle differences in phage type profiles. Results: The typing phages were sequenced using paired-end Illumina sequencing at The Genome Analysis Centre and the Animal Health and Veterinary Laboratories Agency, and bioinformatics programs including Velvet, Brig and Easyfig were used to analyse them. A two-way Euclidean cluster analysis highlighted the associations between groups of phage types and typing phages. The analysis showed that the T7 typing phages (9 and 10) differed by only three genes and that the T4 typing phages formed three distinct groups of similar genomic sequences: Group 1 (1, 8, 11, 12 and 15, 16), Group 2 (3, 6, 7 and 13) and Group 3 (2, 4, 5 and 14). The E. coli O157 phage typing scheme exhibited a significantly modular network linked to the genetic similarity of each group, showing that these groups are specialised to infect a subset of phage types. Conclusion: Sequencing the typing phages has enabled us to identify the variable genes within each group and to determine how this corresponds to changes in phage type. Public Health England; National Institute for Health Research scientific research development fund; Biotechnology and Biological Sciences Research Council (BBSRC).

    Exploiting the explosion of information associated with whole genome sequencing to tackle Shiga toxin-producing Escherichia coli (STEC) in global food production systems

    The rates of foodborne disease caused by gastrointestinal pathogens continue to be a concern in both the developed and developing worlds. The growing world population, the increasing complexity of agri-food networks and the wide range of foods now associated with STEC are potential drivers for increased risk of human disease. It is vital that new developments in technology, such as whole genome sequencing (WGS), are effectively utilized to help address the issues associated with these pathogenic microorganisms. This position paper, arising from an OECD-funded workshop, provides a brief overview of next generation sequencing technologies and software. It then uses the agent-host-environment paradigm as a basis to investigate the potential benefits and pitfalls of WGS in the examination of (1) the evolution and virulence of STEC, (2) epidemiology, from bedside diagnostics to investigations of outbreaks and sporadic cases, and (3) food protection, from routine analysis of foodstuffs to global food networks. A number of key recommendations are made that include: validation and standardization of acquisition, processing and storage of sequence data, including the development of an open access "WGSNET"; building up of sequence databases from both prospective and retrospective isolates; development of a suite of open-access software specific for STEC, accessible to non-bioinformaticians, that promotes understanding of both the computational and biological aspects of the problems at hand; prioritization of research funding to both produce and integrate genotypic and phenotypic information suitable for risk assessment; training to develop a supply of individuals working in bioinformatics/software development; training for clinicians, epidemiologists, the food industry and other stakeholders to ensure uptake of the technology; and finally, review of progress of implementation of WGS. 
Currently, the benefits of WGS are being slowly teased out by academic, government, and industry or private sector researchers around the world. The next phase will require a coordinated international approach to ensure that its potential to contribute to the challenge of STEC disease can be realized in a cost-effective and timely manner.