    Salmonella enterica serovar virulence clusters

    The serovars of Salmonella enterica display dramatic differences in pathogenesis and host preferences. We developed a process (patent pending) for grouping Salmonella isolates and serovars by their public health risk. We collated a curated set of 12,337 S. enterica isolate genomes from human, beef, and bovine sources in the US. After annotating a virulence gene catalog for each isolate, we used unsupervised random forest methods to estimate the proximity (similarity) between isolates based upon the genomic presentation of putative virulence traits. We then grouped isolates (virulence clusters) using hierarchical clustering (Ward’s method), used non-parametric bootstrapping to assess cluster stability, and externally validated the clusters against epidemiological virulence measures from FoodNet, the National Outbreak Reporting System (NORS), and US federal sampling of beef products. We identified five stable virulence clusters of S. enterica serovars. Cluster 1 (higher virulence) serovars yielded an annual incidence rate of domestically acquired sporadic cases roughly one and a half times higher than the other four clusters combined (Clusters 2-5, lower virulence). Compared to other clusters, cluster 1 also had a higher proportion of infections leading to hospitalization and was implicated in more foodborne and beef-associated outbreaks, despite being isolated at a similar frequency from beef products as other clusters. We also identified subpopulations within 11 serovars. Remarkably, we found S. Infantis and S. Typhimurium subpopulations that significantly differed in genome length and clinical case presentation. Further, we found that the presence of the pESI plasmid accounted for the genome length differences between the S. Infantis subpopulations. Our results show that S. enterica strains associated with the highest incidence of human infections share a common virulence repertoire. This work could be updated regularly and used in combination with foodborne surveillance information to prioritize serovars of public health concern.

    Conceptual model of virulence cluster development.

    First, we downloaded contig assemblies, quality-controlled them for fragmentation, and identified virulence factors. We then fit an unsupervised random forest model to the isolate-level virulence factor catalogues to approximate relatedness. We converted the resultant similarity matrix to a distance matrix (1 − similarity) and clustered using Ward’s method. We identified five stable clusters and validated them using non-parametric bootstrapping.
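
    The similarity-to-distance conversion and Ward clustering steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the 4 x 4 matrix `toy_similarity` is invented, the function name `ward_clusters` is hypothetical, and the random forest step that would produce the proximities is omitted. Ward's criterion is applied through the standard Lance-Williams update on squared distances, which is only a heuristic when the distances are not Euclidean.

```python
def ward_clusters(similarity, k):
    """Agglomerate items into k clusters using Ward's criterion.

    `similarity` is a symmetric matrix (list of lists) with values in [0, 1].
    Distances are taken as 1 - similarity, as described in the text.
    """
    n = len(similarity)
    # Ward's Lance-Williams update operates on squared distances.
    d2 = {(i, j): (1.0 - similarity[i][j]) ** 2
          for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}  # cluster id -> member items
    next_id = n
    while len(members) > k:
        # Merge the pair of clusters with the smallest Ward distance.
        (a, b), dab = min(d2.items(), key=lambda kv: kv[1])
        na, nb = len(members[a]), len(members[b])
        updated = {}
        for c in members:
            if c in (a, b):
                continue
            nc = len(members[c])
            dac = d2[tuple(sorted((a, c)))]
            dbc = d2[tuple(sorted((b, c)))]
            # Lance-Williams formula for Ward's method.
            updated[(c, next_id)] = (
                (na + nc) * dac + (nb + nc) * dbc - nc * dab
            ) / (na + nb + nc)
        # Drop distances involving the merged clusters, add the new ones.
        d2 = {p: v for p, v in d2.items() if a not in p and b not in p}
        d2.update(updated)
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return sorted(sorted(m) for m in members.values())


# Invented proximities for four isolates forming two obvious groups.
toy_similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
clusters = ward_clusters(toy_similarity, k=2)
print(clusters)
```

    With real data, the proximity matrix would come from the fitted random forest, and k would be chosen with the stability analysis rather than fixed in advance.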

    Serovar virulence cluster designations.

    Virulence cluster designations (k = 5) for the 37 serovars in the analysis set. (CSV)

    Metadata for the analysis set of genomes and SISTR serovar prediction.

    Metadata for the contig assemblies used in the analysis, including the in silico serovar predictions from the SISTR software for the analysis set genomes. (XLSX)

    Addition of a sixth virulence cluster.

    (A) Dendrogram depicting the hierarchical relationship between 12,337 S. enterica genome assemblies based upon virulence factor gene carriage, with six virulence clusters superimposed. (B) Heatmap of serovar proportion within each of the six virulence clusters. Rows are clustered using Ward’s method. (C) Characteristics of the six virulence clusters: cluster stability (Jaccard similarity across 10,000 non-parametric bootstraps), number of genomes (the number of S. enterica genomes in each cluster), and number of serovars (within-cluster serovar proportion > 0.5) in each cluster. (TIF)
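
    The cluster stability measure in panel C can be illustrated with a small sketch of bootstrap-based Jaccard stability. The listing does not give the exact resampling procedure, so the scheme below is an assumption: each replicate resamples isolates with replacement, re-clusters them, and scores every original cluster by its best Jaccard match among the bootstrap clusters. The names `jaccard`, `cluster_stability`, `toy_values`, and `toy_cluster_fn` are hypothetical, and the threshold-based toy clustering stands in for the actual Ward clustering.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(n_items, cluster_fn, n_boot=200, seed=0):
    """Mean best-match Jaccard similarity per original cluster.

    `cluster_fn` takes a list of item indices and returns a list of sets.
    """
    rng = random.Random(seed)
    original = cluster_fn(list(range(n_items)))
    totals = [0.0] * len(original)
    counts = [0] * len(original)
    for _ in range(n_boot):
        # Non-parametric bootstrap: resample items with replacement.
        sample = sorted(set(rng.choices(range(n_items), k=n_items)))
        boot = cluster_fn(sample)
        for idx, cluster in enumerate(original):
            kept = set(cluster) & set(sample)
            if not kept:
                continue  # cluster absent from this replicate
            totals[idx] += max(jaccard(kept, b) for b in boot)
            counts[idx] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Invented one-dimensional data with two well-separated groups.
toy_values = [0.1, 0.2, 0.15, 5.0, 5.2, 5.1]


def toy_cluster_fn(ids):
    """Stand-in clustering: split items at a fixed threshold."""
    low = {i for i in ids if toy_values[i] < 1.0}
    high = {i for i in ids if toy_values[i] >= 1.0}
    return [c for c in (low, high) if c]


stability = cluster_stability(len(toy_values), toy_cluster_fn, n_boot=50)
print(stability)
```

    Values near 1 indicate clusters that are recovered almost unchanged across replicates; the caption reports this statistic over 10,000 bootstraps.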

    Full list of putative virulence loci considered in the random forest model.

    Gene name, locus tag, database source, genus of origin, gene product, and classification for the 182 putative virulence factor loci used in the random forest model. (CSV)

    Isolate virulence subpopulation cluster designations.

    Subpopulation cluster designations (k = 37) for the 12,337 contig assemblies in the analysis set. (XLSX)