6 research outputs found

    Phylogenetic placement of STEC/ETEC chromosomes using restriction-based and <i>in silico</i> whole genome maps.

    No full text
    <p>STEC/ETEC chromosomal maps (red boxes) were compared with other <i>E</i>. <i>coli</i> and <i>Shigella</i> spp. chromosomal maps. Maps were clustered by similarity using the UPGMA algorithm.</p>
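
The UPGMA clustering named in the caption above can be illustrated with a minimal sketch. This is not the software used by the authors; the function name `upgma`, the input layout (a dict keyed by `frozenset` pairs of names) and the toy distances are all assumptions made for illustration.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b
           (hypothetical input layout chosen for this sketch).
    Returns nested tuples (left, right, height); leaves are plain labels.
    """
    # Active clusters: id -> (subtree, number of leaves in it)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {
        frozenset((i, j)): dist[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        i, j = sorted(pair)
        height = d.pop(pair) / 2          # node height is half the distance
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            # Average weighted by cluster size (the "unweighted" in UPGMA
            # means every leaf contributes equally).
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Toy distances between four hypothetical maps.
toy = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6,
    frozenset(("B", "C")): 6, frozenset(("A", "D")): 10,
    frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
tree = upgma(["A", "B", "C", "D"], toy)
```

In the toy input, A and B join first, C joins that pair next, and D joins last. Real analyses would normally rely on an established phylogenetics package rather than a hand-written routine.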

    Phylogenetic placement of STEC/ETEC strains using core genome MLST and sequence alignment.

    No full text
    <p>UPGMA tree based on aligned sequences of the defined <i>E</i>. <i>coli</i> core genome genes (n = 1341) showing the phylogenetic relationship of the three STEC/ETEC genomes and 73 additional <i>E</i>. <i>coli</i> and <i>Shigella</i> spp. strains. The different pathogroups, STEC, ETEC, EPEC, EIEC, EAEC, AIEC (adherent/invasive <i>E</i>. <i>coli</i>), APEC (avian pathogenic <i>E</i>. <i>coli</i>), UPEC (uropathogenic <i>E</i>. <i>coli</i>), ExPEC (extraintestinal pathogenic <i>E</i>. <i>coli</i>), MNEC (meningitis-causing <i>E</i>. <i>coli</i>), commensal, and environmental <i>E</i>. <i>coli</i> are marked by colors. The reference genomes STEC O139:H1 S1191, ETEC UMNF18, STEC O2:H25 7v, STEC O8:H19 MHI813, and STEC O73:H18 C165-02 were previously characterized as STEC/ETEC hybrids [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0135936#pone.0135936.ref014" target="_blank">14</a>,<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0135936#pone.0135936.ref037" target="_blank">37</a>]. The reference genomes ETEC O6 E8, ETEC O6 E66, ETEC O78 E36, ETEC O25 E135, ETEC O115 E21, ETEC ON3 E562, ETEC O169 E344, ETEC O148 E222, ETEC O27 E220, ETEC O114 E934, ETEC O159 E159, ETEC O15 E330, ETEC O112ab E399, and ETEC ON5 E620 represent the phylogenetic lineages L1-L14 of the ETEC pathogroup, respectively [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0135936#pone.0135936.ref017" target="_blank">17</a>].</p>

    Whole genome map comparison of STEC/ETEC strains.

    No full text
    <p>Areas in blue are common to both maps, areas in white are unique to the map in which they appear, and areas in red match more than once. (A) Comparison between IH53473 and IH57218, (B) comparison between IH53473 and FE95160, and (C) comparison between IH57218 and FE95160.</p>

    INNUENDO whole genome and core genome MLST schemas and datasets for Salmonella enterica

    No full text
    <p><strong>Dataset</strong></p> <p>As a reference dataset, 4,307 publicly available draft or complete genome assemblies of <em>Salmonella enterica</em>, together with the available metadata, were downloaded from public repositories (i.e. <a href="https://enterobase.warwick.ac.uk/">EnteroBase</a>, <a href="https://www.ncbi.nlm.nih.gov/">National Center for Biotechnology Information NCBI</a> and <a href="https://www.ebi.ac.uk/">The European Bioinformatics Institute EMBL-EBI</a>; accessed April 2017). The collection includes 1,465 <em>S.</em> Enteritidis, 2,442 <em>S.</em> Typhimurium, and 400 genomes of other serovars frequently isolated in Europe. The dataset also includes 153 <em>S.</em> Typhimurium variant 4,[5],12:i:- genomes collected from different Italian regions between 2012 and 2014 during a surveillance study and 129 <em>S.</em> Enteritidis genomes belonging to the INNUENDO sequence dataset (<a href="https://www.ebi.ac.uk/ena/data/view/PRJEB27020">PRJEB27020</a>). These 282 additional genomes were assembled using <a href="https://github.com/B-UMMI/INNUca">INNUca v3.1</a>.</p> <p>File 'Metadata/Senterica_metadata.txt' contains metadata for each strain, including source classification, host taxa, year and country of isolation, serotype, classical pubMLST 7-gene ST classification, and source/method of the assembly. </p> <p>The directory 'Genomes' contains all 4,589 assemblies of the strains listed in 'Metadata/Senterica_metadata.txt'. Please note that genomes marked as 'Enterobase' were downloaded from the EnteroBase webpage http://enterobase.warwick.ac.uk.</p> <p><strong>Schema creation and validation</strong></p> <p>The wgMLST schema from <a href="https://enterobase.warwick.ac.uk/species/senterica/download_data">EnteroBase</a> was downloaded and curated using <a href="https://github.com/B-UMMI/chewBBACA/wiki/1.-Schema-Creation"><em>chewBBACA AutoAlleleCDSCuration</em></a> to remove all alleles that are not coding sequences (CDS). 
The quality of the remaining loci was assessed using <a href="https://github.com/B-UMMI/chewBBACA/wiki/1.-Schema-Creation"><em>chewBBACA Schema Evaluation</em></a>, and loci with a single allele, loci with high length variability (i.e. more than 1 allele outside the mode +/- 0.05 size) and loci present in less than 0.5% of the <em>Salmonella</em> genomes in <a href="https://enterobase.warwick.ac.uk/species/index/senterica">EnteroBase</a> at the date of the analysis (April 2017) were removed. The wgMLST schema was further curated by excluding all loci detected as “Repeated Loci” and loci annotated as “non-informative paralogous hit (NIPH/ NIPHEM)” or “Allele Larger/ Smaller than length mode (ALM/ ASM)” by the <a href="https://github.com/B-UMMI/chewBBACA/wiki/2.-Allele-Calling"><em>chewBBACA Allele Calling</em></a> engine in more than 1% of a dataset composed of 4,589 <em>Salmonella</em> genomes.</p> <p>File 'Schemas/Senterica_wgMLST_ 8558_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA and includes a total of 8,558 loci.</p> <p>File 'Schemas/Senterica_cgMLST_ 3255_listGenes.txt' contains the list of genes from the wgMLST schema that define the cgMLST schema. The cgMLST schema consists of 3,255 loci, defined as the loci present in at least 99% of the 4,589 <em>Salmonella</em> genomes. Genomes have no more than 2% of missing loci.</p> <p>File 'Allele_Profles/Senterica_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profiles of the 4,589 <em>Salmonella</em> genomes of the dataset. Please note that missing loci follow the annotation of the chewBBACA Allele Calling software.</p> <p>File 'Allele_Profles/Senterica_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profiles of the 4,589 <em>Salmonella</em> genomes of the dataset. 
Please note that missing loci are indicated with a zero.</p> <p><strong>Additional citations</strong></p> <p>The schemas are prepared to be used with <a href="https://github.com/B-UMMI/chewBBACA/wiki"><strong>chewBBACA</strong></a>. When using the schemas in this repository, please also cite:</p> <blockquote> <p>Silva M, Machado M, Silva D, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço J. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. 15/03/2018. M Gen 4(3): doi:10.1099/mgen.0.000166 <a href="http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000166">http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000166</a></p> </blockquote> <p>The <em>Salmonella enterica</em> schema is derived from the <a href="http://enterobase.warwick.ac.uk/">EnteroBase</a> <em>Salmonella</em> wgMLST schema. When using the schema in this repository, please also cite:</p> <blockquote> <p>Alikhan N-F, Zhou Z, Sergeant MJ, Achtman M (2018) A genomic overview of the population structure of <em>Salmonella</em>. PLoS Genet 14 (4):e1007261. <a href="https://doi.org/10.1371/journal.pgen.1007261">https://doi.org/10.1371/journal.pgen.1007261</a></p> </blockquote>
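
The cgMLST definition in this record (loci present in at least 99% of genomes, missing loci written as zero, genomes with no more than 2% missing loci) can be sketched as follows. This is an illustration, not the INNUENDO or chewBBACA code. The function names, the in-memory layout and the toy calls are hypothetical. It is also assumed that any call that is not an integer (for example NIPH, ALM or ASM) counts as missing and that an `INF-` prefix marks a newly inferred allele. Whether genomes were filtered before or after the core loci were chosen is not stated in the record; the sketch filters afterwards.

```python
def normalise(call):
    """Return the allele number as a string, or "0" for a missing call."""
    call = call.strip()
    if call.startswith("INF-"):  # assumption: newly inferred allele
        call = call[4:]
    return call if call.isdigit() else "0"


def core_loci(loci, profiles, min_presence=0.99):
    """Loci with a real allele call in at least `min_presence` of genomes.

    loci:     list of locus names (column order of the profile table).
    profiles: dict genome -> list of calls, aligned with `loci`.
    """
    if not profiles:
        return []
    n_genomes = len(profiles)
    core = []
    for i, locus in enumerate(loci):
        called = sum(normalise(row[i]) != "0" for row in profiles.values())
        if called / n_genomes >= min_presence:
            core.append(locus)
    return core


def cgmlst_profiles(loci, profiles, core, max_missing=0.02):
    """Restrict profiles to core loci, zero-code missing calls, and drop
    genomes with more than `max_missing` of the core loci missing."""
    if not core:
        return {}
    columns = [loci.index(locus) for locus in core]
    result = {}
    for genome, row in profiles.items():
        calls = [normalise(row[i]) for i in columns]
        if calls.count("0") / len(core) <= max_missing:
            result[genome] = calls
    return result


# Toy wgMLST table; a looser presence threshold is used only because
# three genomes cannot meaningfully express 99%.
loci = ["locusA", "locusB", "locusC"]
profiles = {
    "g1": ["1", "INF-2", "NIPH"],
    "g2": ["3", "2", "4"],
    "g3": ["1", "ALM", "ASM"],
}
core = core_loci(loci, profiles, min_presence=0.6)
cg = cgmlst_profiles(loci, profiles, core)
```

With the published thresholds, `min_presence=0.99` and `max_missing=0.02` would be applied to the full 4,589-genome table.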

    INNUENDO whole genome and core genome MLST schemas and datasets for Campylobacter jejuni

    No full text
    <p><strong>Dataset</strong></p> <p>Raw reads deposited in the European Nucleotide Archive (ENA) or in the NCBI Sequence Read Archive (SRA) as <em>C. jejuni</em> were retrieved in April 2017. In total, 5,691 genomes that passed the INNUca v3.1 pipeline were selected. Additionally, 566 raw read datasets previously published in <a href="https://www.ncbi.nlm.nih.gov/pubmed/27041390">Kovanen et al., 2016</a>, <a href="https://www.ncbi.nlm.nih.gov/pubmed/28348829">Llarena et al., 2016</a>, <a href="https://www.ncbi.nlm.nih.gov/pubmed/25232158">Kovanen et al., 2014</a>, <a href="https://www.ncbi.nlm.nih.gov/pubmed/24655229">Kovanen et al., 2014</a> and <a href="http://www.sciencedirect.com/science/article/pii/S0740002016310449?via=ihub">Garcia-Sanchez et al., 2017</a> were included. The database also includes 269 <em>C. jejuni</em> genomes belonging to the INNUENDO Sequence Dataset (<a href="https://www.ebi.ac.uk/ena/data/view/PRJEB27020">PRJEB27020</a>). Genomes were assembled using the <a href="https://github.com/INNUENDOCON/INNUca">INNUca v3.1 pipeline</a> and passed QC. </p> <p>File 'Metadata/Cjejuni_metadata.txt' contains metadata for each strain, including country and year of isolation, source classification and taxa of the host, and classical pubMLST 7-gene ST and CC classification. </p> <p>The directory 'Genomes' contains all 6,526 INNUca v3.1 assemblies of the strains listed in 'Metadata/Cjejuni_metadata.txt'.</p> <p><strong>Schema creation and validation</strong></p> <p>Draft genome assemblies were annotated using Prokka and an initial pangenome was defined using Roary. The <a href="https://github.com/B-UMMI/chewBBACA/wiki/1.-Schema-Creation"><em>chewBBACA CreateSchema.py</em></a> script was used to create a whole-genome schema starting from the Roary pangenome. The schema initially comprised 5,447 loci and was populated with the 6,526 <em>C. jejuni</em> genomes. 
The quality of the loci was assessed using <a href="https://github.com/B-UMMI/chewBBACA/wiki/1.-Schema-Creation"><em>chewBBACA Schema Evaluation</em></a>. Loci with a single allele and loci with high length variability (i.e. more than 1 allele outside the mode +/- 0.05 size) were removed. The wgMLST schema was further curated by excluding all loci detected as “Repeated Loci” and loci annotated as “non-informative paralogous hit (NIPH/ NIPHEM)” or “Allele Larger/ Smaller than length mode (ALM/ ASM)” by the <a href="https://github.com/B-UMMI/chewBBACA/wiki/2.-Allele-Calling"><em>chewBBACA Allele Calling</em></a> engine in more than 1% of the <em>C. jejuni</em> genomes dataset.</p> <p>File 'Schema/Cjejuni_wgMLST_2795_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA and includes a total of 2,795 loci.</p> <p>File 'Schema/Cjejuni_cgMLST_678_listGenes.txt' contains the list of genes from the wgMLST schema that define the cgMLST schema. The cgMLST schema consists of 678 loci, defined as the loci present in at least 99.9% of the 6,526 <em>C. jejuni</em> genomes. Genomes have no more than 2% of missing loci.</p> <p>File 'Allele_Profles/Cjejuni_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profiles of the 6,526 <em>C. jejuni</em> genomes of the dataset. Please note that missing loci follow the annotation of the chewBBACA Allele Calling software.</p> <p>File 'Allele_Profles/Cjejuni_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profiles of the 6,526 <em>C. jejuni</em> genomes of the dataset. Please note that missing loci are indicated with a zero.</p>
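
The locus quality filter described in this record (remove loci with a single allele, and loci where more than 1 allele falls outside the mode +/- 0.05 size) could look roughly like the sketch below. It is not the chewBBACA Schema Evaluation code. The function names are hypothetical, and "0.05" is assumed to mean 5% of the modal allele length.

```python
from collections import Counter


def has_high_length_variability(allele_lengths, tolerance=0.05, max_outliers=1):
    """True if more than `max_outliers` alleles lie outside mode +/- tolerance.

    allele_lengths: lengths (in bp) of all alleles of one locus.
    If several lengths are equally common, the first one seen is the mode.
    """
    mode = Counter(allele_lengths).most_common(1)[0][0]
    low, high = mode * (1 - tolerance), mode * (1 + tolerance)
    outliers = sum(1 for length in allele_lengths if not low <= length <= high)
    return outliers > max_outliers


def keep_locus(allele_lengths):
    """Apply both criteria: more than one allele, and stable allele length."""
    if len(allele_lengths) < 2:  # locus with a single allele
        return False
    return not has_high_length_variability(allele_lengths)


# Toy loci: allele lengths are invented for illustration.
stable = [300, 300, 303, 297, 450]   # one outlier only, so the locus is kept
variable = [300, 300, 200, 450]      # two outliers, so the locus is removed
single = [300]                       # single allele, so the locus is removed
```

Counting outliers rather than measuring the spread means one aberrant allele does not remove an otherwise stable locus.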