39 research outputs found

    Heuristic and Hierarchical-Based Population Mining of Salmonella enterica Lineage I Pan-Genomes as a Platform to Enhance Food Safety

    The recent incorporation of bacterial whole-genome sequencing (WGS) into public health laboratories has enhanced foodborne outbreak detection and source attribution. As a result, large volumes of publicly available data can be used to study the biology of foodborne pathogen populations at an unprecedented scale. To demonstrate the application of a heuristic and agnostic hierarchical population-structure-guided pan-genome enrichment analysis (PANGEA), we used populations of S. enterica lineage I to achieve two main objectives: (i) show how examining population structure at different scales of resolution can enhance ecological and epidemiological inquiries; and (ii) identify population-specific inferable traits that could provide selective advantages in food production environments. Publicly available WGS data were obtained from the NCBI database for three serovars of Salmonella enterica subsp. enterica lineage I (S. Typhimurium, S. Newport, and S. Infantis). Using the hierarchical genotypic classifications (Serovar, BAPS1, ST, cgMLST), datasets from each of the three serovars showed varying degrees of clonal structuring. When the accessory genome was mapped onto these hierarchical structures (PANGEA), accessory loci could be linked with specific genotypes. A large heavy-metal resistance mobile element was found in the monophasic ST34 lineage of S. Typhimurium, and laboratory testing showed that monophasic isolates have, on average, higher copper resistance than biphasic ones. In S. Newport, an extra sugE gene copy was found among most isolates of the ST45 lineage, and laboratory testing of multiple isolates confirmed that S. Newport ST45 isolates were, on average, less sensitive to the disinfectant cetylpyridinium chloride than non-ST45 isolates. Lastly, data mining of the accessory genomic content of S. Infantis revealed two cryptic ecotypes with distinct accessory genomic content and distinct ecological patterns. Poultry appears to be the major reservoir for Ecotype 1, and temporal analysis further suggested a recent ecological succession, with Ecotype 2 apparently being displaced by Ecotype 1. Altogether, a heuristic hierarchical population structure analysis that includes bacterial pan-genomes (core and accessory genomes) can (1) improve genomic resolution for mapping populations and assessing epidemiological patterns; and (2) define lineage-specific informative loci that may be associated with survival in the food chain.
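Linking an accessory locus to a genotype, as described above, amounts to asking whether the locus is over-represented in one lineage (for example, sugE copies in ST45) relative to the rest of the population. The abstract does not state which statistic PANGEA uses, so the sketch below assumes a one-sided Fisher's exact test on a 2×2 presence/absence table; the function name `fisher_greater` is hypothetical and this is not the authors' code.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for over-representation of a locus.

    Hypothetical helper (not PANGEA's published code). Table layout:
        a = lineage isolates carrying the locus
        b = lineage isolates lacking the locus
        c = other isolates carrying the locus
        d = other isolates lacking the locus
    """
    lineage_size = a + b
    carriers = a + c
    total = a + b + c + d
    denom = comb(total, carriers)
    p_value = 0.0
    # Sum hypergeometric probabilities of observing >= a carriers in the lineage.
    for x in range(a, min(lineage_size, carriers) + 1):
        p_value += comb(lineage_size, x) * comb(total - lineage_size, carriers - x) / denom
    return p_value


if __name__ == "__main__":
    # Toy table: all 3 lineage isolates carry the locus, none of the 3 others do.
    print(fisher_greater(3, 0, 0, 3))
```

Testing every locus against every lineage at every hierarchical level produces many tests, so a real analysis would also need a multiple-testing correction (e.g., Benjamini-Hochberg), which this sketch omits.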

    High-Throughput flaA Short Variable Region Sequencing to Assess Campylobacter Diversity in Fecal Samples From Birds

    Current approaches to identifying sources of human pathogens depend largely on the cultivation and isolation of target bacteria. Rapid pathogen source identification requires a culture-independent strain typing method. In this study, we designed a new primer set that broadly covers the flaA short variable region (SVR) of various Campylobacter species, and applied flaA SVR sequencing to examine the diversity of Campylobacter spp. in goose fecal samples (n = 16) with and without bacterial cultivation. Twenty-three Campylobacter strains isolated from the 16 goose fecal samples were grouped similarly by the conventional flaA restriction fragment length polymorphism (RFLP) method and by flaA SVR sequencing, but flaA SVR sequencing showed higher discriminatory power. For culture-independent flaA SVR sequencing analysis, we developed and optimized a sequence data analysis pipeline to identify as many genotypes as possible while minimizing the detection of spurious genotypes generated by sequencing errors. Using this pipeline, 51,629 high-quality flaA sequence reads were clustered into 16 operational taxonomic units (OTUs, equivalent to genotypes) using a 98% sequence similarity threshold and requiring more than 50 duplicate reads per sequence. Almost all flaA genotypes obtained by the culture-dependent method were also identified by culture-independent flaA SVR MiSeq sequencing. In addition, more flaA genotypes were identified, probably due to the high-throughput nature of MiSeq sequencing. These results suggest that flaA SVR sequencing could be used to analyze the diversity of Campylobacter spp. without bacterial isolation. This method is promising for rapidly identifying potential sources of Campylobacter pathogens.
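The genotype-calling rule above (98% similarity, more than 50 duplicates) can be sketched as dereplication, an abundance filter, and greedy centroid clustering. How the two thresholds are combined is an assumption here, as the abstract does not give the pipeline's exact steps or tools. The sketch also assumes equal-length, pre-aligned SVR reads, so identity is a simple position-by-position match fraction; `identity` and `cluster_genotypes` are hypothetical names.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal-length, pre-aligned reads."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_genotypes(reads, min_copies=51, min_identity=0.98):
    """Hypothetical genotype caller: dereplicate, filter by abundance, cluster greedily.

    min_copies=51 encodes "more than 50 duplicates"; min_identity=0.98 encodes
    98% similarity. Returns (centroid_sequence, total_read_count) pairs.
    """
    counts = Counter(reads)
    # Drop unique sequences seen 50 times or fewer (treated as likely sequencing errors).
    abundant = [(seq, n) for seq, n in counts.items() if n >= min_copies]
    # Most abundant sequences are considered first, so they become centroids.
    abundant.sort(key=lambda item: (-item[1], item[0]))
    centroids = []  # each entry: [centroid_sequence, total_count]
    for seq, n in abundant:
        for entry in centroids:
            if identity(seq, entry[0]) >= min_identity:
                entry[1] += n  # merge into the first sufficiently similar centroid
                break
        else:
            centroids.append([seq, n])  # no match: start a new genotype
    return [(seq, n) for seq, n in centroids]


if __name__ == "__main__":
    reads = ["ACGTACGTAC"] * 60 + ["ACGTACGTAA"] * 55 + ["TTTTTTTTTT"] * 70
    print(cluster_genotypes(reads, min_identity=0.9))
```

Filtering before clustering is a design choice: it discards rare variants outright rather than folding them into a nearby genotype, which trades some sensitivity for fewer error-derived OTUs.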

    Complex host genetics influence the microbiome in inflammatory bowel disease

    Background: Human genetics and host-associated microbial communities have been associated independently with a wide range of chronic diseases. One of the strongest associations in each case is inflammatory bowel disease (IBD), but disease risk cannot be explained fully by either factor individually. Recent findings point to interactions between host genetics and microbial exposures as important contributors to disease risk in IBD. These include evidence of the partial heritability of the gut microbiota and the conferral of gut mucosal inflammation by microbiome transplant even when the dysbiosis was initially genetically derived. Although there have been several tests for association of individual genetic loci with bacterial taxa, there has been no direct comparison of complex genome-microbiome associations in large cohorts of patients with an immunity-related disease. Methods: We obtained 16S ribosomal RNA (rRNA) gene sequences from intestinal biopsies as well as host genotype via Immunochip in three independent cohorts totaling 474 individuals. We tested for correlation between relative abundance of bacterial taxa and number of minor alleles at known IBD risk loci, including fine mapping of multiple risk alleles in the Nucleotide-binding oligomerization domain-containing protein 2 (NOD2) gene exon. We identified host polymorphisms whose associations with bacterial taxa were conserved across two or more cohorts, and we tested related genes for enrichment of host functional pathways. Results: We identified and confirmed in two cohorts a significant association between NOD2 risk allele count and increased relative abundance of Enterobacteriaceae, with directionality of the effect conserved in the third cohort. 
Forty-eight additional IBD-related SNPs had the directionality of their associations with bacterial taxa significantly conserved across two or three cohorts, implicating genes enriched for regulation of the innate immune response, the JAK-STAT cascade, and other immunity-related pathways. Conclusions: These results suggest complex interactions between genetically altered host functional pathways and the structure of the microbiome. Our findings demonstrate the ability to uncover novel associations from paired genome-microbiome data, and they suggest a complex link between host genetics and microbial dysbiosis in subjects with IBD across independent cohorts. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-014-0107-1) contains supplementary material, which is available to authorized users.
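The per-locus test described in the Methods pairs each subject's minor-allele count (0, 1, or 2) with a taxon's relative abundance. The abstract does not name the statistic, so the sketch below assumes a Spearman rank correlation computed per cohort, followed by a plain sign-agreement check standing in for the paper's conservation-of-directionality criterion (the paper assessed significance, which this check does not). All function names and the toy values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # undefined when one variable is constant; reported as 0 here
    return cov / (sx * sy)


def direction_conserved(rhos, min_cohorts=2):
    """True if at least min_cohorts cohorts share the same nonzero sign."""
    positive = sum(r > 0 for r in rhos)
    negative = sum(r < 0 for r in rhos)
    return max(positive, negative) >= min_cohorts


if __name__ == "__main__":
    allele_counts = [0, 0, 1, 2]       # toy minor-allele counts per subject
    abundance = [0.5, 0.1, 0.3, 0.9]   # toy relative abundances of one taxon
    print(round(spearman(allele_counts, abundance), 3))
```

A rank-based statistic is used here because relative abundances are typically skewed and allele counts take only three values; ties are handled by averaging ranks.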

    High-performance tools for precise microbiome characterization

    University of Minnesota Ph.D. dissertation. August 2018. Major: Biomedical Informatics and Computational Biology. Advisor: Dan Knights. 1 computer file (PDF); v, 95 pages. The microbiome, defined as the vast number of microorganisms inhabiting both human and non-human environments, has been associated with human disease as well as other important ecological phenomena. However, its quantitative study is complicated in part by measurement error and computational limitations, pointing to a need for more sensitive and reproducible DNA sequence analysis techniques. To this end, I have developed a variety of improved methods, including a flexible short-read quality control pipeline, curated databases of marker genes and whole genomes, streamlined OTU-picking software, and a high-throughput optimal aligner with taxonomy interpolation. Together, these methods represent advancements over traditional sequence analysis pipelines and may improve the quality of downstream statistical analyses.

    NINJA-OPS: Fast Accurate Marker Gene Alignment Using Concatenated Ribosomes

    The explosion of bioinformatics technologies in the form of next-generation sequencing (NGS) has facilitated a massive influx of genomics data in the form of short reads. Short-read mapping is therefore a fundamental component of NGS pipelines, which routinely match these short reads against reference genomes for contig assembly. However, such techniques have seldom been applied to microbial marker gene sequencing studies, which have mostly relied on novel heuristic approaches. We propose NINJA Is Not Just Another OTU-Picking Solution (NINJA-OPS, or NINJA for short), a fast and highly accurate method enabling reference-based marker gene matching (picking operational taxonomic units, or OTUs). NINJA uses Burrows-Wheeler (BW) alignment with an artificial reference chromosome composed of concatenated reference sequences, the “concatesome,” as the BW input. Other features include automatic support for paired-end reads with arbitrary insert sizes. NINJA is also free and open source and implements several pre-filtering methods that yield substantial speedups when coupled with existing tools. We applied NINJA to several published microbiome studies, obtaining accuracy similar to or better than previous reference-based OTU-picking methods while achieving an order of magnitude or more speedup and using a fraction of the memory footprint. NINJA is a complete pipeline that takes a FASTA-formatted input file and outputs a QIIME-formatted, taxonomy-annotated BIOM file for an entire MiSeq run of human gut microbiome 16S genes in under 10 minutes on a dual-core laptop.
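The concatesome idea above can be illustrated by joining reference sequences into one string, recording where each reference starts, and mapping an alignment coordinate back to its reference with a binary search. This is a sketch under stated assumptions: the spacer of `N` characters, the function names `build_concatesome` and `locate`, and the use of `str.find` as a stand-in for a real Burrows-Wheeler aligner are all illustrative, not NINJA's implementation.

```python
import bisect


def build_concatesome(refs, spacer="N" * 10):
    """Join (ref_id, sequence) pairs into one artificial chromosome.

    Returns the concatenated sequence plus, for each reference, its start
    offset, identifier, and length. The spacer (an illustrative choice)
    reduces the chance of a read matching across two adjacent references.
    """
    starts, ids, lengths = [], [], []
    pos = 0
    for ref_id, seq in refs:
        starts.append(pos)
        ids.append(ref_id)
        lengths.append(len(seq))
        pos += len(seq) + len(spacer)
    concat = spacer.join(seq for _, seq in refs)
    return concat, starts, ids, lengths


def locate(pos, starts, ids, lengths):
    """Map a 0-based concatesome coordinate to (ref_id, offset), or None."""
    i = bisect.bisect_right(starts, pos) - 1
    if i < 0 or pos - starts[i] >= lengths[i]:
        return None  # coordinate lies inside a spacer
    return ids[i], pos - starts[i]


if __name__ == "__main__":
    concat, starts, ids, lengths = build_concatesome(
        [("ref1", "ACGT"), ("ref2", "GGCC")], spacer="NN"
    )
    hit = concat.find("GCC")  # stand-in for a BW alignment hit position
    print(concat, locate(hit, starts, ids, lengths))
```

Because all references share a single index, one alignment pass covers the whole database, and the sorted offset table converts each hit back to a reference ID in logarithmic time.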

    Alignment accuracy of NINJA vs USEARCH 8 (where both reported a match).

    Each point on the graph represents a sequence for which both tools found a valid alignment. A point’s position along the X axis corresponds to the alignment score (in %ID) of the match chosen by USEARCH 8, and its position along the Y axis corresponds to the alignment score of the match chosen by NINJA. Points on the diagonal represent sequences for which both tools picked a match of the same quality. Points above the diagonal correspond to sequences for which NINJA produced more accurate hits, and points below the diagonal represent sequences for which USEARCH 8 produced more accurate hits. Note the line at the top of the graph showing a number of sequences for which NINJA selected a perfect match from the database while USEARCH 8 did not.
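The way this caption reads the scatter plot can be expressed as a tally over (USEARCH 8 %ID, NINJA %ID) pairs. The function name `compare_hits`, the tolerance used to decide "on the diagonal", and treating 100 %ID as a perfect match are assumptions for illustration; no data from the figure are reproduced.

```python
def compare_hits(pairs, tol=1e-9):
    """Classify (usearch_pct_id, ninja_pct_id) pairs relative to the diagonal.

    Only sequences for which both tools reported a match should be passed in.
    """
    tally = {"same": 0, "ninja_better": 0, "usearch_better": 0, "ninja_perfect_only": 0}
    for usearch_id, ninja_id in pairs:
        if abs(ninja_id - usearch_id) <= tol:
            tally["same"] += 1  # on the diagonal
        elif ninja_id > usearch_id:
            tally["ninja_better"] += 1  # above the diagonal
        else:
            tally["usearch_better"] += 1  # below the diagonal
        # Points on the top line: NINJA found a perfect match, USEARCH 8 did not.
        if abs(ninja_id - 100.0) <= tol and usearch_id < 100.0 - tol:
            tally["ninja_perfect_only"] += 1
    return tally


if __name__ == "__main__":
    print(compare_hits([(97.0, 97.0), (98.5, 100.0), (99.0, 98.0)]))
```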