
    Evaluation of genomic island predictors using a comparative genomics approach

    Background: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches. Results: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset) and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools. Conclusion: Our results indicate that AlienHunter has the highest recall, but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable overall highest accuracy. Our comparative genomics approach, IslandPick, was the most accurate, compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation, using diverse and independent datasets that were not artificially constructed, of the accuracy of several sequence composition-based GI predictors. The caveats associated with this analysis and proposals for optimal island prediction are discussed.
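
    As an illustration of how an evaluation of this kind can be scored, the sketch below computes precision, recall, and overall accuracy for predicted island intervals against a positive (island) and a negative (non-island) reference set. It assumes nucleotide-level scoring and 1-based inclusive coordinates. The function names and the toy coordinates are hypothetical and are not taken from the study.

```python
def positions(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def score_predictions(predicted, positive, negative):
    """Score predicted intervals against positive and negative reference sets.

    Positions that fall in neither reference set are ignored, because their
    true status is unknown.
    """
    pred = positions(predicted)
    pos = positions(positive)
    neg = positions(negative)

    tp = len(pred & pos)   # predicted and in a reference island
    fp = len(pred & neg)   # predicted but in a reference non-island region
    fn = len(pos - pred)   # reference island positions that were missed
    tn = len(neg - pred)   # reference non-island positions left unpredicted

    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy coordinates (hypothetical): one reference island, one conserved region.
predicted = [(101, 200), (451, 500)]
positive = [(101, 250)]
negative = [(301, 500)]

scores = score_predictions(predicted, positive, negative)
print(scores)
```

    Under this scoring, a tool that predicts very long regions can reach high recall while losing precision, which is the kind of trade-off the abstract reports between AlienHunter and SIGI-HMM.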

    Cross-study analyses of microbial abundance using generalized common factor methods

    By creating networks of biochemical pathways, communities of micro-organisms are able to modulate the properties of their environment and even the metabolic processes within their hosts. Next-generation high-throughput sequencing has led to a new frontier in microbial ecology, promising the ability to leverage the microbiome to make crucial advancements in the environmental and biomedical sciences. However, this is challenging, as genomic data are high-dimensional, sparse, and noisy. Much of this noise reflects the exact conditions under which sequencing took place, and is so significant that it limits consensus-based validation of study results. We propose an ensemble approach for cross-study exploratory analyses of microbial abundance data in which we first estimate the variance-covariance matrix of the underlying abundances from each dataset on the log scale assuming Poisson sampling, and subsequently model these covariances jointly so as to find a shared low-dimensional subspace of the feature space. By viewing the projection of the latent true abundances onto this common structure, the variation is pared down to that which is shared among all datasets, and is likely to reflect more generalizable biological signal than can be inferred from individual datasets. We investigate several ways of achieving this, and demonstrate that they work well on simulated and real metagenomic data in terms of signal retention and interpretability
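
    One way to picture the first step, estimating a log-scale covariance under Poisson sampling, is the moment-based sketch below. It assumes a Poisson log-normal model, under which the covariance of the log abundances can be recovered from the means and covariances of the raw counts after subtracting the Poisson sampling variance from the diagonal. This is an illustrative estimator only and is not necessarily the one used in the work. It ignores differences in sequencing depth and does not cover the joint modeling step. The function name and the toy counts are hypothetical.

```python
import math


def log_scale_covariance(counts):
    """Moment estimate of the log-scale covariance matrix.

    counts: list of samples, each a list of non-negative counts per feature.
    Assumes counts[i][j] ~ Poisson(lambda_ij) with log(lambda_i) multivariate
    normal, so that
        Cov(Y_j, Y_k) = m_j * m_k * (exp(Sigma_jk) - 1)        for j != k
        Var(Y_j)      = m_j + m_j**2 * (exp(Sigma_jj) - 1)
    where m_j is the mean count of feature j.
    """
    n = len(counts)
    p = len(counts[0])
    means = [sum(row[j] for row in counts) / n for j in range(p)]
    if any(m <= 0 for m in means):
        raise ValueError("every feature needs a positive mean count")

    sigma = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            cov = sum(
                (row[j] - means[j]) * (row[k] - means[k]) for row in counts
            ) / (n - 1)
            if j == k:
                cov -= means[j]  # remove the Poisson sampling variance
            ratio = 1.0 + cov / (means[j] * means[k])
            if ratio <= 0:
                raise ValueError("moment estimate undefined for these counts")
            sigma[j][k] = sigma[k][j] = math.log(ratio)
    return sigma


# Toy count table (hypothetical): 4 samples, 2 features.
toy_counts = [[10, 8], [40, 3], [5, 12], [25, 20]]
sigma = log_scale_covariance(toy_counts)
print(sigma)
```

    A moment estimate like this is not guaranteed to be positive semi-definite, so a practical pipeline would likely need a regularization or projection step before any joint decomposition across datasets.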

    Infectious Complications Are Associated With Alterations in the Gut Microbiome in Pediatric Patients With Acute Lymphoblastic Leukemia

    Acute lymphoblastic leukemia (ALL) is the most common pediatric cancer. Fortunately, survival rates exceed 90%; however, infectious complications remain a significant issue that can reduce patients' quality of life and worsen their prognosis. Recently, numerous studies have linked shifts in gut microbiome composition to infection events in various hematological malignancies, including ALL. These studies have been limited to observing broad taxonomic changes using 16S rRNA gene profiling, while missing possible differences within microbial functions encoded by individual species. Here we present the first combined 16S rRNA gene and metagenomic shotgun sequencing study on the gut microbiome of an independent pediatric ALL cohort during treatment. We found distinctive differences in alpha diversity and beta diversity in samples from patients with infectious complications in the first 6 months of therapy. We also found specific species and functional pathways that differed significantly in relative abundance in samples from patients with infectious complications. Finally, machine learning models based on patient metadata and bacterial species were able to classify samples with high accuracy (84.09%), with bacterial species being the most important classifying features. This study strengthens our understanding of the association between infection and pediatric acute lymphoblastic leukemia treatment and warrants further investigation

    The Association of Virulence Factors with Genomic Islands

    Background: It has been noted that many bacterial virulence factor genes are located within genomic islands (GIs; clusters of genes in a prokaryotic genome of probable horizontal origin). However, such studies have been limited to single genera or isolated observations. We have performed the first large-scale analysis of multiple diverse pathogens to examine this association. We additionally identified genes found predominantly in pathogens, but not non-pathogens, across multiple genera using 631 complete bacterial genomes, and we identified common trends in virulence for genes in GIs. Furthermore, we examined the relationship between GIs and clustered regularly interspaced short palindromic repeats (CRISPRs) proposed to confer resistance to phage. Methodology/Principal Findings: We show quantitatively that GIs disproportionately contain more virulence factors than the rest of a given genome (p < 1E-40 using three GI datasets) and that CRISPRs are also over-represented in GIs. Virulence factors in GIs and pathogen-associated virulence factors are enriched for proteins having more “offensive” functions, e.g. active invasion of the host, and are disproportionately components of type III/IV secretion systems or toxins. Numerous hypothetical pathogen-associated genes were identified, meriting further study. Conclusions/Significance: This is the first systematic analysis across diverse genera indicating that virulence factors are disproportionately associated with GIs. “Offensive” virulence factors, as opposed to host-interaction factors, may more ofte

    Denoising the Denoisers: an independent evaluation of microbiome sequence error-correction approaches

    High-depth sequencing of universal marker genes such as the 16S rRNA gene is a common strategy to profile microbial communities. Traditionally, sequence reads are clustered into operational taxonomic units (OTUs) at a defined identity threshold to avoid sequencing errors generating spurious taxonomic units. However, there have been numerous bioinformatic packages recently released that attempt to correct sequencing errors to determine real biological sequences at single nucleotide resolution by generating amplicon sequence variants (ASVs). As more researchers begin to use high resolution ASVs, there is a need for an in-depth and unbiased comparison of these novel “denoising” pipelines. In this study, we conduct a thorough comparison of three of the most widely-used denoising packages (DADA2, UNOISE3, and Deblur) as well as an open-reference 97% OTU clustering pipeline on mock, soil, and host-associated communities. We found from the mock community analyses that although they produced similar microbial compositions based on relative abundance, the approaches identified vastly different numbers of ASVs that significantly impact alpha diversity metrics. Our analysis on real datasets using recommended settings for each denoising pipeline also showed that the three packages were consistent in their per-sample compositions, resulting in only minor differences based on weighted UniFrac and Bray–Curtis dissimilarity. DADA2 tended to find more ASVs than the other two denoising pipelines when analyzing both the real soil data and two other host-associated datasets, suggesting that it could be better at finding rare organisms, but at the expense of possible false positives. The open-reference OTU clustering approach identified considerably more OTUs in comparison to the number of ASVs from the denoising pipelines in all datasets tested. 
    The three denoising approaches differed significantly in their run times, with UNOISE3 running more than 1,200 and 15 times faster than DADA2 and Deblur, respectively. Our findings indicate that, although all pipelines result in similar general community structure, the number of ASVs/OTUs and the resulting alpha-diversity metrics vary considerably, and this should be considered when attempting to distinguish rare organisms from possible background noise
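
    The contrast described here, similar community composition but very different feature counts, can be illustrated with a minimal sketch. It compares two hypothetical feature tables for the same sample: one with three features and one that also reports twenty singleton features. Richness and Shannon diversity are computed per table, and Bray-Curtis dissimilarity is computed on relative abundances. The feature names and counts are invented for illustration and are not data from the study.

```python
import math


def relative_abundance(counts):
    """Convert a {feature: count} table to relative abundances."""
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items() if c > 0}


def richness(counts):
    """Number of observed features (a simple alpha-diversity metric)."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon diversity index using natural logarithms."""
    return -sum(p * math.log(p) for p in relative_abundance(counts).values())


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity on relative abundances.

    Both profiles sum to 1, so the usual formula
    1 - 2 * sum(min) / (sum_a + sum_b) reduces to 1 - sum(min).
    """
    ra, rb = relative_abundance(a), relative_abundance(b)
    shared = sum(min(ra.get(f, 0.0), rb.get(f, 0.0)) for f in set(ra) | set(rb))
    return 1.0 - shared


# Hypothetical output of two pipelines run on the same sample.
pipeline_a = {"ASV1": 500, "ASV2": 300, "ASV3": 200}
pipeline_b = {"ASV1": 495, "ASV2": 295, "ASV3": 190}
pipeline_b.update({f"rare{i}": 1 for i in range(20)})  # 20 singleton features

print(richness(pipeline_a), richness(pipeline_b))
print(shannon(pipeline_a), shannon(pipeline_b))
print(bray_curtis(pipeline_a, pipeline_b))
```

    In this toy case the two tables are nearly identical by Bray-Curtis dissimilarity, while richness differs several-fold, because low-abundance features carry little weight in abundance-based dissimilarities but each one counts fully toward richness.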

    Comparative genomic analysis reveals habitat-specific genes and regulatory hubs within the genus Novosphingobium

    © The Author(s), 2017. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in mSystems 2 (2017): e00020-17, doi:10.1128/mSystems.00020-17. Species belonging to the genus Novosphingobium are found in many different habitats and have been identified as metabolically versatile. Through comparative genomic analysis, we identified habitat-specific genes and regulatory hubs that could determine habitat selection for Novosphingobium spp. Genomes from 27 Novosphingobium strains isolated from diverse habitats such as rhizosphere soil, plant surfaces, heavily contaminated soils, and marine and freshwater environments were analyzed. Genome size and coding potential were widely variable, differing significantly between habitats. Phylogenetic relationships between strains were less likely to describe functional genotype similarity than the habitat from which they were isolated. In this study, strains (19 out of 27) with a recorded habitat of isolation, and at least 3 representative strains per habitat, comprised four ecological groups: rhizosphere, contaminated soil, marine, and freshwater. Sulfur acquisition and metabolism were the only core genomic traits to differ significantly in proportion between these ecological groups; for example, alkane sulfonate (ssuABCD) assimilation was found exclusively in all of the rhizospheric isolates. When we examined osmolytic regulation in Novosphingobium spp. through ectoine biosynthesis, which was assumed to be marine habitat specific, we found that it was also present in isolates from contaminated soil, suggesting its relevance beyond the marine system. Novosphingobium strains were also found to harbor a wide variety of mono- and dioxygenases, responsible for the metabolism of several aromatic compounds, suggesting their potential to act as degraders of a variety of xenobiotic compounds.
    Protein-protein interaction analysis revealed β-barrel outer membrane proteins as habitat-specific hubs in each of the four habitats: freshwater (Saro_1868), marine water (PP1Y_AT17644), rhizosphere (PMI02_00367), and soil (V474_17210). These outer membrane proteins could play a key role in habitat demarcation and extend our understanding of the metabolic versatility of the Novosphingobium species. This work was supported by grants from the Department of Biotechnology (DBT), R.K., S.H., K.P., A.B., and U.S. gratefully acknowledge the National Bureau of Agriculturally Important Microorganisms (NBAIM), Science and Engineering Research Board (SERB), N-PDF (PDF/2015/000062), (PDF/2015, 000319), University Grant Commission (UGC) for the Dr. D. S. Kothari Postdoctoral Fellowship and UGC for providing fellowships, respectively

    The impact of chemerin or chemokine-like receptor 1 loss on the mouse gut microbiome

    Chemerin is an adipocyte-derived signalling molecule (adipokine) that serves as a ligand activator of chemokine-like receptor 1 (CMKLR1). Chemerin/CMKLR1 signalling is well established to regulate fundamental processes in metabolism and inflammation. The composition and function of gut microbiota have also been shown to impact the development of metabolic and inflammatory diseases such as obesity, diabetes and inflammatory bowel disease. In this study, we assessed the microbiome composition of fecal samples isolated from wildtype, chemerin, or CMKLR1 knockout mice using Illumina-based sequencing. Moreover, the knockout mice and respective wildtype mice used in this study were housed at different universities, allowing us to compare facility-dependent effects on microbiome composition. While there was no difference in alpha diversity within samples when compared by either facility or genotype, we observed a dramatic difference in the presence and abundance of numerous taxa between facilities. There were minor differences in bacterial abundance between wildtype and chemerin knockout mice, but significantly more differences in taxa abundance between wildtype and CMKLR1 knockout mice. Specifically, CMKLR1 knockout mice exhibited decreased abundance of Akkermansia and Prevotella, which correlated with body weight in CMKLR1 knockout, but not wildtype mice. This is the first study to investigate a linkage between chemerin/CMKLR1 signaling and microbiome composition. The results of our study suggest that chemerin/CMKLR1 signaling influences metabolic processes through effects on the gut microbiome. Furthermore, the dramatic difference in microbiome composition between facilities might contribute to discrepancies in the metabolic phenotype of CMKLR1 knockout mice reported by independent groups.
    Taken together, these findings establish a foundation for future studies to investigate the relationship between chemerin signaling and the gut microbiome in the development and progression of metabolic and inflammatory disease

    Detection of Helicobacter pylori Microevolution and Multiple Infection from Gastric Biopsies by Housekeeping Gene Amplicon Sequencing

    Despite the great efforts devoted to research on Helicobacter pylori, the prevalence of single-strain infection or H. pylori mixed infection and its implications in the mode of transmission of this bacterium are still controversial. In this study, we explored the usefulness of housekeeping gene amplicon sequencing in the detection of H. pylori microevolution and multiple infections. DNA was extracted from five gastric biopsies from four infected patients with distinct histopathological diagnoses. PCR amplification of six H. pylori-specific housekeeping genes was then assessed on each sample. Optimal results were obtained for the cgt and luxS genes, which were selected for amplicon sequencing. A total of 11,833 cgt and 403 luxS amplicon sequences were obtained, 2042 and 112 of which were unique sequences, respectively. All cgt and luxS sequences were clustered at 97% identity into 9 and 13 operational taxonomic units (OTUs), respectively. For each sample from a different patient, a single OTU comprised the majority of sequences in both genes, but more than one OTU was detected in all samples. These results suggest that multiple infections with a predominant strain together with other minority strains are the main way by which H. pylori colonizes the human stomach