57 research outputs found

    Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining

    Get PDF
    Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cisregulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. In order to detect combinations of transcription factor binding motifs, we developed a data mining approach based on the use of association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment. Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature. Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.NIAAA Alcohol Training GrantNational Science FoundationCellular and Molecular Biolog

    Computational meta'omics for microbial community studies

    Get PDF
    Complex microbial communities are an integral part of the Earth's ecosystem and of our bodies in health and disease. In the last two decades, culture-independent approaches have provided new insights into their structure and function, with the exponentially decreasing cost of high-throughput sequencing resulting in broadly available tools for microbial surveys. However, the field remains far from reaching a technological plateau, as both computational techniques and nucleotide sequencing platforms for microbial genomic and transcriptional content continue to improve. Current microbiome analyses are thus starting to adopt multiple and complementary meta'omic approaches, leading to unprecedented opportunities to comprehensively and accurately characterize microbial communities and their interactions with their environments and hosts. This diversity of available assays, analysis methods, and public data is in turn beginning to enable microbiome-based predictive and modeling tools. We thus review here the technological and computational meta'omics approaches that are already available, those that are under active development, their success in biological discovery, and several outstanding challenges

    Relating the metatranscriptome and metagenome of the human gut

    Get PDF
    Although the composition of the human microbiome is now wellstudied, the microbiota’s \u3e8 million genes and their regulation remain largely uncharacterized. This knowledge gap is in part because of the difficulty of acquiring large numbers of samples amenable to functional studies of the microbiota. We conducted what is, to our knowledge, one of the first human microbiome studies in a well-phenotyped prospective cohort incorporating taxonomic, metagenomic, and metatranscriptomic profiling at multiple body sites using self-collected samples. Stool and saliva were provided by eight healthy subjects, with the former preserved by three different methods (freezing, ethanol, and RNAlater) to validate self-collection. Within-subject microbial species, gene, and transcript abundances were highly concordant across sampling methods, with only a small fraction of transcripts (\u3c5%) displaying between-method variation. Next, we investigated relationships between the oral and gut microbial communities, identifying a subset of abundant oral microbes that routinely survive transit to the gut, but with minimal transcriptional activity there. Finally, systematic comparison of the gut metagenome and metatranscriptome revealed that a substantial fraction (41%) of microbial transcripts were not differentially regulated relative to their genomic abundances. Of the remainder, consistently underexpressed pathways included sporulation and amino acid biosynthesis, whereas up-regulated pathways included ribosome biogenesis and methanogenesis. Across subjects, metatranscriptional profiles were significantly more individualized than DNA-level functional profiles, but less variable than microbial composition, indicative of subject-specific whole-community regulation. The results thus detail relationships between community genomic potential and gene expression in the gut, and establish the feasibility of metatranscriptomic investigations in subject-collected and shipped samples

    The Chthonomonas calidirosea genome is highly conserved across geographic locations and distinct chemical and microbial environments in New Zealand's Taupo Volcanic Zone

    Get PDF
    Chthonomonas calidirosea T49T is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa

    Identifying accurate metagenome and amplicon software via a meta-analysis of sequence to taxonomy benchmarking studies

    Get PDF
    Metagenomic and meta-barcode DNA sequencing has rapidly become a widely-used technique for investigating a range of questions, particularly related to health and environmental monitoring. There has also been a proliferation of bioinformatic tools for analysing metagenomic and amplicon datasets, which makes selecting adequate tools a significant challenge. A number of benchmark studies have been undertaken; however, these can present conflicting results. In order to address this issue we have applied a robust Z-score ranking procedure and a network meta-analysis method to identify software tools that are consistently accurate for mapping DNA sequences to taxonomic hierarchies. Based upon these results we have identified some tools and computational strategies that produce robust predictions

    Dysfunction of the Intestinal Microbiome in Inflammatory Bowel Disease and Treatment

    Get PDF
    Background: The inflammatory bowel diseases (IBD) Crohn's disease and ulcerative colitis result from alterations in intestinal microbes and the immune system. However, the precise dysfunctions of microbial metabolism in the gastrointestinal microbiome during IBD remain unclear. We analyzed the microbiota of intestinal biopsies and stool samples from 231 IBD and healthy subjects by 16S gene pyrosequencing and followed up a subset using shotgun metagenomics. Gene and pathway composition were assessed, based on 16S data from phylogenetically-related reference genomes, and associated using sparse multivariate linear modeling with medications, environmental factors, and IBD status. Results: Firmicutes and Enterobacteriaceae abundances were associated with disease status as expected, but also with treatment and subject characteristics. Microbial function, though, was more consistently perturbed than composition, with 12% of analyzed pathways changed compared with 2% of genera. We identified major shifts in oxidative stress pathways, as well as decreased carbohydrate metabolism and amino acid biosynthesis in favor of nutrient transport and uptake. The microbiome of ileal Crohn's disease was notable for increases in virulence and secretion pathways. Conclusions: This inferred functional metagenomic information provides the first insights into community-wide microbial processes and pathways that underpin IBD pathogenesis

    Sub-clinical detection of gut microbial biomarkers of obesity and type 2 diabetes

    Get PDF
    Background: Obesity and type 2 diabetes (T2D) are linked both with host genetics and with environmental factors, including dysbioses of the gut microbiota. However, it is unclear whether these microbial changes precede disease onset. Twin cohorts present a unique genetically-controlled opportunity to study the relationships between lifestyle factors and the microbiome. In particular, we hypothesized that family-independent changes in microbial composition and metabolic function during the sub-clinical state of T2D could be either causal or early biomarkers of progression. Methods: We collected fecal samples and clinical metadata from 20 monozygotic Korean twins at up to two time points, resulting in 36 stool shotgun metagenomes. While the participants were neither obese nor diabetic, they spanned the entire range of healthy to near-clinical values and thus enabled the study of microbial associations during sub-clinical disease while accounting for genetic background. Results: We found changes both in composition and in function of the sub-clinical gut microbiome, including a decrease in Akkermansia muciniphila suggesting a role prior to the onset of disease, and functional changes reflecting a response to oxidative stress comparable to that previously observed in chronic T2D and inflammatory bowel diseases. Finally, our unique study design allowed us to examine the strain similarity between twins, and we found that twins demonstrate strain-level differences in composition despite species-level similarities. Conclusions: These changes in the microbiome might be used for the early diagnosis of an inflamed gut and T2D prior to clinical onset of the disease and will help to advance toward microbial interventions. Electronic supplementary material The online version of this article (doi:10.1186/s13073-016-0271-6) contains supplementary material, which is available to authorized users
    corecore