    A High-Throughput DNA Sequence Aligner for Microbial Ecology Studies

    As the scope of microbial surveys expands with the parallel growth in sequencing capacity, a significant bottleneck in data analysis is the ability to generate a biologically meaningful multiple sequence alignment. The most commonly used aligners have varying alignment quality and speed, tend to depend on a specific reference alignment, or lack a complete description of the underlying algorithm. The purpose of this study was to create and validate an aligner with the goal of quickly generating a high-quality alignment and having the flexibility to use any reference alignment. Using the simple nearest alignment space termination algorithm, the resulting aligner operates in linear time, requires a small memory footprint, and generates a high-quality alignment. In addition, the alignments generated for variable regions were of as high a quality as the alignment of full-length sequences. As implemented, the method was able to align 18 full-length 16S rRNA gene sequences and 58 V2 region sequences per second to the 50,000-column SILVA reference alignment. Most importantly, the resulting alignments were of a quality equal to SILVA-generated alignments. The aligner described in this study will enable scientists to rapidly generate robust multiple sequence alignments that are implicitly based upon the predicted secondary structure of the 16S rRNA molecule. Furthermore, because the implementation is not connected to a specific database, it is easy to generalize the method to reference alignments for any DNA sequence.

    Toward a Census of Bacteria in Soil

    For more than a century, microbiologists have sought to determine the species richness of bacteria in soil, but the extreme complexity and unknown structure of soil microbial communities have obscured the answer. We developed a statistical model that makes the problem of estimating richness statistically accessible by evaluating the characteristics of samples drawn from simulated communities with parametric community distributions. We identified simulated communities with rank-abundance distributions that followed a truncated lognormal distribution whose samples resembled the structure of 16S rRNA gene sequence collections made using Alaskan and Minnesotan soils. The simulated communities constructed based on the distribution of 16S rRNA gene sequences sampled from the Alaskan and Minnesotan soils had a richness of 5,000 and 2,000 operational taxonomic units (OTUs), respectively, where an OTU represents a collection of sequences not more than 3% distant from each other. To sample each of these OTUs in the Alaskan 16S rRNA gene library at least twice, 480,000 sequences would be required; however, to estimate the richness of the simulated communities using nonparametric richness estimators would require only 18,000 sequences. Quantifying the richness of complex environments such as soil is an important step in building an ecological framework. We have shown that generating sufficient sequence data to do so requires less sequencing effort than completely sequencing a bacterial genome.
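The abstract above refers to nonparametric richness estimators without naming one. As a minimal sketch, assuming the widely used bias-corrected Chao1 estimator (which may not be the estimator the authors used), richness can be estimated from the number of OTUs seen exactly once and exactly twice. The function name and the toy counts are hypothetical.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate.

    otu_counts: iterable of non-negative integers, one per OTU,
    giving the number of sequences assigned to that OTU.
    Chao1 is an assumption here; the abstract does not name an estimator.
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                        # OTUs actually observed
    f1 = sum(1 for c in counts if c == 1)      # singletons
    f2 = sum(1 for c in counts if c == 2)      # doubletons
    # The bias-corrected form stays defined even when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy abundance vector (made up for illustration only).
toy_counts = [10, 5, 1, 1, 1, 2, 2]
estimate = chao1(toy_counts)
```

Many singletons relative to doubletons push the estimate well above the observed OTU count, which is the intuition behind estimating richness from far fewer sequences than exhaustive sampling would need.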

    The dynamics of a family’s gut microbiota reveal variations on a theme

    Background: It is clear that the structure and function of the human microbiota have a significant impact on the maintenance of health, and yet the factors that give rise to an adult microbiota are poorly understood. A combination of genetics, diet, environment, and life history is thought to affect the development of the gut microbiome. Here we study a chronosequence of the gut microbiota found in eight individuals from a family consisting of two parents and six children ranging in age from two months to ten years old. Results: Using 16S rRNA gene and metagenomic shotgun sequence data, it was possible to distinguish the family from a cohort of normal individuals living in the same geographic region and to differentiate each family member. Interestingly, there was a significant core membership to the family members’ microbiota where the abundance of this core accounted for the differences between individuals. It was clear that the introduction of solids represents a significant transition in the development of a mature microbiota. This transition was associated with increased diversity, decreased stability, and the colonization of significant abundances of Bacteroidetes and Clostridiales. Although the children and mother shared essentially the same diet and environment, the children’s microbiotas were not significantly more similar to their mother's than they were to their father's. Conclusions: This analysis underscores the complex interactions that give rise to a personalized microbiota and suggests the value of studying families as a surrogate for longitudinal studies.
    http://deepblue.lib.umich.edu/bitstream/2027.42/109502/1/40168_2014_Article_54.pd

    Reducing the Effects of PCR Amplification and Sequencing Artifacts on 16S rRNA-Based Studies

    The advent of next generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7×10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort, indicating that comparison of communities should be made using an equal number of sequences. Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially confound the interpretation of microbial community data.
    National Institutes of Health (U.S.) (1R01HG005975-01); National Science Foundation (U.S.) (award #0743432); National Institutes of Health (U.S.) (grant NIHU54HG004969
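The recommendation above to compare communities using an equal number of sequences is commonly implemented by random subsampling (rarefying) each sample to a shared depth. The following is a minimal sketch under that assumption; the function name, the sample data, and the choice of sampling without replacement are illustrative and are not taken from the study.

```python
import random
from collections import Counter


def subsample_counts(otu_counts, depth, seed=None):
    """Randomly draw `depth` sequences, without replacement, from one sample.

    otu_counts: dict mapping an OTU label to its sequence count.
    Returns a dict of subsampled counts that sums to `depth`.
    """
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds the {total} sequences available")
    rng = random.Random(seed)
    # Expand the counts into one label per sequence, then sample from that pool.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, depth)))


# Hypothetical samples with unequal sequencing effort.
sample_a = {"OTU1": 500, "OTU2": 300, "OTU3": 5}
sample_b = {"OTU1": 60, "OTU2": 30, "OTU4": 10}

# Rarefy both samples to the size of the smaller one before comparing them.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_a = subsample_counts(sample_a, depth, seed=1)
rarefied_b = subsample_counts(sample_b, depth, seed=1)
```

Expanding counts into a pool is simple but memory-hungry for very deep samples; a production pipeline would more likely draw from the count vector directly.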

    Structure of the gut microbiome following colonization with human feces determines colonic tumor burden

    Background: A growing body of evidence indicates that the gut microbiome plays a role in the development of colorectal cancer (CRC). Patients with CRC harbor gut microbiomes that are structurally distinct from those of healthy individuals; however, without the ability to track individuals during disease progression, it has not been possible to observe changes in the microbiome over the course of tumorigenesis. Mouse models have demonstrated that these changes can further promote colonic tumorigenesis. However, these models have relied upon mouse-adapted bacterial populations and so it remains unclear which human-adapted bacterial populations are responsible for modulating tumorigenesis. Results: We transplanted fecal microbiota from three CRC patients and three healthy individuals into germ-free mice, resulting in six structurally distinct microbial communities. Subjecting these mice to a chemically induced model of CRC resulted in different levels of tumorigenesis between mice. Differences in the number of tumors were strongly associated with the baseline microbiome structure in mice, but not with the cancer status of the human donors. Partitioning of baseline communities into enterotypes by Dirichlet multinomial mixture modeling resulted in three enterotypes that corresponded with tumor burden. The taxa most strongly positively correlated with increased tumor burden were members of the Bacteroides, Parabacteroides, Alistipes, and Akkermansia, all of which are Gram-negative. Members of the Gram-positive Clostridiales, including multiple members of Clostridium Group XIVa, were strongly negatively correlated with tumors. Analysis of the inferred metagenome of each community revealed a negative correlation between tumor count and the potential for butyrate production, and a positive correlation between tumor count and the capacity for host glycan degradation. Despite harboring distinct gut communities, all mice underwent conserved structural changes over the course of the model. The extent of these changes was also correlated with tumor incidence. Conclusion: Our results suggest that the initial structure of the microbiome determines susceptibility to colonic tumorigenesis. There appear to be opposing roles for certain Gram-negative (Bacteroidales and Verrucomicrobia) and Gram-positive (Clostridiales) bacteria in tumor susceptibility. Thus, the impact of community structure is potentially mediated by the balance between protective, butyrate-producing populations and inflammatory, mucin-degrading populations.
    http://deepblue.lib.umich.edu/bitstream/2027.42/109448/1/40168_2014_Article_48.pd

    Microbiota-based model improves the sensitivity of fecal immunochemical test for detecting colonic lesions

    Background: Colorectal cancer (CRC) is the second leading cause of death among cancers in the United States. Although individuals diagnosed early have a greater than 90 % chance of survival, more than one-third of individuals do not adhere to screening recommendations partly because the standard diagnostics, colonoscopy and sigmoidoscopy, are expensive and invasive. Thus, there is a great need to improve the sensitivity of non-invasive tests to detect early stage cancers and adenomas. Numerous studies have identified shifts in the composition of the gut microbiota associated with the progression of CRC, suggesting that the gut microbiota may represent a reservoir of biomarkers that would complement existing non-invasive methods such as the widely used fecal immunochemical test (FIT). Methods: We sequenced the 16S rRNA genes from the stool samples of 490 patients. We used the relative abundances of the bacterial populations within each sample to develop a random forest classification model that detects colonic lesions using the relative abundance of gut microbiota and the concentration of hemoglobin in stool. Results: The microbiota-based random forest model detected 91.7 % of cancers and 45.5 % of adenomas while FIT alone detected 75.0 % and 15.7 %, respectively. Of the colonic lesions missed by FIT, the model detected 70.0 % of cancers and 37.7 % of adenomas. We confirmed known associations of Porphyromonas asaccharolytica, Peptostreptococcus stomatis, Parvimonas micra, and Fusobacterium nucleatum with CRC. Yet, we found that the loss of potentially beneficial organisms, such as members of the Lachnospiraceae, was more predictive for identifying patients with adenomas when used in combination with FIT. Conclusions: These findings demonstrate the potential for microbiota analysis to complement existing screening methods to improve detection of colonic lesions.
    http://deepblue.lib.umich.edu/bitstream/2027.42/134551/1/13073_2016_Article_290.pd
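The detection percentages in the abstract above are sensitivities: the share of true lesions that a test flags. The sketch below shows that arithmetic, along with the "detected among those missed by FIT" figure, on a toy set of twelve lesions. The per-lesion calls are invented for illustration and are not the study's data; the function names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of true lesions that a test detects."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no true lesions to evaluate")
    return true_positives / total


def rescued_fraction(model_calls, fit_calls):
    """Among lesions missed by FIT, the fraction that the model detects.

    Both arguments are equal-length lists of booleans, one entry per true lesion.
    """
    missed_by_fit = [m for m, f in zip(model_calls, fit_calls) if not f]
    if not missed_by_fit:
        raise ValueError("FIT missed no lesions")
    return sum(missed_by_fit) / len(missed_by_fit)


# Toy calls for 12 true cancers (illustrative only, not from the paper).
fit_calls = [True] * 9 + [False] * 3            # FIT detects 9 of 12
model_calls = [True] * 9 + [True, True, False]  # model detects 11 of 12

fit_sens = sensitivity(sum(fit_calls), len(fit_calls) - sum(fit_calls))
model_sens = sensitivity(sum(model_calls), len(model_calls) - sum(model_calls))
rescued = rescued_fraction(model_calls, fit_calls)
```

Sensitivity alone says nothing about false positives, so a complete comparison of the two tests would also report specificity.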

    DNA from fecal immunochemical test can replace stool for detection of colonic lesions using a microbiota-based model

    Background: There is a significant demand for colorectal cancer (CRC) screening methods that are noninvasive, inexpensive, and capable of accurately detecting early stage tumors. It has been shown that models based on the gut microbiota can complement the fecal occult blood test and fecal immunochemical test (FIT). However, a barrier to microbiota-based screening is the need to collect and store a patient’s stool sample. Results: Using stool samples collected from 404 patients, we tested whether the residual buffer containing resuspended feces in FIT cartridges could be used in place of intact stool samples. We found that the bacterial DNA isolated from FIT cartridges largely recapitulated the community structure and membership of patients’ stool microbiota and that the abundances of bacteria associated with CRC were conserved. We also found that models for detecting CRC that were generated using bacterial abundances from FIT cartridges were as predictive as models generated using bacterial abundances from stool. Conclusions: These findings demonstrate the potential for using residual buffer from FIT cartridges in place of stool for microbiota-based screening for CRC. This may reduce the need to collect and process separate stool samples and may facilitate combining FIT and microbiota-based biomarkers into a single test. Additionally, FIT cartridges could constitute a novel data source for studying the role of the microbiome in cancer and other diseases.
    http://deepblue.lib.umich.edu/bitstream/2027.42/134673/1/40168_2016_Article_205.pd

    Normalization of the microbiota in patients after treatment for colonic lesions

    Background: Colorectal cancer is a worldwide health problem. Despite growing evidence that members of the gut microbiota can drive tumorigenesis, little is known about what happens to it after treatment for an adenoma or carcinoma. This study tested the hypothesis that treatment for adenoma or carcinoma alters the abundance of bacterial populations associated with disease to those associated with a normal colon. We tested this hypothesis by sequencing the 16S rRNA genes in the feces of 67 individuals before and after treatment for adenoma (N = 22), advanced adenoma (N = 19), and carcinoma (N = 26). Results: There were small changes to the bacterial community associated with adenoma or advanced adenoma and large changes associated with carcinoma. The communities from patients with carcinomas changed significantly more than those with adenoma following treatment (P value 0.05). Because the distribution of OTUs across patients and diagnosis groups was irregular, we used the random forest machine learning algorithm to identify groups of OTUs that could be used to classify pre- and post-treatment samples for each of the diagnosis groups. Although the adenoma and carcinoma models could reliably differentiate between the pre- and post-treatment samples (P value 0.05). Conclusions: By better understanding the response of the microbiota to treatment for adenomas and carcinomas, it is likely that biomarkers will eventually be validated that can be used to quantify the risk of recurrence and the likelihood of survival. Although it was difficult to identify significant differences between pre- and post-treatment samples from patients with adenoma and advanced adenoma, this was not the case for carcinomas. Not only were there large changes in pre- versus post-treatment samples for those with carcinoma, but also these changes were toward a more normal microbiota.
    https://deepblue.lib.umich.edu/bitstream/2027.42/139593/1/40168_2017_Article_366.pd

    The Effects of Alignment Quality, Distance Calculation Method, Sequence Filtering, and Region on the Analysis of 16S rRNA Gene-Based Studies

    Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of β-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. Taken together, these results urge caution in the design and interpretation of analyses using pyrosequencing data.
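To make the point about gap treatments concrete, the sketch below computes an uncorrected pairwise distance between two aligned sequences under three illustrative conventions: ignoring gapped columns, penalizing every gapped column, and penalizing each run of consecutive gaps once. The mode names and exact definitions are assumptions chosen for illustration and are not claimed to match the definitions used in the study.

```python
GAP_CHARS = {"-", "."}
GAP_MODES = {"ignore", "each", "one"}


def pairwise_distance(seq_a, seq_b, gap_mode="each"):
    """Uncorrected distance between two aligned sequences of equal length.

    gap_mode (illustrative names, not taken from the paper):
      "ignore" - columns with a gap in either sequence are skipped
      "each"   - every column gapped in exactly one sequence is a mismatch
      "one"    - a run of consecutive gaps in one sequence is a single mismatch
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if gap_mode not in GAP_MODES:
        raise ValueError(f"unknown gap_mode: {gap_mode!r}")

    mismatches = 0
    compared = 0
    run_owner = None  # which sequence holds the gap run currently being read
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        a_gap, b_gap = a in GAP_CHARS, b in GAP_CHARS
        if a_gap and b_gap:
            continue  # shared gap column carries no information
        if a_gap or b_gap:
            owner = "a" if a_gap else "b"
            if gap_mode == "each" or (gap_mode == "one" and owner != run_owner):
                mismatches += 1
                compared += 1
            run_owner = owner
            continue
        run_owner = None
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches / compared if compared else 0.0


# Toy aligned pair: a two-column gap plus one substitution.
a = "ACGT--ACGT"
b = "ACGTTTACGA"
distances = {mode: pairwise_distance(a, b, mode) for mode in sorted(GAP_MODES)}
```

On this toy pair the three conventions give noticeably different distances, which mirrors why length variation within a region makes the choice of gap treatment matter.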