51 research outputs found

    MotifCluster: an interactive online tool for clustering and visualizing sequences using shared motifs

    MotifCluster finds related motifs in a set of sequences and clusters the sequences into families using the motifs they contain.

    UniFrac – An online tool for comparing microbial community diversity in a phylogenetic context

    BACKGROUND: Moving beyond pairwise significance tests to compare many microbial communities simultaneously is critical for understanding large-scale trends in microbial ecology and community assembly. Techniques that allow microbial communities to be compared in a phylogenetic context are rapidly gaining acceptance, but the widespread application of these techniques has been hindered by the difficulty of performing the analyses. RESULTS: We introduce UniFrac, a web application available at , that allows several phylogenetic tests for differences among communities to be easily applied and interpreted. We demonstrate the use of UniFrac to cluster multiple environments, and to test which environments are significantly different. We show that analysis of previously published sequences from the Columbia River, its estuary, and the adjacent coastal ocean using the UniFrac interface provided insights that were not apparent from the initial data analysis, which used other commonly employed techniques to compare the communities. CONCLUSION: UniFrac provides easy access to powerful multivariate techniques for comparing microbial communities in a phylogenetic context. We thus expect that it will provide a completely new picture of many microbial interactions and processes in both environmental and medical contexts.

    Using the nucleotide substitution rate matrix to detect horizontal gene transfer

    BACKGROUND: Horizontal gene transfer (HGT) has allowed bacteria to evolve many new capabilities. Because transferred genes perform many medically important functions, such as conferring antibiotic resistance, improved detection of horizontally transferred genes from sequence data would be an important advance. Existing sequence-based methods for detecting HGT focus on changes in nucleotide composition or on differences between gene and genome phylogenies; these methods have high error rates. RESULTS: First, we introduce a new class of methods for detecting HGT based on the changes in nucleotide substitution rates that occur when a gene is transferred to a new organism. Our new methods discriminate simulated HGT events with an error rate up to 10 times lower than does GC content. Use of models that are not time-reversible is crucial for detecting HGT. Second, we show that using combinations of multiple predictors of HGT offers substantial improvements over using any single predictor, yielding as much as a factor of 18 improvement in performance (a maximum reduction in error rate from 38% to about 3%). Multiple predictors were combined by using the random forests machine learning algorithm to identify optimal classifiers that separate HGT from non-HGT trees. CONCLUSION: The new class of HGT-detection methods introduced here combines advantages of phylogenetic and compositional HGT-detection techniques. These new techniques offer order-of-magnitude improvements over compositional methods because they are better able to discriminate HGT from non-HGT trees under a wide range of simulated conditions. We also found that combining multiple measures of HGT is essential for detecting a wide range of HGT events. These novel indicators of horizontal transfer will be widely useful in detecting HGT events linked to the evolution of important bacterial traits, such as antibiotic resistance and pathogenicity.

    Fast-Find: A novel computational approach to analyzing combinatorial motifs

    BACKGROUND: Many vital biological processes, including transcription and splicing, require a combination of short, degenerate sequence patterns, or motifs, adjacent to defined sequence features. Although these motifs occur frequently by chance, they only have biological meaning within a specific context. Identifying transcripts that contain meaningful combinations of patterns is thus an important problem, which existing tools address poorly. RESULTS: Here we present a new approach, Fast-FIND (Fast-Fully Indexed Nucleotide Database), that uses a relational database to support rapid indexed searches for arbitrary combinations of patterns defined either by sequence or composition. Fast-FIND is easy to implement, takes less than a second to search the entire Drosophila genome sequence for arbitrary patterns adjacent to sites of alternative polyadenylation, and is sufficiently fast to allow sensitivity analysis on the patterns. We have applied this approach to identify transcripts that contain combinations of sequence motifs for RNA-binding proteins that may regulate alternative polyadenylation. CONCLUSION: Fast-FIND provides an efficient way to identify transcripts that are potentially regulated via alternative polyadenylation. We have used it to generate hypotheses about interactions between specific polyadenylation factors, which we will test experimentally.
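
To make the idea of indexed searches for combinations of patterns next to a sequence feature concrete, here is a small sketch using Python's built-in `sqlite3` module. It is not the Fast-FIND schema, which the abstract does not describe. The table names (`words`, `sites`), the fixed word length `K`, the helper names, and the toy sequences are all assumptions made for illustration.

```python
import sqlite3

K = 6  # hypothetical fixed word length stored in the index


def build_index(sequences, sites):
    """Load every K-letter word and every feature site into an in-memory database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE words (seq_id TEXT, pos INTEGER, word TEXT)")
    con.execute("CREATE TABLE sites (seq_id TEXT, pos INTEGER)")
    for seq_id, seq in sequences.items():
        con.executemany(
            "INSERT INTO words VALUES (?, ?, ?)",
            [(seq_id, i, seq[i:i + K]) for i in range(len(seq) - K + 1)],
        )
    con.executemany("INSERT INTO sites VALUES (?, ?)", sites)
    # The index lets the database look up a word without scanning every row.
    con.execute("CREATE INDEX idx_words ON words (word, seq_id, pos)")
    return con


def find_combination(con, upstream_word, downstream_word, window):
    """Return sequence ids with upstream_word starting within `window` bases
    before a site and downstream_word starting within `window` bases at or
    after it."""
    rows = con.execute(
        """
        SELECT DISTINCT s.seq_id
        FROM sites AS s
        JOIN words AS u ON u.seq_id = s.seq_id AND u.word = ?
             AND u.pos BETWEEN s.pos - ? AND s.pos - 1
        JOIN words AS d ON d.seq_id = s.seq_id AND d.word = ?
             AND d.pos BETWEEN s.pos AND s.pos + ?
        ORDER BY s.seq_id
        """,
        (upstream_word, window, downstream_word, window),
    ).fetchall()
    return [row[0] for row in rows]


# Toy transcripts (hypothetical), each with one annotated site at position 12.
toy_sequences = {
    "tx1": "CCAATAAAGGGGCATGTGTTGTCC",  # both words, on the expected sides
    "tx2": "CCAATAAAGGGGCACCCCCCCCCC",  # upstream word only
    "tx3": "TGTGTTGGGGGGCAAATAAAGGCC",  # both words, but on the wrong sides
}
toy_sites = [("tx1", 12), ("tx2", 12), ("tx3", 12)]

con = build_index(toy_sequences, toy_sites)
hits = find_combination(con, "AATAAA", "TGTGTT", window=10)
```

Only `tx1` satisfies both positional constraints here. Storing fixed-length words keeps the sketch short; it does not cover degenerate patterns or patterns defined by composition, both of which the abstract says the real tool supports.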

    Short pyrosequencing reads suffice for accurate microbial community analysis

    Pyrosequencing technology allows us to characterize microbial communities using 16S ribosomal RNA (rRNA) sequences orders of magnitude faster and more cheaply than has previously been possible. However, results from different studies using pyrosequencing and traditional sequencing are often difficult to compare, because amplicons covering different regions of the rRNA might yield different conclusions. We used sequences from over 200 globally dispersed environments to test whether studies that used similar primers clustered together mistakenly, without regard to environment. We then tested whether primer choice affects sequence-based community analyses using UniFrac, our recently developed method for comparing microbial communities. We performed three tests of primer effects. We tested whether different simulated amplicons generated the same UniFrac clustering results as near-full-length sequences for three recent large-scale studies of microbial communities in the mouse and human gut, and the Guerrero Negro microbial mat. We then repeated this analysis for short sequences (100-, 150-, 200- and 250-base reads) resembling those produced by pyrosequencing. The results show that sequencing effort is best focused on gathering more short sequences rather than fewer longer ones, provided that the primers are chosen wisely, and that community comparison methods such as UniFrac are surprisingly robust to variation in the region sequenced.

    Evolution of Symbiotic Bacteria in the Distal Human Intestine

    The adult human intestine contains trillions of bacteria, representing hundreds of species and thousands of subspecies. Little is known about the selective pressures that have shaped and are shaping this community's component species, which are dominated by members of the Bacteroidetes and Firmicutes divisions. To examine how the intestinal environment affects microbial genome evolution, we have sequenced the genomes of two members of the normal distal human gut microbiota, Bacteroides vulgatus and Bacteroides distasonis, and by comparison with the few other sequenced gut and non-gut Bacteroidetes, analyzed their niche and habitat adaptations. The results show that lateral gene transfer, mobile elements, and gene amplification have played important roles in affecting the ability of gut-dwelling Bacteroidetes to vary their cell surface, sense their environment, and harvest nutrient resources present in the distal intestine. Our findings show that these processes have been a driving force in the adaptation of Bacteroidetes to the distal gut environment, and emphasize the importance of considering the evolution of humans from an additional perspective, namely the evolution of our microbiomes.

    Robust estimation of bacterial cell count from optical density

    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
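
As a rough illustration of calibrating OD against a serial dilution of particles with known counts, the sketch below fits a single particles-per-OD factor and reports fold residuals. It is a simplified stand-in, not the study's protocol. The starting particle count, the 2-fold dilution, the blank-subtracted OD readings, and the function names are invented for the example, and a least-squares line through the origin is one of several reasonable fitting choices.

```python
# Hypothetical serial dilution: known particle counts per well, halved each step.
particles = [3.0e8 / 2 ** i for i in range(6)]

# Hypothetical blank-subtracted OD readings for the same wells.
od_readings = [0.306, 0.147, 0.0760, 0.0371, 0.0190, 0.0093]


def fit_particles_per_od(od, counts):
    """Least-squares slope of counts against OD for a line through the origin."""
    numerator = sum(x * y for x, y in zip(od, counts))
    denominator = sum(x * x for x in od)
    return numerator / denominator


def fold_residuals(od, counts, slope):
    """Fold difference (always >= 1) between predicted and known counts."""
    folds = []
    for x, y in zip(od, counts):
        predicted = slope * x
        folds.append(max(predicted / y, y / predicted))
    return folds


def estimate_cells(od_value, slope):
    """Convert a blank-subtracted OD reading to an estimated particle/cell count."""
    return slope * od_value


slope = fit_particles_per_od(od_readings, particles)
folds = fold_residuals(od_readings, particles, slope)
sample_estimate = estimate_cells(0.25, slope)
```

A dilution step whose fold residual grows large would suggest that well lies outside the instrument's linear range, which is one way such a series can double as a quality-control check.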