124 research outputs found

    A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences

    Background: We propose a sequence clustering algorithm and compare its partition quality and execution time with those of a popular existing algorithm. The proposed clustering algorithm uses a grammar-based distance metric to partition a set of biological sequences. Each new sequence is compared with cluster-representative sequences to determine membership; if the comparison fails to identify a suitable cluster, a new cluster is created. Results: The performance of the proposed algorithm is validated by comparison with the popular DNA/RNA sequence clustering approach, CD-HIT-EST, and with the recently developed algorithm, UCLUST, using two different sets of 16S rDNA sequences from 2,255 genera. The CPU execution time of the proposed algorithm is comparable to that of CD-HIT-EST, which is much slower than UCLUST, and the proposed algorithm generated clusters with higher statistical accuracy than both CD-HIT-EST and UCLUST. The validation results are especially striking for large datasets. Conclusions: We introduce a fast and accurate clustering algorithm that relies on a grammar-based sequence distance. Its statistical clustering quality is validated by clustering large datasets containing 16S rDNA sequences
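The representative-based procedure described in this abstract can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not define the grammar-based distance, so a k-mer Jaccard distance is swapped in as a stand-in, and the names `kmer_distance`, `greedy_cluster`, and `threshold` are hypothetical. Assigning a sequence to the closest representative, rather than the first acceptable one, is also an assumption.

```python
from typing import Callable, List


def kmer_set(seq: str, k: int = 4) -> set:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 4) -> float:
    """Jaccard distance on k-mer sets.

    Stand-in only: this is NOT the grammar-based metric from the paper.
    """
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def greedy_cluster(
    seqs: List[str],
    threshold: float,
    distance: Callable[[str, str], float] = kmer_distance,
) -> List[List[int]]:
    """Assign each sequence to a cluster by comparing it with representatives.

    A sequence joins the closest representative within `threshold`;
    otherwise it founds a new cluster and becomes its representative.
    Returns clusters as lists of indices into `seqs`.
    """
    representatives: List[str] = []
    clusters: List[List[int]] = []
    for idx, seq in enumerate(seqs):
        best, best_d = None, threshold
        for c, rep in enumerate(representatives):
            d = distance(seq, rep)
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            # No suitable cluster found: create a new one.
            representatives.append(seq)
            clusters.append([idx])
        else:
            clusters[best].append(idx)
    return clusters


if __name__ == "__main__":
    demo = ["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC"]
    print(greedy_cluster(demo, threshold=0.3))
```

Because each sequence is compared only with representatives, the cost grows with the number of clusters rather than with the number of sequences already seen, and the result depends on input order.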

    Genomic Signal Processing Techniques for Taxonomy Prediction

    To analyze complex biodiversity in microbial communities, 16S rRNA marker gene sequences are often assigned to operational taxonomic units (OTUs). The abundance of methods used to assign 16S rRNA marker gene sequences to OTUs has prompted debate over which one is best. Some researchers suggest that clustering methods should be stable, meaning that the generated OTU assignments do not change as additional sequences are added to the dataset; others contend that it is more important for methods to properly represent the distances between sequences. We add one more de novo clustering algorithm, Rolling Snowball, to the existing ones, which include single linkage, complete linkage, average linkage, abundance-based greedy clustering, distance-based greedy clustering, and Swarm, as well as the open- and closed-reference methods. We use the GreenGenes, RDP, and SILVA 16S rRNA gene databases to show the success of the method. The highest accuracy is obtained with the SILVA library

    Microbial Similarity between Students in a Common Dormitory Environment Reveals the Forensic Potential of Individual Microbial Signatures.

    The microbiota of the built environment is an amalgamation of both human and environmental sources. While human sources have been examined within single-family households or in public environments, it is unclear what effect a large number of cohabitating people have on the microbial communities of their shared environment. We sampled the public and private spaces of a college dormitory, disentangling individual microbial signatures and their impact on the microbiota of common spaces. We compared multiple methods for marker gene sequence clustering and found that minimum entropy decomposition (MED) was best able to distinguish between the microbial signatures of different individuals and was able to uncover more discriminative taxa across all taxonomic groups. Further, weighted UniFrac- and random forest-based graph analyses uncovered two distinct spheres of hand- or shoe-associated samples. Using graph-based clustering, we identified spheres of interaction and found that connection between these clusters was enriched for hands, implicating them as a primary means of transmission. In contrast, shoe-associated samples were found to be freely interacting, with individual shoes more connected to each other than to the floors they interact with. Individual interactions were highly dynamic, with groups of samples originating from individuals clustering freely with samples from other individuals, while all floor and shoe samples consistently clustered together. IMPORTANCE: Humans leave behind a microbial trail, regardless of intention. This may allow for the identification of individuals based on the "microbial signatures" they shed in built environments. In a shared living environment, these trails intersect, and through interaction with common surfaces may become homogenized, potentially confounding our ability to link individuals to their associated microbiota.
We sought to understand the factors that influence the mixing of individual signatures and how best to process sequencing data to tease apart these signatures

    Taxonomic and environmental annotation of bacterial 16S rRNA gene sequences via Shannon entropy and database metadata terms

    Microbial ecology seeks to describe the diversity and distribution of microorganisms in various habitats within the context of environmental variables. High throughput sequencing has greatly boosted the number and scope of projects aiming to study and analyse these organisms, with ever-increasing amounts of data being generated. Amplicon based taxonomic analysis, which determines the presence of microbial taxa in different environments on the basis of marker gene annotations, often uses percentage identity as the main metric to determine sequence similarity against databases. This data is then used to study the distribution of biodiversity as well as the response of microbial communities to stressors. However, the 16S rRNA gene displays varying degrees of sequence conservation along its length and is therefore prone to provide different results depending on the part of 16S rRNA gene used for sequencing and analysis. Furthermore, sequence alignment is primarily performed using the popular BLAST sequence alignment tool, which incurs a great computational performance penalty although newer, more efficient tools are being developed. A new approach that is fast and more accurate is critically needed to process the avalanche of data. Additionally, repositories of environmental metadata can provide contextual information to sequence annotations, potentially enhancing analysis if they can be incorporated into bioinformatics pipelines. The overarching aim of this work was to enhance the taxonomic annotation of bacterial sequences by developing a weighted scheme that utilizes inherent evolutionary conservation in the bacterial 16S rRNA gene sequences and by adding contextual, environmental information pertaining to these sequences in a systematic fashion
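As a rough illustration of how Shannon entropy could turn positional conservation into a weighting scheme, the sketch below scores each alignment column and uses the result to weight an identity comparison. This is an assumption-laden example, not the method of the work above: the abstract does not specify its weighting scheme, and the function names (`column_entropy`, `conservation_weights`, `weighted_identity`), the gap-free four-letter alphabet, and the `1 - H / 2` weight are all hypothetical choices.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the symbols in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_weights(alignment: List[str]) -> List[float]:
    """Weight each column by 1 - H / H_max, so conserved columns score near 1.

    Assumes equal-length, gap-free sequences over A, C, G, T, hence
    H_max = log2(4) = 2 bits. Weights are clamped at 0 as a safeguard.
    """
    max_h = 2.0
    length = len(alignment[0])
    weights = []
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        weights.append(max(0.0, 1.0 - column_entropy(column) / max_h))
    return weights


def weighted_identity(a: str, b: str, weights: List[float]) -> float:
    """Identity between two aligned sequences, weighted per position."""
    total = sum(weights)
    if total == 0:
        return 0.0
    matched = sum(w for x, y, w in zip(a, b, weights) if x == y)
    return matched / total


if __name__ == "__main__":
    aln = ["ACGT", "ACGA", "ACTC", "ACAG"]
    w = conservation_weights(aln)
    print(w)
    print(weighted_identity(aln[0], aln[1], w))
```

In this toy alignment the last column is maximally variable and receives zero weight, so a mismatch there does not lower the weighted identity, whereas plain percentage identity would count it equally with mismatches in conserved columns.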

    Microhabitats Shape Bacterial Community Composition, Ecosystem Function, and Genome Traits

    This dissertation helps to integrate bacteria into the broader field of ecology by investigating bacterial community composition and diversity as it relates to ecosystem function in microhabitats within freshwater systems of the Great Lakes Region. Here, I combine field- and laboratory-based measurements of observational data collected from three major types of lake ecosystems: inland lakes, a freshwater estuary (Muskegon Lake), and a Great Lake (Lake Michigan). First, to determine the primary controls on lake bacterial community composition, I assessed the influence of lake layer (i.e. stratification), lake productivity, and particle-association on the bacterial community across 11 inland lakes with varying productivity in Southwestern Michigan. I found that particle-association very strongly structures freshwater bacterial community composition. Second, I studied a freshwater estuarine lake, Muskegon Lake, which has a large spatio-temporal variation in bacterial heterotrophic productivity, to test whether there was an association between heterotrophic production and bacterial biodiversity (defined as the number of taxa and taxon abundance). I specifically focused on two co-occurring freshwater habitats that my first chapter showed to be populated by very distinct communities: particle-associated and free-living. Positive biodiversity-heterotrophic productivity relationships were found only in particles. Third, I performed a genome-based analysis of free-living specialists, particle-associated bacterial specialists, and generalists to characterize the genomic architecture and genetic traits that are associated with adaptations to these specific habitats. The genomes of particle-associated specialist bacteria were about twice the size of the genomes of free-living specialists and generalists, which had streamlined genomes. 
Fourth, to identify the bacterial taxa driving heterotrophic productivity across the large set of lake samples, I found that high nucleic acid (i.e., HNA) functional groups identified by flow cytometry can serve as a proxy for freshwater bacterial heterotrophic productivity, whereas low nucleic acid (i.e., LNA) functional groups cannot. Then, I used a machine learning approach to identify bacterial taxa associated with HNA and LNA. This allowed me to identify the bacterial taxa, which were often members of the Phylum Bacteroidetes, that are associated with heterotrophic productivity. Finally, I investigated patterns of lake specificity and phylogenetic conservation of taxonomic groups. Throughout my dissertation, I found that there was very deep (Class to Phylum-level) phylogenetic conservation of which bacteria lived in which habitats, but not of what bacterial taxa contributed to HNA and LNA functional groups, and thus heterotrophic productivity. Positive biodiversity-heterotrophic productivity relationships only existed in particle-associated, and not free-living communities, and communities composed of more phylogenetically related organisms were more productive per-capita. These differences in biodiversity-ecosystem function relationships may in part be explained by particle-associated bacteria having larger genomes, higher nitrogen content, and more unique genes that provide the potential for niche complementarity. The taxa that drove HNA and LNA cell numbers, and by proxy heterotrophic productivity, were lake- and time-specific and indicated that taxa could switch between the two functional groups. Overall, my dissertation elucidates the ecological and evolutionary effects of microhabitat structure on bacterial communities and genomes in natural systems. PHD, Ecology and Evolutionary Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147651/1/marschmi_1.pd

    Microbial communities under distinct thermal and geochemical regimes in axial and off-axis sediments of Guaymas Basin

    © The Author(s), 2021. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Teske, A., Wegener, G., Chanton, J. P., White, D., MacGregor, B., Hoer, D., de Beer, D., Zhuang, G., Saxton, M. A., Joye, S. B., Lizarralde, D., Soule, S. A., & Ruff, S. E. Microbial communities under distinct thermal and geochemical regimes in axial and off-axis sediments of Guaymas Basin. Frontiers in Microbiology, 12, (2021): 633649, https://doi.org/10.3389/fmicb.2021.633649. Cold seeps and hydrothermal vents are seafloor habitats fueled by subsurface energy sources. Both habitat types coexist in Guaymas Basin in the Gulf of California, providing an opportunity to compare microbial communities with distinct physiologies adapted to different thermal regimes. Hydrothermally active sites in the southern Guaymas Basin axial valley, and cold seep sites at Octopus Mound, a carbonate mound with abundant methanotrophic cold seep fauna at the Central Seep location on the northern off-axis flanking regions, show consistent geochemical and microbial differences between hot, temperate, cold seep, and background sites. The changing microbial actors include autotrophic and heterotrophic bacterial and archaeal lineages that catalyze sulfur, nitrogen, and methane cycling, organic matter degradation, and hydrocarbon oxidation. Thermal, biogeochemical, and microbiological characteristics of the sampling locations indicate that sediment thermal regime and seep-derived or hydrothermal energy sources structure the microbial communities at the sediment surface. Research on Guaymas Basin in the Teske lab is supported by NSF Molecular and Cellular Biology grant 1817381 “Collaborative Research: Next generation physiology: a systems-level understanding of microbes driving carbon cycling in marine sediments”.
Sampling in Guaymas Basin was supported by collaborative NSF Biological Oceanography grants 1357238 and 1357360 “Collaborative Research: Microbial carbon cycling and its interaction with sulfur and nitrogen transformations in Guaymas Basin hydrothermal sediments” to AT and SJ, respectively. SER was supported by an AITF/Eyes High Postdoctoral Fellowship and start-up funds provided by the Marine Biological Laboratory

    Deep Architectures for Visual Recognition and Description

    In recent times, digital media content is inherently multimedia, consisting of text, audio, image and video. Several outstanding Computer Vision (CV) problems are being successfully solved with the help of modern Machine Learning (ML) techniques. Plenty of research work has already been carried out in the fields of Automatic Image Annotation (AIA), Image Captioning and Video Tagging. Video Captioning, i.e., automatic description generation from digital video, however, is a different and complex problem altogether. This study compares various existing video captioning approaches available today and attempts their classification and analysis based on different parameters, viz., type of captioning method (generation/retrieval), type of learning model employed, the desired length of the generated output description, etc. This dissertation also attempts to critically analyze the existing benchmark datasets used in various video captioning models and the evaluation metrics for assessing the final quality of the resultant video descriptions. A detailed study of important existing models, highlighting their comparative advantages as well as disadvantages, is also included. In this study a novel approach for video captioning on the Microsoft Video Description (MSVD) dataset and the Microsoft Video-to-Text (MSR-VTT) dataset is proposed, using supervised learning techniques to train a deep combinational framework that achieves better quality video captioning via predicting semantic tags. We develop simple shallow CNNs (2D and 3D) as feature extractors, Deep Neural Networks (DNNs) and Bidirectional LSTMs (BiLSTMs) as tag prediction models, and a Recurrent Neural Network (RNN), specifically an LSTM, as the language model.
The aim of the work was to provide an alternative approach to generating captions from videos via semantic tag prediction and to deploy simpler, shallower deep model architectures with lower memory requirements, so that the solution is not very memory intensive and the developed models remain stable and viable options as the scale of the data increases. This study also successfully employed deep architectures like the Convolutional Neural Network (CNN) to speed up the automated recognition and classification of hand gestures from the sign language of the Indian classical dance form, ‘Bharatnatyam’. This hand gesture classification is primarily aimed at 1) building a novel dataset of 2D single hand gestures belonging to 27 classes that were collected from (i) Google search engine (Google images), (ii) YouTube videos (dynamic and with background considered) and (iii) professional artists under staged environment constraints (plain backgrounds), 2) exploring the effectiveness of CNNs for identifying and classifying the single hand gestures by optimizing the hyperparameters, and 3) evaluating the impacts of transfer learning and double transfer learning, which is a novel concept explored for achieving higher classification accuracy

    Microbial Communities Under Distinct Thermal and Geochemical Regimes in Axial and Off-Axis Sediments of Guaymas Basin

    Cold seeps and hydrothermal vents are seafloor habitats fueled by subsurface energy sources. Both habitat types coexist in Guaymas Basin in the Gulf of California, providing an opportunity to compare microbial communities with distinct physiologies adapted to different thermal regimes. Hydrothermally active sites in the southern Guaymas Basin axial valley, and cold seep sites at Octopus Mound, a carbonate mound with abundant methanotrophic cold seep fauna at the Central Seep location on the northern off-axis flanking regions, show consistent geochemical and microbial differences between hot, temperate, cold seep, and background sites. The changing microbial actors include autotrophic and heterotrophic bacterial and archaeal lineages that catalyze sulfur, nitrogen, and methane cycling, organic matter degradation, and hydrocarbon oxidation. Thermal, biogeochemical, and microbiological characteristics of the sampling locations indicate that sediment thermal regime and seep-derived or hydrothermal energy sources structure the microbial communities at the sediment surface