15 research outputs found
Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses.
Background: Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, and pose significant threats to human health and to the structural integrity of built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal microbial diversity. Although the high sequence variability in the ITS region facilitates more accurate species identification, it also makes multiple sequence alignment and phylogenetic analysis unreliable across evolutionarily distant fungi because the sequences are hard to align accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a "foundation" phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, "extension" phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second, more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names such that each corresponding foundation-tree tip branches into its new "extension tree" child.
Results: We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. The phylogenetic metrics also improved our ability to distinguish small differences (effect sizes) between microbial communities, though results were similar to non-phylogenetic methods for larger effect sizes.
Conclusions: The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments. The ghost-tree software package can also be used to develop phylogenetic trees for other marker gene sets that afford different taxonomic resolution, or for bridging genome trees with amplicon trees.
Availability: ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree
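The grafting step described in the abstract above can be illustrated with a minimal sketch. This is not the ghost-tree implementation or its API: the `Node` class, the `graft` function, the taxon names, and the branch lengths are all hypothetical and chosen only to show the idea of replacing a foundation-tree tip with an extension tree that shares its taxonomic name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A hypothetical, minimal tree node (not a ghost-tree class)."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def tips(self) -> List["Node"]:
        """Return the leaf nodes below (or equal to) this node, left to right."""
        if not self.children:
            return [self]
        return [tip for child in self.children for tip in child.tips()]

    def to_newick(self) -> str:
        """Serialize this subtree as a Newick fragment (no trailing ';')."""
        label = self.name or ""
        if self.children:
            inner = ",".join(child.to_newick() for child in self.children)
            label = f"({inner}){label}"
        return f"{label}:{self.length}"


def graft(foundation: Node, extensions: Dict[str, Node]) -> Node:
    """Attach each extension tree under the foundation tip with the same name.

    The foundation tip keeps its own name and branch length and becomes an
    internal node; tips without a matching extension are left unchanged.
    """
    for tip in foundation.tips():  # the tip list is built before any mutation
        extension = extensions.get(tip.name)
        if extension is not None:
            tip.children = extension.children
    return foundation


# Foundation tree from a slowly evolving marker (illustrative values only).
foundation = Node(children=[Node("Aspergillus", 0.3), Node("Candida", 0.5)])

# Extension tree from a rapidly evolving marker, keyed by taxonomic name.
extensions = {
    "Aspergillus": Node(children=[Node("A_niger", 0.01), Node("A_flavus", 0.02)])
}

grafted = graft(foundation, extensions)
print(grafted.to_newick() + ";")
```

In this sketch the match between trees is an exact string comparison of taxonomic names. A real pipeline would need to decide how to handle names that appear in only one of the two trees; here such foundation tips are simply kept as they are.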
Geography and Location Are the Primary Drivers of Office Microbiome Composition.
In the United States, humans spend the majority of their time indoors, where they are exposed to the microbiome of the built environment (BE) they inhabit. Despite the ubiquity of microbes in BEs and their potential impacts on health and building materials, basic questions about the microbiology of these environments remain unanswered. We present a study on the impacts of geography, material type, human interaction, location in a room, seasonal variation, and indoor and microenvironmental parameters on bacterial communities in offices. Our data elucidate several important features of microbial communities in BEs. First, under normal office environmental conditions, bacterial communities do not differ on the basis of surface material (e.g., ceiling tile or carpet) but do differ on the basis of the location in a room (e.g., ceiling or floor), two features that are often conflated but that we are able to separate here. We suspect that previous work showing differences in bacterial composition with surface material was likely detecting differences based on different usage patterns. Next, we find that offices have city-specific bacterial communities, such that we can accurately predict which city an office microbiome sample is derived from, but office-specific bacterial communities are less apparent. This differs from previous work, which has suggested office-specific compositions of bacterial communities. We again suspect that the difference from prior work arises from different usage patterns. As has been previously shown, we observe that human skin contributes heavily to the composition of BE surfaces. IMPORTANCE Our study highlights several points that should impact the design of future studies of the microbiology of BEs. First, projects tracking changes in BE bacterial communities should focus sampling efforts on surveying different locations in offices and in different cities but not necessarily different materials or different offices in the same city. 
Next, disturbance due to repeated sampling, though detectable, is small compared to that due to other variables, opening up a range of longitudinal study designs in the BE. Next, studies requiring more samples than can be sequenced on a single sequencing run (which is increasingly common) must control for run effects by including some of the same samples in all of the sequencing runs as technical replicates. Finally, detailed tracking of indoor and material environment covariates is likely not essential for BE microbiome studies, as the normal range of indoor environmental conditions is likely not large enough to impact bacterial communities.
Ecological succession and viability of human-associated microbiota on restroom surfaces
Author Posting. © The Author(s), 2014. This is the author's version of the work. It is posted here by permission of American Society for Microbiology for personal use, not for redistribution. The definitive version was published in Applied and Environmental Microbiology (2014), doi:10.1128/AEM.03117-14.
Human-associated bacteria dominate the built environment (BE). Following decontamination of floors, toilet seats, and soap dispensers in 4 public restrooms, in situ bacterial communities were characterized hourly, daily, and weekly to determine their successional ecology. The viability of cultivable bacteria, following the removal of dispersal agents (humans), was also assessed hourly. A late successional community developed within 5-8 hours on restroom floors, and showed remarkable stability over weeks to months. Despite late successional dominance by skin- and outdoor-associated bacteria, the most ubiquitous organisms were predominantly gut-associated taxa, which persisted following exclusion of humans. Staphylococcus represented the majority of the cultivable community, even after several hours of human-exclusion. MRSA-associated virulence genes were found on floors, but were not present in assembled Staphylococcus pan-genomes. Viral abundances, which were predominantly enterophage, human papilloma and herpes viruses, were significantly correlated with bacteria abundances, and showed an unexpectedly low virus-to-bacteria ratio in surface-associated samples, suggesting that bacterial hosts are mostly dormant on BE surfaces.
S.M.G. was supported by an EPA STAR Graduate Fellowship and the National Institutes of Health Training Grant 5T-32EB-009412. We acknowledge funding from the Alfred P Sloan Foundation’s Microbiology of the Built Environment Program.
2015-05-1
Citizen Science for Mining the Biomedical Literature
Biomedical literature represents one of the largest and fastest growing collections of unstructured biomedical knowledge. Finding critical information buried in the literature can be challenging. To extract information from free-flowing text, researchers need to: 1. identify the entities in the text (named entity recognition), 2. apply a standardized vocabulary to these entities (normalization), and 3. identify how entities in the text are related to one another (relationship extraction). Researchers have primarily approached these information extraction tasks through manual expert curation and computational methods. We have previously demonstrated that named entity recognition (NER) tasks can be crowdsourced to a group of non-experts via the paid microtask platform Amazon Mechanical Turk (AMT), and can dramatically reduce the cost and increase the throughput of biocuration efforts. However, given the size of the biomedical literature, even information extraction via paid microtask platforms is not scalable. With our web-based application Mark2Cure (http://mark2cure.org), we demonstrate that NER tasks can also be performed by volunteer citizen scientists with high accuracy. We apply metrics from the Zooniverse Matrices of Citizen Science Success and provide the results here to serve as a basis of comparison for other citizen science projects. Further, we discuss design considerations, issues, and the application of analytics for successfully moving a crowdsourcing workflow from a paid microtask platform to a citizen science platform. To our knowledge, this study is the first application of citizen science to a natural language processing task.
Additional file 1: Figure S1. of ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses
Principal Coordinates comparing unsimulated (real) samples based on (a) unweighted UniFrac distances where trees are computed using ghost-tree, (b) weighted UniFrac distances where trees are computed using ghost-tree, (c) unweighted UniFrac distances where trees are computed using ghost-tree, 0-branch-length foundation, (d) weighted UniFrac distances where trees are computed using ghost-tree, 0-branch-length foundation, (e) unweighted UniFrac distances where trees are computed using ghost-tree, 0-branch-length extensions, (f) weighted UniFrac distances where trees are computed using ghost-tree, 0-branch-length extensions. Blue points are simulated and real human saliva samples, and red points are simulated and real restroom surface samples. Plots were made using EMPeror software [25]. (PDF 522 kb)
QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science
Bolyen E, Rideout JR, Dillon MR, et al. QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science. PeerJ. 2018