27 research outputs found

    Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research

    Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operator characteristic (AUROC) curve scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translation of microbiome data into clinically relevant predictive models and to better understand preterm birth

    Validation of high throughput sequencing and microbial forensics applications

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security. Peer reviewed

    In silico benchmarking of metagenomic tools for coding sequence detection reveals the limits of sensitivity and precision

    Background: High-throughput sequencing can establish the functional capacity of a microbial community by cataloging the protein-coding sequences (CDS) present in the metagenome of the community. The relative performance of different computational methods for identifying CDS from whole-genome shotgun sequencing is not fully established. Results: Here we present an automated benchmarking workflow, using synthetic shotgun sequencing reads for which we know the true CDS content of the underlying communities, to determine the relative performance (sensitivity, positive predictive value or PPV, and computational efficiency) of different metagenome analysis tools for extracting the CDS content of a microbial community. Assembly-based methods are limited by coverage depth, with poor sensitivity for CDS at < 5X depth of sequencing, but have excellent PPV. Mapping-based techniques are more sensitive at low coverage depths, but can struggle with PPV. We additionally describe an expectation maximization based iterative algorithmic approach which we show to successfully improve the PPV of a mapping based technique while retaining improved sensitivity and computational efficiency. Conclusion: Our benchmarking approach reveals the trade-offs of assembly versus alignment-based approaches and the relative performance of specific implementations when one wishes to extract the protein coding capacity of microbial communities. http://deepblue.lib.umich.edu/bitstream/2027.42/173432/1/12859_2020_Article_3802.pd

    geneshot: gene-level metagenomics identifies genome islands associated with immunotherapy response

    Researchers must be able to generate experimentally testable hypotheses from sequencing-based observational microbiome experiments to discover the mechanisms underlying the influence of gut microbes on human health. We describe geneshot, a novel bioinformatics tool for identifying testable hypotheses based on gene-level metagenomic analysis of WGS microbiome data. By applying geneshot to two independent previously published cohorts, we identify microbial genomic islands consistently associated with response to immune checkpoint inhibitor (ICI)-based cancer treatment in culturable type strains. The identified genomic islands are within operons involved in type II secretion, TonB-dependent transport, and bacteriophage growth. http://deepblue.lib.umich.edu/bitstream/2027.42/173864/1/13059_2021_Article_2355.pd

    Evolutionary and functional implications of hypervariable loci within the skin virome

    Localized genomic variability is crucial for the ongoing conflicts between infectious microbes and their hosts. An understanding of evolutionary and adaptive patterns associated with genomic variability will help guide development of vaccines and antimicrobial agents. While most analyses of the human microbiome have focused on taxonomic classification and gene annotation, we investigated genomic variation of skin-associated viral communities. We evaluated patterns of viral genomic variation across 16 healthy human volunteers. Human papillomavirus (HPV) and Staphylococcus phages contained 106 and 465 regions of diversification, or hypervariable loci, respectively. Propionibacterium phage genomes were minimally divergent and contained no hypervariable loci. Genes containing hypervariable loci were involved in functions including host tropism and immune evasion. HPV and Staphylococcus phage hypervariable loci were associated with purifying selection. Amino acid substitution patterns were virus dependent, as were predictions of their phenotypic effects. We identified diversity generating retroelements as one likely mechanism driving hypervariability. We validated these findings in an independently collected skin metagenomic sequence dataset, suggesting that these features of skin virome genomic variability are widespread. Our results highlight the genomic variation landscape of the skin virome and provide a foundation for better understanding community viral evolution and the functional implications of genomic diversification of skin viruses

    Admixture and recombination among Toxoplasma gondii lineages explain global genome diversity

    Toxoplasma gondii is a highly successful protozoan parasite that infects all warm-blooded animals and causes severe disease in immunocompromised and immune-naïve humans. It has an unusual global population structure: In North America and Europe, isolated strains fall predominantly into four largely clonal lineages, but in South America there is great genetic diversity and the North American clonal lineages are rarely found. Genetic variation between Toxoplasma strains determines differences in virulence, modulation of host-signaling pathways, growth, dissemination, and disease severity in mice and likely in humans. Most studies on Toxoplasma genetic variation have focused on either a few loci in many strains or low-resolution genome analysis of three clonal lineages. We use whole-genome sequencing to identify a large number of SNPs between 10 Toxoplasma strains from Europe and North and South America. These were used to identify haplotype blocks (genomic regions) shared between strains and construct a Toxoplasma haplotype map. Additional SNP analysis of RNA-sequencing data of 26 Toxoplasma strains, representing global diversity, allowed us to construct a comprehensive genealogy for Toxoplasma gondii that incorporates sexual recombination. These data show that most current isolates are recent recombinants and cannot be easily grouped into a limited number of haplogroups. A complex picture emerges in which some genomic regions have not been recently exchanged between any strains, and others recently spread from one strain to many others. Funding: Massachusetts Life Sciences Center (New Investigator award); Pew Charitable Trusts (Pew Scholar Program in the Biomedical Sciences); National Institutes of Health (U.S.) (grant R01-AI08062); Knights Templar Eye Foundation; National Institute of General Medical Sciences (U.S.) (Biological Sciences 5-T32-GM007287-33); National Institute of General Medical Sciences (U.S.) (Training Grant T32AI060516)

    Genomic Stability and Genetic Defense Systems in Dolosigranulum pigrum, a Candidate Beneficial Bacterium from the Human Microbiome

    Dolosigranulum pigrum is positively associated with indicators of health in multiple epidemiological studies of human nasal microbiota. Knowledge of the basic biology of D. pigrum is a prerequisite for evaluating its potential for future therapeutic use; however, such data are very limited. To gain insight into D. pigrum's chromosomal structure, pangenome, and genomic stability, we compared the genomes of 28 D. pigrum strains that were collected across 20 years. Phylogenomic analysis showed closely related strains circulating over this period and closure of 19 genomes revealed highly conserved chromosomal synteny. Gene clusters involved in the mobilome and in defense against mobile genetic elements (MGEs) were enriched in the accessory genome versus the core genome. A systematic analysis for MGEs identified the first candidate D. pigrum prophage and insertion sequence. A systematic analysis for genetic elements that limit the spread of MGEs, including restriction modification (RM), CRISPR-Cas, and deity-named defense systems, revealed strain-level diversity in host defense systems that localized to specific genomic sites, including one RM system hot spot. Analysis of CRISPR spacers pointed to a wealth of MGEs against which D. pigrum defends itself. These results reveal a role for horizontal gene transfer and mobile genetic elements in strain diversification while highlighting that in D. pigrum this occurs within the context of a highly stable chromosomal organization protected by a variety of defense mechanisms. IMPORTANCE: Dolosigranulum pigrum is a candidate beneficial bacterium with potential for future therapeutic use. This is based on its positive associations with characteristics of health in multiple studies of human nasal microbiota across the span of human life. For example, high levels of D. pigrum nasal colonization in adults predicts the absence of Staphylococcus aureus nasal colonization. Also, D. pigrum nasal colonization in young children is associated with healthy control groups in studies of middle ear infections. Our analysis of 28 genomes revealed a remarkable stability of D. pigrum strains colonizing people in the United States across a 20-year span. We subsequently identified factors that can influence this stability, including genomic stability, phage predators, the role of MGEs in strain-level variation, and defenses against MGEs. Finally, these D. pigrum strains also lacked predicted virulence factors. Overall, these findings add additional support to the potential for D. pigrum as a therapeutic bacterium

    Global divergence of the human follicle mite Demodex folliculorum: persistent associations between host ancestry and mite lineages

    Microscopic mites of the genus Demodex live within the hair follicles of mammals and are ubiquitous symbionts of humans, but little molecular work has been done to understand their genetic diversity or transmission. Here we sampled mite DNA from 70 human hosts of diverse geographic ancestries and analyzed 241 sequences from the mitochondrial genome of the species Demodex folliculorum. Phylogenetic analyses recovered multiple deep lineages including a globally distributed lineage common among hosts of European ancestry and three lineages that primarily include hosts of Asian, African, and Latin American ancestry. To a great extent, the ancestral geography of hosts predicted the lineages of mites found on them; 27% of the total molecular variance segregated according to the regional ancestries of hosts. We found that D. folliculorum populations are stable on an individual over the course of years and that some Asian and African American hosts maintain specific mite lineages over the course of years or generations outside their geographic region of birth or ancestry. D. folliculorum haplotypes were much more likely to be shared within families and between spouses than between unrelated individuals, indicating that transmission requires close contact. Dating analyses indicated that D. folliculorum origins may predate modern humans. Overall, D. folliculorum evolution reflects ancient human population divergences, is consistent with an out-of-Africa dispersal hypothesis, and presents an excellent model system for further understanding the history of human movement