35 research outputs found

    Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research

    This research was carried out within the framework of the DREAM Community of Premature Births, of which UDC researchers Diego Fernández-Edreira and Carlos Fernández-Lozano, who have collaborated in the research, are members. Supplementary research data are available at https://www.cell.com/cms/10.1016/j.xcrm.2023.101350/attachment/e44bcada-f500-4f17-bc33-0ee5d39b3c4b/mmc1.pdf.
    [Abstract]: Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operator characteristic (AUROC) curve scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translation of microbiome data into clinically relevant predictive models and to better understand preterm birth.
    We thank members of the Sirota Lab, University of California, San Francisco, for useful discussion. This study was supported by the March of Dimes (J.L.G., T.T.O., A.R., A.S.T., V.C., C.W.Y.H., R.J.W., K.J.F., G.A., I.K., J.B., A.N., J.G., Z.W., P.N., A.K., I.B., E.K., S.J., S.N., Y.S.L., P.R.B., D.A.M., S.V.L., J.A., D.K.S., N.Aghaeepour, J.C.C., M.S.) and R35GM138353 (N.Aghaeepour), 1R01HL139844 (N.Aghaeepour), 3P30AG066515 (N.Aghaeepour), 1R61NS114926 (N.Aghaeepour), 1R01AG058417 (N.Aghaeepour), R01HD105256 (N.Aghaeepour, M.S.), P01HD106414 (N.Aghaeepour), R01GM140464 (J.G., Z.W., G.C., Z.-Z.T.), NSF DMS-2054346 (J.G., Z.W., G.C., Z.-Z.T.); the Burroughs Wellcome Fund (N.Aghaeepour); the Alfred E. Mann Foundation (N.Aghaeepour); and the Robertson Foundation (N.Aghaeepour). A.P.-L. and P.D.-G. are receiving honoraria from the IVI Foundation.
    United States. National Institute of General Medical Sciences; R35GM138353
    United States. National Institutes of Health; 1R01HL139844
    United States. National Institutes of Health; 3P30AG066515
    United States. National Institutes of Health; 1R61NS114926
    United States. National Institute on Aging; 1R01AG058417
    United States. National Institute of Child Health and Human Development; R01HD105256
    United States. National Institute of Child Health and Human Development; P01HD106414
    United States. National Institutes of Health; R01GM140464
    United States. National Science Foundation; DMS-205434

    Evidence That Gene Activation and Silencing during Stem Cell Differentiation Requires a Transcriptionally Paused Intermediate State

    A surprising portion of both mammalian and Drosophila genomes are transcriptionally paused, undergoing initiation without elongation. We tested the hypothesis that transcriptional pausing is an obligate transition state between definitive activation and silencing as human embryonic stem cells (hESCs) change state from pluripotency to mesoderm. Chromatin immunoprecipitation for trimethyl lysine 4 on histone H3 (ChIP-Chip) was used to analyze transcriptional initiation, and 3′ transcript arrays were used to determine transcript elongation. Pluripotent and mesodermal cells had equivalent fractions of the genome in active and paused transcriptional states (∼48% each), with ∼4% definitively silenced (neither initiation nor elongation). Differentiation to mesoderm changed the transcriptional state of 12% of the genome, with roughly equal numbers of genes moving toward activation or silencing. Interestingly, almost all loci (98–99%) changing transcriptional state do so either by entering or exiting the paused state. A majority of these transitions involve either loss of initiation, as genes specifying alternate lineages are archived, or gain of initiation, in anticipation of future full-length expression. The addition of chromatin dynamics permitted much earlier predictions of final cell fate compared to sole use of conventional transcript arrays. These findings indicate that the paused state may be the major transition state for genes changing expression during differentiation, and implicate control of transcriptional elongation as a key checkpoint in lineage specification

    Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research

    Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operator characteristic (AUROC) curve scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translation of microbiome data into clinically relevant predictive models and to better understand preterm birth.
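As a reading aid for the AUROC scores quoted in the abstract above, the following is a minimal sketch of how an AUROC can be computed from binary labels and model scores. It uses the pairwise (Mann-Whitney) definition: the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half. The function name `auroc` and the toy labels and scores are illustrative assumptions and are not taken from the challenge data or its scoring code.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5.

    labels: iterable of 0/1 (1 = positive class, e.g. a preterm birth)
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two positives, two negatives.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.69 and 0.87 should be read.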

    Mechanisms of cell fate acquisition in the differentiation of pluripotent stem cells

    Mammalian development requires the creation of hundreds of cell types, each with distinct patterns of gene expression, all sharing the same genetic sequence. Epigenetic mechanisms, such as histone tail modification, are proposed to be critical for this process. The embryonic stem cell's pluripotency, the ability to differentiate into all cell types within the body, makes these cells an excellent model for studying epigenetic change during development. Understanding the histone tail modifications used to maintain and establish cell fate is critical for improving differentiation protocols, evaluating the potency of adult cells, and reprogramming adult cells to an embryonic stem cell-like state. Ultrastructurally, we noted that pluripotent human and mouse embryonic stem cells had scant cytoplasm and a large euchromatin-rich nucleus. Both acetylation and trimethylation of lysine 9 on histone H3 increased genome-wide upon the induction of differentiation. The brachyury T locus, a key transcription factor of mesendoderm, had bivalent chromatin modification in pluripotency, with trimethylation on lysine 4 and lysine 27 on histone H3. This bivalency resolved first to only lysine 4 trimethylation when the locus was actively producing transcript, and lysine 9 and lysine 27 trimethylation when the locus was silenced and never to be active again. In pluripotency or mesendoderm, approximately 20% of protein-coding loci are non-transcribed, 40% are transcriptionally initiating but not elongating and 40% are productively transcribing with initiation and elongation by RNA polymerase. Loci transitioning between transcriptional states are nearly exclusively (98-99% of loci changing state) transitioning to or from initiation without elongation, indicating that elongation and initiation are distinctly regulated steps during differentiation. 
    Nuclear effectors of the canonical Wnt or TGF-beta signaling cascades are binding approximately 3000-6000 loci during this step of differentiation, with the co-binding of Groucho (Wnt signaling) and coSmad4 (TGF-beta signaling) promoting the gain of initiation at a ten-times higher rate than the loss of initiation. Together these findings indicate a rich role for transcriptional regulation, and that interactions with chromatin, extracellular signaling and a potential feed-forward mechanism are all involved in cell fate acquisition during differentiation of pluripotent cells.

    DECARD: CC11 Dataset

    Data files for use with DECARD, as described in: Golob JL, Margolis E, Hoffman NG, Fredricks DN. Evaluating the accuracy of amplicon-based microbiome computational pipelines on simulated human gut microbial communities. BMC Bioinformatics. 2017 May 30;18(1):283. PMCID: PMC5450146
    Contents:
    - 16S_SSU.tar.bz2: Filtered full-length repository reads culled from the NCBI 16S microbial bioproject and Silva. These are the templates to use when generating a community.
    - CC11.tar.bz2: Reads for the CC11 family of 100 synthetic communities used for our BMC Bioinformatics publication. 454 holds reads for simulated 454 pyrosequencing, amplified with the HMP 454 primers. illumina holds MiSeq-like paired-end reads, amplified with the EMP primers. For each, there are error-free reads, reads with simulated error, and a map file (specifying the community for each sequence ID and the "true" source sequence for each read).
    - CC11_targets.csv: A target file, to be used with DECARD to recreate the CC11 communities, using the culled reads in 16S_SSU.tar.bz2 and primer sequences of your choosing. This can be used to test different primer sets.
    - CC11_targets.tre: The "true" phylogenetic tree for the full-length 16S sequences used in all of the CC11 communities, in Newick format and suitable for packages like Phyloseq to calculate "true" DPCoA or (weighted) UniFrac pairwise distances between the communities.

    In silico benchmarking of metagenomic tools for coding sequence detection reveals the limits of sensitivity and precision

    Background: High-throughput sequencing can establish the functional capacity of a microbial community by cataloging the protein-coding sequences (CDS) present in the metagenome of the community. The relative performance of different computational methods for identifying CDS from whole-genome shotgun sequencing is not fully established.
    Results: Here we present an automated benchmarking workflow, using synthetic shotgun sequencing reads for which we know the true CDS content of the underlying communities, to determine the relative performance (sensitivity, positive predictive value or PPV, and computational efficiency) of different metagenome analysis tools for extracting the CDS content of a microbial community. Assembly-based methods are limited by coverage depth, with poor sensitivity for CDS at <5X depth of sequencing, but have excellent PPV. Mapping-based techniques are more sensitive at low coverage depths, but can struggle with PPV. We additionally describe an expectation-maximization-based iterative algorithmic approach, which we show successfully improves the PPV of a mapping-based technique while retaining improved sensitivity and computational efficiency.
    Conclusion: Our benchmarking approach reveals the trade-offs of assembly versus alignment-based approaches and the relative performance of specific implementations when one wishes to extract the protein coding capacity of microbial communities.
    http://deepblue.lib.umich.edu/bitstream/2027.42/173432/1/12859_2020_Article_3802.pd
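The two accuracy measures named in the abstract above, sensitivity and positive predictive value (PPV), can be illustrated with a minimal sketch that compares a set of detected CDS identifiers against a known truth set. The function name `sensitivity_and_ppv` and the identifiers below are hypothetical, and exact set membership is a simplifying assumption; the published workflow may match detected and true CDS differently.

```python
def sensitivity_and_ppv(true_cds, detected_cds):
    """Return (sensitivity, PPV) for detected CDS identifiers against a truth set.

    sensitivity = true positives / all true CDS
    PPV         = true positives / all detected CDS
    """
    true_cds = set(true_cds)
    detected_cds = set(detected_cds)
    true_positives = len(true_cds & detected_cds)

    # Guard against empty sets so the ratios stay well defined.
    sensitivity = true_positives / len(true_cds) if true_cds else float("nan")
    ppv = true_positives / len(detected_cds) if detected_cds else float("nan")
    return sensitivity, ppv


# Hypothetical example: 4 true CDS, 3 detected, 2 of them correct.
truth = {"cdsA", "cdsB", "cdsC", "cdsD"}
calls = {"cdsA", "cdsB", "cdsX"}
sens, ppv = sensitivity_and_ppv(truth, calls)
```

In this toy case half of the true CDS are recovered (sensitivity 0.5) and two of the three calls are correct (PPV 2/3), which mirrors the kind of trade-off the abstract describes between assembly-based and mapping-based methods.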

    SARS-CoV-2 vaccines: a triumph of science and collaboration

    Roughly 1 year after the first case of COVID-19 was identified and less than 1 year after the sequencing of SARS-CoV-2, multiple SARS-CoV-2 vaccines with demonstrated safety and efficacy in phase III clinical trials are available. The most promising vaccines have targeted the surface glycoprotein (S-protein) of SARS-CoV-2 and achieved an approximate 85%–95% reduction in the risk of symptomatic COVID-19, while retaining excellent safety profiles and modest side effects in the phase III clinical trials. The mRNA, replication-incompetent viral vector, and protein subunit vaccine technologies have all been successfully employed. Some novel SARS-CoV-2 variants evade but do not appear to fully overcome the potent immunity induced by these vaccines. Emerging real-world effectiveness data add evidence for protection from severe COVID-19. This is an impressive first demonstration of the effectiveness of the mRNA vaccine and vector vaccine platforms. The success of SARS-CoV-2 vaccine development should be credited to open science, industry partnerships, harmonization of clinical trials, and the altruism of study participants. The manufacturing and distribution of the emergency use–authorized SARS-CoV-2 vaccines are ongoing challenges. What remains now is to ensure broad and equitable global vaccination against COVID-19

    Evaluating the accuracy of amplicon-based microbiome computational pipelines on simulated human gut microbial communities

    Background: Microbiome studies commonly use 16S rRNA gene amplicon sequencing to characterize microbial communities. Errors introduced at multiple steps in this process can affect the interpretation of the data. Here we evaluate the accuracy of operational taxonomic unit (OTU) generation, taxonomic classification, alpha- and beta-diversity measures for different settings in QIIME, MOTHUR and a pplacer-based classification pipeline, using a novel software package: DECARD.
    Results: In silico, we generated 100 synthetic bacterial communities approximating human stool microbiomes to be used as a gold standard for evaluating the colligative performance of microbiome analysis software. Our synthetic data closely matched the composition and complexity of actual healthy human stool microbiomes. Genus-level taxonomic classification was correctly done for only 50.4–74.8% of the source organisms. Miscall rates varied from 11.9 to 23.5%. Species-level classification was less successful (6.9–18.9% correct); miscall rates were comparable to those of genus-level targets (12.5–26.2%). The degree of miscall varied by clade of organism, pipeline and specific settings used. OTU generation accuracy varied by strategy (closed, de novo or subsampling), reference database, algorithm and software implementation. Shannon diversity estimation accuracy correlated generally with OTU-generation accuracy. Beta-diversity estimates with Double Principal Coordinate Analysis (DPCoA) were more robust against errors introduced in processing than Weighted UniFrac. The settings suggested in the tutorials were among the worst performing in all outcomes tested.
    Conclusions: Even when using the same classification pipeline, the specific OTU-generation strategy, reference database and downstream analysis methods selection can have a dramatic effect on the accuracy of taxonomic classification, and alpha- and beta-diversity estimation. Even minor changes in settings adversely affected the accuracy of the results, bringing them far from the best-observed result. Thus, specific details of how a pipeline is used (including OTU generation strategy, reference sets, clustering algorithm and specific software implementation) should be specified in the methods section of all microbiome studies. Researchers should evaluate their chosen pipeline and settings to confirm it can adequately answer the research question rather than assuming the tutorial or standard-operating-procedure settings will be adequate or optimal.
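The abstract above refers to Shannon diversity as one of the alpha-diversity measures evaluated. As a minimal sketch, assuming per-OTU read counts for a single community and the natural logarithm, the index H = -Σ p_i ln p_i can be computed as follows. The function name `shannon_diversity` and the counts are illustrative assumptions and do not come from DECARD or the pipelines named in the abstract, which may use a different logarithm base.

```python
import math


def shannon_diversity(otu_counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    if total <= 0:
        raise ValueError("at least one OTU must have a positive count")
    h = 0.0
    for count in otu_counts:
        if count > 0:  # zero-count OTUs contribute nothing to the sum
            p = count / total
            h -= p * math.log(p)
    return h


# Hypothetical communities: four equally abundant OTUs vs. a single dominant OTU.
even_community = shannon_diversity([10, 10, 10, 10])
single_otu_community = shannon_diversity([40, 0, 0, 0])
```

Because the index depends only on the relative abundances of the OTUs, errors in OTU generation (splitting or merging OTUs) change the estimate directly, which is consistent with the correlation the abstract reports.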

    geneshot: gene-level metagenomics identifies genome islands associated with immunotherapy response

    Researchers must be able to generate experimentally testable hypotheses from sequencing-based observational microbiome experiments to discover the mechanisms underlying the influence of gut microbes on human health. We describe geneshot, a novel bioinformatics tool for identifying testable hypotheses based on gene-level metagenomic analysis of WGS microbiome data. By applying geneshot to two independent previously published cohorts, we identify microbial genomic islands consistently associated with response to immune checkpoint inhibitor (ICI)-based cancer treatment in culturable type strains. The identified genomic islands are within operons involved in type II secretion, TonB-dependent transport, and bacteriophage growth.
    http://deepblue.lib.umich.edu/bitstream/2027.42/173864/1/13059_2021_Article_2355.pd