7 research outputs found

    Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research

    Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operator characteristic (AUROC) curve scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translation of microbiome data into clinically relevant predictive models and to better understand preterm birth.
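    The AUROC scores quoted in this abstract can be read as the probability that a randomly chosen preterm case receives a higher predicted risk than a randomly chosen term case. The sketch below is only an illustration of that metric, not the challenge's scoring code: the function name `auroc` and the toy labels and scores are hypothetical, and it assumes binary labels (1 = preterm, 0 = term).

```python
def auroc(labels, scores):
    """AUROC as a pairwise ranking statistic (equivalent to Mann-Whitney U).

    labels: iterable of 0/1 outcomes (assumed encoding: 1 = preterm).
    scores: iterable of predicted risks, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)  # 5 of 6 positive/negative pairs ranked correctly
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.69 (PTB) and 0.87 (ePTB) should be read.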

    Crowdsourcing rare cancer research in the Hack4NF GENIE-NF tumor identification and classification challenge.pptx

    Presented at the 2023 AACR meeting minisymposium: "Advancing Cancer Research Through an International Cancer Registry: AACR Project GENIE Use Cases" A key challenge in rare tumor research is the paucity of genomic data that can be used to understand and devise better therapeutic strategies for rare cancers. Furthermore, the “curse of dimensionality,” in which data has many features, such as genetic variants, but few specimens, makes it difficult or impossible to use conventional machine learning techniques to explore these data. To address these challenges in the context of tumors associated with the rare disease neurofibromatosis, we ran a hackathon to stimulate the development of methods to better understand the biology of tumors related to this disease. The hackathon had three challenges centered around variant effect prediction, drug discovery, and genomics. The genomics track of the hackathon leveraged the AACR Project GENIE database (1) and challenged participants to develop new frameworks that accurately use GENIE data to classify neurofibromatosis-related tumors. They were asked to first identify the neurofibromatosis-related tumors in the dataset. They were then asked to use one or more novel classification methods to classify the tumor samples into different groups based on genetic features. To help them do this, we provided access to version 13 of the GENIE database to the hackathon participants, though they were allowed to integrate other relevant datasets. The expected output was a classification method that differentiates different types of NF1, NF2, and schwannomatosis-related tumors using clinical sequencing data, as well as a list of the most important features in the algorithm for differentiating tumor types. 
Domain expert judges qualitatively scored each team’s rationale for defining and including “NF-related tumors” in their project, and scored the feature list based on the presence of known important biomarkers and features in NF tumors as well as potentially novel features that the algorithm identified. A technical judge also scored the code repository based on documentation and clarity of code. Two teams from the GENIE subchallenge were awarded prizes: Team Next GeNLP for the best overall GENIE challenge submission, and team “Artificial Intelligence for neurofibromatosis” for the best project documentation. Both winning teams used methods based on natural language processing (NLP) techniques to reduce the dimensionality and complexity of the variant data and to identify new representations of NF-relevant tumors, and then applied downstream analysis methods such as distance calculations and feature prioritization to better understand the genomic profiles of different tumors. While these methods and tools focused on NF-specific tumor types, we anticipate that they could be re-used by others to better explore the biology and interrelatedness of other rare tumors within the GENIE database.

    Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research

    Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operator characteristic (AUROC) curve scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translation of microbiome data into clinically relevant predictive models and to better understand preterm birth.

    The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution
