
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    Enriching Unsupervised User Embedding via Medical Concepts

    Clinical notes in Electronic Health Records (EHR) contain richly documented patient information that can be used to infer phenotypes for disease diagnosis and to study patient characteristics for cohort selection. Unsupervised user embedding aims to encode patients into fixed-length vectors without human supervision. Medical concepts extracted from the clinical notes contain rich connections between patients and their clinical categories. However, existing unsupervised approaches to user embedding from clinical notes do not explicitly incorporate medical concepts. In this study, we propose a concept-aware unsupervised user embedding that jointly leverages text documents and medical concepts from two clinical corpora, MIMIC-III and Diabetes. We evaluate user embeddings on both extrinsic and intrinsic tasks, including phenotype classification, in-hospital mortality prediction, patient retrieval, and patient relatedness. Experiments on the two clinical corpora show that our approach outperforms unsupervised baselines, and that incorporating medical concepts can significantly improve the baseline performance. Comment: accepted at ACM CHIL 2022. a revision for section reforma

    Learning deep patient representations for the teleICU

    This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 89-93). This thesis presents a method of extracting deep, robust representations of teleICU clinical data using Transformer networks, inspired by recent machine learning literature in language modeling. The utility of these representations is evaluated in various outcome prediction tasks, in which they outperformed linear and neural baselines. Also examined are the probability distributions of various patient characteristics across the learned patient representation space, where the corresponding high-level spatial structure suggests potential for use as a similarity metric or in combination with other patient similarity metrics. Finally, the code for the models developed is publicly provided as a starting point for further research. By Ini Oguntola.

    NaroNet: Discovery of tumor microenvironment elements from highly multiplexed images

    Many efforts have been made to discover tumor-specific microenvironment elements (TMEs) from immunostained tissue sections. However, the identification of yet unknown but relevant TMEs from multiplex immunostained tissues remains a challenge, due to the number of markers involved (tens) and the complexity of their spatial interactions. We present NaroNet, which uses machine learning to identify and annotate known as well as novel TMEs from self-supervised embeddings of cells, organized at different levels (local cell phenotypes and cellular neighborhoods). It then uses the abundance of TMEs to classify patients based on biological or clinical features. We validate NaroNet using synthetic patient cohorts with adjustable incidence of different TMEs and two cancer patient datasets. In both synthetic and real datasets, NaroNet identifies, without supervision, novel TMEs relevant to the user-defined classification task. As NaroNet requires only patient-level information, it renders state-of-the-art computational methods accessible to a broad audience, accelerating the discovery of biomarker signatures. Comment: 37 pages, 4 figures