Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Enriching Unsupervised User Embedding via Medical Concepts
Clinical notes in Electronic Health Records (EHR) provide richly documented
patient information that can be used to infer phenotypes for disease diagnosis
and to study patient characteristics for cohort selection. Unsupervised user
embedding aims to encode patients into fixed-length vectors without human supervision.
Medical concepts extracted from the clinical notes contain rich connections
between patients and their clinical categories. However, existing unsupervised
approaches to user embedding from clinical notes do not explicitly incorporate
medical concepts. In this study, we propose a concept-aware unsupervised user
embedding that jointly leverages text documents and medical concepts from two
clinical corpora, MIMIC-III and Diabetes. We evaluate user embeddings on both
extrinsic and intrinsic tasks, including phenotype classification, in-hospital
mortality prediction, patient retrieval, and patient relatedness. Experiments
on the two clinical corpora show that our approach outperforms unsupervised
baselines and that incorporating medical concepts can significantly improve
baseline performance.
Comment: accepted at ACM CHIL 2022. a revision for section reforma
Learning deep patient representations for the teleICU
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 89-93). This thesis presents a method of extracting deep robust representations of teleICU clinical data using Transformer networks, inspired by recent machine learning literature in language modeling. The utility of these representations is evaluated in various outcome prediction tasks, in which they were able to outperform linear and neural baselines. Also examined are the probability distributions of various patient characteristics across the learned patient representation space, where corresponding high-level spatial structure suggests potential for use as a similarity metric or in combination with other patient similarity metrics. Finally, the code for the models developed is publicly provided as a starting point for further research. by Ini Oguntola. M. Eng.
NaroNet: Discovery of tumor microenvironment elements from highly multiplexed images
Many efforts have been made to discover tumor-specific microenvironment
elements (TMEs) from immunostained tissue sections. However, the identification
of yet unknown but relevant TMEs from multiplex immunostained tissues remains a
challenge, due to the number of markers involved (tens) and the complexity of
their spatial interactions. We present NaroNet, which uses machine learning to
identify and annotate known as well as novel TMEs from self-supervised
embeddings of cells, organized at different levels (local cell phenotypes and
cellular neighborhoods). Then it uses the abundance of TMEs to classify
patients based on biological or clinical features. We validate NaroNet using
synthetic patient cohorts with adjustable incidence of different TMEs and two
cancer patient datasets. In both synthetic and real datasets, NaroNet
identifies, without supervision, novel TMEs relevant to the user-defined
classification task. As NaroNet requires only patient-level information, it
renders state-of-the-art computational methods accessible to a broad audience,
accelerating the discovery of biomarker signatures.
Comment: 37 pages, 4 figure