
Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data

Authors: Denaxas Spiros
Dobson Richard
Hemingway Harry
Pikoula Maria
Riedel Sebastian
Stenetorp Pontus
Publication date: 28/11/2018

Electronic health records (EHR) are increasingly being used to construct disease risk prediction models. Feature engineering in EHR data, however, is challenging because of their high dimensionality and heterogeneity. Low-dimensional representations of EHR data can potentially mitigate these challenges. In this paper, we use global vectors (GloVe) to learn word embeddings for diagnoses and procedures recorded using 13 million ontology terms across 2.7 million hospitalisations in national UK EHR. We demonstrate the utility of these embeddings by evaluating their performance in identifying patients who are at higher risk of being hospitalised for congestive heart failure. Our findings indicate that embeddings can enable the creation of robust EHR-derived disease risk prediction models and address some of the limitations associated with manual clinical feature engineering.

Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.0721

arXiv.org e-Print Archive

UCL Discovery
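The abstract above describes learning GloVe embeddings from the diagnosis and procedure codes recorded in hospitalisations. The Python sketch below illustrates the general GloVe technique on toy data; it is not the authors' pipeline. Assumptions: each hospitalisation is treated as a single unordered context window, the codes are hypothetical ICD-10-style strings rather than real records, the weighting function uses the x_max = 100 and alpha = 0.75 defaults from the original GloVe paper, and plain SGD stands in for the AdaGrad optimiser GloVe normally uses. The helper names (cooccurrence_counts, glove_weight, train_glove) are illustrative only.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(admissions):
    """Count how often two codes are recorded in the same hospitalisation.

    Assumption: the whole admission is one unordered context window, and
    each pair of distinct codes is counted once per admission, symmetrically.
    """
    counts = defaultdict(float)
    for codes in admissions:
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1.0
            counts[(b, a)] += 1.0
    return counts


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): damps rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def train_glove(counts, dim=8, epochs=200, lr=0.05, seed=0):
    """Minimise sum_ij f(X_ij) * (w_i . c_j + b_i + b'_j - log X_ij)^2 with plain SGD.

    Codes that never co-occur with another code get no vector.
    Returns (embeddings, per-epoch weighted loss).
    """
    rng = random.Random(seed)
    vocab = sorted({a for a, _ in counts})

    def init():
        return [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]

    w = {v: init() for v in vocab}  # "word" vectors
    c = {v: init() for v in vocab}  # "context" vectors
    bw = {v: 0.0 for v in vocab}
    bc = {v: 0.0 for v in vocab}
    pairs = list(counts.items())
    losses = []
    for _ in range(epochs):
        rng.shuffle(pairs)
        total = 0.0
        for (i, j), x in pairs:
            f = glove_weight(x)
            diff = (sum(a * b for a, b in zip(w[i], c[j]))
                    + bw[i] + bc[j] - math.log(x))
            total += f * diff * diff
            g = f * diff  # the constant factor 2 is folded into lr
            for k in range(dim):
                wi, cj = w[i][k], c[j][k]
                w[i][k] -= lr * g * cj
                c[j][k] -= lr * g * wi
            bw[i] -= lr * g
            bc[j] -= lr * g
        losses.append(total)
    # Summing word and context vectors is a common choice for the final embedding.
    emb = {v: [a + b for a, b in zip(w[v], c[v])] for v in vocab}
    return emb, losses


if __name__ == "__main__":
    # Hypothetical toy admissions (ICD-10-style codes, not real patient data).
    admissions = [
        ["I50.0", "I10", "N18"],
        ["I50.0", "I10", "E11"],
        ["I10", "E11"],
        ["I50.0", "N18"],
    ]
    emb, losses = train_glove(cooccurrence_counts(admissions), dim=4, epochs=100)
    print(f"loss: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```

The resulting code vectors could then serve as input features for a downstream classifier, such as one predicting heart failure admission. How the paper aggregates embeddings per patient is not stated in this abstract, so that step is left out here.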

BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale

Authors: Chen Qingyu
Kim Sun
Lee Kyubum
Lu Zhiyong
Wei Chih-Hsuan
Yan Shankai
Publication venue: Public Library of Science (PLoS)
Publication date: 23/12/2019

Capturing the semantics of related biological concepts, such as genes and mutations, is of significant importance for many research tasks in computational biology, such as protein-protein interaction detection, gene-drug association prediction, and biomedical literature-based discovery. Here, we propose to leverage state-of-the-art text mining tools and machine learning models to learn the semantics via vector representations (also known as embeddings) of over 400,000 biological concepts mentioned across all PubMed abstracts. Our learned embeddings, named BioConceptVec, can capture related concepts based on their surrounding contextual information in the literature, which goes beyond exact term matching or co-occurrence-based methods. BioConceptVec has been thoroughly evaluated in multiple bioinformatics tasks comprising over 25 million instances from nine different biological datasets. The evaluation results demonstrate that BioConceptVec outperforms existing methods in all tasks. Finally, BioConceptVec is freely available to the research community and the general public via https://github.com/ncbi-nlp/BioConceptVec.

Comment: 33 pages, 6 figures, 7 tables, accepted by PLOS Computational Biology

arXiv.org e-Print Archive

Directory of Open Access Journals
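BioConceptVec identifies related concepts by comparing embedding vectors rather than by matching terms. A minimal sketch of that kind of lookup, assuming the vectors have already been loaded into a Python dict keyed by concept identifier, is a cosine-similarity nearest-neighbour search. The concept names and three-dimensional vectors below are made up for illustration and do not come from the released BioConceptVec files; cosine and most_similar are hypothetical helpers, not part of the project's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(embeddings, query, k=3):
    """Return the k concepts whose vectors are closest to `query` by cosine similarity."""
    q = embeddings[query]
    scored = [
        (cosine(q, vec), name)
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:k]]


if __name__ == "__main__":
    # Made-up 3-d vectors standing in for loaded concept embeddings.
    embeddings = {
        "gene_A": [0.9, 0.1, 0.0],
        "gene_B": [0.8, 0.2, 0.1],
        "drug_X": [0.1, 0.9, 0.2],
        "disease_Y": [0.0, 0.2, 0.9],
    }
    for name, score in most_similar(embeddings, "gene_A", k=2):
        print(name, round(score, 3))
```

With these toy vectors, gene_B ranks first for gene_A because the two point in nearly the same direction, even though the names share no exact term. A brute-force scan like this suits a handful of vectors; for hundreds of thousands of concepts, a vectorised or approximate nearest-neighbour index would be the practical choice.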