Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data
Electronic health records (EHR) are increasingly being used for constructing
disease risk prediction models. Feature engineering in EHR data, however, is
challenging due to their high-dimensional and heterogeneous nature.
Low-dimensional representations of EHR data can potentially mitigate these
challenges. In this paper, we use global vectors (GloVe) to learn word
embeddings for diagnoses and procedures recorded using 13 million ontology
terms across 2.7 million hospitalisations in national UK EHR. We demonstrate
the utility of these embeddings by evaluating their performance in identifying
patients who are at higher risk of being hospitalised for congestive heart
failure. Our findings indicate that embeddings can enable the creation of
robust EHR-derived disease risk prediction models and address some of the
limitations associated with manual clinical feature engineering.
Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
arXiv:1811.0721
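As a rough illustration of the input that GloVe-style training consumes, the sketch below counts how often pairs of clinical codes occur in the same hospitalisation. It assumes each hospitalisation is treated as one context window, which the abstract does not confirm; the function name `cooccurrence_counts` and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(hospitalisations):
    """Count how often each unordered pair of codes shares a hospitalisation.

    Assumption: one hospitalisation = one context window. The paper's actual
    windowing and weighting scheme is not described in the abstract.
    """
    counts = Counter()
    for codes in hospitalisations:
        # Deduplicate and sort so each pair is counted once per hospitalisation
        # and always stored in the same (alphabetical) order.
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
    return counts


# Toy, made-up records: each inner list is the codes of one hospitalisation.
records = [
    ["I50", "I10", "E11"],
    ["I50", "I10"],
    ["E11", "N18"],
]
pairs = cooccurrence_counts(records)
```

GloVe then fits vectors whose dot products approximate the logarithm of such counts; that optimisation step is left out here to keep the sketch short.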
BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale
Capturing the semantics of related biological concepts, such as genes and
mutations, is of significant importance to many research tasks in computational
biology such as protein-protein interaction detection, gene-drug association
prediction, and biomedical literature-based discovery. Here, we propose to
leverage state-of-the-art text mining tools and machine learning models to
learn the semantics via vector representations (a.k.a. embeddings) of over
400,000 biological concepts mentioned in the entire collection of PubMed abstracts. Our
learned embeddings, namely BioConceptVec, can capture related concepts based on
their surrounding contextual information in the literature, which is beyond
exact term match or co-occurrence-based methods. BioConceptVec has been
thoroughly evaluated in multiple bioinformatics tasks consisting of over 25
million instances from nine different biological datasets. The evaluation
results demonstrate that BioConceptVec has better performance than existing
methods in all tasks. Finally, BioConceptVec is made freely available to the
research community and general public via
https://github.com/ncbi-nlp/BioConceptVec.
Comment: 33 pages, 6 figures, 7 tables, accepted by PLOS Computational Biolog
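One common way to find "related concepts" in an embedding space is to rank them by cosine similarity. The sketch below shows that idea on made-up three-dimensional vectors; the concept names, the vectors, and the helpers `cosine` and `most_similar` are all hypothetical, and this is not the BioConceptVec loading API or its evaluation procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(query, embeddings, top_k=2):
    """Return the top_k concepts closest to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [
        (name, cosine(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy, made-up embeddings; real concept vectors have far more dimensions.
toy = {
    "gene_A": [1.0, 0.0, 0.0],
    "gene_B": [0.9, 0.1, 0.0],
    "drug_X": [0.0, 1.0, 0.0],
}
neighbours = most_similar("gene_A", toy, top_k=1)
```

Here `gene_B` ranks first for `gene_A` because its vector points in nearly the same direction, while `drug_X` is orthogonal to it.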