SECNLP: A Survey of Embeddings in Clinical Natural Language Processing
Traditional representations such as bag-of-words are high-dimensional and sparse,
and ignore word order as well as syntactic and semantic information. Distributed
vector representations, or embeddings, map variable-length text to dense
fixed-length vectors and capture prior knowledge that can be transferred to
downstream tasks. Even though embeddings have become the de facto standard for
representations in deep-learning-based NLP tasks in both general and clinical
domains, there is no survey paper which presents a detailed review of
embeddings in Clinical Natural Language Processing. In this survey paper, we
discuss various medical corpora and their characteristics as well as medical
codes, and present a brief overview and comparison of popular embedding models. We
classify clinical embeddings into nine types and discuss each embedding type in
detail. We discuss various evaluation methods followed by possible solutions to
various challenges in clinical embeddings. Finally, we conclude with some of
the future directions which will advance the research in clinical embeddings.
Comment: Published in Journal of Biomedical Informatics (for the updated version,
refer to 10.1016/j.jbi.2019.103323)
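
As a rough illustration of the contrast the abstract draws between bag-of-words and embeddings, the sketch below builds a count vector with one slot per vocabulary word and, separately, a dense vector of fixed length. The vocabulary, the two-dimensional word vectors, and the helper names `bag_of_words` and `mean_embedding` are all hypothetical and made up for this example; averaging word vectors is only a simple stand-in for an embedding model, not a method taken from the survey.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a count vector with one slot per vocabulary word.

    The vector length equals the vocabulary size, and word order is lost.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def mean_embedding(text, table, dim):
    """Average per-word vectors into one dense vector of fixed length `dim`.

    Words missing from `table` are skipped; a text with no known words
    maps to the zero vector.
    """
    vectors = [table[w] for w in text.lower().split() if w in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical toy vocabulary and 2-dimensional vectors (illustration only).
VOCAB = ["patient", "denies", "reports", "chest", "pain", "fever"]
TABLE = {
    "patient": [0.1, 0.3],
    "denies": [-0.8, 0.2],
    "reports": [0.7, 0.1],
    "chest": [0.2, 0.9],
    "pain": [0.3, 0.8],
    "fever": [0.4, 0.6],
}

short_text = "fever"
long_text = "patient reports chest pain"

# Bag-of-words: mostly zeros, and the length grows with the vocabulary.
print(bag_of_words(short_text, VOCAB))
print(bag_of_words(long_text, VOCAB))

# Dense representation: the same length for short and long inputs.
print(mean_embedding(short_text, TABLE, 2))
print(mean_embedding(long_text, TABLE, 2))
```

With a realistic vocabulary of tens of thousands of words, the count vector would be almost entirely zeros, whereas the dense vector keeps the same small dimension regardless of input length. Note that mean pooling, like bag-of-words, also discards word order; it is used here only because it is short to write.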