KFU at CLEF eHealth 2017 Task 1: ICD-10 coding of English death certificates with recurrent neural networks
This paper describes the participation of the KFU team in the CLEF eHealth 2017 challenge. Specifically, we participated in Task 1, "Multilingual Information Extraction - ICD-10 coding", for which we implemented recurrent neural networks to automatically assign ICD-10 codes to fragments of death certificates written in English. Our system uses a Long Short-Term Memory (LSTM) network to map the input sequence into a vector representation, and then another LSTM to decode the target sequence from that vector. We initialize the input representations with word embeddings trained on user posts in social media. The encoder-decoder model obtained an F-measure of 85.01% on the full test set, a significant improvement over the average score of 62.2% across all participants' approaches. On an external test set, it also significantly improved on the average score of the submitted runs, from 26.1% to 44.33%.
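The encoder-decoder described above can be sketched in plain Python as follows. This is a minimal, untrained sketch under several assumptions not taken from the paper: all weights and embeddings are random (the paper initializes the inputs with pretrained social-media embeddings), decoding is greedy, the layer sizes are arbitrary, and the ICD-10 codes are toy labels. Training (for example cross-entropy with teacher forcing) is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class LSTMCell:
    """One LSTM cell; every gate reads the concatenation [x; h_prev]."""

    def __init__(self, input_dim, hidden_dim, rng):
        # Separate weights for the input (i), forget (f), output (o) gates and the candidate (g).
        self.w = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation [x; h]
        pre = {k: [s + bias for s, bias in zip(matvec(self.w[k], xh), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

class Seq2Seq:
    """Untrained LSTM encoder-decoder with greedy decoding (hypothetical toy model)."""

    def __init__(self, src_vocab, codes, emb_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.src_emb = {w: [rng.uniform(-0.1, 0.1) for _ in range(emb_dim)] for w in src_vocab}
        self.unk = [0.0] * emb_dim  # used for out-of-vocabulary source tokens
        self.out_vocab = list(codes) + ["</s>"]  # tokens the decoder may emit
        self.tgt_emb = {t: [rng.uniform(-0.1, 0.1) for _ in range(emb_dim)]
                        for t in ["<s>"] + self.out_vocab}
        self.encoder = LSTMCell(emb_dim, hidden_dim, rng)
        self.decoder = LSTMCell(emb_dim, hidden_dim, rng)
        self.proj = rand_matrix(len(self.out_vocab), hidden_dim, rng)

    def encode(self, tokens):
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        for tok in tokens:
            h, c = self.encoder.step(self.src_emb.get(tok, self.unk), h, c)
        return h, c  # final state = fixed-size summary of the input fragment

    def decode(self, h, c, max_len=4):
        prev, codes = "<s>", []
        for _ in range(max_len):
            h, c = self.decoder.step(self.tgt_emb[prev], h, c)
            logits = matvec(self.proj, h)
            best = self.out_vocab[max(range(len(logits)), key=logits.__getitem__)]
            if best == "</s>":
                break
            codes.append(best)
            prev = best  # feed the prediction back in at the next step
        return codes

model = Seq2Seq(src_vocab=["acute", "myocardial", "infarction"], codes=["I21.9", "I10", "J18.9"])
h, c = model.encode(["acute", "myocardial", "infarction"])
print(model.decode(h, c))  # arbitrary codes until the model is trained
```

One reason to decode a sequence rather than predict a single label is that a certificate fragment can correspond to more than one code; the decoder keeps emitting codes until it produces `</s>`.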
Deep learning for ICD coding: Looking for medical concepts in clinical documents in English and in French
© Springer Nature Switzerland AG 2018. Medical Concept Coding (MCD) is a crucial task in biomedical information extraction. Recent advances in neural network modeling have demonstrated its usefulness in natural language processing tasks. The modern sequence-to-sequence learning framework, initially used with recurrent neural networks, has been shown to provide a powerful solution to tasks such as Named Entity Recognition and Medical Concept Coding. We have addressed the identification of clinical concepts from the International Classification of Diseases, 10th revision (ICD-10), in two benchmark data sets of death certificates provided for Task 1 of the CLEF eHealth 2017 shared task. The proposed architecture combines ideas from recurrent neural networks and traditional text retrieval term weighting schemes. Our models reach F-measures of 75% and 86% on the CépiDc corpus of French texts and on the CDC corpus of English texts, respectively. The proposed models can be employed for coding electronic medical records with ICD codes, including diagnosis and procedure codes.
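The abstract does not say how term weighting is combined with the recurrent network. As an illustration of the traditional component only, the sketch below computes plain TF-IDF weights, tf(t, d) · log(N / df(t)), over tokenized certificate lines; the choice of this particular TF-IDF variant and the toy token lists are assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        if not doc:
            weights.append({})  # empty line: nothing to weight
            continue
        tf = Counter(doc)
        weights.append({t: (count / len(doc)) * math.log(n_docs / df[t])
                        for t, count in tf.items()})
    return weights

# Toy certificate lines (illustrative only).
lines = [
    ["acute", "myocardial", "infarction"],
    ["cardiac", "arrest"],
    ["acute", "renal", "failure"],
]
w = tfidf(lines)
print(round(w[0]["myocardial"], 4), round(w[0]["acute"], 4))  # → 0.3662 0.1352
```

Terms shared by many lines ("acute") receive lower weight than distinctive ones ("myocardial"). A hypothetical way to use such weights alongside a neural model is to scale each token's embedding by its weight before it enters the network; whether the authors do this is not stated in the abstract.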
Deep Neural Models for Medical Concept Normalization in User-Generated Texts
In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a health-related entity mention in free-form text to a concept in a controlled vocabulary, usually a standard thesaurus in the Unified Medical Language System (UMLS). This is a challenging task because the medical terminology used by health care professionals differs greatly from the language the general public uses in social media texts. We approach it as a sequence learning problem, using powerful neural networks such as recurrent neural networks and contextualized word representation models trained to obtain semantic representations of social media expressions. Our experimental evaluation over three different benchmarks shows that neural architectures leverage the semantic meaning of the entity mention and significantly outperform existing state-of-the-art models.
Comment: This is a preprint of the paper "Deep Neural Models for Medical Concept Normalization in User-Generated Texts" to be published at ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop.
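The normalization task itself can be illustrated with a much simpler baseline than the paper's models: represent the mention as the average of its word vectors and return the concept whose vector has the highest cosine similarity. This swaps in a nearest-neighbour approach for illustration only; the word and concept vectors are hand-made toy values, and the concept identifiers are hypothetical stand-ins for UMLS concepts.

```python
import math

def mean_vector(tokens, word_vectors):
    """Average the vectors of known tokens; None if no token is known."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def normalize(mention, word_vectors, concept_vectors):
    """Map a free-text mention to the most similar concept ID (nearest neighbour)."""
    m = mean_vector(mention.lower().split(), word_vectors)
    if m is None:
        return None
    return max(concept_vectors, key=lambda cid: cosine(m, concept_vectors[cid]))

# Toy 3-dimensional vectors; the concept IDs are hypothetical, not real UMLS CUIs.
WORD_VECTORS = {
    "head": [0.9, 0.1, 0.0],
    "pounding": [0.7, 0.3, 0.0],
    "cant": [0.0, 0.2, 0.9],
    "sleep": [0.1, 0.1, 0.95],
}
CONCEPT_VECTORS = {
    "CONCEPT_HEADACHE": [1.0, 0.2, 0.0],
    "CONCEPT_INSOMNIA": [0.05, 0.15, 1.0],
}
print(normalize("pounding head", WORD_VECTORS, CONCEPT_VECTORS))  # → CONCEPT_HEADACHE
```

Because social media phrasing such as "pounding head" rarely matches a thesaurus string literally, similarity in a vector space is what lets the mention reach the right concept; the paper's neural models learn such representations instead of relying on fixed toy vectors.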
BiTeM at CLEF eHealth Evaluation Lab 2016 Task 2: Multilingual Information Extraction
BiTeM/SIB Text Mining (http://bitem.hesge.ch/) is a university research group carrying out activities in semantic and text analytics applied to health and life sciences. This paper reports on the participation of our team in the CLEF eHealth 2016 evaluation lab. The processing applied to each evaluation corpus (QUAREO and CépiDC) was originally very similar. Our method is based on an Automatic Text Categorization (ATC) system. First, the system is configured with a specific input ontology (French UMLS), and the ATC assigns a ranked list of related concepts to each input document. Then, a second module locates all of the positive matches in the text and normalizes the extracted entities. For the CépiDC corpus, the system was loaded with the Swiss ICD-10 GM thesaurus. However, a last-minute data transformation issue forced us to implement an ad hoc solution based on simple pattern matching to comply with the constraints of the CépiDC challenge. We obtained an average precision of 62% on QUAREO entity extraction (over MEDLINE and EMEA texts, with both exact and inexact matching), 48% on normalizing these entities, and 59% on the CépiDC subtask. Enhancing recall by expanding the coverage of the terminologies could be an interesting way to improve this system at moderate labour costs.
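The abstract does not detail the ad hoc pattern-matching fallback used for CépiDC. A minimal dictionary matcher of that general kind, which locates thesaurus terms in a certificate line and returns their codes, might look like the sketch below; the function names, the French terms, and their ICD-10 codes are illustrative assumptions, not BiTeM's implementation or entries from the Swiss ICD-10 GM thesaurus.

```python
import re

def build_matcher(thesaurus):
    """Compile one regex that matches any thesaurus term as a whole phrase."""
    # Longest terms first: Python's regex alternation takes the first alternative
    # that matches at a position, so this prefers the longest available term.
    terms = sorted(thesaurus, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def extract(text, thesaurus, pattern):
    """Return (start, end, matched text, code) for every thesaurus hit in the text."""
    return [(m.start(), m.end(), m.group(0), thesaurus[m.group(0).lower()])
            for m in pattern.finditer(text)]

# Illustrative toy thesaurus with lowercase keys (hypothetical, not the Swiss ICD-10 GM).
THESAURUS = {
    "infarctus du myocarde": "I21.9",
    "insuffisance cardiaque": "I50.9",
    "hypertension artérielle": "I10",
    "hypertension": "I10",
    "pneumonie": "J18.9",
}
PATTERN = build_matcher(THESAURUS)
TEXT = "Infarctus du myocarde sur hypertension artérielle"
hits = extract(TEXT, THESAURUS, PATTERN)
print([(span, code) for _, _, span, code in hits])
```

A matcher like this only finds terms written as they appear in the dictionary, which fits the abstract's remark that expanding terminology coverage could improve recall.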