85 research outputs found

    Extracting information from radiology reports by Natural Language Processing and Deep Learning

    Get PDF
    This work was supported by the NLP4RARE-CM-UC3M, which was developed under the Interdisciplinary Projects Program for Young Researchers at University Carlos III of Madrid. The work was also supported by the Multiannual Agreement with UC3M in the line of Excellence of University Professors (EPUC3M17), and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation)

    Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records

    Get PDF
    Unstructured information in electronic health records provide an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods the neural method requires significantly less configuration effort and domain-knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and to enable future benchmarks.Comment: Proceedings of the 1st ACM WSDM Health Search and Data Mining Workshop (HSDM2020), 202

    Extraction of Information Related to Adverse Drug Events from Electronic Health Record Notes: Design of an End-to-End Model Based on Deep Learning

    Get PDF
    BACKGROUND: Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems such as Food and Drug Administration Adverse Event Reporting System face challenges such as underreporting. Therefore, as complementary surveillance, data on ADEs are extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques are introduced in this field, such as deep learning and multi-task learning (MTL). However, only a few studies have focused on employing such techniques to extract ADEs. OBJECTIVE: We aimed to design a deep learning model for extracting ADEs and related information such as medications and indications. Since extraction of ADE-related information includes two steps-named entity recognition and relation extraction-our second objective was to improve the deep learning model using multi-task learning between the two steps. METHODS: We employed the dataset from the Medication, Indication and Adverse Drug Events (MADE) 1.0 challenge to train and test our models. This dataset consists of 1089 EHR notes of cancer patients and includes 9 entity types such as Medication, Indication, and ADE and 7 types of relations between these entities. To extract information from the dataset, we proposed a deep-learning model that uses a bidirectional long short-term memory (BiLSTM) conditional random field network to recognize entities and a BiLSTM-Attention network to extract relations. To further improve the deep-learning model, we employed three typical MTL methods, namely, hard parameter sharing, parameter regularization, and task relation learning, to build three MTL models, called HardMTL, RegMTL, and LearnMTL, respectively. RESULTS: Since extraction of ADE-related information is a two-step task, the result of the second step (ie, relation extraction) was used to compare all models. We used microaveraged precision, recall, and F1 as evaluation metrics. Our deep learning model achieved state-of-the-art results (F1=65.9%), which is significantly higher than that (F1=61.7%) of the best system in the MADE1.0 challenge. HardMTL further improved the F1 by 0.8%, boosting the F1 to 66.7%, whereas RegMTL and LearnMTL failed to boost the performance. CONCLUSIONS: Deep learning models can significantly improve the performance of ADE-related information extraction. MTL may be effective for named entity recognition and relation extraction, but it depends on the methods, data, and other factors. Our results can facilitate research on ADE detection, NLP, and machine learning

    Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records

    Get PDF
    Unstructured information in electronic health records provide an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods the neural method requires significantly less configuration effort and domain-knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and to enable future benchmarks

    De-identifying Hospital Discharge Summaries: An End-to-End Framework using Ensemble of Deep Learning Models

    Full text link
    Electronic Medical Records (EMRs) contain clinical narrative text that is of great potential value to medical researchers. However, this information is mixed with Personally Identifiable Information (PII) that presents risks to patient and clinician confidentiality. This paper presents an end-to-end de-identification framework to automatically remove PII from hospital discharge summaries. Our corpus included 600 hospital discharge summaries which were extracted from the EMRs of two principal referral hospitals in Sydney, Australia. Our end-to-end de-identification framework consists of three components: 1) Annotation: labelling of PII in the 600 hospital discharge summaries using five pre-defined categories: person, address, date of birth, identification number, phone number; 2) Modelling: training six named entity recognition (NER) deep learning base-models on balanced and imbalanced datasets; and evaluating ensembles that combine all six base-models, the three base-models with the best F1 scores and the three base-models with the best recall scores respectively, using token-level majority voting and stacking methods; and 3) De-identification: removing PII from the hospital discharge summaries. Our results showed that the ensemble model combined using the stacking Support Vector Machine (SVM) method on the three base-models with the best F1 scores achieved excellent results with a F1 score of 99.16% on the test set of our corpus. We also evaluated the robustness of our modelling component on the 2014 i2b2 de-identification dataset. Our ensemble model, which uses the token-level majority voting method on all six base-models, achieved the highest F1 score of 96.24% at strict entity matching and the highest F1 score of 98.64% at binary token-level matching compared to two state-of-the-art methods. The framework provides a robust solution to de-identifying clinical narrative text safely
    corecore