Integrating speculation detection and deep learning to extract lung cancer diagnosis from clinical notes
Despite efforts to develop models for extracting medical concepts from clinical notes, challenges remain, in particular relating concepts to dates. The large number of clinical notes written for each patient, the use of negation and speculation, and the variety of date formats create ambiguity that must be resolved to reconstruct the patient’s natural history. In this paper, we concentrate on extracting the cancer diagnosis from clinical narratives and relating it to the diagnosis date. To address this challenge, a hybrid approach that combines deep learning-based and rule-based methods is proposed. The approach integrates three steps: (i) lung cancer named entity recognition, (ii) negation and speculation detection, and (iii) relating the cancer diagnosis to a valid date. In particular, we apply the proposed approach to extract the lung cancer diagnosis and its diagnosis date from clinical narratives written in Spanish. The results show an F-score of 90% in the named entity recognition task and an F-score of 89% in the task of relating the cancer diagnosis to the diagnosis date. Our findings suggest that speculation detection, together with negation detection, is a key component for properly extracting cancer diagnoses from clinical notes. This work is supported by the EU Horizon 2020 innovation program under grant agreement
No. 780495, project BigMedilytics (Big Data for Medical Analytics). It has also been supported
by Fundación AECC and Instituto de Salud Carlos III (grant AC19/00034), under the frame of
ERA-NET PerMe
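Steps (ii) and (iii) of the abstract above can be illustrated with a minimal rule-based sketch. It is not the authors' implementation: the cue lists, the five-token window, the single dd/mm/yyyy date pattern, and the function names are all assumptions made for illustration, and the mention is assumed to have been found already by an upstream named entity recognizer.

```python
import re
from datetime import date

# Hypothetical Spanish cue lists; the abstract does not give the actual lexicon.
NEGATION_CUES = ("no", "sin", "descarta")
SPECULATION_CUES = ("posible", "probable", "sospecha")
# Assumption: only dd/mm/yyyy dates are handled in this sketch.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def classify_mention(sentence: str, mention: str) -> str:
    """Label a mention as 'negated', 'speculated' or 'affirmed'."""
    idx = sentence.lower().find(mention.lower())
    if idx < 0:
        raise ValueError("mention not found in sentence")
    # Crude scope rule: inspect only the five tokens preceding the mention.
    window = re.findall(r"\w+", sentence[:idx].lower())[-5:]
    if any(tok in NEGATION_CUES for tok in window):
        return "negated"
    if any(tok in SPECULATION_CUES for tok in window):
        return "speculated"
    return "affirmed"


def nearest_date(sentence: str, mention: str):
    """Return the valid calendar date closest to the mention, or None."""
    idx = sentence.lower().find(mention.lower())
    best = None
    for m in DATE_RE.finditer(sentence):
        day, month, year = map(int, m.groups())
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # skip impossible dates such as 31/02/2019
        distance = abs(m.start() - idx)
        if best is None or distance < best[0]:
            best = (distance, candidate)
    return best[1] if best else None


def extract_diagnosis(sentence: str, mention: str):
    """Return a diagnosis date only for affirmed mentions."""
    if classify_mention(sentence, mention) != "affirmed":
        return None
    return nearest_date(sentence, mention)
```

In this sketch a negated or speculated mention yields no diagnosis date at all, which mirrors the abstract's point that both kinds of cue must be handled before a diagnosis is linked to a date.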
Text Mining of Patient Demographics and Diagnoses from Psychiatric Assessments
Automatic extraction of patient demographics and psychiatric diagnoses from clinical notes allows for the collection of patient data on a large scale. This data could be used for a variety of research purposes, including outcomes studies or developing clinical trials. However, current research has not yet discussed the automatic extraction of demographics and psychiatric diagnoses in detail. The aim of this study is to apply text mining to extract patient demographics (age, gender, marital status, and education level) and admission diagnoses from the psychiatric assessments at a mental health hospital, and to assign codes to each category. Gender is coded as either Male or Female, marital status is coded as Single, Married, Divorced, or Widowed, and education level is coded on a scale from Some High School through Graduate Degree (PhD/JD/MD, etc.). Classifications for diagnoses are based on the DSM-IV. For each category, a rule-based approach was developed utilizing keyword-based regular expressions as well as constituency trees and typed dependencies. We employ a two-step approach that first maximizes recall through the development of keyword-based patterns and then, if necessary, maximizes precision by using NLP-based rules to handle the problem of ambiguity. To develop and evaluate our method, we annotated a corpus of 200 assessments, using a portion of the corpus for developing the method and the rest as a test set. F-score was satisfactory for each category (Age: 0.997; Gender: 0.989; Primary Diagnosis: 0.983; Marital Status: 0.875; Education Level: 0.851), as was coding accuracy (Age: 1.0; Gender: 0.989; Primary Diagnosis: 0.922; Marital Status: 0.889; Education Level: 0.778). These results indicate that a rule-based approach could be considered for extracting these types of information in the psychiatric field.
At the same time, the results showed a drop in performance from the development set to the test set, which is partly due to the need for more generality in the rules developed
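The two-step idea in the abstract above (keyword patterns for recall, then stricter rules for precision) might look roughly like the following for marital status. This is only a sketch: the study uses constituency trees and typed dependencies for the second step, whereas a plain regular expression anchored on a patient-referring subject is swapped in here. The pattern contents and the function name are hypothetical.

```python
import re

# Codes taken from the abstract; the keyword-to-code mapping is an assumption.
CODES = {
    "single": "Single",
    "married": "Married",
    "divorced": "Divorced",
    "widowed": "Widowed",
}

# Step 1: broad keyword pattern, tuned for recall.
RECALL_RE = re.compile(r"\b(single|married|divorced|widowed)\b", re.IGNORECASE)

# Step 2: stricter pattern that requires a patient-referring subject.
# This stands in for the parse-based rules used in the study.
PRECISION_RE = re.compile(
    r"\b(?:patient|pt|he|she)\s+(?:is|was)\s+(?:a\s+|currently\s+)?"
    r"(single|married|divorced|widowed)\b",
    re.IGNORECASE,
)


def extract_marital_status(text: str):
    """Return a marital status code, or None if absent or still ambiguous."""
    candidates = {CODES[m.group(1).lower()] for m in RECALL_RE.finditer(text)}
    if len(candidates) <= 1:
        # Zero or one candidate: the recall step alone is unambiguous.
        return next(iter(candidates), None)
    # Several competing candidates: fall back to the precision rule.
    strict = {CODES[m.group(1).lower()] for m in PRECISION_RE.finditer(text)}
    return next(iter(strict)) if len(strict) == 1 else None
```

The precision step runs only when the keyword step returns conflicting candidates, which follows the "if necessary" wording of the abstract.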
Towards a New Science of a Clinical Data Intelligence
In this paper we define Clinical Data Intelligence as the analysis of data
generated in the clinical routine with the goal of improving patient care. We
define a science of a Clinical Data Intelligence as a data analysis that
permits the derivation of scientific, i.e., generalizable and reliable results.
We argue that a science of a Clinical Data Intelligence is sensible in the
context of a Big Data analysis, i.e., with data from many patients and with
complete patient information. We discuss that Clinical Data Intelligence
requires the joint efforts of knowledge engineering, information extraction
(from textual and other unstructured data), and statistics and statistical
machine learning. We describe some of our main results as conjectures and
relate them to a recently funded research project involving two major German
university hospitals.
Comment: NIPS 2013 Workshop: Machine Learning for Clinical Data Analysis and
Healthcare, 201
Assessing mortality prediction through different representation models based on concepts extracted from clinical notes
Recent years have seen particular interest in using electronic medical
records (EMRs) for secondary purposes to enhance the quality and safety of
healthcare delivery. EMRs tend to contain large amounts of valuable clinical
notes. Embedding learning is a method for converting notes into a format
that makes them comparable. Transformer-based representation models have
recently made a great leap forward. These models are pre-trained on large
online datasets to understand natural language texts effectively. The quality
of a learned embedding is influenced by how clinical notes are used as input
to representation models. A clinical note has several sections with different
levels of information value. It is also common for healthcare providers to use
different expressions for the same concept. Existing methods use clinical notes
directly or with an initial preprocessing as input to representation models.
In contrast, to learn a good embedding, we first identified the most essential
section of the clinical notes. We then mapped the concepts extracted from the selected sections to
the standard names in the Unified Medical Language System (UMLS). We used the
standard phrases corresponding to the unique concepts as input for clinical
models. We performed experiments to measure the usefulness of the learned
embedding vectors in the task of hospital mortality prediction on a subset of
the publicly available Medical Information Mart for Intensive Care (MIMIC-III)
dataset. According to the experiments, clinical transformer-based
representation models produced better results when given input generated from the
standard names of the extracted unique concepts than with other input formats.
The best-performing models were, in order, BioBERT, PubMedBERT, and
UmlsBERT
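The input-construction step described in the abstract above (map extracted concepts to standard names, keep unique concepts, pass the standard phrases to the model) can be sketched as follows. The small synonym table is a hypothetical stand-in for a real UMLS lookup, the standard names in it are illustrative, and the function name and the comma-joined output format are assumptions.

```python
# Hypothetical stand-in for a UMLS lookup: surface form -> standard name.
TOY_SYNONYMS = {
    "heart attack": "Myocardial Infarction",
    "mi": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "htn": "Hypertension",
}


def build_model_input(mentions):
    """Map mentions to standard names, drop duplicates, keep first-seen order."""
    unique_names = []
    for mention in mentions:
        name = TOY_SYNONYMS.get(mention.strip().lower())
        # Unmapped mentions are skipped in this sketch.
        if name is not None and name not in unique_names:
            unique_names.append(name)
    # Assumption: the model receives the names as one comma-separated string.
    return ", ".join(unique_names)
```

Different surface forms of the same concept collapse to a single standard phrase, which is the normalization effect the abstract credits for the improved embeddings.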