High-performance Word Sense Disambiguation with Less Manual Effort
Supervised learning is a widely used paradigm in Natural Language Processing. This paradigm involves learning a classifier from annotated examples and applying it to unseen data. We cast word sense disambiguation, our task of interest, as a supervised learning problem. We then formulate the end goal of this dissertation: to develop a series of methods aimed at achieving the highest possible word sense disambiguation performance with the least reliance on manual effort.
We begin by implementing a word sense disambiguation system, which utilizes rich linguistic features to better represent the contexts of ambiguous words. Our state-of-the-art system captures three types of linguistic features: lexical, syntactic, and semantic. Traditionally, semantic features are extracted with the help of expensive hand-crafted lexical resources. We propose a novel unsupervised approach to extracting a similar type of semantic information from unlabeled corpora. We show that incorporating this information into a classification framework leads to performance improvements. The result is a system that outperforms traditional methods while eliminating the reliance on manual effort for extracting semantic data.
We then attack the problem of reducing manual effort from a different direction. Supervised word sense disambiguation relies on annotated data for learning sense classifiers. However, annotation is expensive because it requires a large time investment from expert labelers. We examine various annotation practices and propose several approaches for making them more efficient. We evaluate the proposed approaches and compare them to existing ones. We show that the annotation effort can often be reduced significantly without sacrificing the performance of the models trained on the annotated data.
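The supervised framing described above (learn a sense classifier from annotated examples, then apply it to unseen occurrences) can be illustrated with a minimal sketch. It assumes a bag-of-words context window as the only feature type and swaps in multinomial Naive Bayes as the classifier; the dissertation's own system uses richer lexical, syntactic, and semantic features, and the toy "bank" examples and sense labels below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, target_index, window=3):
    """Lowercased words within `window` tokens of the target word (target excluded)."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    return [t.lower() for i, t in enumerate(tokens[lo:hi], start=lo) if i != target_index]


class NaiveBayesWSD:
    """Multinomial Naive Bayes over context words, with add-one smoothing (illustrative only)."""

    def fit(self, examples):
        # examples: list of (tokens, target_index, sense_label) triples
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, idx, sense in examples:
            feats = context_features(tokens, idx)
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, tokens, target_index):
        feats = context_features(tokens, target_index)
        total = sum(self.sense_counts.values())
        v = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            n_words = sum(self.word_counts[sense].values())
            # log prior + sum of smoothed log likelihoods of the context words
            score = math.log(count / total)
            for w in feats:
                score += math.log((self.word_counts[sense][w] + 1) / (n_words + v))
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical sense-annotated training data for the ambiguous word "bank".
TRAIN_EXAMPLES = [
    ("he deposited cash at the bank this morning".split(), 5, "FINANCE"),
    ("the bank approved the loan quickly".split(), 1, "FINANCE"),
    ("they fished from the river bank at dawn".split(), 5, "RIVER"),
    ("grass grew along the muddy bank of the stream".split(), 5, "RIVER"),
]

if __name__ == "__main__":
    model = NaiveBayesWSD().fit(TRAIN_EXAMPLES)
    print(model.predict("they walked along the river bank".split(), 5))
```

Every label here comes from manual annotation, which is exactly the cost the dissertation sets out to reduce.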
Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes
The BioNLP Workshop 2023 launched a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system generating relevant and accurate diagnoses will augment healthcare providers' decision-making process and improve the quality of care for patients. The goal for participants is to develop models that generate a list of diagnoses and problems using input from the daily care notes collected during the hospitalization of critically ill patients. Eight teams submitted their final systems to the shared task leaderboard. In this paper, we describe the tasks, datasets, evaluation metrics, and baseline systems. We also summarize the techniques used by the participating teams and the evaluation results of their approaches.
Comment: To appear in the Proceedings of the 5th BioNLP Workshop at AC
Discovering body site and severity modifiers in clinical texts
Objective: To research computational methods for discovering body site and severity modifiers in clinical texts. Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. Results: Our method for discovering body site modifiers achieves F1 of 0.740–0.908, and our method for discovering severity modifiers achieves F1 of 0.905–0.929. Discussion: Results indicate that both methods perform well on both in-domain and out-of-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and the inability of the system to discern deeper semantic structures. Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES).
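The F1 figures above combine precision and recall over extracted relations. A minimal sketch of that arithmetic, assuming each relation is an exact-match (entity, modifier, relation type) tuple; the paper's actual matching criteria are not given in this summary, and the tuples and type names below are hypothetical.

```python
def relation_prf(gold, predicted):
    """Precision, recall, and F1 over sets of (entity, modifier, relation_type) tuples."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # relations found in both sets (exact match)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and system output.
GOLD = {
    ("pain", "left knee", "body_site"),
    ("pain", "severe", "severity"),
    ("rash", "arm", "body_site"),
}
PREDICTED = {
    ("pain", "left knee", "body_site"),
    ("pain", "severe", "severity"),
    ("rash", "face", "body_site"),
}

if __name__ == "__main__":
    print(relation_prf(GOLD, PREDICTED))
```

With two of three relations correct on each side, precision, recall, and F1 all equal 2/3.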
Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning
Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework comprising six tasks that represent key components of clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models, as well as multi-task versus single-task training, with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically trained language model outperforms its general-domain counterpart by a large margin, establishing a new state-of-the-art ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing performance on clinical diagnostic reasoning tasks.
Comment: Accepted to the Proceedings of the 5th Clinical NLP Workshop at AC
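ROUGE-L scores a generated summary by the longest common subsequence (LCS) of tokens it shares with a reference. A minimal sketch of sentence-level ROUGE-L F1, assuming lowercased whitespace tokenization; standard toolkits add their own tokenization, optional stemming, and corpus-level aggregation, so this will not reproduce the reported score exactly. Reported values such as 28.55 are presumably on a 0 to 100 scale, while this function returns a value in [0, 1]. The example problem lists are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # extend a match diagonally, otherwise carry the best length from above or left
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure on lowercased whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    print(rouge_l("sepsis with acute kidney injury", "acute kidney injury sepsis"))
```

Because LCS respects word order but not contiguity, "acute kidney injury" counts as a three-token match even though the candidate lists the problems in a different order than the reference.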
Progress Note Understanding -- Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 Shared Task
Daily progress notes are a common note type in the electronic health record (EHR), where healthcare providers document the patient's daily progress and treatment plans. The EHR is designed to document all the care provided to patients, but it also enables note bloat, with extraneous information that distracts from the diagnoses and treatment plans. Applications of natural language processing (NLP) to the EHR are a growing field, with the majority of methods focused on information extraction. Few tasks use NLP methods for downstream diagnostic decision support. We introduced the 2022 National NLP Clinical Challenge (n2c2) Track 3: Progress Note Understanding - Assessment and Plan Reasoning as one step towards a new suite of tasks. The Assessment and Plan Reasoning task focuses on the most critical components of progress notes, the Assessment and Plan subsections, where health problems and diagnoses are contained. The goal of the task was to develop and evaluate NLP systems that automatically predict causal relations between the overall status of the patient, contained in the Assessment section, and each component of the Plan section, which contains the diagnoses and treatment plans. The broader aim was to identify and prioritize diagnoses as a first step in diagnostic decision support, finding the most relevant information in long documents such as daily progress notes. We present the results of 2022 n2c2 Track 3 and describe the data, evaluation, participation, and system performance.
Comment: To appear in Journal of Biomedical Informatic
DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing
The meaningful use of electronic health records (EHR) continues to progress in the digital era, with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce cognitive burden so that fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgment that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans, with forward reasoning from data to diagnosis, and potentially to reduce cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks, the Diagnostic Reasoning Benchmark (DR.BENCH), as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed as a natural language generation framework for evaluating pre-trained language models. Experiments with state-of-the-art pre-trained generative language models, including large general-domain models and models continually trained on a medical corpus, demonstrate opportunities for improvement when evaluated on DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to loading and evaluating models for the cNLP community.
Comment: Under revie
A common type system for clinical natural language processing
Background: One challenge in reusing clinical data stored in electronic medical records is that these data are heterogeneous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. Results: We describe a common type system for clinical NLP whose end target is deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System), versions 2.0 and later. Conclusions: We have created a type system that targets deep semantics, thereby allowing NLP systems to encapsulate knowledge from text and share it alongside heterogeneous clinical data sources. Rather than the surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.
Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records
Objective: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record. Materials and Methods: The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values. Results: Experiments over a range of machine learning algorithms and features were conducted. The best-performing combination was linear-kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features, feature selection, and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) was 0.831 (σ = 0.0317), a statistically significant improvement over two baselines (AUC = 0.758, σ = 0.0291). The algorithms performed better on cases clinically defined as extreme categories of disease activity (Remission and High) than on those defined as intermediate categories (Moderate and Low), and included laboratory data on inflammatory markers. Conclusion: Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.
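An AUC such as 0.831 can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. A minimal sketch computing it directly from that pairwise (Mann-Whitney) definition, with ties counted as one half. It assumes a binary framing with hypothetical scores; the study distinguishes four activity categories, and how its AUC was aggregated across them is not stated in this summary.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = high activity) and classifier scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The double loop is quadratic in the number of cases, which is acceptable for a sketch; rank-based formulations compute the same quantity in O(n log n).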