1,548 research outputs found
Relation Classification for Bleeding Events From Electronic Health Records Using Deep Learning Systems: An Empirical Study
BACKGROUND: Accurate detection of bleeding events from electronic health records (EHRs) is crucial for identifying and characterizing different common and serious medical problems. To extract such information from EHRs, it is essential to identify the relations between bleeding events and related clinical entities (eg, bleeding anatomic sites and lab tests). With the advent of natural language processing (NLP) and deep learning (DL)-based techniques, many studies have focused on their applicability for various clinical applications. However, no prior work has utilized DL to extract relations between bleeding events and relevant entities.
OBJECTIVE: In this study, we aimed to evaluate multiple DL systems on a novel EHR data set for bleeding event-related relation classification.
METHODS: We first expert annotated a new data set of 1046 deidentified EHR notes for bleeding events and their attributes. On this data set, we evaluated three state-of-the-art DL architectures for the bleeding event relation classification task, namely, convolutional neural network (CNN), attention-guided graph convolutional network (AGGCN), and Bidirectional Encoder Representations from Transformers (BERT). We used three BERT-based models, namely, BERT pretrained on biomedical data (BioBERT), BioBERT pretrained on clinical text (Bio+Clinical BERT), and BioBERT pretrained on EHR notes (EhrBERT).
RESULTS: Our experiments showed that the BERT-based models significantly outperformed the CNN and AGGCN models. Specifically, BioBERT achieved a macro F1 score of 0.842, outperforming both the AGGCN (macro F1 score, 0.828) and CNN models (macro F1 score, 0.763) by 1.4% (P \u3c .001) and 7.9% (P \u3c .001), respectively.
CONCLUSIONS: In this comprehensive study, we explored and compared different DL systems to classify relations between bleeding events and other medical concepts. On our corpus, BERT-based models outperformed other DL models for identifying the relations of bleeding-related entities. In addition to pretrained contextualized word representation, BERT-based models benefited from the use of target entity representation over traditional sequence representation
MC-DRE: Multi-Aspect Cross Integration for Drug Event/Entity Extraction
Extracting meaningful drug-related information chunks, such as adverse drug
events (ADE), is crucial for preventing morbidity and saving many lives. Most
ADEs are reported via an unstructured conversation with the medical context, so
applying a general entity recognition approach is not sufficient enough. In
this paper, we propose a new multi-aspect cross-integration framework for drug
entity/event detection by capturing and aligning different
context/language/knowledge properties from drug-related documents. We first
construct multi-aspect encoders to describe semantic, syntactic, and medical
document contextual information by conducting those slot tagging tasks, main
drug entity/event detection, part-of-speech tagging, and general medical named
entity recognition. Then, each encoder conducts cross-integration with other
contextual information in three ways: the key-value cross, attention cross, and
feedforward cross, so the multi-encoders are integrated in depth. Our model
outperforms all SOTA on two widely used tasks, flat entity detection and
discontinuous event extraction.Comment: Accepted at CIKM 202
Model Tuning or Prompt Tuning? A Study of Large Language Models for Clinical Concept and Relation Extraction
Objective To develop soft prompt-based learning algorithms for large language
models (LLMs), examine the shape of prompts, prompt-tuning using
frozen/unfrozen LLMs, transfer learning, and few-shot learning abilities.
Methods We developed a soft prompt-based LLM model and compared 4 training
strategies including (1) fine-tuning without prompts; (2) hard-prompt with
unfrozen LLMs; (3) soft-prompt with unfrozen LLMs; and (4) soft-prompt with
frozen LLMs. We evaluated 7 pretrained LLMs using the 4 training strategies for
clinical concept and relation extraction on two benchmark datasets. We
evaluated the transfer learning ability of the prompt-based learning algorithms
in a cross-institution setting. We also assessed the few-shot learning ability.
Results and Conclusion When LLMs are unfrozen, GatorTron-3.9B with soft
prompting achieves the best strict F1-scores of 0.9118 and 0.8604 for concept
extraction, outperforming the traditional fine-tuning and hard prompt-based
models by 0.6~3.1% and 1.2~2.9%, respectively; GatorTron-345M with soft
prompting achieves the best F1-scores of 0.8332 and 0.7488 for end-to-end
relation extraction, outperforming the other two models by 0.2~2% and
0.6~11.7%, respectively. When LLMs are frozen, small (i.e., 345 million
parameters) LLMs have a big gap to be competitive with unfrozen models; scaling
LLMs up to billions of parameters makes frozen LLMs competitive with unfrozen
LLMs. For cross-institute evaluation, soft prompting with a frozen
GatorTron-8.9B model achieved the best performance. This study demonstrates
that (1) machines can learn soft prompts better than humans, (2) frozen LLMs
have better few-shot learning ability and transfer learning ability to
facilitate muti-institution applications, and (3) frozen LLMs require large
models
Adverse drug reaction extraction on electronic health records written in Spanish
148 p.This work focuses on the automatic extraction of Adverse Drug Reactions (ADRs) in Electronic HealthRecords (EHRs). That is, extracting a response to a medicine which is noxious and unintended and whichoccurs at doses normally used. From Natural Language Processing (NLP) perspective, this wasapproached as a relation extraction task in which the drug is the causative agent of a disease, sign orsymptom, that is, the adverse reaction.ADR extraction from EHRs involves major challenges. First, ADRs are rare events. That is, relationsbetween drugs and diseases found in an EHR are seldom ADRs (are often unrelated or, instead, related astreatment). This implies the inference from samples with skewed class distribution. Second, EHRs arewritten by experts often under time pressure, employing both rich medical jargon together with colloquialexpressions (not always grammatical) and it is not infrequent to find misspells and both standard andnon-standard abbreviations. All this leads to a high lexical variability.We explored several ADR detection algorithms and representations to characterize the ADR candidates.In addition, we have assessed the tolerance of the ADR detection model to external noise such as theincorrect detection of implied medical entities implied in the ADR extraction, i.e. drugs and diseases. Westtled the first steps on ADR extraction in Spanish using a corpus of real EHRs
Drug Discov Today
The US Food and Drug Administration (FDA) has been actively promoting the use of real-world data (RWD) in drug development. RWD can generate important real-world evidence reflecting the real-world clinical environment where the treatments are used. Meanwhile, artificial intelligence (AI), especially machine- and deep-learning (ML/DL) methods, have been increasingly used across many stages of the drug development process. Advancements in AI have also provided new strategies to analyze large, multidimensional RWD. Thus, we conducted a rapid review of articles from the past 20 years, to provide an overview of the drug development studies that use both AI and RWD. We found that the most popular applications were adverse event detection, trial recruitment, and drug repurposing. Here, we also discuss current research gaps and future opportunities.U18 DP006512/DP/NCCDPHP CDC HHSUnited States/R21 ES032762/ES/NIEHS NIH HHSUnited States/R21 CA245858/CA/NCI NIH HHSUnited States/UL1 TR001427/TR/NCATS NIH HHSUnited States/R01 CA246418/CA/NCI NIH HHSUnited States/U18DP006512/ACL/ACL HHSUnited States/R21 AG068717/AG/NIA NIH HHSUnited States/2021-11-27T00:00:00Z33358699PMC862686410635vault:4053
Named Entity Recognition in Electronic Health Records: A Methodological Review
Objectives A substantial portion of the data contained in Electronic Health Records (EHR) is unstructured, often appearing as free text. This format restricts its potential utility in clinical decision-making. Named entity recognition (NER) methods address the challenge of extracting pertinent information from unstructured text. The aim of this study was to outline the current NER methods and trace their evolution from 2011 to 2022. Methods We conducted a methodological literature review of NER methods, with a focus on distinguishing the classification models, the types of tagging systems, and the languages employed in various corpora. Results Several methods have been documented for automatically extracting relevant information from EHRs using natural language processing techniques such as NER and relation extraction (RE). These methods can automatically extract concepts, events, attributes, and other data, as well as the relationships between them. Most NER studies conducted thus far have utilized corpora in English or Chinese. Additionally, the bidirectional encoder representation from transformers using the BIO tagging system architecture is the most frequently reported classification scheme. We discovered a limited number of papers on the implementation of NER or RE tasks in EHRs within a specific clinical domain. Conclusions EHRs play a pivotal role in gathering clinical information and could serve as the primary source for automated clinical decision support systems. However, the creation of new corpora from EHRs in specific clinical domains is essential to facilitate the swift development of NER and RE models applied to EHRs for use in clinical practice
- …