Improving first order temporal fact extraction with unreliable data
In this paper, we deal with the task of extracting first order temporal facts from free text. This task is a subtask of relation extraction and aims at extracting relations between an entity and a time. Currently, the field of relation extraction mainly focuses on extracting relations between entities. However, we observe that the multi-granular nature of time expressions can help us divide the dataset constructed by distant supervision into reliable and less reliable subsets, which can help to improve the extraction results on relations between entity and time. We accordingly contribute the first dataset focusing on the first order temporal fact extraction task using distant supervision. To fully utilize both the reliable and the less reliable data, we propose to use curriculum learning to rearrange the training procedure, label dropout to make the model more conservative about less reliable data, and instance attention to help the model distinguish important instances from unimportant ones. Experiments show that these methods help the model outperform both the model trained purely on the reliable dataset and the model trained on the dataset where all subsets are mixed together.
Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays
The extraction of structured clinical information from free-text radiology
reports in the form of radiology graphs has been demonstrated to be a valuable
approach for evaluating the clinical correctness of report-generation methods.
However, the direct generation of radiology graphs from chest X-ray (CXR)
images has not been attempted. To address this gap, we propose a novel approach
called Prior-RadGraphFormer that utilizes a transformer model with prior
knowledge in the form of a probabilistic knowledge graph (PKG) to generate
radiology graphs directly from CXR images. The PKG models the statistical
relationship between radiology entities, including anatomical structures and
medical observations. This additional contextual information enhances the
accuracy of entity and relation extraction. The generated radiology graphs can
be applied to various downstream tasks, such as free-text or structured report
generation and multi-label classification of pathologies. Our approach
represents a promising method for generating radiology graphs directly from CXR
images, and has significant potential for improving medical image analysis and
clinical decision-making. Comment: In GRAIL @ MICCAI 202
Generalizing through Forgetting -- Domain Generalization for Symptom Event Extraction in Clinical Notes
Symptom information is primarily documented in free-text clinical notes and
is not directly accessible for downstream applications. To address this
challenge, information extraction approaches that can handle clinical language
variation across different institutions and specialties are needed. In this
paper, we present domain generalization for symptom extraction using
pretraining and fine-tuning data that differs from the target domain in terms
of institution and/or specialty and patient population. We extract symptom
events using a transformer-based joint entity and relation extraction method.
To reduce reliance on domain-specific features, we propose a domain
generalization method that dynamically masks frequent symptom words in the
source domain. Additionally, we pretrain the transformer language model (LM) on
task-related unlabeled texts for better representation. Our experiments
indicate that masking and adaptive pretraining methods can significantly
improve performance when the source domain is more distant from the target
domain.
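The masking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `top_k` cutoff, the `mask_prob` parameter, and the `[MASK]` token are all assumptions made for illustration. The sketch counts words in source-domain symptom mentions, then randomly masks the most frequent ones each time a sentence is sampled, so the masking pattern changes from one pass to the next.

```python
import random
from collections import Counter


def frequent_symptom_words(symptom_mentions, top_k):
    """Return the top_k most frequent words across source-domain symptom mentions."""
    counts = Counter(
        word.lower() for mention in symptom_mentions for word in mention.split()
    )
    return {word for word, _ in counts.most_common(top_k)}


def mask_frequent(tokens, frequent, mask_prob, rng, mask_token="[MASK]"):
    """Replace each frequent word with mask_token with probability mask_prob.

    Calling this again on the same sentence draws a new random pattern,
    which is the (assumed) sense of "dynamic" masking here.
    """
    return [
        mask_token if tok.lower() in frequent and rng.random() < mask_prob else tok
        for tok in tokens
    ]


# Hypothetical source-domain symptom mentions.
mentions = ["cough", "dry cough", "chest pain", "cough"]
frequent = frequent_symptom_words(mentions, top_k=1)

rng = random.Random(0)
sentence = ["Patient", "reports", "cough", "and", "fever"]
masked = mask_frequent(sentence, frequent, mask_prob=1.0, rng=rng)
print(masked)
```

With `mask_prob` below 1.0 the model still sees the frequent words some of the time, which is one plausible way to reduce reliance on them without removing them entirely.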
Extending TextAE for annotation of non-contiguous entities
Named entity recognition tools are used to identify mentions of biomedical entities in free text and are essential components of high-quality information retrieval and extraction systems. Without good entity recognition, methods will mislabel searched text and will miss important information or identify spurious text that will frustrate users. Most tools do not capture non-contiguous entities, which are separate spans of text that together refer to an entity, e.g., the entity “type 1 diabetes” in the phrase “type 1 and type 2 diabetes.” This type is commonly found in biomedical texts, especially in lists, where multiple biomedical entities are named in shortened form to avoid repeating words. Most text annotation systems that enable users to view and edit entity annotations do not support non-contiguous entities. Therefore, experts cannot even visualize non-contiguous entities, let alone annotate them to build valuable datasets for machine learning methods. To combat this problem and as part of the BLAH6 hackathon, we extended the TextAE platform to allow visualization and annotation of non-contiguous entities. This enables users to add new subspans to existing entities by selecting additional text. We integrate this new functionality with TextAE’s existing editing functionality to allow easy changes to entity annotation and editing of relation annotations involving non-contiguous entities, with importing and exporting to the PubAnnotation format. Finally, we roughly quantify the problem across the entire accessible biomedical literature to highlight that a substantial number of non-contiguous entities appear in lists and would be missed by most text mining systems.
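The abstract's own example can be made concrete with a small sketch of how a non-contiguous entity might be stored as a list of character spans. This representation is an assumption for illustration only; it is not TextAE's internal model or the PubAnnotation format, and the helper names are hypothetical.

```python
def surface_form(text, spans):
    """Join the text covered by each (begin, end) span, in document order."""
    return " ".join(text[begin:end] for begin, end in sorted(spans))


def is_non_contiguous(spans):
    """An entity is non-contiguous when it is made of more than one span."""
    return len(spans) > 1


text = "type 1 and type 2 diabetes"

# "type 1" covers characters [0, 6) and "diabetes" covers [18, 26).
type_1_diabetes = [(0, 6), (18, 26)]
# "type 2 diabetes" is a single contiguous span [11, 26).
type_2_diabetes = [(11, 26)]

print(surface_form(text, type_1_diabetes))  # type 1 diabetes
print(surface_form(text, type_2_diabetes))  # type 2 diabetes
```

A tool that only supports one span per entity can annotate "type 2 diabetes" here but has no way to mark "type 1 diabetes" without also covering the unrelated words in between.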