Knowledge Matters: Radiology Report Generation with General and Specific Knowledge
Automatic radiology report generation is critical in clinical practice, as it can relieve experienced radiologists of their heavy workload and alert inexperienced radiologists to misdiagnoses or missed diagnoses. Existing
approaches mainly formulate radiology report generation as an image captioning
task and adopt the encoder-decoder framework. However, in the medical domain,
such pure data-driven approaches suffer from the following problems: 1) visual
and textual bias problem; 2) lack of expert knowledge. In this paper, we
propose a knowledge-enhanced radiology report generation approach that introduces two types of medical knowledge: 1) general knowledge, which is input-independent and provides broad knowledge for report generation; 2) specific knowledge, which is input-dependent and provides fine-grained knowledge for
report generation. To fully utilize both the general and specific knowledge, we
also propose a knowledge-enhanced multi-head attention mechanism. By merging
the visual features of the radiology image with general knowledge and specific
knowledge, the proposed model can improve the quality of generated reports.
Experimental results on two publicly available datasets, IU-Xray and MIMIC-CXR, show that the proposed knowledge-enhanced approach outperforms state-of-the-art image-captioning-based methods. Ablation studies also demonstrate that both
general and specific knowledge can help to improve the performance of radiology
report generation.
Comment: Medical Image Analysis
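As a rough sketch of how such a mechanism could be realized (the paper's exact formulation is not reproduced here; module names and dimensions are illustrative), visual features can attend over themselves concatenated with general and specific knowledge embeddings:

```python
# Hedged sketch of knowledge-enhanced multi-head attention: visual features
# attend over [visual; general knowledge; specific knowledge], so both
# knowledge types can shape the fused representation used for decoding.
import torch
import torch.nn as nn

class KnowledgeEnhancedAttention(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, visual, general_kg, specific_kg):
        # visual:      (B, Nv, D) image-region features
        # general_kg:  (B, Ng, D) input-independent knowledge embeddings
        # specific_kg: (B, Ns, D) input-dependent knowledge embeddings
        memory = torch.cat([visual, general_kg, specific_kg], dim=1)
        fused, _ = self.attn(query=visual, key=memory, value=memory)
        return fused  # knowledge-fused visual features, (B, Nv, D)

layer = KnowledgeEnhancedAttention()
v, g, s = torch.randn(2, 49, 512), torch.randn(2, 20, 512), torch.randn(2, 10, 512)
print(layer(v, g, s).shape)  # torch.Size([2, 49, 512])
```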
Customizing General-Purpose Foundation Models for Medical Report Generation
Medical caption prediction, which can be regarded as a medical report generation (MRG) task, requires the automatic generation of coherent and accurate captions for given medical images. However, the scarcity of labelled
medical image-report pairs presents great challenges in the development of deep
and large-scale neural networks capable of harnessing the potential power of artificial general intelligence offered by large language models (LLMs). In this work, we
propose customizing off-the-shelf general-purpose large-scale pre-trained
models, i.e., foundation models (FMs), in computer vision and natural language
processing with a specific focus on medical report generation. Specifically,
following BLIP-2, a state-of-the-art vision-language pre-training approach, we
introduce our encoder-decoder-based MRG model. This model utilizes a
lightweight query Transformer to connect two FMs: the giant vision Transformer
EVA-ViT-g and a bilingual LLM trained to align with human intentions (referred
to as ChatGLM-6B). Furthermore, we conduct ablative experiments on the
trainable components of the model to identify the crucial factors for effective
transfer learning. Our findings demonstrate that unfreezing EVA-ViT-g to learn
medical image representations, followed by parameter-efficient training of
ChatGLM-6B to capture the writing styles of medical reports, is essential for
achieving optimal results. Our best submission (PCLmed Team) placed 4th and 2nd out of 13 participating teams on the BERTScore and ROUGE-1 metrics, respectively, in the ImageCLEFmedical Caption 2023 Caption Prediction Task.
Comment: 14 pages, 3 figures
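A minimal sketch of the bridging idea, in the spirit of BLIP-2's query Transformer (dimensions and module structure here are assumptions, not the actual EVA-ViT-g/ChatGLM-6B internals): learned query tokens cross-attend to vision features and are projected into the LLM's embedding space as a soft visual prefix.

```python
# Hedged sketch: a lightweight query module connecting a vision encoder to an
# LLM. Per the abstract's finding, the vision encoder would be unfrozen and
# the LLM tuned parameter-efficiently; only the bridge itself is shown here.
import torch
import torch.nn as nn

class QueryBridge(nn.Module):
    def __init__(self, vis_dim=1408, q_dim=768, llm_dim=4096, n_queries=32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_queries, q_dim) * 0.02)
        self.vis_proj = nn.Linear(vis_dim, q_dim)
        self.cross_attn = nn.MultiheadAttention(q_dim, 8, batch_first=True)
        self.to_llm = nn.Linear(q_dim, llm_dim)

    def forward(self, vis_feats):             # (B, N, vis_dim) ViT features
        kv = self.vis_proj(vis_feats)
        q = self.queries.expand(vis_feats.size(0), -1, -1)
        out, _ = self.cross_attn(q, kv, kv)   # queries summarize the image
        return self.to_llm(out)               # (B, n_queries, llm_dim) prefix
                                              # prepended to text embeddings

bridge = QueryBridge()
print(bridge(torch.randn(2, 257, 1408)).shape)  # torch.Size([2, 32, 4096])
```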
Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports
The way we analyse clinical texts has undergone major changes in recent years. The introduction of language models such as BERT led to adaptations for
the (bio)medical domain like PubMedBERT and ClinicalBERT. These models rely on
large databases of archived medical documents. While these models perform well in terms of accuracy, their lack of interpretability and limited transferability across languages restrict their use in clinical settings. We introduce a novel lightweight graph-based embedding method specifically tailored to radiology
reports. It takes into account the structure and composition of the report,
while also connecting medical terms in the report through the multi-lingual
SNOMED Clinical Terms knowledge base. The resulting graph embedding uncovers
the underlying relationships among clinical terms, achieving a representation that is more understandable to clinicians and clinically more accurate, without reliance on large pre-training datasets. We demonstrate this embedding on two tasks, namely disease classification of X-ray reports and image classification. For disease classification, our model is competitive with its BERT-based counterparts while being orders of magnitude smaller in size and training-data requirements. For image classification, we show the effectiveness of the
graph embedding by leveraging cross-modal knowledge transfer and show that the method is usable across different languages.
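The toy sketch below illustrates the general flavor of such a construction (the term list, concept IDs, and edge types are invented; a real system would resolve terms against the actual multi-lingual SNOMED CT terminology): terms become nodes, while sentence co-occurrence and shared concepts become edges.

```python
# Illustrative sketch only, not the paper's method: a per-report term graph
# with co-occurrence edges plus edges between terms mapped to the same
# (hypothetical) SNOMED CT concept ID.
import networkx as nx

SNOMED = {  # hypothetical term -> concept-ID lookup
    "effusion": "C1", "opacity": "C2", "opacification": "C2",
    "consolidation": "C3",
}

def report_graph(sentences):
    g = nx.Graph()
    for sent in sentences:
        terms = [t for t in sent.lower().split() if t in SNOMED]
        g.add_nodes_from(terms)
        g.add_edges_from((a, b, {"relation": "co_occurs"})
                         for i, a in enumerate(terms) for b in terms[i + 1:])
    for a in list(g):                 # connect terms sharing a concept
        for b in list(g):
            if a != b and SNOMED[a] == SNOMED[b]:
                g.add_edge(a, b, relation="same_concept")
    return g

g = report_graph(["Small effusion with basal opacity .",
                  "Retrocardiac opacification noted ."])
print(g.edges(data=True))
```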
Rethinking Medical Report Generation: Disease Revealing Enhancement with Knowledge Graph
Knowledge Graph (KG) plays a crucial role in Medical Report Generation (MRG)
because it reveals the relations among diseases and thus can be utilized to
guide the generation process. However, constructing a comprehensive KG is
labor-intensive, and its application to the MRG process remains under-explored. In
this study, we establish a complete KG on chest X-ray imaging that includes 137
types of diseases and abnormalities. Based on this KG, we find that the current
MRG data sets exhibit a long-tailed problem in disease distribution. To
mitigate this problem, we introduce a novel augmentation strategy that enhances
the representation of disease types in the tail-end of the distribution. We
further design a two-stage MRG approach, where a classifier is first trained to
detect whether the input images exhibit any abnormalities. The classified images are then fed independently into two transformer-based generators, namely a "disease-specific generator" and a "disease-free generator", to generate
the corresponding reports. To enhance the clinical evaluation of whether the
generated reports correctly describe the diseases appearing in the input image,
we propose diverse sensitivity (DS), a new metric that checks whether generated
diseases match ground truth and measures the diversity of all generated
diseases. Results show that the proposed two-stage generation framework and
augmentation strategies improve DS by a considerable margin, indicating a
notable reduction in the long-tailed problem associated with under-represented
diseases.
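The paper's exact definition of DS is not reproduced here, but a metric in that spirit can be sketched as follows, combining how many ground-truth diseases are correctly generated with how many distinct disease types those matches cover:

```python
# Hedged sketch of a diverse-sensitivity-style metric over extracted disease
# labels; `generated` and `reference` are lists of per-report label sets.
def diverse_sensitivity(generated, reference):
    matched, total, found = 0, 0, set()
    for gen, ref in zip(generated, reference):
        matched += len(gen & ref)   # correctly generated diseases
        total += len(ref)           # diseases present in the ground truth
        found |= gen & ref          # distinct disease types covered
    sensitivity = matched / max(total, 1)
    diversity = len(found) / max(len(set().union(*reference)), 1)
    return sensitivity * diversity

print(diverse_sensitivity([{"effusion"}, {"edema", "effusion"}],
                          [{"effusion"}, {"edema", "cardiomegaly"}]))
# 0.444...: sensitivity 2/3 times diversity 2/3
```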
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization
In this paper, we introduce CheXOFA, a new pre-trained vision-language model
(VLM) for the chest X-ray domain. Our model is initially pre-trained on various
multimodal datasets within the general domain before being transferred to the
chest X-ray domain. Following a prominent VLM, we unify various domain-specific
tasks into a simple sequence-to-sequence schema. It enables the model to
effectively learn the required knowledge and skills from limited resources in
the domain. Demonstrating superior performance on the benchmark datasets
provided by the BioNLP shared task, our model benefits from its training across
multiple tasks and domains. With additional techniques, including ensembling and factual calibration, our system achieves first place on the RadSum23 leaderboard for the hidden test set.
Comment: Published at the BioNLP workshop @ ACL 2023
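As a rough illustration of the unified schema (the instruction strings below are invented, not CheXOFA's actual prompts), every domain task reduces to an instruction-plus-input source sequence mapped to a target sequence:

```python
# Hedged sketch: casting heterogeneous chest X-ray tasks into one
# sequence-to-sequence format so a single model can learn them jointly.
def to_seq2seq(task, text, target):
    prompts = {  # illustrative task instructions
        "summarize": "summarize the findings: ",
        "generate":  "describe the image: ",
        "classify":  "what abnormalities are present? ",
    }
    return {"source": prompts[task] + text, "target": target}

ex = to_seq2seq("summarize",
                "Heart size is enlarged. Small left pleural effusion.",
                "Cardiomegaly with small left effusion.")
print(ex["source"], "->", ex["target"])
```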
Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting
Automatically generated reports from medical images promise to improve the
workflow of radiologists. Existing methods consider an image-to-report modeling
task by directly generating a fully-fledged report from an image. However, this
conflates the content of the report (e.g., findings and their attributes) with
its style (e.g., format and choice of words), which can lead to clinically
inaccurate reports. To address this, we propose a two-step approach for
radiology report generation. First, we extract the content from an image; then,
we verbalize the extracted content into a report that matches the style of a
specific radiologist. For this, we leverage RadGraph -- a graph representation
of reports -- together with large language models (LLMs). In our quantitative
evaluations, we find that our approach yields favorable performance. Our human evaluation with clinical raters highlights that the AI-generated reports are indistinguishably tailored to the styles of individual radiologists despite leveraging only a few examples as context.
Comment: Accepted to Findings of EMNLP 2023
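A minimal sketch of this two-step idea, with toy serialized content standing in for RadGraph output and a placeholder in place of any concrete LLM API:

```python
# Hedged sketch: a few-shot prompt that verbalizes extracted content in a
# target radiologist's style; the content format and `call_llm` are
# placeholders, not the paper's actual serialization or API.
def build_prompt(content, style_examples):
    shots = "\n\n".join(f"Content: {c}\nReport: {r}"
                        for c, r in style_examples)
    return f"{shots}\n\nContent: {content}\nReport:"

content = "effusion|left|small; cardiomegaly|present"
examples = [("effusion|right|moderate",
             "There is a moderate right pleural effusion."),
            ("cardiomegaly|present",
             "The cardiac silhouette is enlarged.")]
prompt = build_prompt(content, examples)
# report = call_llm(prompt)  # placeholder for an LLM completion call
print(prompt)
```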
Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering
To contribute to the automation of medical vision-language modeling, we propose a
novel Chest-Xray Difference Visual Question Answering (VQA) task. Given a pair
of main and reference images, this task attempts to answer several questions on
both diseases and, more importantly, the differences between them. This is
consistent with radiologists' diagnostic practice of comparing the current image with a reference before concluding the report. We collect a new
dataset, namely MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs
of main and reference images. Compared to existing medical VQA datasets, our
questions are tailored to the Assessment-Diagnosis-Intervention-Evaluation
treatment procedure used by clinical professionals. Meanwhile, we also propose
a novel expert knowledge-aware graph representation learning model to address
this task. The proposed baseline model leverages expert knowledge, such as anatomical structure priors and semantic and spatial knowledge, to construct a multi-relationship graph representing the differences between the two images for the image-difference VQA task. The dataset and code can be found at
https://github.com/Holipori/MIMIC-Diff-VQA. We believe this work will further push forward medical vision-language modeling.
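As a toy illustration of a multi-relationship graph over an image pair (region names, findings, and edge types are invented, not the paper's schema), anatomical nodes can carry spatial and semantic edges, and the main and reference graphs can then be compared:

```python
# Hedged sketch: per-image graphs with "adjacent_to" (spatial) and
# "same_finding" (semantic) relations over anatomical regions; difference
# questions target regions whose findings differ between the two graphs.
import networkx as nx

def image_graph(region_findings, spatial_pairs):
    g = nx.MultiDiGraph()
    for region, finding in region_findings.items():
        g.add_node(region, finding=finding)          # anatomical prior
    for a, b in spatial_pairs:
        g.add_edge(a, b, relation="adjacent_to")     # spatial knowledge
    for a in list(g):
        for b in list(g):
            if a != b and g.nodes[a]["finding"] == g.nodes[b]["finding"]:
                g.add_edge(a, b, relation="same_finding")  # semantic
    return g

main = image_graph({"left_lung": "effusion", "right_lung": "clear",
                    "heart": "enlarged"}, [("left_lung", "heart")])
ref = image_graph({"left_lung": "clear", "right_lung": "clear",
                   "heart": "enlarged"}, [("left_lung", "heart")])
print([n for n in main
       if main.nodes[n]["finding"] != ref.nodes[n]["finding"]])
# ['left_lung'] -> the region a difference question should focus on
```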