Contrastive Attention for Automatic Chest X-ray Report Generation
Recently, chest X-ray report generation, which aims to automatically generate
descriptions of given chest X-ray images, has received growing research
interest. The key challenge of chest X-ray report generation is to accurately
capture and describe the abnormal regions. In most cases, normal regions
dominate the entire chest X-ray image, and the corresponding descriptions of
these normal regions dominate the final report. Due to this data bias,
learning-based models may fail to attend to abnormal regions. In this work, to
effectively capture and describe abnormal regions, we propose the Contrastive
Attention (CA) model. Instead of solely focusing on the current input image,
the CA model compares the current input image with normal images to distill the
contrastive information. The acquired contrastive information can better
represent the visual features of abnormal regions. According to the experiments
on the public IU-X-ray and MIMIC-CXR datasets, incorporating our CA into
several existing models can boost their performance across most metrics. In
addition, according to the analysis, the CA model can help existing models
better attend to the abnormal regions and provide more accurate descriptions
which are crucial for an interpretable diagnosis. Specifically, we achieve the
state-of-the-art results on the two public datasets.
Comment: Appears in Findings of ACL 2021 (the Joint Conference of the 59th
Annual Meeting of the Association for Computational Linguistics and the 11th
International Joint Conference on Natural Language Processing, ACL-IJCNLP
2021).
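The core idea of the abstract above is to contrast the current image's features against a pool of normal images so that what remains highlights the abnormal regions. A minimal NumPy sketch of that idea follows; it is not the paper's actual CA module, and all names and the subtraction-based aggregation are illustrative assumptions:

```python
import numpy as np

def contrastive_attention(image_feats, normal_feats):
    """Contrast an input image's features against a pool of normal images.

    image_feats: (d,) feature vector of the current image.
    normal_feats: (n, d) feature vectors of normal reference images.
    Returns a (d,) contrastive vector emphasizing what differs from normal.
    """
    # Attention weights: similarity of the input to each normal image.
    scores = normal_feats @ image_feats                    # (n,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                               # softmax
    # Aggregate the closest "normal" appearance, then subtract it,
    # leaving a residual that emphasizes abnormal components.
    common = weights @ normal_feats                        # (d,)
    return image_feats - common
```

In this toy form, feature dimensions shared with the normal pool are suppressed while dimensions unique to the input image survive, which is the intuition behind distilling "contrastive information".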
Competence-based Multimodal Curriculum Learning for Medical Report Generation
Medical report generation, which aims to produce long and coherent
descriptions of medical images, has attracted growing research interest
recently. Unlike general image captioning, medical report
generation is more challenging for data-driven neural models. This is mainly
due to 1) serious data bias and 2) limited medical data. To alleviate
the data bias and make the best use of available data, we propose a
Competence-based Multimodal Curriculum Learning framework (CMCL). Specifically,
CMCL simulates the learning process of radiologists and optimizes the model in
a step-by-step manner. First, CMCL estimates the difficulty of each training
instance and evaluates the competence of the current model; second, CMCL selects
the most suitable batch of training instances given the current model's
competence. By iterating these two steps, CMCL can gradually improve the
model's performance. The experiments on the public IU-Xray and MIMIC-CXR
datasets show that CMCL can be incorporated into existing models to improve
their performance.
Comment: Accepted by ACL 2021 (Oral).
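The two-step loop described above (estimate competence, then sample a batch the model is ready for) can be sketched as follows. This is a generic competence-based curriculum sampler under a square-root competence schedule, not CMCL's actual multimodal difficulty metrics; all names and the schedule are assumptions:

```python
import random

def competence(step, total_steps, c0=0.1):
    """Competence grows from c0 toward 1 over training (square-root schedule)."""
    return min(1.0, (c0 ** 2 + (1 - c0 ** 2) * step / total_steps) ** 0.5)

def select_batch(instances, difficulties, step, total_steps, batch_size):
    """Sample a batch only from the easiest fraction the model is competent for."""
    c = competence(step, total_steps)
    # Rank instances from easiest to hardest by their estimated difficulty.
    order = sorted(range(len(instances)), key=lambda i: difficulties[i])
    cutoff = max(batch_size, int(c * len(instances)))
    pool = [instances[i] for i in order[:cutoff]]
    return random.sample(pool, batch_size)
```

Early in training only easy instances are eligible; as competence approaches 1, the pool expands to the full dataset, mimicking how a radiology trainee progresses from simple to complex cases.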
DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis
Fast screening and diagnosis are critical in COVID-19 patient treatment. In
addition to the gold standard RT-PCR, radiological imaging like X-ray and CT
also works as an important means in patient screening and follow-up. However,
due to the excessive number of patients, writing reports becomes a heavy burden
for radiologists. To reduce the workload of radiologists, we propose DeltaNet
to generate medical reports automatically. Different from typical image
captioning approaches that generate reports with an encoder and a decoder,
DeltaNet applies a conditional generation process. In particular, given a
medical image, DeltaNet employs three steps to generate a report: 1) first
retrieving related medical reports, i.e., historical reports from the same
or similar patients; 2) then comparing the retrieved images with the current
image to find the differences; 3) finally generating a new report that accounts
for the identified differences, conditioned on the retrieved report. We evaluate DeltaNet on
a COVID-19 dataset, where DeltaNet outperforms state-of-the-art approaches.
Besides COVID-19, the proposed DeltaNet can be applied to other diseases as
well. We validate its generalization capabilities on the public IU-Xray and
MIMIC-CXR datasets for chest-related diseases. Code is available at
https://github.com/LX-doctorAI1/DeltaNet
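Steps 1 and 2 of the retrieve-compare-generate process above can be sketched with cosine-similarity retrieval over an image feature bank. This is a simplified stand-in for DeltaNet's learned retrieval and comparison, not the repository's code; the feature subtraction and all names are illustrative assumptions:

```python
import numpy as np

def retrieve_and_diff(query_feat, bank_feats, bank_reports):
    """Retrieve the most similar historical case and compute the visual delta.

    query_feat: (d,) features of the current image.
    bank_feats: (n, d) features of historical images.
    bank_reports: list of n reports paired with bank_feats.
    Returns (retrieved report, delta) for conditioning the report decoder.
    """
    # Cosine similarity between the query and every historical image.
    sims = bank_feats @ query_feat / (
        np.linalg.norm(bank_feats, axis=1) * np.linalg.norm(query_feat) + 1e-8)
    best = int(np.argmax(sims))
    # The "delta": what changed relative to the retrieved case.
    delta = query_feat - bank_feats[best]
    return bank_reports[best], delta
```

A decoder conditioned on both the retrieved report and the delta only needs to describe what differs from the prior case, rather than composing the whole report from scratch.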
Token Imbalance Adaptation for Radiology Report Generation
Imbalanced token distributions naturally exist in text documents, leading
neural language models to overfit on frequent tokens. The token imbalance may
dampen the robustness of radiology report generators, as complex medical terms
appear less frequently but reflect more medical information. In this study, we
demonstrate how current state-of-the-art models fail to generate infrequent
tokens on two standard benchmark datasets (IU X-RAY and MIMIC-CXR) of radiology
report generation. However, no prior study has proposed methods to adapt
infrequent tokens for text generators fed with medical images. To address this
challenge, we propose the Token Imbalance adaptER (TIMER), aiming to improve
generation robustness on infrequent tokens.
The model automatically leverages token imbalance by an unlikelihood loss and
dynamically optimizes generation processes to augment infrequent tokens. We
compare our approach with multiple state-of-the-art methods on the two
benchmarks. Experiments demonstrate the effectiveness of our approach in
enhancing model robustness both overall and on infrequent tokens. Our ablation analysis
shows that our reinforcement learning method has a major effect in adapting
token imbalance for radiology report generation.
Comment: Accepted by CHIL202
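The unlikelihood loss mentioned above combines the usual cross-entropy on the gold token with a term that pushes probability mass away from negative candidates (here, over-represented frequent tokens). A minimal single-step sketch, assuming a dict of per-token probabilities rather than TIMER's actual training setup:

```python
import math

def unlikelihood_loss(probs, target, frequent_tokens, alpha=1.0):
    """Cross-entropy on the target plus an unlikelihood penalty on frequent tokens.

    probs: dict mapping token -> predicted probability (sums to 1).
    target: the gold token at this decoding step.
    frequent_tokens: tokens to discourage (negative candidates).
    alpha: weight of the unlikelihood term.
    """
    # Standard negative log-likelihood of the gold token.
    nll = -math.log(probs[target] + 1e-12)
    # Penalize probability assigned to frequent (negative) tokens:
    # -log(1 - p) grows as the model leans on an over-represented token.
    penalty = sum(-math.log(1.0 - probs[t] + 1e-12)
                  for t in frequent_tokens if t != target)
    return nll + alpha * penalty
```

Minimizing this loss rewards the model both for raising the probability of the (often infrequent) gold token and for lowering the probability it habitually assigns to frequent tokens.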