The goal of automatic report generation is to generate a clinically accurate
and coherent report from a single given X-ray image, which could alleviate the
workload of traditional radiology reporting. In real-world practice, however,
radiologists frequently have to produce extensive reports from numerous
medical images, so medical report generation from a multi-image perspective
is needed. In this paper, we propose the Complex Organ
Mask Guided (COMG) report generation model, which incorporates masks
from multiple organs (e.g., bones, lungs, heart, and mediastinum) to provide
more detailed information and guide the model's attention to these crucial body
regions. Specifically, we leverage prior knowledge of the disease corresponding
to each organ in the fusion process to enhance the disease identification phase
during the report generation process. Additionally, a cosine similarity loss is
introduced as a target function to ensure the convergence of cross-modal
consistency and to facilitate model optimization. Experimental results on two
public datasets show that COMG achieves an 11.4% and a 9.7% improvement in
BLEU@4 score over the state-of-the-art model KiUT on IU-Xray and MIMIC, respectively.
The code is publicly available at https://github.com/GaryGuTC/COMG_model.

Comment: 12 pages, 7 images. Accepted by WACV 202
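
The abstract mentions a cosine similarity loss for cross-modal consistency but does not give its exact form. As a minimal sketch, assuming the common formulation of one minus the cosine similarity between an image feature vector and a text feature vector, the idea could look like the following. The function name `cosine_similarity_loss`, its parameters, and the example vectors are hypothetical and are not taken from the COMG code.

```python
import math
from typing import Sequence


def cosine_similarity_loss(
    image_feat: Sequence[float],
    text_feat: Sequence[float],
    eps: float = 1e-8,
) -> float:
    """Return 1 - cos(image_feat, text_feat).

    Assumed formulation (not confirmed by the abstract): the loss is 0 when
    the two feature vectors point in the same direction and grows up to 2
    as they become opposed.
    """
    if len(image_feat) != len(text_feat):
        raise ValueError("feature vectors must have the same length")

    dot = sum(a * b for a, b in zip(image_feat, text_feat))
    norm_image = math.sqrt(sum(a * a for a in image_feat))
    norm_text = math.sqrt(sum(b * b for b in text_feat))

    # eps guards against division by zero for all-zero vectors.
    cosine = dot / max(norm_image * norm_text, eps)
    return 1.0 - cosine


if __name__ == "__main__":
    # Hypothetical toy features, for illustration only.
    aligned = cosine_similarity_loss([1.0, 0.0], [1.0, 0.0])
    orthogonal = cosine_similarity_loss([1.0, 0.0], [0.0, 1.0])
    print(aligned, orthogonal)
```

Minimizing such a term pulls paired image and text representations toward the same direction in feature space, which is one plausible reading of "cross-modal consistency"; the actual model may combine it with other objectives.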