Interactive and Explainable Region-guided Radiology Report Generation
The automatic generation of radiology reports has the potential to assist
radiologists in the time-consuming task of report writing. Existing methods
generate the full report from image-level features, failing to explicitly focus
on anatomical regions in the image. We propose a simple yet effective
region-guided report generation model that detects anatomical regions and then
describes individual, salient regions to form the final report. While previous
methods generate reports without the possibility of human intervention and with
limited explainability, our method opens up novel clinical use cases through
additional interactive capabilities and introduces a high degree of
transparency and explainability. Comprehensive experiments demonstrate our
method's effectiveness in report generation, outperforming previous
state-of-the-art models, and highlight its interactive capabilities. The code
and checkpoints are available at https://github.com/ttanida/rgrg.
Comment: Accepted at CVPR 202
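The detect, select, describe flow summarized in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): the `Region` class, the `salience` score, the 0.5 threshold, and the `toy_describe` function are all hypothetical stand-ins for a real object detector, region selection module, and language model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """Hypothetical container for one detected anatomical region."""
    name: str        # anatomical label, e.g. "left lung"
    salience: float  # assumed score in [0, 1] from a selection step


def generate_report(
    regions: List[Region],
    describe: Callable[[Region], str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Describe each salient region and join the sentences into a report."""
    selected = [r for r in regions if r.salience >= threshold]
    return " ".join(describe(r) for r in selected)


def toy_describe(region: Region) -> str:
    """Stand-in for a language model that describes a single region."""
    return f"Finding in {region.name}."


regions = [
    Region("left lung", 0.9),
    Region("spine", 0.2),
    Region("cardiac silhouette", 0.7),
]
report = generate_report(regions, toy_describe)
print(report)
```

Because each sentence is produced from one region, every sentence of the report can be traced back to the region it describes, which is the kind of transparency the abstract refers to.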
On the Importance of Image Encoding in Automated Chest X-Ray Report Generation
Chest X-ray is one of the most popular medical imaging modalities due to its
accessibility and effectiveness. However, there is a chronic shortage of
well-trained radiologists who can interpret these images and diagnose the
patient's condition. Therefore, automated radiology report generation can be a
very helpful tool in clinical practice. A typical report generation workflow
consists of two main steps: (i) encoding the image into a latent space and (ii)
generating the text of the report based on the latent image embedding. Many
existing report generation techniques use a standard convolutional neural
network (CNN) architecture for image encoding followed by a Transformer-based
decoder for medical text generation. In most cases, the CNN and the decoder are
trained jointly in an end-to-end fashion. In this work, we primarily focus on
understanding the relative importance of the encoder and decoder components.
To this end, we analyze four different image encoding approaches: direct,
fine-grained, CLIP-based, and Cluster-CLIP-based encodings in conjunction with
three different decoders on the large-scale MIMIC-CXR dataset. Among these
encoders, the cluster CLIP visual encoder is a novel approach that aims to
generate more discriminative and explainable representations. CLIP-based
encoders produce comparable results to traditional CNN-based encoders in terms
of NLP metrics, while fine-grained encoding outperforms all other encoders both
in terms of NLP and clinical accuracy metrics, thereby validating the
importance of the image encoder for effectively extracting semantic information. GitHub
repository: https://github.com/mudabek/encoding-cxr-report-ge
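The two-step workflow described in the abstract above, (i) encoding the image into a latent embedding and (ii) generating text conditioned on that embedding, can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's code: the row-mean "encoder", the scripted `toy_next_token` function, and the `<eos>` marker are hypothetical placeholders for a CNN or CLIP encoder and a Transformer decoder.

```python
from typing import Callable, List, Sequence

Latent = List[float]


def encode_image(pixels: Sequence[Sequence[float]]) -> Latent:
    """Step (i): toy encoder that uses the mean of each row as the embedding."""
    return [sum(row) / len(row) for row in pixels]


def decode_report(
    latent: Latent,
    next_token: Callable[[Latent, List[str]], str],
    max_len: int = 20,
) -> str:
    """Step (ii): greedy autoregressive decoding conditioned on the latent."""
    tokens: List[str] = []
    for _ in range(max_len):
        token = next_token(latent, tokens)
        if token == "<eos>":  # assumed end-of-sequence marker
            break
        tokens.append(token)
    return " ".join(tokens)


def toy_next_token(latent: Latent, tokens: List[str]) -> str:
    """Hypothetical decoder step: picks a scripted sentence from the latent."""
    if sum(latent) < 1.0:
        script = ["no", "acute", "findings", "<eos>"]
    else:
        script = ["opacity", "present", "<eos>"]
    return script[len(tokens)]


image = [[0.1, 0.2], [0.0, 0.1]]  # made-up 2x2 "image"
latent = encode_image(image)
report = decode_report(latent, toy_next_token)
print(report)
```

Keeping the encoder and decoder behind separate functions mirrors the study design in the abstract, where different image encoders are swapped in and paired with different decoders.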
Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT
In this paper, we aimed to provide a review and tutorial for researchers in
the field of medical imaging who use language models to improve the tasks at
hand. We began by providing an overview of the history and concepts of language
models, with a special focus on large language models. We then reviewed the
current literature on how language models are being used to improve medical
imaging, emphasizing different applications such as image captioning, report
generation, report classification, finding extraction, visual question
answering, interpretable diagnosis, and more for various modalities and organs.
ChatGPT was specifically highlighted so that researchers can explore more potential
applications. We covered the potential benefits of accurate and efficient
language models for medical imaging analysis, including improving clinical
workflow efficiency, reducing diagnostic errors, and assisting healthcare
professionals in providing timely and accurate diagnoses. Overall, our goal was
to bridge the gap between language models and medical imaging and inspire new
ideas and innovations in this exciting area of research. We hope that this
review paper will serve as a useful resource for researchers in this field and
encourage further exploration of the possibilities of language models in
medical imaging.