26 research outputs found

    New Classification and Generative Model for Medical Visual Question Answering

    Get PDF
    Medical images play an important role in clinical practice. A mature medical visual question answering system could aid diagnosis, but no fully satisfactory method for this comprehensive problem exists so far. Considering that questions come in many different types, we propose CGMVQA, a model that combines classification and answer-generation capabilities to break this complex problem into multiple simpler ones. We apply data augmentation to images and tokenization to texts. We use a pre-trained ResNet152 to extract image features and sum three kinds of embeddings to represent the text. We reduce the parameters of the multi-head self-attention transformer to cut the computational cost, and we adjust the masking and output layers to switch the model between its functions. The model establishes new state-of-the-art results on the ImageCLEF 2019 VQA-Med data set: 0.640 classification accuracy, 0.659 word matching, and 0.678 semantic similarity. These results suggest that CGMVQA is effective for medical visual question answering and can better assist doctors in clinical analysis and diagnosis.
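    The architecture described in this abstract can be sketched compactly in code. Below is a minimal PyTorch sketch of such a pipeline: a frozen pre-trained ResNet152 as the image feature extractor, three summed text embeddings (token, position, segment), and a slimmed-down multi-head self-attention encoder feeding an answer head. All class names, dimensions, and layer counts here are illustrative assumptions, not the authors' exact CGMVQA configuration.

```python
# Hypothetical sketch of a CGMVQA-style pipeline: ResNet152 image features
# plus the sum of three text embeddings, fed to a reduced transformer encoder.
# Sizes and names are illustrative, not the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision import models

class CGMVQASketch(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, heads=8, layers=4,
                 max_len=64, num_answers=1000):
        super().__init__()
        # Pre-trained ResNet152 as a frozen image feature extractor.
        backbone = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop fc, keep pooled features
        for p in self.cnn.parameters():
            p.requires_grad = False
        self.img_proj = nn.Linear(2048, hidden)

        # Three embeddings added together, BERT-style: token + position + segment.
        self.tok_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)
        self.seg_emb = nn.Embedding(2, hidden)  # 0 = question tokens, 1 = image slot

        # Reduced transformer: fewer layers/heads than BERT-base to cut cost.
        enc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.classifier = nn.Linear(hidden, num_answers)  # answer-classification head

    def forward(self, image, token_ids):
        b, t = token_ids.shape
        img = self.img_proj(self.cnn(image).flatten(1)).unsqueeze(1)  # (b, 1, hidden)
        img = img + self.seg_emb(torch.ones(b, 1, dtype=torch.long,
                                            device=token_ids.device))
        pos = torch.arange(t, device=token_ids.device).unsqueeze(0)
        txt = (self.tok_emb(token_ids) + self.pos_emb(pos)
               + self.seg_emb(torch.zeros_like(token_ids)))
        out = self.encoder(torch.cat([img, txt], dim=1))  # (b, t+1, hidden)
        return self.classifier(out[:, 0])  # classify from the image slot
```

    Freezing the CNN and shrinking the transformer keep the trainable parameter count low, in line with the abstract's stated aim of cutting computational cost.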

    Visual Question Answering in the Medical Domain

    Full text link
    Medical visual question answering (Med-VQA) is a machine learning task that aims to build a system able to answer natural language questions about a given medical image. Although there has been rapid progress on the general VQA task, progress on Med-VQA has been slower due to the lack of large-scale annotated datasets. In this paper, we present domain-specific pre-training strategies, including a novel contrastive learning pretraining method, to mitigate the small-dataset problem for the Med-VQA task. We find that the model benefits from components that use fewer parameters. We also evaluate and discuss the model's visual reasoning using evidence verification techniques. Our proposed model obtained an accuracy of 60% on the VQA-Med 2019 test set, comparable to other state-of-the-art Med-VQA models. Comment: 8 pages, 7 figures; accepted to the DICTA 2023 Conference.
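    The abstract names a novel contrastive learning pretraining method without detailing it. As a rough illustration only, here is a generic symmetric InfoNCE-style image-text contrastive loss of the kind commonly used for such pretraining; the function name, the temperature value, and the batch-diagonal pairing scheme are assumptions, not the paper's actual method.

```python
# Generic InfoNCE-style image-text contrastive loss; an assumption-laden
# illustration of contrastive pretraining, not the paper's actual method.
import torch
import torch.nn.functional as F

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image/text embeddings.

    img_emb, txt_emb: (batch, dim) projections of paired images and texts.
    Matched pairs sit on the diagonal of the similarity matrix; every other
    entry in the same row or column serves as a negative.
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature             # (batch, batch) cosine similarities
    targets = torch.arange(len(logits), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)
```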

    ImageCLEF 2020: Multimedia Retrieval in Lifelogging, Medical, Nature, and Security Applications

    Get PDF
    This paper presents an overview of the 2020 ImageCLEF lab that will be organized as part of the Conference and Labs of the Evaluation Forum - CLEF Labs 2020 in Thessaloniki, Greece. ImageCLEF is an ongoing evaluation initiative (run since 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data, with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2020, the 18th edition of ImageCLEF will organize four main tasks: (i) a Lifelog task (videos, images and other sources) about daily activity understanding, retrieval and summarization, (ii) a Medical task that groups three previous tasks (caption analysis, tuberculosis prediction, and medical visual question answering) with new data and adapted tasks, (iii) a Coral task about segmenting and labeling collections of coral images for 3D modeling, and (iv) a new Web user interface task addressing the problems of detecting and recognizing hand-drawn website UIs (user interfaces) for automatic code generation. The strong participation in 2019, with over 235 research groups registering and 63 of them submitting over 359 runs, shows considerable interest in this benchmarking campaign. We expect the new tasks to attract at least as many researchers in 2020.

    Overview of the ImageCLEF 2021: Multimedia Retrieval in Medical, Nature, Internet and Social Media Applications

    Get PDF
    This paper presents an overview of the ImageCLEF 2021 lab that was organized as part of the Conference and Labs of the Evaluation Forum – CLEF Labs 2021. ImageCLEF is an ongoing evaluation initiative (first run in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data, with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2021, the 19th edition of ImageCLEF ran four main tasks: (i) a medical task that groups three previous tasks, i.e., caption analysis, tuberculosis prediction, and medical visual question answering and question generation, (ii) a nature coral task about segmenting and labeling collections of coral reef images, (iii) an Internet task addressing the problems of identifying hand-drawn and digital user interface components, and (iv) a new social-media-aware task on estimating the potential real-life effects of online image sharing. Despite the pandemic, the benchmark campaign received strong participation, with over 38 groups submitting more than 250 runs.

    Recent, rapid advancement in visual question answering architecture: a review

    Full text link
    Understanding visual question answering will be crucial for numerous human activities, yet it poses major challenges at the heart of artificial intelligence. This paper presents an update on the rapid advances in image-based visual question answering that have occurred over the last couple of years. A large body of research on improving visual question answering architectures has been published recently, underscoring the importance of multimodal architectures. The present article builds on the review by Manmadhan et al. (2020), which discusses several benefits of visual question answering, and covers the developments in the field since then. Comment: 11 pages.