Visual Question Answering in the Medical Domain
Medical visual question answering (Med-VQA) is a machine learning task that
aims to create a system that can answer natural language questions based on
given medical images. Although there has been rapid progress on the general VQA
task, less progress has been made on Med-VQA due to the lack of large-scale
annotated datasets. In this paper, we present domain-specific pre-training
strategies, including a novel contrastive learning pretraining method, to
mitigate the problem of small datasets for the Med-VQA task. We find that the
model benefits from components that use fewer parameters. We also evaluate and
discuss the model's visual reasoning using evidence verification techniques.
Our proposed model obtained an accuracy of 60% on the VQA-Med 2019 test set,
giving comparable results to other state-of-the-art Med-VQA models.
Comment: 8 pages, 7 figures, Accepted to DICTA 2023 Conference
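The abstract names a contrastive pretraining method but does not spell out its objective. As a rough illustration only, a standard symmetric InfoNCE loss over paired image/text embeddings (a common choice for this kind of pretraining, not necessarily the paper's exact formulation) can be sketched as:

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a positive pair.
    Illustrative sketch -- names and temperature are assumptions.
    """
    # L2-normalise so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (batch, batch); positives on the diagonal
    labels = np.arange(len(logits))

    def xent(l):
        # cross-entropy of each row against its diagonal entry, numerically stable
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings the loss approaches zero; mismatched pairings drive it up, which is what pulls matching image/question representations together during pretraining.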
Automatic 3D Multi-modal Ultrasound Segmentation of Human Placenta using Fusion Strategies and Deep Learning
Purpose: Ultrasound is the most commonly used medical imaging modality for
diagnosis and screening in clinical practice. Due to its safety profile,
noninvasive nature and portability, ultrasound is the primary imaging modality
for fetal assessment in pregnancy. Current ultrasound processing methods are
either manual or semi-automatic and are therefore laborious, time-consuming and
prone to errors, and automation would go a long way in addressing these
challenges. Automated identification of placental changes at earlier gestation
could facilitate potential therapies for conditions such as fetal growth
restriction and pre-eclampsia that are currently detected only at late
gestational age, potentially preventing perinatal morbidity and mortality.
Methods: We propose an automatic three-dimensional multi-modal (B-mode and
power Doppler) ultrasound segmentation of the human placenta using deep
learning combined with different fusion strategies. We collected data
containing B-mode and power Doppler ultrasound scans for 400 studies.
Results: We evaluated different fusion strategies and state-of-the-art image
segmentation networks for placenta segmentation based on standard overlap- and
boundary-based metrics. We found that multimodal information in the form of
B-mode and power Doppler scans outperforms any single modality. Furthermore, we
found that B-mode and power Doppler input scans fused at the data level provide
the best results with a mean Dice Similarity Coefficient (DSC) of 0.849.
Conclusion: We conclude that the multi-modal approach of combining B-mode and
power Doppler scans is effective in segmenting the placenta from 3D ultrasound
scans in a fully automated manner and is robust to quality variation of the
datasets.
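Data-level fusion, the strategy the abstract reports as best, amounts to stacking the co-registered B-mode and power Doppler volumes as input channels before the network sees them. A minimal sketch, with an accompanying Dice helper for the reported metric (function names are illustrative, not the paper's code, and the volumes are assumed already co-registered and normalised):

```python
import numpy as np

def fuse_data_level(bmode, doppler):
    """Data-level fusion: stack co-registered B-mode and power Doppler
    volumes as input channels for a segmentation network.

    bmode, doppler: (D, H, W) arrays on the same voxel grid.
    Returns a (2, D, H, W) array in channels-first layout.
    """
    assert bmode.shape == doppler.shape, "volumes must share a voxel grid"
    return np.stack([bmode, doppler], axis=0)

def dice_coefficient(pred, target, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks (1.0 = perfect)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

Later fusion stages (feature- or decision-level) would instead merge the modalities inside or after the network; data-level fusion keeps the network unchanged and only widens its input.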
Attention and Pooling based Sigmoid Colon Segmentation in 3D CT images
Segmentation of the sigmoid colon is a crucial aspect of treating
diverticulitis. It enables accurate identification and localisation of
inflammation, which in turn helps healthcare professionals make informed
decisions about the most appropriate treatment options. This research presents
a novel deep learning architecture for segmenting the sigmoid colon from
Computed Tomography (CT) images using a modified 3D U-Net architecture. Several
variations of the 3D U-Net model with modified hyper-parameters were examined
in this study. Pyramid pooling (PyP) and channel-spatial Squeeze and Excitation
(csSE) were also used to improve the model performance. The networks were
trained using manually annotated sigmoid colon masks. A five-fold cross-validation
procedure was used on a test dataset to evaluate the network's performance. As
indicated by the maximum Dice similarity coefficient (DSC) of 56.92+/-1.42%,
the application of PyP and csSE techniques improves segmentation precision. We
explored ensemble methods including averaging, weighted averaging, majority
voting, and max ensemble. The results show that average and majority voting
approaches with a threshold value of 0.5 and consistent weight distribution
among the top three models produced comparable and optimal results with DSC of
88.11+/-3.52%. The results indicate that the application of a modified 3D U-Net
architecture is effective for segmenting the sigmoid colon in Computed
Tomography (CT) images. In addition, the study highlights the potential
benefits of integrating ensemble methods to improve segmentation precision.
Comment: 8 pages, 6 figures, Accepted at IEEE DICTA 202
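The averaging and majority-voting ensembles with a 0.5 threshold that the abstract reports can be sketched over per-model probability maps as follows (a generic illustration under those stated settings, not the authors' implementation):

```python
import numpy as np

def average_ensemble(prob_maps, threshold=0.5, weights=None):
    """(Weighted) averaging ensemble: mean the per-model probability maps,
    then binarise at `threshold` (0.5, with uniform weights, per the abstract)."""
    probs = np.average(np.stack(prob_maps), axis=0, weights=weights)
    return probs >= threshold

def majority_vote_ensemble(prob_maps, threshold=0.5):
    """Majority voting: binarise each model's map at `threshold`, then keep
    voxels where more than half of the models agree."""
    votes = np.stack([p >= threshold for p in prob_maps]).sum(axis=0)
    return votes * 2 > len(prob_maps)
```

With consistent (uniform) weights and a 0.5 threshold the two schemes often coincide on confident voxels and differ mainly where models disagree near the boundary, which is consistent with the abstract's finding that they produce comparable results.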
Generalizing link prediction for information extraction
Information Extraction (IE) is the task of extracting from a text the entities and the relationships that hold between them, in a form that can be stored in a database called a Knowledge Base (KB) or Knowledge Graph (KG). Link prediction, also called Knowledge Base Completion, is the task of predicting missing links in order to make the KG more complete. While most IE and link prediction models have focused on binary relationships, real-world relationships are often n-ary (n > 2). Recently, IE algorithms have been proposed that can extract relationships of arbitrary arity, but as far as we know there is no corresponding work on link prediction involving relationships of arbitrary arity. In this thesis, we introduce the task of n-ary link prediction, proposing two different models of n-ary relationships and two different methods to train the proposed models. We also provide a new dataset (based on Wikidata) for training and evaluating our proposed approaches, and propose a modification of the standard evaluation criteria to overcome the bottleneck of huge computational complexity when working with large-scale KBs. Evaluation in terms of Mean Rank, Hits@10 and classification accuracy on the tuple dataset shows that our proposed approaches generalize link prediction to tuples of arbitrary arity.
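The abstract does not specify the two models, so as a generic illustration only, n-ary link prediction can be sketched with a TransE-style translation extended from triples to tuples (an assumed stand-in, not the thesis's method): a fact is plausible when its entity embeddings sum to the relation embedding, and ranking candidates for one missing slot yields the Mean Rank / Hits@10 style evaluation the abstract mentions.

```python
import numpy as np

def score_tuple(rel_emb, entity_embs):
    """Score an n-ary fact (relation, e1, ..., en).

    TransE-style translation generalised to tuples: for a true fact the
    entity embeddings should sum to the relation embedding, so the residual
    norm is small. Higher score = more plausible. Illustrative only.
    """
    residual = rel_emb - np.sum(entity_embs, axis=0)
    return -np.linalg.norm(residual)

def rank_candidates(rel_emb, fixed_embs, candidates):
    """Rank candidate entities for one missing slot of a tuple.

    Returns candidate indices ordered best-first, as used for
    Mean Rank / Hits@10 evaluation of link prediction.
    """
    scores = [score_tuple(rel_emb, fixed_embs + [c]) for c in candidates]
    return np.argsort(scores)[::-1]
```

Note this scorer works for any arity n, which is the point of the generalization: the same scoring function handles triples and longer tuples without a per-arity model.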