Visual Question Answering in the Medical Domain
Medical visual question answering (Med-VQA) is a machine learning task that
aims to create a system that can answer natural language questions based on
given medical images. Although there has been rapid progress on the general VQA
task, less progress has been made on Med-VQA due to the lack of large-scale
annotated datasets. In this paper, we present domain-specific pre-training
strategies, including a novel contrastive learning pretraining method, to
mitigate the problem of small datasets for the Med-VQA task. We find that the
model benefits from components that use fewer parameters. We also evaluate and
discuss the model's visual reasoning using evidence verification techniques.
Our proposed model obtained an accuracy of 60% on the VQA-Med 2019 test set,
giving comparable results to other state-of-the-art Med-VQA models.
Comment: 8 pages, 7 figures, Accepted to DICTA 2023 Conference
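The abstract names a contrastive pretraining method but does not spell out its objective. As a rough illustration only, a standard symmetric InfoNCE loss over paired image/text embeddings (a common choice for this kind of pretraining, not necessarily the paper's exact formulation) can be sketched as:

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a positive pair.
    Illustrative sketch -- names and temperature are assumptions.
    """
    # L2-normalise so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (batch, batch); positives on the diagonal
    labels = np.arange(len(logits))

    def xent(l):
        # cross-entropy of each row against its diagonal entry, numerically stable
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings the loss approaches zero; mismatched pairings drive it up, which is what pulls matching image/question representations together during pretraining.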
Automatic 3D Multi-modal Ultrasound Segmentation of Human Placenta using Fusion Strategies and Deep Learning
Purpose: Ultrasound is the most commonly used medical imaging modality for
diagnosis and screening in clinical practice. Due to its safety profile,
noninvasive nature and portability, ultrasound is the primary imaging modality
for fetal assessment in pregnancy. Current ultrasound processing methods are
either manual or semi-automatic and are therefore laborious, time-consuming and
prone to errors, and automation would go a long way in addressing these
challenges. Automated identification of placental changes at earlier gestation
could facilitate potential therapies for conditions such as fetal growth
restriction and pre-eclampsia that are currently detected only at late
gestational age, potentially preventing perinatal morbidity and mortality.
Methods: We propose an automatic three-dimensional multi-modal (B-mode and
power Doppler) ultrasound segmentation of the human placenta using deep
learning combined with different fusion strategies. We collected data
containing B-mode and power Doppler ultrasound scans for 400 studies.
Results: We evaluated different fusion strategies and state-of-the-art image
segmentation networks for placenta segmentation based on standard overlap- and
boundary-based metrics. We found that multimodal information in the form of
B-mode and power Doppler scans outperforms any single modality. Furthermore, we
found that B-mode and power Doppler input scans fused at the data level provide
the best results with a mean Dice Similarity Coefficient (DSC) of 0.849.
Conclusion: We conclude that the multi-modal approach of combining B-mode and
power Doppler scans is effective in segmenting the placenta from 3D ultrasound
scans in a fully automated manner and is robust to quality variation of the
datasets.
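Data-level fusion, the strategy the abstract reports as best, amounts to stacking the co-registered B-mode and power Doppler volumes as input channels before the network sees them. A minimal sketch, with an accompanying Dice helper for the reported metric (function names are illustrative, not the paper's code, and the volumes are assumed already co-registered and normalised):

```python
import numpy as np

def fuse_data_level(bmode, doppler):
    """Data-level fusion: stack co-registered B-mode and power Doppler
    volumes as input channels for a segmentation network.

    bmode, doppler: (D, H, W) arrays on the same voxel grid.
    Returns a (2, D, H, W) array in channels-first layout.
    """
    assert bmode.shape == doppler.shape, "volumes must share a voxel grid"
    return np.stack([bmode, doppler], axis=0)

def dice_coefficient(pred, target, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks (1.0 = perfect)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

Later fusion stages (feature- or decision-level) would instead merge the modalities inside or after the network; data-level fusion keeps the network unchanged and only widens its input.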
Attention and Pooling based Sigmoid Colon Segmentation in 3D CT images
Segmentation of the sigmoid colon is a crucial aspect of treating
diverticulitis. It enables accurate identification and localisation of
inflammation, which in turn helps healthcare professionals make informed
decisions about the most appropriate treatment options. This research presents
a novel deep learning architecture for segmenting the sigmoid colon from
Computed Tomography (CT) images using a modified 3D U-Net architecture. Several
variations of the 3D U-Net model with modified hyper-parameters were examined
in this study. Pyramid pooling (PyP) and channel-spatial Squeeze and Excitation
(csSE) were also used to improve the model performance. The networks were
trained using manually annotated sigmoid colon masks. A five-fold cross-validation
procedure was used on a test dataset to evaluate the network's performance. As
indicated by the maximum Dice similarity coefficient (DSC) of 56.92+/-1.42%,
the application of PyP and csSE techniques improves segmentation precision. We
explored ensemble methods including averaging, weighted averaging, majority
voting, and max ensemble. The results show that average and majority voting
approaches with a threshold value of 0.5 and consistent weight distribution
among the top three models produced comparable and optimal results with DSC of
88.11+/-3.52%. The results indicate that the application of a modified 3D U-Net
architecture is effective for segmenting the sigmoid colon in Computed
Tomography (CT) images. In addition, the study highlights the potential
benefits of integrating ensemble methods to improve segmentation precision.
Comment: 8 pages, 6 figures, Accepted at IEEE DICTA 202
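The averaging and majority-voting ensembles with a 0.5 threshold that the abstract reports can be sketched over per-model probability maps as follows (a generic illustration under those stated settings, not the authors' implementation):

```python
import numpy as np

def average_ensemble(prob_maps, threshold=0.5, weights=None):
    """(Weighted) averaging ensemble: mean the per-model probability maps,
    then binarise at `threshold` (0.5, with uniform weights, per the abstract)."""
    probs = np.average(np.stack(prob_maps), axis=0, weights=weights)
    return probs >= threshold

def majority_vote_ensemble(prob_maps, threshold=0.5):
    """Majority voting: binarise each model's map at `threshold`, then keep
    voxels where more than half of the models agree."""
    votes = np.stack([p >= threshold for p in prob_maps]).sum(axis=0)
    return votes * 2 > len(prob_maps)
```

With consistent (uniform) weights and a 0.5 threshold the two schemes often coincide on confident voxels and differ mainly where models disagree near the boundary, which is consistent with the abstract's finding that they produce comparable results.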
Generalizing link prediction for information extraction
Information Extraction (IE) is the task of extracting from a text the entities and the relationships that hold between them, in a form that can be stored in a database called a Knowledge Base (KB) or Knowledge Graph (KG). Link prediction, also called Knowledge Base Completion, is the task of predicting missing links in order to make the KG more complete. While most IE and link prediction models have focused on binary relationships, real-world relationships are often n-ary (n > 2). Recently, IE algorithms have been proposed that can extract relationships of arbitrary arity, but as far as we know there is no corresponding work on link prediction involving relationships of arbitrary arity. In this thesis, we introduce the task of n-ary link prediction, proposing two different models of n-ary relationships and two different methods to train the proposed models. We also provide a new dataset (based on Wikidata) for training and evaluating our proposed approaches, and propose a modification of the standard evaluation criteria to overcome the bottleneck of huge computational complexity when working with large-scale KBs. Evaluation in terms of Mean Rank, Hits@10 and classification accuracy on the tuple dataset shows that our proposed approaches generalize link prediction to tuples of arbitrary arity.
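The abstract does not specify the two models, so as a generic illustration only, n-ary link prediction can be sketched with a TransE-style translation extended from triples to tuples (an assumed stand-in, not the thesis's method): a fact is plausible when its entity embeddings sum to the relation embedding, and ranking candidates for one missing slot yields the Mean Rank / Hits@10 style evaluation the abstract mentions.

```python
import numpy as np

def score_tuple(rel_emb, entity_embs):
    """Score an n-ary fact (relation, e1, ..., en).

    TransE-style translation generalised to tuples: for a true fact the
    entity embeddings should sum to the relation embedding, so the residual
    norm is small. Higher score = more plausible. Illustrative only.
    """
    residual = rel_emb - np.sum(entity_embs, axis=0)
    return -np.linalg.norm(residual)

def rank_candidates(rel_emb, fixed_embs, candidates):
    """Rank candidate entities for one missing slot of a tuple.

    Returns candidate indices ordered best-first, as used for
    Mean Rank / Hits@10 evaluation of link prediction.
    """
    scores = [score_tuple(rel_emb, fixed_embs + [c]) for c in candidates]
    return np.argsort(scores)[::-1]
```

Note this scorer works for any arity n, which is the point of the generalization: the same scoring function handles triples and longer tuples without a per-arity model.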