556 research outputs found
Additive Angular Margin for Few Shot Learning to Classify Clinical Endoscopy Images
Endoscopy is a widely used imaging modality to diagnose and treat diseases in
hollow organs such as the gastrointestinal tract, the kidney, and the liver.
However, the varied modalities and the different imaging protocols used at
various clinical centers impose significant challenges when generalising deep
learning models. Moreover, assembling large datasets from different clinical
centers can introduce a substantial label bias that renders any learnt model
unusable. Also, when a new modality is introduced or images with rare patterns
are present, a large amount of similar image data and corresponding labels is
required for training these models. In this work, we propose to use a
few-shot learning approach that requires less training data and can be used to
predict label classes of test samples from an unseen dataset. We propose a
novel additive angular margin metric within the prototypical network
framework in the few-shot learning setting. We compare our approach to several
established methods on a large cohort of multi-center, multi-organ, and
multi-modal endoscopy data. The proposed algorithm outperforms existing
state-of-the-art methods.
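The core idea of an additive angular margin, penalising the angle between a query embedding and its true-class prototype before computing scaled cosine logits, can be sketched as follows (a minimal illustration; the function names and the margin and scale values are assumptions, not the paper's settings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def angular_margin_logits(query, prototypes, true_class, margin=0.5, scale=10.0):
    """Scaled cosine logits; the true class's angle is penalised by `margin`,
    so training must push the query within `margin` radians of its prototype."""
    logits = []
    for k, proto in enumerate(prototypes):
        theta = math.acos(max(-1.0, min(1.0, cosine(query, proto))))
        if k == true_class:
            theta += margin  # additive angular margin
        logits.append(scale * math.cos(theta))
    return logits
```

During training, a softmax cross-entropy over these logits makes the true-class score harder to achieve, tightening intra-class clusters around each prototype.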
Alzheimer's Disease Diagnosis by Deep Learning Using MRI-Based Approaches
Alzheimer's disease, the most frequent kind of dementia of the nervous
system, weakens several brain processes (such as memory) and eventually
results in death. Clinically, magnetic resonance imaging (MRI) is used to
diagnose AD. Deep learning algorithms are capable of pattern recognition and
feature extraction from raw input data. Since early diagnosis and stage
detection are the most crucial elements in enhancing patient care and
treatment outcomes, deep learning algorithms for MRI images have recently
made it possible to diagnose the condition at an early stage and to identify
particular symptoms of Alzheimer's disease. In this study, we therefore
analyze five studies published between 2021 and 2023 that focus on AD
diagnosis using MRI-based deep learning algorithms. To fully illustrate the
differences between these techniques and to explain how the deep learning
algorithms function, we explore the selected approaches in depth.
Content-Based Medical Image Retrieval with Opponent Class Adaptive Margin Loss
Widespread use of medical imaging devices with digital storage has paved the
way for curation of substantial data repositories. Fast access to image samples
with similar appearance to suspected cases can help establish a consulting
system for healthcare professionals, and improve diagnostic procedures while
minimizing processing delays. However, manual querying of large data
repositories is labor intensive. Content-based image retrieval (CBIR) offers an
automated solution based on dense embedding vectors that represent image
features to allow quantitative similarity assessments. Triplet learning has
emerged as a powerful approach to recover embeddings in CBIR, although
traditional loss functions ignore the dynamic relationship between opponent
image classes. Here, we introduce a triplet-learning method for automated
querying of medical image repositories based on a novel Opponent Class Adaptive
Margin (OCAM) loss. OCAM uses a variable margin value that is updated
continually during the course of training to maintain optimally discriminative
representations. CBIR performance of OCAM is compared against state-of-the-art
loss functions for representational learning on three public databases
(gastrointestinal disease, skin lesion, lung disease). Comprehensive
experiments in each application domain demonstrate the superior performance of
OCAM against baselines.
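A minimal sketch of a triplet loss whose margin adapts during training; the running-mean update rule below is an illustrative stand-in for a variable margin, not OCAM's actual update:

```python
def triplet_loss(anchor, positive, negative, margin):
    """Hinge triplet loss: d(a, p) - d(a, n) + margin, clamped at zero."""
    dist = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

class AdaptiveMargin:
    """Tracks one margin per opponent class pair as a running mean of the
    anchor-negative distances seen so far (an illustrative rule only)."""
    def __init__(self, base=0.2, rate=0.1):
        self.base, self.rate = base, rate
        self.margins = {}

    def get(self, pair):
        return self.margins.get(pair, self.base)

    def update(self, pair, anchor, negative):
        dist = sum((a - b) ** 2 for a, b in zip(anchor, negative)) ** 0.5
        self.margins[pair] = (1 - self.rate) * self.get(pair) + self.rate * dist
```

The point of a variable margin is that easily separated class pairs stop dominating the loss, while confusable pairs keep a meaningful gradient.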
A Systematic Review of Few-Shot Learning in Medical Imaging
The lack of annotated medical images limits the performance of deep learning
models, which usually need large-scale labelled datasets. Few-shot learning
techniques can reduce data scarcity issues and enhance medical image analysis,
especially with meta-learning. This systematic review gives a comprehensive
overview of few-shot learning in medical imaging. We searched the literature
systematically and selected 80 relevant articles published from 2018 to 2023.
We clustered the articles based on medical outcomes, such as tumour
segmentation, disease classification, and image registration; anatomical
structure investigated (e.g. heart, lung); and the meta-learning method
used. For each cluster, we examined the papers' distributions and the results
provided by the state-of-the-art. In addition, we identified a generic pipeline
shared among all the studies. The review shows that few-shot learning can
overcome data scarcity in most outcomes and that meta-learning is a popular
choice to perform few-shot learning because it can adapt to new tasks with few
labelled samples. After meta-learning, supervised and semi-supervised
learning stand out as the predominant, and best-performing, techniques for
tackling few-shot learning challenges in medical imaging. Lastly, we
observed that the primary application areas
predominantly encompass cardiac, pulmonary, and abdominal domains. This
systematic review aims to inspire further research to improve medical image
analysis and patient care.
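The generic pipeline such studies share, embedding a small support set, building one representative per class, and labelling queries by proximity, can be sketched as a prototypical classification step (a minimal illustration over pre-computed embeddings; the function names are our own):

```python
def class_prototypes(support):
    """Mean embedding per class; `support` maps class -> list of vectors."""
    return {c: [sum(dim) / len(vecs) for dim in zip(*vecs)]
            for c, vecs in support.items()}

def nearest_prototype(query, protos):
    """Label a query embedding by its closest prototype (squared Euclidean)."""
    sq = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    return min(protos, key=lambda c: sq(query, protos[c]))
```

Adapting to a new task then only requires embedding a handful of labelled samples, which is why this pattern recurs across the surveyed papers.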
Cross-modal data retrieval and generation using deep neural networks
The exponential growth of deep learning has helped solve problems across different fields of study. Convolutional neural networks have become a go-to tool for extracting features from images. Similarly, variants of recurrent neural networks such as Long Short-Term Memory and Gated Recurrent Unit architectures do a good job of extracting useful information from temporal data such as text and time series. Although these networks are good at extracting features for a particular modality, learning features across multiple modalities is still a challenging task. In this work, we develop a generative common vector space model in which similar concepts from different modalities are brought closer together in a common latent space representation while dissimilar concepts are pushed far apart in the same space. The developed model not only aims at solving the cross-modal retrieval problem but also uses the vector generated by the common vector space model to generate realistic-looking data. This work mainly focuses on the image and text modalities; however, it can be extended to other modalities as well. We train and evaluate the performance of the model on the Caltech CUB and Oxford-102 datasets.
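The pull-together/push-apart objective on the common latent space can be sketched with a pairwise contrastive loss (a standard formulation used here as an illustration; the model's actual loss may differ):

```python
def cross_modal_contrastive(img_vec, txt_vec, same_concept, margin=1.0):
    """Pairwise contrastive loss on embeddings from two modalities:
    matching concepts are pulled together, mismatched ones pushed at
    least `margin` apart in the shared space."""
    dist = sum((a - b) ** 2 for a, b in zip(img_vec, txt_vec)) ** 0.5
    if same_concept:
        return dist ** 2                  # attract matching pairs
    return max(0.0, margin - dist) ** 2   # repel mismatches up to the margin
```

Once trained this way, retrieval reduces to nearest-neighbour search in the shared space, and the same vectors can condition a generator.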
A Short Survey on Deep Learning for Multimodal Integration: Applications, Future Perspectives and Challenges
Deep learning has achieved state-of-the-art performance in many research applications, from computer vision to bioinformatics and from object detection to image generation. In the context of such deep-learning approaches, we can define the concept of multimodality: the objective of this research field is to implement methodologies that can use several modalities as input features to perform predictions. There is a strong analogy with human cognition, since we rely on several different senses to make decisions. In this article, we present a short survey on multimodal integration using deep-learning methods. First, we comprehensively review the concept of multimodality, describing it from a two-dimensional perspective: we give a taxonomical description of the multimodality concept, and we then define the second dimension as the one describing fusion approaches in multimodal deep learning. Finally, we describe four applications of multimodal deep learning: speech recognition, sentiment analysis, forensic applications, and image processing.
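The fusion dimension such surveys describe is commonly split into early fusion (combine features, then predict) and late fusion (predict per modality, then combine). A minimal sketch of both, with hypothetical feature and probability vectors:

```python
def early_fusion(audio_feats, video_feats):
    """Early fusion: concatenate modality features into one input vector
    for a single downstream predictor."""
    return audio_feats + video_feats

def late_fusion(audio_probs, video_probs, weight=0.5):
    """Late fusion: weighted average of per-modality class probabilities."""
    return [weight * a + (1 - weight) * v
            for a, v in zip(audio_probs, video_probs)]
```

Early fusion lets the model learn cross-modal interactions but requires aligned inputs; late fusion tolerates missing or asynchronous modalities at the cost of modelling those interactions.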
Deep Learning Based Face Detection and Recognition in MWIR and Visible Bands
In non-favorable conditions for visible imaging, such as extreme illumination or nighttime, there is a need to collect images in other spectra, specifically infrared. Mid-wave infrared (MWIR, 3-5 µm) images can be collected without giving away the location of the sensor under varying illumination conditions. Many algorithms for face detection, face alignment, face recognition, etc. have been proposed to date for the visible band, while research using MWIR images is highly limited. Face detection is an important pre-processing step for face recognition, which in turn is an important biometric modality. This thesis works towards bridging the gap between the MWIR and visible spectra through three contributions. First, a dual-band deep face detection model that works well in the visible and MWIR spectra is proposed using transfer learning. Different models are trained and tested extensively on visible and MWIR images, and the model that works best for this data is determined. For this model, experiments are conducted to study the speed/accuracy trade-off. Next, the available MWIR dataset is extended through augmentation using traditional methods and generative adversarial networks (GANs). The traditional methods used to augment the data are brightness adjustment, contrast enhancement, and applying noise to and de-noising the images. A deep learning based GAN architecture is developed and used to generate new face identities. The generated images are added to the original dataset, and the face detection model developed earlier is trained and tested again. The third contribution is another GAN that converts given thermal face images into their visible counterparts. A pre-trained model is used as the discriminator, trained to classify images as real or fake, and an identity network provides further feedback to the generator.
The generated visible images are used as probe images and the original visible images as gallery images to perform face recognition experiments using a state-of-the-art visible-to-visible face recognition algorithm.
One-shot learning with triplet loss for vegetation classification tasks
The triplet loss function is one of the options that can significantly improve the accuracy of one-shot learning tasks. Since 2015, many projects have used Siamese networks with this kind of loss for face recognition and object classification. In our research, we focused on two tasks related to vegetation. The first is plant disease detection on 25 classes of five crops (grape, cotton, wheat, cucumber, and corn). This task is motivated by the fact that harvest losses due to disease are a serious problem for both large farming operations and rural families. The second task is the identification of moss species (5 classes). Mosses are natural bioaccumulators of pollutants; therefore, they are used in environmental monitoring programs, and identifying moss species is an important step in sample preprocessing. In both tasks, we used self-collected image databases. We tried several deep learning architectures and approaches. Our Siamese network architecture with a triplet loss function and MobileNetV2 as the base network showed the most impressive results in both tasks: average accuracy of over 97.8% for plant disease detection and 97.6% for moss species classification.
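The one-shot inference step, embedding a query with the shared base network and picking the class of the nearest reference image, can be sketched as follows (the `embed` argument stands in for a trained MobileNetV2 branch; all names here are illustrative):

```python
def one_shot_predict(embed, query_image, references):
    """Classify with one labelled example per class: `references` maps
    label -> reference image; the shared `embed` network maps images
    to embedding vectors."""
    q = embed(query_image)
    sq = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    return min(references,
               key=lambda label: sq(q, embed(references[label])))
```

Because classification is just a distance comparison in embedding space, new classes (a new crop disease, a new moss species) can be added by supplying one reference image, with no retraining.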
Explainable artificial intelligence (XAI) in deep learning-based medical image analysis
With an increase in deep learning-based methods, the call for explainability
of such methods grows, especially in high-stakes decision making areas such as
medical image analysis. This survey presents an overview of eXplainable
Artificial Intelligence (XAI) used in deep learning-based medical image
analysis. A framework of XAI criteria is introduced to classify deep
learning-based medical image analysis methods. Papers on XAI techniques in
medical image analysis are then surveyed and categorized according to the
framework and according to anatomical location. The paper concludes with an
outlook of future opportunities for XAI in medical image analysis.
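One of the simplest model-agnostic XAI techniques covered by such surveys is occlusion-based saliency: blank out image patches one at a time and record how much the model's score drops. A minimal sketch (the patch size and fill value are arbitrary illustrative choices):

```python
def occlusion_map(score_fn, image, patch=2, fill=0.0):
    """Saliency by occlusion: a patch's importance is the drop in the
    model's score when that patch is blanked. `image` is a 2D list of
    floats; `score_fn` maps an image to a scalar score."""
    base = score_fn(image)
    h, w = len(image), len(image[0])
    sal = [[0.0] * w for _ in range(h)]
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = [row[:] for row in image]  # copy, then blank one patch
            for di in range(i, min(i + patch, h)):
                for dj in range(j, min(j + patch, w)):
                    occluded[di][dj] = fill
            drop = base - score_fn(occluded)
            for di in range(i, min(i + patch, h)):
                for dj in range(j, min(j + patch, w)):
                    sal[di][dj] = drop
    return sal
```

Occlusion needs no access to gradients or internals, which is why it is often used as a sanity check alongside gradient-based methods in medical image analysis.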