A comprehensive survey on deep active learning and its applications in medical image analysis
Deep learning has achieved widespread success in medical image analysis,
leading to an increasing demand for large-scale expert-annotated medical image
datasets. Yet, the high cost of annotating medical images severely hampers the
development of deep learning in this field. To reduce annotation costs, active
learning aims to select the most informative samples for annotation and train
high-performance models with as few labeled samples as possible. In this
survey, we review the core methods of active learning, including the evaluation
of informativeness and sampling strategy. For the first time, we provide a
detailed summary of the integration of active learning with other
label-efficient techniques, such as semi-supervised and self-supervised learning.
We also highlight active learning works that are specifically tailored to medical
image analysis. Finally, we offer our perspectives on the future trends and
challenges of active learning and its applications in medical image analysis.
Comment: Paper List on Github:
https://github.com/LightersWang/Awesome-Active-Learning-for-Medical-Image-Analysi
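As a rough illustration of the core loop this survey reviews (evaluating informativeness, then sampling), the sketch below scores unlabeled samples by predictive entropy and selects the most uncertain ones for annotation. The function names and toy probabilities are illustrative, not taken from the survey:

```python
import numpy as np

def entropy_informativeness(probs):
    """Predictive entropy per sample; higher entropy = more informative."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_for_annotation(probs, budget):
    """Return indices of the `budget` most informative unlabeled samples."""
    scores = entropy_informativeness(probs)
    return np.argsort(-scores)[:budget]  # descending by entropy

# Mock softmax outputs for 4 unlabeled images (3 classes each).
probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction -> low entropy
    [0.34, 0.33, 0.33],   # near-uniform -> high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
])
picked = select_for_annotation(probs, budget=2)
print(picked)  # indices of the two most uncertain samples
```

In a full pipeline this selection step alternates with annotation and retraining until the labeling budget is exhausted.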
Attention Mechanism for Recognition in Computer Vision
It has been shown that humans do not focus their attention on an entire scene at once when performing a recognition task. Instead, they attend to the most important parts of the scene to extract the most discriminative information. Inspired by this observation, this dissertation studies the importance of the attention mechanism in computer-vision recognition tasks by designing novel attention-based models. Specifically, four scenarios are investigated that represent the most important aspects of the attention mechanism.
First, an attention-based model is designed to reduce the dimensionality of visual features by selectively processing only a small subset of the data. We study this aspect of the attention mechanism in a framework based on object recognition in distributed camera networks. Second, an attention-based image retrieval system (i.e., person re-identification) is proposed which learns to focus on the most discriminative regions of a person's image and to process those regions with higher computational power using a deep convolutional neural network. Furthermore, we show how visualizing the attention maps can make deep neural networks more interpretable: by visualizing the attention maps, we can observe the regions of the input image on which the neural network relies in order to make a decision. Third, a model is proposed for estimating the importance of the objects in a scene given a task. More specifically, the proposed model estimates the importance of the road users that a driver (or an autonomous vehicle) should pay attention to in a driving scenario in order to navigate safely. In this scenario, the attention estimate is the final output of the model.
Fourth, an attention-based module and a new loss function are proposed for a meta-learning-based few-shot learning system, in order to incorporate the context of the task into the feature representations of the samples and increase few-shot recognition accuracy. In this dissertation, we showed that attention can be multi-faceted, and we studied the attention mechanism from the perspectives of feature selection, reducing computational cost, interpretable deep learning models, task-driven importance estimation, and context incorporation. Through the study of four scenarios, we further advanced the field where ''attention is all you need''.
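The common mechanism running through these four scenarios, scoring regions and reweighting features so the model concentrates on the discriminative parts, can be sketched minimally. All names and toy data below are illustrative and are not the dissertation's actual models:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(features, query):
    """Weight each spatial region by its relevance to `query`, then pool:
    the model 'looks at' discriminative regions instead of the whole scene."""
    scores = features @ query            # one relevance score per region
    weights = softmax(scores)            # attention map, sums to 1
    pooled = weights @ features          # attention-weighted feature summary
    return weights, pooled

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))  # 6 image regions, 4-dim descriptors
query = rng.normal(size=4)          # task-dependent query vector
weights, pooled = attention_pool(features, query)
```

Visualizing `weights` over the input grid is, in spirit, how the attention maps discussed above make the model's decisions interpretable.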
Knowledge Transfer in Object Recognition.
PhD Thesis. Abstract
Object recognition is a fundamental and long-standing problem in computer vision. Since
the latest resurgence of deep learning, thousands of techniques have been proposed and brought
to commercial products to facilitate people's daily lives. Although remarkable achievements in
object recognition have been witnessed, existing machine learning approaches remain far from
the human visual system, especially in learning new concepts and in Knowledge Transfer (KT)
across scenarios. One main reason is that current learning approaches address isolated tasks
by independently training predefined models, without considering any knowledge learned from
previous tasks or models. In contrast, humans have an inherent ability to transfer the knowledge
acquired from earlier tasks or people to new scenarios. Therefore, to scale object recognition
to realistic deployments, effective KT schemes are required.
This thesis studies several aspects of KT for scaling object recognition systems. Specifically,
to facilitate the KT process, several mechanisms on fine-grained and coarse-grained object recognition
tasks are analyzed and studied, including 1) cross-class KT on person re-identification (re-id);
2) cross-domain KT on person re-identification; 3) cross-model KT on image classification;
4) cross-task KT on image classification. In summary, four types of knowledge transfer schemes
are discussed as follows:
Chapter 3 Cross-class KT in person re-identification (re-id), one of the representative fine-grained
object recognition tasks, is investigated first. The person identity classes in re-id are totally
disjoint between training and testing (a zero-shot learning problem), resulting in a high demand
for cross-class KT. To address this, existing person re-id approaches aim to derive a feature
representation for pairwise similarity-based matching and ranking that is able to generalise to
the test classes. However, current person re-id methods assume the provision of accurately
cropped person bounding boxes, each at the same resolution, ignoring the impact of background
noise and varying image scales on cross-class KT. This is more severe in practice, when person
bounding boxes must be detected automatically from a very large number of images and/or
videos (unconstrained scene images). To address these challenges, this chapter provides two
novel approaches aiming to promote cross-class KT and boost re-id performance. 1) This chapter
alleviates inaccurate person bounding boxes by developing a joint-learning deep model that
optimises re-id attention selection within any auto-detected person bounding box via reinforcement
learning of background-clutter minimisation. Specifically, this chapter formulates a novel unified
re-id architecture called Identity DiscriminativE Attention reinforcement Learning (IDEAL) to
accurately select re-id attention in auto-detected bounding boxes for optimising re-id performance.
2) This chapter addresses the multi-scale problem by proposing a Cross-Level Semantic Alignment
(CLSA) deep learning approach capable of learning more discriminative identity feature
representations in a unified end-to-end model. This is realised by exploiting the in-network
feature pyramid structure of a deep neural network, enhanced by a novel cross-pyramid-level
semantic alignment loss function. Extensive experiments show the modelling advantages and
performance superiority of both IDEAL and CLSA over state-of-the-art re-id methods on widely
used benchmark datasets.
Chapter 4 In this chapter, we address the problem of cross-domain KT in unsupervised
domain adaptation for person re-id. Specifically, this chapter considers cross-domain KT as
follows: 1) unsupervised domain adaptation, a “train once, run once” pattern, transferring
knowledge from a source domain to a specific target domain, with the model restricted to being
applied on that target domain only; 2) universal re-id, a “train once, run everywhere” pattern,
transferring knowledge from a source domain to any target domain, and therefore capable of
being deployed to any re-id domain. This chapter first develops a novel Hierarchical Unsupervised
Domain Adaptation (HUDA) method for unsupervised domain adaptation in re-id. It can
automatically transfer the labelled information of an existing dataset (a source domain) to an
unlabelled target domain for unsupervised person re-id. Specifically, HUDA is designed to jointly
model global distribution alignment and local instance alignment in a two-level hierarchy for
discovering transferable source knowledge in unsupervised domain adaptation. Crucially, this
approach aims to overcome the under-constrained learning problem of existing unsupervised
domain adaptation methods, which lack a local instance alignment constraint. The consequence
is more effective cross-domain KT from the labelled source domain to the unlabelled target
domain. This chapter further addresses the limitation of “train once, run once” in existing
domain adaptation person re-id approaches by presenting a novel “train once, run everywhere”
pattern. The conventional “train once, run once” pattern is unscalable to the large number of
target domains typically encountered in real-world deployments, due to the requirement of
training a separate model for each target domain, as with supervised learning methods. To
mitigate this weakness, a novel “Universal Model Learning” (UML) approach is formulated to
enable domain-generic person re-id using only limited training data from a “single” seed domain.
Specifically, UML trains a universal re-id model to discriminate between a set of transformed
person identity classes. Each such class is formed by applying a variety of random appearance
transformations to the images of that class, where the transformations simulate the camera
viewing conditions of arbitrary domains, making the model domain-generic.
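A minimal sketch of the random appearance transformation idea behind UML, using a simple brightness/contrast jitter as an illustrative stand-in for the actual transformations, which the abstract does not specify:

```python
import numpy as np

def random_appearance_transform(image, rng):
    """Randomly jitter brightness and contrast to mimic unseen camera
    conditions (a hypothetical stand-in for UML's transformations)."""
    contrast = rng.uniform(0.6, 1.4)
    brightness = rng.uniform(-0.2, 0.2)
    out = image * contrast + brightness
    return np.clip(out, 0.0, 1.0)  # keep valid intensity range

rng = np.random.default_rng(7)
person = rng.uniform(size=(8, 4))  # toy 8x4 grayscale person crop
views = [random_appearance_transform(person, rng) for _ in range(3)]
# Every transformed copy keeps the identity label of `person`, so the
# model learns features invariant to simulated camera conditions.
```

The key design point is that the transformed copies share one identity class, which is what pushes the learned representation toward domain-generic invariance.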
Chapter 5 The third problem considered in this thesis is cross-model KT in coarse-grained
object recognition. This chapter discusses knowledge distillation in image classification.
Knowledge distillation is an effective approach to transfer knowledge from a large teacher
neural network to a small student (target) network, satisfying low-memory and fast-inference
requirements. Whilst able to create stronger target networks than the vanilla non-teacher
learning strategy, this scheme additionally needs to train a large teacher model at considerable
computational cost, and requires complex multi-stage training. This chapter first presents
a Self-Referenced Deep Learning (SRDL) strategy to accelerate the training process. Unlike
both vanilla optimisation and knowledge distillation, SRDL distils the knowledge discovered
by the in-training target model back into itself to regularise the subsequent learning procedure,
thereby eliminating the need to train a large teacher model. Secondly, an On-the-fly Native
Ensemble (ONE) learning strategy for one-stage knowledge distillation is proposed to overcome
the weakness of complex multi-stage training. Specifically, ONE trains only a single multi-branch
network while simultaneously establishing a strong teacher on the fly to enhance the learning of
the target network.
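For context, the standard distillation objective that SRDL and ONE build on, a temperature-softened KL divergence between teacher and student outputs, can be sketched as a toy numpy version (this is the textbook formulation, not the thesis code):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled, numerically stable softmax."""
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened outputs;
    the T*T factor keeps gradient magnitudes comparable across T."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student's soft predictions
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

teacher = np.array([8.0, 2.0, 1.0])
student = np.array([5.0, 3.0, 2.0])
loss = distillation_loss(student, teacher)
# The loss shrinks to zero as the student's logits match the teacher's.
```

SRDL's twist is that the "teacher" logits come from an earlier snapshot of the same in-training model, and ONE's is that they come from an on-the-fly ensemble of the network's own branches.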
Chapter 6 Fourth, this thesis studies cross-task KT in coarse-grained object recognition.
This chapter focuses on the few-shot classification problem, which aims to train models capable
of recognising new, previously unseen categories from a novel task using only limited training
samples. Existing metric learning approaches constitute a highly popular strategy, learning
discriminative representations such that images containing different classes are well separated
in an embedding space. The commonly held assumption that each class is summarised by a
single, global representation (referred to as a prototype), which is then used as a reference to
infer class labels, brings significant drawbacks. This formulation fails to capture the complex
multi-modal latent distributions that often exist in real-world problems, and yields models that
are highly sensitive to prototype quality. To address these limitations, this chapter proposes a
novel Mixture of Prototypes (MP) approach that learns multi-modal class representations and
can be integrated into existing metric-based methods. MP models class prototypes as a group of
feature representations carefully designed to be highly diverse and to maximise ensembling
performance. Furthermore, this thesis investigates the benefit of incorporating unlabelled data
in cross-task KT, focusing on the problem of Semi-Supervised Few-shot Learning (SS-FSL).
Recent SS-FSL work has relied on popular Semi-Supervised Learning (SSL) concepts involving
iterative pseudo-labelling, yet often yields models that are susceptible to error propagation and
sensitive to initialisation. To address this limitation, this chapter introduces a novel
prototype-based approach (Fewmatch) for SS-FSL that exploits model Consistency Regularization
(CR) in a robust manner and promotes cross-task knowledge transfer from unlabelled data.
Fewmatch exploits unlabelled data via a Dynamic Prototype Refinement (DPR) approach, in which
novel class prototypes are alternately refined 1) explicitly, using unlabelled data with
high-confidence class predictions, and 2) implicitly, by model fine-tuning using a data-selective
model CR loss. DPR affords CR convergence, with the explicit refinement providing an increasingly
stronger initialisation and alleviating the issue of error propagation arising from the
application of CR.
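The single-prototype baseline that MP improves upon can be sketched as follows: one prototype per class (the mean of its support embeddings), with queries assigned to the nearest prototype. The toy embeddings are illustrative, not the thesis implementation:

```python
import numpy as np

def prototypes(support, labels, num_classes):
    """One prototype per class: the mean of its support embeddings."""
    return np.stack([support[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def classify(query, protos):
    """Assign the query to its nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(protos - query, axis=1)
    return int(np.argmin(d))

# Toy 2-way, 2-shot episode with 2-dim embeddings.
support = np.array([[0.0, 0.0], [0.2, 0.1],    # class 0 shots
                    [2.0, 2.0], [1.8, 2.1]])   # class 1 shots
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, num_classes=2)
pred = classify(np.array([0.1, 0.2]), protos)  # nearest to class-0 prototype
```

A single mean collapses any multi-modal class structure into one point, which is exactly the drawback MP addresses by keeping a diverse group of prototypes per class.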
Chapter 7 draws conclusions and suggests future work extending the ideas and methods
developed in this thesis.