A Commentary on the Unsupervised Learning of Disentangled Representations
The goal of the unsupervised learning of disentangled representations is to
separate the independent explanatory factors of variation in the data without
access to supervision. In this paper, we summarize the results of Locatello et
al., 2019, and focus on their implications for practitioners. We discuss the
theoretical result showing that the unsupervised learning of disentangled
representations is fundamentally impossible without inductive biases and the
practical challenges it entails. Finally, we comment on our experimental
findings, highlighting the limitations of state-of-the-art approaches and
directions for future research.
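A small numerical sketch may help with the intuition behind the impossibility claim summarized above. It covers only a special case, chosen here for illustration and not taken from the abstract: two independent standard-normal factors rotated by 45 degrees. The rotated code has the same marginal statistics as the original, so unsupervised data alone cannot tell the two apart, yet each rotated coordinate mixes both original factors. The function names and the sample size are assumptions made for this example.

```python
import math
import random

def sample_latents(n, seed=0):
    """Draw n pairs of independent standard-normal factors (assumed prior)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

def rotate(points, angle):
    """Rotate every 2-D point by the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

z = sample_latents(20000)
z_rot = rotate(z, math.pi / 4)
z1, z2 = zip(*z)
r1, r2 = zip(*z_rot)

# Rotated coordinates still look like independent unit-variance factors ...
print("var(r1), var(r2), cov(r1, r2):",
      covariance(r1, r1), covariance(r2, r2), covariance(r1, r2))
# ... but r1 depends on BOTH original factors, i.e. it is entangled.
print("cov(r1, z1), cov(r1, z2):", covariance(r1, z1), covariance(r1, z2))
```

Both codes are equally consistent with the observed data distribution in this toy setting, which is why some inductive bias or supervision is needed to prefer one over the other.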
Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization
The self-media era provides us with a tremendous number of high-quality videos. Unfortunately,
frequent video copyright infringements are now seriously damaging the interests
and enthusiasm of video creators. Identifying infringing videos is therefore a
compelling task. Current state-of-the-art methods tend to simply feed
high-dimensional mixed video features into deep neural networks and count on
the networks to extract useful representations. Despite its simplicity, this
paradigm heavily relies on the original entangled features and lacks
constraints guaranteeing that useful task-relevant semantics are extracted from
the features.
In this paper, we seek to tackle the above challenges from two aspects: (1)
We propose to disentangle an original high-dimensional feature into multiple
sub-features, explicitly disentangling the feature into exclusive
lower-dimensional components. We expect the sub-features to encode
non-overlapping semantics of the original feature and remove redundant
information.
(2) On top of the disentangled sub-features, we further learn an auxiliary
feature to enhance the sub-features. We theoretically analyze the mutual
information between the label and the disentangled features, arriving at a loss
that maximizes the extraction of task-relevant information from the original
feature.
Extensive experiments on two large-scale benchmark datasets (i.e., SVD and
VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the large-scale
SVD dataset and also sets the new state-of-the-art on the VCSL benchmark
dataset. Our code and model have been released at
https://github.com/yyyooooo/DMI/, hoping to contribute to the community.
Comment: This paper is accepted by ACM MM 202
Representation learning for generalisation in medical image analysis
To help diagnose, treat, manage, prevent and predict diseases, medical image analysis plays an
increasingly crucial role in modern health care. In particular, using machine learning (ML) and
deep learning (DL) techniques to process medical imaging data such as MRI, CT and X-ray
scans has become a hot research topic. Accurate and generalisable medical image segmentation
using ML and DL is one of the most challenging medical image analysis tasks. The challenges
stem mainly from two causes: a) the variation of data statistics across different clinical centres or hospitals, and b) the lack of extensive annotations of medical data.
To tackle the above challenges, one of the best ways is to learn disentangled representations.
Learning disentangled representations aims to separate out, or disentangle, the underlying explanatory generative factors into disjoint subsets. Importantly, disentangled representations can be efficiently learnt from raw training data with limited annotations. Although it is evident
that learning disentangled representations is well suited to these challenges, there are several
open problems in this area. First, no work has systematically studied how much disentanglement is achieved with different learning and design biases, or how different biases affect task performance for medical data. Second, the benefit of leveraging disentanglement to design models that generalise well to new data has not been well studied, especially in the medical domain. Finally, the independence prior for disentanglement is too strong an assumption and does not approximate the true generative factors well. To address these problems, this thesis focuses on understanding the role of disentanglement in medical image analysis, measuring how different biases affect disentanglement and task performance, and then finally using disentangled representations to improve generalisation performance and exploring better representations beyond disentanglement.
In the medical domain, content-style disentanglement is one of the most effective frameworks
to learn disentangled representations. It disentangles and encodes image “content” into a spatial
tensor, and image appearance or “style” into a vector that contains information on imaging characteristics. Based on an extensive review of disentanglement, I conclude that it is unclear how different design and learning biases affect the performance of content-style disentanglement methods. Hence, two metrics are proposed to measure the degree of content-style disentanglement by evaluating the informativeness and correlation of representations. By modifying the design and learning biases in three popular content-style disentanglement models, the degree of disentanglement and task performance of different model variants have been evaluated. A key conclusion is that there exists a sweet spot between task performance and the degree of disentanglement; achieving this sweet spot is the key to designing disentanglement models.
Generalising deep models to new data from new centres (termed here domains) remains a challenge. This is largely attributed to shifts in data statistics (domain shifts) between source and unseen domains. Building on the findings of the aforementioned disentanglement metrics study, I design two content-style disentanglement approaches for generalisation. First, I propose two data augmentation methods that improve generalisation. The Resolution Augmentation method generates more diverse data by rescaling images to different resolutions. The Factor-based Augmentation method then generates more diverse data by projecting the original samples onto disentangled latent spaces and combining the learned content and style factors from different domains. Second, to learn more generalisable representations, I integrate gradient-based meta-learning into disentanglement. Gradient-based meta-learning splits the training data into meta-train and meta-test sets to simulate and handle domain shifts during training, and has shown superior generalisation performance. Given the limited annotations available, I propose a novel semi-supervised meta-learning framework with disentanglement, in which I explicitly model the representations related to domain shifts. Disentangling the representations and combining them to reconstruct the input image allows unlabeled data to be used to better approximate the true domain shifts within a meta-learning setting.
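The meta-train/meta-test split mentioned above can be pictured with a short sketch. The abstract does not say how the split is drawn, so this assumes the common choice of splitting by source domain at each training episode; the domain names, `split_domains`, and `episodes` are hypothetical and used only for illustration.

```python
import random

def split_domains(domains, n_meta_test, rng):
    """Randomly hold out n_meta_test source domains as the meta-test set.

    Assumption: the split is made per domain, so the held-out domains
    simulate an unseen-domain shift during training.
    """
    shuffled = list(domains)
    rng.shuffle(shuffled)
    meta_test = shuffled[:n_meta_test]
    meta_train = shuffled[n_meta_test:]
    return meta_train, meta_test

def episodes(domains, n_meta_test, n_episodes, seed=0):
    """Yield one (meta_train, meta_test) split per training episode."""
    rng = random.Random(seed)
    for _ in range(n_episodes):
        yield split_domains(domains, n_meta_test, rng)

# Hypothetical source domains (e.g. clinical centres).
source_domains = ["centre_A", "centre_B", "centre_C", "centre_D"]
for meta_train, meta_test in episodes(source_domains, n_meta_test=1, n_episodes=3):
    print("meta-train:", meta_train, "| meta-test:", meta_test)
```

In a gradient-based scheme, the model would typically be updated on the meta-train domains and then evaluated on the held-out meta-test domain within the same episode, with both losses contributing to the final update; that optimisation step is omitted here.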
Humans can quickly learn to accurately recognise anatomy of interest from medical images
with limited guidance. Such recognition ability can easily generalise to new images from different clinical centres and new tasks in other contexts. This rapid and generalisable learning
ability is mostly due to the compositional structure of image patterns in the human brain, which is rarely exploited in the medical domain. In this thesis, I explore how compositionality can be applied to learning more interpretable and generalisable representations. Overall, I propose that the ground-truth generative factors that generate the medical images satisfy the compositional equivariance property. Hence, a good representation that approximates the ground-truth factors well has to be compositionally equivariant. By modelling the compositional representations with learnable von Mises-Fisher kernels, I explore how different design and learning biases can be used to make the representations more compositionally equivariant under different learning settings.
Overall, this thesis creates new avenues for further research in the area of generalisable representation learning in medical image analysis, which we believe are key to more generalised machine learning and deep learning solutions in healthcare. In particular, the proposed metrics can be used to guide future work on designing better content-style frameworks. The disentanglement-based meta-learning approach sheds light on leveraging meta-learning for better model generalisation in a low-data regime. Finally, we believe compositional representation learning will play an increasingly important role in designing more generalisable and interpretable models in the future.
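As a rough illustration of what a von Mises-Fisher (vMF) kernel computes, the sketch below scores a feature vector against a set of kernel mean directions. It relies on the standard vMF form, where the density on the unit sphere is proportional to exp(kappa * mu^T z). The abstract gives no formulation, so the shared fixed concentration `kappa`, the equal mixing weights, and the function names are all assumptions; with a shared `kappa` the normalising constant cancels in the softmax.

```python
import math

def normalise(v):
    """Scale a vector to unit L2 norm so that it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def vmf_activations(feature, kernel_means, kappa):
    """Posterior-style assignment of one feature to each vMF kernel.

    Assumes equal mixing weights and one shared concentration kappa, so the
    assignment is a softmax over kappa * cosine similarity.
    """
    z = normalise(feature)
    logits = [
        kappa * sum(a * b for a, b in zip(z, normalise(mu)))
        for mu in kernel_means
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical kernel directions and one feature aligned with the first.
kernels = [[1.0, 0.0], [0.0, 1.0]]
print(vmf_activations([2.0, 0.0], kernels, kappa=10.0))
```

In a learnable setting the kernel means would be trainable parameters and the feature would come from an encoder; both are replaced by fixed lists here to keep the example self-contained.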
Towards a common theory of explanation for artificial and biological intelligence
Much of the confusion that occurs when working at the intersection of cognitive science, artificial intelligence, and neuroscience stems from disagreement about what it means to explain intelligence. I claim that to integrate these fields, we must reconcile their different theories of explanation. I briefly review theories of scientific explanation in neuroscience and recontextualize the stated views of several prominent cognitive computational neuroscientists in terms of the theories of explanation they espouse. Finally, I describe some of the challenges of forging a new theory of explanation that would apply equally to artificial and biological intelligence. As a first step towards an integration of research on biological and artificial intelligence, my goal in writing this paper is to equip scientists of intelligence to interrogate and justify the theories of explanation that underlie their definitions of scientific progress.