235 research outputs found

    A Commentary on the Unsupervised Learning of Disentangled Representations

    The goal of unsupervised learning of disentangled representations is to separate the independent explanatory factors of variation in the data without access to supervision. In this paper, we summarize the results of Locatello et al. (2019) and focus on their implications for practitioners. We discuss the theoretical result showing that unsupervised learning of disentangled representations is fundamentally impossible without inductive biases, and the practical challenges this entails. Finally, we comment on our experimental findings, highlighting the limitations of state-of-the-art approaches and directions for future research.

    Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

    The self-media era provides a tremendous volume of high-quality videos. Unfortunately, frequent video copyright infringement is now seriously damaging the interests and enthusiasm of video creators, so identifying infringing videos is a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional, mixed video features into deep neural networks and count on the networks to extract useful representations. Despite its simplicity, this paradigm relies heavily on the original entangled features and lacks constraints guaranteeing that useful, task-relevant semantics are extracted from them. In this paper, we tackle these challenges from two aspects: (1) we disentangle the original high-dimensional feature into multiple exclusive, lower-dimensional sub-features, which are expected to encode non-overlapping semantics of the original feature and to remove redundant information; (2) on top of the disentangled sub-features, we learn an auxiliary feature that further enhances them. We theoretically analyze the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature. Extensive experiments on two large-scale benchmark datasets (SVD and VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets a new state of the art on the VCSL benchmark. Our code and model have been released at https://github.com/yyyooooo/DMI/, and we hope they contribute to the community. Comment: This paper is accepted by ACM MM 202
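    To make the two-step idea above concrete, the following is a minimal, illustrative sketch, not the authors' released DMI implementation: an entangled video-level feature is projected into several lower-dimensional sub-features, and the model is trained with a task loss plus a simple redundancy penalty that stands in for the paper's mutual-information constraint. All module and parameter names (FeatureDisentangler, num_sub, sub_dim, the 0.1 weight) are assumptions made for illustration.

# Illustrative sketch only: split an entangled feature into sub-features and
# discourage redundancy between them; not the authors' actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDisentangler(nn.Module):
    def __init__(self, in_dim=2048, sub_dim=128, num_sub=4, num_classes=2):
        super().__init__()
        # one projection head per sub-feature
        self.heads = nn.ModuleList([nn.Linear(in_dim, sub_dim) for _ in range(num_sub)])
        self.classifier = nn.Linear(sub_dim * num_sub, num_classes)

    def forward(self, x):
        # each head produces one normalized sub-feature of shape (B, sub_dim)
        subs = [F.normalize(h(x), dim=-1) for h in self.heads]
        logits = self.classifier(torch.cat(subs, dim=-1))
        return subs, logits

def redundancy_penalty(subs):
    # penalise overlap between sub-features via pairwise cosine similarity;
    # a simple stand-in for the mutual-information-based constraint in the paper
    loss = 0.0
    for i in range(len(subs)):
        for j in range(i + 1, len(subs)):
            loss = loss + (subs[i] * subs[j]).sum(dim=-1).abs().mean()
    return loss

# toy usage
model = FeatureDisentangler()
x = torch.randn(8, 2048)        # a batch of entangled video-level features
y = torch.randint(0, 2, (8,))   # infringing / non-infringing labels
subs, logits = model(x)
loss = F.cross_entropy(logits, y) + 0.1 * redundancy_penalty(subs)
loss.backward()

    The released repository linked in the abstract is the authoritative reference for the actual architecture and loss.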

    Representation learning for generalisation in medical image analysis

    To help diagnose, treat, manage, prevent and predict diseases, medical image analysis plays an increasingly crucial role in modern health care. In particular, using machine learning (ML) and deep learning (DL) techniques to process medical imaging data such as MRI, CT and X-ray scans has been a hot research topic. Accurate and generalisable medical image segmentation with ML and DL is one of the most challenging medical image analysis tasks. The challenges stem mainly from two sources: a) variations in data statistics across different clinical centres or hospitals, and b) the lack of extensive annotations of medical data. One of the most promising ways to tackle these challenges is to learn disentangled representations, which aim to separate out, or disentangle, the underlying explanatory generative factors into disjoint subsets. Importantly, disentangled representations can be learnt efficiently from raw training data with limited annotations. Although disentangled representation learning is well suited to these challenges, several problems in this area remain open. First, no work has systematically studied how much disentanglement is achieved under different learning and design biases, or how those biases affect task performance on medical data. Second, the benefit of leveraging disentanglement to design models that generalise well to new data has not been well studied, especially in the medical domain. Finally, the independence prior used for disentanglement is too strong an assumption and does not approximate the true generative factors well. Accordingly, this thesis focuses on understanding the role of disentanglement in medical image analysis, measuring how different biases affect disentanglement and task performance, and finally using disentangled representations to improve generalisation and to explore better representations beyond disentanglement.

    In the medical domain, content-style disentanglement is one of the most effective frameworks for learning disentangled representations. It disentangles and encodes image “content” into a spatial tensor, and image appearance or “style” into a vector that captures imaging characteristics. Based on an extensive review of disentanglement, I conclude that it is unclear how different design and learning biases affect the performance of content-style disentanglement methods. Hence, two metrics are proposed to measure the degree of content-style disentanglement by evaluating the informativeness and correlation of the representations. By modifying the design and learning biases in three popular content-style disentanglement models, the degree of disentanglement and the task performance of different model variants are evaluated. A key conclusion is that there is a sweet spot between task performance and the degree of disentanglement; finding this sweet spot is the key to designing disentanglement models.

    Generalising deep models to new data from new centres (termed here domains) remains a challenge, largely because of shifts in data statistics (domain shifts) between source and unseen domains. Building on the findings of the disentanglement metrics study, I design two content-style disentanglement approaches for generalisation. First, I propose two data augmentation methods that improve generalisation, as sketched below. The Resolution Augmentation method generates more diverse data by rescaling images to different resolutions. The Factor-based Augmentation method generates more diverse data by projecting the original samples onto disentangled latent spaces and combining the learned content and style factors from different domains. To learn more generalisable representations, I then integrate gradient-based meta-learning with disentanglement. Gradient-based meta-learning splits the training data into meta-train and meta-test sets to simulate and handle domain shifts during training, and has shown superior generalisation performance. Given the limited annotations available, I propose a novel semi-supervised meta-learning framework with disentanglement that explicitly models the representations related to domain shifts. Disentangling these representations and recombining them to reconstruct the input image allows unlabeled data to be used to better approximate the true domain shifts within a meta-learning setting.

    Humans can quickly learn to accurately recognise anatomy of interest in medical images with limited guidance, and this ability generalises readily to new images from different clinical centres and to new tasks in other contexts. Such rapid and generalisable learning is largely attributed to the compositional structure of image patterns in the human brain, which has seldom been incorporated in the medical domain. In this thesis, I explore how compositionality can be applied to learn more interpretable and generalisable representations. I propose that the ground-truth generative factors that generate medical images satisfy a compositional equivariance property; hence, a good representation that approximates the ground-truth factors well has to be compositionally equivariant. By modelling compositional representations with learnable von Mises-Fisher kernels, I explore how different design and learning biases can be used to enforce representations to be more compositionally equivariant under different learning settings.

    Overall, this thesis creates new avenues for further research on generalisable representation learning in medical image analysis, which we believe is key to more generalisable machine learning and deep learning solutions in healthcare. In particular, the proposed metrics can guide future work on designing better content-style frameworks. The disentanglement-based meta-learning approach sheds light on leveraging meta-learning for better model generalisation in a low-data regime. Finally, we believe compositional representation learning will play an increasingly important role in designing more generalisable and interpretable models in the future.
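    As a purely illustrative example of the Factor-based Augmentation idea described above, the sketch below recombines content and style factors from two domains to synthesise new samples. The encoder and decoder modules, their shapes, and their names (ContentEncoder, StyleEncoder, Decoder) are placeholder assumptions, not the thesis's actual architecture.

# Illustrative sketch only: render the content of domain-A images with the
# style of domain-B images to create augmented training samples.
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):   # image -> spatial "content" tensor
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(1, 8, kernel_size=3, padding=1)
    def forward(self, x):
        return self.net(x)

class StyleEncoder(nn.Module):     # image -> "style" vector of imaging characteristics
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(1, 16))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):          # (content, style) -> reconstructed image
    def __init__(self):
        super().__init__()
        self.out = nn.Conv2d(8 + 16, 1, kernel_size=3, padding=1)
    def forward(self, content, style):
        b, _, h, w = content.shape
        # broadcast the style vector over spatial positions and fuse with content
        style_map = style.view(b, -1, 1, 1).expand(b, style.shape[1], h, w)
        return self.out(torch.cat([content, style_map], dim=1))

content_enc, style_enc, dec = ContentEncoder(), StyleEncoder(), Decoder()
x_a = torch.randn(4, 1, 64, 64)    # images from domain A
x_b = torch.randn(4, 1, 64, 64)    # images from domain B
# swap factors across domains: content of A rendered with the style of B
augmented = dec(content_enc(x_a), style_enc(x_b))

    In practice the encoders and decoder would be trained with reconstruction and disentanglement objectives before the swapped recombinations are used as additional training data.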

    Towards a common theory of explanation for artificial and biological intelligence

    Much of the confusion that occurs when working at the intersection of cognitive science, artificial intelligence, and neuroscience stems from disagreement about what it means to explain intelligence. I claim that to integrate these fields, we must reconcile their different theories of explanation. I briefly review theories of scientific explanation in neuroscience and recontextualize the stated views of several prominent cognitive computational neuroscientists in terms of the theories of explanation they espouse. Finally, I describe some of the challenges of forging a new theory of explanation that would apply equally to artificial and biological intelligence. As a first step towards an integration of research on biological and artificial intelligence, my goal in writing this paper is to equip scientists of intelligence to interrogate and justify the theories of explanation that underlie their definitions of scientific progress.
