
    Unsupervised Deep Learning of Visual Representations

    Interpreting visual signals from complex imagery and video data with few or no human annotations is challenging yet essential for realising the true value of deep learning techniques in real-world scenarios. During the past decade, deep learning has achieved unprecedented breakthroughs in many computer vision fields. Nonetheless, to optimise the large number of parameters in deep neural networks for deriving complex mappings from an input visual space to a discriminative feature representational space, deep learning relies heavily on massive amounts of human-annotated training data. Collecting such manual annotations is labour-intensive, especially at the large scale that has proven critical to learning generalisable models applicable to new and unseen data. This dramatically limits the usability and scalability of deep learning in practice. This thesis aims to reduce the reliance of deep neural networks on exhaustive human annotations by proposing novel algorithms to learn the underlying visual semantics with insufficient or inadequate manual labels, denoted as generalised unsupervised learning. Based on different assumptions about the available sources of knowledge used for learning, this thesis studies generalised unsupervised deep learning from four perspectives: learning without any labels by knowledge aggregation from local data structure and knowledge discovery from global data structure, transferring knowledge from relevant labels, and propagating knowledge from incomplete labels. Specifically, novel methods are introduced to address unresolved challenges in these problems as follows.

    Chapter 3: The first problem is aggregating knowledge from local data structure, which assumes that apparent visual similarities (pixel intensity) among images are encoded in local neighbourhoods in a feature representational space, partially revealing the underlying semantic relationships among samples. This thesis studies discriminative representation learning in this problem, aiming to derive visual features that are discriminative in terms of images' semantic class memberships. The problem is challenging because, without ground-truth labels, it is scarcely possible to accurately determine reliable neighbourhoods encoding the same underlying class concepts, given the arbitrarily complex appearance patterns and variations both within and across classes. Existing methods learning from hypothetical inter-sample relationships are prone to error propagation, as incorrect pairwise supervision tends to accumulate across training and degrade the learned representations. To that end, this thesis proposes to progressively discover sample-anchored (centred) neighbourhoods to reason about and learn the underlying semantic relationships among samples iteratively and accumulatively. Moreover, a novel progressive affinity diffusion process is presented to propagate reliable inter-sample relationships across adjacent neighbourhoods, so as to further distinguish within-class visual variation from between-class similarity and bridge the gap between low-level imagery appearance (e.g. pixel intensity) and high-level semantic concepts (e.g. object class memberships).
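    To make the affinity diffusion idea concrete, the following is a minimal, hypothetical sketch of propagating pairwise similarities across adjacent neighbourhoods on a kNN graph. It is an illustration of the general principle, not the thesis' exact algorithm; the neighbourhood size k, the diffusion weight alpha, and the random toy features are assumptions.

```python
import numpy as np

def knn_affinity(features, k=5):
    """Build a sparse, row-normalised kNN affinity matrix from L2-normalised features."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                      # cosine similarities
    np.fill_diagonal(sim, -np.inf)     # exclude self-matches
    A = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nn = np.argsort(sim[i])[-k:]   # k nearest neighbours of sample i
        A[i, nn] = np.maximum(sim[i, nn], 0)
    A = np.maximum(A, A.T)             # symmetrise the neighbourhood graph
    row_sums = A.sum(axis=1, keepdims=True) + 1e-12
    return A / row_sums

def diffuse_affinities(A, alpha=0.9, n_steps=20):
    """Iteratively spread affinities across adjacent neighbourhoods
    (a standard graph-diffusion recurrence, used here only as an illustration)."""
    S = np.eye(A.shape[0])
    for _ in range(n_steps):
        S = alpha * A @ S + (1 - alpha) * np.eye(A.shape[0])
    return S

# Toy usage: 100 random 32-d "features" standing in for CNN embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 32))
S = diffuse_affinities(knn_affinity(feats, k=5))
# High entries of S connect samples linked through chains of overlapping
# neighbourhoods, not only direct nearest neighbours.
```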
    Chapter 4: The second problem is discovering knowledge from global data structure, which assumes that visual similarity among samples of the same semantic class is generally higher than that among samples of different classes. This thesis investigates deep clustering for solving this problem, which simultaneously learns visual features and data groupings without any labels. Existing unsupervised deep learning algorithms fail to benefit from joint representation and partition learning, either by overlooking global class memberships (e.g. contrastive representation learning) or by relying on unreliable pseudo-labels estimated from feature representations that are still being updated and are hence subject to error propagation during training. To let image clustering benefit from the discriminative visual features derived by a representation learning process, a Semantic Contrastive Learning method is proposed in this thesis, which concurrently optimises both instance visual similarities and cluster decision boundaries to reason about the hypotheses of semantic classes by their consensus. Moreover, based on the observation that assigning visually similar samples to different clusters implicitly reduces both intra-cluster compactness and inter-cluster diversity and leads to lower partition confidence, this thesis presents an online deep clustering method named PartItion Confidence mAximisation. It is built on the idea of learning the most semantically plausible data separation by maximising the "global" partition confidence of the clustering solution using a novel differentiable partition uncertainty index.
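    As an illustration of the partition-confidence idea, the sketch below uses one simplified differentiable surrogate: making the soft cluster-assignment vectors of different clusters as close to orthogonal as possible, so that each sample is confidently assigned to a single cluster. This is not the exact partition uncertainty index defined in the thesis; the clustering-head logits, batch size, cluster count, and balance term are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def partition_uncertainty(probs, eps=1e-8):
    """A simplified differentiable partition-uncertainty surrogate.

    probs: (N, K) soft cluster assignments (rows sum to 1).
    Treat each column as a cluster's assignment-statistics vector and penalise
    the average cosine similarity between different clusters: the more one-hot
    and better separated the assignments, the lower this value.
    """
    cols = F.normalize(probs.t(), dim=1, eps=eps)      # (K, N), unit-norm rows
    cos = cols @ cols.t()                              # (K, K) pairwise cosine
    K = cos.size(0)
    off_diag = cos - torch.diag(torch.diagonal(cos))   # zero out the diagonal
    return off_diag.sum() / (K * (K - 1))

def clustering_loss(logits, balance_weight=1.0):
    """Maximise partition confidence while discouraging degenerate solutions
    where all samples collapse into one cluster."""
    probs = F.softmax(logits, dim=1)
    uncertainty = partition_uncertainty(probs)
    marginal = probs.mean(dim=0)                       # cluster usage frequencies
    balance = (marginal * (marginal + 1e-8).log()).sum()  # negative entropy
    return uncertainty + balance_weight * balance

# Toy usage with random logits standing in for a clustering head's output.
logits = torch.randn(256, 10, requires_grad=True)      # 256 samples, 10 clusters
loss = clustering_loss(logits)
loss.backward()                                         # gradients flow to the logits
```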
    Chapter 5: The third problem is transferring knowledge from relevant labels, which assumes the availability of manual labels in relevant domains and the existence of common knowledge shared across domains. This thesis studies transfer clustering in this problem, which aims at learning the semantic class memberships of unlabelled data in a novel (target) domain by knowledge transfer from a labelled source domain. Whilst enormous efforts have been devoted to data annotation during the past decade, accumulating knowledge from existing labelled data to help understand persistently emerging unlabelled data is intuitively more efficient than exhaustively annotating new data. However, given the unpredictably changing nature of imagery data distributions, the accumulated pre-learned knowledge does not transfer well without strong assumptions about the labelled source and the novel target domains, as made, for example, in domain adaptation, zero-shot and few-shot learning. To address this problem and effectively transfer knowledge between domains that differ in both data distributions and label spaces, this thesis proposes a self-SUPervised REMEdy method to align knowledge across domains by learning jointly from the intrinsically available relative (pairwise) imagery information in the unlabelled target domain and the prior knowledge learned from the labelled source domain, so as to benefit from both transfer and self-supervised learning.

    Chapter 6: The last problem is propagating knowledge from incomplete labels, with the assumption that incomplete labels (e.g. collective or inexact) are usually easier to collect and more readily available, but tend to be less reliable. This thesis investigates video activity localisation in this problem, which aims to locate a short moment (video segment) in an untrimmed and unstructured video according to a natural language query. To derive discriminative representations of video segments that accurately match with sentences, temporal annotations of the precise start/end frame indices of each target moment are usually required. However, such temporal labels are not only harder to collect than video-sentence pairs, as they require carefully going through videos frame-by-frame, but are also subject to labelling uncertainty due to the intrinsic ambiguity of a video activity's boundaries. To reduce the annotation cost of deriving universal visual-textual correlations, a Cross-sentence Relations Mining method is introduced in this thesis to align video segments and query sentences when only a paragraph description of the activities in a video (a collective label) is available, but not per-sentence temporal labels. This is accomplished by exploring cross-sentence relationships within a paragraph as constraints to better interpret and match complex moment-wise temporal and semantic relationships in videos. Moreover, this thesis also studies the problem of propagating knowledge whilst avoiding the negative impact of inexact labels. To that end, an Elastic Moment Bounding method is proposed, which accommodates flexible and adaptive activity temporal boundaries towards modelling universal video-text correlations with tolerance to the underlying temporal uncertainties in pre-fixed human annotations.
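    As one hypothetical, much-simplified example of a cross-sentence constraint (not the constraints actually mined by the thesis), the sketch below penalises video groundings that contradict the order in which sentences appear in the paragraph. The prediction format (normalised moment centres) and the margin are assumptions for illustration only.

```python
import torch

def temporal_order_constraint(pred_centres, margin=0.05):
    """A toy cross-sentence constraint: sentences appearing earlier in the
    paragraph should, on average, ground to earlier moments in the video.

    pred_centres: (S,) predicted moment centres (normalised to [0, 1]) for the
    S sentences of one paragraph, in their original paragraph order.
    Penalises pairs whose predicted order contradicts the sentence order.
    """
    diffs = pred_centres.unsqueeze(0) - pred_centres.unsqueeze(1)  # (S, S): c_j - c_i
    later = torch.triu(torch.ones_like(diffs), diagonal=1)         # mask where j > i
    # Hinge penalty when a later sentence is grounded earlier than an earlier one.
    return (later * torch.clamp(margin - diffs, min=0)).sum() / later.sum()

# Toy usage: four sentences, the third grounded out of order.
centres = torch.tensor([0.10, 0.35, 0.20, 0.80], requires_grad=True)
penalty = temporal_order_constraint(centres)
penalty.backward()   # gradients push the out-of-order prediction later in time
```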

    Image Tagging using Modified Association Rule based on Semantic Neighbors

    With the rapid development of the internet, mobile devices, and social image-sharing websites, a large number of images are generated daily. This huge repository of images poses challenges for image retrieval systems. On image-sharing social websites such as Flickr, users can assign keywords/tags to images to describe their content. These tags play an important role in an image retrieval system. However, user-assigned tags are highly personalized, which brings many challenges for image retrieval. Thus, it is necessary to suggest appropriate tags for images. Existing methods for tag recommendation based on nearest neighbors ignore the relationships between tags. In this paper, a method is proposed for image tag recommendation based on semantic neighbors using a modified association rule. Given an image, the method identifies its semantic neighbors using a random forest based on the weight assigned to each category. The tags associated with the semantic neighbors are used as candidate tags. The candidate tags are expanded by mining tags using modified association rules, where each semantic neighbor is considered a transaction. In the modified association rules, the probability of each tag is calculated using TF-IDF and the confidence value. Experiments are conducted on the Flickr, NUS-WIDE, and Corel-5k datasets. The proposed method gives better performance compared to existing tag recommendation methods.
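    A minimal sketch of the candidate-tag scoring scheme is given below with hypothetical inputs: each semantic neighbor is treated as a transaction of tags, candidate tags are gathered from the neighbors, and each candidate is scored by combining a TF-IDF-style weight with the confidence of query-tag to candidate-tag rules. This illustrates the general idea only, not the paper's exact formulation or weighting.

```python
import math
from collections import Counter

def recommend_tags(neighbour_tags, query_tags, top_n=5):
    """Score candidate tags from semantic neighbors using TF-IDF and
    association-rule confidence (each neighbor = one transaction of tags)."""
    n_trans = len(neighbour_tags)
    # Document frequency of every tag across the neighbor transactions.
    df = Counter(tag for tags in neighbour_tags for tag in set(tags))
    scores = {}
    for cand in df:
        if cand in query_tags:
            continue
        # TF-IDF style weight: frequency among neighbors, damped by commonness.
        tf = df[cand] / n_trans
        idf = math.log(n_trans / df[cand]) + 1.0
        # Confidence of the rule {query tag} -> {candidate}: co-occurrence rate.
        confidences = []
        for q in query_tags:
            support_q = sum(1 for t in neighbour_tags if q in t)
            if support_q == 0:
                continue
            support_both = sum(1 for t in neighbour_tags if q in t and cand in t)
            confidences.append(support_both / support_q)
        conf = max(confidences) if confidences else 0.0
        scores[cand] = tf * idf * (0.5 + 0.5 * conf)   # blend; weights are illustrative
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy usage: five neighbor images with their tags, and one initial query tag.
neighbours = [["beach", "sea", "sunset"], ["sea", "boat"], ["beach", "sand", "sea"],
              ["sunset", "sky"], ["beach", "sea", "sky"]]
print(recommend_tags(neighbours, query_tags={"sea"}))
```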

    Unsupervised Deep Learning by Neighbourhood Discovery

    Deep convolutional neural networks (CNNs) have demonstrated remarkable success in computer vision by learning strong visual feature representations in a supervised manner. However, training CNNs relies heavily on the availability of exhaustive training data annotations, significantly limiting their deployment and scalability in many application scenarios. In this work, we introduce a generic unsupervised deep learning approach for training deep models without the need for any manual label supervision. Specifically, we progressively discover sample anchored/centred neighbourhoods to reason about and learn the underlying class decision boundaries iteratively and accumulatively. Every single neighbourhood is specially formulated so that all the member samples share the same unseen class labels with high probability, facilitating the extraction of class-discriminative feature representations during training. Experiments on image classification show the performance advantages of the proposed method over state-of-the-art unsupervised learning models on six benchmarks covering both coarse-grained and fine-grained object image categorisation. Comment: 36th International Conference on Machine Learning (ICML'19). Code is available at https://github.com/Raymond-sci/AN
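    The core selection step, picking small anchored neighbourhoods whose members are consistent enough to plausibly share a class label, can be sketched as follows. The consistency measure (mean pairwise cosine similarity within a sample's k nearest neighbours) and the threshold are illustrative assumptions rather than the paper's exact criterion; for the actual method, see the code repository linked above.

```python
import numpy as np

def discover_neighbourhoods(features, k=4, consistency_threshold=0.6):
    """Return anchored neighbourhoods (anchor index -> member indices) whose
    members are mutually similar enough to be treated as pseudo-positives."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)
    neighbourhoods = {}
    for anchor in range(f.shape[0]):
        members = np.argsort(sim[anchor])[-k:]               # k nearest neighbours
        group = np.concatenate(([anchor], members))
        pair_sims = f[group] @ f[group].T                    # similarities within the group
        n = len(group)
        consistency = (pair_sims.sum() - n) / (n * (n - 1))  # mean off-diagonal similarity
        if consistency >= consistency_threshold:             # keep only reliable groups
            neighbourhoods[anchor] = members.tolist()
    return neighbourhoods

# Toy usage: embeddings drawn from two well-separated clusters.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 0.1, (20, 16)) + 1.0,
                   rng.normal(0, 0.1, (20, 16)) - 1.0])
groups = discover_neighbourhoods(feats)
# Members of each discovered neighbourhood would be pulled together during
# training, e.g. by treating them as sharing the same (unknown) class label.
print(len(groups), "reliable neighbourhoods found")
```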

    Article Segmentation in Digitised Newspapers

    Digitisation projects preserve and make available vast quantities of historical text. Among these, newspapers are an invaluable resource for the study of human culture and history. Article segmentation identifies each region in a digitised newspaper page that contains an article. Digital humanities, information retrieval (IR), and natural language processing (NLP) applications over digitised archives improve access to text and allow automatic information extraction. The lack of article segmentation impedes these applications. We contribute a thorough review of the existing approaches to article segmentation. Our analysis reveals divergent interpretations of the task, and inconsistent and often ambiguously defined evaluation metrics, making comparisons between systems challenging. We solve these issues by contributing a detailed task definition that examines the nuances and intricacies of article segmentation that are not immediately apparent. We provide practical guidelines on handling borderline cases and devise a new evaluation framework that allows insightful comparison of existing and future approaches. Our review also reveals that the lack of large datasets hinders meaningful evaluation and limits machine learning approaches. We solve these problems by contributing a distant supervision method for generating large datasets for article segmentation. We manually annotate a portion of our dataset and show that our method produces article segmentations over characters nearly as well as costly human annotators. We reimplement the seminal textual approach to article segmentation (Aiello and Pegoretti, 2006) and show that it does not generalise well when evaluated on a large dataset. We contribute a framework for textual article segmentation that divides the task into two distinct phases: block representation and clustering. We propose several techniques for block representation and contribute a novel highly-compressed semantic representation called similarity embeddings. We evaluate and compare different clustering techniques, and innovatively apply label propagation (Zhu and Ghahramani, 2002) to spread headline labels to similar blocks. Our similarity embeddings and label propagation approach substantially outperforms Aiello and Pegoretti but still falls short of human performance. Exploring visual approaches to article segmentation, we reimplement and analyse the state-of-the-art Bansal et al. (2014) approach. We contribute an innovative 2D Markov model approach that captures reading-order dependencies and reduces the structured labelling problem to a Markov chain that we decode with the Viterbi (1967) algorithm. Our approach substantially outperforms Bansal et al., achieves accuracy as good as human annotators, and establishes a new state of the art in article segmentation. Our task definition, evaluation framework, and distant supervision dataset will encourage progress in the task of article segmentation. Our state-of-the-art textual and visual approaches will allow sophisticated IR and NLP applications over digitised newspaper archives, supporting research in the digital humanities.
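    To illustrate the label-propagation step, the sketch below follows the standard Zhu and Ghahramani (2002) scheme with an assumed block-similarity matrix standing in for the thesis' similarity embeddings: headline blocks carry known article labels, and those labels spread to unlabelled blocks through a row-normalised similarity graph while the labelled blocks stay clamped.

```python
import numpy as np

def propagate_labels(similarity, labels, n_iters=100):
    """Spread labels from labelled blocks (e.g. headlines) to unlabelled ones.

    similarity: (N, N) non-negative block-to-block similarity matrix.
    labels: length-N array with an article id for labelled blocks and -1 otherwise.
    Returns a hard label for every block.
    """
    classes = sorted(set(labels) - {-1})
    N, C = len(labels), len(classes)
    Y = np.zeros((N, C))
    for i, lab in enumerate(labels):
        if lab != -1:
            Y[i, classes.index(lab)] = 1.0
    T = similarity / (similarity.sum(axis=1, keepdims=True) + 1e-12)  # row-normalise
    F_mat = Y.copy()
    labelled = labels != -1
    for _ in range(n_iters):
        F_mat = T @ F_mat                 # propagate one step along the graph
        F_mat[labelled] = Y[labelled]     # clamp the known (headline) labels
    return np.array([classes[c] for c in F_mat.argmax(axis=1)])

# Toy usage: 5 text blocks, blocks 0 and 3 are headlines of articles 10 and 20.
sim = np.array([[1.0, 0.9, 0.8, 0.1, 0.1],
                [0.9, 1.0, 0.7, 0.1, 0.2],
                [0.8, 0.7, 1.0, 0.2, 0.1],
                [0.1, 0.1, 0.2, 1.0, 0.9],
                [0.1, 0.2, 0.1, 0.9, 1.0]])
labels = np.array([10, -1, -1, 20, -1])
print(propagate_labels(sim, labels))   # blocks 1-2 follow article 10, block 4 follows 20
```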

    Cloud-Based Benchmarking of Medical Image Analysis

    Medical imagin

    Reasoning with Uncertainty in Deep Learning for Safer Medical Image Computing

    Deep learning is now ubiquitous in the research field of medical image computing. As such technologies progress towards clinical translation, the question of safety becomes critical. Once deployed, machine learning systems unavoidably face situations where the correct decision or prediction is ambiguous. However, current methods disproportionately rely on deterministic algorithms, lacking a mechanism to represent and manipulate uncertainty. In safety-critical applications such as medical imaging, reasoning under uncertainty is crucial for developing a reliable decision-making system. Probabilistic machine learning provides a natural framework to quantify the degree of uncertainty over different variables of interest, be it the prediction, the model parameters and structures, or the underlying data (images and labels). Probability distributions are used to represent all the uncertain unobserved quantities in a model and how they relate to the data, and probability theory is used as a language to compute and manipulate these distributions. In this thesis, we explore probabilistic modelling as a framework to integrate uncertainty information into deep learning models, and demonstrate its utility in various high-dimensional medical imaging applications. In the process, we make several fundamental enhancements to current methods. We categorise our contributions into three groups according to the types of uncertainties being modelled: (i) predictive, (ii) structural and (iii) human uncertainty. Firstly, we discuss the importance of quantifying predictive uncertainty and understanding its sources for developing a risk-averse and transparent medical image enhancement application. We demonstrate how a measure of predictive uncertainty can be used as a proxy for predictive accuracy in the absence of ground-truths. Furthermore, assuming the structure of the model is flexible enough for the task, we introduce a way to decompose the predictive uncertainty into its orthogonal sources, i.e. aleatoric and parameter uncertainty. We show the potential utility of such decoupling in providing quantitative “explanations” of the model's performance. Secondly, we introduce our recent attempts at learning model structures directly from data. One work proposes a method based on variational inference to learn a posterior distribution over connectivity structures within a neural network architecture for multi-task learning, and shares some preliminary results on the MR-only radiotherapy planning application. Another work explores how the training algorithm of decision trees could be extended to grow the architecture of a neural network to adapt to the given availability of data and the complexity of the task. Lastly, we develop methods to model the “measurement noise” (e.g., biases and skill levels) of human annotators, and integrate this information into the learning process of the neural network classifier. In particular, we show that explicitly modelling the uncertainty involved in the annotation process not only leads to an improvement in robustness to label noise, but also yields useful insights into the patterns of errors that characterise individual experts.
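    As a concrete illustration of decomposing predictive uncertainty into aleatoric and parameter (epistemic) components, the sketch below uses the common Monte Carlo dropout approximation: total predictive entropy splits into the expected per-sample entropy (aleatoric) plus the mutual information between predictions and parameters (epistemic). The dropout classifier, inputs, and sample counts are placeholders, and this is a generic decomposition rather than the thesis' specific formulation.

```python
import torch
import torch.nn as nn

def entropy(p, eps=1e-12):
    return -(p * (p + eps).log()).sum(dim=-1)

@torch.no_grad()
def decompose_uncertainty(model, x, n_samples=30):
    """Monte Carlo dropout decomposition of predictive uncertainty.

    Returns (total, aleatoric, epistemic) entropy-based estimates per input:
    total = H[mean_t p_t], aleatoric = mean_t H[p_t], epistemic = total - aleatoric.
    """
    model.train()                      # keep dropout active at test time
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    mean_p = probs.mean(dim=0)         # (batch, classes)
    total = entropy(mean_p)
    aleatoric = entropy(probs).mean(dim=0)
    epistemic = total - aleatoric      # mutual information I(y; w | x)
    return total, aleatoric, epistemic

# Toy usage with a small dropout classifier on random inputs.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 3))
x = torch.randn(8, 16)
total, aleatoric, epistemic = decompose_uncertainty(model, x)
print(total.shape, aleatoric.mean().item(), epistemic.mean().item())
```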

    Semantic Knowledge Graphs for the News: A Review

    ICT platforms for news production, distribution, and consumption must exploit the ever-growing availability of digital data. These data originate from different sources and in different formats; they arrive at different velocities and in different volumes. Semantic knowledge graphs (KGs) are an established technique for integrating such heterogeneous information. They are therefore well aligned with the needs of news producers and distributors, and they are likely to become increasingly important for the news industry. This article reviews the research on using semantic knowledge graphs for the production, distribution, and consumption of news. The purpose is to present an overview of the field, to investigate what it means, and to suggest opportunities and needs for further research and development.
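    As a minimal sketch of the integration idea (using rdflib and hypothetical namespaces and properties, not any specific platform from the review), metadata from a content management system and entities extracted by an NLP pipeline can be merged into one graph and queried together.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Hypothetical namespaces for illustration only.
NEWS = Namespace("http://example.org/news/")
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
article = NEWS["article/42"]

# Metadata coming from the CMS...
g.add((article, RDF.type, NEWS.Article))
g.add((article, NEWS.headline, Literal("City council approves new tram line")))
# ...and entities extracted by an NLP pipeline, linked to an external KG.
g.add((article, NEWS.mentions, DBR["Oslo"]))
g.add((DBR["Oslo"], NEWS.sameTopicAs, NEWS["topic/public-transport"]))

# A single query can now traverse both sources of information.
for (a,) in g.query(
    "SELECT ?a WHERE { ?a <http://example.org/news/mentions> "
    "<http://dbpedia.org/resource/Oslo> }"
):
    print(a)
```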