
    Active Discriminative Text Representation Learning

    We propose a new active learning (AL) method for text classification with convolutional neural networks (CNNs). In AL, one selects the instances to be manually labeled with the aim of maximizing model performance with minimal effort. Neural models capitalize on word embeddings as representations (features), tuning these to the task at hand. We argue that AL strategies for multi-layered neural models should focus on selecting instances that most affect the embedding space (i.e., induce discriminative word representations). This is in contrast to traditional AL approaches (e.g., entropy-based uncertainty sampling), which specify higher level objectives. We propose a simple approach for sentence classification that selects instances containing words whose embeddings are likely to be updated with the greatest magnitude, thereby rapidly learning discriminative, task-specific embeddings. We extend this approach to document classification by jointly considering: (1) the expected changes to the constituent word representations; and (2) the model's current overall uncertainty regarding the instance. The relative emphasis placed on these criteria is governed by a stochastic process that favors selecting instances likely to improve representations at the outset of learning, and then shifts toward general uncertainty sampling as AL progresses. Empirical results show that our method outperforms baseline AL approaches on both sentence and document classification tasks. We also show that, as expected, the method quickly learns discriminative word embeddings. To the best of our knowledge, this is the first work on AL addressing neural models for text classification. Comment: This paper got accepted by AAAI 201
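For readers unfamiliar with the baseline that the abstract contrasts itself with, the following is a minimal sketch of entropy-based uncertainty sampling. It is not the embedding-focused method the paper proposes. The function names and the toy probabilities are assumptions made for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, k):
    """Return the indices of the k pool instances with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for three unlabeled instances (binary task).
pool_probs = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # near the decision boundary, high entropy
    [0.80, 0.20],
]
print(select_most_uncertain(pool_probs, k=2))  # prints [1, 2]
```

The instances closest to a uniform prediction are queried first, which is the "higher level objective" that the abstract distinguishes from selecting instances by their expected effect on the embeddings.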

    Active Learning Principles for In-Context Learning with Large Language Models

    The remarkable advancements in large language models (LLMs) have significantly enhanced the performance in few-shot learning settings. By using only a small number of labeled examples, referred to as demonstrations, LLMs can effectively grasp the task at hand through in-context learning. However, the process of selecting appropriate demonstrations has received limited attention in prior work. This paper addresses the issue of identifying the most informative demonstrations for few-shot learning by approaching it as a pool-based Active Learning (AL) problem over a single iteration. Our objective is to investigate how AL algorithms can serve as effective demonstration selection methods for in-context learning. We compare various standard AL algorithms based on uncertainty, diversity, and similarity, and consistently observe that the latter outperforms all other methods, including random sampling. Notably, uncertainty sampling, despite its success in conventional supervised learning scenarios, performs poorly in this context. Our extensive experimentation involving a diverse range of GPT and OPT models across 24 classification and multi-choice tasks, coupled with thorough analysis, unambiguously demonstrates that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples. Comment: To appear at Findings of EMNLP (Camera Ready version)
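The similarity-based selection that the abstract reports as strongest can be sketched as a nearest-neighbour lookup over embeddings. This is only an illustration under stated assumptions: the abstract does not say which embedding model or similarity measure is used, so cosine similarity, the function names, and the toy vectors below are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_demonstrations(test_vec, pool_vecs, k):
    """Return indices of the k pool examples most similar to the test example."""
    ranked = sorted(
        range(len(pool_vecs)),
        key=lambda i: cosine(test_vec, pool_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 2-d embeddings of four candidate demonstrations.
pool_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
test_vec = [1.0, 0.0]
print(select_demonstrations(test_vec, pool_vecs, k=2))  # prints [0, 2]
```

The selected indices would then be used to pick the labeled examples placed in the prompt ahead of the test input.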

    Parametric active learning techniques for 3D hand pose estimation

    Active learning (AL) has recently gained popularity for deep learning (DL) models due to efficient and informative sampling, especially when the models require large-scale datasets. The DL models designed for 3D hand pose estimation (3D-HPE) demand accurate and diverse large-scale datasets that are time-consuming and costly to build and require experts. This thesis explores AL for the 3D-HPE task for the first time. To address this, it first presents an AL methodology customised for 3D-HPE learners. Because the learners are predominantly regression-based algorithms, a Bayesian approximation of a DL architecture is presented to model uncertainties. This approximation generates data- and model-dependent uncertainties that are further combined with the data-representativeness AL function, CoreSet, for sampling. Despite being the first work of its kind, it yields informative samples and minimal joint errors with less training data on three well-known depth datasets. The second AL algorithm continues to improve the selection following a new trend of parametric samplers. Specifically, it proceeds in a task-agnostic manner, using a Graph Convolutional Network (GCN) to offer higher-order representations between labelled and unlabelled data. The newly selected unlabelled images are ranked based on uncertainty or GCN feature distribution. Another novel sampler extends this idea and tackles encountered AL issues, such as cold-start and distribution shift, by training in a self-supervised way with contrastive learning. It is shown to leverage the visual concepts from labelled and unlabelled images while attaining state-of-the-art results. The last part of the thesis brings prior AL insights and achievements together in a unified parametric sampler proposal for the multi-modal 3D-HPE task. This sampler trains multi-variational auto-encoders to align the modalities and provide better selection representation.
Several query functions are studied to open a new direction in deep AL sampling.
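The CoreSet representativeness criterion named in the abstract is commonly approximated by a greedy k-center rule. The sketch below shows that rule alone, without the Bayesian uncertainty term the thesis combines it with. The function name, the Euclidean metric, and the toy feature vectors are assumptions for illustration.

```python
import math


def k_center_greedy(points, labelled_idx, budget):
    """Greedy k-center selection.

    Repeatedly picks the point that is farthest from its nearest
    already-covered center (labelled points plus earlier picks).
    `labelled_idx` must contain at least one index.
    """
    # Distance from every point to its nearest labelled point.
    min_dist = [
        min(math.dist(p, points[j]) for j in labelled_idx) for p in points
    ]
    chosen = []
    for _ in range(budget):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        # The new pick becomes a center, so refresh nearest-center distances.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


# Hypothetical 2-d features; index 0 is already labelled.
points = [(0, 0), (1, 0), (10, 0), (10, 1), (5, 5)]
print(k_center_greedy(points, labelled_idx=[0], budget=2))  # prints [3, 4]
```

Each pick covers the region least represented by the current labelled set, which is why the rule favours diverse samples over redundant ones.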

    Learning with Low-Quality Data: Multi-View Semi-Supervised Learning with Missing Views

    The focus of this thesis is on learning approaches for what we call "low-quality data", and in particular data in which only small amounts of labeled target data are available. The first part provides background discussion on low-quality data issues, followed by a preliminary study in this area. The remainder of the thesis focuses on a particular scenario: multi-view semi-supervised learning. Multi-view learning generally refers to the case of learning with data that has multiple natural views, or sets of features, associated with it. Multi-view semi-supervised learning methods try to exploit the combination of multiple views along with large amounts of unlabeled data in order to learn better predictive functions when limited labeled data is available. However, lack of complete view data limits the applicability of multi-view semi-supervised learning to real-world data. Commonly, one data view is readily and cheaply available, but additional views may be costly or only available in some cases. This thesis work aims to make multi-view semi-supervised learning approaches more applicable to real-world data, specifically by addressing the issue of missing views through both feature generation and active learning, and by addressing the issue of model selection for semi-supervised learning with limited labeled data. This thesis introduces a unified approach for handling missing view data in multi-view semi-supervised learning tasks, which applies both to data with completely missing additional views and to data missing views only in some instances. The idea is to learn a feature generation function mapping one view to another, with the mapping biased to encourage the generated features to be useful for multi-view semi-supervised learning algorithms. The mapping is then used to fill in views as pre-processing.
Unlike previously proposed single-view multi-view learning approaches, the proposed approach is able to take advantage of additional view data when available, and for the case of partial view presence it is the first feature-generation approach specifically designed to take into account the multi-view semi-supervised learning aspect. The next component of this thesis is the analysis of an active view completion scenario. In some tasks, it is possible to obtain missing view data for a particular instance, but with some associated cost. Recent work has shown that an active selection strategy can be more effective than a random one. In this thesis, a better understanding of active approaches is sought, and it is demonstrated that the effectiveness of an active selection strategy over a random one can depend on the relationship between the views. Finally, an important component of making multi-view semi-supervised learning applicable to real-world data is the task of model selection, an open problem which is often avoided entirely in previous work. For cases of very limited labeled training data, the commonly used cross-validation approach can become ineffective. This thesis introduces a re-training alternative to the method-dependent approaches, similar in motivation to cross-validation, that involves generating new training and test data by sampling from the large amount of unlabeled data and estimated conditional probabilities for the labels. The proposed approaches are evaluated on a variety of multi-view semi-supervised learning data sets, and the experimental results demonstrate their efficacy.

    Computational principles for an autonomous active vision system

    Vision research has uncovered computational principles that generalize across species and brain areas. However, these biological mechanisms are not frequently implemented in computer vision algorithms. In this thesis, models suitable for application in computer vision were developed to address the benefits of two biologically-inspired computational principles: multi-scale sampling and active, space-variant vision. The first model investigated the role of multi-scale sampling in motion integration. It is known that receptive fields of different spatial and temporal scales exist in the visual cortex; however, models addressing how this basic principle is exploited by species are sparse and do not adequately explain the data. The developed model showed that the solution to a classical problem in motion integration, the aperture problem, can be reframed as an emergent property of multi-scale sampling facilitated by fast, parallel, bi-directional connections at different spatial resolutions. Humans and most other mammals actively move their eyes to sample a scene (active vision); moreover, the resolution of detail in this sampling process is not uniform across spatial locations (space-variant). It is known that these eye movements are not simply guided by image saliency, but are also influenced by factors such as spatial attention, scene layout, and task-relevance. However, it is seldom questioned how previous eye movements shape how one learns and recognizes an object in a continuously-learning system. To explore this question, a model (CogEye) was developed that integrates active, space-variant sampling with eye-movement selection (the "where" visual stream) and object recognition (the "what" visual stream). The model hypothesizes that a signal from the recognition system helps the "where" stream select fixation locations that best disambiguate object identity between competing alternatives.
The third study used eye-tracking coupled with an object disambiguation psychophysics experiment to validate the second model, CogEye. While humans outperformed the model in recognition accuracy, when the model used information from the recognition pathway to help select future fixations, it was more similar to human eye movement patterns than when the model relied on image saliency alone. Taken together, these results show that computational principles in the mammalian visual system can be used to improve computer vision models.

    Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank

    For many applications the collection of labeled data is expensive and laborious. Exploitation of unlabeled data during training is thus a long-pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different, but related to the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning and we show that this reduces labeling effort by up to 50%. Comment: Accepted at TPAMI. (Keywords: Learning from rankings, image quality assessment, crowd counting, active learning). arXiv admin note: text overlap with arXiv:1803.0309
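To make the ranking proxy task concrete, here is a minimal sketch of a pairwise ranking loss of the kind often used with Siamese networks. The abstract does not state the exact loss, so the margin-based hinge form, the function names, and the toy scores are assumptions and not the paper's formulation.

```python
def pairwise_ranking_hinge(score_higher, score_lower, margin=1.0):
    """Hinge loss that reaches zero once the item known to rank higher
    outscores the other item by at least `margin`."""
    return max(0.0, margin - (score_higher - score_lower))


def mean_ranking_loss(scored_pairs, margin=1.0):
    """Average the pairwise loss over (score_higher, score_lower) tuples."""
    losses = [pairwise_ranking_hinge(hi, lo, margin) for hi, lo in scored_pairs]
    return sum(losses) / len(losses)


# Hypothetical network scores for three ranked pairs built from unlabeled
# images; the first score in each pair belongs to the image that should
# rank higher.
scored_pairs = [(3.0, 1.0), (2.0, 1.5), (0.5, 2.0)]
print(mean_ranking_loss(scored_pairs))  # prints 1.0
```

Only the relative order of each pair is needed, which is why such a loss can be computed on unlabeled data once ranked image sets have been generated.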