Active Discriminative Text Representation Learning
We propose a new active learning (AL) method for text classification with
convolutional neural networks (CNNs). In AL, one selects the instances to be
manually labeled with the aim of maximizing model performance with minimal
effort. Neural models capitalize on word embeddings as representations
(features), tuning these to the task at hand. We argue that AL strategies for
multi-layered neural models should focus on selecting instances that most
affect the embedding space (i.e., induce discriminative word representations).
This is in contrast to traditional AL approaches (e.g., entropy-based
uncertainty sampling), which specify higher level objectives. We propose a
simple approach for sentence classification that selects instances containing
words whose embeddings are likely to be updated with the greatest magnitude,
thereby rapidly learning discriminative, task-specific embeddings. We extend
this approach to document classification by jointly considering: (1) the
expected changes to the constituent word representations; and (2) the model's
current overall uncertainty regarding the instance. The relative emphasis
placed on these criteria is governed by a stochastic process that favors
selecting instances likely to improve representations at the outset of
learning, and then shifts toward general uncertainty sampling as AL progresses.
Empirical results show that our method outperforms baseline AL approaches on
both sentence and document classification tasks. We also show that, as
expected, the method quickly learns discriminative word embeddings. To the best
of our knowledge, this is the first work on AL addressing neural models for
text classification.
Comment: This paper was accepted at AAAI 201
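The selection criterion described above, querying instances expected to change the embedding space most, is closely related to expected-gradient-length scoring. Below is a minimal sketch of that idea, not the authors' implementation: a toy softmax classifier over fixed-size sentence embeddings, scoring each unlabeled instance by the expected norm of the loss gradient with respect to its input embedding under the model's own predictive distribution. All dimensions and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def egl_score(W, x):
    """Expected gradient length of the cross-entropy loss w.r.t. the
    input embedding x, averaged over hypothetical labels drawn from
    the model's predictive distribution p."""
    p = softmax(W @ x)
    score = 0.0
    for y, py in enumerate(p):
        # dL/dx for hypothetical label y: W^T (p - one_hot(y))
        g = W.T @ (p - np.eye(len(p))[y])
        score += py * np.linalg.norm(g)
    return score

# Toy pool: 3 classes, 8-dim embeddings, 20 unlabeled sentences.
W = rng.normal(size=(3, 8))
pool = rng.normal(size=(20, 8))
scores = np.array([egl_score(W, x) for x in pool])
pick = int(scores.argmax())  # query the instance expected to move the embeddings most
```

In the paper's setting the gradient would flow into the word embeddings of a CNN rather than a single pooled vector, but the scoring principle is the same.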
Active Learning Principles for In-Context Learning with Large Language Models
The remarkable advancements in large language models (LLMs) have
significantly enhanced the performance in few-shot learning settings. By using
only a small number of labeled examples, referred to as demonstrations, LLMs
can effectively grasp the task at hand through in-context learning. However,
the process of selecting appropriate demonstrations has received limited
attention in prior work. This paper addresses the issue of identifying the most
informative demonstrations for few-shot learning by approaching it as a
pool-based Active Learning (AL) problem over a single iteration. Our objective
is to investigate how AL algorithms can serve as effective demonstration
selection methods for in-context learning. We compare several standard AL
algorithms based on uncertainty, diversity, and similarity, and consistently
observe that similarity-based selection outperforms all other methods,
including random sampling. Notably, uncertainty sampling, despite its success
in conventional supervised learning, performs poorly in this setting. Our
extensive experiments with a diverse range of GPT and OPT models on
classification and multiple-choice tasks, coupled with thorough analysis,
demonstrate that in-context example selection through AL prioritizes
high-quality examples that exhibit low uncertainty and bear
similarity to the test examples.
Comment: To appear at Findings of EMNLP (Camera Ready version)
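The winning strategy reported here, similarity-based demonstration selection, reduces to a nearest-neighbour lookup in some embedding space. A minimal sketch, assuming pre-computed example embeddings (the embedding model and dimensions are placeholders, not the paper's setup):

```python
import numpy as np

def select_demonstrations(pool_emb, test_emb, k=4):
    """Rank pool examples by cosine similarity to the test input and
    return the indices of the top-k as in-context demonstrations."""
    pool = pool_emb / np.linalg.norm(pool_emb, axis=1, keepdims=True)
    test = test_emb / np.linalg.norm(test_emb)
    sims = pool @ test                # cosine similarity per pool example
    return np.argsort(-sims)[:k]     # indices of the k most similar

rng = np.random.default_rng(1)
pool_emb = rng.normal(size=(100, 16))   # 100 labeled candidates
test_emb = rng.normal(size=16)          # the test input to prompt for
demos = select_demonstrations(pool_emb, test_emb, k=4)
```

The selected indices would then be formatted as labeled demonstrations and prepended to the test input in the prompt.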
Parametric active learning techniques for 3D hand pose estimation
Active learning (AL) has recently gained popularity for deep learning (DL)
models due to its efficient and informative sampling, especially when the
models require large-scale datasets. The DL models designed for 3D hand pose
estimation (3D-HPE) demand accurate and diverse large-scale datasets that are
time-consuming and costly to collect and require expert annotators. This
thesis explores AL for the 3D-HPE task for the first time.
To address this, the thesis first presents an AL methodology customised for
3D-HPE learners. Because these learners are predominantly regression-based
algorithms, a Bayesian approximation of a DL architecture is presented to
model uncertainties. This approximation produces data- and model-dependent
uncertainties, which are combined with the representativeness-based AL
function CoreSet for sampling. Despite being the first such work, it selects
informative samples and achieves low joint errors with less training data
on three well-known depth datasets.
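CoreSet-style representativeness sampling is usually implemented as greedy k-center selection: repeatedly query the unlabelled point farthest, in feature space, from everything already labelled. A generic sketch, independent of the thesis's exact architecture or uncertainty combination:

```python
import numpy as np

def k_center_greedy(features, labelled_idx, budget):
    """Greedy k-center (CoreSet) selection: repeatedly pick the unlabelled
    point whose distance to the nearest labelled/selected point is largest."""
    # Distance from every point to its nearest labelled point.
    dists = np.min(
        np.linalg.norm(features[:, None] - features[labelled_idx][None], axis=-1),
        axis=1)
    chosen = []
    for _ in range(budget):
        idx = int(dists.argmax())          # farthest point from the current set
        chosen.append(idx)
        new_d = np.linalg.norm(features - features[idx], axis=1)
        dists = np.minimum(dists, new_d)   # update nearest-centre distances
    return chosen

rng = np.random.default_rng(2)
feats = rng.normal(size=(200, 32))         # e.g. penultimate-layer features
selected = k_center_greedy(feats, labelled_idx=[0, 1, 2], budget=5)
```

In the thesis the per-point score would additionally be weighted by the Bayesian uncertainty estimates; the greedy cover above is only the representativeness half.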
The second AL algorithm improves the selection further, following the recent
trend of parametric samplers. Specifically, it proceeds task-agnostically,
using a Graph Convolutional Network (GCN) to capture higher-order
relationships between labelled and unlabelled data. Unlabelled images are
then ranked for selection based on uncertainty or on the GCN feature
distribution.
Another novel sampler extends this idea and tackles common AL issues, such as
cold start and distribution shift, by training in a self-supervised way with
contrastive learning. It leverages visual concepts from both labelled and
unlabelled images while attaining state-of-the-art results.
The last part of the thesis brings the prior AL insights and achievements
together in a unified parametric sampler for the multi-modal 3D-HPE task.
This sampler trains multiple variational auto-encoders to align the modalities
and provide a better selection representation. Several query functions are
studied, opening a new direction in deep AL sampling.
Open Access
Learning with Low-Quality Data: Multi-View Semi-Supervised Learning with Missing Views
The focus of this thesis is on learning approaches for what we call ``low-quality data'', and in particular data in which only small amounts of labeled target data are available. The first part provides background discussion on low-quality data issues, followed by a preliminary study in this area. The remainder of the thesis focuses on a particular scenario: multi-view semi-supervised learning.

Multi-view learning generally refers to the case of learning with data that has multiple natural views, or sets of features, associated with it. Multi-view semi-supervised learning methods try to exploit the combination of multiple views along with large amounts of unlabeled data in order to learn better predictive functions when limited labeled data is available. However, lack of complete view data limits the applicability of multi-view semi-supervised learning to real-world data. Commonly, one data view is readily and cheaply available, but additional views may be costly or only available in some cases. This thesis aims to make multi-view semi-supervised learning approaches more applicable to real-world data, specifically by addressing the issue of missing views through both feature generation and active learning, and by addressing the issue of model selection for semi-supervised learning with limited labeled data.

The thesis introduces a unified approach for handling missing view data in multi-view semi-supervised learning tasks, which applies both to data with completely missing additional views and to data missing views only in some instances. The idea is to learn a feature-generation function mapping one view to another, with the mapping biased to encourage the generated features to be useful for multi-view semi-supervised learning algorithms. The mapping is then used to fill in views as pre-processing.
Unlike previously proposed single-view multi-view learning approaches, the proposed approach is able to take advantage of additional view data when available, and for the case of partial view presence it is the first feature-generation approach specifically designed to take the multi-view semi-supervised learning aspect into account.

The next component of this thesis is the analysis of an active view completion scenario. In some tasks it is possible to obtain missing view data for a particular instance, but at some associated cost. Recent work has shown that an active selection strategy can be more effective than a random one. This thesis seeks a better understanding of active approaches and demonstrates that the effectiveness of an active selection strategy over a random one can depend on the relationship between the views.

Finally, an important component of making multi-view semi-supervised learning applicable to real-world data is the task of model selection, an open problem which is often avoided entirely in previous work. For cases of very limited labeled training data, the commonly used cross-validation approach can become ineffective. This thesis introduces a re-training alternative to the method-dependent approaches, similar in motivation to cross-validation, that involves generating new training and test data by sampling from the large amount of unlabeled data and the estimated conditional probabilities for the labels. The proposed approaches are evaluated on a variety of multi-view semi-supervised learning data sets, and the experimental results demonstrate their efficacy.
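The feature-generation idea, learning a mapping from the observed view to the missing one on instances where both views are present, can be sketched as a simple ridge regression. This is an illustrative stand-in rather than the thesis's biased mapping objective; all dimensions and names are invented:

```python
import numpy as np

def fit_view_map(X1, X2, lam=1e-2):
    """Fit a ridge-regression mapping M from view 1 to view 2 on
    co-observed instances: argmin_M ||X1 M - X2||^2 + lam ||M||^2."""
    d = X1.shape[1]
    return np.linalg.solve(X1.T @ X1 + lam * np.eye(d), X1.T @ X2)

rng = np.random.default_rng(3)
M_true = rng.normal(size=(10, 6))            # hidden ground-truth relation
X1 = rng.normal(size=(80, 10))               # view 1, always observed
X2 = X1 @ M_true + 0.01 * rng.normal(size=(80, 6))  # view 2, observed here

M = fit_view_map(X1, X2)
X1_new = rng.normal(size=(20, 10))           # instances missing view 2
X2_filled = X1_new @ M                       # generated second view
```

The filled-in views would then be passed, as pre-processing, to a standard multi-view semi-supervised learner such as co-training.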
Computational principles for an autonomous active vision system
Vision research has uncovered computational principles that generalize across species and brain area. However, these biological mechanisms are not frequently implemented in computer vision algorithms. In this thesis, models suitable for application in computer vision were developed to address the benefits of two biologically-inspired computational principles: multi-scale sampling and active, space-variant, vision.
The first model investigated the role of multi-scale sampling in motion integration. It is known that receptive fields of different spatial and temporal scales exist in the visual cortex; however, models addressing how this basic principle is exploited by species are sparse and do not adequately explain the data. The developed model showed that the solution to a classical problem in motion integration, the aperture problem, can be reframed as an emergent property of multi-scale sampling facilitated by fast, parallel, bi-directional connections at different spatial resolutions.
Humans and most other mammals actively move their eyes to sample a scene (active vision); moreover, the resolution of detail in this sampling process is not uniform across spatial locations (space-variant). It is known that these eye-movements are not simply guided by image saliency, but are also influenced by factors such as spatial attention, scene layout, and task-relevance. However, it is seldom questioned how previous eye movements shape how one learns and recognizes an object in a continuously-learning system. To explore this question, a model (CogEye) was developed that integrates active, space-variant sampling with eye-movement selection (the where visual stream), and object recognition (the what visual stream). The model hypothesizes that a signal from the recognition system helps the where stream select fixation locations that best disambiguate object identity between competing alternatives.
The third study used eye-tracking coupled with an object disambiguation psychophysics experiment to validate the second model, CogEye. While humans outperformed the model in recognition accuracy, when the model used information from the recognition pathway to help select future fixations, it was more similar to human eye movement patterns than when the model relied on image saliency alone.
Taken together, these results show that computational principles in the mammalian visual system can be used to improve computer vision models.
Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank
For many applications, the collection of labeled data is expensive and
laborious. Exploiting unlabeled data during training is thus a long-pursued
objective of machine learning. Self-supervised learning addresses this by positing an
auxiliary task (different, but related to the supervised task) for which data
is abundantly available. In this paper, we show how ranking can be used as a
proxy task for some regression problems. As another contribution, we propose an
efficient backpropagation technique for Siamese networks which prevents the
redundant computation introduced by the multi-branch network architecture. We
apply our framework to two regression problems: Image Quality Assessment (IQA)
and Crowd Counting. For both we show how to automatically generate ranked image
sets from unlabeled data. Our results show that networks trained to regress to
the ground truth targets for labeled data and to simultaneously learn to rank
unlabeled data obtain significantly better, state-of-the-art results for both
IQA and crowd counting. In addition, we show that measuring network uncertainty
on the self-supervised proxy task is a good measure of informativeness of
unlabeled data. This can be used to drive an algorithm for active learning and
we show that this reduces labeling effort by up to 50%.
Comment: Accepted at TPAMI. (Keywords: learning from rankings, image quality
assessment, crowd counting, active learning). arXiv admin note: text overlap
with arXiv:1803.0309
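The ranking proxy task can be sketched using the kind of free supervision the abstract describes for crowd counting: a crop cannot contain more people than the image it was taken from, so (image, crop) pairs come pre-ranked without labels. A minimal margin ranking loss on such pairs, with illustrative scores rather than the paper's network outputs:

```python
import numpy as np

def margin_ranking_loss(s_hi, s_lo, margin=1.0):
    """Hinge loss encouraging score(s_hi) >= score(s_lo) + margin,
    the pairwise objective for learning from automatically ranked pairs."""
    return np.maximum(0.0, margin - (s_hi - s_lo))

# Free supervision: a crop of an image contains no more people than the
# full image, so each (full, crop) pair is ranked with zero labeling cost.
full_scores = np.array([5.0, 2.0, 0.3])   # predicted counts on full images
crop_scores = np.array([3.0, 2.5, 0.1])   # predicted counts on their crops
losses = margin_ranking_loss(full_scores, crop_scores)
```

In training, this ranking term on unlabeled pairs would be added to the standard regression loss on the labeled subset; the variance of scores across pairs can also serve as the uncertainty signal mentioned for active learning.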