Efficient Learning with Soft Label Information and Multiple Annotators
Nowadays, large real-world data sets are collected in science, engineering, health care and other fields. These data provide us with a great resource for building automated learning systems. However, for many machine learning applications, data need to be annotated (labelled) by humans before they can be used for learning. Unfortunately, annotation by a human expert is often very time-consuming and costly. As a result, the number of labeled training instances to learn from may be limited, which in turn influences the learning process and the quality of learned models. In this thesis, we investigate ways of improving the learning process in supervised classification settings in which labels are provided by human annotators. First, we study and propose a new classification learning framework that learns not only from binary class label information but also from soft-label information reflecting the certainty or belief in the class label. We propose multiple methods, based on regression, max-margin and ranking methodologies, that use the soft-label information in order to learn better classifiers with smaller training data and hence smaller annotation effort. We also study our soft-label approach when examples to be labeled next are selected online using active learning. Second, we study ways of distributing the annotation effort among multiple experts. We develop a new multiple-annotator learning framework that explicitly models and embraces annotator differences and biases in order to learn a consensus model and annotator-specific models. We demonstrate the benefits and advantages of our frameworks on both UCI data sets and our real-world clinical data extracted from Electronic Health Records.
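To make the idea of learning from soft labels concrete, here is a minimal sketch in which a one-feature logistic model is fit to annotator certainties in [0, 1] rather than to hard 0/1 labels. It is an illustration only, not one of the regression, max-margin or ranking methods proposed in the thesis; the function name `train_soft_logistic`, the toy data and the hyperparameters are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_soft_logistic(xs, soft_labels, lr=0.5, epochs=2000):
    """Fit weights w and bias b by gradient descent on the
    cross-entropy between predictions and soft targets in [0, 1]."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, p in zip(xs, soft_labels):
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = pred - p  # gradient of the cross-entropy w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: the soft label is the annotator's belief
# that the example belongs to the positive class.
xs = [[-2.0], [-0.5], [0.5], [2.0]]
soft = [0.05, 0.4, 0.6, 0.95]
w, b = train_soft_logistic(xs, soft)
preds = [sigmoid(w[0] * x[0] + b) for x in xs]
```

Thresholding `soft` at 0.5 would give the two middle examples the same hard labels as the two outer ones; the soft targets additionally tell the model that they lie close to the decision boundary.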
IST Austria Thesis
Deep learning is best known for its empirical success across a wide range of applications
spanning computer vision, natural language processing and speech. Of equal significance,
though perhaps less known, are its ramifications for learning theory: deep networks have
been observed to perform surprisingly well in the high-capacity regime, aka the overfitting
or underspecified regime. Classically, this regime on the far right of the bias-variance curve
is associated with poor generalisation; however, recent experiments with deep networks
challenge this view.
This thesis is devoted to investigating various aspects of underspecification in deep learning.
First, we argue that deep learning models are underspecified on two levels: a) any given
training dataset can be fit by many different functions, and b) any given function can be
expressed by many different parameter configurations. We refer to the second kind of
underspecification as parameterisation redundancy and we precisely characterise its extent.
Second, we characterise the implicit criteria (the inductive bias) that guide learning in the
underspecified regime. Specifically, we consider a nonlinear but tractable classification
setting, and show that given the choice, neural networks learn classifiers with a large margin.
Third, we consider learning scenarios where the inductive bias is not by itself sufficient to
deal with underspecification. We then study different ways of ‘tightening the specification’: i)
In the setting of representation learning with variational autoencoders, we propose a hand-
crafted regulariser based on mutual information. ii) In the setting of binary classification, we
consider soft-label (real-valued) supervision. We derive a generalisation bound for linear
networks supervised in this way and verify that soft labels facilitate fast learning. Finally, we
explore an application of soft-label supervision to the training of multi-exit models.
Soft-Label Dataset Distillation and Text Dataset Distillation
Dataset distillation is a method for reducing dataset sizes by learning a
small number of synthetic samples containing all the information of a large
dataset. This has several benefits like speeding up model training, reducing
energy consumption, and reducing required storage space. Currently, each
synthetic sample is assigned a single `hard' label, and also, dataset
distillation can currently only be used with image data.
We propose to simultaneously distill both images and their labels, thus
assigning each synthetic sample a `soft' label (a distribution of labels). Our
algorithm increases accuracy by 2-4% over the original algorithm for several
image classification tasks. Using `soft' labels also enables distilled datasets
to consist of fewer samples than there are classes as each sample can encode
information for multiple classes. For example, training a LeNet model with 10
distilled images (one per class) results in over 96% accuracy on MNIST, and
almost 92% accuracy when trained on just 5 distilled images.
We also extend the dataset distillation algorithm to distill sequential
datasets including texts. We demonstrate that text distillation outperforms
other methods across multiple datasets. For example, models attain almost their
original accuracy on the IMDB sentiment analysis task using just 20 distilled
sentences.
Our code can be found at
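The claim that a distilled dataset can contain fewer samples than classes can be illustrated with a small sketch. It assumes five hypothetical synthetic samples whose label mass is split by hand over two classes each; in the method described above the soft labels are learned together with the samples, which this sketch does not attempt. The function names are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft label (a distribution over
    classes) and the softmax of the model's logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

num_classes = 10
# Five hypothetical samples; sample i puts half of its label mass
# on class 2*i and half on class 2*i + 1 (fixed by hand here).
soft_labels = []
for i in range(5):
    label = [0.0] * num_classes
    label[2 * i] = 0.5
    label[2 * i + 1] = 0.5
    soft_labels.append(label)

# Every class receives some label mass even though there are only 5 samples.
covered = {c for label in soft_labels for c, mass in enumerate(label) if mass > 0}

# Loss of an untrained model that outputs uniform probabilities.
uniform_loss = soft_cross_entropy([0.0] * num_classes, soft_labels[0])
```

With hard labels, five samples could name at most five of the ten classes; with soft labels, all ten receive a training signal.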
Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification
In this paper, we focus on the semi-supervised person re-identification
(Re-ID) case, which only has the intra-camera (within-camera) labels but not
inter-camera (cross-camera) labels. In real-world applications, these
intra-camera labels can be obtained far more readily than cross-camera labels,
either from tracking algorithms or from a few manual annotations. In this case, it is
very difficult to explore the relationships between cross-camera persons in the
training stage due to the lack of cross-camera label information. To deal with
this issue, we propose a novel Progressive Cross-camera Soft-label Learning
(PCSL) framework for the semi-supervised person Re-ID task, which can generate
cross-camera soft-labels and utilize them to optimize the network. Concretely,
we calculate an affinity matrix based on person-level features and adapt it
to produce the similarities between cross-camera persons (i.e., cross-camera
soft-labels). To exploit these soft-labels to train the network, we investigate
the weighted cross-entropy loss and the weighted triplet loss from the
classification and discrimination perspectives, respectively. Particularly, the
proposed framework alternately generates progressive cross-camera soft-labels
and gradually improves feature representations in the whole learning course.
Extensive experiments on five large-scale benchmark datasets show that PCSL
significantly outperforms the state-of-the-art unsupervised methods that employ
labeled source domains or the images generated by the GAN-based models.
Furthermore, the proposed method even has a competitive performance with
respect to deep supervised Re-ID methods.
Comment: Accepted by IEEE Transactions on Circuits and Systems for Video
Technology (TCSVT)
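As a rough sketch of how an affinity matrix over person-level features can be turned into cross-camera soft-labels, the example below uses cosine similarity followed by a row-wise softmax with a temperature. Both choices, as well as the function names and the toy feature vectors, are assumptions made for illustration; the abstract does not specify how PCSL adapts the affinities.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_camera_soft_labels(feats_a, feats_b, temperature=0.1):
    """For each person in camera A, return a distribution over the
    persons in camera B (row-wise softmax of cosine affinities)."""
    soft = []
    for u in feats_a:
        sims = [cosine(u, v) / temperature for v in feats_b]
        m = max(sims)  # subtract the maximum for numerical stability
        exps = [math.exp(s - m) for s in sims]
        total = sum(exps)
        soft.append([e / total for e in exps])
    return soft

# Hypothetical person-level features (one vector per identity per camera).
camera_a = [[1.0, 0.1], [0.1, 1.0]]
camera_b = [[0.9, 0.2], [0.0, 1.0], [0.7, 0.7]]
labels = cross_camera_soft_labels(camera_a, camera_b)
```

Each row of `labels` could then serve as the per-class weights in a weighted cross-entropy loss, and would be recomputed as the feature representation improves.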
Fidelity-Weighted Learning
Training deep neural networks requires many training samples, but in practice
training labels are expensive to obtain and may be of varying quality, as some
may be from trusted expert labelers while others might be from heuristics or
other sources of weak supervision such as crowd-sourcing. This creates a
fundamental quality-versus-quantity trade-off in the learning process. Do we
learn from the small amount of high-quality data or the potentially large
amount of weakly-labeled data? We argue that if the learner could somehow know
and take the label-quality into account when learning the data representation,
we could get the best of both worlds. To this end, we propose
"fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach
for training deep neural networks using weakly-labeled data. FWL modulates the
parameter updates to a student network (trained on the task we care about) on a
per-sample basis according to the posterior confidence of its label-quality
estimated by a teacher (who has access to the high-quality labels). Both
student and teacher are learned from the data. We evaluate FWL on two tasks in
information retrieval and natural language processing where we outperform
state-of-the-art alternative semi-supervised methods, indicating that our
approach makes better use of strong and weak labels, and leads to better
task-dependent data representations.
Comment: Published as a conference paper at ICLR 201
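The per-sample modulation of parameter updates can be sketched on a one-parameter regression problem. Here each gradient step is scaled by a confidence value in [0, 1]; in FWL that confidence is estimated by the teacher, whereas in this sketch it is set by hand. The function name, the data and the learning rate are assumptions.

```python
def fidelity_weighted_sgd(samples, lr=0.1, epochs=200):
    """samples: list of (x, weak_label, confidence) triples.
    Each SGD step on the squared error is scaled by the confidence
    assigned to that sample's label."""
    w = 0.0
    for _ in range(epochs):
        for x, y, conf in samples:
            grad = 2.0 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * conf * grad
    return w

# Two trusted labels (y = 2) and one noisy weak label (y = 10).
weighted = [(1.0, 2.0, 1.0), (1.0, 2.0, 1.0), (1.0, 10.0, 0.0)]
# The same data with every label treated as fully trustworthy.
unweighted = [(x, y, 1.0) for x, y, _ in weighted]

w_weighted = fidelity_weighted_sgd(weighted)
w_unweighted = fidelity_weighted_sgd(unweighted)
```

With zero confidence on the noisy label, `w_weighted` converges to the value supported by the trusted labels, while `w_unweighted` is pulled toward the noisy one.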