Learning to Rank from Samples of Variable Quality
Training deep neural networks requires many training samples, but in
practice, training labels are expensive to obtain and may be of varying
quality, as some may be from trusted expert labelers while others might be from
heuristics or other sources of weak supervision such as crowd-sourcing. This
creates a fundamental quality-versus-quantity trade-off in the learning
process. Do we learn from the small amount of high-quality data or the
potentially large amount of weakly-labeled data? We argue that if the learner
could somehow know and take the label-quality into account when learning the
data representation, we could get the best of both worlds. To this end, we
introduce "fidelity-weighted learning" (FWL), a semi-supervised student-teacher
approach for training deep neural networks using weakly-labeled data. FWL
modulates the parameter updates to a student network (trained on the task we
care about) on a per-sample basis according to the posterior confidence of its
label-quality estimated by a teacher (who has access to the high-quality
labels). Both student and teacher are learned from the data. We evaluate FWL on
document ranking, where we outperform state-of-the-art alternative
semi-supervised methods.
Comment: Presented at The First International SIGIR2016 Workshop on Learning
From Limited Or Noisy Data For Information Retrieval. arXiv admin note:
substantial text overlap with arXiv:1711.0279
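As a rough illustration of how per-sample label quality could enter a ranking objective, the sketch below weights a pairwise hinge loss by a confidence score for each weakly labeled document pair. The function name `weighted_pairwise_hinge`, the hinge formulation, and the assumption that the teacher's confidence arrives as a number in [0, 1] are all hypothetical choices for illustration, not the method described in the paper.

```python
from typing import Sequence, Tuple


def weighted_pairwise_hinge(
    pairs: Sequence[Tuple[float, float]],
    confidences: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Mean hinge loss over (score_preferred, score_other) document pairs,
    each scaled by an assumed teacher confidence in [0, 1] for its weak label."""
    if len(pairs) != len(confidences):
        raise ValueError("need exactly one confidence per pair")
    if not pairs:
        return 0.0
    total = 0.0
    for (s_pref, s_other), c in zip(pairs, confidences):
        # Penalize the pair unless the preferred document wins by >= margin;
        # a low-confidence weak label contributes proportionally less.
        total += c * max(0.0, margin - (s_pref - s_other))
    return total / len(pairs)


if __name__ == "__main__":
    pairs = [(2.0, 0.5), (0.0, 1.0)]
    print(weighted_pairwise_hinge(pairs, [1.0, 0.5]))
```

With these inputs, the first pair already satisfies the margin and contributes nothing, while the violated second pair contributes half of its hinge value of 2.0, giving a mean loss of 0.5.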
IALE: Imitating Active Learner Ensembles
Active learning (AL) prioritizes the labeling of the most informative data
samples. However, the performance of AL heuristics depends on the structure of
the underlying classifier model and the data. We propose an imitation learning
scheme that imitates the selection of the best expert heuristic at each stage
of the AL cycle in a batch-mode pool-based setting. We use DAGGER to train the
policy on a dataset and later apply it to datasets from similar domains. With
multiple AL heuristics as experts, the policy is able to reflect the choices of
the best AL heuristics given the current state of the AL process. Our
experiments on well-known datasets show that we outperform both state-of-the-art
imitation learners and heuristics.
Comment: 17 pages
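IALE trains its selection policy with DAGGER. The generic DAgger loop can be sketched as follows: roll out the current learned policy, ask the expert what it would have done in every visited state, add those labeled states to an ever-growing dataset, and retrain. In the active learning setting assumed here, a state is a snapshot of the AL cycle and an action is the name of the heuristic to apply. The helper names (`dagger`, `train_lookup`), the toy lookup-table learner, and the heuristic names are hypothetical, and the sketch omits DAgger's optional mixing of expert and learner actions during rollouts. It is not IALE's implementation.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = str  # hypothetical: the name of the AL heuristic to apply
Policy = Callable[[State], Action]
Dataset = List[Tuple[State, Action]]


def dagger(
    expert: Policy,
    rollout: Callable[[Policy], List[State]],
    train: Callable[[Dataset], Policy],
    initial_policy: Policy,
    n_iters: int,
) -> Tuple[Policy, Dataset]:
    """Generic DAgger loop (without expert/learner mixing)."""
    dataset: Dataset = []
    policy = initial_policy
    for _ in range(n_iters):
        visited = rollout(policy)  # states reached by the current learned policy
        dataset.extend((s, expert(s)) for s in visited)  # expert labels for them
        policy = train(dataset)  # retrain on the aggregated dataset
    return policy, dataset


def train_lookup(dataset: Dataset) -> Policy:
    """Toy learner: majority expert action per seen state, otherwise the
    overall majority action. Assumes a non-empty dataset."""
    per_state: Dict[State, Counter] = {}
    for s, a in dataset:
        per_state.setdefault(s, Counter())[a] += 1
    fallback = Counter(a for _, a in dataset).most_common(1)[0][0]

    def policy(s: State) -> Action:
        if s in per_state:
            return per_state[s].most_common(1)[0][0]
        return fallback

    return policy


if __name__ == "__main__":
    # Hypothetical expert: "entropy" sampling in round 0, "coreset" afterwards.
    policy, data = dagger(
        expert=lambda s: "entropy" if s < 1 else "coreset",
        rollout=lambda p: list(range(4)),  # toy rollout: always visits rounds 0..3
        train=train_lookup,
        initial_policy=lambda s: "random",
        n_iters=2,
    )
    print(len(data), policy(0), policy(3), policy(10))
```

In this toy run, the rollout ignores the policy, so the example only exercises the aggregation and retraining steps; in a real AL setting, the visited states would depend on which heuristic the learned policy chose.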
Fidelity-Weighted Learning
Training deep neural networks requires many training samples, but in practice
training labels are expensive to obtain and may be of varying quality, as some
may be from trusted expert labelers while others might be from heuristics or
other sources of weak supervision such as crowd-sourcing. This creates a
fundamental quality-versus-quantity trade-off in the learning process. Do we
learn from the small amount of high-quality data or the potentially large
amount of weakly-labeled data? We argue that if the learner could somehow know
and take the label-quality into account when learning the data representation,
we could get the best of both worlds. To this end, we propose
"fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach
for training deep neural networks using weakly-labeled data. FWL modulates the
parameter updates to a student network (trained on the task we care about) on a
per-sample basis according to the posterior confidence of its label-quality
estimated by a teacher (who has access to the high-quality labels). Both
student and teacher are learned from the data. We evaluate FWL on two tasks in
information retrieval and natural language processing where we outperform
state-of-the-art alternative semi-supervised methods, indicating that our
approach makes better use of strong and weak labels, and leads to better
task-dependent data representations.
Comment: Published as a conference paper at ICLR 201
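The abstract describes FWL as modulating the student's parameter updates on a per-sample basis according to the teacher's confidence in each label. A minimal sketch of that idea, assuming a linear student trained by plain SGD on squared loss and a teacher that supplies a confidence in [0, 1] for each weak label, simply scales each sample's step size by that confidence. The function `fidelity_weighted_sgd` and the linear scaling rule are illustrative assumptions; the paper's actual student, teacher, and modulation function may differ.

```python
from typing import List, Sequence


def fidelity_weighted_sgd(
    w: Sequence[float],
    xs: Sequence[Sequence[float]],
    ys: Sequence[float],
    confidences: Sequence[float],
    lr: float = 0.1,
) -> List[float]:
    """One SGD pass of a linear student on squared loss, where each
    weakly-labeled sample's step size is lr * confidence (assumed in [0, 1])."""
    w = list(w)
    for x, y, c in zip(xs, ys, confidences):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y  # derivative of 0.5 * (pred - y) ** 2 w.r.t. pred
        step = lr * c  # hypothetical rule: teacher confidence scales this update
        w = [wi - step * err * xi for wi, xi in zip(w, x)]
    return w


if __name__ == "__main__":
    for c in (1.0, 0.5, 0.0):
        print(c, fidelity_weighted_sgd([0.0], [[1.0]], [1.0], [c]))
```

A sample the teacher fully trusts moves the weight by the full learning rate (here from 0.0 to 0.1), a half-trusted sample moves it half as far, and a sample with confidence 0 leaves the student unchanged.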
Deep Active Learning for Named Entity Recognition
Deep learning has yielded state-of-the-art performance on many natural
language processing tasks including named entity recognition (NER). However,
this typically requires large amounts of labeled data. In this work, we
demonstrate that the amount of labeled training data can be drastically reduced
when deep learning is combined with active learning. While active learning is
sample-efficient, it can be computationally expensive since it requires
iterative retraining. To speed this up, we introduce a lightweight architecture
for NER, viz., the CNN-CNN-LSTM model consisting of convolutional character and
word encoders and a long short-term memory (LSTM) tag decoder. The model
achieves nearly state-of-the-art performance on standard datasets for the task
while being computationally much more efficient than the best-performing models.
We carry out incremental active learning during the training process and are
able to nearly match state-of-the-art performance with just 25% of the
original training data.
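The abstract does not say how sentences are chosen for labeling in each active learning round. One common uncertainty criterion, assumed here purely for illustration, scores each unlabeled sentence by the log-probability of the model's best tag sequence divided by the sentence length, then queries the lowest-scoring sentences. Normalizing by length keeps the criterion from simply favoring long sentences. The function `select_for_labeling` is hypothetical, and the scores would come from whatever tagger is being trained (for example, the CNN-CNN-LSTM model).

```python
import heapq
from typing import List, Sequence


def select_for_labeling(
    best_path_logprobs: Sequence[float],
    lengths: Sequence[int],
    k: int,
) -> List[int]:
    """Return indices of the k unlabeled sentences with the lowest
    length-normalized log-probability of their best tag sequence."""
    if len(best_path_logprobs) != len(lengths):
        raise ValueError("need one length per sentence")
    if any(n <= 0 for n in lengths):
        raise ValueError("sentence lengths must be positive")
    scores = [lp / n for lp, n in zip(best_path_logprobs, lengths)]
    # Least confident (most negative normalized score) first.
    return heapq.nsmallest(k, range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    print(select_for_labeling([-1.0, -6.0, -3.0], [2, 3, 1], k=2))
```

For these inputs, the normalized scores are -0.5, -2.0, and -3.0, so the short but uncertain sentence 2 is queried first, followed by sentence 1. Ranking by raw log-probability would instead put the long sentence 1 first.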