Learning to Rank from Samples of Variable Quality
Training deep neural networks requires many training samples, but in
practice, training labels are expensive to obtain and may be of varying
quality, as some may be from trusted expert labelers while others might be from
heuristics or other sources of weak supervision such as crowd-sourcing. This
creates a fundamental quality-versus-quantity trade-off in the learning
process. Do we learn from the small amount of high-quality data or the
potentially large amount of weakly-labeled data? We argue that if the learner
could somehow know and take the label-quality into account when learning the
data representation, we could get the best of both worlds. To this end, we
introduce "fidelity-weighted learning" (FWL), a semi-supervised student-teacher
approach for training deep neural networks using weakly-labeled data. FWL
modulates the parameter updates to a student network (trained on the task we
care about) on a per-sample basis according to the posterior confidence of its
label-quality estimated by a teacher (who has access to the high-quality
labels). Both student and teacher are learned from the data. We evaluate FWL on
document ranking where we outperform state-of-the-art alternative
semi-supervised methods.
Comment: Presented at The First International SIGIR2016 Workshop on Learning From Limited Or Noisy Data For Information Retrieval. arXiv admin note: substantial text overlap with arXiv:1711.0279
Fidelity-Weighted Learning
Training deep neural networks requires many training samples, but in practice,
training labels are expensive to obtain and may be of varying quality, as some
may be from trusted expert labelers while others might be from heuristics or
other sources of weak supervision such as crowd-sourcing. This creates a
fundamental quality-versus-quantity trade-off in the learning process. Do we
learn from the small amount of high-quality data or the potentially large
amount of weakly-labeled data? We argue that if the learner could somehow know
and take the label-quality into account when learning the data representation,
we could get the best of both worlds. To this end, we propose
"fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach
for training deep neural networks using weakly-labeled data. FWL modulates the
parameter updates to a student network (trained on the task we care about) on a
per-sample basis according to the posterior confidence of its label-quality
estimated by a teacher (who has access to the high-quality labels). Both
student and teacher are learned from the data. We evaluate FWL on two tasks in
information retrieval and natural language processing where we outperform
state-of-the-art alternative semi-supervised methods, indicating that our
approach makes better use of strong and weak labels, and leads to better
task-dependent data representations.
Comment: Published as a conference paper at ICLR 201
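The mechanism both FWL abstracts describe (scaling each weakly labeled example's contribution to the student's update according to the teacher's confidence in its label) can be illustrated with a short sketch. This is a minimal illustration under placeholder assumptions: the dimensions, the random data, and the confidence vector are all hypothetical, and in the paper the confidence would come from a teacher trained on the high-quality labels rather than being supplied directly.

```python
import torch
import torch.nn as nn

# Hypothetical sizes and data; in the paper these come from the ranking/NLP task.
D_IN, N_WEAK = 16, 512
torch.manual_seed(0)
x_weak = torch.randn(N_WEAK, D_IN)                 # weakly labeled inputs
y_weak = torch.randint(0, 2, (N_WEAK, 1)).float()  # noisy labels, e.g. from a heuristic
confidence = torch.rand(N_WEAK, 1)                 # stand-in for the teacher's per-sample confidence in [0, 1]

student = nn.Sequential(nn.Linear(D_IN, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss(reduction="none")   # keep per-sample losses

for epoch in range(5):
    opt.zero_grad()
    per_sample = loss_fn(student(x_weak), y_weak)
    # Fidelity weighting: samples the teacher trusts contribute more to the update;
    # with all confidences equal to 1 this reduces to ordinary training on weak labels.
    (confidence * per_sample).mean().backward()
    opt.step()
```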
Learning with Weak Supervision for Email Intent Detection
Email remains one of the most frequently used means of online communication.
People spend a significant amount of time every day on emails to exchange
information, manage tasks and schedule events. Previous work has studied
different ways to improve email productivity by prioritizing emails,
suggesting automatic replies or identifying intents to recommend appropriate
actions. The problem has been mostly posed as a supervised learning problem
where models of different complexities were proposed to classify an email
message into a predefined taxonomy of intents or classes. The need for labeled
data has always been one of the largest bottlenecks in training supervised
models. This is especially the case for many real-world tasks, such as email
intent classification, where large-scale annotated examples are either hard to
acquire or unavailable due to privacy or data access constraints. Email users
often take actions in response to intents expressed in an email (e.g., setting
up a meeting in response to an email with a scheduling request). Such actions
can be inferred from user interaction logs. In this paper, we propose to
leverage user actions as a source of weak supervision, in addition to a limited
set of annotated examples, to detect intents in emails. We develop an
end-to-end robust deep neural network model for email intent identification
that leverages both clean annotated data and noisy weak supervision along with
a self-paced learning mechanism. Extensive experiments on three different
intent detection tasks show that our approach can effectively leverage the
weakly supervised data to improve intent detection in emails.
Comment: 10 pages, 3 figures
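The approach in this abstract combines a small set of annotated emails with action-derived weak labels under a self-paced mechanism. Below is a schematic sketch of that idea only, not the paper's model: the classifier, the random placeholder data, the hard 0/1 self-paced weights, and the threshold schedule are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D_IN = 16
# Placeholder features: a small clean set and a larger set whose labels are
# assumed to be inferred from user actions (hence noisy).
x_clean, y_clean = torch.randn(64, D_IN), torch.randint(0, 2, (64,))
x_weak, y_weak = torch.randn(512, D_IN), torch.randint(0, 2, (512,))

model = nn.Sequential(nn.Linear(D_IN, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss(reduction="none")

lam = 0.5  # self-paced threshold on the weak-sample loss
for epoch in range(10):
    opt.zero_grad()
    clean_loss = ce(model(x_clean), y_clean).mean()
    weak_losses = ce(model(x_weak), y_weak)
    # Self-paced weights: keep only the weak samples the model currently finds
    # "easy", so suspicious weak labels contribute little early in training.
    v = (weak_losses.detach() < lam).float()
    weak_loss = (v * weak_losses).sum() / v.sum().clamp(min=1.0)
    (clean_loss + weak_loss).backward()
    opt.step()
    lam *= 1.2  # gradually admit harder weak samples
```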
Learning to Detect Noisy Labels Using Model-Based Features
Label noise is ubiquitous in various machine learning scenarios such as
self-labeling with model predictions and erroneous data annotation. Many
existing approaches are based on heuristics such as sample losses, which might
not be flexible enough to achieve optimal solutions. Meta-learning-based
methods address this issue by learning a data selection function, but can be
hard to optimize. In light of these pros and cons, we propose
Selection-Enhanced Noisy label Training (SENT) that does not rely on meta
learning while having the flexibility of being data-driven. SENT transfers the
noise distribution to a clean set and trains a model to distinguish noisy
labels from clean ones using model-based features. Empirically, on a wide range
of tasks including text classification and speech recognition, SENT improves
performance over strong baselines under the settings of self-training and label
corruption.
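A rough sketch of the detection step described above, under stated assumptions: a clean set is corrupted so every example carries a known noisy/clean flag, model-based features are computed per example, and a detector is fit to predict whether a label is noisy. The three features and the synthetic numbers below are placeholders standing in for statistics one would obtain from a model trained on the noisy data; SENT's actual feature set and transfer procedure follow the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Corrupt a clean set with an assumed noise rate so each example has a known flag.
n, noise_rate = 1000, 0.2
is_noisy = rng.random(n) < noise_rate

# Synthetic stand-ins for model-based features (per-sample loss, confidence in the
# given label, prediction entropy); in practice these come from a trained model.
loss = rng.gamma(2.0, 1.0, n) + 2.0 * is_noisy
conf = np.clip(rng.normal(0.7 - 0.3 * is_noisy, 0.15), 0, 1)
entropy = np.clip(rng.normal(0.5 + 0.2 * is_noisy, 0.1), 0, None)
features = np.stack([loss, conf, entropy], axis=1)

# Fit the detector on the corrupted clean set; it would then be applied to the
# real training data to keep, drop, or down-weight suspect labels.
detector = LogisticRegression().fit(features, is_noisy)
print("held-in accuracy:", detector.score(features, is_noisy))
```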