Sampling Approach Matters: Active Learning for Robotic Language Acquisition
Ordering the selection of training data using active learning can lead to
improvements in learning efficiently from smaller corpora. We present an
exploration of active learning approaches applied to three grounded language
problems of varying complexity in order to analyze what methods are suitable
for improving data efficiency in learning. We present a method for analyzing
the complexity of data in this joint problem space, and report on how
characteristics of the underlying task, along with design decisions such as
feature selection and classification model, drive the results. We observe that
representativeness, along with diversity, is crucial in selecting data samples.
Comment: To appear in IEEE Big Data 202
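As a rough illustration of what selecting by representativeness and diversity can mean, the sketch below greedily picks pool points that are similar to the pool as a whole yet dissimilar to points already picked. It is not the method from the paper: the names `similarity` and `select_batch`, the distance-based similarity, and the scoring rule are all assumptions made for this example.

```python
import math


def similarity(a, b):
    """Hypothetical similarity in (0, 1], derived from Euclidean distance."""
    return 1.0 / (1.0 + math.dist(a, b))


def select_batch(pool, k):
    """Greedily choose up to k indices from pool (a list of feature tuples).

    Score = representativeness - redundancy, an assumed scoring rule:
      representativeness: mean similarity of a point to the whole pool
      redundancy: highest similarity to any already chosen point
    """
    n = len(pool)
    rep = [sum(similarity(p, q) for q in pool) / n for p in pool]
    chosen = []
    while len(chosen) < min(k, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in chosen:
                continue
            redundancy = max(
                (similarity(pool[i], pool[j]) for j in chosen), default=0.0
            )
            score = rep[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen
```

With a pool made of a tight cluster plus one distant point, the first pick comes from the cluster (most representative) and the second pick is the distant point (most diverse relative to the first).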
Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling
Obtaining large annotated datasets is critical for training successful
machine learning models and it is often a bottleneck in practice. Weak
supervision offers a promising alternative for producing labeled datasets
without ground truth annotations by generating probabilistic labels using
multiple noisy heuristics. This process can scale to large datasets and has
demonstrated state-of-the-art performance in diverse domains such as healthcare
and e-commerce. One practical issue with learning from user-generated
heuristics is that their creation requires creativity, foresight, and domain
expertise from those who hand-craft them, a process which can be tedious and
subjective. We develop the first framework for interactive weak supervision in
which a method proposes heuristics and learns from user feedback given on each
proposed heuristic. Our experiments demonstrate that only a small number of
feedback iterations are needed to train models that achieve highly competitive
test set performance without access to ground truth training labels. We conduct
user studies, which show that users are able to effectively provide feedback on
heuristics and that test set results track the performance of simulated
oracles.
Comment: Accepted as a conference paper at ICLR 202
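To make the idea of generating probabilistic labels from multiple noisy heuristics concrete, here is a minimal sketch that combines heuristic votes by an unweighted average. This is a simplification for illustration only, not the framework described in the abstract: the function name `probabilistic_labels`, the keyword heuristics, and the 0.5 fallback for examples where every heuristic abstains are assumptions.

```python
from typing import Callable, List, Optional

# A heuristic returns 1, 0, or None (abstain) for a given text.
Heuristic = Callable[[str], Optional[int]]


def probabilistic_labels(texts: List[str], heuristics: List[Heuristic]) -> List[float]:
    """Estimate P(label = 1) per text as the fraction of non-abstaining votes for 1.

    Assumption: unweighted voting; when all heuristics abstain, fall back to 0.5.
    """
    probs = []
    for text in texts:
        votes = [h(text) for h in heuristics]
        votes = [v for v in votes if v is not None]
        probs.append(sum(votes) / len(votes) if votes else 0.5)
    return probs


# Hypothetical keyword heuristics for a binary sentiment task.
example_heuristics: List[Heuristic] = [
    lambda t: 1 if "great" in t else None,
    lambda t: 0 if "bad" in t else None,
    lambda t: 1 if "love" in t else None,
]
```

Calling `probabilistic_labels(["great but bad"], example_heuristics)` yields a soft label that reflects the disagreement between heuristics, which a downstream model could then train on in place of ground truth annotations.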