Deep Active Learning for Named Entity Recognition
Deep learning has yielded state-of-the-art performance on many natural
language processing tasks including named entity recognition (NER). However,
this typically requires large amounts of labeled data. In this work, we
demonstrate that the amount of labeled training data can be drastically reduced
when deep learning is combined with active learning. While active learning is
sample-efficient, it can be computationally expensive since it requires
iterative retraining. To speed this up, we introduce a lightweight architecture
for NER, viz., the CNN-CNN-LSTM model consisting of convolutional character and
word encoders and a long short-term memory (LSTM) tag decoder. The model
achieves nearly state-of-the-art performance on standard datasets for the task
while being computationally much more efficient than best performing models. We
carry out incremental active learning during the training process and are
able to nearly match state-of-the-art performance with just 25% of the
original training data.
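The retrain-query-label cycle that the abstract describes can be illustrated with a minimal sketch. This is not the paper's CNN-CNN-LSTM model or its selection criterion: it assumes a toy one-dimensional nearest-centroid classifier and margin-based uncertainty sampling, and every name in it (`fit_centroids`, `margin`, `active_learning_loop`, the oracle) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def fit_centroids(labeled: List[Tuple[float, int]]) -> Dict[int, float]:
    """'Retrain' the toy model: one mean value per class."""
    sums: Dict[int, List[float]] = {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0, 0.0])
        acc[0] += x
        acc[1] += 1
    return {y: acc[0] / acc[1] for y, acc in sums.items()}


def margin(x: float, centroids: Dict[int, float]) -> float:
    """Gap between the two nearest centroids; small means uncertain."""
    dists = sorted(abs(x - c) for c in centroids.values())
    return dists[1] - dists[0]


def active_learning_loop(
    seed: List[Tuple[float, int]],
    pool: List[float],
    oracle: Callable[[float], int],
    rounds: int,
) -> Tuple[List[Tuple[float, int]], Dict[int, float]]:
    """Each round: retrain, query the most uncertain pool item, label it."""
    labeled = list(seed)  # must contain at least two classes
    remaining = list(pool)
    for _ in range(rounds):
        if not remaining:
            break
        centroids = fit_centroids(labeled)
        query = min(remaining, key=lambda x: margin(x, centroids))
        remaining.remove(query)
        labeled.append((query, oracle(query)))
    return labeled, fit_centroids(labeled)


# Hypothetical data: the oracle stands in for a human annotator.
labeled, centroids = active_learning_loop(
    seed=[(0.0, 0), (10.0, 1)],
    pool=[1.0, 4.0, 5.5, 9.0],
    oracle=lambda x: int(x >= 5.0),
    rounds=2,
)
```

In this toy run the two queried points are the ones closest to the decision boundary, which is why such a loop can reach a good model with fewer labels. The repeated call to `fit_centroids` is the step that becomes expensive with a deep model.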
Task Selection for Bandit-Based Task Assignment in Heterogeneous Crowdsourcing
Task selection (picking an appropriate labeling task) and worker selection
(assigning the labeling task to a suitable worker) are two major challenges in
task assignment for crowdsourcing. Recently, worker selection has been
successfully addressed by the bandit-based task assignment (BBTA) method, while
task selection has not been thoroughly investigated yet. In this paper, we
experimentally compare several task selection strategies borrowed from active
learning literature, and show that the least confidence strategy significantly
improves the performance of task assignment in crowdsourcing.
Comment: arXiv admin note: substantial text overlap with arXiv:1507.0580
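As a rough illustration of the least confidence strategy named in this abstract, the sketch below scores each candidate by one minus its highest predicted class probability and picks the highest-scoring candidates. It assumes predicted class distributions are already available; the function names and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def least_confidence_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Uncertainty score 1 - max_k p(k | x) for each candidate."""
    return [1.0 - max(p) for p in probs]


def select_least_confident(
    probs: Sequence[Sequence[float]], batch_size: int = 1
) -> List[int]:
    """Indices of the batch_size candidates the model is least sure about."""
    scores = least_confidence_scores(probs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical predicted class distributions for four unlabeled tasks.
example_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],
]
picked = select_least_confident(example_probs, batch_size=2)
```

Here `picked` holds the indices of the two tasks whose top predicted class has the lowest probability, so labeling effort goes to the tasks the current model finds hardest.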
Learning Active Learning from Data
In this paper, we suggest a novel data-driven approach to active learning
(AL). The key idea is to train a regressor that predicts the expected error
reduction for a candidate sample in a particular learning state. By formulating
the query selection procedure as a regression problem we are not restricted to
working with existing AL heuristics; instead, we learn strategies based on
experience from previous AL outcomes. We show that a strategy can be learnt
either from simple synthetic 2D datasets or from a subset of domain-specific
data. Our method yields strategies that work well on real data from a wide
range of domains.
Expected exponential loss for gaze-based video and volume ground truth annotation
Many recent machine learning approaches used in medical imaging are highly
reliant on large amounts of image and ground truth data. In the context of
object segmentation, pixel-wise annotations are extremely expensive to collect,
especially in video and 3D volumes. To reduce this annotation burden, we
propose a novel framework to allow annotators to simply observe the object to
segment and record where they have looked with a $200 eye gaze tracker. Our
method then estimates pixel-wise probabilities for the presence of the object
throughout the sequence, from which we train a classifier in a semi-supervised
setting using a novel Expected Exponential loss function. We show that our
framework provides superior performance on a wide range of medical image
settings compared to existing strategies and that our method can be combined
with current crowd-sourcing paradigms as well.
Comment: 9 pages, 5 figures, MICCAI 2017 - LABELS Worksho