12,597 research outputs found
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
A lot of the recent success in natural language processing (NLP) has been
driven by distributed vector representations of words trained on large amounts
of text in an unsupervised manner. These representations are typically used as
general purpose features for words across a range of NLP problems. However,
extending this success to learning representations of sequences of words, such
as sentences, remains an open problem. Recent work has explored unsupervised as
well as supervised learning techniques with different training objectives to
learn general purpose fixed-length sentence representations. In this work, we
present a simple, effective multi-task learning framework for sentence
representations that combines the inductive biases of diverse training
objectives in a single model. We train this model on several data sources with
multiple training objectives on over 100 million sentences. Extensive
experiments demonstrate that sharing a single recurrent sentence encoder across
weakly related tasks leads to consistent improvements over previous methods. We
present substantial improvements in the context of transfer learning and
low-resource settings using our learned general-purpose representations.Comment: Accepted at ICLR 201
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora, is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances, to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge
Bases for Natural Language Processin
Weakly Supervised Cross-Lingual Named Entity Recognition via Effective Annotation and Representation Projection
The state-of-the-art named entity recognition (NER) systems are supervised
machine learning models that require large amounts of manually annotated data
to achieve high accuracy. However, annotating NER data by human is expensive
and time-consuming, and can be quite difficult for a new language. In this
paper, we present two weakly supervised approaches for cross-lingual NER with
no human annotation in a target language. The first approach is to create
automatically labeled NER data for a target language via annotation projection
on comparable corpora, where we develop a heuristic scheme that effectively
selects good-quality projection-labeled data from noisy data. The second
approach is to project distributed representations of words (word embeddings)
from a target language to a source language, so that the source-language NER
system can be applied to the target language without re-training. We also
design two co-decoding schemes that effectively combine the outputs of the two
projection-based approaches. We evaluate the performance of the proposed
approaches on both in-house and open NER data for several target languages. The
results show that the combined systems outperform three other weakly supervised
approaches on the CoNLL data.Comment: 11 pages, The 55th Annual Meeting of the Association for
Computational Linguistics (ACL), 201
Weakly-Supervised Temporal Localization via Occurrence Count Learning
We propose a novel model for temporal detection and localization which allows
the training of deep neural networks using only counts of event occurrences as
training labels. This powerful weakly-supervised framework alleviates the
burden of the imprecise and time-consuming process of annotating event
locations in temporal data. Unlike existing methods, in which localization is
explicitly achieved by design, our model learns localization implicitly as a
byproduct of learning to count instances. This unique feature is a direct
consequence of the model's theoretical properties. We validate the
effectiveness of our approach in a number of experiments (drum hit and piano
onset detection in audio, digit detection in images) and demonstrate
performance comparable to that of fully-supervised state-of-the-art methods,
despite much weaker training requirements.Comment: Accepted at ICML 201
Domain adaptation for sequence labeling using hidden Markov models
Most natural language processing systems based on machine learning are not
robust to domain shift. For example, a state-of-the-art syntactic dependency
parser trained on Wall Street Journal sentences has an absolute drop in
performance of more than ten points when tested on textual data from the Web.
An efficient solution to make these methods more robust to domain shift is to
first learn a word representation using large amounts of unlabeled data from
both domains, and then use this representation as features in a supervised
learning algorithm. In this paper, we propose to use hidden Markov models to
learn word representations for part-of-speech tagging. In particular, we study
the influence of using data from the source, the target or both domains to
learn the representation and the different ways to represent words using an
HMM.Comment: New Directions in Transfer and Multi-Task: Learning Across Domains
and Tasks (NIPS Workshop) (2013
- …