Named Entity Disambiguation for Noisy Text
We address the task of Named Entity Disambiguation (NED) for noisy text. We
present WikilinksNED, a large-scale NED dataset of text fragments from the web,
which is significantly noisier and more challenging than existing news-based
datasets. To capture the limited and noisy local context surrounding each
mention, we design a neural model and train it with a novel method for sampling
informative negative examples. We also describe a new way of initializing word
and entity embeddings that significantly improves performance. Our model
significantly outperforms existing state-of-the-art methods on WikilinksNED
while achieving comparable performance on a smaller newswire dataset. Comment: Accepted to CoNLL 2017.
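As a rough illustration of what "sampling informative negative examples" can mean in NED, the sketch below prefers confusable negatives (other entities the same surface form can link to) over uniformly random ones. All names and the candidate table are hypothetical, not the paper's implementation.

```python
import random

def sample_informative_negatives(mention, gold_entity, candidates, k=5):
    """Prefer confusable negatives: other entities this surface form can
    link to, back-filled with random entities when the list runs short."""
    near_misses = [e for e in candidates.get(mention, []) if e != gold_entity]
    random.shuffle(near_misses)
    negatives = near_misses[:k]
    pool = [e for es in candidates.values() for e in es
            if e != gold_entity and e not in negatives]
    negatives += random.sample(pool, min(k - len(negatives), len(pool)))
    return negatives

# "jaguar" could link to the animal, the car maker, or an OS release.
cands = {"jaguar": ["Jaguar_(animal)", "Jaguar_Cars", "Mac_OS_X_Jaguar"]}
print(sample_informative_negatives("jaguar", "Jaguar_Cars", cands, k=2))
```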
Differentiable Scheduled Sampling for Credit Assignment
We demonstrate that a continuous relaxation of the argmax operation can be
used to create a differentiable approximation to greedy decoding for
sequence-to-sequence (seq2seq) models. By incorporating this approximation into
the scheduled sampling training procedure (Bengio et al., 2015)--a well-known
technique for correcting exposure bias--we introduce a new training objective
that is continuous and differentiable everywhere and that can provide
informative gradients near points where previous decoding decisions change
their value. In addition, by using a related approximation, we demonstrate a
similar approach to sample-based training. Finally, we show that our approach
outperforms cross-entropy training and scheduled sampling procedures in two
sequence prediction tasks: named entity recognition and machine translation. Comment: Accepted at ACL 2017 (http://bit.ly/2oj1muX).
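A minimal numpy sketch of the core trick: instead of feeding the embedding of the hard argmax token back into the decoder, feed a temperature-controlled softmax mixture of embeddings, so gradients flow through the decoding decision. The toy vocabulary and dimensions are illustrative only.

```python
import numpy as np

def softmax(x, tau=1.0):
    z = x / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

E = np.random.randn(4, 3)                 # toy embedding matrix (4 tokens)
logits = np.array([2.0, 0.5, -1.0, 0.1])  # previous step's output scores

# Hard (non-differentiable) greedy feeding: pick one embedding.
hard_input = E[np.argmax(logits)]

# Soft, differentiable relaxation: a convex mixture of embeddings.
# As tau -> 0 the mixture approaches the hard argmax choice.
soft_input = softmax(logits, tau=0.5) @ E

print(hard_input, soft_input, sep="\n")
```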
SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data
We present SwellShark, a framework for building biomedical named entity
recognition (NER) systems quickly and without hand-labeled data. Our approach
views biomedical resources like lexicons as function primitives for
autogenerating weak supervision. We then use a generative model to unify and
denoise this supervision and construct large-scale, probabilistically labeled
datasets for training high-accuracy NER taggers. In three biomedical NER tasks,
SwellShark achieves scores competitive with state-of-the-art supervised
benchmarks using no hand-labeled training data. In a drug name extraction task
using patient medical records, one domain expert using SwellShark came
within 5.1% of a crowdsourced annotation approach -- which originally utilized
20 teams over the course of several weeks -- in 24 hours.
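To make the "lexicons as function primitives for weak supervision" idea concrete, here is a toy sketch in which lexicons become labeling functions whose votes are aggregated per token. SwellShark instead fits a generative model to denoise and weight these sources; the simple vote and all lexicon entries below are stand-ins.

```python
from collections import Counter

# Lexicons act as labeling-function primitives (illustrative entries only).
DRUG_LEXICON = {"ibuprofen", "aspirin", "metformin"}
DISEASE_LEXICON = {"diabetes", "asthma"}

def lf_drug(token):
    return "DRUG" if token.lower() in DRUG_LEXICON else None

def lf_disease(token):
    return "DISEASE" if token.lower() in DISEASE_LEXICON else None

LABELING_FUNCTIONS = [lf_drug, lf_disease]

def weak_label(tokens):
    """Aggregate labeling-function votes per token; a majority vote here,
    where the paper would apply its generative label model."""
    labels = []
    for tok in tokens:
        votes = Counter(filter(None, (lf(tok) for lf in LABELING_FUNCTIONS)))
        labels.append(votes.most_common(1)[0][0] if votes else "O")
    return labels

print(weak_label("Metformin is prescribed for diabetes".split()))
```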
A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models
Beam search is a desirable choice of test-time decoding algorithm for neural
sequence models because it potentially avoids search errors made by simpler
greedy methods. However, typical cross entropy training procedures for these
models do not directly consider the behaviour of the final decoding method. As
a result, for cross-entropy trained models, beam decoding can sometimes yield
reduced test performance when compared with greedy decoding. In order to train
models that can more effectively make use of beam search, we propose a new
training procedure that focuses on the final loss metric (e.g. Hamming loss)
evaluated on the output of beam search. While well-defined, this "direct loss"
objective is itself discontinuous and thus difficult to optimize. Hence, in our
approach, we form a sub-differentiable surrogate objective by introducing a
novel continuous approximation of the beam search decoding procedure. In
experiments, we show that optimizing this new training objective yields
substantially better results on two sequence tasks (Named Entity Recognition
and CCG Supertagging) when compared with both cross entropy trained greedy
decoding and cross entropy trained beam decoding baselines. Comment: Updated for clarity and notational consistency.
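A rough numpy sketch of one ingredient of such a relaxation: a differentiable stand-in for the hard top-k selection inside a beam step, built from repeated peaked softmaxes. This is not the paper's exact construction (which relaxes the full beam recursion); the suppression constant and temperature are arbitrary choices for illustration.

```python
import numpy as np

def soft_top_k(scores, k, tau=0.05):
    """Differentiable sketch of top-k: k rounds of a peaked softmax,
    pushing down items that already received most of the mass."""
    s = scores.astype(float).copy()
    rounds = []
    for _ in range(k):
        w = np.exp((s - s.max()) / tau)
        w /= w.sum()
        rounds.append(w)
        s = s - 1e3 * w          # softly exclude what was just selected
    return np.stack(rounds)

# Approximately selects index 1 first, then index 0.
print(soft_top_k(np.array([1.2, 3.4, 0.7]), k=2).round(3))
```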
Adversarial Structured Prediction for Multivariate Measures
Many predicted structured objects (e.g., sequences, matchings, trees) are
evaluated using the F-score, alignment error rate (AER), or other multivariate
performance measures. Since inductively optimizing these measures using
training data is typically computationally difficult, empirical risk
minimization of surrogate losses is employed, using, e.g., the hinge loss for
(structured) support vector machines. These approximations often introduce a
mismatch between the learner's objective and the desired application
performance, leading to inconsistency. We take a different approach:
adversarially approximate training data while optimizing the exact F-score or
AER. Structured predictions under this formulation result from solving zero-sum
games between a predictor seeking the best performance and an adversary seeking
the worst while required to (approximately) match certain structured properties
of the training data. We explore this approach for word alignment (AER
evaluation) and named entity recognition (F-score evaluation) with linear-chain
constraints.
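As a schematic of the zero-sum game just described, adversarial prediction formulations of this kind are often written as below; the notation is ours and only approximates the paper's exact construction.

```latex
% Predictor \hat{P} seeks the best expected F-score against an adversarial
% approximation \check{P} of the data, which is constrained to match the
% training data's feature statistics \tilde{\phi}:
\max_{\hat{P}(\hat{y}\mid x)} \;\min_{\check{P}(\check{y}\mid x)}\;
  \mathbb{E}_{\hat{y}\sim\hat{P},\,\check{y}\sim\check{P}}
    \bigl[\mathrm{F}(\hat{y},\check{y})\bigr]
\quad\text{s.t.}\quad
  \mathbb{E}_{\check{y}\sim\check{P}}\bigl[\phi(x,\check{y})\bigr]=\tilde{\phi}
```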
Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML
Reward augmented maximum likelihood (RAML), a simple and effective learning
framework to directly optimize towards the reward function in structured
prediction tasks, has led to a number of impressive empirical successes. RAML
incorporates task-specific reward by performing maximum-likelihood updates on
candidate outputs sampled according to an exponentiated payoff distribution,
which gives higher probabilities to candidates that are close to the reference
output. While RAML is notable for its simplicity, efficiency, and impressive
empirical successes, the theoretical properties of RAML, especially the
behavior of the exponentiated payoff distribution, have not been examined
thoroughly. In this work, we introduce softmax Q-distribution estimation, a
novel theoretical interpretation of RAML, which reveals the relation between
RAML and Bayesian decision theory. The softmax Q-distribution can be regarded
as a smooth approximation of the Bayes decision boundary, and the Bayes
decision rule is achieved by decoding with this Q-distribution. We further show
that RAML is equivalent to approximately estimating the softmax Q-distribution,
with the temperature controlling approximation error. We perform two
experiments, one on synthetic data of multi-class classification and one on
real data of image captioning, to demonstrate the relationship between RAML and
the proposed softmax Q-distribution estimation method, verifying our
theoretical analysis. Additional experiments on three structured prediction
tasks with rewards defined on sequential (named entity recognition), tree-based
(dependency parsing) and irregular (machine translation) structures show
notable improvements over maximum likelihood baselines. Comment: Under review at ICLR 2018.
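For reference, the exponentiated payoff distribution discussed above (following Norouzi et al., 2016) and the RAML training objective can be written as:

```latex
% Candidates y are weighted by reward r against the reference y^*,
% with temperature \tau controlling how peaked the distribution is:
q(y \mid y^{*}) =
  \frac{\exp\!\bigl(r(y, y^{*})/\tau\bigr)}
       {\sum_{y'} \exp\!\bigl(r(y', y^{*})/\tau\bigr)},
\qquad
\mathcal{L}_{\mathrm{RAML}}(\theta) =
  -\,\mathbb{E}_{y \sim q(\cdot\mid y^{*})}\bigl[\log p_{\theta}(y \mid x)\bigr]
```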
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
We present a novel end-to-end neural model to extract entities and relations
between them. Our recurrent neural network based model captures both word
sequence and dependency tree substructure information by stacking bidirectional
tree-structured LSTM-RNNs on bidirectional sequential LSTM-RNNs. This allows
our model to jointly represent both entities and relations with shared
parameters in a single model. We further encourage detection of entities during
training and use of entity information in relation extraction via entity
pretraining and scheduled sampling. Our model improves over the
state-of-the-art feature-based model on end-to-end relation extraction,
achieving 12.1% and 5.7% relative error reductions in F1-score on ACE2005 and
ACE2004, respectively. We also show that our LSTM-RNN based model compares
favorably to the state-of-the-art CNN based model (in F1-score) on nominal
relation classification (SemEval-2010 Task 8). Finally, we present an extensive
ablation analysis of several model components. Comment: Accepted for publication at the Association for Computational Linguistics (ACL), 2016. 13 pages, 1 figure, 6 tables.
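A skeleton of the joint architecture described above: a shared encoder feeds both an entity tagger and a relation classifier over a pair of token states. The paper's bidirectional tree-LSTM over the dependency structure is omitted here, and all dimensions and names are illustrative, so treat this as a sketch rather than the authors' model.

```python
import torch
import torch.nn as nn

class JointEntityRelation(nn.Module):
    """Shared BiLSTM encoder with two heads: per-token entity tagging and
    relation classification over a pair of (head-word) positions."""
    def __init__(self, vocab=1000, dim=64, n_tags=9, n_rels=7):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.tagger = nn.Linear(2 * dim, n_tags)   # entity detection head
        self.rel = nn.Linear(4 * dim, n_rels)      # relation over a span pair

    def forward(self, tokens, head_i, head_j):
        h, _ = self.encoder(self.embed(tokens))    # (B, T, 2*dim)
        tag_logits = self.tagger(h)
        pair = torch.cat([h[:, head_i], h[:, head_j]], dim=-1)
        rel_logits = self.rel(pair)
        return tag_logits, rel_logits

model = JointEntityRelation()
toks = torch.randint(0, 1000, (1, 12))
tags, rels = model(toks, head_i=2, head_j=8)
print(tags.shape, rels.shape)   # (1, 12, 9) and (1, 7)
```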
Tracking the Diffusion of Named Entities
Existing studies of how information diffuses across social networks have thus
far concentrated on analysing and recovering the spread of deterministic
innovations such as URLs, hashtags, and group membership. However, investigating
how mentions of real-world entities appear and spread has yet to be explored,
largely due to the computationally intractable nature of performing large-scale
entity extraction. In this paper we present, to the best of our knowledge, one
of the first pieces of work to closely examine the diffusion of named entities
on social media, using Reddit as our case study platform. We first investigate
how named entities can be accurately recognised and extracted from discussion
posts. We then use these extracted entities to study the patterns of entity
cascades and how the probability of a user adopting an entity (i.e. mentioning
it) is associated with exposures to the entity. We put these pieces together by
presenting a parallelised diffusion model that can forecast the probability of
entity adoption, finding that the influence of adoption between users can be
characterised by their prior interactions -- as opposed to whether the users
propagated entity-adoptions beforehand. Our findings have important
implications for researchers studying influence and language, and for community
analysts who wish to understand entity-level influence dynamics.
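A toy illustration of the exposure analysis described above: estimating the probability that a user adopts (mentions) an entity as a function of how many times they were previously exposed to it. The event-log format and field names are ours, not the paper's.

```python
from collections import Counter

def adoption_curve(events):
    """Estimate P(adopt | k prior exposures) from a log of
    (user, entity, n_exposures_before, adopted) tuples."""
    exposed = Counter()
    adopted = Counter()
    for _user, _entity, k, did_adopt in events:
        exposed[k] += 1
        if did_adopt:
            adopted[k] += 1
    return {k: adopted[k] / exposed[k] for k in sorted(exposed)}

log = [("u1", "CRISPR", 1, False), ("u2", "CRISPR", 2, True),
       ("u3", "CRISPR", 2, True), ("u4", "CRISPR", 3, True)]
print(adoption_curve(log))  # {1: 0.0, 2: 1.0, 3: 1.0}
```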
Generating Fine-Grained Open Vocabulary Entity Type Descriptions
While large-scale knowledge graphs provide vast amounts of structured facts
about entities, a short textual description can often be useful to succinctly
characterize an entity and its type. Unfortunately, many knowledge graph
entities lack such textual descriptions. In this paper, we introduce a dynamic
memory-based network that generates a short open vocabulary description of an
entity by jointly leveraging induced fact embeddings as well as the dynamic
context of the generated sequence of words. We demonstrate the ability of our
architecture to discern relevant information for more accurate generation of
type description by pitting the system against several strong baselines. Comment: Published in ACL 2018.
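As a loose sketch of "leveraging induced fact embeddings" during generation, one decoding step below attends over an entity's fact embeddings to build a context vector. This is generic attention, not the paper's dynamic memory network; shapes and names are illustrative.

```python
import numpy as np

def attend_facts(facts, query, tau=1.0):
    """Score each fact embedding against the decoder state, then mix
    them into a single context vector for the next generated word."""
    scores = facts @ query / tau
    scores = scores - scores.max()
    w = np.exp(scores)
    w /= w.sum()
    return w @ facts

facts = np.random.randn(5, 8)   # 5 fact embeddings for one entity
query = np.random.randn(8)      # current decoder state
print(attend_facts(facts, query).shape)  # (8,)
```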
AMR Parsing as Sequence-to-Graph Transduction
We propose an attention-based model that treats AMR parsing as
sequence-to-graph transduction. Unlike most AMR parsers that rely on
pre-trained aligners, external semantic resources, or data augmentation, our
proposed parser is aligner-free, and it can be effectively trained with limited
amounts of labeled AMR data. Our parser surpasses all previously reported
SMATCH scores on both AMR 2.0 (76.3% F1 on LDC2017T10) and AMR 1.0
(70.2% F1 on LDC2014T12). Comment: Accepted at ACL 2019.
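In sequence-to-graph transduction, a graph is typically grown one node at a time, with each new node scored against earlier nodes to decide edge attachments. The bilinear edge scorer below is a generic, biaffine-style sketch of that step; the actual parser's details differ and all names here are hypothetical.

```python
import numpy as np

def edge_scores(new_node, prev_nodes, W):
    """Score candidate edges from a newly generated node to each earlier
    node with a bilinear form; higher score = more likely attachment."""
    return prev_nodes @ W @ new_node   # one score per previous node

d = 6
W = np.random.randn(d, d)
prev = np.random.randn(3, d)   # three nodes already in the graph
new = np.random.randn(d)
print(edge_scores(new, prev, W))  # e.g. attach the new node to the argmax
```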