End-to-End Learning for Structured Prediction Energy Networks
Structured Prediction Energy Networks (SPENs) are a simple, yet expressive
family of structured prediction models (Belanger and McCallum, 2016). An energy
function over candidate structured outputs is given by a deep network, and
predictions are formed by gradient-based optimization. This paper presents
end-to-end learning for SPENs, where the energy function is discriminatively
trained by back-propagating through gradient-based prediction. In our
experiments, the approach is substantially more accurate than the structured SVM
method of Belanger and McCallum (2016), as it allows us to use more
sophisticated non-convex energies. We provide a collection of techniques for
improving the speed, accuracy, and memory requirements of end-to-end SPENs, and
demonstrate the power of our method on 7-Scenes image denoising and CoNLL-2005
semantic role labeling tasks. In both, inexact minimization of non-convex SPEN
energies is superior to baseline methods that use simplistic energy functions
that can be minimized exactly. Comment: ICML 201
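The gradient-based prediction described above, in which the output itself is optimized against a learned energy, can be sketched with a toy quadratic energy whose gradient is analytic. All names and values here are illustrative, not the paper's actual network:

```python
import numpy as np

# Toy energy E(y) = 0.5*||y - s||^2 + 0.5*lam * y^T L y, where s plays the
# role of per-label scores from a deep net (fixed here) and L couples
# neighboring outputs. Prediction = gradient descent on the relaxed output y.
def energy_grad(y, s, L, lam):
    return (y - s) + lam * (L @ y)

def predict(s, L, lam=0.1, steps=50, lr=0.2):
    y = np.zeros_like(s)                  # continuous relaxation of the output
    for _ in range(steps):
        y -= lr * energy_grad(y, s, L, lam)
    return y

n = 5
s = np.array([1.0, -0.5, 0.8, 0.2, -1.0])
# simple chain Laplacian encouraging smooth neighboring outputs
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
y_hat = predict(s, L)
```

In a real SPEN the energy is a non-convex deep network and training back-propagates through these inner gradient steps; the sketch only shows the inference loop.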
Deep Semantic Role Labeling with Self-Attention
Semantic Role Labeling (SRL) is believed to be a crucial step towards natural
language understanding and has been widely studied. In recent years, end-to-end
SRL with recurrent neural networks (RNN) has gained increasing attention.
However, it remains a major challenge for RNNs to handle structural information
and long range dependencies. In this paper, we present a simple and effective
architecture for SRL which aims to address these problems. Our model is based
on self-attention which can directly capture the relationships between two
tokens regardless of their distance. Our single model outperforms the previous
state-of-the-art F1 scores on both the CoNLL-2005 and the CoNLL-2012 shared
task datasets. Moreover, our model is computationally
efficient, and the parsing speed is 50K tokens per second on a single Titan X
GPU. Comment: Accepted by AAAI-201
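The core self-attention operation this abstract relies on, where every token scores its relation to every other token in a single step regardless of distance, can be sketched as follows (toy sizes and random weights, not the paper's actual model):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Every token attends to every other token directly, so long-range
    # dependencies cost the same as adjacent ones.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n, n) pairwise relation scores
    A = softmax(scores, axis=-1)              # attention weights, rows sum to 1
    return A @ V, A

rng = np.random.default_rng(0)
n, d = 6, 8                                   # 6 tokens, dimension 8 (toy)
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```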
Improving Implicit Semantic Role Labeling by Predicting Semantic Frame Arguments
Implicit semantic role labeling (iSRL) is the task of predicting the semantic
roles of a predicate that do not appear as explicit arguments, but rather
regard common sense knowledge or are mentioned earlier in the discourse. We
introduce an approach to iSRL based on a predictive recurrent neural semantic
frame model (PRNSFM) that uses a large unannotated corpus to learn the
probability of a sequence of semantic arguments given a predicate. We leverage
the sequence probabilities predicted by the PRNSFM to estimate selectional
preferences for predicates and their arguments. On the NomBank iSRL test set,
our approach improves state-of-the-art performance on implicit semantic role
labeling with less reliance than prior work on manually constructed language
resources. Comment: IJCNLP 201
A Sequence-to-Sequence Model for Semantic Role Labeling
We explore a novel approach for Semantic Role Labeling (SRL) by casting it as
a sequence-to-sequence process. We employ an attention-based model enriched
with a copying mechanism to ensure faithful regeneration of the input sequence,
while enabling interleaved generation of argument role labels. Here, we apply
this model in a monolingual setting, performing PropBank SRL on English
language data. The constrained sequence-generation setup enforced by the
copying mechanism allows us to analyze the performance and special properties
of the model on manually labeled data and to benchmark it against
state-of-the-art sequence labeling models. We show that our model can solve the SRL
argument labeling task on English data, yet further structural decoding
constraints will need to be added to make the model truly competitive. Our work
represents a first step towards more advanced, generative SRL labeling setups.
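A copying mechanism of the general kind described above mixes a generation distribution over the output vocabulary with attention mass routed back to the input tokens. A minimal sketch, with made-up vocabulary, logits, and mixing weight:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def copy_mix(gen_logits, attn_logits, p_copy, src_tokens, vocab):
    # Final distribution = (1 - p_copy) * P_generate(vocab)
    #                    + p_copy * attention mass routed to source tokens.
    p_gen = softmax(gen_logits)
    attn = softmax(attn_logits)
    out = (1.0 - p_copy) * p_gen
    for tok, a in zip(src_tokens, attn):
        out[vocab[tok]] += p_copy * a      # copy probability lands on input words
    return out

vocab = {"the": 0, "cat": 1, "sat": 2, "B-ARG0": 3, "B-V": 4}
src = ["the", "cat", "sat"]
gen_logits = np.array([0.1, 0.0, 0.0, 2.0, 1.0])   # favors label symbols
attn_logits = np.array([0.2, 1.5, 0.3])            # decoder attends to "cat"
dist = copy_mix(gen_logits, attn_logits, 0.6, src, vocab)
```

The mixture lets the decoder faithfully regenerate input words while interleaving role labels drawn from the generation vocabulary.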
Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs
We present a transition-based parser that jointly produces syntactic and
semantic dependencies. It learns a representation of the entire algorithm
state, using stack long short-term memories. Our greedy inference algorithm has
linear time, including feature extraction. On the CoNLL 2008-9 English shared
tasks, we obtain the best published parsing performance among models that
jointly learn syntax and semantics. Comment: Proceedings of CoNLL 2016; 13 pages, 5 figures
The Natural Language Decathlon: Multitask Learning as Question Answering
Deep learning has improved performance on many natural language processing
(NLP) tasks individually. However, general NLP models cannot emerge within a
paradigm that focuses on the particularities of a single metric, dataset, and
task. We introduce the Natural Language Decathlon (decaNLP), a challenge that
spans ten tasks: question answering, machine translation, summarization,
natural language inference, sentiment analysis, semantic role labeling,
zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and
commonsense pronoun resolution. We cast all tasks as question answering over a
context. Furthermore, we present a new Multitask Question Answering Network
(MQAN) that jointly learns all tasks in decaNLP without any task-specific modules or
parameters in the multitask setting. MQAN shows improvements in transfer
learning for machine translation and named entity recognition, domain
adaptation for sentiment analysis and natural language inference, and zero-shot
capabilities for text classification. We demonstrate that the MQAN's
multi-pointer-generator decoder is key to this success and performance further
improves with an anti-curriculum training strategy. Though designed for
decaNLP, MQAN also achieves state of the art results on the WikiSQL semantic
parsing task in the single-task setting. We also release code for procuring and
processing data, training and evaluating models, and reproducing all
experiments for decaNLP.
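The unifying move in decaNLP is to cast every task as question answering over a context; a sketch of what that data format looks like (the examples below are illustrative, not drawn from the actual dataset):

```python
# Heterogeneous tasks expressed as (context, question, answer) triples,
# so one QA model can train on all of them. Examples are made up.
examples = [
    {"context": "The cat sat on the mat.",
     "question": "What is the sentiment of the context?",  # sentiment analysis
     "answer": "neutral"},
    {"context": "John gave Mary a book.",
     "question": "Who is the giver?",                      # SRL-style query
     "answer": "John"},
]
```

Because answers are often spans of the context or the question, a pointer-generator style decoder (as in MQAN) can copy them directly rather than generating them token by token.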
Scene Parsing via Dense Recurrent Neural Networks with Attentional Selection
Recurrent neural networks (RNNs) have shown the ability to improve scene
parsing through capturing long-range dependencies among image units. In this
paper, we propose dense RNNs for scene labeling by exploring various long-range
semantic dependencies among image units. Different from existing RNN based
approaches, our dense RNNs are able to capture richer contextual dependencies
for each image unit by enabling immediate connections between each pair of
image units, which significantly enhances their discriminative power. In
addition, to select relevant dependencies and suppress irrelevant ones for each
unit in the dense connections, we introduce an attention model into the dense
RNNs. The attention model automatically assigns more importance to helpful
dependencies and less weight to irrelevant ones. Integrating
with convolutional neural networks (CNNs), we develop an end-to-end scene
labeling system. Extensive experiments on three large-scale benchmarks
demonstrate that the proposed approach can improve the baselines by large
margins and outperform other state-of-the-art algorithms. Comment: 10 pages.
arXiv admin note: substantial text overlap with arXiv:1801.0683
CFO: Conditional Focused Neural Question Answering with Large-scale Knowledge Bases
How can we enable computers to automatically answer questions like "Who
created the character Harry Potter"? Carefully built knowledge bases provide
rich sources of facts. However, answering factoid questions posed in natural
language remains a challenge because the same question can be phrased in many
ways. In particular, we focus on the most common questions, those that
can be answered with a single fact in the knowledge base. We propose CFO, a
Conditional Focused neural-network-based approach to answering factoid
questions with knowledge bases. Our approach first zooms in on a question to
find the most probable candidate subject mentions, and then infers the final
answers with a
unified conditional probabilistic framework. Powered by deep recurrent neural
networks and neural embeddings, our proposed CFO achieves an accuracy of 75.7%
on a dataset of 108k questions, the largest public one to date. It outperforms
the current state of the art by an absolute margin of 11.8%. Comment: Accepted by ACL 201
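The conditional, focused scoring this abstract describes can be factored as p(subject, relation | q) = p(subject | q) * p(relation | subject, q): first narrow down the subject, then pick the relation conditioned on it. A toy illustration with entirely made-up probabilities:

```python
# Hypothetical scores for the question "Who created the character Harry Potter?"
p_subject = {"harry_potter_character": 0.7, "harry_potter_film": 0.3}
p_relation_given_subject = {
    "harry_potter_character": {"created_by": 0.8, "appears_in": 0.2},
    "harry_potter_film": {"directed_by": 0.9, "created_by": 0.1},
}

def best_fact():
    # Score every (subject, relation) pair under the chain factorization
    # and return the single best knowledge-base fact.
    candidates = [
        (s, r, ps * pr)
        for s, ps in p_subject.items()
        for r, pr in p_relation_given_subject[s].items()
    ]
    return max(candidates, key=lambda c: c[2])

s, r, p = best_fact()
```

Conditioning the relation on the chosen subject prunes the candidate space: only relations attached to plausible subjects are ever scored.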
Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles
We describe a transfer method based on annotation projection to develop a
dependency-based semantic role labeling system for languages for which no
supervised linguistic information other than parallel data is available. Unlike
previous work that presumes the availability of supervised features such as
lemmas, part-of-speech tags, and dependency parse trees, we only make use of
word and character features. Our deep model uses character-based
representations as well as unsupervised stem embeddings to alleviate the need
for supervised features. In our experiments, it outperforms a state-of-the-art
method that uses supervised lexico-syntactic features on 6 out of 7 languages
in the
Universal Proposition Bank. Comment: Accepted at the 13th International
Conference on Computational Semantics (IWCS 2019)
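The annotation-projection idea underlying this transfer method copies role labels from source-language tokens to their word-aligned target-language tokens. A minimal sketch, with a made-up alignment and labels:

```python
# Project semantic role labels across a parallel sentence pair.
# The alignment and labels below are hypothetical.
src_labels = ["B-ARG0", "B-V", "B-ARG1"]   # labels on the source sentence
alignment = {0: 2, 1: 0, 2: 1}             # src token index -> tgt token index

tgt_labels = ["O"] * 3                     # target starts unlabeled
for s_i, t_i in alignment.items():
    tgt_labels[t_i] = src_labels[s_i]      # label travels along the alignment
```

Real systems must also handle unaligned or many-to-one aligned tokens, which is where most projection noise comes from.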
Semantic Frame Parsing for Information Extraction: the CALOR corpus
This paper presents a publicly available corpus of French encyclopedic
history texts annotated according to the Berkeley FrameNet formalism. The main
difference in our approach compared to previous work on semantic parsing with
FrameNet is that we are interested not in full-text parsing but in partial
parsing. The goal is to select from the FrameNet resources the minimal set of
frames that will be useful for the targeted application, in our case
Information Extraction from encyclopedic documents. Such an approach enables
the manual annotation of larger corpora than those obtained through full-text
parsing, and therefore opens the door to alternative methods for frame parsing
beyond those used so far on the FrameNet 1.5 benchmark corpus. The approaches
compared in this study rely on an integrated sequence labeling model that
jointly optimizes frame identification and semantic role segmentation and
identification. The models compared are CRFs and multitask bi-LSTMs.
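A linear-chain CRF of the kind compared in this study is decoded with the Viterbi algorithm over per-token emission scores and label-transition scores. A generic sketch (the labels and scores are hypothetical, not the CALOR models):

```python
import numpy as np

def viterbi(emissions, transitions):
    # emissions: (T, K) per-token label scores; transitions: (K, K) scores
    # for moving from label i to label j. Returns the MAP label sequence.
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        total = score[:, None] + transitions + emissions[t]
        back[t] = total.argmax(axis=0)     # best previous label for each label
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):          # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

labels = ["O", "B-Frame", "I-Frame"]       # toy BIO-style label set
emissions = np.array([[2.0, 0.5, 0.0],
                      [0.2, 1.5, 0.3],
                      [0.1, 0.0, 1.8]])
# forbid I-Frame directly after O with a large negative transition score
transitions = np.array([[0.0, 0.0, -10.0],
                        [0.0, 0.0,   0.5],
                        [0.0, 0.0,   0.5]])
path = viterbi(emissions, transitions)
```

The transition matrix is what lets a CRF enforce segmentation constraints (e.g. an inside tag must follow a begin tag) that a per-token classifier cannot.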