1,650 research outputs found
Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
Word embeddings have been widely adopted across several NLP applications.
Most existing word embedding methods utilize sequential context of a word to
learn its embedding. While there have been some attempts at utilizing syntactic
context of a word, such methods result in an explosion of the vocabulary size.
In this paper, we overcome this problem by proposing SynGCN, a flexible Graph
Convolution based method for learning word embeddings. SynGCN utilizes the
dependency context of a word without increasing the vocabulary size. Word
embeddings learned by SynGCN outperform existing methods on various intrinsic
and extrinsic tasks and provide an advantage when used with ELMo. We also
propose SemGCN, an effective framework for incorporating diverse semantic
knowledge for further enhancing learned word representations. We make the
source code of both models available to encourage reproducible research.
Comment: 11 pages, 2 figures
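The core idea lends itself to a short illustration. Below is a minimal PyTorch-style sketch (not the authors' released SynGCN code) of a graph-convolution layer that updates each word from its dependency-tree neighbours, so syntactic context is used without adding vocabulary entries; the class name DependencyGCNLayer and all tensor shapes are assumptions for illustration.

import torch
import torch.nn as nn

class DependencyGCNLayer(nn.Module):
    """Illustrative GCN layer: each word's vector is updated from its
    dependency-tree neighbours, so no extra vocabulary entries are needed."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_words, dim) current word vectors
        # adj: (num_words, num_words) dependency adjacency (1 if head/child edge)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)    # avoid divide-by-zero
        neighbour_mean = adj @ x / deg                      # average neighbour vectors
        return torch.relu(self.linear(neighbour_mean) + x)  # residual update

# toy usage: 5 words, 64-dim embeddings, a hand-built dependency adjacency
emb = nn.Embedding(10_000, 64)                 # vocabulary size is unchanged
words = torch.tensor([3, 17, 42, 7, 99])
adj = torch.zeros(5, 5)
adj[1, 0] = adj[0, 1] = 1.0                    # dependency edge between words 0 and 1
adj[1, 2] = adj[2, 1] = 1.0
layer = DependencyGCNLayer(64)
out = layer(emb(words), adj)                   # (5, 64) syntax-aware vectors
print(out.shape)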
Graph Convolutional Encoders for Syntax-aware Neural Machine Translation
We present a simple and effective approach to incorporating syntactic
structure into neural attention-based encoder-decoder models for machine
translation. We rely on graph-convolutional networks (GCNs), a recent class of
neural networks developed for modeling graph-structured data. Our GCNs use
predicted syntactic dependency trees of source sentences to produce
representations of words (i.e. hidden states of the encoder) that are sensitive
to their syntactic neighborhoods. GCNs take word representations as input and
produce word representations as output, so they can easily be incorporated as
layers into standard encoders (e.g., on top of bidirectional RNNs or
convolutional neural networks). We evaluate their effectiveness with
English-German and English-Czech translation experiments for different types of
encoders and observe substantial improvements over their syntax-agnostic
versions in all the considered setups.
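To make the "word representations in, word representations out" point concrete, here is a minimal sketch, assuming PyTorch, of a graph-convolution layer stacked on a bidirectional LSTM encoder; the class GCNOnEncoder and the placeholder adjacency are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class GCNOnEncoder(nn.Module):
    """Sketch: a graph-convolution layer stacked on top of a BiLSTM encoder.
    It takes per-word states in and gives per-word states out, so it drops
    into an encoder-decoder model without other changes."""

    def __init__(self, dim: int):
        super().__init__()
        self.bilstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.gcn = nn.Linear(dim, dim)

    def forward(self, embeddings: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, dim); adj: (batch, seq_len, seq_len)
        states, _ = self.bilstm(embeddings)           # syntax-agnostic states
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
        pooled = adj @ states / deg                   # message passing over the parse
        return torch.relu(self.gcn(pooled) + states)  # syntax-aware encoder states

encoder = GCNOnEncoder(dim=128)
emb = torch.randn(2, 6, 128)                          # batch of 2 sentences, 6 tokens each
adj = torch.eye(6).repeat(2, 1, 1)                    # placeholder dependency graphs
print(encoder(emb, adj).shape)                        # torch.Size([2, 6, 128])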
Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL?
Do unsupervised methods for learning rich, contextualized token
representations obviate the need for explicit modeling of linguistic structure
in neural network models for semantic role labeling (SRL)? We address this
question by incorporating the massively successful ELMo embeddings (Peters et
al., 2018) into LISA (Strubell et al., 2018), a strong, linguistically-informed
neural network architecture for SRL. In experiments on the CoNLL-2005 shared
task we find that though ELMo out-performs typical word embeddings, beginning
to close the gap in F1 between LISA with predicted and gold syntactic parses,
syntactically-informed models still out-perform syntax-free models when both
use ELMo, especially on out-of-domain data. Our results suggest that linguistic
structures are indeed still relevant in this golden age of deep learning for
NLP.
Comment: In Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP, ACL 201
Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution
Gender bias has been found in existing coreference resolvers. In order to
eliminate gender bias, a gender-balanced dataset Gendered Ambiguous Pronouns
(GAP) has been released and the best baseline model achieves only 66.9% F1.
Bidirectional Encoder Representations from Transformers (BERT) has broken
several NLP task records and can be used on the GAP dataset. However, fine-tuning
BERT on a specific task is computationally expensive. In this paper, we propose
an end-to-end resolver by combining pre-trained BERT with Relational Graph
Convolutional Network (R-GCN). R-GCN is used for digesting structural syntactic
information and learning better task-specific embeddings. Empirical results
demonstrate that, under explicit syntactic supervision and without the need to
fine-tune BERT, R-GCN's embeddings outperform the original BERT embeddings on
the coreference task. Our work significantly improves the snippet-context
baseline F1 score on the GAP dataset from 66.9% to 80.3%. We participated in the
2019 GAP Coreference Shared Task, and our code is available online.
Comment: Accepted by the ACL 2019 Workshop on Gender Bias for Natural Language Processing
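A rough sketch of the relational-GCN ingredient, assuming PyTorch: each edge type (for example, each dependency label) gets its own weight matrix, plus a self-loop transform, and the node features could be frozen BERT token vectors. The class RGCNLayer and its shapes are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    """Sketch of a relational GCN layer: one weight matrix per edge type
    (e.g. per dependency label), plus a self-loop transform."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.rel_weights = nn.Parameter(torch.randn(num_relations, dim, dim) * 0.02)
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_nodes, dim) node features (e.g. frozen BERT token vectors)
        # adj: (num_relations, num_nodes, num_nodes) one adjacency per relation
        out = self.self_loop(x)
        for r in range(adj.size(0)):
            deg = adj[r].sum(dim=1, keepdim=True).clamp(min=1)
            out = out + (adj[r] @ x / deg) @ self.rel_weights[r]
        return torch.relu(out)

# toy usage: 4 tokens, 3 dependency relation types
x = torch.randn(4, 32)
adj = torch.zeros(3, 4, 4)
adj[0, 1, 0] = 1.0          # relation 0: edge from token 1 to token 0
layer = RGCNLayer(32, num_relations=3)
print(layer(x, adj).shape)  # torch.Size([4, 32])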
Dating Documents using Graph Convolution Networks
Document date is essential for many important tasks, such as document
retrieval, summarization, event detection, etc. While existing approaches for
these tasks assume accurate knowledge of the document date, this is not always
available, especially for arbitrary documents from the Web. Document Dating is
a challenging problem which requires inference over the temporal structure of
the document. Prior document dating systems have largely relied on handcrafted
features while ignoring such document internal structures. In this paper, we
propose NeuralDater, a Graph Convolutional Network (GCN) based document dating
approach which jointly exploits the syntactic and temporal graph structures of a
document in a principled way. To the best of our knowledge, this is the first
application of deep learning for the problem of document dating. Through
extensive experiments on real-world datasets, we find that NeuralDater
significantly outperforms the state-of-the-art baseline by 19% absolute (45%
relative) accuracy points.
Comment: Accepted at ACL 201
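As a rough illustration of the joint-graph idea (not the released NeuralDater code), the following PyTorch sketch runs one mean-aggregation graph-convolution pass over a syntactic graph and another over a temporal graph before a document-level year classifier; the class DocumentDater and all shapes are assumptions for illustration.

import torch
import torch.nn as nn

def gcn_pass(x, adj, linear):
    """One mean-aggregation graph-convolution step with a residual connection."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    return torch.relu(linear(adj @ x / deg) + x)

class DocumentDater(nn.Module):
    """Sketch: a syntactic GCN followed by a temporal GCN, pooled into a
    year classifier (the real model's wiring is more involved)."""

    def __init__(self, dim: int, num_years: int):
        super().__init__()
        self.syn = nn.Linear(dim, dim)
        self.tmp = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, num_years)

    def forward(self, x, syn_adj, tmp_adj):
        h = gcn_pass(x, syn_adj, self.syn)    # propagate over the dependency graph
        h = gcn_pass(h, tmp_adj, self.tmp)    # propagate over the temporal graph
        return self.out(h.mean(dim=0))        # document-level year scores

model = DocumentDater(dim=64, num_years=20)
tokens = torch.randn(12, 64)                  # 12 token/event nodes
scores = model(tokens, torch.eye(12), torch.eye(12))
print(scores.shape)                           # torch.Size([20])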
Effective Representation for Easy-First Dependency Parsing
Easy-first parsing relies on subtree re-ranking to build the complete parse
tree. Because the intermediate state of the parsing process is represented by
various subtrees, whose internal structural information is the key signal for
later parsing decisions, we explore a better representation for such
subtrees. Specifically, this work introduces a bottom-up subtree encoding method
based on the child-sum tree-LSTM. Starting from an easy-first dependency parser
without other handcrafted features, we show that the subtree encoder
does improve parsing, and can make a greedy-search easy-first
parser achieve promising results on benchmark treebanks compared to
state-of-the-art baselines. Furthermore, with the help of a current
pre-trained language model, we further improve the state-of-the-art results of
the easy-first approach.
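For reference, a child-sum tree-LSTM cell of the kind the abstract refers to can be sketched as follows: a minimal PyTorch rendering of the standard formulation, not the authors' parser code, with illustrative names and dimensions.

import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Child-sum tree-LSTM cell: a node's state is computed from its own
    input vector and the summed states of its (variable number of) children."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.iou = nn.Linear(in_dim + hid_dim, 3 * hid_dim)  # input/output/update gates
        self.f = nn.Linear(in_dim + hid_dim, hid_dim)         # one forget gate per child

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) node input; child_h, child_c: (num_children, hid_dim)
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.iou(torch.cat([x, h_sum])), 3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f(torch.cat([x.expand(child_h.size(0), -1), child_h], dim=1)))
        c = i * u + (f * child_c).sum(dim=0)   # cell state mixes all children
        return torch.tanh(c) * o, c            # (h, c) for this subtree

# toy usage: one node with two already-encoded children
cell = ChildSumTreeLSTMCell(in_dim=50, hid_dim=100)
h, c = cell(torch.randn(50), torch.randn(2, 100), torch.randn(2, 100))
print(h.shape)  # torch.Size([100])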
Linguistically-Informed Self-Attention for Semantic Role Labeling
Current state-of-the-art semantic role labeling (SRL) uses a deep neural
network with no explicit linguistic features. However, prior work has shown
that gold syntax trees can dramatically improve SRL decoding, suggesting the
possibility of increased accuracy from explicit modeling of syntax. In this
work, we present linguistically-informed self-attention (LISA): a neural
network model that combines multi-head self-attention with multi-task learning
across dependency parsing, part-of-speech tagging, predicate detection and SRL.
Unlike previous models which require significant pre-processing to prepare
linguistic features, LISA can incorporate syntax using merely raw tokens as
input, encoding the sequence only once to simultaneously perform parsing,
predicate detection and role labeling for all predicates. Syntax is
incorporated by training one attention head to attend to syntactic parents for
each token. Moreover, if a high-quality syntactic parse is already available,
it can be beneficially injected at test time without re-training our SRL model.
In experiments on CoNLL-2005 SRL, LISA achieves new state-of-the-art
performance for a model using predicted predicates and standard word
embeddings, attaining 2.5 F1 absolute higher than the previous state-of-the-art
on newswire and more than 3.5 F1 on out-of-domain data, nearly 10% reduction in
error. On CoNLL-2012 English SRL we also show an improvement of more than 2.5
F1. LISA also out-performs the state-of-the-art with contextually-encoded
(ELMo) word representations, by nearly 1.0 F1 on news and more than 2.0 F1 on
out-of-domain text.
Comment: In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium, October 201
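The syntax-injection trick can be sketched compactly, assuming PyTorch: one self-attention head whose score matrix is additionally trained with a cross-entropy loss to point at each token's syntactic parent. The class SyntaxHead and the toy parent indices below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SyntaxHead(nn.Module):
    """Sketch of one 'syntactically-informed' attention head: its attention
    distribution over positions is supervised to match each token's
    syntactic parent, so attending with it mixes in head-word information."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, parent_ids: torch.Tensor = None):
        # h: (seq_len, dim); parent_ids: (seq_len,) gold or predicted head indices
        scores = self.q(h) @ self.k(h).t() / h.size(-1) ** 0.5   # (seq_len, seq_len)
        attn = scores.softmax(dim=-1)
        context = attn @ self.v(h)
        # auxiliary parsing loss: make each row of `attn` peak at the parent
        parse_loss = F.cross_entropy(scores, parent_ids) if parent_ids is not None else None
        return context, parse_loss

head = SyntaxHead(dim=64)
h = torch.randn(7, 64)
parents = torch.tensor([1, 1, 1, 2, 2, 4, 4])   # toy dependency heads
ctx, loss = head(h, parents)
print(ctx.shape, float(loss))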
More Data, More Relations, More Context and More Openness: A Review and Outlook for Relation Extraction
Relational facts are an important component of human knowledge, which are
hidden in vast amounts of text. In order to extract these facts from text,
people have been working on relation extraction (RE) for years. From early
pattern matching to current neural networks, existing RE methods have achieved
significant progress. Yet with the explosion of Web text and the emergence of new
relations, human knowledge is increasing drastically, and we thus require
"more" from RE: a more powerful RE system that can robustly utilize more data,
efficiently learn more relations, easily handle more complicated context, and
flexibly generalize to more open domains. In this paper, we look back at
existing RE methods, analyze key challenges we are facing nowadays, and show
promising directions towards more powerful RE. We hope our view can advance
this field and inspire more efforts in the community.
Cross-Sentence N-ary Relation Extraction with Graph LSTMs
Past work in relation extraction has focused on binary relations in single
sentences. Recent NLP inroads in high-value domains have sparked interest in
the more general setting of extracting n-ary relations that span multiple
sentences. In this paper, we explore a general relation extraction framework
based on graph long short-term memory networks (graph LSTMs) that can be easily
extended to cross-sentence n-ary relation extraction. The graph formulation
provides a unified way of exploring different LSTM approaches and incorporating
various intra-sentential and inter-sentential dependencies, such as sequential,
syntactic, and discourse relations. A robust contextual representation is
learned for the entities, which serves as input to the relation classifier.
This simplifies handling of relations with arbitrary arity, and enables
multi-task learning with related relations. We evaluate this framework in two
important precision medicine settings, demonstrating its effectiveness with
both conventional supervised learning and distant supervision. Cross-sentence
extraction produced larger knowledge bases, and multi-task learning
significantly improved extraction accuracy. A thorough analysis of various LSTM
approaches yielded useful insight into the impact of linguistic analysis on
extraction accuracy.
Comment: Conditionally accepted by TACL in December 2016; published in April 2017; presented at ACL in August 201
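A simplified sketch of the cross-sentence setup, assuming PyTorch: a single document graph mixes sequential, dependency, and inter-sentential edges, and entity representations read off the graph feed an n-ary relation classifier. A plain graph-convolution step stands in here for the paper's graph LSTM, and all names are illustrative.

import torch
import torch.nn as nn

def build_document_graph(sent_lens, dep_edges, discourse_edges):
    """Sketch: build one adjacency matrix over all tokens of a document,
    mixing sequential, syntactic, and inter-sentential (discourse) edges."""
    n = sum(sent_lens)
    adj = torch.zeros(n, n)
    for i in range(n - 1):                      # sequential (adjacent-word) edges
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    for head, dep in dep_edges:                 # dependency edges
        adj[head, dep] = adj[dep, head] = 1.0
    for a, b in discourse_edges:                # e.g. root-to-root sentence links
        adj[a, b] = adj[b, a] = 1.0
    return adj

class EntityRelationClassifier(nn.Module):
    """Sketch: message passing over the document graph, then the relation is
    classified from the concatenated entity representations (a plain GCN step
    stands in for the graph LSTM)."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.gcn = nn.Linear(dim, dim)
        self.clf = nn.Linear(3 * dim, num_relations)   # ternary (n = 3) relation

    def forward(self, x, adj, entity_positions):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = torch.relu(self.gcn(adj @ x / deg) + x)
        entities = torch.cat([h[p] for p in entity_positions])
        return self.clf(entities)

adj = build_document_graph([5, 4], dep_edges=[(0, 2), (6, 8)], discourse_edges=[(0, 6)])
model = EntityRelationClassifier(dim=32, num_relations=5)
print(model(torch.randn(9, 32), adj, entity_positions=[1, 3, 7]).shape)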
Graph Convolutional Networks for Named Entity Recognition
In this paper we investigate the role of the dependency tree in a named
entity recognizer that uses a set of GCNs. We perform a comparison among
different NER architectures and show that the grammar of a sentence positively
influences the results. Experiments on the OntoNotes dataset demonstrate
consistent performance improvements, without requiring heavy feature
engineering or additional language-specific knowledge.
Comment: Accepted at the 16th International Workshop on Treebanks and Linguistic Theories
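As a final illustration of the same family of models, here is a minimal PyTorch sketch of a tagger whose BiLSTM states are refined by a graph convolution over the dependency tree before per-token NER tag scoring; the class GCNTagger and its shapes are assumptions, not the paper's code.

import torch
import torch.nn as nn

class GCNTagger(nn.Module):
    """Sketch: BiLSTM states refined by a GCN over the dependency tree,
    then a per-token linear layer scores the NER tags."""

    def __init__(self, dim: int, num_tags: int):
        super().__init__()
        self.bilstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.gcn = nn.Linear(dim, dim)
        self.tagger = nn.Linear(dim, num_tags)

    def forward(self, embeddings, adj):
        # embeddings: (batch, seq_len, dim); adj: (batch, seq_len, seq_len)
        h, _ = self.bilstm(embeddings)
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
        h = torch.relu(self.gcn(adj @ h / deg) + h)   # inject the parse structure
        return self.tagger(h)                          # (batch, seq_len, num_tags)

model = GCNTagger(dim=64, num_tags=9)
emb = torch.randn(1, 8, 64)
adj = torch.eye(8).unsqueeze(0)        # placeholder dependency adjacency
print(model(emb, adj).shape)           # torch.Size([1, 8, 9])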