Neural CRF Parsing
This paper describes a parsing model that combines the exact dynamic
programming of CRF parsing with the rich nonlinear featurization of neural net
approaches. Our model is structurally a CRF that factors over anchored rule
productions, but instead of linear potential functions based on sparse
features, we use nonlinear potentials computed via a feedforward neural
network. Because potentials are still local to anchored rules, structured
inference (CKY) is unchanged from the sparse case. Computing gradients during
learning involves backpropagating an error signal formed from standard CRF
sufficient statistics (expected rule counts). Using only dense features, our
neural CRF already exceeds a strong baseline CRF model (Hall et al., 2014). In
combination with sparse features, our system achieves 91.1 F1 on section 23 of
the Penn Treebank, and more generally outperforms the best prior single parser
results on a range of languages. Comment: Accepted for publication at ACL 2015.
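The core modeling move is easy to state in code. Below is a minimal PyTorch sketch (the names, feature choices, and layer sizes are hypothetical, not the paper's exact architecture): a feedforward network maps dense anchored-span features to a potential for every grammar rule, and these nonlinear potentials drop into CKY exactly where the sparse linear scores used to go.

```python
import torch
import torch.nn as nn

class AnchoredRulePotential(nn.Module):
    """Minimal sketch of a nonlinear potential over anchored rules.
    `span_feats` (e.g., embeddings of words at span boundaries) is a
    hypothetical stand-in for the paper's dense features."""
    def __init__(self, feat_dim: int, num_rules: int, hidden: int = 200):
        super().__init__()
        self.hidden_layer = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        # one output score per grammar rule, as in a CRF scoring layer
        self.rule_scores = nn.Linear(hidden, num_rules)

    def forward(self, span_feats: torch.Tensor) -> torch.Tensor:
        # returns a potential for every rule anchored at this span;
        # CKY inference over these potentials is unchanged from the sparse case
        return self.rule_scores(self.hidden_layer(span_feats))
```

During learning, the error signal at each potential is the difference between gold and expected anchored rule counts from inside-outside, and it backpropagates through the network like any other gradient.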
Spherical Latent Spaces for Stable Variational Autoencoders
A hallmark of variational autoencoders (VAEs) for text processing is their
combination of powerful encoder-decoder models, such as LSTMs, with simple
latent distributions, typically multivariate Gaussians. These models pose a
difficult optimization problem: there is an especially bad local optimum where
the variational posterior always equals the prior and the model does not use
the latent variable at all, a kind of "collapse" which is encouraged by the KL
divergence term of the objective. In this work, we experiment with another
choice of latent distribution, namely the von Mises-Fisher (vMF) distribution,
which places mass on the surface of the unit hypersphere. With this choice of
prior and posterior, the KL divergence term now only depends on the variance of
the vMF distribution, giving us the ability to treat it as a fixed
hyperparameter. We show that doing so not only averts the KL collapse, but
consistently gives better likelihoods than Gaussians across a range of modeling
conditions, including recurrent language modeling and bag-of-words document
modeling. An analysis of the properties of our vMF representations shows that
they learn richer and more nuanced structures in their latent representations
than their Gaussian counterparts. Comment: To appear in EMNLP 2018; 11 pages; Code release:
https://github.com/jiacheng-xu/vmf_vae_nl
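The fixed-KL property is easy to verify numerically. The sketch below (derived from the standard vMF normalizer; not code from the paper's release) computes KL(vMF(mu, kappa) || Uniform(sphere)) and shows that it depends only on kappa and the dimension, never on the mean direction mu:

```python
import numpy as np
from scipy.special import gammaln, iv

def vmf_uniform_kl(kappa: float, dim: int) -> float:
    """KL( vMF(mu, kappa) || Uniform(S^{dim-1}) ). Independent of mu,
    so with kappa fixed it is a constant hyperparameter, which is the
    property the paper exploits to avert KL collapse."""
    half = dim / 2.0
    # E[mu . x] under the vMF = I_{d/2}(kappa) / I_{d/2-1}(kappa)
    mean_resultant = iv(half, kappa) / iv(half - 1, kappa)
    # log normalizer of the vMF density
    log_c = (half - 1) * np.log(kappa) - half * np.log(2 * np.pi) \
            - np.log(iv(half - 1, kappa))
    # log surface area of the unit hypersphere
    log_area = np.log(2.0) + half * np.log(np.pi) - gammaln(half)
    return kappa * mean_resultant + log_c + log_area

if __name__ == "__main__":
    # same value for every mu; only kappa and dim matter
    print(vmf_uniform_kl(kappa=80.0, dim=32))
```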
Understanding Dataset Design Choices for Multi-hop Reasoning
Learning multi-hop reasoning has been a key challenge for reading
comprehension models, leading to the design of datasets that explicitly focus
on it. Ideally, a model should not be able to perform well on a multi-hop
question answering task without doing multi-hop reasoning. In this paper, we
investigate two recently proposed datasets, WikiHop and HotpotQA. First, we
explore sentence-factored models for these tasks; by design, these models
cannot do multi-hop reasoning, but they are still able to solve a large number
of examples in both datasets. Furthermore, we find spurious correlations in the
unmasked version of WikiHop, which make it easy to achieve high performance
considering only the questions and answers. Finally, we investigate one key
difference between these datasets, namely span-based vs. multiple-choice
formulations of the QA task. Multiple-choice versions of both datasets can be
easily gamed, and two models we examine only marginally exceed a baseline in
this setting. Overall, while these datasets are useful testbeds,
high-performing models may not be learning as much multi-hop reasoning as
previously thought. Comment: NAACL 2019.
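To make "sentence-factored" concrete, here is a minimal sketch (the scorer `score` is a hypothetical stand-in for the paper's models): each sentence is scored against the question in isolation, and evidence is combined only by a max, so by construction no information is chained across sentences.

```python
from typing import Callable, List

def sentence_factored_answer(
    question: str,
    sentences: List[str],
    candidates: List[str],
    score: Callable[[str, str, str], float],  # hypothetical per-sentence scorer
) -> str:
    """Each sentence is scored independently against the question; a max
    over sentences cannot combine facts, hence no multi-hop reasoning."""
    best, best_score = candidates[0], float("-inf")
    for cand in candidates:
        cand_score = max(score(question, sent, cand) for sent in sentences)
        if cand_score > best_score:
            best, best_score = cand, cand_score
    return best
```

That a model of this form still solves many examples is what reveals the single-hop shortcuts in both datasets.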
Tracking Discrete and Continuous Entity State for Process Understanding
Procedural text, which describes entities and their interactions as they
undergo some process, depicts entities in a uniquely nuanced way. First, each
entity may have some observable discrete attributes, such as its state or
location; modeling these involves imposing global structure and enforcing
consistency. Second, an entity may have properties which are not made explicit
but can be effectively induced and tracked by neural networks. In this paper,
we propose a structured neural architecture that reflects this dual nature of
entity evolution. The model tracks each entity recurrently, updating its hidden
continuous representation at each step to contain relevant state information.
The global discrete state structure is explicitly modeled with a neural CRF
over the changing hidden representation of the entity. This CRF can explicitly
capture constraints on entity states over time, enforcing that, for example, an
entity cannot move to a location after it is destroyed. We evaluate the
performance of our proposed model on QA tasks over process paragraphs in the
ProPara dataset and find that our model achieves state-of-the-art results. Comment: 5 pages.
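As an illustration of the discrete side of the model, the sketch below runs Viterbi over per-step state scores with a transition matrix that hard-blocks invalid histories; the state inventory and constraint set here are illustrative, not the paper's exact ones.

```python
import numpy as np

STATES = ["CREATE", "EXIST", "MOVE", "DESTROY", "NONE"]  # illustrative inventory
NEG_INF = -1e9

def constrained_transitions() -> np.ndarray:
    """Transition scores with hard constraints, e.g. nothing can happen
    to an entity after it is destroyed."""
    trans = np.zeros((len(STATES), len(STATES)))
    d = STATES.index("DESTROY")
    for j, s in enumerate(STATES):
        if s != "NONE":
            trans[d, j] = NEG_INF  # after DESTROY, only NONE is allowed
    return trans

def viterbi(emissions: np.ndarray, trans: np.ndarray) -> list:
    """Best state sequence given per-step emission scores (T x S), here the
    scores produced from the entity's recurrent hidden representation."""
    T, S = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [STATES[i] for i in reversed(path)]
```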
Robust Question Answering Through Sub-part Alignment
Current textual question answering models achieve strong performance on
in-domain test sets, but often do so by fitting surface-level patterns in the
data, so they fail to generalize to out-of-distribution settings. To make a
more robust and understandable QA system, we model question answering as an
alignment problem. We decompose both the question and context into smaller
units based on off-the-shelf semantic representations (here, semantic roles),
and align the question to a subgraph of the context in order to find the
answer. We formulate our model as a structured SVM, with alignment scores
computed via BERT, and we can train end-to-end despite using beam search for
approximate inference. Our explicit use of alignments allows us to explore a
set of constraints with which we can prohibit certain types of bad model
behavior arising in cross-domain settings. Furthermore, by investigating
differences in scores across different potential answers, we can seek to
understand what particular aspects of the input lead the model to choose the
answer without relying on post-hoc explanation techniques. We train our model
on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
The results show that our model is more robust cross-domain than the standard
BERT QA model, and constraints derived from alignment scores allow us to
effectively trade off coverage and accuracy.
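The training objective can be sketched abstractly. Assuming beam search returns a pool of candidate alignments with their model scores (all names below are hypothetical), the structured SVM loss is a hinge against the cost-augmented best candidate found in the beam rather than an exact argmax:

```python
from typing import Callable, Iterable, Tuple

def structured_hinge_loss(
    gold_score: float,                 # model score of the gold alignment
    beam: Iterable[Tuple[object, float]],  # (alignment, score) pairs from beam search
    cost: Callable[[object], float],   # hypothetical task cost vs. the gold alignment
) -> float:
    """Structured SVM objective with approximate inference: the violating
    alignment is the cost-augmented best hypothesis in the beam."""
    augmented_best = max(score + cost(align) for align, score in beam)
    return max(0.0, augmented_best - gold_score)
```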
A Genetic Algorithm to Minimize Chromatic Entropy
We present an algorithmic approach to solving the problem of chromatic entropy, a combinatorial optimization problem related to graph coloring. This problem is a component in algorithms for optimizing data compression when computing a function of two correlated sources at a receiver. Our genetic algorithm for minimizing chromatic entropy uses an order-based genome inspired by graph coloring genetic algorithms, as well as some problem-specific heuristics. It performs consistently well on synthetic instances, and for an expositional set of functional compression problems, the GA routinely finds a compression scheme that is 20-30% more efficient than that given by a reference compression algorithm.
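A minimal sketch of the main ingredients, under assumed definitions (vertex probabilities `p` and a conflict graph `adj`; not the paper's exact heuristics): an order-based genome is decoded by greedy coloring, fitness is the entropy of the induced color classes, and recombination uses standard order crossover (OX).

```python
import math
import random
from typing import Dict, List, Set

def greedy_color(order: List[int], adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Decode an order-based genome: color vertices greedily in genome order."""
    color: Dict[int, int] = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def chromatic_entropy(color: Dict[int, int], p: Dict[int, float]) -> float:
    """Entropy of the color-class distribution induced by vertex probabilities."""
    mass: Dict[int, float] = {}
    for v, c in color.items():
        mass[c] = mass.get(c, 0.0) + p[v]
    return -sum(m * math.log2(m) for m in mass.values() if m > 0.0)

def order_crossover(a: List[int], b: List[int]) -> List[int]:
    """OX crossover: keep a slice of parent a, fill the rest in parent b's order."""
    i, j = sorted(random.sample(range(len(a)), 2))
    kept = a[i:j]
    kept_set = set(kept)
    rest = [v for v in b if v not in kept_set]
    return rest[:i] + kept + rest[i:]
```

A full GA would wrap these in the usual loop of tournament selection, crossover, swap mutation, and elitism, keeping the lowest-entropy coloring seen.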
Modeling Semantic Plausibility by Injecting World Knowledge
Distributional data tells us that a man can swallow candy, but not that a man
can swallow a paintball, since this is never attested. However both are
physically plausible events. This paper introduces the task of semantic
plausibility: recognizing plausible but possibly novel events. We present a new
crowdsourced dataset of semantic plausibility judgments of single events such
as "man swallow paintball". Simple models based on distributional
representations perform poorly on this task, despite doing well on selectional
preference, but injecting manually elicited knowledge about entity properties
provides a substantial performance boost. Our error analysis shows that our new
dataset is a great testbed for semantic plausibility models: more sophisticated
knowledge representation and propagation could address many of the remaining
errors. Comment: camera-ready draft (with link to data), published at NAACL 2018 as a
conference paper (oral).
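One simple way to "inject" such knowledge, sketched below with hypothetical property bins (the paper's actual property inventory and model differ in detail), is to concatenate manually elicited entity properties onto the distributional features before classification:

```python
import numpy as np

# Hypothetical manually elicited property bins (e.g., size, rigidity);
# a real inventory would be richer and crowdsourced.
PROPS = {"man": [4, 4], "candy": [1, 2], "paintball": [1, 3]}

def featurize(subj: str, verb: str, obj: str, embed: dict) -> np.ndarray:
    """Concatenate distributional embeddings with injected property features."""
    distributional = np.concatenate([embed[subj], embed[verb], embed[obj]])
    knowledge = np.asarray(PROPS[subj] + PROPS[obj], dtype=float)
    return np.concatenate([distributional, knowledge])

# A plausibility model is then an ordinary supervised classifier over these
# feature vectors, trained on the crowdsourced plausibility judgments.
```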
Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks
A key challenge in entity linking is making effective use of contextual
information to disambiguate mentions that might refer to different entities in
different contexts. We present a model that uses convolutional neural networks
to capture semantic correspondence between a mention's context and a proposed
target entity. These convolutional networks operate at multiple granularities
to exploit various kinds of topic information, and their rich parameterization
gives them the capacity to learn which n-grams characterize different topics.
We combine these networks with a sparse linear model to achieve
state-of-the-art performance on multiple entity linking datasets, outperforming
the prior systems of Durrett and Klein (2014) and Nguyen et al. (2014). Comment: Accepted at NAACL 2016.
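At each granularity the recipe is the same; the PyTorch sketch below (a simplification, with max pooling standing in for the paper's pooling choice) convolves a text window into a topic vector and compares it to the convolved entity description by cosine similarity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GranularitySimilarity(nn.Module):
    """Sketch of one granularity: convolve a context window and an entity
    description, then compare with cosine similarity. The full model applies
    this at several granularities (mention, sentence, document) and feeds
    the similarities into a sparse linear model."""
    def __init__(self, emb_dim: int, channels: int = 100, width: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, channels, kernel_size=width)

    def encode(self, emb_seq: torch.Tensor) -> torch.Tensor:
        # emb_seq: (batch, seq_len, emb_dim) -> pooled topic vector
        h = torch.relu(self.conv(emb_seq.transpose(1, 2)))
        return h.max(dim=2).values

    def forward(self, context: torch.Tensor, entity_desc: torch.Tensor) -> torch.Tensor:
        return F.cosine_similarity(self.encode(context), self.encode(entity_desc), dim=1)
```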
Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints
We present a discriminative model for single-document summarization that
integrally combines compression and anaphoricity constraints. Our model selects
textual units to include in the summary based on a rich set of sparse features
whose weights are learned on a large corpus. We allow for the deletion of
content within a sentence when that deletion is licensed by compression rules;
in our framework, these are implemented as dependencies between subsentential
units of text. Anaphoricity constraints then improve cross-sentence coherence
by guaranteeing that, for each pronoun included in the summary, the pronoun's
antecedent is included as well or the pronoun is rewritten as a full mention.
When trained end-to-end, our final system outperforms prior work on both ROUGE
as well as on human judgments of linguistic quality. Comment: ACL 2016.
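The constrained selection step has a natural ILP reading. Below is a hedged sketch using PuLP (the paper's exact formulation, including rewriting pronouns as full mentions, is richer than this inclusion-only version):

```python
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

def extract_summary(scores, lengths, budget, compression_deps, pronoun_links):
    """Select subsentential units to maximize learned scores subject to a
    length budget. `compression_deps` holds (child, parent) unit pairs;
    `pronoun_links` holds (pronoun_unit, antecedent_unit) pairs."""
    n = len(scores)
    prob = LpProblem("summarize", LpMaximize)
    x = [LpVariable(f"x_{i}", cat=LpBinary) for i in range(n)]
    prob += lpSum(scores[i] * x[i] for i in range(n))             # learned unit scores
    prob += lpSum(lengths[i] * x[i] for i in range(n)) <= budget  # length budget
    for child, parent in compression_deps:  # deletions licensed by compression rules
        prob += x[child] <= x[parent]
    for pron, ante in pronoun_links:        # a pronoun requires its antecedent
        prob += x[pron] <= x[ante]
    prob.solve()
    return [i for i in range(n) if x[i].value() == 1]
```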
LambdaNet: Probabilistic Type Inference using Graph Neural Networks
As gradual typing becomes increasingly popular in languages like Python and
TypeScript, there is a growing need to infer type annotations automatically.
While type annotations help with tasks like code completion and static error
catching, these annotations cannot be fully determined by compilers and are
tedious to annotate by hand. This paper proposes a probabilistic type inference
scheme for TypeScript based on a graph neural network. Our approach first uses
lightweight source code analysis to generate a program abstraction called a
type dependency graph, which links type variables with logical constraints as
well as name and usage information. Given this program abstraction, we then use
a graph neural network to propagate information between related type variables
and eventually make type predictions. Our neural architecture can predict both
standard types, like number or string, as well as user-defined types that have
not been encountered during training. Our experimental results show that our
approach outperforms prior work in this space by a substantial absolute margin on library
types, while having the ability to make type predictions that are out of scope
for existing techniques. Comment: Accepted as a poster at ICLR 2020.
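The propagation step can be sketched in a few lines of PyTorch (a generic message-passing layer, not LambdaNet's actual typed-edge architecture): each type variable's hidden state is updated from messages aggregated over its neighbors in the type dependency graph.

```python
import torch
import torch.nn as nn

class TypeGraphLayer(nn.Module):
    """One round of message passing over a (hypothetical) type dependency
    graph. The real model distinguishes edge types and uses a pointer-style
    head to predict user-defined types; this shows only the core idea."""
    def __init__(self, dim: int):
        super().__init__()
        self.message = nn.Linear(2 * dim, dim)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, h: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # h: (num_vars, dim); edges: (num_edges, 2) index pairs (src, dst)
        src, dst = edges[:, 0], edges[:, 1]
        msgs = torch.relu(self.message(torch.cat([h[src], h[dst]], dim=1)))
        agg = torch.zeros_like(h).index_add_(0, dst, msgs)  # sum messages per node
        return self.update(agg, h)
```

Stacking several such rounds lets constraint, name, and usage information flow between related type variables before the final type prediction.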