16,191 research outputs found
Syntactic Topic Models
The syntactic topic model (STM) is a Bayesian nonparametric model of language
that discovers latent distributions of words (topics) that are both
semantically and syntactically coherent. The STM models dependency parsed
corpora where sentences are grouped into documents. It assumes that each word
is drawn from a latent topic chosen by combining document-level features and
the local syntactic context. Each document has a distribution over latent
topics, as in topic models, which provides the semantic consistency. Each
element in the dependency parse tree also has a distribution over the topics of
its children, as in latent-state syntax models, which provides the syntactic
consistency. These distributions are convolved so that the topic of each word
is likely under both its document and syntactic context. We derive a fast
posterior inference algorithm based on variational methods. We report
qualitative and quantitative studies on both synthetic data and hand-parsed
documents. We show that the STM is a more predictive model of language than
current models based only on syntax or only on topics
Evaluating Scoped Meaning Representations
Semantic parsing offers many opportunities to improve natural language
understanding. We present a semantically annotated parallel corpus for English,
German, Italian, and Dutch where sentences are aligned with scoped meaning
representations in order to capture the semantics of negation, modals,
quantification, and presupposition triggers. The semantic formalism is based on
Discourse Representation Theory, but concepts are represented by WordNet
synsets and thematic roles by VerbNet relations. Translating scoped meaning
representations to sets of clauses enables us to compare them for the purpose
of semantic parser evaluation and checking translations. This is done by
computing precision and recall on matching clauses, in a similar way as is done
for Abstract Meaning Representations. We show that our matching tool for
evaluating scoped meaning representations is both accurate and efficient.
Applying this matching tool to three baseline semantic parsers yields F-scores
between 43% and 54%. A pilot study is performed to automatically find changes
in meaning by comparing meaning representations of translations. This
comparison turns out to be an additional way of (i) finding annotation mistakes
and (ii) finding instances where our semantic analysis needs to be improved.Comment: Camera-ready for LREC 201
A Theme-Rewriting Approach for Generating Algebra Word Problems
Texts present coherent stories that have a particular theme or overall
setting, for example science fiction or western. In this paper, we present a
text generation method called {\it rewriting} that edits existing
human-authored narratives to change their theme without changing the underlying
story. We apply the approach to math word problems, where it might help
students stay more engaged by quickly transforming all of their homework
assignments to the theme of their favorite movie without changing the math
concepts that are being taught. Our rewriting method uses a two-stage decoding
process, which proposes new words from the target theme and scores the
resulting stories according to a number of factors defining aspects of
syntactic, semantic, and thematic coherence. Experiments demonstrate that the
final stories typically represent the new theme well while still testing the
original math concepts, outperforming a number of baselines. We also release a
new dataset of human-authored rewrites of math word problems in several themes.Comment: To appear EMNLP 201
Towards Universal Semantic Tagging
The paper proposes the task of universal semantic tagging---tagging word
tokens with language-neutral, semantically informative tags. We argue that the
task, with its independent nature, contributes to better semantic analysis for
wide-coverage multilingual text. We present the initial version of the semantic
tagset and show that (a) the tags provide semantically fine-grained
information, and (b) they are suitable for cross-lingual semantic parsing. An
application of the semantic tagging in the Parallel Meaning Bank supports both
of these points as the tags contribute to formal lexical semantics and their
cross-lingual projection. As a part of the application, we annotate a small
corpus with the semantic tags and present new baseline result for universal
semantic tagging.Comment: 9 pages, International Conference on Computational Semantics (IWCS
Measuring Thematic Fit with Distributional Feature Overlap
In this paper, we introduce a new distributional method for modeling
predicate-argument thematic fit judgments. We use a syntax-based DSM to build a
prototypical representation of verb-specific roles: for every verb, we extract
the most salient second order contexts for each of its roles (i.e. the most
salient dimensions of typical role fillers), and then we compute thematic fit
as a weighted overlap between the top features of candidate fillers and role
prototypes. Our experiments show that our method consistently outperforms a
baseline re-implementing a state-of-the-art system, and achieves better or
comparable results to those reported in the literature for the other
unsupervised systems. Moreover, it provides an explicit representation of the
features characterizing verb-specific semantic roles.Comment: 9 pages, 2 figures, 5 tables, EMNLP, 2017, thematic fit, selectional
preference, semantic role, DSMs, Distributional Semantic Models, Vector Space
Models, VSMs, cosine, APSyn, similarity, prototyp
- …