A Unified Kernel Approach For Learning Typed Sentence Rewritings
Many high-level natural language processing problems can be framed as determining whether two given sentences are rewritings of each other. In this paper, we propose a class of kernel functions, referred to as type-enriched string rewriting kernels, which, used in kernel-based machine learning algorithms, make it possible to learn sentence rewritings. Unlike previous work, this method can be fed external lexical semantic relations to capture a wider class of rewriting rules. It also does not assume preliminary syntactic parsing, yet still provides a unified framework for capturing syntactic structure and alignments between the two sentences. We experiment on three different natural sentence rewriting tasks and obtain state-of-the-art results on all of them.
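As a rough illustration of the idea (a toy sketch, not the paper's kernel), the snippet below scores a sentence pair by softly matching their n-grams, where a small synonym table stands in for the external lexical semantic relations mentioned above; the lexicon, weights, and function names are all hypothetical.

```python
# Toy soft n-gram kernel over a sentence pair. The SYNONYMS table is a
# stand-in for external lexical semantic relations; all values are invented.
SYNONYMS = {("buy", "purchase"), ("car", "automobile")}

def match(a: str, b: str) -> float:
    """1.0 for identical tokens, 0.5 for lexically related tokens, else 0."""
    if a == b:
        return 1.0
    if (a, b) in SYNONYMS or (b, a) in SYNONYMS:
        return 0.5
    return 0.0

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def kernel(s1: str, s2: str, n: int = 2) -> float:
    """Sum over all n-gram pairs of the product of token-level matches."""
    total = 0.0
    for a in ngrams(s1.split(), n):
        for b in ngrams(s2.split(), n):
            score = 1.0
            for x, y in zip(a, b):
                score *= match(x, y)
            total += score
    return total
```

With the lexicon above, a rewriting pair such as "i buy a car" / "i purchase a car" scores higher against itself than against an unrelated sentence, which is the property a kernel-based learner exploits.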
A Theme-Rewriting Approach for Generating Algebra Word Problems
Texts present coherent stories that have a particular theme or overall
setting, for example science fiction or western. In this paper, we present a
text generation method called rewriting that edits existing
human-authored narratives to change their theme without changing the underlying
story. We apply the approach to math word problems, where it might help
students stay more engaged by quickly transforming all of their homework
assignments to the theme of their favorite movie without changing the math
concepts that are being taught. Our rewriting method uses a two-stage decoding
process, which proposes new words from the target theme and scores the
resulting stories according to a number of factors defining aspects of
syntactic, semantic, and thematic coherence. Experiments demonstrate that the
final stories typically represent the new theme well while still testing the
original math concepts, outperforming a number of baselines. We also release a
new dataset of human-authored rewrites of math word problems in several themes.
Comment: To appear EMNLP 201
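The two-stage decoding process described above can be sketched as follows. This is a toy illustration under invented assumptions (the substitution table and the scorer are hypothetical, not the paper's model): stage one proposes word substitutions from the target theme, and stage two scores each candidate rewrite and keeps the best.

```python
# Hypothetical source-word -> target-theme candidates (invented data).
THEME_SUBS = {
    "train": ["starship", "shuttle"],
    "station": ["spaceport", "dock"],
}

def propose(tokens):
    """Stage 1: enumerate single-word substitutions from the target theme."""
    for i, tok in enumerate(tokens):
        for cand in THEME_SUBS.get(tok, []):
            yield tokens[:i] + [cand] + tokens[i + 1:]

def score(tokens):
    """Stage 2: toy scorer counting theme words; a real system would combine
    syntactic, semantic, and thematic coherence models."""
    theme_words = {w for cands in THEME_SUBS.values() for w in cands}
    return sum(1 for t in tokens if t in theme_words)

def rewrite(sentence):
    """Propose candidates, then return the highest-scoring rewrite."""
    tokens = sentence.split()
    candidates = list(propose(tokens)) or [tokens]
    return " ".join(max(candidates, key=score))
```

In the actual system the proposal and scoring stages are iterated over the whole story rather than applied once to a single sentence.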
Restricting the Weak-Generative Capacity of Synchronous Tree-Adjoining Grammars
The formalism of synchronous tree-adjoining grammars, a variant of standard
tree-adjoining grammars (TAG), was intended to allow the use of TAGs for
language transduction in addition to language specification. In previous work,
the definition of the transduction relation defined by a synchronous TAG was
given by appeal to an iterative rewriting process. The rewriting definition of
derivation is problematic in that it greatly extends the expressivity of the
formalism and makes the design of parsing algorithms difficult if not
impossible. We introduce a simple, natural definition of synchronous
tree-adjoining derivation, based on isomorphisms between standard
tree-adjoining derivations, that avoids the expressivity and implementability
problems of the original rewriting definition. The decrease in expressivity,
which would otherwise make the method unusable, is offset by the incorporation
of an alternative definition of standard tree-adjoining derivation, previously
proposed for completely separate reasons, thereby making it practical to
entertain using the natural definition of synchronous derivation. Nonetheless,
some remaining problematic cases call for yet more flexibility in the
definition; the isomorphism requirement may have to be relaxed. It remains for
future research to tune the exact requirements on the allowable mappings.
Comment: 21 pages, uses lingmacros.sty, psfig.sty, fullname.sty; minor typographical changes only
From surface dependencies towards deeper semantic representations
In the past, a divide could be seen between 'deep' parsers on the one hand, which construct a semantic representation from their input but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have larger coverage, but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of data-driven (statistical) models with more detailed linguistic interpretation, such that the output can be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived functional structures are of better quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies are already a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create representations that are underspecified with respect to certain phenomena such as coordination, apposition and control structures. In these areas they are too "shallow" to be used directly for semantic interpretation. In this paper, we adopt an approach similar to that of Cahill et al. (2002), using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach on German data. A major focus of our discussion is the treatment of coordination and other potentially underspecified structures in the dependency input.
F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001). Independently of surface order, it encodes abstract syntactic functions that constitute predicate-argument structure and other dependency relations such as subject, predicate and adjunct, but also further semantic information such as the semantic type of an adjunct (e.g. directional). Normally, f-structure is represented as a recursive attribute-value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms, as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore consider f-structures a suitable target representation for automatic syntactic analysis in a larger pipeline mapping text to interpretation.
In this paper, we report on the conversion from dependency structures to f-structure. First, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and the conversion of Versley (2005). Second, we start from tokenized text to evaluate the combined process of automatic parsing (using the parser of Foth and Menzel (2006)) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z, which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004). In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures. Our evaluation is presented in section 4, and finally, section 5 summarises our results and gives an overview of problems remaining to be solved.
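The claim that an f-structure is a recursive attribute-value matrix isomorphic to a directed graph can be illustrated concretely. The sketch below is illustrative only (the example sentence and attribute inventory are simplified assumptions, not taken from the paper): a nested dict plays the role of the AVM, and a small walk flattens it into labelled graph edges.

```python
# Illustrative LFG-style f-structure for a hypothetical German sentence
# "Hans sieht Maria" ('Hans sees Maria'), as a nested attribute-value matrix.
f_structure = {
    "PRED": "sehen<SUBJ, OBJ>",   # predicate with its argument frame
    "TENSE": "present",
    "SUBJ": {"PRED": "Hans", "CASE": "nom"},
    "OBJ": {"PRED": "Maria", "CASE": "acc"},
}

def avm_to_edges(avm, root="f0"):
    """Flatten a nested AVM into directed-graph edges: attributes become
    edge labels, embedded AVMs become fresh nodes f1, f2, ..."""
    edges, fresh = [], [0]
    def walk(node, sub):
        for attr, val in sub.items():
            if isinstance(val, dict):
                fresh[0] += 1
                child = f"f{fresh[0]}"
                edges.append((node, attr, child))
                walk(child, val)
            else:
                edges.append((node, attr, val))
    walk(root, avm)
    return edges
```

The edge view makes the isomorphism explicit: each sub-AVM is a node, and reentrancies (shared values) would simply be edges pointing to the same node.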
An Alternative Conception of Tree-Adjoining Derivation
The precise formulation of derivation for tree-adjoining grammars has
important ramifications for a wide variety of uses of the formalism, from
syntactic analysis to semantic interpretation and statistical language
modeling. We argue that the definition of tree-adjoining derivation must be
reformulated in order to manifest the proper linguistic dependencies in
derivations. The particular proposal is both precisely characterizable through
a definition of TAG derivations as equivalence classes of ordered derivation
trees, and computationally operational, by virtue of a compilation to linear
indexed grammars together with an efficient algorithm for recognition and
parsing according to the compiled grammar.
Comment: 33 pages
A Type-coherent, Expressive Representation as an Initial Step to Language Understanding
A growing interest in tasks involving language understanding by the NLP
community has led to the need for effective semantic parsing and inference.
Modern NLP systems use semantic representations that do not quite fulfill the
nuanced needs for language understanding: adequately modeling language
semantics, enabling general inferences, and being accurately recoverable. This
document describes underspecified logical forms (ULF) for Episodic Logic (EL),
which is an initial form for a semantic representation that balances these
needs. ULFs fully resolve the semantic type structure while leaving issues such
as quantifier scope, word sense, and anaphora unresolved; they provide a
starting point for further resolution into EL, and enable certain structural
inferences without further resolution. This document also presents preliminary
results of creating a hand-annotated corpus of ULFs for the purpose of training
a precise ULF parser, showing a three-person pairwise interannotator agreement
of 0.88 on confident annotations. We hypothesize that a divide-and-conquer
approach to semantic parsing starting with derivation of ULFs will lead to
semantic analyses that do justice to subtle aspects of linguistic meaning, and
will enable construction of more accurate semantic parsers.
Comment: Accepted for publication at The 13th International Conference on Computational Semantics (IWCS 2019)
Non-simplifying Graph Rewriting Termination
To date, a very large amount of work in Natural Language Processing (NLP) relies on trees as the core mathematical structure for representing linguistic information (e.g. in Chomsky's work). However, some linguistic phenomena are not adequately captured by trees. In a previous paper, we showed the benefit of encoding linguistic structures as graphs and of using graph rewriting rules to compute on those structures. Motivated by linguistic considerations, the graph rewriting we consider is characterized by two features: first, there is no node creation along computations, and second, there are non-local edge modifications. Under these hypotheses, we show that uniform termination is undecidable and that non-uniform termination is decidable. We describe two termination techniques based on weights, and we give complexity bounds on the derivation length for these rewriting systems.
Comment: In Proceedings TERMGRAPH 2013, arXiv:1302.599
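The weight-based termination argument can be sketched in miniature. This is a toy illustration under invented assumptions (the edge labels, weights, and single rule are hypothetical, not the paper's formal system): each edge label carries a positive weight, every rule application strictly decreases the graph's total weight, and the derivation length is therefore bounded by the initial weight.

```python
# Hypothetical edge-label weights; the single rule relabels 'raw' -> 'done',
# dropping the weight of that edge from 2 to 1.
WEIGHTS = {"raw": 2, "done": 1}

def total_weight(edges):
    """Termination measure: sum of label weights over all edges."""
    return sum(WEIGHTS[label] for _, label, _ in edges)

def rewrite_step(edges):
    """Apply the rule to the first matching edge; None if in normal form."""
    for i, (src, label, tgt) in enumerate(edges):
        if label == "raw":
            return edges[:i] + [(src, "done", tgt)] + edges[i + 1:]
    return None

def derive(edges):
    """Rewrite to exhaustion; terminates because the weight strictly drops."""
    steps = 0
    while (nxt := rewrite_step(edges)) is not None:
        assert total_weight(nxt) < total_weight(edges)  # the measure decreases
        edges, steps = nxt, steps + 1
    return edges, steps
```

Since the weight is a non-negative integer that decreases at every step, the number of steps here can never exceed the initial total weight, which is the shape of the derivation-length bound described in the abstract.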