2,569 research outputs found
Optimal rank reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
International audienceLinear Context-Free Rewriting Systems (LCFRSs) are a grammar formalism capable of modeling discontinuous phrases. Many parsing applications use LCFRSs where the fan-out (a measure of the discontinuity of phrases) does not exceed 2. We present an efficient algorithm for optimal reduction of the length of production right-hand side in LCFRSs with fan-out at most 2. This results in asymptotical running time improvement for known parsing algorithms for this class
A Transition-Based Directed Acyclic Graph Parser for UCCA
We present the first parser for UCCA, a cross-linguistically applicable
framework for semantic representation, which builds on extensive typological
work and supports rapid annotation. UCCA poses a challenge for existing parsing
techniques, as it exhibits reentrancy (resulting in DAG structures),
discontinuous structures and non-terminal nodes corresponding to complex
semantic units. To our knowledge, the conjunction of these formal properties is
not supported by any existing parser. Our transition-based parser, which uses a
novel transition set and features based on bidirectional LSTMs, has value not
just for UCCA parsing: its ability to handle more general graph structures can
inform the development of parsers for other semantic DAG structures, and in
languages that frequently use discontinuous structures.Comment: 16 pages; Accepted as long paper at ACL201
The Tüba-D/Z treebank : annotating German with a context-free backbone
The purpose of this paper is to describe the TüBa-D/Z treebank of written German and to compare it to the independently developed TIGER treebank (Brants et al., 2002). Both treebanks, TIGER and TüBa-D/Z, use an annotation framework that is based on phrase structure grammar and that is enhanced by a level of predicate-argument structure. The comparison between the annotation schemes of the two treebanks focuses on the different treatments of free word order and discontinuous constituents in German as well as on differences in phrase-internal annotation
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
Interaction Grammars
Interaction Grammar (IG) is a grammatical formalism based on the notion of
polarity. Polarities express the resource sensitivity of natural languages by
modelling the distinction between saturated and unsaturated syntactic
structures. Syntactic composition is represented as a chemical reaction guided
by the saturation of polarities. It is expressed in a model-theoretic framework
where grammars are constraint systems using the notion of tree description and
parsing appears as a process of building tree description models satisfying
criteria of saturation and minimality
Statistical parsing of morphologically rich languages (SPMRL): what, how and whither
The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word-level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state-of-affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations
- …