Head-initial constructions in Japanese
Japanese is often taken to be strictly head-final in its syntax. In our work on a broad-coverage, precision-implemented HPSG for Japanese, we have found that while this is generally true, there are nonetheless a few minor exceptions to the broad trend. In this paper, we describe the grammar engineering project, present the exceptions we have found, and conclude that this kind of phenomenon motivates, on the one hand, the HPSG type-hierarchical approach, which allows for the statement of both broad generalizations and exceptions to those generalizations, and, on the other hand, the usefulness of grammar engineering as a means of testing linguistic hypotheses.
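The interplay of broad generalizations and lexical exceptions that the abstract describes can be illustrated with a minimal sketch. This is not the paper's grammar: the class names and the single `head_final` feature are hypothetical, standing in for a much richer HPSG type hierarchy, but they show how a default stated on a supertype can be overridden on an exceptional subtype.

```python
# Illustrative sketch only: a toy "type hierarchy" in which the broad
# generalization (Japanese phrases are head-final) is stated once on a
# supertype, and a minor exceptional construction overrides it.
# All names here are hypothetical, not from the paper's grammar.

class JapanesePhrase:
    head_final = True  # the broad generalization

class VerbFinalClause(JapanesePhrase):
    pass  # inherits the generalization unchanged

class HeadInitialException(JapanesePhrase):
    head_final = False  # a minor exception overriding the default

print(VerbFinalClause().head_final)        # True
print(HeadInitialException().head_final)   # False
```

The design point mirrors the paper's argument: because exceptions are stated locally on subtypes, the generalization itself remains intact, rather than being weakened to accommodate a handful of constructions.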
Parallel Distributed Grammar Engineering for Practical Applications
Based on a detailed case study of parallel grammar development distributed across two sites, we review some of the requirements for regression testing in grammar engineering, summarize our approach to systematic competence and performance profiling, and discuss our experience with grammar development for a commercial application. If possible, the workshop presentation will be organized around a software demonstration.
TuLiPA: towards a multi-formalism parsing environment for grammar engineering
In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.
The CoreGram Project: Theoretical Linguistics, Theory Development and Verification
This paper describes the CoreGram project, a multilingual grammar engineering project that develops HPSG grammars for several typologically diverse languages sharing a common core. The paper provides a general motivation for doing theoretical linguistics the way it is done in the CoreGram project and is therefore not targeted at computational linguists exclusively. I argue for a constraint-based approach to language rather than a generative-enumerative one and discuss issues of formalization. Recent advances in language acquisition research are mentioned, and conclusions on how theories should be constructed are drawn. The paper discusses some of the highlights of the implemented grammars, gives a brief overview of central theoretical concepts and their implementation in TRALE, and compares the CoreGram project with other multilingual grammar engineering projects.
Linguistic Constraints in LFG-DOP
LFG-DOP (Bod and Kaplan, 1998, 2003) provides an appealing answer to the question of how probabilistic methods can be incorporated into linguistic theory. However, despite its attractions, the standard model of LFG-DOP suffers from serious problems of overgeneration, because (a) it is unable to define fragments of the right level of generality, and (b) it has no way of capturing the effect of anything except simple positive constraints. We show how the model can be extended to overcome these problems. The question of how probabilistic methods should be incorporated into linguistic theory is important both from a practical, grammar engineering perspective and from the perspective of "pure" linguistic theory. From a practical point of view, such techniques are essential if a system is to achieve a useful breadth of coverage.
Decreasing lexical data sparsity in statistical syntactic parsing - experiments with named entities
In this paper we present preliminary experiments that aim to reduce lexical data sparsity in statistical parsing by exploiting information about named entities. Words in the WSJ corpus are mapped to named entity clusters, and a latent variable constituency parser is trained and tested on the transformed corpus. We explore two different methods for mapping words to entities, and look at the effect of mapping various subsets of named entity types. Thus far, results show no improvement in parsing accuracy over the best baseline score; we identify possible problems and outline suggestions for future directions.
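The corpus transformation described in the abstract can be sketched as a simple preprocessing step. This is a hypothetical illustration, not the paper's actual pipeline: the function, token list, and entity spans below are invented for the example, and real systems would obtain the spans from a named entity recognizer.

```python
# Hypothetical sketch of the preprocessing step: tokens covered by a
# named-entity span are replaced by the entity's cluster label, so the
# parser sees e.g. PERSON instead of a rare proper noun.

def map_entities(tokens, spans):
    """Replace tokens covered by (start, end, label) NE spans with the label.

    spans use half-open indices into the token list, as in (0, 2, "PERSON").
    """
    out = list(tokens)
    for start, end, label in spans:
        for i in range(start, end):
            out[i] = label
    return out

tokens = ["Pierre", "Vinken", "joined", "IBM", "yesterday"]
spans = [(0, 2, "PERSON"), (3, 4, "ORG")]
print(map_entities(tokens, spans))
# ['PERSON', 'PERSON', 'joined', 'ORG', 'yesterday']
```

The intended effect is that many distinct rare words collapse onto a few frequent cluster symbols, reducing lexical sparsity at the cost of discarding word identity.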