Explanation and Downscalability of Google's Dependency Parser Parsey McParseface
Using the data collected during the hyperparameter tuning of Google's dependency parser Parsey McParseface, feedforward neural networks and the correlations between their hyperparameters during training are explained and analysed in depth.
1 Introduction to Neural Networks 4
1.1 History of AI 4
1.2 The role of Neural Networks in AI Research 6
1.2.1 Artificial Intelligence 6
1.2.2 Machine Learning 6
1.2.3 Neural Network 8
1.3 Structure of Neural Networks 8
1.3.1 Biology Analogy of Artificial Neural Networks 9
1.3.2 Architecture of Artificial Neural Networks 9
1.3.3 Biological Model of Nodes – Neurons 11
1.3.4 Structure of Artificial Neurons 12
1.4 Training a Neural Network 21
1.4.1 Data 21
1.4.2 Hyperparameters 22
1.4.3 Training process 26
1.4.4 Overfitting 27
2 Natural Language Processing (NLP) 29
2.1 Data Preparation 29
2.1.1 Text Preprocessing 29
2.1.2 Part-of-Speech Tagging 30
2.2 Dependency Parsing 31
2.2.1 Dependency Grammar 31
2.2.2 Dependency Parsing Rule-Based & Data-Driven Approach 33
2.2.3 Syntactic Parser 33
2.3 Parsey McParseface 34
2.3.1 SyntaxNet 34
2.3.2 Corpus 34
2.3.3 Architecture 34
2.3.4 Improvements to the Feed Forward Neural Network 38
3 Training of Parsey’s Cousins 41
3.1 Training a Model 41
3.1.1 Building the Framework 41
3.1.2 Corpus 41
3.1.3 Training Process 43
3.1.4 Settings for the Training 44
3.2 Results and Analysis 46
3.2.1 Results from Google’s Models 46
3.2.2 Effect of Hyperparameter 47
4 Conclusion 63
5 Bibliography 65
6 Appendix 7
On Multilingual Training of Neural Dependency Parsers
We show that a recently proposed neural dependency parser can be improved by
joint training on multiple languages from the same family. The parser is
implemented as a deep neural network whose only input is orthographic
representations of words. In order to successfully parse, the network has to
discover how linguistically relevant concepts can be inferred from word
spellings. We analyze the representations of characters and words that are
learned by the network to establish which properties of languages were
accounted for. In particular we show that the parser has approximately learned
to associate Latin characters with their Cyrillic counterparts and that it can
group Polish and Russian words that have a similar grammatical function.
Finally, we evaluate the parser on selected languages from the Universal
Dependencies dataset and show that it is competitive with other recently
proposed state-of-the-art methods, while having a simple structure.
Comment: preprint accepted into the TSD201
Elimination of Spurious Ambiguity in Transition-Based Dependency Parsing
We present a novel technique to remove spurious ambiguity from transition
systems for dependency parsing. Our technique chooses a canonical sequence of
transition operations (computation) for a given dependency tree. Our technique
can be applied to a large class of bottom-up transition systems, including for
instance Nivre (2004) and Attardi (2006).
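As a hedged illustration (a sketch, not the paper's own construction), one common way to obtain a single canonical computation per tree in an arc-standard system like Nivre (2004) is an oracle that performs each arc as early as it becomes legal; the `heads` encoding and the example sentence below are hypothetical:

```python
def canonical_oracle(heads):
    """Derive a canonical arc-standard transition sequence for a
    projective dependency tree, attaching each dependent as early as
    possible so that exactly one computation is chosen per tree.

    heads[i] is the head of token i (tokens are 1..n, 0 is the root)."""
    n = len(heads) - 1
    # number of dependents each token still has to collect
    remaining = [0] * len(heads)
    for d in range(1, n + 1):
        remaining[heads[d]] += 1

    stack, buffer, transitions = [0], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s1, s2 = stack[-1], stack[-2]
            # RIGHT-ARC: s2 -> s1, legal once s1 has collected all dependents
            if heads[s1] == s2 and remaining[s1] == 0:
                transitions.append(("RIGHT-ARC", s2, s1))
                stack.pop()
                remaining[s2] -= 1
                continue
            # LEFT-ARC: s1 -> s2 (the root at position 0 is never a dependent)
            if s2 != 0 and heads[s2] == s1 and remaining[s2] == 0:
                transitions.append(("LEFT-ARC", s1, s2))
                stack.pop(-2)
                remaining[s1] -= 1
                continue
        if not buffer:
            raise ValueError("tree is not projective")
        transitions.append(("SHIFT", buffer[0]))
        stack.append(buffer.pop(0))
    return transitions
```

For "She ate fish" with heads = [0, 2, 0, 2], this oracle yields SHIFT, SHIFT, LEFT-ARC, SHIFT, RIGHT-ARC, RIGHT-ARC; delaying the LEFT-ARC until after the third SHIFT would derive the same tree, which is exactly the spurious ambiguity that fixing a canonical sequence removes.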
Deterministic choices in a data-driven parser.
Data-driven parsers rely on recommendations from parse models,
which are generated from a set of training data using a machine learning classifier,
to perform parse operations. However, in some cases a parse model cannot
recommend a parse action to a parser unless it has learned from the training
data what parse action(s) to take in every possible situation. It is therefore
hard for a parser to make an informed decision about which parse operation
to perform when the parse model recommends either no parse action or several.
Here we examine the effect of various deterministic choices on a data-driven
parser when it is presented with no recommendation, or several, from a
parse model.
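The kind of deterministic choice discussed above can be sketched as follows; the confidence threshold, the SHIFT default, and the alphabetical tie-break are hypothetical choices for illustration, not the strategies evaluated in the paper:

```python
def choose_action(scores, threshold=0.5, fallback="SHIFT"):
    """Deterministically pick a parse action from model scores.

    scores: dict mapping parse actions to model confidences.
    If no action clears the threshold, fall back to a default action;
    if several actions tie at the top score, break the tie alphabetically."""
    viable = [a for a, s in scores.items() if s >= threshold]
    if not viable:
        return fallback  # no recommendation: take the default action
    top = max(scores[a] for a in viable)
    tied = sorted(a for a in viable if scores[a] == top)
    return tied[0]  # several recommendations: deterministic tie-break
```

A policy like this guarantees the parser always has exactly one action to execute, regardless of how many recommendations the model produces.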
OCRonym: Entity Extraction and Retrieval for Scanned Books
In the past five years, massive book-scanning projects have produced an explosion in the number of sources for the humanities, available on-line to the broadest possible audiences. Transcribing page images by optical character recognition makes many searching and browsing tasks practical for scholars. But even low OCR error rates compound into a high probability of error in a given sentence, and the error rate is even higher for names. We propose to build a prototype system for information extraction and retrieval of noisy OCR. In particular, we will optimize the extraction and retrieval of names, which are highly informative features for detecting topics and events in documents. We will build statistical models of characters and words from scanned books to improve lexical coverage, and we will improve name categorization and disambiguation by linking document contexts to external sources such as Wikipedia. Our testbed comes from over one million scanned books from the Internet Archive.