
524 research outputs found

A MT System from Turkmen to Turkish employing finite state and statistical methods

Author: Adalı Eşref
Oflazer Kemal
Tantuğ A. Cüneyd
Publication venue: European Association for Machine Translation (EAMT)
Publication date: 01/09/2007

In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the two languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment in the word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation in our system. We believe that this approach is valid for most of the Turkic languages and that the architecture, implemented using finite-state transducers (FSTs), can easily be extended to those languages.

CiteSeerX

Sabanci University Research Database
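The direct-translation idea described in this abstract (translate roots word by word, carry the inflectional tags across unchanged, then choose among competing outputs statistically) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' FST implementation: the `root+Tag+Tag` analysis format, the two-entry lexicon `ROOT_LEXICON`, the bigram table `BIGRAM_LOGPROB`, the back-off score `UNSEEN`, and the helper names `transfer`, `score` and `translate` are all hypothetical, and the final step of generating Turkish surface forms with a morphological generator is omitted.

```python
from itertools import product

# Assumed analysis format: "root+Tag+Tag...". Only the root is translated;
# the tags are copied, relying on the close morphological parallelism
# between Turkmen and Turkish.

# Hypothetical bilingual root lexicon; an ambiguous root maps to several candidates.
ROOT_LEXICON = {
    "mekdep": ["okul", "mektep"],
    "git": ["git"],
}

# Hypothetical bigram log-probabilities over Turkish roots ("<s>" = sentence start).
BIGRAM_LOGPROB = {
    ("<s>", "okul"): -1.0,
    ("<s>", "mektep"): -3.0,
    ("okul", "git"): -0.5,
    ("mektep", "git"): -0.7,
}
UNSEEN = -10.0  # assumed back-off score for unseen bigrams


def transfer(analysis: str) -> list[str]:
    """Map one source analysis to all candidate target analyses."""
    root, _, tags = analysis.partition("+")
    # Unknown roots are passed through unchanged (an assumption of this sketch).
    return [f"{t}+{tags}" if tags else t for t in ROOT_LEXICON.get(root, [root])]


def score(sequence: list[str]) -> float:
    """Bigram log-probability of the root sequence of one candidate translation."""
    roots = ["<s>"] + [a.partition("+")[0] for a in sequence]
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN) for pair in zip(roots, roots[1:]))


def translate(source: list[str]) -> list[str]:
    """Word-by-word transfer followed by a statistical choice among candidates."""
    candidates = [transfer(a) for a in source]
    # Exhaustive search over all combinations; fine for a toy example only.
    return list(max(product(*candidates), key=lambda seq: score(list(seq))))


print(translate(["mekdep+Noun+A3sg+Pnon+Dat", "git+Verb+Pos+Past+A3sg"]))
```

Here the ambiguous root `mekdep` yields two candidate analyses, and the toy bigram scores prefer `okul`. Because `product` enumerates every combination, the search grows exponentially with sentence length; a practical system would decode with dynamic programming instead.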

Dependency parsing of Turkish

Author: Eryiğit Gülşen
Nivre Joakim
Oflazer Kemal
Publication venue: MIT Press - Journals
Publication date: 01/09/2006

The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Lesser-studied languages in particular, which are typologically different from the languages for which the methods were originally developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical representations called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We compare two different parsing methods, one based on a probabilistic model with beam search, the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.

CiteSeerX

Crossref

Sabanci University Research Database
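The use of inflectional groups (IGs) rather than word forms as parsing units can be illustrated by splitting each morphological analysis at its derivation boundaries. This sketch assumes the `^DB` boundary marker used by Oflazer-style Turkish analyzers and the Turkish Treebank convention that a word's dependency link leaves from its last IG; the `IG` class and the functions `to_igs` and `attaching_unit` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

DB = "^DB"  # assumed derivation-boundary marker, as in Oflazer-style analyses


@dataclass(frozen=True)
class IG:
    word: int      # index of the word in the sentence
    index: int     # index of this IG within its word
    features: str  # root and/or morphological tags of the IG


def to_igs(analyses: list[str]) -> list[IG]:
    """Split each word's analysis at derivation boundaries into IG parsing units."""
    return [
        IG(w, i, part.lstrip("+"))
        for w, analysis in enumerate(analyses)
        for i, part in enumerate(analysis.split(DB))
    ]


def attaching_unit(units: list[IG], word: int) -> IG:
    """The IG a word's outgoing dependency leaves from: its last IG."""
    return [u for u in units if u.word == word][-1]


# "evdeki kitap" ("the book in the house")
units = to_igs(["ev+Noun+A3sg+Pnon+Loc^DB+Adj+Rel", "kitap+Noun+A3sg+Pnon+Nom"])
for u in units:
    print(u)
print(attaching_unit(units, 0).features)  # Adj+Rel
```

In this example the relativizing adjective IG of *evdeki*, not its noun root, is the unit that would attach to *kitap*, which is the kind of distinction a word-level parser cannot express directly.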

Statistical Language Modeling for Automatic Speech Recognition of Agglutinative Languages

Author: Ebru Arısoy
Haşim Sak
Janne Pylkkönen
Mikko Kurimo
Murat Saraçlar
Tanel Alumäe
Teemu Hirsimäki
Publication venue: IntechOpen
Publication date: 01/11/2008

IntechOpen

Morphological Disambiguation by Voting Constraints

Author: Oflazer Kemal
Tur Gokhan
Publication date: 01/01/1997

We present a constraint-based morphological disambiguation system in which individual constraints vote on matching morphological parses, and disambiguation of all the tokens in a sentence is performed at the end by selecting the parses that receive the highest votes. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence and hence relieves the rule developer from worrying about potentially conflicting rule sequencing. Our results for disambiguating Turkish indicate that using about 500 constraint rules and some additional simple statistics, we can attain a recall of 95-96% and a precision of 94-95% with about 1.01 parses per token. Our system is implemented in Prolog, and we are currently investigating an efficient implementation based on finite-state transducers.

Comment: 8 pages, LaTeX source. To appear in Proceedings of ACL/EACL'97. Compressed PostScript also available as ftp://ftp.cs.bilkent.edu.tr/pub/ko/acl97.ps.

arXiv.org e-Print Archive

CiteSeerX

Crossref
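The voting scheme can be sketched as follows: every constraint is checked against every parse of every token in the unmodified sentence, the votes are summed, and only afterwards are the highest-voted parses kept, so the result does not depend on the order in which constraints are listed. The original system is written in Prolog with about 500 hand-written rules; the `Constraint` representation, the `next_has` helper, the two toy rules and their vote weights below are hypothetical.

```python
from collections.abc import Callable
from dataclasses import dataclass

Sentence = list[list[str]]  # one list of candidate parses per token


@dataclass(frozen=True)
class Constraint:
    """Hypothetical constraint: a match predicate and the vote it casts."""
    matches: Callable[[Sentence, int, str], bool]  # (sentence, position, parse) -> bool
    vote: float


def next_has(sentence: Sentence, pos: int, tag: str) -> bool:
    """True if the following token has at least one parse containing `tag`."""
    return pos + 1 < len(sentence) and any(tag in p for p in sentence[pos + 1])


def disambiguate(sentence: Sentence, constraints: list[Constraint]) -> Sentence:
    """Tally all votes on the unmodified sentence, then keep the top-voted parse(s)."""
    result = []
    for pos, parses in enumerate(sentence):
        tally = {p: sum(c.vote for c in constraints if c.matches(sentence, pos, p)) for p in parses}
        best = max(tally.values())
        result.append([p for p in parses if tally[p] == best])  # ties are all kept
    return result


constraints = [
    Constraint(lambda s, i, p: "+Det" in p and next_has(s, i, "+Noun"), 2.0),
    Constraint(lambda s, i, p: "+Num" in p, 1.0),
]
sentence = [["bir+Det", "bir+Num+Card", "bir+Adv"], ["ev+Noun+A3sg+Pnon+Nom"]]
print(disambiguate(sentence, constraints))
```

Because tied parses are all retained, a token can keep more than one parse; this is one plausible way an average slightly above one parse per token can arise.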

A free/open-source hybrid morphological disambiguation tool for Kazakh

Author: Abduali Balzhan
Amirova Dina
Assylbekov Zhenisbek
Karibayeva Aidana
Nurkas Assulan
Sundetova Aida
Tyers Francis
Washington Jonathan
DOI: 10.13140/RG.2.2.12467.43045
Publication date: 01/04/2016

This paper presents the results of developing a morphological disambiguation tool for Kazakh. Starting with a previously developed rule-based approach, we tried to cope with the complex morphology of Kazakh by breaking up lexical forms across their derivational boundaries into inflectional groups and modeling their behavior with statistical methods. A hybrid rule-based/statistical approach appears to benefit morphological disambiguation, demonstrating a per-token accuracy of 91% in running text.

Nazarbayev University Repository
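A hybrid pipeline of the kind described here can be sketched as a rule stage that removes readings, followed by a statistical stage that chooses among the survivors, together with the per-token accuracy measure used to report results. The Apertium-style `lemma<tag><tag>` reading format, the placeholder lemmas `w1`/`w2`, the example rule, the tag counts, and the simple most-frequent-tag-string choice are assumptions for illustration; the Kazakh tool itself models inflectional groups statistically.

```python
from collections import Counter
from collections.abc import Callable

Sentence = list[list[str]]  # one list of candidate readings per token
# Hypothetical rule signature: (sentence, token position, reading) -> remove this reading?
Rule = Callable[[Sentence, int, str], bool]


def tags_of(reading: str) -> str:
    """Drop the lemma, e.g. 'w1<n><nom>' -> '<n><nom>'."""
    return reading[reading.index("<"):] if "<" in reading else reading


def apply_rules(sentence: Sentence, rules: list[Rule]) -> Sentence:
    """Rule-based stage: remove matching readings, but never empty a token."""
    pruned = []
    for pos, readings in enumerate(sentence):
        kept = [r for r in readings if not any(rule(sentence, pos, r) for rule in rules)]
        pruned.append(kept or readings)
    return pruned


def disambiguate(sentence: Sentence, rules: list[Rule], tag_counts: Counter) -> list[str]:
    """Statistical stage: among surviving readings, pick the most frequent tag string."""
    return [max(rs, key=lambda r: tag_counts[tags_of(r)]) for rs in apply_rules(sentence, rules)]


def per_token_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of tokens whose chosen reading matches the gold reading."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy data (all hypothetical): placeholder lemmas w1, w2 and made-up training counts.
sentence = [["w1<n><nom>", "w1<v><imp>"], ["w2<n><acc>", "w2<n><px3sp><nom>"]]
rules = [lambda s, i, r: "<imp>" in r and i < len(s) - 1]  # imperative only sentence-finally
counts = Counter({"<n><nom>": 100, "<n><acc>": 40, "<n><px3sp><nom>": 25, "<v><imp>": 5})

predicted = disambiguate(sentence, rules, counts)
print(predicted, per_token_accuracy(predicted, ["w1<n><nom>", "w2<n><px3sp><nom>"]))
```

The rule removes the non-final imperative reading of `w1`; the counts then decide between the two remaining readings of `w2`, and the wrong choice there yields an accuracy of 0.5 against the toy gold standard.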

Statistical Morphological Disambiguation for Kazakh Language

Author: Azamat Daiana
Publication venue: Nazarbayev University School of Science and Technology
Publication date: 01/01/2016

This paper presents the results of developing a statistical model for morphological disambiguation of Kazakh text. Starting with basic assumptions, we tried to cope with the complex morphology of the Kazakh language by breaking up lexical forms across their derivational boundaries into inflectional groups and modeling their behavior with statistical methods. We also provide maximum likelihood estimates for the parameters and an effective way to perform disambiguation with the Viterbi algorithm.

Nazarbayev University Repository
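Maximum likelihood estimation followed by Viterbi decoding can be sketched as a bigram model over tag strings, where the candidate analyses for each token come from a morphological analyzer. This is a simplified sketch, not the thesis model: it uses whole tag strings rather than inflectional groups as states, ignores emission probabilities (treating the analyzer's candidates as equally likely), and substitutes a small constant `FLOOR` for unseen transitions, to which pure MLE would assign zero probability. The names `train` and `viterbi` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Callable

START = "<s>"  # sentence-start pseudo-tag
FLOOR = 1e-6   # assumption: stand-in probability for unseen transitions

LogProb = Callable[[str, str], float]


def train(corpus: list[list[str]]) -> LogProb:
    """MLE transitions: P(t_i | t_{i-1}) = c(t_{i-1}, t_i) / c(t_{i-1})."""
    bigrams: Counter = Counter()
    history: Counter = Counter()
    for tags in corpus:
        seq = [START] + tags
        for prev, cur in zip(seq, seq[1:]):
            bigrams[prev, cur] += 1
            history[prev] += 1

    def logp(prev: str, cur: str) -> float:
        count = bigrams[prev, cur]
        return math.log(count / history[prev]) if count else math.log(FLOOR)

    return logp


def viterbi(candidates: list[list[str]], logp: LogProb) -> list[str]:
    """Highest-scoring sequence choosing one candidate tag string per token."""
    # best maps each candidate of the current token to (log score, best path ending there)
    best = {t: (logp(START, t), [t]) for t in candidates[0]}
    for cands in candidates[1:]:
        best = {
            t: max(
                ((score + logp(prev, t), path + [t]) for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
            for t in cands
        }
    return max(best.values(), key=lambda item: item[0])[1]


corpus = [["Det", "Noun", "Verb"], ["Adj", "Noun", "Verb"], ["Det", "Noun", "Verb"]]
logp = train(corpus)
print(viterbi([["Det", "Adv"], ["Noun", "Verb"], ["Verb", "Noun"]], logp))
```

With this toy corpus, decoding the candidate lists above returns `["Det", "Noun", "Verb"]`. Keeping only the best path per candidate at each position makes decoding linear in sentence length, rather than exponential as with enumerating every combination.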