
38 research outputs found

Multi-Script Morphological Transducers And Transcribers For Seven Turkic Languages

Author: Kuyrukçu O.
Tyers F.
Washington Jonathan North
Publication venue: 'Transformative Works and Cultures'
Publication date: 08/02/2020

This paper describes ongoing work to augment morphological transducers for seven Turkic languages with support for multiple scripts each, as well as respective IPA transcription systems. Evaluation demonstrates that our approach yields coverage equivalent to or not much lower than that of the base transducers.

Proceedings Published by the LSA (Linguistic Society of America)


A prototype machine translation system between Turkmen and Turkish

Author: Adalı Eşref
Oflazer Kemal
Tantuğ A. Cüneyd
Publication date: 01/06/2006

In this work, we present a prototype system for translation of Turkmen texts into Turkish. Although machine translation (MT) is a very hard task, it is easier to implement an MT system between very close language pairs which have similar syntactic structure and word order. We implement a direct translation system between Turkmen and Turkish which performs a word-to-word transfer. We also use a Turkish Language Model to find the most probable Turkish sentence among all possible candidate translations generated by our system.

Sabanci University Research Database

Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Author: Alos i Font Héctor
Bayatlı Sevilay
Khanna Tanmai
Pirinen Flammie
Swanson Daniel
Tang Irene
Tyers Francis Morton
Washington Jonathan North
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Delineating Turkic non-finite verb forms by syntactic function

Author: Tyers Francis Morton
Washington Jonathan North
Publication venue: 'Linguistic Society of America'
Publication date: 07/10/2019

In this paper, we argue against the primary categories of non-finite verb used in the Turkology literature: “participle” (причастие ‹pričastije›) and “converb” (деепричастие ‹dejepričastije›). We argue that both of these terms conflate several discrete phenomena, and that they furthermore are not coherent as umbrella terms for these phenomena. Based on detailed study of the non-finite verb morphology and syntax of a wide range of Turkic languages (presented here are Turkish, Kazakh, Kyrgyz, Tatar, Tuvan, and Sakha), we instead propose delineation of these categories according to their morphological and syntactic properties. Specifically, we propose that more accurate categories are verbal noun, verbal adjective, verbal adverb, and infinitive. This approach has far-reaching implications for the study of syntactic phenomena in Turkic languages, including phenomena ranging from relative clauses to clause chaining.

Proceedings Published by the LSA (Linguistic Society of America)

Rule-Based Machine Translation From Kazakh To Turkish

Author: Bayatli S.
Kurnaz S.
Salimzianov I.
Tyers F. M.
Washington Jonathan North
Publication venue: 'Transformative Works and Cultures'
Publication date: 01/01/2018

This paper presents a shallow-transfer machine translation (MT) system for translating from Kazakh to Turkish. Background on the differences between the languages is presented, followed by how the system was designed to handle some of these differences. The system is based on the Apertium free/open-source machine translation platform. The structure of the system and how it works is described, along with an evaluation against two competing systems. Linguistic components were developed, including a Kazakh-Turkish bilingual dictionary, Constraint Grammar disambiguation rules, lexical selection rules, and structural transfer rules. With many known issues yet to be addressed, our RBMT system has reached performance comparable to publicly available corpus-based MT systems between the languages.


Machine Translation for Crimean Tatar to Turkish

Author: Gökırmak M.
Tyers F. M.
Washington Jonathan North
Publication venue: 'Transformative Works and Cultures'
Publication date: 01/08/2019

In this paper, a machine translation system for Crimean Tatar to Turkish is presented. To our knowledge, this is the first machine translation system made available for public use for Crimean Tatar, and the first such system released as free and open source software. The system was built using Apertium, a free and open source machine translation system, and is currently unidirectional from Crimean Tatar to Turkish. We describe our translation system, evaluate it on parallel corpora and compare its performance with a Neural Machine Translation system, trained on the limited amount of corpora available.


A set of open-source tools for Turkish natural language processing

Author: Çağrı Çöltekin
Publication date: 06/03/2020

This paper introduces a set of freely available, open-source tools for Turkish that are built around TRmorph, a morphological analyzer introduced earlier in Çöltekin (2010a). The article first provides an update on the analyzer, which includes a complete rewrite using a different finite-state description language and tool set as well as major tagset changes to comply better with the state-of-the-art computational processing of Turkish and the user requests received so far. Besides these major changes to the analyzer, this paper introduces tools for morphological segmentation, stemming and lemmatization, guessing unknown words, grapheme to phoneme conversion, hyphenation, and morphological disambiguation.

CiteSeerX

Rule-based machine translation from Kazakh to Turkish

Author: Bayatli Sevilay
Kurnaz Sefer
Salimzianov Ilnar
Tyers Francis M.
Washington Jonathan North
Publication venue: European Association for Machine Translation
Publication date: 01/01/2018

Repositorio Institucional de la Universidad de Alicante


UniMorph 4.0: Universal Morphology

Author: Aiton Grant
Anastasopoulos Antonios
Andrushko Taras
Angulo Candy
Arora Aryaman
Ataman Duygu
Ate Yustinus Ghanggo
Batsuren Khuyagbaatar
Bautista Juan López
Baxi Jatayu
Bayyr-ool Aziyana
Bella Gábor
Bernardy Jean-Philippe
Bhatt Brijesh
Budianskaya Elena
Camaiteri Delio Siticonatzi
Chodroff Eleanor
Coler Matt
Cotterell Ryan
Cruz Hilaria
Czarnowska Paula
Dirix Peter
Dolatian Hossep
Ek Adam
El-Khaissi Charbel
Francis Didier López
Ganieva Sofya
Gasser Michael
Giunchiglia Fausto
Goldman Omer
Gorman Kyle
Guriel David
Habash Nizar
Hatcher Richard J.
Hennigen Lucas Torroba
Hulden Mans
Ivanova Sardana
Karahóǧa Ritván
Khalifa Salam
Kieraś Witold
Klyachko Elena
Krizhanovskaya Natalia
Krizhanovsky Andrew
Kumar Ritesh
Lane William
Leonard Brian
Liu Zoey
Marchenko Igor
Markantonatou Stella
Mashkovtseva Polina
Maudslay Rowan Hall
McCarthy Arya D.
Mielke Sabrina J.
Nepomniashchaya Maria
Nicolai Garrett
Nikkarinen Irene
Nuriah Zahroh
Oncevay Arturo
Pavlidis George
Pimentel Tiago
Pinter Yuval
Plugaryov Matvey
Ponti Edoardo M.
Prud'hommeaux Emily
Raj Mohit
Ratan Shyam
Rodionova Daria
Rojas Esaú Zumaeta
Ryskina Maria
Salchak Aelita
Salehi Ali
Salesky Elizabeth
Samame Jaime Rafael Montoya
Scherbakov Andrey
Serova Alexandra
Sheifer Karina
Silfverberg Miikka
Stoehr Niklas
Straughn Christopher
Suhardijanto Totok
Tsarfaty Reut
Tyers Francis M.
Valvoda Josef
Vania Clara
Villegas Gema Celeste Silva
Vylomova Ekaterina
Washington Jonathan North
White Jennifer
Wolinski Marcin
Yablonskaya Anna
Yarowsky David
Yemelina Anastasia
Young Jeremiah
Zariquiey Roberto
Zmigrod Ran
Publication venue: 'Center for Open Science'
Publication date: 07/05/2022

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen