
352 research outputs found

A MT System from Turkmen to Turkish employing finite state and statistical methods

Author: Adalı Eşref
Oflazer Kemal
Tantuğ A. Cüneyd
Publication venue: European Association for Machine Translation (EAMT)
Publication date: 01/09/2007

In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment in a word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation processes in our system. We believe that this approach is valid for most of the Turkic languages and that the architecture, implemented using FSTs, can be easily extended to those languages.

CiteSeerX

Sabanci University Research Database

Evaluating Multiway Multilingual NMT in the Turkic Languages

Author: Ataman Duygu
Babu Anoop
Chellappan Sriram
Firat Orhan
Ivanova Sardana
Kreutzer Julia
Licato John
Mirzakhalov Jamshidbek
Moydinboyev Bekhzodbek
Pulatova Shaxnoza
Tyers Francis M.
Uzokova Mokhiyakhon
Wahab Ahsan
Publication venue: The Association for Computational Linguistics
Publication date: 01/11/2021

Despite the increasing number of large and comprehensive machine translation (MT) systems, evaluation of these methods in various languages has been restrained by the lack of high-quality parallel corpora as well as engagement with the people that speak these languages. In this study, we present an evaluation of state-of-the-art approaches to training and evaluating MT systems in 22 languages from the Turkic language family, most of which are extremely under-explored. First, we adopt the TIL Corpus with a few key improvements to the training and the evaluation sets. Then, we train 26 bilingual baselines as well as a multi-way neural MT (MNMT) model using the corpus and perform an extensive analysis using automatic metrics as well as human evaluations. We find that the MNMT model outperforms almost all bilingual baselines in the out-of-domain test sets, and that fine-tuning the model on a downstream task of a single pair also results in a huge performance boost in both low- and high-resource scenarios. Our attentive analysis of evaluation criteria for MT models in Turkic languages also points to the necessity for further research in this direction. We release the corpus splits, test sets, and models to the public.

Helsingin yliopiston digitaalinen arkisto

A Large-Scale Study of Machine Translation in Turkic Languages

Author: Ataman Duygu
Babu Anoop
Chellappan Sriram
Firat Orhan
Hajili Mammad
Ivanova Sardana
Kariev Sherzod
Khaytbaev Abror
Laverghetta Jr. Antonio
Mirzakhalov Jamshidbek
Moydinboyev Bekhzodbek
Onal Esra
Abduraufov Otabek
Pulatova Shaxnoza
Tyers Francis M.
Wahab Ahsan
Publication venue: The Association for Computational Linguistics
Publication date: 01/11/2021

Recent advances in neural machine translation (NMT) have pushed the quality of machine translation systems to the point where they are becoming widely adopted to build competitive systems. However, there is still a large number of languages that are yet to reap the benefits of NMT. In this paper, we provide the first large-scale case study of the practical application of MT in the Turkic language family in order to realize the gains of NMT for Turkic languages under high-resource to extremely low-resource scenarios. In addition to presenting an extensive analysis that identifies the bottlenecks towards building competitive systems to ameliorate data scarcity, our study has several key contributions, including: i) a large parallel corpus covering 22 Turkic languages, consisting of common public datasets in combination with new datasets of approximately 1.4 million parallel sentences, ii) bilingual baselines for 26 language pairs, iii) novel high-quality test sets in three different translation domains, and iv) human evaluation scores. All models, scripts, and data will be released to the public.

Helsingin yliopiston digitaalinen arkisto

Toward Computational Processing of Less Resourced Languages: Primarily Experiments for Moroccan Amazigh Language

Author: Ataa Allah Fadoua
Boulaknadel Siham
Publication venue: 'IntechOpen'
Publication date: 21/11/2012

IntechOpen

Crossref

China's Foreign Relations: Selected Studies

Author: Chan F. Gilbert
Yip Ka-che
Publication venue: DigitalCommons@UM Carey Law
Publication date: 01/01/1980

Digital Commons @ UM Law

Hesaplamalı Dil Bilimleri ve Uygur Dili Araştırmaları

Author: Murat Orhun
Publication venue: Nevşehir Hacı Bektaş Veli Üniversitesi
Publication date: 01/09/2017

This article briefly describes computational linguistics and summarizes current computational linguistics research on Uyghur. As technology has advanced, computer-assisted work on various languages has achieved great success. For example, applications such as content management in texts, information retrieval, speech systems, document clustering, text mining, spell checking, text-to-speech, speech-to-text, and automatic (machine) translation between different languages have been developed and are used in real life. Although many studies have been carried out on some languages of the Ural-Altaic group, such as Finnish, Japanese, Hungarian, and Turkish, work on certain other languages, for example Uyghur, remains little known. To advance research in computational linguistics and to make it possible to analyze the relationships between different languages, this article brings together computer-assisted research on Uyghur, in particular the most recent foundational work on machine translation. It also analyzes the relationship between linguists and computational linguistics.

Directory of Open Access Journals

Enhancing Bi-directional English-Tigrigna Machine Translation Using Hybrid Approach

Author: Assres Gebremariam Mesfin
Atsbaha Mulugeta
Berihu Zemicheal
Grønli Tor-Morten
Publication venue: NIKT Foundation
Publication date: 23/11/2020

Machine Translation (MT) is an application area of NLP where automatic systems are used to translate text or speech from one language to another while preserving the meaning of the source language. Although there exists a large volume of literature on automatic machine translation of documents in many languages, translation between English and Tigrigna is less explored. Therefore, we propose a hybrid approach to address the challenges of applying syntactic reordering rules, which align and capture the structural arrangement of words in the source sentence so that it becomes more like the target sentences. Two language models were developed, one for English and another for Tigrigna, and about 12,000 parallel sentences in four domains and 32,000 bilingual dictionaries were collected for our experiment. The collected parallel corpus was split randomly into 10,800 sentences for the training set and 1,200 sentences for testing. The Moses open source statistical machine translation system was used in the experiment to train, tune and decode. The parallel corpus was aligned using the Giza++ toolkit, and SRILM was used for building the language model. Three main experiments were conducted, using a statistical approach, a hybrid approach and a post-processing technique. Our experimental results showed good translation output as high as 32.64 BLEU points Google translator, and the hybrid approach was found most promising for English-Tigrigna bi-directional translation.

BIBSYS: Open Journals Systems

Proceedings of the 1st Conference on Central Asian Languages and Linguistics (ConCALL)

Author: Kent Amber Kennedy
Özçelik Öner
Publication date: 29/06/2015

The Conference on Central Asian Languages and Linguistics (ConCALL) was founded in 2014 at Indiana University by Dr. Öner Özçelik, the residing director of the Center for Languages of the Central Asian Region (CeLCAR). As the nation’s sole U.S. Department of Education funded Language Resource Center focusing on the languages of the Central Asian Region, CeLCAR’s main mission is to strengthen and improve the nation’s capacity for teaching and learning Central Asian languages through teacher training, research, materials development projects, and dissemination. As part of this mission, CeLCAR has an ultimate goal to unify and fortify the Central Asian language learning community by facilitating networking between linguists and language educators, encouraging research projects that will inform language instruction, and providing opportunities for professionals in the field to both showcase their work and receive feedback from their peers. Thus ConCALL was established to be the first international academic conference to bring together linguists and language educators in the languages of the Central Asian region, including both the Altaic and Eastern Indo-European languages spoken in the region, to focus on research into how these specific languages are represented formally, as well as acquired by second/foreign language learners, and also to present research-driven teaching methods.
Languages served by ConCALL include, but are not limited to: Azerbaijani, Dari, Karakalpak, Kazakh, Kyrgyz, Lokaabharan, Mari, Mongolian, Pamiri, Pashto, Persian, Russian, Shughnani, Tajiki, Tibetan, Tofalar, Tungusic, Turkish, Tuvan, Uyghur, Uzbek, Wakhi and more!
The Conference on Central Asian Languages and Linguistics, held at Indiana University on 16-17 May 2014, was made possible through the generosity of our sponsors: Center for Languages of the Central Asian Region (CeLCAR), Ostrom Grant Programs, IU's College of Arts and Humanities Center (CAHI), Inner Asian and Uralic National Resource Center (IAUNRC), IU's School of Global and International Studies (SGIS), IU's College of Arts and Sciences, Sinor Research Institute for Inner Asian Studies (SRIFIAS), IU's Department of Central Eurasian Studies (CEUS), and IU's Department of Linguistics.

IUScholarWorks (University of Indiana)

Pagodas, polities, period and place: a data led exploration of the regional and chronological context of Liao dynasty architecture

Author: Dugdale Jonathan S
Publication date: 10/12/2019

The Liao dynasty (907-1125) was a dominant force in the political landscape of East Asia for a period of over two centuries. Despite this, when placed within the framework of Chinese history, the Liao polity and its associated architecture are forced to the periphery. This study aims to re-centre the Liao by exploring the pagodas constructed under this polity within a wider regional and chronological framework. To achieve this end, extant pagodas from China, North Korea, South Korea and Japan were recorded together in a database for the first time. The HEAP (Historical East Asian Pagoda) Database logs the date, location and feature set of each pagoda it contains and provides a means to compare Liao examples to those from other polities, places and periods. Through analysis and visualisations of this data, the Liao are identified as a polity that produced unique pagoda designs and a distinct visual style. While Liao pagodas played a major role in the wider design trends of the period, it is the influence they had at a more local level that may be of most significance, potentially making us rethink the way we frame and construct histories of architecture in China and East Asia.

University of Birmingham Research Archive, E-theses Repository

Muslim separatism in Northwest China during the Republican period, 1911-1949.

Author: Forbes A.D.W
Publication venue: University of Leeds
Publication date: 01/01/1981

Available from British Library Document Supply Centre - DSC:DX214678 / BLDSC - British Library Document Supply Centre; SIGLE; GB; United Kingdom

White Rose E-theses Online

OpenGrey Repository