
91 research outputs found

April 14, 1988

Authors: Arcan Mihael; Buitelaar Paul; Popovic Maja
Publication venue: JMU Scholarly Commons
Publication date: 14/04/1988

The Breeze is the student newspaper of James Madison University in Harrisonburg, Virginia

Irish Universities

James Madison University

Access to Research at National University of Ireland, Galway

September 11, 1986

Authors: Arcan Mihael; Monti Johanna; Sangati Federico
Publication venue: JMU Scholarly Commons
Publication date: 11/09/1986

The Breeze is the student newspaper of James Madison University in Harrisonburg, Virginia

Irish Universities

James Madison University

Access to Research at National University of Ireland, Galway

Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Authors: Arcan Mihael; Chakravarthi Bharathi Raja; McCrae John P.
Publication venue: OASIcs - OpenAccess Series in Informatics. 2nd Conference on Language, Data and Knowledge (LDK 2019)
Publication date: 01/01/2019

Under-resourced languages are a significant challenge for statistical approaches to machine translation, and it has recently been shown that using training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes it difficult to take advantage of these similarities. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration to the Latin script with fine-grained IPA transcription. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in BLEU, METEOR and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.
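The idea of mapping several native scripts onto one shared Latin representation can be illustrated with a minimal Python sketch. It is not the transliteration tool used in the paper. It assumes that the handful of Tamil, Telugu and Kannada characters listed below sit at the same offsets inside their respective Unicode blocks, and the names `BLOCK_STARTS`, `CONSONANTS`, `VOWELS`, `VOWEL_SIGNS` and `to_latin`, as well as the tiny romanization table, are hypothetical choices made for illustration only.

```python
# Start of the Unicode block for each script (each block spans 0x80 code points).
BLOCK_STARTS = {"tamil": 0x0B80, "telugu": 0x0C00, "kannada": 0x0C80}

# Hypothetical, deliberately tiny romanization table keyed by block offset.
CONSONANTS = {0x15: "k", 0x24: "t", 0x2E: "m", 0x32: "l"}
VOWELS = {0x05: "a"}                  # independent vowel letters
VOWEL_SIGNS = {0x3E: "aa", 0x3F: "i"}  # dependent vowel signs
VIRAMA = 0x4D                          # suppresses the inherent vowel


def block_offset(ch):
    """Return the offset of ch inside its script block, or None."""
    cp = ord(ch)
    for start in BLOCK_STARTS.values():
        if start <= cp < start + 0x80:
            return cp - start
    return None


def to_latin(text):
    """Coarse-grained transliteration of the covered characters to Latin."""
    out = []
    pending = False  # True while the last consonant still awaits a vowel
    for ch in text:
        off = block_offset(ch)
        if off in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[off])  # explicit vowel replaces inherent "a"
            pending = False
            continue
        if off == VIRAMA:
            pending = False               # bare consonant, no vowel at all
            continue
        if pending:
            out.append("a")               # emit the inherent vowel
            pending = False
        if off in CONSONANTS:
            out.append(CONSONANTS[off])
            pending = True
        elif off in VOWELS:
            out.append(VOWELS[off])
        else:
            out.append(ch)                # leave uncovered characters as-is
    if pending:
        out.append("a")
    return "".join(out)


telugu_amma = "\u0c05\u0c2e\u0c4d\u0c2e"
kannada_amma = "\u0c85\u0cae\u0ccd\u0cae"
tamil_amma = "\u0b85\u0bae\u0bcd\u0bae\u0bbe"
print(to_latin(telugu_amma), to_latin(kannada_amma), to_latin(tamil_amma))
```

In this sketch the Telugu and Kannada spellings of the same word collapse to an identical Latin string, which is the kind of surface overlap a shared representation is meant to expose to a translation model trained on related languages.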

ZENODO

Dagstuhl Research Online Publication Server

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Knowledge Portability with Semantic Expansion of Ontology Labels

Authors: Arcan Mihael; Buitelaar Paul; Turchi Marco
Publication venue: The Association for Computer Linguistics

Our research focuses on the multilingual enhancement of ontologies that, often represented only in English, need to be translated into different languages to enable knowledge access across languages. Ontology translation is a rather different task than classic document translation, because ontologies contain highly specific vocabulary and lack contextual information. For these reasons, to improve automatic ontology translation, we first focus on identifying relevant, unambiguous and domain-specific sentences from a large set of generic parallel corpora. Then, we leverage Linked Open Data resources, such as DBpedia, to isolate ontology-specific bilingual lexical knowledge. In both cases, we take advantage of the semantic information of the labels to select relevant bilingual data with the aim of building an ontology-specific statistical machine translation system. We evaluate our approach on the translation of a medical ontology from English into German. Our experiment shows a significant improvement of around 3 BLEU points compared to both a generic and a domain-specific translation approach.

Archivio della ricerca - Fondazione Bruno Kessler

Leveraging bilingual terminology to improve machine translation in a CAT environment

Authors: Arcan Mihael; Buitelaar Paul; Tonelli Sara; Turchi Marco
Publication date: 30/05/2017

This work focuses on the extraction and integration of automatically aligned bilingual terminology into a Statistical Machine Translation (SMT) system in a Computer-Aided Translation (CAT) scenario. We evaluate the proposed framework, which takes a small set of parallel documents as input, gathers domain-specific bilingual terms and injects them into an SMT system to enhance translation quality. To this end, we investigate several strategies to extract and align terminology across languages and to integrate it into an SMT system. We compare two terminology injection methods that can be easily used at run-time without altering the normal activity of an SMT system: XML markup and a cache-based model. We test the cache-based model on two different domains (information technology and medical) in English, Italian and German, showing significant improvements ranging from 2.23 to 6.78 BLEU points over a baseline SMT system and from 0.05 to 3.03 BLEU points compared to the widely used XML markup approach.
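Run-time terminology injection via XML markup can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the authors' implementation: the tag name `term`, the attribute name `translation`, the function `inject_terms` and the sample English-German term pairs are all hypothetical, and a real decoder may expect different markup.

```python
import re
from html import escape


def inject_terms(sentence, terminology):
    """Wrap known source terms in XML markup carrying their translation.

    terminology maps source-language terms to target-language terms.
    Longer terms are tried first so that nested terms do not split them.
    """
    if not terminology:
        return escape(sentence, quote=False)

    # Longest-first alternation: "hard disk drive" wins over "hard disk".
    terms = sorted(terminology, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {src.lower(): tgt for src, tgt in terminology.items()}

    parts = []
    last = 0
    for match in pattern.finditer(sentence):
        # Escape the untouched text so the output stays well-formed XML.
        parts.append(escape(sentence[last:match.start()], quote=False))
        target = lookup[match.group(0).lower()]
        parts.append(
            '<term translation="%s">%s</term>'
            % (escape(target, quote=True), escape(match.group(0), quote=False))
        )
        last = match.end()
    parts.append(escape(sentence[last:], quote=False))
    return "".join(parts)


# Hypothetical bilingual term list, e.g. gathered from parallel documents.
sample_terms = {
    "hard disk": "Festplatte",
    "hard disk drive": "Festplattenlaufwerk",
}
print(inject_terms("Restart the hard disk drive now", sample_terms))
```

Because the annotation happens on the input sentence, an approach of this kind leaves the translation model itself untouched, which matches the abstract's point that injection can be applied at run-time.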

Crossref

Archivio della ricerca - Fondazione Bruno Kessler