91 research outputs found

    April 14, 1988

    The Breeze is the student newspaper of James Madison University in Harrisonburg, Virginia

    September 11, 1986

    The Breeze is the student newspaper of James Madison University in Harrisonburg, Virginia

    Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

    Under-resourced languages are a significant challenge for statistical approaches to machine translation, and it has recently been shown that training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes it difficult to take advantage of these similarities. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration to the Latin script with fine-grained IPA transliteration. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show that transliteration improves BLEU, METEOR and chrF scores, and that transliteration into the Latin script outperforms the fine-grained IPA transcription.

    Knowledge Portability with Semantic Expansion of Ontology Labels

    Our research focuses on the multilingual enhancement of ontologies that, often represented only in English, need to be translated into different languages to enable knowledge access across languages. Ontology translation is a rather different task than classic document translation, because ontologies contain highly specific vocabulary and lack contextual information. For these reasons, to improve automatic ontology translation, we first focus on identifying relevant unambiguous and domain-specific sentences from a large set of generic parallel corpora. Then, we leverage Linked Open Data resources, such as DBpedia, to isolate ontology-specific bilingual lexical knowledge. In both cases, we take advantage of the semantic information of the labels to select relevant bilingual data with the aim of building an ontology-specific statistical machine translation system. We evaluate our approach on the translation of a medical ontology from English into German. Our experiment shows a significant improvement of around 3 BLEU points compared to both a generic and a domain-specific translation approach.

    Leveraging bilingual terminology to improve machine translation in a CAT environment

    This work focuses on the extraction and integration of automatically aligned bilingual terminology into a Statistical Machine Translation (SMT) system in a Computer Aided Translation (CAT) scenario. We evaluate the proposed framework, which takes as input a small set of parallel documents, gathers domain-specific bilingual terms and injects them into an SMT system to enhance translation quality. To this end, we investigate several strategies to extract and align terminology across languages and to integrate it into an SMT system. We compare two terminology injection methods that can be easily used at run-time without altering the normal activity of an SMT system: XML markup and a cache-based model. We test the cache-based model on two different domains (information technology and medical) in English, Italian and German, showing significant improvements ranging from 2.23 to 6.78 BLEU points over a baseline SMT system and from 0.05 to 3.03 points over the widely used XML markup approach.