3 research outputs found

Word Alignment for Languages with Scarce Resources

Author: Joel Martin
Rada Mihalcea, 1974-
Ted Pedersen
Publication date: 01/06/2005

This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the Association for Computational Linguistics (ACL) 2005 Workshop on Building and Using Parallel Texts. The shared task included English-Inuktitut, Romanian-English, and English-Hindi sub-tasks, and drew the participation of ten teams from around the world, with a total of 50 systems.

UNT Digital Library

Improving machine translation performance using comparable corpora

Author: Andreas Eisele
Jia Xu
Publication date: 01/01/2010

The overwhelming majority of the world's languages are spoken by fewer than 50 million native speakers, and automatic translation of many of these languages is less investigated because linguistic resources such as parallel corpora are lacking. In the ACCURAT project we will work on novel methods by which comparable corpora can compensate for this shortage and improve machine translation systems for under-resourced languages. Translation systems for eighteen European language pairs will be investigated, and methodologies in corpus linguistics will be greatly advanced. We will explore the use of preliminary SMT models to identify the parallel parts within comparable corpora, which will allow us to derive better SMT models via a bootstrapping loop.

CiteSeerX

Aligning Words in English-Hindi Parallel Corpora

Author: Niraj Aswani
Robert Gaizauskas
Publication date: 01/01/2005

In this paper, we describe a word alignment algorithm for English-Hindi parallel data. The system was developed to participate in the shared task on word alignment for languages with scarce resources at the ACL 2005 workshop on “Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond”. Our word alignment algorithm is based on a hybrid method that performs local word grouping on Hindi sentences and uses other methods such as dictionary lookup, transliteration similarity, expected English words, and nearest aligned neighbours. We trained our system on the provided training data to obtain a list of named entities and cognates and to collect rules for local word grouping in Hindi sentences. The system scored 77.03% precision and 60.68% recall on the shared task's unseen test data.

CiteSeerX

Crossref