7,931 research outputs found
Tracking relevant alignment characteristics for machine translation
In most statistical machine translation (SMT) systems, bilingual segments are extracted via word alignment. In this paper we compare alignments tuned directly according to alignment F-score and BLEU score in order to investigate
the alignment characteristics that are helpful in translation. We report results for two different SMT systems (a phrase-based and an n-gram-based system) on Chinese to English IWSLT data, and Spanish to English
European Parliament data. We give alignment hints to improve BLEU score, depending on the SMT system used and the type of corpus
A Pattern Matching method for finding Noun and Proper Noun Translations from Noisy Parallel Corpora
We present a pattern matching method for compiling a bilingual lexicon of
nouns and proper nouns from unaligned, noisy parallel texts of
Asian/Indo-European language pairs. Tagging information of one language is
used. Word frequency and position information for high and low frequency words
are represented in two different vector forms for pattern matching. New anchor
point finding and noise elimination techniques are introduced. We obtained a
73.1\% precision. We also show how the results can be used in the compilation
of domain-specific noun phrases.Comment: 8 pages, uuencoded compressed postscript file. To appear in the
Proceedings of the 33rd AC
Alignment-guided chunking
We introduce an adaptable monolingual chunking approach–Alignment-Guided Chunking (AGC)–which makes use of knowledge of word alignments acquired from bilingual
corpora. Our approach is motivated by the observation that a sentence should be chunked differently depending
the foreseen end-tasks. For example, given the different
requirements of translation into (say) French and German, it is inappropriate to chunk up an English string in exactly the same way as preparation for translation into one
or other of these languages. We test our chunking approach
on two language pairs: French–English and German–English, where these two bilingual corpora share the same English sentences. Two chunkers trained on French–English
(FE-Chunker) and German–English(DE-Chunker ) respectively are used to perform chunking on the same English sentences. We construct two test sets, each suitable for French–
English and German–English respectively. The performance of the two chunkers is evaluated on the appropriate test set and with one reference translation only, we report Fscores
of 32.63% for the FE-Chunker and 40.41% for the DE-Chunker
- …