14 research outputs found

Automatic Construction of Clean Broad-Coverage Translation Lexicons

Author: Melamed I. Dan
Publication date: 01/01/1996

Word-level translational equivalences can be extracted from parallel texts by surprisingly simple statistical techniques. However, these techniques are easily fooled by indirect associations: pairs of unrelated words whose statistical properties resemble those of mutual translations. Indirect associations pollute the resulting translation lexicons, drastically reducing their precision. This paper presents an iterative lexicon cleaning method. On each iteration, most of the remaining incorrect lexicon entries are filtered out, without significant degradation in recall. This lexicon cleaning technique can produce translation lexicons with recall and precision both exceeding 90%, as well as dictionary-sized translation lexicons that are over 99% correct. Comment: PostScript file, 10 pages. To appear in Proceedings of AMTA-9

arXiv.org e-Print Archive

CiteSeerX

Manual Annotation of Translational Equivalence: The Blinker Project

Author: Melamed I. Dan
Publication date: 01/01/1998

Bilingual annotators were paid to link roughly sixteen thousand corresponding words between on-line versions of the Bible in modern French and modern English. These annotations are freely available to the research community from http://www.cis.upenn.edu/~melamed. The annotations can be used for several purposes. First, they can be used as a standard data set for developing and testing translation lexicons and statistical translation models. Second, researchers in lexical semantics will be able to mine the annotations for insights about cross-linguistic lexicalization patterns. Third, the annotations can be used in research into certain recently proposed methods for monolingual word-sense disambiguation. This paper describes the annotated texts, the specially designed annotation tool, and the strategies employed to increase the consistency of the annotations. The annotation process was repeated five times by different annotators. Inter-annotator agreement rates indicate that the annotations are reasonably reliable and that the method is easy to replicate.

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Automatic Acquisition of a High-Precision Translation Lexicon from Parallel Chinese-English Corpora

Author: Gao Zhao-Ming
Publication venue: Chinese and Oriental Languages Information Processing Society
Publication date: 01/01/1998

Waseda University Repository

Accuracy and robustness in measuring the lexical similarity of semantic role fillers for automatic semantic MT evaluation

Author: Lo Chi-kiu
Tumuluru Anand Karthik
Wu Dekai
Publication venue: Faculty of Computer Science, Universitas Indonesia
Publication date: 01/01/2012

Waseda University Repository

Translation memory and computer assisted translation tool for medieval texts

Author: Törcsvári Attila
Publication venue: University of Bern
Publication date: 29/10/2015

Translation memories (TMs), as part of Computer Assisted Translation (CAT) tools, help translators reuse portions of previously translated text. Fencing books are good candidates for TMs because of their high number of repeated terms. Medieval texts have a number of drawbacks that make even “simple” rewording into the modern version of the same language hard. The difficulties analyzed are the lack of systematic spelling, unusual word order, and typos in the original. The paper states and verifies the hypothesis that even simple modernization increases legibility and is feasible, and that applying translation memories is worthwhile because of the numerous and sometimes extremely long repeated terms. Methods and algorithms are therefore presented for (1) automated transcription of medieval texts when only a limited training set is available, and (2) collection of repeated patterns. The efficiency of the algorithms is analyzed in terms of recall and precision.

BOP Serials

Adding domain specificity to an MT system

Author: Jessie Pinkham
Monica Corston-Oliver
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2001

In the development of a machine translation system, one important issue is being able to adapt to a specific domain without requiring time-consuming lexical work. We have experimented with using a statistical word-alignment algorithm to derive word association pairs (French-English) that complement an existing multi-purpose bilingual dictionary. This word association information is added to the system at the time of the automatic creation of our translation pattern database, thereby making this database more domain specific. This technique significantly improves the overall quality of translation, as measured in an independent blind evaluation.

CiteSeerX

Crossref

Word-to-Word Models of Translational Equivalence

Author: Melamed I. Dan
Publication date: 01/01/1997

Parallel texts (bitexts) have properties that distinguish them from other kinds of parallel data. First, most words translate to only one other word. Second, bitext correspondence is noisy. This article presents methods for biasing statistical translation models to reflect these properties. Analysis of the expected behavior of these biases in the presence of sparse data predicts that they will result in more accurate models. The prediction is confirmed by evaluation with respect to a gold standard: translation models that are biased in this fashion are significantly more accurate than a baseline knowledge-poor model. This article also shows how a statistical translation model can take advantage of various kinds of pre-existing knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that are informed by pre-existing knowledge about the model domain combine the best of both the rationalist and empiricist traditions.

arXiv.org e-Print Archive

CiteSeerX

Cross-Lingual Bootstrapping of Semantic Lexicons: The Case of FrameNet

Author: Lapata Maria
Padó Sebastian
Publication date: 01/01/2005

Edinburgh Research Explorer

Evaluation in natural language processing

Author: Santos Diana
Publication date: 08/12/2008

European Summer School on Language Logic and Information (ESSLLI 2007), Trinity College Dublin, Ireland, 6-17 August 2007

Repositório Comum

Automatic Construction Of Clean Broad-Coverage Translation Lexicons

Author: I. Dan Melamed

Word-level translational equivalences can be extracted from parallel texts by surprisingly simple statistical techniques. However, these techniques are easily fooled by indirect associations: pairs of unrelated words whose statistical properties resemble those of mutual translations. Indirect associations pollute the resulting translation lexicons, drastically reducing their precision. This paper presents an iterative lexicon cleaning method. On each iteration, most of the remaining incorrect lexicon entries are filtered out, without significant degradation in recall. This lexicon cleaning technique can produce translation lexicons with recall and precision both exceeding 90%, as well as dictionary-sized translation lexicons that are over 99% correct.

1 Introduction

Translation lexicons are explicit representations of translational equivalence at the word level. They are central to any machine translation system, and play a vital role in other multilingual applications, including ..

CiteSeerX