Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text
Parallel corpora are a valuable resource for machine translation, but at
present their availability and utility are limited by genre- and
domain-specificity, licensing restrictions, and the basic difficulty of
locating parallel texts in all but the most dominant of the world's languages.
A parallel corpus resource not yet explored is the World Wide Web, which hosts
an abundance of pages in parallel translation, offering a potential solution to
some of these problems and unique opportunities of its own. This paper presents
the necessary first step in that exploration: a method for automatically
finding parallel translated documents on the Web. The technique is conceptually
simple, fully language independent, and scalable, and preliminary evaluation
results indicate that the method may be accurate enough to apply without human
intervention.
Comment: LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty.
An Appendix at http://umiacs.umd.edu/~resnik/amta98/amta98_appendix.html
contains test dat
Augmenting Translation Lexica by Learning Generalised Translation Patterns
Bilingual lexicons improve the quality of parallel corpora alignment, of newly extracted
translation pairs, of machine translation, and of cross-language information retrieval, among
other applications. In this regard, the first problem addressed in this thesis pertains to
the classification of translations automatically extracted from parallel corpora, that is, collections
of sentence pairs that are translations of each other. The second problem is concerned
with machine learning of bilingual morphology, with applications in the solution of the first
problem and in the generation of out-of-vocabulary translations.
With respect to the problem of translation classification, two separate classifiers for
handling multi-word and word-to-word translations are trained, using previously extracted
translation pairs that were manually classified as correct or incorrect. Several cues
are useful for distinguishing adequate multi-word candidates from inadequate ones:
the lack or presence of parallelism; spurious terms at translation ends,
such as determiners and coordinating conjunctions; orthographic similarity
between translations; and the occurrence and co-occurrence frequencies of the translation
pairs. Morphological coverage, reflecting stem and suffix agreement, is explored as a key
feature in classifying word-to-word translations. Given that the evaluation of extracted
translation equivalents depends heavily on the human evaluator, incorporating an
automated filter for appropriate and inappropriate translation pairs prior to human evaluation
greatly reduces this work, thereby saving time
and progressively improving alignment and extraction quality. The filter can also be applied
to translation tables used for training machine translation engines and to
detecting bad translation choices made by translation engines, thus enabling significant
productivity gains in the post-editing of machine-made translations.
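The cues named in the abstract above can be made concrete with a small feature extractor. This is only a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the English/Portuguese stop-word lists, the use of `difflib` for orthographic similarity, and the Dice coefficient as the co-occurrence measure are all illustrative choices that do not come from the source.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny lists of determiners and conjunctions.
# A real system would use fuller, language-specific resources.
SPURIOUS_EDGE_TERMS = {
    "en": {"the", "a", "an", "and", "or"},
    "pt": {"o", "a", "os", "as", "e", "ou"},
}


def orthographic_similarity(source: str, target: str) -> float:
    """Character-level similarity in [0, 1] (assumed measure, via difflib)."""
    return SequenceMatcher(None, source.lower(), target.lower()).ratio()


def has_spurious_edge(phrase: str, lang: str) -> bool:
    """True if the phrase starts or ends with a determiner or conjunction."""
    tokens = phrase.lower().split()
    if not tokens:
        return False
    stop_words = SPURIOUS_EDGE_TERMS[lang]
    return tokens[0] in stop_words or tokens[-1] in stop_words


def candidate_features(source, target, src_lang, tgt_lang,
                       pair_count, source_count, target_count):
    """Build a feature dictionary for one candidate translation pair.

    pair_count is how often the two sides co-occur in aligned sentences;
    source_count and target_count are their individual frequencies.
    """
    total = source_count + target_count
    return {
        "orth_sim": orthographic_similarity(source, target),
        "src_spurious_edge": has_spurious_edge(source, src_lang),
        "tgt_spurious_edge": has_spurious_edge(target, tgt_lang),
        # Dice coefficient: an assumed association score, not from the source.
        "dice": 2 * pair_count / total if total else 0.0,
    }
```

A dictionary like this could be passed to any standard binary classifier trained on pairs already labelled correct or incorrect. The abstract does not say which classifier or feature encoding the thesis uses.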
An important attribute of a translation lexicon is the coverage it provides. Learning
suffixes and suffixation operations from the lexicon or corpus of a language is an extensively
researched approach to handling out-of-vocabulary terms. However, beyond mere words
or word forms, translations and their variants are a powerful source of information
for automatic structural analysis. This source is explored from the perspective of improving
word-to-word translation coverage and constitutes the second part of this thesis. In this
context, as a phase prior to the suggestion of out-of-vocabulary bilingual lexicon entries,
an approach is proposed to automatically induce segmentation and learn bilingual morph-like units
by identifying and pairing word stems and suffixes, using the bilingual
corpus of translations automatically extracted from aligned parallel corpora, whether manually
validated or automatically classified. A minimally supervised technique is proposed to enable
bilingual morphology learning for language pairs whose bilingual lexicons are highly
deficient in word-to-word translations representing inflectional diversity.
Apart from the above-mentioned applications in the classification of machine-extracted
translations and in the generation of out-of-vocabulary translations, learned bilingual
morph-units may also have a great impact on establishing correspondences between
sub-word constituents in word-to-multi-word and multi-word-to-multi-word
translations, and in compression, full-text indexing, and retrieval applications.
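The idea of pairing stems and suffixes across a bilingual lexicon can be illustrated with a naive sketch. It assumes that a shared prefix between two entries approximates a stem; this is a simplification chosen for illustration and not the technique proposed in the thesis. The function name `induce_bilingual_segments`, the `min_stem` threshold, and the toy English/Portuguese entries in the usage note are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from os.path import commonprefix  # character-wise common prefix of strings


def induce_bilingual_segments(lexicon, min_stem=3):
    """Group (source suffix, target suffix) pairs under candidate stem pairs.

    lexicon is a list of (source word, target word) translation pairs.
    Two entries propose a stem pair when their source words and their
    target words each share a prefix of at least min_stem characters.
    """
    segments = defaultdict(set)
    for (src1, tgt1), (src2, tgt2) in combinations(lexicon, 2):
        src_stem = commonprefix([src1, src2])
        tgt_stem = commonprefix([tgt1, tgt2])
        if len(src_stem) < min_stem or len(tgt_stem) < min_stem:
            continue
        key = (src_stem, tgt_stem)
        # Whatever follows the shared prefix is treated as the suffix.
        segments[key].add((src1[len(src_stem):], tgt1[len(tgt_stem):]))
        segments[key].add((src2[len(src_stem):], tgt2[len(tgt_stem):]))
    return dict(segments)
```

On a toy lexicon such as `[("walked", "andou"), ("walking", "andando"), ("walks", "anda")]`, the stem pair `("walk", "and")` collects three suffix pairs, but a competing stem pair `("walk", "anda")` also appears. Any realistic approach would therefore need to score and filter the candidate segmentations, for instance by how many distinct suffix pairs support each stem pair.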
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
Corpus-driven bilingual lexicon extraction
This paper introduces some key aspects of machine translation in order to situate the role of the bilingual lexicon in transfer-based systems. It then discusses the data-driven approach to extracting bilingual knowledge automatically from bilingual texts, tracing the processes of alignment at different levels of granularity. The paper concludes with some suggestions for future work.
Marketing and Advertising Translation: Humans vs Machines in the field of cosmetics
This undergraduate thesis focuses on a very specific field of specialized translation:
advertising and marketing translation. Indeed, the high degree of specialization involved
in this activity provides a testing ground for a reconsideration of the importance of the
human translator and a reformulation of their role. The constant development of new
technologies creates ever more sophisticated translation programs, which in turn revives
the long-standing machine vs human translation debate. The aim of this project is to
conduct a practical exercise targeted at verifying whether specialization in translation
always requires the supervision of humans equipped with the relevant linguistic
knowledge and technical background, or whether, on the contrary, machine translation
can at present provide valid enough results and a sufficient level of reliability.
Grado en Estudios Ingleses (Degree in English Studies)