Search CORE

13,041 research outputs found

Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

Author: Resnik Philip
Publication date: 01/01/1998

Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. One parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems as well as unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language-independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention.

Comment: LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty. An appendix at http://umiacs.umd.edu/~resnik/amta98/amta98_appendix.html contains test data.

arXiv.org e-Print Archive

CiteSeerX

Digital Repository at the University of Maryland
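The abstract does not say how candidate page pairs are compared. As a purely illustrative sketch, assuming that translated pages on the same site tend to share their HTML layout, one language-independent test is to reduce each page to a sequence of markup tokens (with every text run replaced by a placeholder) and measure how well the two sequences align. The helper names `linearize` and `structural_similarity` are hypothetical, and this is not presented as the paper's method.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _Linearizer(HTMLParser):
    """Flatten HTML into markup tokens, replacing each text run with TEXT."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"START:{tag}")

    def handle_endtag(self, tag):
        self.tokens.append(f"END:{tag}")

    def handle_data(self, data):
        # Ignore whitespace between tags; any real text becomes one token,
        # so the comparison does not depend on the page's language.
        if data.strip():
            self.tokens.append("TEXT")


def linearize(html: str) -> list[str]:
    """Hypothetical helper: HTML page -> sequence of structural tokens."""
    parser = _Linearizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def structural_similarity(html_a: str, html_b: str) -> float:
    """Return a 0..1 score of how well two pages' markup sequences align."""
    return SequenceMatcher(None, linearize(html_a), linearize(html_b)).ratio()


en = "<html><body><h1>Welcome</h1><p>Hello world</p></body></html>"
fr = "<html><body><h1>Bienvenue</h1><p>Bonjour le monde</p></body></html>"
print(structural_similarity(en, fr))  # 1.0: same layout, different language
```

Because all text is collapsed to a single `TEXT` token, the score is blind to which languages the pages use; a practical filter would presumably also compare text-chunk lengths and apply a tuned threshold, neither of which is specified in the abstract.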

Augmenting Translation Lexica by Learning Generalised Translation Patterns

Author: Mahesh Kavitha Karimbi
Publication date: 01/06/2017

Bilingual lexicons improve quality in many applications, among them parallel corpus alignment, the extraction of new translation pairs, machine translation, and cross-language information retrieval. In this regard, the first problem addressed in this thesis is the classification of translations automatically extracted from parallel corpora (collections of sentence pairs that are translations of each other). The second problem concerns machine learning of bilingual morphology, with applications to solving the first problem and to generating out-of-vocabulary (OOV) translations. For translation classification, two separate classifiers, one for multi-word and one for word-to-word translations, are trained on previously extracted translation pairs that were manually classified as correct or incorrect. Several cues help distinguish adequate multi-word candidates from inadequate ones: the presence or absence of parallelism; spurious terms at translation ends, such as determiners and coordinating conjunctions; the orthographic similarity between translations; and the occurrence and co-occurrence frequencies of the translation pairs. Morphological coverage, reflecting stem and suffix agreement, is explored as a key feature for classifying word-to-word translations. Because evaluating extracted translation equivalents depends heavily on human evaluators, applying an automated filter that separates appropriate from inappropriate translation pairs before human evaluation greatly reduces this workload, saving time and progressively improving alignment and extraction quality. The filter can also be applied to the translation tables used to train machine translation engines, and to detect bad translation choices made by those engines, enabling significant productivity gains when post-editing machine translations.
An important attribute of a translation lexicon is its coverage. Learning suffixes and suffixation operations from a language's lexicon or corpus is an extensively researched way of handling out-of-vocabulary terms. Beyond individual words and word forms, however, translations and their variants are a powerful source of information for automatic structural analysis; the second part of this thesis exploits them to improve word-to-word translation coverage. As a step prior to suggesting out-of-vocabulary bilingual lexicon entries, it proposes an approach that automatically induces segmentation and learns bilingual morph-like units by identifying and pairing word stems and suffixes, using a bilingual corpus of translations automatically extracted from aligned parallel corpora and either manually validated or automatically classified. A minimally supervised technique is proposed to enable bilingual morphology learning for language pairs whose bilingual lexicons are highly deficient in word-to-word translations that represent inflectional diversity. Apart from the applications mentioned above, in classifying machine-extracted translations and generating out-of-vocabulary translations, the learned bilingual morph-units may also have a major impact on establishing correspondences between sub-word constituents in word-to-multi-word and multi-word-to-multi-word translations, and on compression, full-text indexing, and retrieval applications.

Repositório da Universidade Nova de Lisboa
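The cues listed for multi-word candidates can be encoded as classifier features. The sketch below computes a few of them for a single candidate pair. The edge-word lists, the use of `difflib` character similarity as a stand-in for orthographic similarity, and the Dice coefficient as the co-occurrence measure are all assumptions made for illustration, not details taken from the thesis; `has_edge_word` and `pair_features` are hypothetical names.

```python
from difflib import SequenceMatcher

# Hypothetical edge-word lists (determiners and coordinating conjunctions).
# Real lists would be larger and language-specific.
EDGE_WORDS = {
    "en": {"the", "a", "an", "and", "or"},
    "pt": {"o", "a", "os", "as", "um", "uma", "e", "ou"},
}


def has_edge_word(phrase: str, lang: str) -> int:
    """1 if the phrase starts or ends with a listed edge word, else 0."""
    tokens = phrase.lower().split()
    stop = EDGE_WORDS.get(lang, set())
    return int(bool(tokens) and (tokens[0] in stop or tokens[-1] in stop))


def pair_features(src: str, tgt: str, src_lang: str, tgt_lang: str,
                  cooc: int, freq_src: int, freq_tgt: int) -> dict[str, float]:
    """Feature dictionary for one candidate translation pair (illustrative)."""
    total = freq_src + freq_tgt
    return {
        # Character-level similarity as a stand-in for orthographic similarity.
        "orthographic": SequenceMatcher(None, src.lower(), tgt.lower()).ratio(),
        # Spurious determiners or conjunctions at either end of each side.
        "src_edge_word": has_edge_word(src, src_lang),
        "tgt_edge_word": has_edge_word(tgt, tgt_lang),
        # Raw occurrence counts of each side.
        "src_freq": freq_src,
        "tgt_freq": freq_tgt,
        # Dice coefficient: 2 * co-occurrences / (sum of individual counts).
        "dice": 2 * cooc / total if total else 0.0,
    }


print(pair_features("the European Commission", "Comissão Europeia",
                    "en", "pt", cooc=8, freq_src=10, freq_tgt=10))
```

Here the leading "the" sets `src_edge_word` to 1, flagging the kind of spurious determiner the abstract mentions. Dictionaries like this could be fed to any off-the-shelf classifier trained on the manually labelled pairs; which learner the thesis actually used is not stated in the abstract.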

Bilingually motivated segmentation and generation of word translations using relatively small translation data sets

Author: Gomes Luis
Lopes Jose Gabriel P.
Mahesh Kavitha Karimbi
Publication date: 01/01/2015

Waseda University Repository

Improving the translation environment for professional translators

Author: Augustinus Liesbeth
Bulté Bram
Buysschaert Joost
Coppers Sven
Daems Joke
Heyman Geert
Hoste Veronique
Lefever Els
Luyten Kris
Macken Lieve
Moens Marie-Francine
Pelemans Joris
Rigouts Terryn Ayla
Steurs Frieda
Tezcan Arda
Van den Bergh Jan
van der Lek-Ciudin Iulianna
Van Eynde Frank
Vanallemeersch Tom
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

When computer-aided translation systems are used in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction perspective and from a purely technological one. This paper describes the SCATE research on improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography
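Fuzzy matching retrieves translation-memory (TM) segments that resemble a new source sentence. The abstract does not define SCATE's improved matching, so the sketch below shows only a common baseline, assumed here for illustration: a word-level Levenshtein distance normalized by the longer segment's length, with a hypothetical acceptance threshold of 0.7. The function names and the Dutch example segments are likewise illustrative.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over words (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete wa
                           cur[j - 1] + 1,             # insert wb
                           prev[j - 1] + (wa != wb)))  # substitute or match
        prev = cur
    return prev[-1]


def fuzzy_match_score(query: str, segment: str) -> float:
    """1.0 for identical word sequences, lower as more edits are needed."""
    q, s = query.lower().split(), segment.lower().split()
    longest = max(len(q), len(s))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(q, s) / longest


def best_match(query: str, memory: list[tuple[str, str]], threshold: float = 0.7):
    """Return (score, source, target) for the best TM hit, or None.

    `memory` is a list of (source, target) pairs; `threshold` is a
    hypothetical cut-off below which matches are discarded.
    """
    if not memory:
        return None
    score, src, tgt = max((fuzzy_match_score(query, s), s, t) for s, t in memory)
    return (score, src, tgt) if score >= threshold else None


memory = [
    ("Press the green button", "Druk op de groene knop"),
    ("Close the window", "Sluit het venster"),
]
print(best_match("Press the red button", memory))
# (0.75, 'Press the green button', 'Druk op de groene knop')
```

One substituted word out of four gives a score of 0.75, so the first segment is returned as a fuzzy match while the second (0.25) would be rejected.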

Corpus-driven bilingual lexicon extraction

Author: Rosner Michael
Publication venue: 2nd Computer Science Annual Workshop (CSAW’04), University of Malta. Faculty of ICT
Publication date: 01/01/2004

This paper introduces some key aspects of machine translation in order to situate the role of the bilingual lexicon in transfer-based systems. It then discusses the data-driven approach to extracting bilingual knowledge automatically from bilingual texts, tracing the process of alignment at different levels of granularity. The paper concludes with some suggestions for future work. Peer-reviewed.

OAR@UM
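Sentence-level alignment is one of the granularity levels the abstract refers to. As an illustration only, the sketch below uses a simplified length-based dynamic program in the spirit of Gale and Church's method: it allows 1-1, 1-0, 0-1, 2-1 and 1-2 "beads" and scores each by the relative difference in character length plus a fixed penalty. The cost function, the penalty values and the name `align_sentences` are assumptions; this is neither Gale and Church's probabilistic model nor a procedure described in the paper.

```python
def align_sentences(src: list[str], tgt: list[str],
                    skip_penalty: float = 1.0,
                    merge_penalty: float = 0.3) -> list[tuple[list[int], list[int]]]:
    """Align two sentence lists; return beads of (src indices, tgt indices)."""
    n, m = len(src), len(tgt)
    src_len = [len(s) for s in src]
    tgt_len = [len(t) for t in tgt]
    # Allowed bead shapes: (source sentences, target sentences, fixed penalty).
    moves = [(1, 1, 0.0), (1, 0, skip_penalty), (0, 1, skip_penalty),
             (2, 1, merge_penalty), (1, 2, merge_penalty)]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Cells are visited in row-major order, so every predecessor of (i, j)
    # already holds its final cost before (i, j) is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                a, b = sum(src_len[i:ni]), sum(tgt_len[j:nj])
                # Relative length mismatch in [0, 1]; 0 when lengths agree.
                mismatch = abs(a - b) / (a + b) if a + b else 0.0
                c = cost[i][j] + penalty + mismatch
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from (n, m) to (0, 0) to recover the chosen beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


en = ["Hello there.", "How are you?"]
fr = ["Bonjour, comment allez-vous ?"]
print(align_sentences(en, fr))  # [([0, 1], [0])]: two sentences -> one
```

The merge penalty is set lower than the skip penalty so that two short sentences translated as one longer sentence are grouped into a 2-1 bead rather than left partly unaligned; real aligners estimate such costs from data.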

Marketing and Advertising Translation: Humans vs Machines in the field of cosmetics

Author: Alonso del Caño Carmen
Publication date: 01/01/2019

This undergraduate thesis focuses on a very specific field of specialized translation: advertising and marketing translation. The high degree of specialization involved in this activity provides a testing ground for reconsidering the importance of the human translator and reformulating their role. The constant development of new technologies produces ever more sophisticated translation programs, which in turn revives the long-standing debate between human and machine translation. The aim of this project is to carry out a practical exercise to verify whether specialized translation always requires supervision by humans with the relevant linguistic knowledge and technical background, or whether, on the contrary, machine translation alone can currently provide sufficiently valid results and an adequate level of reliability.

Degree in English Studies

Repositorio Documental de la Universidad de Valladolid