8 research outputs found

    Translation practice at the EU institutions: focus on a concordancing tool

    Translation has always played a major role within the European institutions because it provides the basis for democracy and communication among the Member States and between the EU and its citizens. Successive enlargements brought changes to the internal organization of the institutions, including the translation services and their workflow, in response to the new challenge of accommodating 23 official languages. The greater need for translation support was met by a growing number of shared tools and resources developed over time, such as centralised web-based applications and meta-search engines. This paper focuses on one specific tool available to translators working at the EU institutions: an internally developed multilingual concordancer. Concordancers are widely used by translators, but little information is available about them in terms of tool evaluation or user behaviour. The article presents a PhD research project that aims to partly fill this gap by investigating the relationship between concordance searches (seen as manifestations of translation problems) and language combination within the EU translation services.

    Concordancing Software in Practice: An investigation of searches and translation problems across EU official languages

    2011/2012. The present work reports on an empirical study of translation problems across multiple language pairs. In particular, the analysis develops a methodological approach to studying concordance search logs, taken as manifestations of translation problems and, in a wider perspective, of information needs. As search logs are a relatively unexplored data type within translation process research, a controlled environment was needed to carry out this exploratory analysis without incurring additional problems caused by an excessive number of variables. The logs were collected at the European Commission and contain a large volume of searches from English into 20 EU languages that staff translators working for the EU translation services submitted to an internally available multilingual concordancer. The study attempts to (i) identify differences in the searches (i.e. problems) based on the language pairs; and (ii) group problems into types. Furthermore, the interactions between concordance users and the tool itself are examined to provide a translation-oriented perspective on the domain of Human-Computer Interaction. The study draws on the literature on translation problems, Information Retrieval and Web search log analysis, on the assumption that, from the perspective of concordance searching, translation problems are best interpreted as information needs for which the concordancer is chosen as a form of external support. The structure of a concordance search is examined in all its parts and broken down into two main components: the 'Search Strategy' component and the 'Problem Unit' component. The former is analysed using a mainly quantitative approach, whereas the latter is addressed from a more qualitative perspective.
The analysis of the Problem Unit takes into account the length of the search strings as well as their content and linguistic form, each addressed with a purpose-built methodological approach. Based on the understanding of concordance searches as manifestations of translation problems, a user-centered classification of translation-oriented information needs is developed to account for as many "problem" scenarios as possible. The initial expectation was that different languages would experience different problems. This assumption could not be verified: the 20 language pairs considered in this study behaved consistently on many levels and, given the peculiarity of the data and the specificity of the European Union as a research environment, no definite conclusions could be reached on the role of language family as a criterion for problem identification, as compared with other classification criteria. The analysis of the 'Problem Unit' component highlights automated support for translating Named Entities as a possible area for further research in translation technology and in the development of computer-based translation support tools. Finally, the study proposes (concordance) search logs as an additional data type for experiments on the translation process and for triangulation purposes, and draws attention to the concordancer as a type of translation aid that should be further fine-tuned to the information needs observed among professional translators. XXV Ciclo, 198.

    Statistically motivated example-based machine translation using translation memory

    In this paper we present a novel way of integrating Translation Memory into an Example-Based Machine Translation (EBMT) system to deal with the issue of low resources. We used a dialogue of 380 sentences as the example-base for our system. The translation units in the Translation Memories are extracted automatically from the aligned phrases (words) of a statistical machine translation (SMT) system. We apply the approach to translation from English to Bangla, as many statistical machine translation systems have difficulty with such small amounts of training data. We found that the approach shows improvement over a baseline SMT system.

    Sub-sentential alignment of translational correspondences

    The focus of this thesis is sub-sentential alignment, i.e. the automatic alignment of translational correspondences below sentence level. The system that we developed takes sentence-aligned parallel texts as its input and aligns translational correspondences at the sub-sentential level, which can be words, word groups or chunks. The research described in this thesis aims to be of value to the developers of computer-assisted translation tools and to human translators in general. Two important aspects of this research are its focus on different text types and its focus on precision. In order to cover a wide range of syntactic and stylistic phenomena that emerge from different writing and translation styles, we used parallel texts of different text types. As the intended users are ultimately human translators, our explicit aim was to develop a model that aligns segments with very high precision. This thesis consists of three major parts. The first part is introductory and focuses on the manual annotation, the resources used and the evaluation methodology. The second part forms the main contribution of this thesis and describes the sub-sentential alignment system that was developed. In the third part, two different applications are discussed. Although the global architecture of our sub-sentential alignment module is language-independent, the main focus is on the English-Dutch language pair. At the beginning of the research project, a Gold Standard was created. The manual reference corpus contains three different types of links: regular links for straightforward correspondences, fuzzy links for translation-specific shifts of various kinds, and null links for words for which no correspondence could be indicated. The different writing and translation styles in the different text types were reflected in the number of regular, fuzzy and null links. The sub-sentential alignment system is conceived as a cascaded model consisting of two phases.
In the first phase, anchor chunks are linked on the basis of lexical correspondences and syntactic similarity. In the second phase, we use a bootstrapping approach to extract language-pair specific translation patterns. The alignment system is chunk-driven and requires only shallow linguistic processing tools for the source and the target languages, i.e. part-of-speech taggers and chunkers. To generate the lexical correspondences, we experimented with two different types of bilingual dictionaries: a handcrafted bilingual dictionary and probabilistic bilingual dictionaries. In the bootstrapping experiments, we started from the precise GIZA++ intersected word alignments. The proposed system improves the recall of the intersected GIZA++ word alignments without sacrificing precision, which makes the resulting alignments more useful for incorporation in CAT tools or bilingual terminology extraction tools. Moreover, the system's ability to align discontiguous chunks makes it useful for languages containing split verbal constructions and phrasal verbs. In the last part of this thesis, we demonstrate the usefulness of the sub-sentential alignment module in two different applications. First, we used the sub-sentential alignment module to guide bilingual terminology extraction on three different language pairs, viz. French-English, French-Italian and French-Dutch. Second, we compare the performance of our alignment system with a commercial sub-sentential translation memory system.

    Placeable and localizable elements in translation memory systems

    Translation memory systems (TM systems) are software packages used in computer-assisted translation (CAT) to support human translators. As an example of successful natural language processing (NLP), these applications have been discussed in monographic works, conferences, articles in specialized journals, newsletters, forums, mailing lists, etc. This thesis focuses on how TM systems deal with placeable and localizable elements, as defined in 2.1.1.1. Although these elements are mentioned in the cited sources, there is no systematic work discussing them. This thesis aims to fill this gap and to suggest improvements that could be implemented to tackle current shortcomings. The thesis is divided into the following chapters. Chapter 1 is a general introduction to the field of TM technology. Chapter 2 presents the research conducted in detail. Chapters 3 to 12 each discuss a specific category of placeable and localizable elements. Finally, chapter 13 provides a conclusion summarizing the major findings of this research project.

    Enhancing the Bilingual Concordancer TransSearch with Word-Level Alignment

    No full text

    Multilinguisme et variétés linguistiques en Europe à l’aune de l’intelligence artificielle / Multilinguismo e variazioni linguistiche in Europa nell’era dell’intelligenza artificiale / Multilingualism and Language Varieties in Europe in the Age of Artificial Intelligence

    This book is the outcome of an interdisciplinary and multilingual reflection that grew out of several events organised by the panel on linguistic rights, multilingualism and language varieties in Europe in the age of artificial intelligence. The panel is part of the Artificial Intelligence for European Integration project, promoted by the Centre of European Studies To-EU of the University of Turin and co-financed by the European Commission. The initial question was whether AI could have a negative impact on language varieties and on multilingualism, a key value of the EU, or whether, and how, it could instead help to promote them. The volume consists entirely of previously unpublished work and is among the first, at least in Europe, to address these topics.