1,143 research outputs found

    Building a sign language corpus for use in machine translation

    In recent years, data-driven methods of machine translation (MT) have overtaken rule-based approaches as the predominant means of automatically translating between languages. A prerequisite for such an approach is a parallel corpus of the source and target languages. Technological developments in sign language (SL) capturing, analysis and processing tools now mean that SL corpora are becoming increasingly available. Because transcription and language analysis tools are mainly designed and used for linguistic purposes, we describe the process of creating a multimedia parallel corpus specifically for the purposes of English to Irish Sign Language (ISL) MT. As part of our larger project on localisation, our research is focussed on developing assistive technology for patients with limited English in the domain of healthcare. Focussing on the first point of contact a patient has with a GP’s office, the medical secretary, we sought to develop a corpus from the dialogue between the two parties when scheduling an appointment. Throughout the development process we have created one parallel corpus in six different modalities from this initial dialogue. In this paper we discuss the multi-stage process of the development of this parallel corpus as individual and interdependent entities, both for our own MT purposes and their usefulness in the wider MT and SL research domains.

    Recycling texts: human evaluation of example-based machine translation subtitles for DVD

    This project focuses on translation reusability in audiovisual contexts. Specifically, the project seeks to establish (1) whether target language subtitles produced by an EBMT system are considered intelligible and acceptable by viewers of movies on DVD, and (2) whether a relationship exists between the ‘profiles’ of corpora used to train an EBMT system, on the one hand, and viewers’ judgements of the intelligibility and acceptability of the subtitles produced by the system, on the other. The impact of other factors is also investigated, namely whether movie-viewing subjects have knowledge of the soundtrack language, subjects’ linguistic background, and subjects’ prior knowledge of the (Harry Potter) movie clips viewed. Corpus profiling is based on measurements (partly using corpus-analysis tools) of three characteristics of the corpora used to train the EBMT system: the number of source language repetitions they contain; the size of the corpus; and the homogeneity of the corpus (independent variables). As a quality control measure in this prospective profiling phase, we also elicit human judgements (through a combined questionnaire and interview) on the quality of the corpus data and on the reusability in new contexts of the TL subtitles. The intelligibility and acceptability of EBMT-produced subtitles (dependent variables) are, in turn, established through end-user evaluation sessions. In these sessions 44 native German-speaking subjects view short movie clips containing EBMT-generated German subtitles, and following each clip answer questions (again, through a combined questionnaire and interview) relating to the quality characteristics mentioned above. The findings of the study suggest that an increase in corpus size, along with a concomitant increase in the number of source language repetitions and a decrease in corpus homogeneity, improves the readability of the EBMT-generated subtitles. It does not, however, have a significant effect on the comprehensibility, style or well-formedness of the EBMT-generated subtitles. Increasing corpus size and SL repetitions also results in a higher number of alternative TL translations in the corpus that are deemed acceptable by evaluators in the corpus profiling phase. The research also finds that subjects are more critical of subtitles when they do not understand the soundtrack language, while subjects’ linguistic background does not have a significant effect on their judgements of the quality of EBMT-generated subtitles. Prior knowledge of the Harry Potter genre, on the other hand, appears to have an effect on how viewing subjects rate the severity of observed errors in the subtitles, and on how they rate the style of subtitles, although this effect is training corpus-dependent. The introduction of repeated subtitles did not reduce the intelligibility or acceptability of the subtitles. Overall, the findings indicate that the subtitles deemed the most acceptable when evaluated in a non-AVT environment (albeit one in which rich contextual information was available) were the same as the subtitles deemed the most acceptable in an AVT environment, although richer data were gathered from the AVT environment.

    Translation-Memory (TM) Research: What Do We Know and How Do We Know It?

    It is no exaggeration to say that the advent of translation-memory (TM) systems in the translation profession has led to drastic changes in translators’ processes and workflow, and yet, though many professional translators nowadays depend on some form of TM system, this has not been the object of much research. Our paper attempts to find out what we know about the nature, applications and influences of TM technology, including translators’ interaction with TMs, and also how we know it. An essential part of the analysis is based on a selection of empirical TM studies, which we assume to be representative of the research field as a whole. Our analysis suggests that, while considerable knowledge is available about the technical side of TMs, more research is needed to understand how translators interact with TM technology and how TMs influence translators’ cognitive translation processes.

    Machine translation and fair access to information

    This article contributes to the discussion on fairness and ethics in MT by highlighting efforts that have been made to use MT for the humanitarian purpose of increasing access to information for groups that are underserved. The article provides an overview of example projects in which MT has been implemented for this purpose in three contexts: civic participation, public health and safety, and media and culture. In addition, the article examines some of the ethical issues surrounding efforts to use MT for accessibility, including issues of quality, acceptability, and the need to involve stakeholders in development.

    Analysing the use and perception of Wikipedia in the professional context of translation

    This paper draws on the results of an online survey conducted among professionals of the translation industry (mostly translators) to explore, from a technological and sociological perspective, how they conduct their work, the needs they experience, and the tools and resources (human or human-driven) they resort to when translating. More specifically, this interpretative and descriptive work looks at how participants use Wikipedia and analyses their perceptions of this tool. The survey results suggest that respondents made extensive use of all sorts of technologies when translating, amongst which TM and MT/post-editing were not the most popular. They also resorted to human (or human-driven) resources (translator colleagues, experts, social networks, blogs, etc.) to meet their needs (general documentation, terminological/lexicographical, visual). Respondents had a good overall opinion of Wikipedia (usefulness, reliability and ease of use) and most of them reported using it when translating. However, some results suggest the existence of some kind of controversy or censorship with regard to the use of Wikipedia in professional contexts. A discussion relating the results of this survey to other studies with similar focuses (translation tools, the translation profession, Wikipedia) could help identify trends in the way translators interact with technology in the information society.

    Language technologies for a multilingual Europe

    This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as, among others, industry, administration and funding agencies. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu)

    An investigation of Grammar Gender-bias Correction for Google Translate When Translating from English to French

    This work investigated how to address Google Translate's gender bias when translating from English to French. The developed solution, called the GT gender-bias corrector, was built by combining natural language processing and machine learning methods. Natural language processing was used to analyze the original sentences and their translations grammatically, identifying parts of speech. The parts-of-speech analysis facilitated the identification of three patterns that are associated with the gender bias of Google Translate when translating from English to French. The three patterns were labeled simple, intermediate and complex to reflect their structural complexity. Samples of texts that represent the three patterns were generated. The generated texts were used to build a decision-tree-based classifier to automatically detect the pattern to which a text belongs. The GT gender-bias corrector was tested using a survey completed by participants with diverse levels of English and French fluency. The survey analysis showed the success of the corrector in addressing the Google Translate gender bias for the three patterns identified in this work.

    Mapping new translation practices into translation training: promoting collaboration through community-based localization platforms

    Crowdsourcing and collaborative translation, activities emerging on the translation scene recently, are playing an increasingly important role in the world of professional translation and in the localization industry. This article focuses on a study carried out to analyze the perception of a group of translator trainees regarding these new translation practices. A total of 20 undergraduate students participated in the research and were asked to perform a collaborative localization task using an online collaborative platform. Data subjected to a quantitative and qualitative analysis suggest that online collaborative translation tasks enhance students' motivation towards collaborative translation and help consolidate their technical knowledge about specific localization tools and files.

    What do professional translators think about post-editing

    As part of a larger research project on productivity and quality in the post-editing of machine-translated and translation-memory outputs, 24 translators and three reviewers were asked to complete an on-line questionnaire to gather information about their professional experience and to obtain data on their opinions about post-editing and machine translation. The participants were also debriefed after finalising the assignment to triangulate the data with the quantitative results and the questionnaire. The results show that translators have mixed experiences and feelings towards machine-translated output and post-editing, not necessarily because they are misinformed or reluctant to accept its inclusion in the localisation process but due to their previous experience with various degrees of output quality and to the characteristics of this type of project. The translators were quite satisfied in general with the work they do as translators, but not necessarily with the payment they receive for the work done, although this was highly dependent on different customers and type of task.

    Criteria for the Integration of Term Banks in the Professional Translation Environment

    Translation-oriented terminology management is not limited to the study of terminology problems with regard to specialization, currency, and reliability. The integration of terminology databases within CAT tools, facilitating their use, maintenance and retrieval towards the automation of the translation process and consistency of terminology, has also attracted attention from academia and the language industry alike. However, this approach to terminology management seems to be carried out from a mostly theoretical perspective. Thus, the aim of this paper is to present the results of a survey conducted among professional translators in Spain regarding their actual experience with terminology in order to identify potential gaps between the technological offer and the specific needs of translators.
    Candel-Mora, MÁ. (2017). Criteria for the Integration of Term Banks in the Professional Translation Environment. Sendebar. 28:243-260. http://hdl.handle.net/10251/111703