
    Neural fuzzy repair: integrating fuzzy matches into neural machine translation

    We present a simple yet powerful data augmentation method for boosting Neural Machine Translation (NMT) performance by leveraging information retrieved from a Translation Memory (TM). We propose and test two methods for augmenting NMT training data with fuzzy TM matches. Tests on the DGT-TM data set for two language pairs show consistent and substantial improvements over a range of baseline systems. The results suggest that this method is promising for any translation environment in which a sizeable TM is available and a certain amount of repetition across translations is to be expected, especially considering its ease of implementation.
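
    The abstract does not spell out how matches are scored or attached to the input. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it scores TM entries with a token-level edit-distance similarity, takes the best match, and, if its score reaches a threshold, appends the match's target side to the source sentence. The function names, the `@@@` separator and the 0.5 threshold are all hypothetical.

```python
from typing import List, Optional, Tuple


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute or keep
        prev = curr
    return prev[len(b)]


def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus edit distance normalised by the longer sentence."""
    ta, tb = a.split(), b.split()
    longest = max(len(ta), len(tb))
    if longest == 0:
        return 1.0
    return 1.0 - token_edit_distance(ta, tb) / longest


def best_match(source: str, tm: List[Tuple[str, str]]) -> Optional[Tuple[float, str]]:
    """Return (score, target side) of the most similar TM entry, or None if the TM is empty."""
    best: Optional[Tuple[float, str]] = None
    for tm_src, tm_tgt in tm:
        score = fuzzy_score(source, tm_src)
        if best is None or score > best[0]:
            best = (score, tm_tgt)
    return best


SEP = "@@@"       # hypothetical separator token between source and match
THRESHOLD = 0.5   # hypothetical minimum fuzzy match score


def augment(source: str, tm: List[Tuple[str, str]]) -> str:
    """Append the best match's target side to the source if it scores high enough.

    When augmenting training data, the caller should pass a TM that excludes
    the sentence pair being augmented, so a sentence cannot match itself.
    """
    match = best_match(source, tm)
    if match is not None and match[0] >= THRESHOLD:
        return f"{source} {SEP} {match[1]}"
    return source
```

    For example, with a TM containing ("the cat sat on the mat", "de kat zat op de mat"), the input "the cat sat on a mat" scores 5/6 against that entry and is augmented to "the cat sat on a mat @@@ de kat zat op de mat", whereas a sentence with no sufficiently similar TM entry is left unchanged.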

    Towards a better integration of fuzzy matches in neural machine translation through data augmentation

    We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method for integrating translation memory matches into neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall translation quality. Furthermore, we propose a fuzzy match combination technique that aims to maximise the coverage of source words. This is supplemented with an analysis of how translation quality is affected by input sentence length and fuzzy match score. The results show that applying a combination of the tested modifications leads to a significant increase in estimated translation quality over all baselines for all language combinations.
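
    The abstract does not describe the combination technique in detail. As a rough illustration of the idea of maximising source-word coverage, the sketch below greedily selects candidate matches, each time taking the one that covers the most source tokens not yet covered by earlier selections. This greedy, bag-of-words version is an assumption made for illustration; the function names and the limit `k` are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Set


def covered(source_tokens: List[str], match_tokens: List[str]) -> Set[int]:
    """Positions of source tokens that also occur somewhere in the match (bag-of-words overlap)."""
    vocab = set(match_tokens)
    return {i for i, tok in enumerate(source_tokens) if tok in vocab}


def combine_matches(source: str, candidates: List[str], k: int = 2) -> List[int]:
    """Greedily pick up to k candidate indices, maximising newly covered source positions."""
    src = source.split()
    cover_sets = [covered(src, c.split()) for c in candidates]
    chosen: List[int] = []
    already: Set[int] = set()
    for _ in range(k):
        best_idx: Optional[int] = None
        best_gain = 0
        for idx, cov in enumerate(cover_sets):
            if idx in chosen:
                continue
            gain = len(cov - already)  # source positions this match would add
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        if best_idx is None:
            break  # no remaining candidate covers any new source token
        chosen.append(best_idx)
        already |= cover_sets[best_idx]
    return chosen
```

    For the source "the red car is parked outside", a candidate covering five of its six words is chosen first, and a second candidate is then added only if it covers the remaining word ("red"), which is the kind of complementarity a coverage-oriented combination aims for.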

    Evaluating short-term changes in L2 complexity development

    This paper reports on a study of the nature and extent of the development of English L2 writing proficiency in 45 adult ESL learners over the course of an intensive short-term EAP program, as evaluated by means of objective measures targeting different components of lexical and syntactic complexity. In addition, we compare the scores on these measures with more holistic and subjective ratings of learners' overall writing quality. Results reveal that some measures, but not necessarily the most popular linguistic complexity measures (e.g., subordination ratios and lexical richness measures), can indeed adequately and validly capture development in L2 writing in short-term ESL courses. Results further suggest that different subcomponents of syntactic and lexical complexity in L2 writing develop at different rates, which stresses the importance of calculating a sufficiently wide range of complexity measures in order to obtain a comprehensive picture of L2 development.
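
    The abstract does not list the exact measures used. For readers unfamiliar with this kind of metric, the sketch below computes three widely used, simple complexity indices: the type-token ratio and Guiraud's index (lexical richness) and mean sentence length in words (a length-based syntactic measure). The regex-based tokenisation and sentence splitting are simplifying assumptions; a real study would use proper tokenisers and, for measures such as subordination ratios, a syntactic parser.

```python
import math
import re
from typing import List


def tokens(text: str) -> List[str]:
    """Crude word tokenisation (assumption): lower-cased runs of letters and apostrophes."""
    return re.findall(r"[A-Za-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct word types divided by total word tokens."""
    toks = tokens(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def guiraud_index(text: str) -> float:
    """Types divided by the square root of tokens, which is less sensitive to text length than TTR."""
    toks = tokens(text)
    return len(set(toks)) / math.sqrt(len(toks)) if toks else 0.0


def mean_sentence_length(text: str) -> float:
    """Average number of words per sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokens(s)) for s in sentences) / len(sentences)
```

    For "The cat sat. The cat ran away." this gives 5 types over 7 tokens (TTR 5/7) and a mean sentence length of 3.5 words. Because raw TTR falls as texts get longer, length-corrected variants such as Guiraud's index are often preferred when comparing texts of different lengths.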

    The effects of intensive speech treatment on conversational intelligibility in Spanish speakers with Parkinson’s disease

    Purpose: To examine the effects of intensive speech treatment on the conversational intelligibility of Castilian Spanish speakers with Parkinson’s disease (PD), as well as on the speakers’ self-perceptions of disability. Method: Fifteen speakers with a medical diagnosis of PD participated in this study. Speech recordings were made twice before treatment, immediately post-treatment and at a one-month follow-up session. Conversational intelligibility was assessed in two ways: transcription accuracy scores and intelligibility ratings on a 9-point Likert scale. The Voice Handicap Index (VHI) was administered as a measure of self-perceived disability. Results: Group data revealed that transcription accuracy and median ease-of-understanding ratings increased significantly immediately post-treatment, with gains maintained at the one-month follow-up. The functional subscale of the VHI decreased significantly post-treatment, suggesting a decrease in perceived communication disability after speech treatment. Conclusion: These findings support the implementation of intensive voice treatment to improve conversational intelligibility in Spanish speakers with PD and dysarthria, as well as to improve the speakers' perception of their daily communicative capabilities. Clinical and theoretical considerations are discussed.

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

    Evaluating the impact of integrating similar translations into neural machine translation

    Previous research has shown that simple methods of augmenting machine translation training data and input sentences with translations of similar sentences (or fuzzy matches), retrieved from a translation memory or bilingual corpus, lead to considerable improvements in translation quality, as assessed by a limited set of automatic evaluation metrics. In this study, we extend this evaluation by calculating a wider range of automated quality metrics that tap into different aspects of translation quality and by performing manual MT error analysis. Moreover, we investigate in more detail how fuzzy matches influence translations and where potential quality improvements could still be made by carrying out a series of quantitative analyses that focus on different characteristics of the retrieved fuzzy matches. The automated evaluation shows that the quality of neural fuzzy repair (NFR) translations is higher than that of the NMT baseline on all metrics. However, the manual error analysis did not reveal a difference between the two systems in the total number of translation errors, although different profiles emerged when considering the types of errors made. Finally, in our analysis of how fuzzy matches influence NFR translations, we identified a number of features that could be used to improve the selection of fuzzy matches for NFR data augmentation.

    Beginning L2 complexity development in CLIL and non-CLIL secondary education

    The present study analyses the impact of a bilingual Content and Language Integrated Learning (CLIL) programme vis-à-vis a regular monolingual programme on the development of different aspects of L2 learners’ linguistic (syntactic, morphological and lexical) complexity. Five pupils enrolled in a Dutch–English CLIL programme in a secondary school in the Netherlands are compared with five peers following the mainstream programme with English as a Foreign Language (EFL) teaching. The longitudinal development of these ten pupils’ linguistic complexity in L2-English is investigated by means of six complexity measures calculated for each of eleven writing tasks collected over a period spanning their first nineteen months of secondary education. Linear mixed models are used to estimate the effects of time and programme type on the pupils’ L2 complexity. The results indicate that both groups of learners significantly increase the complexity of their L2 writing over the course of the study. Surprisingly, only limited effects of programme type (CLIL vs non-CLIL) are found, despite considerable differences in the quantity and quality of instructional exposure to the target language, suggesting that for these pupils increased and more varied instructional exposure to the L2 in the CLIL programme did not lead to significantly different L2 productions in terms of linguistic complexity. Several possible explanations for these findings are considered and the implications for CLIL research are discussed.