
    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
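    Fuzzy matching against a translation memory is, at its core, a similarity search over previously translated segments. As a minimal, illustrative sketch only (not SCATE's actual matching algorithm; the 0.7 threshold and the toy TM entries are assumptions), a token-level fuzzy match can be computed like this:

    ```python
    import difflib

    def fuzzy_score(source: str, tm_source: str) -> float:
        """Token-level similarity in [0, 1], a common basis for TM fuzzy matching."""
        return difflib.SequenceMatcher(None, source.split(), tm_source.split()).ratio()

    def best_tm_match(source, tm, threshold=0.7):
        """Return (score, tm_source, tm_target) for the best match above the threshold."""
        score, src, tgt = max((fuzzy_score(source, s), s, t) for s, t in tm)
        return (score, src, tgt) if score >= threshold else None

    tm = [("Press the start button", "Druk op de startknop"),
          ("Press the stop button", "Druk op de stopknop")]
    print(best_tm_match("Press the red start button", tm))
    ```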

    Fluency in dialogue: Turn‐taking behavior shapes perceived fluency in native and nonnative speech

    Fluency is an important part of research on second language learning, but most research on language proficiency has typically not included oral fluency as part of interaction, even though natural communication usually occurs in conversation. The present study considered aspects of turn-taking behavior as part of the construct of fluency and investigated whether these aspects differentially influence perceived fluency ratings of native and non-native speech. Results from two experiments using acoustically manipulated speech showed that, in native speech, both too 'eager' answers (interrupting a question with a fast answer) and too 'reluctant' answers (answering slowly after a long turn gap) negatively affected fluency ratings. However, in non-native speech, only too 'reluctant' answers led to lower fluency ratings. Thus, we demonstrate that acoustic properties of dialogue are perceived as part of fluency. By adding to our current understanding of dialogue fluency, these lab-based findings carry implications for language teaching and assessment.

    Statistically motivated example-based machine translation using translation memory

    In this paper we present a novel way of integrating a translation memory into an example-based machine translation (EBMT) system to deal with the issue of low resources. We used a dialogue of 380 sentences as the example base for our system. The translation units in the translation memories are automatically extracted based on the aligned phrases (words) of a statistical machine translation (SMT) system. We attempt to use the approach to improve translation from English to Bangla, as many statistical machine translation systems have difficulty with such small amounts of training data. We found that the approach shows improvement over a baseline SMT system.
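    Extracting translation units from SMT word alignments follows, in spirit, the standard consistency criterion for phrase-pair extraction: a source span and a target span form a unit only if no alignment link crosses the span boundary. A simplified sketch under that assumption (ignoring unaligned-word extension; the toy alignment is invented and this is not necessarily the authors' exact procedure):

    ```python
    def extract_phrase_pairs(src_len, alignment, max_len=4):
        """Collect (source span, target span) pairs consistent with a word alignment."""
        pairs = []
        for i1 in range(src_len):
            for i2 in range(i1, min(src_len, i1 + max_len)):
                # target positions linked to the candidate source span [i1, i2]
                linked = [j for (i, j) in alignment if i1 <= i <= i2]
                if not linked or max(linked) - min(linked) >= max_len:
                    continue
                j1, j2 = min(linked), max(linked)
                # consistency: every link into [j1, j2] must come from [i1, i2]
                if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                    pairs.append(((i1, i2), (j1, j2)))
        return pairs

    # toy example: a two-word sentence pair with links 0-0 and 1-1
    print(extract_phrase_pairs(2, {(0, 0), (1, 1)}))
    ```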

    Listen up: Reflections on the CDI and HSE Speech and Language Services in Tallaght West

    The primary aim of this study was to build on the previous evaluation of the Childhood Development Initiative's (CDI) speech and language approach and carry out a comparative evaluation of speech and language therapy services for young children across the CDI and Health Service Executive (HSE) programmes. The main research questions are organised according to (i) implementation of the programme; (ii) uptake and accessibility; and (iii) outcomes.

    Lexis or parsing? A corpus-based study of syntactic complexity and its effect on disfluencies in interpreting

    Cognitive load is probably one of the most cited topics in research on simultaneous interpreting, but it is still poorly understood due to the lack of proper empirical tests. It is a central concept in Gile's (2009) Efforts Model as well as in Seeber's (2011) Cognitive Load Model. Both models conceptualize interpreting as a dynamic equilibrium between the cognitive resources/capacities and the cognitive demands involved in listening and comprehension, production and memory storage. When the momentary demands exceed the interpreter's available capacities, there is an information overload, which typically results in a disfluent or erroneous interpretation.

    While Gile (2008) denies that his Efforts Model is a theory that can be tested, Seeber & Kerzel (2012) put Seeber's Cognitive Load Model to the test using pupillometry in an experimental interpretation task. In a series of recent corpus-based studies, Plevoets & Defrancq (2016, 2018) and Defrancq & Plevoets (2018) used filled pauses to investigate cognitive load in simultaneous interpreters, based on the widely shared assumption in the psycholinguistic literature that silent and filled pauses are 'windows' on cognitive load in monolingual speech (Arnold et al. 2000; Bortfeld et al. 2001; Clark & Fox Tree 2002; Levelt 1983; Watanabe et al. 2008). The studies found empirical support for increased cognitive load in simultaneous interpreting in the form of higher frequencies of filled pauses. However, they also showed that filled pauses in interpreting are caused mainly by problems with lexical retrieval. Plevoets & Defrancq (2016) observed that interpreters produce more instances of the filled pause uh(m) when the lexical density of their own output is higher. Plevoets & Defrancq (2018) demonstrated that the frequency of uh(m) in interpreting increases when the lexical density of the source text is higher but decreases when there are more formulaic sequences; this effect of formulaicity was found in both the source texts and the target texts. Other known obstacles in interpreting, such as the presence of numbers and the rate of delivery, do not significantly affect the frequency of filled pauses (although source speech delivery rate reached significance in one of the analyses). These results point to the problematic retrieval or access of lexical items as the primary source of cognitive load for interpreters. Finally, in a study of filled pauses occurring between the members of morphological compounds, Defrancq & Plevoets (2018) showed that interpreters produced more uh(m)'s than non-interpreters when the average frequency of the compounds was high, as well as when the average frequency of the component members was high. This also demonstrates that lexical retrieval, which is assumed to be easier for more frequent items, is hampered in interpreting.

    This study critically examines the results of the previous studies by analyzing the effect of another non-lexical parameter on the production of filled pauses in interpreting, viz. syntactic complexity. Subordinating constructions are a well-known predictor of processing cost (cognitive load) in both L1 research (Gordon, Luper & Peterson 1986; Gordon & Luper 1989) and L2 research (Norris & Ortega 2009; Osborne 2011). In interpreting, however, Dillinger (1994) and Setton (1999: 270) did not find strong effects of the syntactic embedding of the source texts on the interpreters' performance. This paper therefore takes a closer look at syntactic complexity by incorporating the number of hypotactic clauses into the analysis.

    The study is corpus-based and makes use of both a corpus of interpreted language and a corpus of non-mediated speech. The corpus of interpreted language is the EPICG corpus, which was compiled at Ghent University between 2010 and 2013. It consists of French, Spanish and Dutch interpreted speeches in the European Parliament from 2006 until 2008, transcribed according to the VALIBEL guidelines (Bachy et al. 2007). For the purposes of this study, a sub-corpus of French source speeches and their Dutch interpretations is used, amounting to a total of 140 000 words. This sub-corpus is annotated for lemmas, parts of speech and chunks (Van de Kauter et al. 2013), and it is sentence-aligned with WinAlign (SDL Trados WinAlign 2014). The corpus of non-mediated speech is the sub-corpus of political debates of the Spoken Dutch Corpus (Oostdijk 2000), which was compiled between 1998 and 2003 and is annotated for lemmas and parts of speech. The political sub-corpus contains 220 000 words of Netherlandic Dutch and 140 000 words of Belgian Dutch.

    The data are analysed with a Generalized Additive Mixed-effects Model (Wood 2017) in which the frequency of the disfluency uh(m) is predicted in relation to delivery rate, lexical density, percentage of numbers, formulaicity and syntactic complexity. Delivery rate is measured as the number of words per minute, lexical density as the number of content words per utterance length, percentage of numbers as the number of numbers per utterance length, and formulaicity as the number of n-grams per utterance length. The new predictor, syntactic complexity, is measured as the number of subordinate clauses per utterance length. Because all five predictors are numeric variables, their effects are modelled with smoothing splines, which automatically detect potential nonlinear patterns in the data. The observations are at utterance level and are nested within the speeches, so possible between-speech variation is accounted for with a random factor.

    The preliminary results confirm the hypothesis: while lexical density and formulaicity show effects similar to those reported in previous research (positive and negative, respectively), the syntactic complexity of the source text is only borderline significant and the syntactic complexity of the target text is non-significant. There are some sporadic differences among certain types of subordinate clauses, but the general conclusion is indeed that syntactic complexity is not as strong a trigger of cognitive load in interpreting as lexically related factors. This calls for a model of interpreting in which depth of processing plays only a marginal role.
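    All five predictors are simple ratios over utterance length or duration. A minimal sketch of how they might be computed per utterance (the tagset labels and input conventions are illustrative assumptions, not the actual EPICG or Spoken Dutch Corpus annotation scheme):

    ```python
    def utterance_predictors(pos_tags, duration_sec, n_formulas, n_subclauses):
        """Five utterance-level predictors of uh(m) frequency.
        pos_tags: one part-of-speech tag per word; n_formulas: matched n-grams;
        n_subclauses: subordinate clauses in the utterance."""
        n = len(pos_tags)
        content = {"N", "V", "ADJ", "ADV"}  # assumed content-word tags
        return {
            "delivery_rate": n / (duration_sec / 60),  # words per minute
            "lexical_density": sum(t in content for t in pos_tags) / n,
            "pct_numbers": sum(t == "NUM" for t in pos_tags) / n,
            "formulaicity": n_formulas / n,            # n-grams per utterance length
            "syntactic_complexity": n_subclauses / n,  # subordinate clauses per length
        }

    print(utterance_predictors(["N", "V", "DET", "NUM", "ADJ", "N"], 3.0, 1, 1))
    ```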

    Neural Responding Machine for Short-Text Conversation

    We propose the Neural Responding Machine (NRM), a neural-network-based response generator for short-text conversation. NRM adopts the general encoder-decoder framework: it formalizes the generation of a response as a decoding process based on the latent representation of the input text, with both encoding and decoding realized with recurrent neural networks (RNNs). The NRM is trained with a large amount of one-round conversation data collected from a microblogging service. An empirical study shows that NRM can generate grammatically correct and content-wise appropriate responses to over 75% of the input texts, outperforming state-of-the-art models in the same setting, including retrieval-based and SMT-based models.
    Comment: accepted as a full paper at ACL 2015.
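    The general encoder-decoder framework that NRM instantiates can be sketched in a few lines of PyTorch. This is a bare-bones GRU sequence-to-sequence model for illustration only; NRM's actual global/local/hybrid encoding schemes, attention mechanism and training setup are not reproduced, and all dimensions are arbitrary:

    ```python
    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        """Encode the input post into a latent vector, then decode a response."""
        def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, post, response):
            _, h = self.encoder(self.emb(post))           # h: latent representation of the post
            dec, _ = self.decoder(self.emb(response), h)  # decoding conditioned on that state
            return self.out(dec)                          # per-step logits over the vocabulary

    # toy usage: a batch of 2 posts (length 5) and responses (length 6)
    model = Seq2Seq(vocab_size=1000)
    logits = model(torch.randint(0, 1000, (2, 5)), torch.randint(0, 1000, (2, 6)))
    print(logits.shape)  # torch.Size([2, 6, 1000])
    ```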

    The determinants of Spanish language proficiency among immigrants in Spain

    This article uses micro-data from the Spanish National Immigrant Survey (Encuesta Nacional de Inmigrantes, ENI), carried out in 2007 among immigrants in Spain. In recent years, Spain has received unprecedented immigration flows, and a substantial number of immigrants cannot communicate adequately in the language of the country to which they have immigrated. Among the multiple reasons for the lack of host-language proficiency, one can distinguish factors such as a low level of educational attainment, not having been provided with adequate opportunities to learn the host language, living in ethnic enclaves, or having arrived at an older age. Language skills (including oral and written ability) play a crucial role in determining immigrants' social and economic integration in the host country. As a consequence, analyzing the sources of foreign language acquisition is crucial for understanding immigrants' economic, social and political involvement. The results show that an increase in educational attainment is associated with a higher level of spoken Spanish proficiency. Language ability is also associated with the country or region of origin: immigrant men and women from the Maghreb and Asia, as well as men from Eastern Europe and Sub-Saharan Africa, show a significantly weaker command of spoken Spanish than Western Europeans.

    Machine translation evaluation resources and methods: a survey

    We introduce a survey of Machine Translation (MT) evaluation covering both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. More advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: lexical similarity and linguistic features. The lexical similarity methods cover edit distance, precision, recall, F-measure, and word order. The linguistic features can be divided into syntactic and semantic features: the syntactic features include part-of-speech tags, phrase types and sentence structures, and the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have also been proposed very recently. Subsequently, we introduce the meta-evaluation methods for MT evaluation, including different correlation scores, and the recent quality estimation (QE) tasks for MT. This paper differs from existing works (GALE Program 2009; EuroMatrix Project 2007) in several respects: it introduces recent developments in MT evaluation measures, classifies measures from manual to automatic evaluation, introduces the recent QE tasks of MT, and constructs the content concisely.
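    As a concrete instance of the lexical similarity category, the precision/recall/F-measure family can be computed over clipped n-gram matches between a hypothesis and a reference. A minimal sketch (the unigram setting is an arbitrary choice; real metrics such as BLEU or METEOR add further machinery):

    ```python
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def ngram_prf(hypothesis, reference, n=1):
        """Clipped n-gram precision, recall, and F-measure for one sentence pair."""
        h, r = ngrams(hypothesis.split(), n), ngrams(reference.split(), n)
        overlap = sum((h & r).values())          # clipped match count
        p = overlap / max(sum(h.values()), 1)
        rec = overlap / max(sum(r.values()), 1)
        f = 2 * p * rec / (p + rec) if p + rec else 0.0
        return p, rec, f

    print(ngram_prf("the cat sat on the mat", "the cat is on the mat"))
    ```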