137 research outputs found
Recommended from our members
Lost and Found in Translation: Cross-Lingual Question Answering with Result Translation
Using cross-lingual question answering (CLQA), users can find information in languages that they do not know. In this thesis, we consider the broader problem of CLQA with result translation, where answers retrieved by a CLQA system must be translated back to the user's language by a machine translation (MT) system. This task is challenging because answers must be both relevant to the question and adequately translated in order to be correct. In this work, we show that integrating the MT closely with cross-lingual retrieval can improve result relevance and we further demonstrate that automatically correcting errors in the MT output can improve the adequacy of translated results. To understand the task better, we undertake detailed error analyses examining the impact of MT errors on CLQA with result translation. We identify which MT errors are most detrimental to the task and how different cross-lingual information retrieval (CLIR) systems respond to different kinds of MT errors. We describe two main types of CLQA errors caused by MT errors: lost in retrieval errors, where relevant results are not returned, and lost in translation errors, where relevant results are perceived irrelevant due to inadequate MT. To address the lost in retrieval errors, we introduce two novel models for cross-lingual information retrieval that combine complementary source-language and target-language information from MT. We show empirically that these hybrid, bilingual models outperform both monolingual models and a prior hybrid model. Even once relevant results are retrieved, if they are not translated adequately, users will not understand that they are relevant. Rather than improving a specific MT system, we take a more general approach that can be applied to the output of any MT system. 
Our adequacy-oriented automatic post-editors (APEs) use resources from the CLQA context and information from the MT system to automatically detect and correct phrase-level errors in MT at query time, focusing on the errors that are most likely to impact CLQA: deleted or missing content words and mistranslated named entities. Human evaluations show that these adequacy-oriented APEs can successfully adapt task-agnostic MT systems to the needs of the CLQA task. Since there is no existing test data for translingual QA or IR tasks, we create a translingual information retrieval (TLIR) evaluation corpus. Furthermore, we develop an analysis framework for isolating the impact of MT errors on CLIR and on result understanding, as well as evaluating the whole TLIR task. We use the TLIR corpus to carry out a task-embedded MT evaluation, which shows that our CLIR models address lost in retrieval errors, resulting in higher TLIR recall; and that the APEs successfully correct many lost in translation errors, leading to more adequately translated results
Where's the Verb? Correcting Machine Translation During Question Answering
When a multi-lingual question-answering (QA) system provides an answer that has been incorrectly translated, it is very likely to be regarded as irrelevant. In this paper, we propose a novel method for correcting a deletion error that affects overall understanding of the sentence. Our post-editing technique uses information available at query time: examples drawn from related documents determined to be relevant to the query. Our results show that 4%-7% of MT sentences are missing the main verb and on average, 79% of the modified sentences are judged to be more comprehensible. The QA performance also benefits from the improved MT: 7% of irrelevant response sentences become relevant
Utilisation of metadata fields and query expansion in cross-lingual search of user-generated Internet video
Recent years have seen significant efforts in the area of Cross Language Information Retrieval (CLIR) for text retrieval. This work initially focused on formally published content, but more recently research has begun to concentrate on CLIR for informal social media content. However, despite the current expansion in online multimedia archives, there has been little work on CLIR for this content. While there has been some limited work on Cross-Language Video Retrieval (CLVR) for professional videos, such as documentaries or TV news broadcasts, there has, to date, been no significant investigation of CLVR for the rapidly growing archives of informal user-generated content (UGC). Key differences between such UGC and professionally produced content are the nature and structure of the textual UGC metadata associated with it, as well as the form and quality of the content itself. In this setting, retrieval effectiveness may not only suffer from translation errors common to all CLIR tasks, but also recognition errors associated with the automatic speech recognition (ASR) systems used to transcribe the spoken content of the video and with the informality and inconsistency of the associated user-created metadata for each video. This work proposes and evaluates techniques to improve CLIR effectiveness of such noisy UGC content. Our experimental investigation shows that different sources of evidence, e.g. the content from different fields of the structured metadata, significantly affect CLIR effectiveness. Results from our experiments also show that each metadata field has a varying robustness to query expansion (QE) and hence can have a negative impact on the CLIR effectiveness. Our work proposes a novel adaptive QE technique that predicts the most reliable source for expansion and shows how this technique can be effective for improving CLIR effectiveness for UGC content
Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task
Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. We analyze systems that we developed, identifying specific problems in language processing and MT that cause errors. The best cross-lingual 5W system was still 19% worse than the best monolingual 5W system, which shows that MT significantly degrades sentence-level understanding. Neither source-language nor target-language analysis was able to circumvent problems in MT, although each approach had advantages relative to the other. A detailed error analysis across multiple systems suggests directions for future research on the problem
Use of students’ linguistic resources in teaching English as an additional language in Norway : An empirical study
Doctoral thesis (PhD) - Nord University, 2020
“I don’t mix much” : language mixing in transnational Polish-British culture 2012-18
This research was supported by the University of St Andrews, Byre World, and the Santander Research and Travel Fund. Language mixing by migrants in the process of acquiring a new language is often treated as a symptom of their linguistic deficit, a stage to be overcome on the way to full bilingualism. Yet language mixing is also a creative process, a way to build community, maintain the transnational family, and restore cultural capital lost in migration. The cultural representations of the lives of post-EU accession Polish migrants in the UK discussed in this article – in an advertisement for an online shopping website, a novel for teenagers in English and Polish translation, and a series of illustrations with captions – use different strategies to tell stories of language acquisition and loss. I argue that ten years after Joanna Rostek and Dirk Uffelmann asked “Can the Polish Migrant Speak?” it is time to ask how the Polish Migrant speaks, and to offer an answer with more nuance than “in Polish” or “in English” by taking code-switching and translanguaging into account.
The Space of Alterity: Language and National Identity in Theodor Adorno and W.G. Sebald
The German Romantic monolingual paradigm of national identity emerged in the late eighteenth century to establish a mother tongue as a national backbone. This paradigm portrayed multilingualism as destabilizing, impoverishing, and unsuitable for aesthetics. Radicalized by the Nazis and overlooked in postwar debates over German national identity, this paradigm persists in contemporary societies and continues to conceal, belittle, and discredit multilingualism. To oppose that paradigm, this dissertation unveils the enriching and nourishing qualities of foreign languages, presents translingualism as a viable alternative to monolingualism, and reveals how translingual literature creates transnational connectedness. The limitations of the paradigm are traced from the late eighteenth century to contemporary German literature to show how the German Romantics sacralized the concept of the mother tongue through religious and ethical qualities, and to expose how the exaltation of linguistic purity spreads hostility to foreign languages and fuels violence. Theodor Adorno and W.G. Sebald secularize the notion of the mother tongue and rehabilitate multilingualism. Adorno advocates a philosophical and an aesthetic framework with one language open to foreign expressions, whereas Sebald promotes translingual literature that mixes languages to create transnational bridges. This exploration of foreign tongues in Adorno and Sebald adds an ideological and an aesthetic dimension to the scholarship on their multilingualism and refutes the invocations of linguistic purity
Annotated Bibliography of Research in the Teaching of English
Since 2003, RTE has published the annual “Annotated Bibliography of Research in the Teaching of English,” and we are proud to share these curated and annotated citations once again. The goal of the annual bibliography is to offer a synthesis of the research published in the area of English language arts within the past year that may be of interest to RTE readers. Abstracted citations and those featured in the “Other Related Research” sections were published, either in print or online, between June 2019 and June 2020. The bibliography is divided into nine subject area sections. A three-person team of scholars with diverse research interests and background experiences in preK–16 educational settings reviewed and selected the manuscripts for each section using library databases and leading empirical journals. Each team abstracted significant contributions to the body of peer-reviewed studies that addressed the current research questions and concerns in their topic area
Detecting plagiarism in the forensic linguistics turn
This study investigates plagiarism detection, with an application in forensic contexts. Two types of data were collected for the purposes of this study. Data in the form of written texts were obtained from two Portuguese Universities and from a Portuguese newspaper. These data are analysed linguistically to identify instances of verbatim, morpho-syntactical, lexical and discursive overlap. Data in the form of surveys were obtained from two higher education institutions in Portugal, and another two in the United Kingdom. These data are analysed using a 2 by 2 between-groups Univariate Analysis of Variance (ANOVA), to reveal cross-cultural divergences in the perceptions of plagiarism. The study discusses the legal and social circumstances that may contribute to adopting a punitive approach to plagiarism, or, conversely, rejecting the punishment. The research adopts a critical approach to plagiarism detection. On the one hand, it describes the linguistic strategies adopted by plagiarists when borrowing from other sources, and, on the other hand, it discusses the relationship between these instances of plagiarism and the context in which they appear. A focus of this study is whether plagiarism involves an intention to deceive, and, in this case, whether forensic linguistic evidence can provide clues to this intentionality. It also evaluates current computational approaches to plagiarism detection, and identifies strategies that these systems fail to detect. Specifically, a method is proposed to detect translingual plagiarism. The findings indicate that, although cross-cultural aspects influence the different perceptions of plagiarism, a distinction needs to be made between intentional and unintentional plagiarism. The linguistic analysis demonstrates that linguistic elements can contribute to finding clues for the plagiarist’s intentionality.
Furthermore, the findings show that translingual plagiarism can be detected by using the method proposed, and that plagiarism detection software can be improved using existing computer tools