770 research outputs found

    A Survey of Paraphrasing and Textual Entailment Methods

    Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment, and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
    Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201

    A Task-based Evaluation of French Morphological Resources and Tools

    Morphology is a key component of many Language Technology applications. However, morphological relations, especially those relying on derivation and compounding processes, are often addressed in a superficial manner. In this article, we focus on assessing the relevance of deep and motivated morphological knowledge in Natural Language Processing applications. We first describe an annotation experiment whose goal is to evaluate the role of morphology for one task, namely Question Answering (QA). We then highlight the kind of linguistic knowledge that is necessary for this particular task and propose a qualitative analysis of morphological phenomena in order to identify the morphological processes that are most relevant. Based on this study, we perform an intrinsic evaluation of existing tools and resources for French morphology in order to quantify their coverage. Our conclusions provide helpful insights for using and building appropriate morphological resources and tools that could have a significant impact on application performance.

    Specialised Languages and Multimedia. Linguistic and Cross-cultural Issues

    This book collects academic works focusing on scientific and technical discourse and on the ways in which this type of discourse appears in or is shaped by multimedia products. The originality of this book lies in the variety of approaches used and of the specialised languages investigated in relation to multimodal and multimedia genres. Contributions focus in particular on new multimodal or multimedia forms of specialised discourse (in institutional, academic, technical, scientific, social or popular settings); linguistic features of specialised discourse in multimodal or multimedia genres; the popularisation of specialised knowledge in multimodal or multimedia genres; the impact of multimodality and multimediality on the construction of scientific and technical discourse; the impact of multimodality/multimediality on the practice and teaching of language; the impact of multimodality/multimediality on the practice and teaching of translation; new multimedia modes of knowledge dissemination; and the translation/adaptation of scientific discourse in multimedia products. This volume contributes to the theory and practice of multimodal studies and translation, with a specific focus on specialised discourse.
    Class A journal, special volume. Manca, E.; Bianchi, F.

    The Circle of Meaning: From Translation to Paraphrasing and Back

    The preservation of meaning between inputs and outputs is perhaps the most ambitious and, often, the most elusive goal of systems that attempt to process natural language. Nowhere is this goal of more obvious importance than for the tasks of machine translation and paraphrase generation. Preserving meaning between the input and the output is paramount for both, the monolingual vs bilingual distinction notwithstanding. In this thesis, I present a novel, symbiotic relationship between these two tasks that I term the "circle of meaning". Today's statistical machine translation (SMT) systems require high-quality human translations for parameter tuning, in addition to large bi-texts for learning the translation units. This parameter tuning usually involves generating translations at different points in the parameter space and obtaining feedback against human-authored reference translations as to how good the translations are. This feedback then dictates what point in the parameter space should be explored next. To measure this feedback, it is generally considered wise to have multiple (usually 4) reference translations to avoid unfair penalization of translation hypotheses, which could easily happen given the large number of ways in which a sentence can be translated from one language to another. However, this reliance on multiple reference translations creates a problem, since they are labor-intensive and expensive to obtain. Therefore, most current MT datasets contain only a single reference. This leads to the problem of reference sparsity, the primary open problem that I address in this dissertation and one that has a serious effect on the SMT parameter tuning process. Bannard and Callison-Burch (2005) were the first to provide a practical connection between phrase-based statistical machine translation and paraphrase generation. However, their technique is restricted to generating phrasal paraphrases. I build upon their approach and augment a phrasal paraphrase extractor into a sentential paraphraser with extremely broad coverage. The novelty in this augmentation lies in the further strengthening of the connection between statistical machine translation and paraphrase generation; whereas Bannard and Callison-Burch only relied on SMT machinery to extract phrasal paraphrase rules and stopped there, I take it a few steps further and build a full English-to-English SMT system. This system can, as expected, "translate" any English input sentence into a new English sentence with the same degree of meaning preservation that exists in a bilingual SMT system. In fact, being a state-of-the-art SMT system, it is able to generate n-best "translations" for any given input sentence. This sentential paraphraser, built almost entirely from existing SMT machinery, represents the first 180 degrees of the circle of meaning. To complete the circle, I describe a novel connection in the other direction. I claim that the sentential paraphraser, once built in this fashion, can provide a solution to the reference sparsity problem and, hence, be used to improve the performance of a bilingual SMT system. I discuss two different instantiations of the sentential paraphraser and show several results that provide empirical validation for this connection.

    Investigating lexical simplification of Latin based loan terms in English to French legal translations: a corpus based study

    This thesis investigates lexical simplification as a translation universal and how it is accounted for in the English-to-French legal translation of Latinisms. Within descriptive and functional approaches to translation, this thesis reveals that Latinisms are reproduced when they are accepted and not lexicalized in the target language, or substituted by functional and semantic equivalents of the target language or system. It is posited that the lexical simplification of source-text (ST) Latinisms as rendered by the English-to-French legal translator is dictated by system-specific, convention-specific, and function-specific rather than translation-specific features. Of all corpus texts, source-text English uses the most Latinisms, but the French translators, unlike the non-translated French producers, tend to use Latinisms to a higher extent. Lexical simplification is hypothesized as viable when languages of similar sociolinguistic and lexical power and equal status render the lexical entities of the source text differently in a simplified target text (compared to its similar non-translated text).

    Investigating Frequency and Type of Lexical Collocations in Applied Linguistics Journal Articles Written in English by Iranian and Norwegian Scholars

    Master's thesis in Literacy Studies.
    In today’s academic world, the research interest in corpus linguistics has shifted towards word co-occurrence rather than single words. Accordingly, a great body of literature has been devoted to investigations of recurrent word combinations in academic prose using frequency and dispersion parameters. This has resulted in the analysis of corpora in different fields of study to collect comprehensive lists of academic collocations. Moreover, many contrastive studies have been conducted to compare the collocations used by native and non-native speakers of English. However, to the author’s knowledge, few studies have compared the most frequent collocations in two corpora of research articles written by non-native speakers of English and published in international journals in the field of applied linguistics. To fill this gap in the literature, the current study investigated the most frequent collocations used by Iranian and Norwegian scholars in a corpus of 17 articles published in the Journal of Pragmatics through a frequency-based approach. Nine of the 17 articles were written by Iranian scholars, comprising 67,673 words, and eight were written by Norwegian scholars, comprising 64,682 words. The data of this study were collected using Collocation Extract software. The results of the study were presented in three phases. In the first phase, the 15 most frequent lexical collocations in both corpora were identified and classified under three types of lexical collocations. The Adj+N collocation type accounted for the largest proportion in the corpora, while the Adv+Adj type accounted for the smallest. In the second phase, the lexical collocations of the Iranian corpus were presented, a total of 818 collocations classified under five types. According to the results, Adj+N was the most frequent type while N+V was the least frequent one. The lexical collocations of the Norwegian corpus were identified in the same way. A total of 462 collocations were classified under four types, among which Adj+N was the most frequent type while Adv+Adj was the least frequent one. In the third phase, frequencies of lexical collocations were compared in the two corpora. According to the results, the two corpora did not differ significantly in the use of any type of lexical collocation except the Adj+N type.

    “Coronavirus, translated”: A proposal for Subtitling Netflix “Explained” Spin-Off into Italian

    The primary purpose of this thesis is to propose a translation into Italian of the subtitles of the first two episodes of the “Coronavirus, explained” Netflix limited series: ep. 1x01 “This Pandemic” and ep. 1x02 “The Race for a Vaccine”. The first two chapters provide the theoretical framework of this work. The first one is concerned with medical popularisation. After an analysis of how languages for special purposes (LSPs) work and how to make them accessible to a lay audience, the features of medical English and Italian are described, as well as their popularisation strategies and interlingual translation. In the second chapter, the focus is on audiovisual texts and translation. There are many challenges that translators must overcome when facing such complex multimodal texts; thus, an overview of the main strategies and solutions used, especially in subtitling and documentaries, is provided. The last two chapters present the practical part of this thesis. The third one describes and analyses the four corpora built for this task. In the fourth chapter, the translation proposal is examined, with particular focus on the challenges faced and the solutions implemented, both in terms of general language and popularisation strategies in subtitling, by commenting on examples taken from the translation proposal. The complete translations and the corpus composition can be found in the appendix. Lastly, conclusions and final remarks are drawn from the reference literature and the practical work.

    The Translation of Lexicalized Metaphors in Interlinguistic and Intercultural Communication of Financial Security Discourse: A Corpus-Based Analysis of English and Spanish Texts about Money Laundering

    Financial crime is a significant factor in most transnational crime and is wide-reaching. Many critical stakeholders use specific metaphors in their communications to convey security threats. Metaphors are often idiomatic speech that does not transfer easily from one language to another because they originate from cultural concepts. Within the public safety, regulatory and compliance community, key stakeholders from different linguistic backgrounds use English as a contact language to interact with their counterparts, the media, the public, and other stakeholders to ensure regulatory compliance. Translating metaphors requires a special set of skills acquired through deep cultural knowledge and experience in both source and target cultures. Our research began with the observation that language played a crucial role in relationships between everyone involved in the criminal justice process, not only in the United States but also in a multitude of Spanish-speaking countries and geographical regions. Highly effective communication is critical for those who regulate against money laundering, those involved in compliance initiatives, law enforcement, and the general public to better recognize and prevent it. This project’s genesis came from interpreting criminal cases, translating documents in United States federal court cases, and observing how investigators followed the money trail to uncover illegal activity. The first-hand view of communications in that realm revealed how language played a crucial role in relationships between everyone involved in the criminal justice process, not only in the United States but also in many Spanish-speaking countries and geographical regions. Before this study, there had been little to no research on translating metaphors in the specialized language of financial regulatory compliance and enforcement. The present study begins to fill that gap in research by providing a synchronic X-ray view of the current language spoken in that field through a corpus-based translation analysis of anti-money laundering texts. We developed a bilingual English-to-Spanish unidirectional corpus, which we uploaded to Sketch Engine for analysis. We then analyze and discuss translation techniques from English to Spanish and terminological findings. We found instances of intensifying metaphors from the source to target texts and of adding or inserting metaphorical expressions in the target text where none were present in the source. We also found an ideological presence in translated expressions, consistent with other investigations involving security discourse. Finally, we found terminological inconsistencies in the metaphors for money laundering, tax haven, and shell company. We suggest practical implications for translators and stakeholders in the anti-money laundering discipline. We also provide pedagogical applications, from custom building corpora to teaching translation of metaphors in the specialized language of financial regulation and compliance. Developing specialized corpora and learning to use corpus-based translation analysis software will help translation students be better prepared and will improve the future of translation studies and their applications in specialized areas and beyond. Providing students with experience using linguistic analysis software will also help build critical technology skills that they will be able to apply across disciplines in the humanities and beyond, such as intelligence analysis and computer science.