5,812 research outputs found

Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted annotation

Author: Cleve Anthony
Crasborn Onno
Meurant Laurence
Publication date: 01/01/2016

Contains fulltext: 166336.pdf (publisher's version) (Open Access). 7th Workshop on the Representation and Processing of Sign Languages: Corpus Minin

Radboud Repository

Repository of the University of Namur

A Survey of Paraphrasing and Textual Entailment Methods

Author: Androutsopoulos Ion
Malakasiotis Prodromos
Publication venue: 'AI Access Foundation'
Publication date: 30/05/2010

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201
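
The abstract's remark that paraphrasing can be seen as bidirectional textual entailment can be illustrated with a minimal Python sketch. The `entails` function here is a hypothetical toy stand-in (simple token containment), not a recognizer described in the survey; only the two-way check in `is_paraphrase` reflects the idea stated above.

```python
def tokens(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(text.lower().split())


def entails(premise, hypothesis):
    """Toy entailment check (an assumption for illustration only):
    the premise 'entails' the hypothesis when every hypothesis token
    also occurs in the premise."""
    return tokens(hypothesis) <= tokens(premise)


def is_paraphrase(a, b):
    """Paraphrase viewed as entailment that holds in both directions."""
    return entails(a, b) and entails(b, a)
```

With this toy check, "the black cat sat" entails "the cat sat" but not the reverse, so the pair counts as one-way entailment and not as a paraphrase. A real system would replace `entails` with a trained recognizer.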

arXiv.org e-Print Archive

Crossref

Automatic Construction of Discourse Corpora for Dialogue Translation

Author: Liu Qun
Tu Zhaopeng
Wang Longyue
Way Andy
Zhang Xiaojun
Publication venue: 'Museum National d'Histoire Naturelle, Paris, France'
Publication date: 13/05/2016

In this paper, a novel approach is proposed to automatically construct a parallel discourse corpus for dialogue machine translation. Firstly, the parallel subtitle data and its corresponding monolingual movie script data are crawled and collected from the Internet. Then tags such as speaker and discourse boundary from the script data are projected to its subtitle data via an information retrieval approach in order to map monolingual discourse to bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show that our proposed method can achieve 81.79% and 98.64% accuracy on speaker and dialogue boundary annotation, respectively, and that speaker-based language model adaptation can obtain around 0.5 BLEU points improvement in translation quality. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue boundary annotation.

arXiv.org e-Print Archive

Stirling Online Research Repository (RIOXX)

Irish Universities

DCU Online Research Access Service

Stirling Online Research Repository

XL-NBT: A Cross-lingual Neural Belief Tracking Framework

Author: Chen Jianshu
Chen Wenhu
Su Yu
Wang William Yang
Wang Xin
Yan Xifeng
Yu Dong
Publication date: 01/01/2018

Task-oriented dialog systems are becoming pervasive, and many companies heavily rely on them to complement human agents for customer service in call centers. With globalization, the need for providing cross-lingual customer support becomes more urgent than ever. However, cross-lingual support poses great challenges: it requires a large amount of additional annotated data from native speakers. In order to bypass the expensive human annotation and achieve the first step towards the ultimate goal of building a universal dialog system, we set out to build a cross-lingual state tracking framework. Specifically, we assume that there exists a source language with dialog belief tracking annotations while the target languages have no annotated dialog data of any form. Then, we pre-train a state tracker for the source language as a teacher, which is able to exploit easy-to-access parallel data. We then distill and transfer its own knowledge to the student state tracker in target languages. We specifically discuss two types of common parallel resources: bilingual corpus and bilingual dictionary, and design different transfer learning strategies accordingly. Experimentally, we successfully use an English state tracker as the teacher to transfer its knowledge to both Italian and German trackers and achieve promising results.
Comment: 13 pages, 5 figures, 3 tables, accepted to EMNLP 2018 conferenc
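
The teacher-student transfer mentioned in the abstract can be pictured with a generic knowledge-distillation loss. The sketch below is an assumption for illustration: the abstract does not give the paper's actual objective, and the function names, the `temperature` parameter, and the example logits are all hypothetical. It shows only the common pattern of training a student to match a teacher's soft output distribution.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution.
    Subtracting the maximum keeps the exponentials numerically stable."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's
    soft targets (generic distillation, not the paper's exact loss)."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))
```

The loss is smallest when the student reproduces the teacher's distribution, so a student that agrees with the teacher on a slot value scores lower than one that prefers a different value.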

arXiv.org e-Print Archive

Crossref