Cross-Language Question Re-Ranking

Author: Brown Peter F.
Cao Yunbo
Darwish Kareem
Forner Pamela
Guzmán Francisco
Jeon Jiwoon
Ji Zongcheng
Luong Thang
Mikolov Tomas
Nicosia Massimo
Severyn Aliaksei
Socher Richard
Tiedemann Jörg
Upadhyay Shyam
Zhang Kai
Zhou Guangyou
Publication date: 04/10/2017

We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve to close to those of the monolingual neural network. Overall, the kernel system shows a better performance compared to the neural network in all cases.

Comment: SIGIR-2017; Community Question Answering; Cross-language Approaches; Question Retrieval; Kernel-based Methods; Neural Networks; Distributed Representation

arXiv.org e-Print Archive

Crossref

Adversarial Domain Adaptation for Duplicate Question Detection

Author: Lei Tao
Moschitti Alessandro
Nakov Preslav
Romeo Salvatore
Shah Darsh J
Publication date: 01/01/2018

We address the problem of detecting duplicate questions in forums, which is an important step towards automating the process of answering new questions. As finding and annotating such potential duplicates manually is very tedious and costly, automatic methods based on machine learning are a viable alternative. However, many forums do not have annotated data, i.e., questions labeled by experts as duplicates, and thus a promising solution is to use domain adaptation from another forum that has such annotations. Here we focus on adversarial domain adaptation, deriving important findings about when it performs well and what properties of the domains are important in this regard. Our experiments with StackExchange data show an average improvement of 5.6% over the best baseline across multiple pairs of domains.

Comment: EMNLP 2018 short paper - camera ready. 8 pages

arXiv.org e-Print Archive

Crossref