6,487 research outputs found

    Cross-language Learning with Adversarial Neural Networks: Application to Community Question Answering

    We address the problem of cross-language adaptation for question-question similarity reranking in community question answering, with the objective of porting a system trained on one input language to another input language, given labeled training data for the first language and only unlabeled data for the second language. In particular, we propose to use adversarial training of neural networks to learn high-level features that are discriminative for the main learning task, and at the same time are invariant across the input languages. The evaluation results show sizable improvements for our cross-language adversarial neural network (CLANN) model over a strong non-adversarial system. Comment: CoNLL-2017: The SIGNLL Conference on Computational Natural Language Learning; cross-language adversarial neural network (CLANN) model; adversarial training; cross-language adaptation; community question answering; question-question similarity
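
    The core idea described here, features that are discriminative for the task but invariant across languages, is commonly implemented with a gradient reversal layer feeding a language discriminator. The sketch below is a minimal PyTorch illustration of that setup, not the authors' code; the module names, dimensions, and the lambda_adv weight are assumptions.

        # Sketch of adversarial feature learning with gradient reversal (assumed PyTorch setup).
        import torch
        import torch.nn as nn

        class GradReverse(torch.autograd.Function):
            """Identity on the forward pass; flips and scales gradients on the backward pass."""
            @staticmethod
            def forward(ctx, x, lambd):
                ctx.lambd = lambd
                return x.view_as(x)

            @staticmethod
            def backward(ctx, grad_output):
                return -ctx.lambd * grad_output, None

        class CLANNSketch(nn.Module):
            # Hypothetical module showing the two heads: task classifier and language discriminator.
            def __init__(self, input_dim, hidden_dim, lambda_adv=0.1):
                super().__init__()
                self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
                self.similarity_head = nn.Linear(hidden_dim, 2)  # relevant vs. irrelevant question pair
                self.language_head = nn.Linear(hidden_dim, 2)    # source vs. target language
                self.lambda_adv = lambda_adv

            def forward(self, pair_features):
                h = self.encoder(pair_features)
                task_logits = self.similarity_head(h)
                # The language discriminator trains on reversed gradients, so improving it
                # pushes the encoder toward language-invariant features.
                lang_logits = self.language_head(GradReverse.apply(h, self.lambda_adv))
                return task_logits, lang_logits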

    A Call for More Rigor in Unsupervised Cross-lingual Learning

    We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them. An existing rationale for such research is based on the lack of parallel data for many of the world's languages. However, we argue that a scenario without any parallel data and abundant monolingual data is unrealistic in practice. We also discuss different training signals that have been used in previous work, which depart from the pure unsupervised setting. We then describe common methodological issues in tuning and evaluation of unsupervised cross-lingual models and present best practices. Finally, we provide a unified outlook for different types of research in this area (i.e., cross-lingual word embeddings, deep multilingual pretraining, and unsupervised machine translation) and argue for comparable evaluation of these models. Comment: ACL 2020

    Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension

    Multilingual pre-trained models can leverage training data from a rich source language (such as English) to improve performance on low-resource languages. However, transfer quality for multilingual Machine Reading Comprehension (MRC) is significantly worse than for sentence classification tasks, mainly because MRC requires detecting word-level answer boundaries. In this paper, we propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision: (1) a mixed MRC task, which translates the question or passage into other languages and builds cross-lingual question-passage pairs; (2) a language-agnostic knowledge masking task that leverages knowledge phrases mined from the web. Extensive experiments on two cross-lingual MRC datasets show the effectiveness of our proposed approach. Comment: Accepted to ACL 2020
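
    As a concrete illustration of the second auxiliary task, the toy Python function below masks a mined knowledge phrase in a passage and records its span as synthetic boundary supervision. It is only a sketch of the idea; the function name, masking token, and data format are assumptions, not the authors' implementation.

        def build_knowledge_masking_example(tokens, phrase_tokens, mask_token="[MASK]"):
            """Mask the first occurrence of a mined knowledge phrase and return its span.

            Returns (masked_tokens, start, end) or None if the phrase is absent.
            """
            n, m = len(tokens), len(phrase_tokens)
            for start in range(n - m + 1):
                if tokens[start:start + m] == phrase_tokens:
                    masked = tokens[:start] + [mask_token] * m + tokens[start + m:]
                    return masked, start, start + m - 1
            return None

        # Toy usage: treat "machine reading comprehension" as a mined knowledge phrase.
        passage = "multilingual machine reading comprehension needs boundary supervision".split()
        example = build_knowledge_masking_example(passage, "machine reading comprehension".split())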

    Comparative Study of Machine Learning Models and BERT on SQuAD

    This study aims to provide a comparative analysis of the performance of several models popular in machine learning and of the BERT model on the Stanford Question Answering Dataset (SQuAD). The analysis shows that the BERT model, which was once state-of-the-art on SQuAD, gives higher accuracy than the other models. However, BERT requires a greater execution time even when only 100 samples are used, showing that the gain in accuracy comes at the cost of additional training time. In contrast, the simpler machine learning models have lower execution time on the full data, but their accuracy is compromised.

    Neural Machine Translation for Query Construction and Composition

    Research on question answering with knowledge bases has recently seen an increasing use of deep architectures. In this extended abstract, we study the application of the neural machine translation paradigm to question parsing. We employ a sequence-to-sequence model to learn graph patterns in the SPARQL graph query language and their compositions. Instead of inducing the programs through question-answer pairs, we adopt a semi-supervised approach, where alignments between questions and queries are built through templates. We argue that the coverage of language utterances can be expanded using recent notable work in natural language generation. Comment: ICML workshop on Neural Abstract Machines & Program Induction v2 (NAMPI), extended abstract
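
    The template-based alignment between questions and SPARQL queries can be pictured as follows: each template pairs a question pattern with a query pattern, and instantiating both with the same entity yields a (question, query) training pair for the sequence-to-sequence model. The templates, DBpedia properties, and entities below are invented for illustration and are not taken from the paper.

        # Hypothetical templates pairing a question pattern with a SPARQL query pattern.
        TEMPLATES = [
            ("who wrote {work}",
             "SELECT ?author WHERE {{ <{work}> <http://dbpedia.org/ontology/author> ?author }}"),
            ("where was {person} born",
             "SELECT ?place WHERE {{ <{person}> <http://dbpedia.org/ontology/birthPlace> ?place }}"),
        ]

        ENTITIES = {
            "work": ["http://dbpedia.org/resource/Dracula"],
            "person": ["http://dbpedia.org/resource/Ada_Lovelace"],
        }

        def generate_pairs():
            """Instantiate each template with every known entity to get (question, query) pairs."""
            pairs = []
            for question_tpl, query_tpl in TEMPLATES:
                slot = "work" if "{work}" in question_tpl else "person"
                for uri in ENTITIES[slot]:
                    label = uri.rsplit("/", 1)[-1].replace("_", " ")
                    pairs.append((question_tpl.format(**{slot: label}),
                                  query_tpl.format(**{slot: uri})))
            return pairs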

    Multilingual Extractive Reading Comprehension by Runtime Machine Translation

    Despite recent work in Reading Comprehension (RC), progress has been mostly limited to English due to the lack of large-scale datasets in other languages. In this work, we introduce the first RC system for languages without RC training data. Given a target language without RC training data and a pivot language with RC training data (e.g., English), our method leverages existing RC resources in the pivot language by combining a competitive RC model in the pivot language with an attentive Neural Machine Translation (NMT) model. We first translate the data from the target to the pivot language, and then obtain an answer using the RC model in the pivot language. Finally, we recover the corresponding answer in the original language using soft-alignment attention scores from the NMT model. We create evaluation sets of RC data in two non-English languages, namely Japanese and French, to evaluate our method. Experimental results on these datasets show that our method significantly outperforms a back-translation baseline of a state-of-the-art product-level machine translation system.
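
    The runtime pipeline the abstract describes, translate into the pivot language, answer there, then project the answer back through the translation model's attention, can be sketched as below. The three callables are placeholders standing in for an attentive NMT system, a pivot-language RC model, and a span-alignment routine; they are assumptions, not an actual API.

        def answer_in_target_language(question_tgt, passage_tgt,
                                      translate_with_attention, rc_model, align_span_back):
            # 1) Target -> pivot translation, keeping the soft-alignment (attention) matrix.
            question_piv, _ = translate_with_attention(question_tgt)
            passage_piv, attention = translate_with_attention(passage_tgt)

            # 2) Extractive RC in the pivot language returns a token span.
            start_piv, end_piv = rc_model(question_piv, passage_piv)

            # 3) Recover the corresponding span in the original passage from the attention scores.
            start_tgt, end_tgt = align_span_back(attention, start_piv, end_piv)
            return passage_tgt[start_tgt:end_tgt + 1]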

    Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech

    In this paper, we explore the learning of neural network embeddings for natural images and speech waveforms describing the content of those images. These embeddings are learned directly from the waveforms without the use of linguistic transcriptions or conventional speech recognition technology. While prior work has investigated this setting in the monolingual case using English speech data, this work represents the first effort to apply these techniques to languages beyond English. Using spoken captions collected in English and Hindi, we show that the same model architecture can be successfully applied to both languages. Further, we demonstrate that training a multilingual model simultaneously on both languages offers improved performance over the monolingual models. Finally, we show that these models are capable of performing semantic cross-lingual speech-to-speech retrieval. Comment: To appear at ICASSP 2018
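
    Cross-modal embedding models of this kind are typically trained with a ranking objective that scores matched image/spoken-caption pairs above mismatched ones within a batch. The PyTorch snippet below is a generic sketch of such a margin ranking loss under that assumption; it is not necessarily the paper's exact objective, and the margin value and normalization choices are illustrative.

        import torch

        def ranking_loss(image_emb, caption_emb, margin=1.0):
            """image_emb, caption_emb: (batch, dim) embeddings of matched image/caption pairs."""
            scores = image_emb @ caption_emb.t()          # (batch, batch) similarity matrix
            pos = scores.diag().unsqueeze(1)              # matched pairs lie on the diagonal
            # Hinge in both directions: image -> caption impostors and caption -> image impostors.
            cost_c = (margin + scores - pos).clamp(min=0)
            cost_i = (margin + scores - pos.t()).clamp(min=0)
            mask = torch.eye(scores.size(0), dtype=torch.bool)
            return (cost_c.masked_fill(mask, 0) + cost_i.masked_fill(mask, 0)).mean()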

    A Joint Model for Question Answering and Question Generation

    We propose a generative machine comprehension model that learns jointly to ask and answer questions based on documents. The proposed model uses a sequence-to-sequence framework that encodes the document and generates a question (answer) given an answer (question). Significant improvement in model performance is observed empirically on the SQuAD corpus, confirming our hypothesis that the model benefits from jointly learning to perform both tasks. We believe the joint model's novelty offers a new perspective on machine comprehension beyond architectural engineering, and serves as a first step towards autonomous information seeking.
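
    A joint model of this kind is often realized as a single encoder-decoder that alternates between answering and asking objectives during training. The step below sketches that alternation under an assumed model interface; the mode argument, batch keys, and loss combination are placeholders rather than the paper's actual implementation.

        def joint_training_step(model, qa_batch, qg_batch, optimizer):
            optimizer.zero_grad()
            # Answering: encode (document, question), decode the answer.
            loss_qa = model(docs=qa_batch["docs"], condition=qa_batch["questions"],
                            target=qa_batch["answers"], mode="answer")
            # Asking: encode (document, answer), decode the question.
            loss_qg = model(docs=qg_batch["docs"], condition=qg_batch["answers"],
                            target=qg_batch["questions"], mode="ask")
            (loss_qa + loss_qg).backward()   # shared parameters receive both gradients
            optimizer.step()
            return loss_qa.item(), loss_qg.item()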

    Learning to Represent Words in Context with Multilingual Supervision

    We present a neural network architecture based on bidirectional LSTMs to compute representations of words in their sentential contexts. These context-sensitive word representations are suitable for, e.g., distinguishing different word senses and other context-modulated variations in meaning. To learn the parameters of our model, we use cross-lingual supervision, hypothesizing that a good representation of a word in context will be one that is sufficient for selecting the correct translation into a second language. We evaluate the quality of our representations as features in three downstream tasks: prediction of semantic supersenses (which assign nouns and verbs into a few dozen semantic classes), low-resource machine translation, and a lexical substitution task, and we obtain state-of-the-art results on all of these.
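
    The backbone described here, a bidirectional LSTM that produces one context-sensitive vector per token, can be sketched in a few lines of PyTorch. The vocabulary size and dimensions below are arbitrary, and the cross-lingual translation-selection objective used for training is not shown.

        import torch
        import torch.nn as nn

        class ContextEncoder(nn.Module):
            def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

            def forward(self, token_ids):
                # token_ids: (batch, seq_len) -> (batch, seq_len, 2 * hidden_dim)
                outputs, _ = self.bilstm(self.embed(token_ids))
                return outputs

        encoder = ContextEncoder()
        vectors = encoder(torch.randint(0, 10000, (2, 7)))   # two toy sentences of seven tokens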

    Deep Learning for Sentiment Analysis : A Survey

    Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with its success in many other application domains, deep learning has also been widely applied to sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis. Comment: 34 pages, 9 figures, 2 tables