A Comparative Study of Word Embeddings for Reading Comprehension
The focus of past machine learning research for Reading Comprehension tasks
has been primarily on the design of novel deep learning architectures. Here we
show that seemingly minor choices made in (1) the use of pre-trained word
embeddings and (2) the representation of out-of-vocabulary tokens at test
time can have a larger impact on final performance than architectural
choices. We systematically explore several options for these choices,
and provide recommendations to researchers working in this area.
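To make the second choice concrete, here is a minimal sketch (an illustration, not the paper's code, with a toy stand-in vocabulary) of two common test-time OOV strategies: one shared UNK vector versus a distinct random-but-fixed vector per unseen token.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
# toy stand-in for a pre-trained vocabulary such as GloVe (assumption)
vocab = {"the": rng.normal(size=dim), "cat": rng.normal(size=dim)}

unk = np.zeros(dim)   # strategy 1: every OOV token shares one UNK vector
oov_cache = {}        # strategy 2: each OOV token gets its own fixed random vector

def embed(token, strategy="shared_unk"):
    if token in vocab:
        return vocab[token]
    if strategy == "shared_unk":
        return unk
    if token not in oov_cache:  # cached so repeated mentions stay consistent
        oov_cache[token] = rng.normal(scale=0.1, size=dim)
    return oov_cache[token]

# the same unseen token maps to the same vector under strategy 2
v1 = embed("zyzzyva", strategy="per_token_random")
v2 = embed("zyzzyva", strategy="per_token_random")
print(np.allclose(v1, v2))  # True
```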
Explicit Utilization of General Knowledge in Machine Reading Comprehension
To bridge the gap between Machine Reading Comprehension (MRC) models and
human beings, which is mainly reflected in MRC models' hunger for data and
lack of robustness to noise, in this paper we explore how to integrate the
neural networks of MRC models with the general knowledge of human beings. On the one
hand, we propose a data enrichment method, which uses WordNet to extract
inter-word semantic connections as general knowledge from each given
passage-question pair. On the other hand, we propose an end-to-end MRC model
named Knowledge Aided Reader (KAR), which explicitly uses the extracted
general knowledge to assist its attention mechanisms. Based on the data
enrichment method, KAR is comparable in performance to the state-of-the-art
MRC models, and is significantly more robust to noise than they are. When only
a subset (20%-80%) of the training examples is available, KAR outperforms the
state-of-the-art MRC models by a large margin, and is still reasonably robust
to noise.
Comment: ACL 201
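A rough sketch of the kind of WordNet-based connection mining the abstract describes (the linking rule here is an assumption, not KAR's exact extraction procedure): two words are treated as connected if their synsets, or their synsets' hypernyms, overlap.

```python
# requires a one-time nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def synsets_and_hypernyms(word):
    related = set()
    for s in wn.synsets(word):
        related.add(s)
        related.update(s.hypernyms())
    return related

def connected(w1, w2):
    # "connected" if the two words' semantic neighborhoods intersect
    return bool(synsets_and_hypernyms(w1) & synsets_and_hypernyms(w2))

passage_words = ["dog", "barked", "loudly"]
question_words = ["canine", "sound"]
pairs = [(p, q) for p in passage_words for q in question_words if connected(p, q)]
print(pairs)  # e.g. [('dog', 'canine')]
```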
Learning to Compute Word Embeddings On the Fly
Words in natural language follow a Zipfian distribution whereby some words
are frequent but most are rare. Learning representations for words in the "long
tail" of this distribution requires enormous amounts of data. Representations
of rare words trained directly on end tasks are usually poor, requiring us to
pre-train embeddings on external data, or treat all rare words as
out-of-vocabulary words with a unique representation. We provide a method for
predicting embeddings of rare words on the fly from small amounts of auxiliary
data with a network trained end-to-end for the downstream task. We show that
this approach improves results over baselines in which embeddings are trained
directly on the end task, for reading comprehension, recognizing textual
entailment, and language modeling.
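A minimal sketch of the idea (the pooling-plus-projection architecture here is an assumption, not the paper's network): encode a rare word's auxiliary text, e.g. a dictionary definition, into an embedding that can be trained end-to-end with the task.

```python
import torch
import torch.nn as nn

class DefinitionEncoder(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)  # embeddings of common words
        self.proj = nn.Linear(dim, dim)           # maps pooled definition to a word vector

    def forward(self, definition_ids):            # (batch, def_len)
        pooled = self.emb(definition_ids).mean(dim=1)  # simple mean pooling
        return self.proj(pooled)                  # predicted embedding for the rare word

enc = DefinitionEncoder(vocab_size=10_000, dim=100)
definition = torch.randint(0, 10_000, (1, 12))    # toy token ids for a definition
rare_word_vec = enc(definition)                   # used wherever the rare word appears
print(rare_word_vec.shape)                        # torch.Size([1, 100])
```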
Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension
We propose a machine reading comprehension model based on the
compare-aggregate framework with two-staged attention that achieves
state-of-the-art results on the MovieQA question answering dataset. To
investigate the limitations of our model as well as the behavioral difference
between convolutional and recurrent neural networks, we generate adversarial
examples to confuse the model and compare its performance to that of humans.
Furthermore, we assess the generalizability of our model by analyzing its
differences from human inference.
Comment: CoNLL 201
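A schematic sketch of the compare-aggregate pattern named in the abstract (not the authors' two-staged model; shapes and layer choices are assumptions): attend from context to question, compare element-wise, then aggregate with a convolution.

```python
import torch
import torch.nn.functional as F

def compare_aggregate(question, context):
    # question: (q_len, dim), context: (c_len, dim)
    attn = F.softmax(context @ question.T, dim=-1)  # (c_len, q_len) attention weights
    aligned = attn @ question                       # question summary per context token
    compared = context * aligned                    # element-wise comparison
    # aggregate over the sequence; a real model would define this layer once
    conv = torch.nn.Conv1d(compared.size(1), 64, kernel_size=3, padding=1)
    features = conv(compared.T.unsqueeze(0))        # (1, 64, c_len)
    return features.max(dim=-1).values              # (1, 64) pooled representation

q, c = torch.randn(8, 100), torch.randn(30, 100)
print(compare_aggregate(q, c).shape)  # torch.Size([1, 64])
```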
Comparative Study of Machine Learning Models and BERT on SQuAD
This study aims to provide a comparative analysis of the performance of
several popular machine learning models and the BERT model on the Stanford
Question Answering Dataset (SQuAD). The analysis shows that the BERT model,
which was once state-of-the-art on SQuAD, gives higher accuracy than the other
models. However, BERT requires far more execution time, even when only 100
samples are used, which shows that the gain in accuracy comes at the cost of
more time invested in training. The preliminary machine learning models, by
contrast, have lower execution time on the full data, but their accuracy is
compromised.
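A toy sketch of the accuracy-versus-training-time measurement the study describes (synthetic data and a simple classifier as stand-ins; the study's models and dataset are not reproduced here):

```python
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=100, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for n in (100, len(X_tr)):                # small subsample vs. full training data
    clf = LogisticRegression(max_iter=1000)
    t0 = time.perf_counter()
    clf.fit(X_tr[:n], y_tr[:n])
    elapsed = time.perf_counter() - t0
    print(f"n={n:5d}  time={elapsed:.3f}s  acc={clf.score(X_te, y_te):.3f}")
```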
A Deterministic Algorithm for Bridging Anaphora Resolution
Previous work on bridging anaphora resolution (Poesio et al., 2004; Hou et
al., 2013b) uses syntactic preposition patterns to calculate word relatedness.
However, such patterns only consider NPs' head nouns and hence do not fully
capture the semantics of NPs. Recently, Hou (2018) created word embeddings
(embeddings_PP) to capture associative similarity (i.e., relatedness) between
nouns by exploiting the syntactic structure of noun phrases, but embeddings_PP
only contains word representations for nouns. In this paper, we create new
word vectors by combining embeddings_PP with GloVe. The resulting word
embeddings (embeddings_bridging) are a more general lexical knowledge resource
for bridging and allow us to easily represent the meaning of an NP beyond its
head. We therefore develop a deterministic approach to bridging anaphora
resolution which represents the semantics of an NP based on its head noun and
modifiers. We show that this simple approach achieves results competitive with
those of the best system in Hou et al. (2013b), which uses Markov Logic
Networks to model the problem. Additionally, we further improve the results
for bridging anaphora resolution reported in Hou (2018) by combining our
simple deterministic approach with Hou et al. (2013b)'s best system, MLN II.
Comment: 11 pages
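A small sketch of the two ideas (random stand-in vectors, not the actual GloVe or embeddings_PP resources): combine two embedding spaces by concatenation, then represent an NP by its head noun plus its modifiers and score candidates with cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(1)
words = ("door", "house", "wooden")
glove = {w: rng.normal(size=100) for w in words}    # stand-in for GloVe
emb_pp = {w: rng.normal(size=100) for w in words}   # stand-in for embeddings_PP

def combined(word):
    # embeddings_bridging-style vector: concatenation of the two spaces
    return np.concatenate([glove[word], emb_pp[word]])

def np_vector(head, modifiers):
    vecs = [combined(head)] + [combined(m) for m in modifiers]
    return np.mean(vecs, axis=0)  # head plus modifiers, not just the head

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

anaphor = np_vector("door", ["wooden"])
antecedent = np_vector("house", [])
print(round(float(cosine(anaphor, antecedent)), 3))
```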
Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
In the NLP community, recent years have seen a surge of research activity
addressing machines' ability to perform deep language understanding, which
goes beyond what is explicitly stated in text and instead relies on reasoning
and knowledge of the world. Many benchmark tasks and datasets have been created to
support the development and evaluation of such natural language inference
ability. As these benchmarks become instrumental and a driving force for the
NLP research community, this paper aims to provide an overview of recent
benchmarks, relevant knowledge resources, and state-of-the-art learning and
inference approaches in order to support a better understanding of this growing
field.
The emergent algebraic structure of RNNs and embeddings in NLP
We examine the algebraic and geometric properties of a uni-directional GRU
and word embeddings trained end-to-end on a text classification task. A
hyperparameter search over word embedding dimension, GRU hidden dimension, and
a linear combination of the GRU outputs is performed. We conclude that words
naturally embed themselves in a Lie group and that RNNs form a nonlinear
representation of the group. Appealing to these results, we propose a novel
class of recurrent-like neural networks and a word embedding scheme.
Comment: 24 pages, 16 figures
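A bare-bones sketch of the system studied (dimensions and pooling are assumptions, not the paper's exact setup): word embeddings plus a uni-directional GRU trained end-to-end for text classification, with a linear readout of the GRU outputs.

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=64, hid_dim=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, n_classes)  # linear combination of GRU outputs

    def forward(self, token_ids):                 # (batch, seq_len)
        outputs, _ = self.gru(self.emb(token_ids))
        return self.out(outputs.mean(dim=1))      # pool over time, then classify

model = GRUClassifier()
logits = model(torch.randint(0, 5000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
```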
Doc2Im: document to image conversion through self-attentive embedding
Text classification is a fundamental task in NLP applications. Recent
research in this field has largely been divided into two sub-fields: learning
representations, and learning deeper models, both sequential and
convolutional, which in turn connect back to the representation. We posit
that the stronger the representation, the simpler the classifier model needed
to achieve high performance. In this
paper we propose a completely novel direction for text classification research,
wherein we convert text to a representation very similar to images, such that
any deep network able to handle images is equally able to handle text. We take
a deeper look at the representation of documents as an image and subsequently
utilize very simple convolution-based models taken as-is from the computer
vision domain. This image can be cropped, re-scaled, re-sampled, and augmented
just like any other image, and it works with most state-of-the-art large
convolution-based models designed to handle large image datasets. We show
impressive results on some of the latest benchmarks in the
related fields. We perform transfer learning experiments, both from text to
text domain and also from image to text domain. We believe this is a paradigm
shift from the way document understanding and text classification have
traditionally been done, one that will drive numerous novel research ideas in
the community.
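A speculative sketch of the core conversion (one reading of the idea, not the authors' method; grid size and layout are assumptions): stack token embeddings into a fixed 2-D grid so a document becomes an "image" any standard CNN can consume.

```python
import torch
import torch.nn as nn

def doc_to_image(token_ids, emb, side=32):
    vecs = emb(token_ids)              # (seq_len, dim)
    flat = vecs.flatten()
    canvas = torch.zeros(side * side)
    n = min(flat.numel(), canvas.numel())
    canvas[:n] = flat[:n]              # crop/pad to a fixed square
    return canvas.view(1, side, side)  # 1-channel "image"

emb = nn.Embedding(1000, 16)
img = doc_to_image(torch.randint(0, 1000, (50,)), emb)
cnn = nn.Conv2d(1, 8, kernel_size=3)   # any vision model could go here
print(cnn(img.unsqueeze(0)).shape)     # torch.Size([1, 8, 30, 30])
```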
Semi-Supervised Few-Shot Learning for Dual Question-Answer Extraction
This paper addresses the problem of key phrase extraction from sentences.
Existing state-of-the-art supervised methods require large amounts of annotated
data to achieve good performance and generalization. Collecting labeled data
is, however, often expensive. In this paper, we redefine the problem as
question-answer extraction, and present SAMIE: Self-Asking Model for
Information Extraction, a semi-supervised model which dually learns to ask and
to answer questions by itself. Briefly, given a sentence and an answer, the
model needs to choose the most appropriate question; meanwhile, for the given
sentence and the same question selected in the previous step, the model will
predict an answer. The model can support few-shot
learning with very limited supervision. It can also be used to perform
clustering analysis when no supervision is provided. Experimental results show
that the proposed method outperforms typical supervised methods, especially
when given little labeled data.
Comment: 7 pages, 5 figures, submission to IJCAI1
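A schematic sketch of the dual loop (the dot-product scoring functions and candidate sets are invented for illustration, not SAMIE itself): pick the best question for a (sentence, answer) pair, then predict an answer for that same (sentence, question) pair.

```python
import torch

torch.manual_seed(0)
dim, n_questions, n_answers = 32, 5, 7
sent = torch.randn(dim)
questions = torch.randn(n_questions, dim)  # candidate question templates
answers = torch.randn(n_answers, dim)      # candidate answer spans

def ask(sentence, answer):
    # choose the question that best fits the (sentence, answer) pair
    scores = questions @ (sentence + answer)
    return scores.argmax().item()

def answer_q(sentence, q_idx):
    # predict an answer for the (sentence, question) pair
    scores = answers @ (sentence + questions[q_idx])
    return scores.argmax().item()

q = ask(sent, answers[0])
a_hat = answer_q(sent, q)  # ideally recovers the original answer's index
print(q, a_hat)
```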