59 research outputs found

    Sentence Embeddings in NLI with Iterative Refinement Encoders

    Sentence-level representations are necessary for various NLP tasks. Recurrent neural networks have proven to be very effective in learning distributed representations and can be trained efficiently on natural language inference tasks. We build on top of one such model and propose a hierarchy of BiLSTM and max pooling layers that implements an iterative refinement strategy and yields state-of-the-art results on the SciTail dataset as well as strong results for SNLI and MultiNLI. We show that the sentence embeddings learned in this way can be utilized in a wide variety of transfer learning tasks, outperforming InferSent on 7 out of 10 and SkipThought on 8 out of 9 SentEval sentence embedding evaluation tasks. Furthermore, our model beats the InferSent model in 8 out of 10 recently published SentEval probing tasks designed to evaluate the ability of sentence embeddings to capture some of the important linguistic properties of sentences. Peer reviewed
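The iterative refinement idea can be illustrated with a minimal pure-Python sketch. It assumes, as we read the design, that every recurrent layer re-reads the same input embeddings while starting from the previous layer's final state, that each layer's outputs are max-pooled over time, and that the pooled vectors are concatenated into the sentence embedding. To stay short and dependency-free, a single-direction tanh RNN stands in for each BiLSTM, and all weights and dimensions are hypothetical toy values.

```python
import math
import random


def tanh_rnn(inputs, h0, w_in, w_rec):
    """Run a simple tanh RNN (a stand-in for one BiLSTM layer) over `inputs`.

    Returns every per-step hidden state and the final hidden state.
    """
    h = list(h0)
    states = []
    for x in inputs:
        # The comprehension reads the previous h before it is rebound.
        h = [
            math.tanh(
                sum(w * xj for w, xj in zip(w_in[i], x))
                + sum(w * hk for w, hk in zip(w_rec[i], h))
            )
            for i in range(len(h))
        ]
        states.append(h)
    return states, h


def max_pool_over_time(states):
    """Element-wise maximum across time steps."""
    return [max(column) for column in zip(*states)]


def hierarchical_max_pool_encode(tokens, layers, hidden_size):
    """Stack recurrent layers that each re-read the input, starting from the
    previous layer's final state (iterative refinement), then concatenate the
    per-layer max-pooled outputs into one sentence embedding."""
    h = [0.0] * hidden_size
    pooled = []
    for w_in, w_rec in layers:
        states, h = tanh_rnn(tokens, h, w_in, w_rec)
        pooled.append(max_pool_over_time(states))
    return [value for vector in pooled for value in vector]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
emb_dim, hidden_size, num_layers = 4, 3, 3
sentence = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(5)]
layers = [
    (random_matrix(hidden_size, emb_dim, rng), random_matrix(hidden_size, hidden_size, rng))
    for _ in range(num_layers)
]
embedding = hierarchical_max_pool_encode(sentence, layers, hidden_size)
print(len(embedding))  # 9: three layers x hidden size 3
```

With three layers of hidden size 3 the embedding has 9 dimensions; a real BiLSTM would double each layer's contribution because forward and backward states are concatenated.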

    Combining contextualized and non-contextualized embeddings for domain adaptation and beyond


    Enhancing the Reasoning Capabilities of Natural Language Inference Models with Attention Mechanisms and External Knowledge

    Natural Language Inference (NLI) is fundamental to natural language understanding. The task summarises natural language understanding capabilities within the simple formulation of determining whether a natural language hypothesis can be inferred from a given natural language premise. NLI requires an inference system to address the full complexity of linguistic as well as real-world commonsense knowledge; hence, the inferencing and reasoning capabilities of an NLI system are utilised in other complex language applications such as summarisation and machine comprehension. Consequently, NLI has received significant recent attention from both academia and industry. Despite extensive research, contemporary neural NLI models face challenges arising from their sole reliance on training data to comprehend all the linguistic and real-world commonsense knowledge. Further, different attention mechanisms, crucial to the success of neural NLI models, could be utilised better when employed in combination. In addition, the NLI research field lacks a coherent set of guidelines for applying one of the most crucial regularisation hyper-parameters in RNN-based NLI models: dropout. In this thesis, we present neural models that leverage attention mechanisms and models that utilise external knowledge to reason about inference. First, a combined attention model that leverages different attention mechanisms is proposed. Experimentation demonstrates that the proposed model is better at modelling the semantics of long and complex sentences. Second, to address the limitation of sole reliance on the training data, two novel neural frameworks utilising real-world commonsense and domain-specific external knowledge are introduced.
Employing rule-based external knowledge retrieval from knowledge graphs, the first model takes advantage of convolutional encoders and factorised bilinear pooling to augment the reasoning capabilities of state-of-the-art NLI models. Utilising significant advances in research on contextual word representations, the second model addresses, in unique ways, the existing crucial challenges of external knowledge retrieval, learning the encoding of the retrieved knowledge, and fusing the learned encodings with the NLI representations. Experimentation demonstrates the efficacy and superiority of the proposed models over previous state-of-the-art approaches. Third, to address the lack of guidelines for dropout, a coherent set of guidelines is introduced, formulated from exhaustive evaluation, analysis and validation on the proposed RNN-based NLI models.
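Factorised bilinear pooling, which the first knowledge-based model employs, can be sketched as follows. This assumes the common multi-modal factorised bilinear (MFB) formulation, in which each output is a rank-k bilinear form computed by projecting both inputs, multiplying element-wise, and sum-pooling groups of k products; the thesis's exact variant, including any normalisation steps, may differ, and all sizes here are hypothetical.

```python
import random


def project(matrix, vector):
    """Matrix-vector product; `matrix` is a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def factorised_bilinear_pool(x, y, U, V, k):
    """Rank-k factorised bilinear pooling of x and y.

    U has k*o rows of length len(x); V has k*o rows of length len(y).
    Output j sum-pools k consecutive element-wise products:
        z_j = sum_{r<k} (U x)_{jk+r} * (V y)_{jk+r}
    which equals the bilinear form x^T W_j y with a rank-k matrix W_j.
    """
    px, py = project(U, x), project(V, y)
    products = [a * b for a, b in zip(px, py)]
    return [sum(products[j:j + k]) for j in range(0, len(products), k)]


rng = random.Random(1)
dim_x, dim_y, out_dim, k = 4, 3, 2, 5
U = [[rng.gauss(0, 1) for _ in range(dim_x)] for _ in range(out_dim * k)]
V = [[rng.gauss(0, 1) for _ in range(dim_y)] for _ in range(out_dim * k)]
x = [rng.gauss(0, 1) for _ in range(dim_x)]  # e.g. a sentence-pair feature vector
y = [rng.gauss(0, 1) for _ in range(dim_y)]  # e.g. an encoded knowledge vector
z = factorised_bilinear_pool(x, y, U, V, k)
```

The saving over a full bilinear layer comes from storing k*o*(m + n) weights instead of m*n*o for inputs of sizes m and n; it only pays off when k is much smaller than the input sizes, which is not the case for the toy dimensions above.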

    Efficient Beam Tree Recursion

    Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a simple extension of Gumbel Tree RvNN, and it was shown to achieve state-of-the-art length generalization performance in ListOps while maintaining comparable performance on other tasks. However, although not the worst of its kind, BT-RvNN can still be exorbitantly expensive in memory usage. In this paper, we identify the main bottleneck in BT-RvNN's memory usage to be the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by 10-16 times but also create a new state-of-the-art in ListOps while maintaining similar performance in other tasks. In addition, we propose a strategy to utilize the induced latent-tree node representations produced by BT-RvNN to turn BT-RvNN from a sentence encoder of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence contextualizer of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models.
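The beam search at the core of BT-RvNN can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the paper's implementation: `compose` and `score` are hypothetical toy functions standing in for the recursive cell and the scorer, each beam keeps the cumulative log-probability of its merge decisions, and the final sentence vector is assumed to be the score-weighted sum of the beam roots. It shows the control flow only, not the memory-saving changes the paper proposes.

```python
import math


def compose(left, right):
    """Hypothetical recursive cell: merge two child vectors into a parent."""
    return [math.tanh(a + b) for a, b in zip(left, right)]


def score(left, right):
    """Hypothetical scorer for merging an adjacent pair."""
    return sum(a * b for a, b in zip(left, right))


def log_softmax(values):
    peak = max(values)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in values))
    return [v - log_z for v in values]


def beam_tree_encode(leaves, beam_size):
    """Beam search over bottom-up merge orders.

    Each beam holds (cumulative log-probability, current node sequence).
    At every step, all adjacent pairs in every beam are scored, the top
    `beam_size` merges overall are kept, and the loop ends when every beam
    has been reduced to a single root vector.
    """
    beams = [(0.0, [list(v) for v in leaves])]
    while len(beams[0][1]) > 1:
        candidates = []
        for log_p, nodes in beams:
            pair_scores = [score(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
            for i, pair_log_p in enumerate(log_softmax(pair_scores)):
                merged = nodes[:i] + [compose(nodes[i], nodes[i + 1])] + nodes[i + 2:]
                candidates.append((log_p + pair_log_p, merged))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    # Combine the beam roots, weighted by their normalised beam scores.
    weights = [math.exp(w) for w in log_softmax([lp for lp, _ in beams])]
    dim = len(leaves[0])
    return [sum(w * nodes[0][d] for w, (_, nodes) in zip(weights, beams)) for d in range(dim)]


leaves = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.0]]
sentence_vector = beam_tree_encode(leaves, beam_size=3)
```

Setting `beam_size=1` reduces the procedure to greedy merging, which is roughly the deterministic counterpart of a Gumbel Tree RvNN at inference time.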

    FarsTail: A Persian Natural Language Inference Dataset

    Natural language inference (NLI) is known as one of the central tasks in natural language processing (NLP), encapsulating many fundamental aspects of language understanding. With the considerable achievements of data-hungry deep learning methods in NLP tasks, a great amount of effort has been devoted to developing more diverse datasets for different languages. In this paper, we present a new dataset for the NLI task in the Persian language, also known as Farsi, which is one of the dominant languages in the Middle East. This dataset, named FarsTail, includes 10,367 samples, which are provided both in Persian and in an indexed format so that they are useful for non-Persian researchers. The samples are generated from 3,539 multiple-choice questions with minimal annotator intervention, in a way similar to the SciTail dataset. A carefully designed multi-step process is adopted to ensure the quality of the dataset. We also present the results of traditional and state-of-the-art methods on FarsTail, including different embedding methods such as word2vec, fastText, ELMo, BERT, and LASER, as well as different modeling approaches such as DecompAtt, ESIM, HBMP, and ULMFiT, to provide a solid baseline for future research. The best test accuracy obtained is 83.38%, which shows that there is considerable room for improving current methods before they are useful for real-world NLP applications in different languages. We also investigate the extent to which the models exploit superficial cues, also known as dataset biases, in FarsTail, and partition the test set into easy and hard subsets according to the success of biased models. The dataset is available at https://github.com/dml-qom/FarsTai
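The easy/hard partition can be expressed as a small filtering step. The sketch assumes each biased model (for example, a hypothesis-only classifier) supplies one predicted label per test sample and, by default, treats a sample as easy when any biased model gets it right; the exact criterion used for FarsTail may differ, so the rule is a parameter. The model names, labels, and predictions below are hypothetical and made up for illustration.

```python
def split_easy_hard(gold_labels, biased_predictions, rule=any):
    """Partition test-sample indices by whether biased models solve them.

    gold_labels: list of gold labels, one per test sample.
    biased_predictions: list of prediction lists, one list per biased model.
    rule: `any` marks a sample easy if at least one biased model is correct;
          `all` requires every biased model to be correct.
    """
    easy, hard = [], []
    for idx, gold in enumerate(gold_labels):
        if rule(predictions[idx] == gold for predictions in biased_predictions):
            easy.append(idx)
        else:
            hard.append(idx)
    return easy, hard


gold = ["entailment", "contradiction", "neutral", "entailment"]
hypothesis_only = ["entailment", "neutral", "neutral", "contradiction"]  # hypothetical biased model
word_overlap = ["entailment", "contradiction", "entailment", "neutral"]  # hypothetical biased model

easy, hard = split_easy_hard(gold, [hypothesis_only, word_overlap])
print(easy, hard)  # [0, 1, 2] [3]
```

Switching to `rule=all` makes the easy subset stricter: with the same toy data it shrinks to `[0]`.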

    Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks

    Textual entailment is a fundamental task in natural language processing. Most approaches to the problem use only the textual content present in the training data. A few approaches have shown that information from external knowledge sources such as knowledge graphs (KGs) can add value beyond the textual content by providing background knowledge that may be critical for a task. However, the proposed models do not fully exploit the information in the usually large and noisy KGs, and it is not clear how this information can be effectively encoded to be useful for entailment. We present an approach that complements text-based entailment models with information from KGs by (1) using Personalized PageRank to generate contextual subgraphs with reduced noise and (2) encoding these subgraphs using graph convolutional networks to capture KG structure. Our technique extends the capability of text models by exploiting structural and semantic information found in KGs. We evaluate our approach on multiple textual entailment datasets and show that the use of external knowledge helps improve prediction accuracy. This is particularly evident on the challenging BreakingNLI dataset, where we see an absolute improvement of 5-20% over multiple text-based entailment models.
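Personalized PageRank, the first step of the approach, can be sketched with plain power iteration: the random walk restarts at seed concepts, so nodes close to the sentence pair accumulate rank and distant noise does not. The toy concept graph, the seed concepts, the restart probability `alpha=0.15`, and the top-k cutoff are all hypothetical choices for illustration; the paper's actual graph and subgraph-selection criterion may differ, and the graph-convolution encoding step is omitted.

```python
def personalized_pagerank(graph, seeds, alpha=0.15, iterations=50):
    """Power iteration for Personalized PageRank.

    graph: dict mapping each node to a list of neighbours.
    seeds: nodes the random walk restarts from (e.g. concepts in the sentence pair).
    alpha: restart probability.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if neighbours:
                share = (1 - alpha) * rank[n] / len(neighbours)
                for m in neighbours:
                    new_rank[m] += share
            else:
                # Dangling node: return its walk mass to the seeds.
                for s in seeds:
                    new_rank[s] += (1 - alpha) * rank[n] / len(seeds)
        rank = new_rank
    return rank


def contextual_subgraph(graph, seeds, k):
    """Keep the k highest-ranked nodes and the edges among them."""
    rank = personalized_pagerank(graph, seeds)
    keep = set(sorted(rank, key=rank.get, reverse=True)[:k])
    return {n: [m for m in graph[n] if m in keep] for n in graph if n in keep}


graph = {  # hypothetical undirected concept graph (edges listed both ways)
    "dog": ["animal", "bark", "pet"],
    "animal": ["dog", "cat", "organism"],
    "cat": ["animal", "pet"],
    "pet": ["dog", "cat"],
    "bark": ["dog", "tree"],
    "tree": ["bark", "plant"],
    "plant": ["tree", "organism"],
    "organism": ["animal", "plant"],
}
seeds = {"dog", "cat"}
sub = contextual_subgraph(graph, seeds, k=4)
```

Here the kept subgraph is centred on the seeds, while loosely related concepts such as "tree" and "plant" are pruned.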

    Modeling Inter-Sentence Relationships Using Deep Neural Network-Based Sentence Encoders (딥 뉴럴 네트워크 기반의 문장 인코더를 이용한 문장 간 관계 모델링)

    Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2020. Advisor: Sang-goo Lee (이상구).
Sentence matching is the task of predicting the degree of match between two sentences. Because a model needs a high level of natural language understanding to identify the relationship between two sentences, sentence matching is an important component of various natural language processing applications.
In this dissertation, we seek to improve the sentence matching module through three ingredients: the sentence encoder, the matching function, and semi-supervised learning. To enhance the sentence encoder network, which is responsible for extracting useful features from a sentence, we propose two new sentence encoder architectures: Gumbel Tree-LSTM and Cell-aware Stacked LSTM (CAS-LSTM). Gumbel Tree-LSTM is based on a recursive neural network (RvNN) architecture, but unlike typical RvNN architectures it does not need structured input. Instead, it learns from data a parsing strategy that is optimized for a specific task. The latter, CAS-LSTM, extends the stacked long short-term memory (LSTM) architecture by introducing an additional forget gate for better handling of vertical information flow. Next, as a new matching function, we present the element-wise bilinear sentence matching (ElBiS) function. It aims to automatically find an aggregation scheme that fuses two sentence representations into a single representation suited to a specific task. Because the same sentence encoder is shared across inputs, we hypothesize, and verify empirically, that considering only element-wise bilinear interactions is sufficient for comparing two sentence vectors. By restricting the interaction, we can greatly reduce the number of required parameters compared with full bilinear pooling methods without losing the advantage of automatically discovering useful aggregation schemes. Finally, to facilitate semi-supervised training, i.e., to make use of both labeled and unlabeled data in training, we propose the cross-sentence latent variable model (CS-LVM). Its generative model assumes that a target sentence is generated from the latent representation of a source sentence and a variable indicating the relationship between the source and the target sentences.
Because it considers both sentences of a pair together in a single model, its training objectives are defined more naturally than those of prior approaches based on the variational auto-encoder (VAE). We also define semantic constraints that force the generator to produce semantically more plausible sentences. We believe that the improvements proposed in this dissertation will advance the effectiveness of various natural language processing applications that involve modeling sentence pairs.
Chapter 1 Introduction
  1.1 Sentence Matching
  1.2 Deep Neural Networks for Sentence Matching
  1.3 Scope of the Dissertation
Chapter 2 Background and Related Work
  2.1 Sentence Encoders
  2.2 Matching Functions
  2.3 Semi-Supervised Training
Chapter 3 Sentence Encoder: Gumbel Tree-LSTM
  3.1 Motivation
  3.2 Preliminaries
    3.2.1 Recursive Neural Networks
    3.2.2 Training RvNNs without Tree Information
  3.3 Model Description
    3.3.1 Tree-LSTM
    3.3.2 Gumbel-Softmax
    3.3.3 Gumbel Tree-LSTM
  3.4 Implementation Details
  3.5 Experiments
    3.5.1 Natural Language Inference
    3.5.2 Sentiment Analysis
    3.5.3 Qualitative Analysis
  3.6 Summary
Chapter 4 Sentence Encoder: Cell-aware Stacked LSTM
  4.1 Motivation
  4.2 Related Work
  4.3 Model Description
    4.3.1 Stacked LSTMs
    4.3.2 Cell-aware Stacked LSTMs
    4.3.3 Sentence Encoders
  4.4 Experiments
    4.4.1 Natural Language Inference
    4.4.2 Paraphrase Identification
    4.4.3 Sentiment Classification
    4.4.4 Machine Translation
    4.4.5 Forget Gate Analysis
    4.4.6 Model Variations
  4.5 Summary
Chapter 5 Matching Function: Element-wise Bilinear Sentence Matching
  5.1 Motivation
  5.2 Proposed Method: ElBiS
  5.3 Experiments
    5.3.1 Natural Language Inference
    5.3.2 Paraphrase Identification
  5.4 Summary and Discussion
Chapter 6 Semi-Supervised Training: Cross-Sentence Latent Variable Model
  6.1 Motivation
  6.2 Preliminaries
    6.2.1 Variational Auto-Encoders
    6.2.2 von Mises-Fisher Distribution
  6.3 Proposed Framework: CS-LVM
    6.3.1 Cross-Sentence Latent Variable Model
    6.3.2 Architecture
    6.3.3 Optimization
  6.4 Experiments
    6.4.1 Natural Language Inference
    6.4.2 Paraphrase Identification
    6.4.3 Ablation Study
    6.4.4 Generated Sentences
    6.4.5 Implementation Details
  6.5 Summary and Discussion
Chapter 7 Conclusion
Appendix A Appendix
  A.1 Sentences Generated from CS-LVM
Doctor
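The dissertation lists Gumbel-Softmax as a component of Gumbel Tree-LSTM, which learns a parsing strategy without tree-structured input. A minimal pure-Python sketch of how a straight-through Gumbel-softmax can pick which adjacent pair to merge at each step is shown below. Its assumptions: candidate parents are scored by a dot product with a query vector, the hard one-hot choice decides the merge, and `compose` is a hypothetical averaging stand-in for the Tree-LSTM cell. No gradients are computed here; in PyTorch, `torch.nn.functional.gumbel_softmax(logits, tau, hard=True)` gives the same hard forward output while routing gradients through the soft sample.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + g) / temperature), where
    g_i = -log(-log(u_i)) and u_i ~ Uniform(0, 1)."""
    perturbed = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # rng.random() can return 0.0; log(0) is undefined
            u = rng.random()
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((logit + gumbel_noise) / temperature)
    peak = max(perturbed)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


def straight_through_select(probs):
    """Forward pass of the straight-through estimator: one-hot of the argmax."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def compose(left, right):
    """Hypothetical stand-in for the Tree-LSTM cell: average the children."""
    return [(a + b) / 2 for a, b in zip(left, right)]


def encode(leaves, query, temperature=1.0, seed=0):
    """Merge adjacent nodes bottom-up until one root remains."""
    rng = random.Random(seed)
    nodes = [list(v) for v in leaves]
    while len(nodes) > 1:
        parents = [compose(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        logits = [sum(q * p for q, p in zip(query, parent)) for parent in parents]
        choice = straight_through_select(gumbel_softmax(logits, temperature, rng))
        i = choice.index(1.0)
        # The selected parent replaces its two children; other nodes are kept.
        nodes = nodes[:i] + [parents[i]] + nodes[i + 2:]
    return nodes[0]


root = encode([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], query=[1.0, -1.0])
```

Lowering `temperature` makes the soft sample closer to one-hot, which narrows the gap between the hard forward choice and the soft distribution that gradients would follow.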