Search CORE

59 research outputs found

Sentence Embeddings in NLI with Iterative Refinement Encoders

Author: Talman Aarne Johannes
Tiedemann Jörg
Yli-Jyrä Anssi
Publication venue
Publication date: 03/06/2019
Field of study

Sentence-level representations are necessary for various NLP tasks. Recurrent neural networks have proven to be very effective in learning distributed representations and can be trained efficiently on natural language inference tasks. We build on top of one such model and propose a hierarchy of BiLSTM and max pooling layers that implements an iterative refinement strategy and yields state of the art results on the SciTail dataset as well as strong results for SNLI and MultiNLI. We can show that the sentence embeddings learned in this way can be utilized in a wide variety of transfer learning tasks, outperforming InferSent on 7 out of 10 and SkipThought on 8 out of 9 SentEval sentence embedding evaluation tasks. Furthermore, our model beats the InferSent model in 8 out of 10 recently published SentEval probing tasks designed to evaluate sentence embeddings' ability to capture some of the important linguistic properties of sentences.Peer reviewe

arXiv.org e-Print Archive

Helsingin yliopiston digitaalinen arkisto

Combining contextualized and non-contextualized embeddings for domain adaptation and beyond

Author: Pörner Nina Mareike
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 04/03/2021
Field of study

Digitale Hochschulschriften der LMU

Enhancing the Reasoning Capabilities of Natural Language Inference Models with Attention Mechanisms and External Knowledge

Author: GAJBHIYE AMIT
Publication venue
Publication date: 01/01/2020
Field of study

Natural Language Inference (NLI) is fundamental to natural language understanding. The task summarises the natural language understanding capabilities within a simple formulation of determining whether a natural language hypothesis can be inferred from a given natural language premise. NLI requires an inference system to address the full complexity of linguistic as well as real-world commonsense knowledge and, hence, the inferencing and reasoning capabilities of an NLI system are utilised in other complex language applications such as summarisation and machine comprehension. Consequently, NLI has received significant recent attention from both academia and industry. Despite extensive research, contemporary neural NLI models face challenges arising from the sole reliance on training data to comprehend all the linguistic and real-world commonsense knowledge. Further, different attention mechanisms, crucial to the success of neural NLI models, present the prospects of better utilisation when employed in combination. In addition, the NLI research field lacks a coherent set of guidelines for the application of one of the most crucial regularisation hyper-parameters in the RNN-based NLI models -- dropout. In this thesis, we present neural models capable of leveraging the attention mechanisms and the models that utilise external knowledge to reason about inference. First, a combined attention model to leverage different attention mechanisms is proposed. Experimentation demonstrates that the proposed model is capable of better modelling the semantics of long and complex sentences. Second, to address the limitation of the sole reliance on the training data, two novel neural frameworks utilising real-world commonsense and domain-specific external knowledge are introduced. Employing the rule-based external knowledge retrieval from the knowledge graphs, the first model takes advantage of the convolutional encoders and factorised bilinear pooling to augment the reasoning capabilities of the state-of-the-art NLI models. Utilising the significant advances in the research of contextual word representations, the second model, addresses the existing crucial challenges of external knowledge retrieval, learning the encoding of the retrieved knowledge and the fusion of the learned encodings to the NLI representations, in unique ways. Experimentation demonstrates the efficacy and superiority of the proposed models over previous state-of-the-art approaches. Third, for the limitation on dropout investigations, formulated on exhaustive evaluation, analysis and validation on the proposed RNN-based NLI models, a coherent set of guidelines is introduced

Durham e-Theses

Efficient Beam Tree Recursion

Author: Caragea Cornelia
Chowdhury Jishnu Ray
Publication venue
Publication date: 20/07/2023
Field of study

Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a simple extension of Gumbel Tree RvNN and it was shown to achieve state-of-the-art length generalization performance in ListOps while maintaining comparable performance on other tasks. However, although not the worst in its kind, BT-RvNN can be still exorbitantly expensive in memory usage. In this paper, we identify the main bottleneck in BT-RvNN's memory usage to be the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by

10

16

times but also create a new state-of-the-art in ListOps while maintaining similar performance in other tasks. In addition, we also propose a strategy to utilize the induced latent-tree node representations produced by BT-RvNN to turn BT-RvNN from a sentence encoder of the form

f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}

into a sequence contextualizer of the form

f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}

. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models

arXiv.org e-Print Archive

FarsTail: A Persian Natural Language Inference Dataset

Author: Amirak Azadeh
Amirkhani Hossein
AzariJafari Mohammad
Faridan-Jahromi Soroush
Kouhkan Zeinab
Pourjafari Zohreh
Publication venue
Publication date: 08/07/2021
Field of study

Natural language inference (NLI) is known as one of the central tasks in natural language processing (NLP) which encapsulates many fundamental aspects of language understanding. With the considerable achievements of data-hungry deep learning methods in NLP tasks, a great amount of effort has been devoted to develop more diverse datasets for different languages. In this paper, we present a new dataset for the NLI task in the Persian language, also known as Farsi, which is one of the dominant languages in the Middle East. This dataset, named FarsTail, includes 10,367 samples which are provided in both the Persian language as well as the indexed format to be useful for non-Persian researchers. The samples are generated from 3,539 multiple-choice questions with the least amount of annotator interventions in a way similar to the SciTail dataset. A carefully designed multi-step process is adopted to ensure the quality of the dataset. We also present the results of traditional and state-of-the-art methods on FarsTail including different embedding methods such as word2vec, fastText, ELMo, BERT, and LASER, as well as different modeling approaches such as DecompAtt, ESIM, HBMP, and ULMFiT to provide a solid baseline for the future research. The best obtained test accuracy is 83.38% which shows that there is a big room for improving the current methods to be useful for real-world NLP applications in different languages. We also investigate the extent to which the models exploit superficial clues, also known as dataset biases, in FarsTail, and partition the test set into easy and hard subsets according to the success of biased models. The dataset is available at https://github.com/dml-qom/FarsTai

arXiv.org e-Print Archive

Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks

Author: Abdelaziz Ibrahim
Balakrishnan Avinash
Chang Maria
Fadnis Kshitij
Fokoue Achille
Gunasekara Chulaka
Kapanipathi Pavan
Makni Bassem
Mattei Nicholas
Patel Siva Sankalp
Talamadupula Kartik
Thost Veronika
Whitehead Spencer
Publication venue
Publication date: 21/11/2019
Field of study

Textual entailment is a fundamental task in natural language processing. Most approaches for solving the problem use only the textual content present in training data. A few approaches have shown that information from external knowledge sources like knowledge graphs (KGs) can add value, in addition to the textual content, by providing background knowledge that may be critical for a task. However, the proposed models do not fully exploit the information in the usually large and noisy KGs, and it is not clear how it can be effectively encoded to be useful for entailment. We present an approach that complements text-based entailment models with information from KGs by (1) using Personalized PageR- ank to generate contextual subgraphs with reduced noise and (2) encoding these subgraphs using graph convolutional networks to capture KG structure. Our technique extends the capability of text models exploiting structural and semantic information found in KGs. We evaluate our approach on multiple textual entailment datasets and show that the use of external knowledge helps improve prediction accuracy. This is particularly evident in the challenging BreakingNLI dataset, where we see an absolute improvement of 5-20% over multiple text-based entailment models

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

딥 뉴럴 네트워크 기반의 문장 인코더를 이용한 문장 간 관계 모델링

Author: 최지헌
Publication venue: 서울대학교 대학원
Publication date: 01/02/2020
Field of study

학위논문(박사)--서울대학교 대학원 :공과대학 컴퓨터공학부,2020. 2. 이상구.문장 매칭이란 두 문장 간 의미적으로 일치하는 정도를 예측하는 문제이다. 어떤 모델이 두 문장 사이의 관계를 효과적으로 밝혀내기 위해서는 높은 수준의 자연어 텍스트 이해 능력이 필요하기 때문에, 문장 매칭은 다양한 자연어 처리 응용의 성능에 중요한 영향을 미친다. 본 학위 논문에서는 문장 인코더, 매칭 함수, 준지도 학습이라는 세 가지 측면에서 문장 매칭의 성능 개선을 모색한다. 문장 인코더란 문장으로부터 유용한 특질들을 추출하는 역할을 하는 구성 요소로, 본 논문에서는 문장 인코더의 성능 향상을 위하여 Gumbel Tree-LSTM과 Cell-aware Stacked LSTM이라는 두 개의 새로운 아키텍처를 제안한다. Gumbel Tree-LSTM은 재귀적 뉴럴 네트워크(recursive neural network) 구조에 기반한 아키텍처이다. 구조 정보가 포함된 데이터를 입력으로 사용하던 기존의 재귀적 뉴럴 네트워크 모델과 달리, Gumbel Tree-LSTM은 구조가 없는 데이터로부터 특정 문제에 대한 성능을 최대화하는 파싱 전략을 학습한다. Cell-aware Stacked LSTM은 LSTM 구조를 개선한 아키텍처로, 여러 LSTM 레이어를 중첩하여 사용할 때 망각 게이트(forget gate)를 추가적으로 도입하여 수직 방향의 정보 흐름을 더 효율적으로 제어할 수 있도록 한다. 한편, 새로운 매칭 함수로서 우리는 요소별 쌍선형 문장 매칭(element-wise bilinear sentence matching, ElBiS) 함수를 제안한다. ElBiS 알고리즘은 특정 문제를 해결하는 데에 적합한 방식으로 두 문장 표현을 하나의 벡터로 합치는 방법을 자동으로 찾는 것을 목적으로 한다. 문장 표현을 얻을 때에 서로 같은 문장 인코더를 사용한다는 사실로부터 우리는 벡터의 각 요소 간 쌍선형(bilinear) 상호 작용만을 고려하여도 두 문장 벡터 간 비교를 충분히 잘 수행할 수 있다는 가설을 수립하고 이를 실험적으로 검증한다. 상호 작용의 범위를 제한함으로써, 자동으로 유용한 병합 방법을 찾는다는 이점을 유지하면서 모든 상호 작용을 고려하는 쌍선형 풀링 방법에 비해 필요한 파라미터의 수를 크게 줄일 수 있다. 마지막으로, 학습 시 레이블이 있는 데이터와 레이블이 없는 데이터를 함께 사용하는 준지도 학습을 위해 우리는 교차 문장 잠재 변수 모델(cross-sentence latent variable model, CS-LVM)을 제안한다. CS-LVM의 생성 모델은 출처 문장(source sentence)의 잠재 표현 및 출처 문장과 목표 문장(target sentence) 간의 관계를 나타내는 변수로부터 목표 문장이 생성된다고 가정한다. CS-LVM에서는 두 문장이 하나의 모델 안에서 모두 고려되기 때문에, 학습에 사용되는 목적 함수가 더 자연스럽게 정의된다. 또한, 우리는 생성 모델의 파라미터가 더 의미적으로 적합한 문장을 생성하도록 유도하기 위하여 일련의 의미 제약들을 정의한다. 본 학위 논문에서 제안된 개선 방안들은 문장 매칭 과정을 포함하는 다양한 자연어 처리 응용의 효용성을 높일 것으로 기대된다.Sentence matching is a task of predicting the degree of match between two sentences. Since high level of understanding natural language text is needed for a model to identify the relationship between two sentences, it is an important component for various natural language processing applications. In this dissertation, we seek for the improvement of the sentence matching module from the following three ingredients: sentence encoder, matching function, and semi-supervised learning. To enhance a sentence encoder network which takes responsibility of extracting useful features from a sentence, we propose two new sentence encoder architectures: Gumbel Tree-LSTM and Cell-aware Stacked LSTM (CAS-LSTM). Gumbel Tree-LSTM is based on a recursive neural network (RvNN) architecture, however unlike typical RvNN architectures it does not need a structured input. Instead, it learns from data a parsing strategy that is optimized for a specific task. The latter, CAS-LSTM, extends the stacked long short-term memory (LSTM) architecture by introducing an additional forget gate for better handling of vertical information flow. And then, as a new matching function, we present the element-wise bilinear sentence matching (ElBiS) function. It aims to automatically find an aggregation scheme that fuses two sentence representations into a single one suitable for a specific task. From the fact that a sentence encoder is shared across inputs, we hypothesize and empirically prove that considering only the element-wise bilinear interaction is sufficient for comparing two sentence vectors. By restricting the interaction, we can largely reduce the number of required parameters compared with full bilinear pooling methods without losing the advantage of automatically discovering useful aggregation schemes. Finally, to facilitate semi-supervised training, i.e. to make use of both labeled and unlabeled data in training, we propose the cross-sentence latent variable model (CS-LVM). Its generative model assumes that a target sentence is generated from the latent representation of a source sentence and the variable indicating the relationship between the source and the target sentence. As it considers the two sentences in a pair together in a single model, the training objectives are defined more naturally than prior approaches based on the variational auto-encoder (VAE). We also define semantic constraints that force the generator to generate semantically more plausible sentences. We believe that the improvements proposed in this dissertation would advance the effectiveness of various natural language processing applications containing modeling sentence pairs.Chapter 1 Introduction 1 1.1 Sentence Matching 1 1.2 Deep Neural Networks for Sentence Matching 2 1.3 Scope of the Dissertation 4 Chapter 2 Background and Related Work 9 2.1 Sentence Encoders 9 2.2 Matching Functions 11 2.3 Semi-Supervised Training 13 Chapter 3 Sentence Encoder: Gumbel Tree-LSTM 15 3.1 Motivation 15 3.2 Preliminaries 16 3.2.1 Recursive Neural Networks 16 3.2.2 Training RvNNs without Tree Information 17 3.3 Model Description 19 3.3.1 Tree-LSTM 19 3.3.2 Gumbel-Softmax 20 3.3.3 Gumbel Tree-LSTM 22 3.4 Implementation Details 25 3.5 Experiments 27 3.5.1 Natural Language Inference 27 3.5.2 Sentiment Analysis 32 3.5.3 Qualitative Analysis 33 3.6 Summary 36 Chapter 4 Sentence Encoder: Cell-aware Stacked LSTM 38 4.1 Motivation 38 4.2 Related Work 40 4.3 Model Description 43 4.3.1 Stacked LSTMs 43 4.3.2 Cell-aware Stacked LSTMs 44 4.3.3 Sentence Encoders 46 4.4 Experiments 47 4.4.1 Natural Language Inference 47 4.4.2 Paraphrase Identification 50 4.4.3 Sentiment Classification 52 4.4.4 Machine Translation 53 4.4.5 Forget Gate Analysis 55 4.4.6 Model Variations 56 4.5 Summary 59 Chapter 5 Matching Function: Element-wise Bilinear Sentence Matching 60 5.1 Motivation 60 5.2 Proposed Method: ElBiS 61 5.3 Experiments 63 5.3.1 Natural language inference 64 5.3.2 Paraphrase Identification 66 5.4 Summary and Discussion 68 Chapter 6 Semi-Supervised Training: Cross-Sentence Latent Variable Model 70 6.1 Motivation 70 6.2 Preliminaries 71 6.2.1 Variational Auto-Encoders 71 6.2.2 von Mises–Fisher Distribution 73 6.3 Proposed Framework: CS-LVM 74 6.3.1 Cross-Sentence Latent Variable Model 75 6.3.2 Architecture 78 6.3.3 Optimization 79 6.4 Experiments 84 6.4.1 Natural Language Inference 84 6.4.2 Paraphrase Identification 85 6.4.3 Ablation Study 86 6.4.4 Generated Sentences 88 6.4.5 Implementation Details 89 6.5 Summary and Discussion 90 Chapter 7 Conclusion 92 Appendix A Appendix 96 A.1 Sentences Generated from CS-LVM 96Docto

SNU Open Repository and Archive

Recommended from our members

Analysis and Applications of Cross-Lingual Models in Natural Language Processing

Author: Fujinuma Yoshinari
Publication venue: University of Colorado Boulder
Publication date: 17/11/2021
Field of study

Human languages vary in terms of both typologically and data availability. A typical machine learning-based approach for natural language processing (NLP) requires training data from the language of interest. However, because machine learning-based approaches heavily rely on the amount of data available in each language, the quality of trained model languages without a large amount of data is poor. One way to overcome the lack of data in each language is to conduct cross-lingual transfer learning from resource-rich languages to resource-scarce languages. Cross-lingual word embeddings and multilingual contextualized embeddings are commonly used to conduct cross-lingual transfer learning. However, the lack of resources still makes it challenging to either evaluate or improve such models. This dissertation first proposes a graph-based method to overcome the lack of evaluation data in low-resource languages by focusing on the structure of cross-lingual word embeddings, further discussing approaches to improve cross-lingual transfer learning by using retrofitting methods and by focusing on a specific task. Finally, it provides an analysis of the effect of adding different languages when pretraining multilingual models

CU Scholar Institutional Repository