59 research outputs found
Sentence Embeddings in NLI with Iterative Refinement Encoders
Sentence-level representations are necessary for various NLP tasks. Recurrent neural networks have proven very effective at learning distributed representations and can be trained efficiently on natural language inference tasks. We build on one such model and propose a hierarchy of BiLSTM and max pooling layers that implements an iterative refinement strategy and yields state-of-the-art results on the SciTail dataset as well as strong results on SNLI and MultiNLI. We show that the sentence embeddings learned in this way can be utilized in a wide variety of transfer learning tasks, outperforming InferSent on 7 out of 10 and SkipThought on 8 out of 9 SentEval sentence embedding evaluation tasks. Furthermore, our model beats InferSent on 8 out of 10 recently published SentEval probing tasks designed to evaluate sentence embeddings' ability to capture important linguistic properties of sentences. Peer reviewed.
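The layered encoder described above can be pictured with a minimal NumPy sketch. This is an illustration only: a plain tanh RNN stands in for the BiLSTM layers, the parameters are random rather than trained, and the names (`rnn_layer`, `hbmp_encode`) and dimensions are our own. The refinement idea is that every layer re-reads the word embeddings, starts from the previous layer's final state, and contributes a max-pooled vector to the final sentence embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_layer(x, h0, Wx, Wh):
    """Run a simple tanh RNN over x (T, d_in); return all hidden states (T, d_h)."""
    h, out = h0, []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ Wx + h @ Wh)
        out.append(h)
    return np.stack(out)

def hbmp_encode(x, params):
    """Iterative-refinement encoder: each layer re-reads the word embeddings,
    is initialised from the previous layer's final state, and is max-pooled."""
    h = np.zeros(params[0][1].shape[0])
    pooled = []
    for Wx, Wh in params:
        states = rnn_layer(x, h, Wx, Wh)
        h = states[-1]                      # refinement: hand state to next layer
        pooled.append(states.max(axis=0))   # max pooling over time steps
    return np.concatenate(pooled)           # concat of per-layer pooled vectors

d_in, d_h, n_layers = 8, 16, 3
params = [(rng.normal(0, 0.1, (d_in, d_h)), rng.normal(0, 0.1, (d_h, d_h)))
          for _ in range(n_layers)]
sent = rng.normal(size=(5, d_in))           # a 5-token sentence, 8-dim embeddings
emb = hbmp_encode(sent, params)
print(emb.shape)                            # (48,) = 3 layers x 16 dims
```

The concatenation is why the embedding dimensionality grows with the number of refinement layers.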
Enhancing the Reasoning Capabilities of Natural Language Inference Models with Attention Mechanisms and External Knowledge
Natural Language Inference (NLI) is fundamental to natural language understanding. The task captures the core of natural language understanding in a simple formulation: determining whether a natural language hypothesis can be inferred from a given natural language premise. NLI requires an inference system to address the full complexity of linguistic as well as real-world commonsense knowledge, and hence the inference and reasoning capabilities of an NLI system carry over to other complex language applications such as summarisation and machine comprehension. Consequently, NLI has received significant recent attention from both academia and industry. Despite extensive research, contemporary neural NLI models face challenges arising from their sole reliance on training data to acquire all the necessary linguistic and real-world commonsense knowledge. Further, the different attention mechanisms crucial to the success of neural NLI models show promise for better utilisation when employed in combination. In addition, the NLI research field lacks a coherent set of guidelines for the application of one of the most crucial regularisation hyper-parameters in RNN-based NLI models -- dropout.
In this thesis, we present neural models capable of leveraging attention mechanisms, and models that utilise external knowledge, to reason about inference. First, a combined attention model that leverages different attention mechanisms is proposed. Experimentation demonstrates that the proposed model better captures the semantics of long and complex sentences. Second, to address the limitation of sole reliance on training data, two novel neural frameworks utilising real-world commonsense and domain-specific external knowledge are introduced. Employing rule-based external knowledge retrieval from knowledge graphs, the first model uses convolutional encoders and factorised bilinear pooling to augment the reasoning capabilities of state-of-the-art NLI models. Building on the significant advances in contextual word representations, the second model addresses the remaining crucial challenges -- external knowledge retrieval, learning an encoding of the retrieved knowledge, and fusing the learned encodings into the NLI representations -- each in a novel way. Experimentation demonstrates the efficacy and superiority of the proposed models over previous state-of-the-art approaches. Third, to address the lack of dropout investigations, a coherent set of guidelines is introduced, grounded in exhaustive evaluation, analysis, and validation of the proposed RNN-based NLI models.
Efficient Beam Tree Recursion
Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a
simple extension of Gumbel Tree RvNN and it was shown to achieve
state-of-the-art length generalization performance in ListOps while maintaining
comparable performance on other tasks. However, although not the worst of its
kind, BT-RvNN can still be exorbitantly expensive in memory usage. In this
paper, we identify the main bottleneck in BT-RvNN's memory usage to be the
entanglement of the scorer function and the recursive cell function. We propose
strategies to remove this bottleneck and further simplify its memory usage.
Overall, our strategies not only reduce the memory usage of BT-RvNN by
10-16 times but also create a new state-of-the-art in ListOps while
maintaining similar performance in other tasks. In addition, we also propose a
strategy to utilize the induced latent-tree node representations produced by
BT-RvNN to turn BT-RvNN from a sentence encoder of the form
f : R^(n x d) -> R^d into a sequence contextualizer of the form
f : R^(n x d) -> R^(n x d). Thus, our
proposals not only open up a path for further scalability of RvNNs but also
standardize a way to use BT-RvNNs as another building block in the deep
learning toolkit that can be easily stacked or interfaced with other popular
models such as Transformers and Structured State Space models.
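One way to picture decoupling the scorer from the recursive cell is the NumPy sketch below. It is illustrative only: a greedy (beam-size-1) variant rather than true beam search, with made-up dimensions and parameter names (`w_score`, `P`, `compose`). The point is that the scorer ranks adjacent pairs in a cheap low-dimensional space, so full-width candidate compositions are never materialised just for scoring:

```python
import numpy as np

rng = np.random.default_rng(1)
d, d_score = 16, 4

W_cell = rng.normal(0, 0.1, (2 * d, d))        # composition cell (full width)
P = rng.normal(0, 0.1, (d, d_score))           # cheap down-projection for scoring
w_score = rng.normal(0, 0.1, (2 * d_score,))   # scorer sees projections only

def compose(l, r):
    """Recursive cell: merge two child vectors into one parent vector."""
    return np.tanh(np.concatenate([l, r]) @ W_cell)

def greedy_tree_encode(xs):
    """Greedy bottom-up composition with a scorer decoupled from the cell."""
    nodes = list(xs)
    while len(nodes) > 1:
        small = [n @ P for n in nodes]
        scores = [np.concatenate([small[i], small[i + 1]]) @ w_score
                  for i in range(len(nodes) - 1)]
        i = int(np.argmax(scores))             # best adjacent pair to merge
        nodes[i:i + 2] = [compose(nodes[i], nodes[i + 1])]
    return nodes[0]

tokens = [rng.normal(size=d) for _ in range(6)]
root = greedy_tree_encode(tokens)
print(root.shape)   # (16,)
```

Only the winning pair is passed through the full-width cell at each step, which is where the memory saving of separating scoring from composition comes from.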
FarsTail: A Persian Natural Language Inference Dataset
Natural language inference (NLI) is known as one of the central tasks in
natural language processing (NLP) which encapsulates many fundamental aspects
of language understanding. With the considerable achievements of data-hungry
deep learning methods in NLP tasks, a great amount of effort has been devoted
to developing more diverse datasets for different languages. In this paper, we
present a new dataset for the NLI task in the Persian language, also known as
Farsi, which is one of the dominant languages in the Middle East. This dataset,
named FarsTail, includes 10,367 samples which are provided in both the Persian
language as well as the indexed format to be useful for non-Persian
researchers. The samples are generated from 3,539 multiple-choice questions
with minimal annotator intervention, in a manner similar to the
SciTail dataset. A carefully designed multi-step process is adopted to ensure
the quality of the dataset. We also present the results of traditional and
state-of-the-art methods on FarsTail including different embedding methods such
as word2vec, fastText, ELMo, BERT, and LASER, as well as different modeling
approaches such as DecompAtt, ESIM, HBMP, and ULMFiT, to provide a solid
baseline for future research. The best test accuracy obtained is 83.38%,
which shows that there is still considerable room for improving current
methods to make them useful for real-world NLP applications in different languages. We also
investigate the extent to which the models exploit superficial clues, also
known as dataset biases, in FarsTail, and partition the test set into easy and
hard subsets according to the success of biased models. The dataset is
available at https://github.com/dml-qom/FarsTail
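The easy/hard partition described above can be illustrated with a hypothesis-only baseline. The sketch below is a toy: English stand-ins replace the Persian data, and a simple word-label co-occurrence tally serves as the "biased" model (both are our own illustrative assumptions, not the paper's actual baseline). Examples the biased model gets right go into the easy subset; the rest are hard:

```python
from collections import Counter, defaultdict

# Toy NLI records: (premise, hypothesis, label) with e=entailment, c=contradiction.
train = [
    ("a man is cooking", "a person is not cooking", "c"),
    ("a dog runs", "an animal is not moving", "c"),
    ("kids play outside", "children are playing", "e"),
    ("a woman reads", "a person is reading", "e"),
]
test = [
    ("a cat sleeps", "the cat is not asleep", "c"),
    ("a boy swims", "a child is swimming", "e"),
]

# Hypothesis-only "biased" model: tally word-label co-occurrences,
# ignoring the premise entirely.
counts = defaultdict(Counter)
for _, hyp, label in train:
    for w in hyp.split():
        counts[w][label] += 1

def biased_predict(hyp):
    tally = Counter()
    for w in hyp.split():
        tally.update(counts[w])
    return tally.most_common(1)[0][0]

easy = [ex for ex in test if biased_predict(ex[1]) == ex[2]]
hard = [ex for ex in test if biased_predict(ex[1]) != ex[2]]
print(len(easy), len(hard))   # 1 1
```

Here the word "not" leaks the contradiction label, so the first test example lands in the easy subset; the second is only solvable with the premise, so it lands in the hard subset.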
Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks
Textual entailment is a fundamental task in natural language processing. Most
approaches for solving the problem use only the textual content present in
training data. A few approaches have shown that information from external
knowledge sources like knowledge graphs (KGs) can add value, in addition to the
textual content, by providing background knowledge that may be critical for a
task. However, the proposed models do not fully exploit the information in the
usually large and noisy KGs, and it is not clear how it can be effectively
encoded to be useful for entailment. We present an approach that complements
text-based entailment models with information from KGs by (1) using
Personalized PageRank to generate contextual subgraphs with reduced noise and
(2) encoding these subgraphs using graph convolutional networks to capture KG
structure. Our technique extends the capability of text models by exploiting
structural and semantic information found in KGs. We evaluate our approach on
multiple textual entailment datasets and show that the use of external
knowledge helps improve prediction accuracy. This is particularly evident in
the challenging BreakingNLI dataset, where we see an absolute improvement of
5-20% over multiple text-based entailment models.
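The two steps can be sketched in NumPy as follows. The graph, seed choice, and hyperparameters are toy values of our own choosing, not the paper's setup: Personalized PageRank is run by power iteration with restart to the seed nodes (concepts mentioned in the sentence pair), the top-scoring nodes form the contextual subgraph, and one standard GCN layer (H' = relu(D^-1/2 A_hat D^-1/2 H W)) encodes it:

```python
import numpy as np

# Toy KG with 6 nodes; adjacency is symmetric for simplicity.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]:
    A[u, v] = A[v, u] = 1.0

def personalized_pagerank(A, seeds, alpha=0.15, iters=50):
    """Power iteration with restart to the seed nodes; high-scoring
    nodes form the low-noise contextual subgraph."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)    # row-stochastic transition matrix
    r = np.zeros(n); r[seeds] = 1.0 / len(seeds)
    p = r.copy()
    for _ in range(iters):
        p = (1 - alpha) * p @ P + alpha * r
    return p

scores = personalized_pagerank(A, seeds=[0, 2])
top = np.argsort(-scores)[:4]               # keep top-4 nodes as the subgraph

# One GCN layer over the extracted subgraph.
sub = np.ix_(top, top)
A_hat = A[sub] + np.eye(len(top))           # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H = np.random.default_rng(0).normal(size=(len(top), 8))    # node features
W = np.random.default_rng(1).normal(0, 0.1, (8, 8))
H_next = np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)
print(H_next.shape)   # (4, 8)
```

The restart probability `alpha` controls how tightly the subgraph concentrates around the seed concepts, which is what keeps the extracted context small and low-noise.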
Modeling Relationships between Sentences Using Deep Neural Network-based Sentence Encoders
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, February 2020. Advisor: Sang-goo Lee.
Sentence matching is a task of predicting the degree of match between two sentences.
Since a high level of natural language understanding is needed for a model to identify the relationship between two sentences,
sentence matching is an important component of various natural language processing applications.
In this dissertation, we seek to improve the sentence matching module along three axes: the sentence encoder, the matching function, and semi-supervised learning.
To enhance the sentence encoder network, which is responsible for extracting useful features from a sentence, we propose two new sentence encoder architectures: Gumbel Tree-LSTM and Cell-aware Stacked LSTM (CAS-LSTM).
Gumbel Tree-LSTM is based on a recursive neural network (RvNN) architecture, but, unlike typical RvNN architectures, it does not need structured input.
Instead, it learns from data a parsing strategy that is optimized for a specific task.
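The parsing strategy is learned by sampling merge decisions with the Straight-Through Gumbel-softmax estimator. A minimal NumPy sketch of that sampling step (forward pass only, with arbitrary logits; the real model uses learned composition scores) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    """Straight-through Gumbel-softmax: a differentiable relaxation of sampling
    one merge position from a categorical distribution over candidates."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y = np.exp((logits + g) / tau)
    y /= y.sum()                      # soft sample (used for the backward pass)
    hard = np.zeros_like(y)
    hard[np.argmax(y)] = 1.0          # hard one-hot (used in the forward pass)
    return hard, y

logits = np.array([0.5, 2.0, 0.1])    # scores of candidate adjacent merges
hard, soft = gumbel_softmax(logits)
print(hard.sum(), soft.sum())
```

In training, the hard one-hot picks the pair to merge while gradients flow through the soft distribution; the temperature `tau` trades off between a near-uniform and a near-deterministic choice.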
The latter, CAS-LSTM, extends the stacked long short-term memory (LSTM) architecture by introducing an additional forget gate for better handling of vertical information flow.
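A rough picture of the extra gate is given below. The gating shown is illustrative, not necessarily the paper's exact parameterisation: on top of the usual input/forget/output gates, a vertical forget gate (`f_below` here, a name of our own) decides how much of the lower layer's cell state flows into the current layer's cell:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cas_lstm_step(x, h_prev, c_prev, c_below, W):
    """One step of a CAS-LSTM-style cell: the lower layer's cell state
    c_below enters the cell update through an extra forget gate."""
    z = np.concatenate([x, h_prev])
    i = sigmoid(z @ W["i"]); f = sigmoid(z @ W["f"])
    o = sigmoid(z @ W["o"]); g = np.tanh(z @ W["g"])
    f_below = sigmoid(z @ W["fb"])            # vertical forget gate
    c = f * c_prev + f_below * c_below + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 8, 8
W = {k: rng.normal(0, 0.1, (d_in + d_h, d_h)) for k in ["i", "f", "o", "g", "fb"]}
x = rng.normal(size=d_in)
h, c = cas_lstm_step(x, np.zeros(d_h), np.zeros(d_h), rng.normal(size=d_h), W)
print(h.shape, c.shape)
```

In a plain stacked LSTM only the lower layer's hidden state is visible to the layer above; exposing the cell state through its own gate is what gives the model finer control over vertical information flow.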
Next, as a new matching function, we present the element-wise bilinear sentence matching (ElBiS) function.
It aims to automatically find an aggregation scheme that fuses two sentence representations into a single one suitable for a specific task.
From the fact that the same sentence encoder is shared across both inputs, we hypothesize, and empirically verify, that considering only the element-wise bilinear interaction is sufficient for comparing two sentence vectors.
By restricting the interaction, we can largely reduce the number of required parameters compared with full bilinear pooling methods without losing the advantage of automatically discovering useful aggregation schemes.
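The parameter saving is easy to quantify. The sketch below contrasts full bilinear pooling with the element-wise restriction for d = 300; it is a simplification of our own (the actual ElBiS function may include additional terms such as biases and a nonlinearity):

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_out = 300, 300

u = rng.normal(size=d)    # encoding of sentence 1 (shared encoder)
v = rng.normal(size=d)    # encoding of sentence 2

# Full bilinear pooling: h_k = u^T W_k v, one (d x d) matrix per output dim,
# so d_out * d * d parameters in total.
full_params = d_out * d * d

# Element-wise bilinear restriction: only interactions between matching
# coordinates u_i * v_i are kept, so each output dim needs just d weights.
W = rng.normal(0, 0.1, (d, d_out))
h = (u * v) @ W           # h_k = sum_i W[i, k] * u_i * v_i
elbis_params = d * d_out

print(full_params // elbis_params)   # 300
```

For d = d_out = 300 the restriction cuts the parameter count by a factor of d (here 27,000,000 down to 90,000) while still learning which element-wise products matter for the comparison.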
Finally, to facilitate semi-supervised training, i.e. to make use of both labeled and unlabeled data in training, we propose the cross-sentence latent variable model (CS-LVM).
Its generative model assumes that a target sentence is generated from the latent representation of a source sentence and the variable indicating the relationship between the source and the target sentence.
As it considers the two sentences of a pair together in a single model, its training objectives are defined more naturally than those of prior approaches based on the variational auto-encoder (VAE).
We also define semantic constraints that force the generator to generate semantically more plausible sentences.
We believe that the improvements proposed in this dissertation will advance the effectiveness of various natural language processing applications that involve modeling sentence pairs.
Chapter 1 Introduction
1.1 Sentence Matching
1.2 Deep Neural Networks for Sentence Matching
1.3 Scope of the Dissertation
Chapter 2 Background and Related Work
2.1 Sentence Encoders
2.2 Matching Functions
2.3 Semi-Supervised Training
Chapter 3 Sentence Encoder: Gumbel Tree-LSTM
3.1 Motivation
3.2 Preliminaries
3.2.1 Recursive Neural Networks
3.2.2 Training RvNNs without Tree Information
3.3 Model Description
3.3.1 Tree-LSTM
3.3.2 Gumbel-Softmax
3.3.3 Gumbel Tree-LSTM
3.4 Implementation Details
3.5 Experiments
3.5.1 Natural Language Inference
3.5.2 Sentiment Analysis
3.5.3 Qualitative Analysis
3.6 Summary
Chapter 4 Sentence Encoder: Cell-aware Stacked LSTM
4.1 Motivation
4.2 Related Work
4.3 Model Description
4.3.1 Stacked LSTMs
4.3.2 Cell-aware Stacked LSTMs
4.3.3 Sentence Encoders
4.4 Experiments
4.4.1 Natural Language Inference
4.4.2 Paraphrase Identification
4.4.3 Sentiment Classification
4.4.4 Machine Translation
4.4.5 Forget Gate Analysis
4.4.6 Model Variations
4.5 Summary
Chapter 5 Matching Function: Element-wise Bilinear Sentence Matching
5.1 Motivation
5.2 Proposed Method: ElBiS
5.3 Experiments
5.3.1 Natural Language Inference
5.3.2 Paraphrase Identification
5.4 Summary and Discussion
Chapter 6 Semi-Supervised Training: Cross-Sentence Latent Variable Model
6.1 Motivation
6.2 Preliminaries
6.2.1 Variational Auto-Encoders
6.2.2 von Mises-Fisher Distribution
6.3 Proposed Framework: CS-LVM
6.3.1 Cross-Sentence Latent Variable Model
6.3.2 Architecture
6.3.3 Optimization
6.4 Experiments
6.4.1 Natural Language Inference
6.4.2 Paraphrase Identification
6.4.3 Ablation Study
6.4.4 Generated Sentences
6.4.5 Implementation Details
6.5 Summary and Discussion
Chapter 7 Conclusion
Appendix A Appendix
A.1 Sentences Generated from CS-LVM
Analysis and Applications of Cross-Lingual Models in Natural Language Processing
Human languages vary both typologically and in data availability. A typical machine learning-based approach to natural language processing (NLP) requires training data from the language of interest. However, because machine learning-based approaches rely heavily on the amount of data available in each language, models trained for languages without large amounts of data are of poor quality. One way to overcome the lack of data in a language is to conduct cross-lingual transfer learning from resource-rich languages to resource-scarce languages. Cross-lingual word embeddings and multilingual contextualized embeddings are commonly used to conduct cross-lingual transfer learning. However, the lack of resources still makes it challenging to either evaluate or improve such models. This dissertation first proposes a graph-based method to overcome the lack of evaluation data in low-resource languages by focusing on the structure of cross-lingual word embeddings, then discusses approaches to improve cross-lingual transfer learning by using retrofitting methods and by focusing on a specific task. Finally, it provides an analysis of the effect of adding different languages when pretraining multilingual models.