Multi-turn Inference Matching Network for Natural Language Inference
Natural Language Inference (NLI) is a fundamental and challenging task in
Natural Language Processing (NLP). Most existing methods apply only a one-pass
inference process to a mixed matching feature, which is a concatenation of
different matching features between a premise and a hypothesis. In this paper,
we propose a new model called Multi-turn Inference Matching Network (MIMN) to
perform multi-turn inference on different matching features. In each turn, the
model focuses on one particular matching feature instead of the mixed matching
feature. To enhance the interaction between different matching features, a
memory component is employed to store the history inference information. The
inference of each turn is performed on the current matching feature and the
memory. We conduct experiments on three different NLI datasets. The
experimental results show that our model matches or outperforms
state-of-the-art performance on all three datasets.
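As a rough illustration of the multi-turn idea in this abstract, the sketch below processes one matching feature per turn and carries a memory vector between turns. It is not the MIMN architecture: the three matching features, the `tanh` combination, the fixed `gate` value, and all function names are assumptions chosen for brevity, and learned weights are omitted entirely.

```python
import math


def matching_features(premise, hypothesis):
    """Build three illustrative matching features (hypothetical choices)."""
    return {
        "difference": [p - h for p, h in zip(premise, hypothesis)],
        "product": [p * h for p, h in zip(premise, hypothesis)],
        "sum": [p + h for p, h in zip(premise, hypothesis)],
    }


def inference_turn(feature, memory, gate=0.5):
    """One turn: infer from the current feature plus memory, then update memory.

    A real model would use learned parameters here; the fixed gate is an
    assumption that keeps the example self-contained.
    """
    inferred = [math.tanh(f + m) for f, m in zip(feature, memory)]
    return [gate * i + (1.0 - gate) * m for i, m in zip(inferred, memory)]


def multi_turn_inference(premise, hypothesis):
    """Run one inference turn per matching feature, sharing a memory vector."""
    memory = [0.0] * len(premise)
    for feature in matching_features(premise, hypothesis).values():
        memory = inference_turn(feature, memory)
    return memory


final_memory = multi_turn_inference([1.0, 0.0], [0.5, 2.0])
```

The point of the sketch is the control flow: each turn sees a single matching feature rather than the concatenation of all of them, and the memory is the only channel through which earlier turns influence later ones.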
Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms
In NLP, convolutional neural networks (CNNs) have benefited less than
recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that
this is because the attention in CNNs has been mainly implemented as attentive
pooling (i.e., it is applied to pooling) rather than as attentive convolution
(i.e., it is integrated into convolution). Convolution is the differentiator of
CNNs in that it can powerfully model the higher-level representation of a word
by taking into account its local fixed-size context in the input text t^x. In
this work, we propose an attentive convolution network, ATTCONV. It extends the
context scope of the convolution operation, deriving higher-level features for
a word not only from local context, but also information extracted from
nonlocal context by the attention mechanism commonly used in RNNs. This
nonlocal context can come (i) from parts of the input text t^x that are distant
or (ii) from extra (i.e., external) contexts t^y. Experiments on sentence
modeling with zero-context (sentiment analysis), single-context (textual
entailment) and multiple-context (claim verification) demonstrate the
effectiveness of ATTCONV in sentence representation learning with the
incorporation of context. In particular, attentive convolution outperforms
attentive pooling and is a strong competitor to popular attentive RNNs.
Comment: Camera-ready for TACL. 16 page
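The contrast between local convolution and attentive convolution described in this abstract can be sketched as follows. This is a minimal illustration, not the ATTCONV model: dot-product attention, summation in place of learned filters, and the names `attentive_context` and `attentive_convolution` are all assumptions. Each word in `tx` receives a feature built from its fixed-size local window plus an attention-weighted summary of the context text `ty`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attentive_context(word, context_words):
    """Attention-weighted average of context vectors (dot-product scores)."""
    weights = softmax([dot(word, c) for c in context_words])
    return [
        sum(w * c[d] for w, c in zip(weights, context_words))
        for d in range(len(word))
    ]


def attentive_convolution(tx, ty, width=3):
    """For each word in tx, combine its local window with attended context.

    Learned filter weights are omitted (an assumption); the window is summed.
    """
    dim = len(tx[0])
    half = width // 2
    outputs = []
    for i, word in enumerate(tx):
        window = tx[max(0, i - half): i + half + 1]
        local = [sum(w[d] for w in window) for d in range(dim)]
        attended = attentive_context(word, ty)
        outputs.append([math.tanh(l + a) for l, a in zip(local, attended)])
    return outputs


tx = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ty = [[0.5, 0.5], [1.0, -1.0]]
features = attentive_convolution(tx, ty)
```

Passing `tx` itself as the second argument corresponds loosely to case (i) in the abstract, where the nonlocal context comes from distant parts of the same text; passing a separate `ty` corresponds to case (ii).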
A Hybrid Siamese Neural Network for Natural Language Inference in Cyber-Physical Systems
Cyber-Physical Systems (CPS), multi-dimensional complex systems that connect the physical world and the cyber world, have a strong demand for processing large amounts of heterogeneous data. These processing tasks include Natural Language Inference (NLI) over text from different sources. However, current research on natural language processing in CPS has not explored this area. Therefore, this study proposes a Siamese network structure that combines a stacked residual bidirectional Long Short-Term Memory network with an attention mechanism and a Capsule Network for the NLI module in CPS, which infers the relationship between text/language data from different sources. The model serves as the basic semantic understanding module in CPS and is evaluated in detail on three main NLI benchmarks. Comparative experiments show that the proposed method achieves competitive performance, has a certain generalization ability, and balances performance against the number of trained parameters.
Enhancing the Reasoning Capabilities of Natural Language Inference Models with Attention Mechanisms and External Knowledge
Natural Language Inference (NLI) is fundamental to natural language understanding. The task summarises the natural language understanding capabilities within a simple formulation of determining whether a natural language hypothesis can be inferred from a given natural language premise. NLI requires an inference system to address the full complexity of linguistic as well as real-world commonsense knowledge and, hence, the inferencing and reasoning capabilities of an NLI system are utilised in other complex language applications such as summarisation and machine comprehension. Consequently, NLI has received significant recent attention from both academia and industry. Despite extensive research, contemporary neural NLI models face challenges arising from the sole reliance on training data to comprehend all the linguistic and real-world commonsense knowledge. Further, different attention mechanisms, crucial to the success of neural NLI models, present the prospects of better utilisation when employed in combination. In addition, the NLI research field lacks a coherent set of guidelines for the application of one of the most crucial regularisation hyper-parameters in the RNN-based NLI models -- dropout.
In this thesis, we present neural models that leverage attention mechanisms and models that utilise external knowledge to reason about inference. First, a combined attention model that leverages different attention mechanisms is proposed. Experimentation demonstrates that the proposed model is capable of better modelling the semantics of long and complex sentences. Second, to address the limitation of the sole reliance on the training data, two novel neural frameworks utilising real-world commonsense and domain-specific external knowledge are introduced. Employing rule-based external knowledge retrieval from knowledge graphs, the first model takes advantage of convolutional encoders and factorised bilinear pooling to augment the reasoning capabilities of state-of-the-art NLI models. Utilising the significant advances in research on contextual word representations, the second model addresses, each in a new way, three existing challenges: retrieving external knowledge, learning an encoding of the retrieved knowledge, and fusing the learned encodings with the NLI representations. Experimentation demonstrates the efficacy and superiority of the proposed models over previous state-of-the-art approaches. Third, to address the lack of dropout investigations, a coherent set of guidelines is introduced, based on exhaustive evaluation, analysis and validation of the proposed RNN-based NLI models.
Improving and Understanding Deep Models for Natural Language Comprehension
Natural Language Comprehension is a challenging domain of Natural Language Processing. To improve a model’s language comprehension/understanding, one approach would be to enrich the structure of the model to enhance its capability in learning the latent rules of the language.
In this dissertation, we will first introduce several deep models for a variety of natural language comprehension tasks including natural language inference and question answering. Previous approaches employ reading mechanisms that do not fully exploit the interdependencies between the input sources like “premise and hypothesis” or “document and query”. In contrast, we explore more sophisticated reading mechanisms to efficiently model the relationships between input sources (e.g. “premise and hypothesis” or “document and query”). These mechanisms and models yield better empirical performance. However, due to the black-box nature of deep learning, it is difficult to assess whether the improved models indeed acquire a better understanding of language. Meanwhile, data is often plagued by meaningless or even harmful statistical biases, and deep models might achieve high performance by focusing on the biases. This motivates us to study methods for “peeking inside” the black-box deep models to provide explanation and understanding of the models’ behavior. The proposed method (a.k.a. saliency) takes a step toward explaining deep learning-based models based on the gradient of the model output with respect to different components, such as the input layer and intermediate layers. Saliency reveals interesting insights and identifies critical information contributing to the model decisions. Besides proposing a model-agnostic interpretation method (saliency), we study model-dependent interpretation solutions and propose two interpretable designs and structures. Finally, we introduce a novel mechanism (saliency learning), which learns from a ground-truth explanation signal so that the learned model not only makes the right prediction but does so for the right reason.
Our experimental results on multiple tasks and datasets demonstrate the effectiveness of the proposed methods, which produce predictions that are more faithful to the right reasons and evidence while delivering better results than traditionally trained models.
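The gradient-based saliency idea in this abstract can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the dissertation's method: the `model` is a hypothetical logistic unit, only the input layer is examined, and the gradient is approximated with central finite differences rather than backpropagation. The saliency of input `i` is taken as the absolute value of the derivative of the model output with respect to that input.

```python
import math


def model(x, weights):
    """Hypothetical toy model: a single logistic unit over the inputs."""
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def saliency(x, weights, eps=1e-5):
    """Absolute gradient of the output w.r.t. each input component.

    Central finite differences stand in for backpropagation here.
    """
    scores = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad = (model(up, weights) - model(down, weights)) / (2.0 * eps)
        scores.append(abs(grad))
    return scores


weights = [2.0, 0.0, -1.0]
inputs = [0.1, 0.5, 0.3]
scores = saliency(inputs, weights)
```

In this toy setting the second input has zero weight, so its saliency is zero, and the first input is roughly twice as salient as the third. In a deep network the same quantity would normally be obtained from an automatic differentiation framework.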
False textual information detection, a deep learning approach
Many approaches exist for analysing fact checking for fake news identification, which is the focus of this thesis. Current approaches still perform badly on a large scale due to a lack of authority, or insufficient evidence, or in certain cases reliance on a single piece of evidence.
To address the lack of evidence and the inability of models to generalise across domains, we propose a style-aware model for detecting false information and improving existing performance. We discovered that our model was effective at detecting false information when we evaluated its generalisation ability using news articles and Twitter corpora.
We then propose to improve fact checking performance by incorporating warrants. We developed a highly efficient prediction model and demonstrated that incorporating warrants is beneficial for fact checking. Due to a lack of external warrant data, we develop a novel model for generating warrants that aid in determining the credibility of a claim. The results indicate that when a pre-trained language model is combined with a multi-agent model, high-quality, diverse warrants are generated that contribute to task performance improvement.
To counter biased opinions and support rational judgments, we propose a model that can generate multiple perspectives on the claim. Experiments confirm that our Perspectives Generation model generates perspectives with higher quality and diversity than any baseline model.
Additionally, we propose to improve the model's detection capability by generating an explainable alternative factual claim, assisting the reader in identifying subtle issues that result in factual errors. The examination demonstrates that it does indeed increase the veracity of the claim.
Finally, whereas current research has focused on stance detection and fact checking separately, we propose a unified model that integrates both tasks. Classification results demonstrate that our proposed model outperforms state-of-the-art methods.
Representation of linguistic form and function in recurrent neural networks
We present novel methods for analyzing the activation patterns of recurrent neural networks from a linguistic point of view and explore the types of linguistic structure they learn. As a case study, we use a standard standalone language model and a multi-task gated recurrent network architecture consisting of two parallel pathways with shared word embeddings: the Visual pathway is trained on predicting the representations of the visual scene corresponding to an input sentence, and the Textual pathway is trained to predict the next word in the same sentence. We propose a method for estimating the contribution of individual tokens in the input to the final prediction of the networks. Using this method, we show that the Visual pathway pays selective attention to lexical categories and grammatical functions that carry semantic information, and learns to treat word types differently depending on their grammatical function and their position in the sequential structure of the sentence. In contrast, the language models are comparatively more sensitive to words with a syntactic function. Further analysis of the most informative n-gram contexts for each model shows that, in comparison with the Visual pathway, the language models react more strongly to abstract contexts that represent syntactic constructions.