
    Recognizing textual entailment using deep learning techniques

    Textual Entailment (TE), or Natural Language Inference (NLI), is the problem of determining a directional relation between two text fragments. Specifically, given a sentence pair (a, b), the task is to predict whether b is entailed by a, whether b contradicts a, or whether the relation between a and b is neutral. NLI is a central problem in natural language understanding. The dominant recent line of work on NLI is based on artificial neural networks and aims at building deep, complex encoders that transform a sentence into an encoded vector. End-to-end neural networks have reached state-of-the-art performance in the NLI field. One family is recurrent neural network (RNN) encoders, which recurrently combine each word with the memory of the preceding words until the information of the whole sentence has been accumulated. The most common RNN encoders are Long Short-Term Memory networks (LSTM; Hochreiter and Schmidhuber, 1997) and Gated Recurrent Units (Cho et al., 2014). RNNs have surpassed traditional baselines on many NLP tasks (Dai et al., 2015). There are also convolutional neural network (CNN; LeCun et al., 1989) encoders, which aggregate sentence information by applying multiple convolutional filters over the sentence. CNNs have achieved state-of-the-art results in computer vision (Krizhevsky et al., 2012), machine translation (Costa-jussà, 2016) and various NLP tasks (Collobert et al., 2011).
    In this paper, we use the model introduced by Williams et al. (2017) as the baseline for the NLI task. The baseline model consists of a word-level embedding layer and a BiLSTM encoder. We augment this baseline and propose our Character-level Intra Attention Network (CIAN), in which a character-level convolutional network replaces the standard word-level embedding layer and an intra-attention layer captures intra-sentence semantics.
    One contribution of the CIAN model is the character-level convolutional network introduced by Kim et al. (2016). Most sequence encoders use a word-level embedding layer initialized with pre-trained word vectors such as GloVe (Pennington et al., 2014). In that way, the words in a sentence are no longer independent, which helps the encoder capture more of a sentence's internal information. However, as the vocabulary of modern corpora grows, more and more out-of-vocabulary (OOV) words are absent from the pre-trained word embeddings. Because word-level embeddings are blind to subword information (e.g. morphemes), this leads to high perplexity on those OOV words. Our model instead uses a character-level convolutional network, computing each word's representation from its characters. By doing so, the model gains the ability to learn rich semantic and orthographic features from the encoding of characters.
    Another contribution of the CIAN model is the intra-attention mechanism introduced by Yang et al. (2017). The major advantage of the attention mechanism is the ability to encode long sentences efficiently. As the input grows, models without attention lose information and precision if they rely only on the final hidden representation. Attention addresses this issue, and experiments indeed confirm the intuition.
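    As a rough illustration of the encoder just described, the Keras sketch below builds character-level word representations with a shared CNN, runs a BiLSTM over them, and pools the hidden states with a simplified intra-attention step before a Siamese NLI classifier. All hyperparameters (sequence lengths, filter sizes, LSTM units) and the attention scoring function are assumptions for illustration, not the exact CIAN configuration from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Assumed hyperparameters, chosen only for illustration.
MAX_WORDS, MAX_CHARS = 40, 16     # words per sentence, characters per word
CHAR_VOCAB, CHAR_DIM = 100, 15    # character vocabulary and embedding size
CONV_FILTERS, KERNEL = 100, 5     # character-CNN filters and window size
LSTM_UNITS = 300                  # BiLSTM hidden size per direction

def sentence_encoder():
    """Char-CNN word embeddings -> BiLSTM -> simplified intra-attention pooling."""
    chars = layers.Input(shape=(MAX_WORDS, MAX_CHARS), dtype="int32")
    # Character embedding shared across all word positions.
    x = layers.Embedding(CHAR_VOCAB, CHAR_DIM)(chars)
    # Convolve over each word's characters and max-pool to a fixed word vector.
    x = layers.TimeDistributed(
        layers.Conv1D(CONV_FILTERS, KERNEL, padding="same", activation="tanh"))(x)
    word_vecs = layers.TimeDistributed(layers.GlobalMaxPooling1D())(x)
    # BiLSTM over the character-derived word vectors.
    h = layers.Bidirectional(layers.LSTM(LSTM_UNITS, return_sequences=True))(word_vecs)
    # Simplified intra-attention: score each position, softmax over time, weighted sum.
    scores = layers.Dense(1, activation="tanh")(h)       # (batch, words, 1)
    weights = layers.Softmax(axis=1)(scores)             # attention over word positions
    sent = layers.Dot(axes=1)([weights, h])              # (batch, 1, 2*LSTM_UNITS)
    sent = layers.Flatten()(sent)
    return Model(chars, sent)

# Siamese setup for NLI: one shared encoder for premise and hypothesis.
encoder = sentence_encoder()
premise = layers.Input(shape=(MAX_WORDS, MAX_CHARS), dtype="int32")
hypothesis = layers.Input(shape=(MAX_WORDS, MAX_CHARS), dtype="int32")
p, q = encoder(premise), encoder(hypothesis)
features = layers.Concatenate()(
    [p, q, layers.Subtract()([p, q]), layers.Multiply()([p, q])])
hidden = layers.Dense(300, activation="relu")(features)
out = layers.Dense(3, activation="softmax")(hidden)      # entailment / contradiction / neutral
model = Model([premise, hypothesis], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

    In this sketch the premise and hypothesis share one encoder, and the classifier sees the concatenation, difference and element-wise product of the two sentence vectors, a combination commonly used in sentence-encoder NLI models.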
    Another advantage of the attention mechanism is that it enhances the interpretability of the model: the attention weights of an encoded sentence can be visualized. We visualize the attention weights in chapter 5, which helps us understand how the model judges the textual entailment relation between two sentences. The proposed CIAN is implemented in Keras and evaluated on the newly published MNLI corpus of the RepEval 2017 workshop. The test accuracy of the CIAN model on the matched test set improves by 0.9 percent over the baseline model. Based on this result, we published a paper titled "Character-level Intra Attention Networks for Natural Language Inference" at the RepEval 2017 workshop as an outcome of this thesis. To summarize, the CIAN model presented in this paper is a sequence encoder that can encode long sentences at the character level with rich semantic and orthographic features, while the attention mechanism provides interpretability that lets people see how the model performs its task. Because it is an end-to-end neural network that requires no task-specific pre-processing and no outside data such as pre-trained word embeddings, it can easily be applied to other encoder-based tasks such as language modeling, sentiment analysis and question answering.
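    A minimal way to produce the kind of attention-weight visualization described above is to plot the per-word weights as a one-row heatmap. The sketch below assumes the weights have already been read out of the trained model's attention layer; the tokens and values shown are hypothetical placeholders.

```python
# Minimal attention-weight heatmap (illustrative only; tokens and weights are made up).
import numpy as np
import matplotlib.pyplot as plt

def plot_attention(tokens, weights):
    """Render one sentence's intra-attention weights as a single-row heatmap."""
    weights = np.asarray(weights).reshape(1, -1)
    fig, ax = plt.subplots(figsize=(0.6 * len(tokens), 1.5))
    ax.imshow(weights, cmap="Blues", aspect="auto")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=45, ha="right")
    ax.set_yticks([])
    fig.tight_layout()
    plt.show()

# Hypothetical example; in practice the weights come from the model's attention layer.
plot_attention(["a", "man", "is", "playing", "a", "guitar"],
               [0.05, 0.30, 0.05, 0.45, 0.05, 0.10])
```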

    A Context-aware Attention Network for Interactive Question Answering

    Neural-network-based sequence-to-sequence models in an encoder-decoder framework have been successfully applied to Question Answering (QA) problems, predicting answers from statements and questions. However, almost all previous models have failed to consider detailed context information and unknown states under which systems do not have enough information to answer given questions. These scenarios with incomplete or ambiguous information are very common in the setting of Interactive Question Answering (IQA). To address this challenge, we develop a novel model employing context-dependent word-level attention for more accurate statement representations and question-guided sentence-level attention for better context modeling. We also generate unique IQA datasets to test our model, which will be made publicly available. Employing these attention mechanisms, our model accurately understands when it can output an answer or when it requires generating a supplementary question for additional input, depending on the context. When available, the user's feedback is encoded and directly applied to update the sentence-level attention to infer an answer. Extensive experiments on QA and IQA datasets quantitatively demonstrate the effectiveness of our model, with significant improvements over state-of-the-art conventional QA models.
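    As a rough sketch of the question-guided sentence-level attention described above (under assumed shapes and a simplified dot-product scoring, not necessarily the authors' exact formulation), the context sentences can be pooled into a single vector weighted by their relevance to the question:

```python
import numpy as np

def question_guided_attention(sentence_vecs, question_vec):
    """Pool context sentences, weighting each by its relevance to the question.
    sentence_vecs: (num_sentences, dim) encodings; question_vec: (dim,) encoding."""
    scores = sentence_vecs @ question_vec            # relevance score per sentence
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over sentences
    context = weights @ sentence_vecs                # attention-weighted context vector
    return context, weights

# Hypothetical shapes: 5 context sentences, 64-dimensional encodings.
rng = np.random.default_rng(0)
context, weights = question_guided_attention(rng.normal(size=(5, 64)),
                                             rng.normal(size=64))
```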