
    Training Input-Output Recurrent Neural Networks through Spectral Methods

    We consider the problem of training input-output recurrent neural networks (RNNs) for sequence labeling tasks. We propose a novel spectral approach for learning the network parameters, based on decomposition of the cross-moment tensor between the output and a non-linear, score-function-based transformation of the input. We guarantee consistent learning with polynomial sample and computational complexity under transparent conditions such as non-degeneracy of the model parameters, polynomial activations for the neurons, and a Markovian evolution of the input sequence. We also extend our results to the Bidirectional RNN, which uses both past and future information to output the label at each time point and is employed in many NLP tasks such as POS tagging.
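    To make the spectral idea concrete, here is a minimal NumPy sketch of a moment-based step for a single polynomial-activation layer with standard Gaussian inputs (an assumption for illustration, not the paper's full RNN setting): for Gaussian x, the second-order score function is S2(x) = xx^T - I, and the empirical cross moment between the output and S2(x) yields a matrix whose top eigenvectors recover the span of the input weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 100_000
W = rng.standard_normal((3, d))             # hypothetical true input weights
x = rng.standard_normal((n, d))             # Gaussian input (illustrative)
y = ((x @ W.T) ** 2).sum(axis=1)            # polynomial (quadratic) activation

# Empirical cross-moment M2 = (1/n) sum_i y_i * (x_i x_i^T - I).
# By Stein's identity, E[y * S2(x)] = E[Hessian of y] = 2 * W^T W here.
M2 = (x.T * y) @ x / n - y.mean() * np.eye(d)

# Spectral step: the top eigenvectors of M2 span the row space of W.
eigvals, eigvecs = np.linalg.eigh(M2)
top = eigvecs[:, np.argsort(eigvals)[-3:]]

P = W.T @ np.linalg.pinv(W @ W.T) @ W       # projector onto row space of W
print("residual outside span(W):", np.linalg.norm(P @ top - top))
```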

    Neural Vector Spaces for Unsupervised Information Retrieval

    We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness; consequently, NVSM adds a complementary relevance signal. Beyond semantic matching, we find that NVSM performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments.
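    A minimal sketch of the scoring rule described above, assuming trained parameters are already available (the names `word_emb`, `proj`, and `doc_emb` are hypothetical stand-ins, not NVSM's actual API): the query vector is the average of its word embeddings mapped into document space, and documents are ranked by cosine similarity.

```python
import numpy as np

def rank_documents(query_tokens, vocab, word_emb, proj, doc_emb):
    """Rank documents by cosine similarity to a composed query vector."""
    ids = [vocab[t] for t in query_tokens if t in vocab]
    q = word_emb[ids].mean(axis=0) @ proj           # compose words, project
    q /= np.linalg.norm(q)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    return np.argsort(-(d @ q))                     # best-first doc indices
```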

    Measuring Short Text Semantic Similarity with Deep Learning Models

    Natural language processing (NLP), a subfield of artificial intelligence (AI), is the ability of a computer program to understand human language as it is spoken. The development of NLP applications is challenging because computers traditionally require humans to "speak" to them in a programming language that is precise, unambiguous and highly structured, or through a limited number of clearly enunciated voice commands. We study the use of deep learning models, the state-of-the-art AI method, for the problem of measuring short text semantic similarity in the NLP area. In particular, we propose a novel deep neural network architecture to identify semantic similarity for pairs of question sentences. In the proposed network, multiple channels of knowledge for pairs of question text can be utilized to improve the representation of text. A dense layer is then used to learn a classifier for classifying duplicated question pairs. Through extensive experiments on the Quora test collection, our proposed approach has shown remarkable and significant improvement over strong baselines, which verifies the effectiveness of the deep models as well as the proposed deep multi-channel framework.
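    As a rough PyTorch sketch of the multi-channel idea (the paper's exact channels and encoder are not specified here; two embedding channels feeding a shared BiLSTM, plus a dense classifier, are assumptions for illustration):

```python
import torch
import torch.nn as nn

class MultiChannelMatcher(nn.Module):
    """Sketch: two embedding channels per question, shared BiLSTM encoder,
    dense layer classifying the pair as duplicate / not duplicate."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb_a = nn.Embedding(vocab_size, emb_dim)   # channel 1
        self.emb_b = nn.Embedding(vocab_size, emb_dim)   # channel 2
        self.encoder = nn.LSTM(2 * emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(4 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))                        # duplicate / not

    def encode(self, tokens):
        x = torch.cat([self.emb_a(tokens), self.emb_b(tokens)], dim=-1)
        _, (h, _) = self.encoder(x)                      # final hidden states
        return torch.cat([h[0], h[1]], dim=-1)           # fwd + bwd

    def forward(self, q1, q2):
        return self.classifier(
            torch.cat([self.encode(q1), self.encode(q2)], dim=-1))
```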

    Cross-lingual transfer learning and multitask learning for capturing multiword expressions

    This is an accepted manuscript of an article published by the Association for Computational Linguistics in Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), available online: https://www.aclweb.org/anthology/W19-5119. The accepted version of the publication may differ from the final published version.

    Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). For MTL, we exploit the shared syntactic information between MWE and dependency parsing models to jointly train a single model on both tasks; we specifically predict two types of labels: MWE and dependency parse. Our neural MTL architecture utilises the supervision of dependency parsing in its lower layers and predicts MWE tags in its upper layers. In the TRL scenario, we overcome the scarcity of data by learning a model on a larger MWE dataset and transferring the knowledge to a resource-poor setting in another language. In both scenarios, the resulting models achieve higher performance than standard neural approaches.
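    A minimal PyTorch sketch of the layer layout described above, with assumed sizes and label-set counts: dependency labels are supervised on a lower BiLSTM layer, and MWE tags are predicted from an upper BiLSTM layer stacked on top of it.

```python
import torch
import torch.nn as nn

class MweMtlTagger(nn.Module):
    """Sketch of the MTL layout: dependency supervision on the lower
    BiLSTM, MWE tagging on the upper BiLSTM stacked above it."""
    def __init__(self, vocab, emb=100, hid=128, n_dep=40, n_mwe=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lower = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.upper = nn.LSTM(2 * hid, hid, batch_first=True,
                             bidirectional=True)
        self.dep_head = nn.Linear(2 * hid, n_dep)   # dependency labels (lower)
        self.mwe_head = nn.Linear(2 * hid, n_mwe)   # MWE tags (upper)

    def forward(self, tokens):
        low, _ = self.lower(self.embed(tokens))
        up, _ = self.upper(low)
        return self.dep_head(low), self.mwe_head(up)

# Joint training would sum both per-token cross-entropy losses on each batch:
# loss = ce(dep_logits, dep_gold) + ce(mwe_logits, mwe_gold)
```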

    A Hybrid Siamese Neural Network for Natural Language Inference in Cyber-Physical Systems

    Cyber-Physical Systems (CPS), as multi-dimensional complex systems that connect the physical world and the cyber world, have a strong demand for processing large amounts of heterogeneous data. These tasks include Natural Language Inference (NLI) over text from different sources. However, current research on natural language processing in CPS has not explored this field. Therefore, this study proposes a Siamese Network structure that combines stacked residual bidirectional Long Short-Term Memory with an attention mechanism and a Capsule Network for the NLI module in CPS, used to infer the relationship between text/language data from different sources. The model serves as the basic semantic understanding module in CPS; it implements NLI tasks and is evaluated in detail on three main NLI benchmarks. Comparative experiments show that the proposed method achieves competitive performance, has a certain generalization ability, and balances performance against the number of trained parameters.
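    A compact PyTorch sketch of one Siamese twin under assumed sizes: stacked BiLSTMs with residual connections and attention pooling, with the two twins sharing weights at matching time. The capsule component of the paper is omitted here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBiLSTMEncoder(nn.Module):
    """One Siamese twin: stacked residual BiLSTMs + attention pooling."""
    def __init__(self, vocab, emb=128, hid=64, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstms = nn.ModuleList(
            nn.LSTM(2 * hid if i else emb, hid, batch_first=True,
                    bidirectional=True) for i in range(layers))
        self.attn = nn.Linear(2 * hid, 1)

    def forward(self, tokens):
        x = self.embed(tokens)
        for lstm in self.lstms:
            out, _ = lstm(x)
            x = out + x if out.shape == x.shape else out  # residual if widths match
        w = F.softmax(self.attn(x), dim=1)                # attention over time
        return (w * x).sum(dim=1)                         # weighted pooling

def match_features(encoder, premise, hypothesis):
    # Shared-weight (Siamese) encoding plus standard matching features.
    p, h = encoder(premise), encoder(hypothesis)
    return torch.cat([p, h, torch.abs(p - h), p * h], dim=-1)
```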

    A Bi-Encoder LSTM Model for Learning Unstructured Dialogs

    Creating a data-driven model trained on a large dataset of unstructured dialogs is a crucial step in developing retrieval-based chatbot systems. This thesis presents a Long Short-Term Memory (LSTM) based Recurrent Neural Network architecture that learns unstructured multi-turn dialogs and reports results on the task of selecting the best response from a collection of given responses. The Ubuntu Dialog Corpus Version 2 (UDCv2) was used as the training corpus. Lowe et al. (2015) explored learning models such as TF-IDF (Term Frequency-Inverse Document Frequency), Recurrent Neural Networks (RNN) and a Dual Encoder (DE) based on the Long Short-Term Memory (LSTM) model, suited to learning from the Ubuntu Dialog Corpus Version 1 (UDCv1). We use this same architecture, but on UDCv2, as a benchmark and introduce a new LSTM-based architecture called the Bi-Encoder LSTM model (BE) that achieves 0.8%, 1.0% and 0.3% higher accuracy for Recall@1, Recall@2 and Recall@5 respectively than the DE model. In contrast to the DE model, the proposed BE model has separate encodings for utterances and responses. The BE model also uses a different similarity measure for utterance-response matching than the benchmark model. We further explore the BE model through various experiments and report results using several similarity functions, model hyper-parameters and word embeddings on the proposed architecture.
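    To illustrate the separate-encoder idea, here is a minimal PyTorch sketch, not the thesis's exact model: the utterance (context) and the candidate response get their own LSTM encoders, and the similarity shown is a learned bilinear score, which is an assumption, since the thesis's measure differs from the benchmark's but is not reproduced here.

```python
import torch
import torch.nn as nn

class BiEncoder(nn.Module):
    """Sketch: separate LSTM encoders for utterance and response,
    scored with an (assumed) learned bilinear similarity."""
    def __init__(self, vocab, emb=100, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.ctx_enc = nn.LSTM(emb, hid, batch_first=True)   # utterance side
        self.rsp_enc = nn.LSTM(emb, hid, batch_first=True)   # response side
        self.M = nn.Parameter(torch.randn(hid, hid) * 0.01)  # bilinear map

    def forward(self, context, response):
        _, (c, _) = self.ctx_enc(self.embed(context))
        _, (r, _) = self.rsp_enc(self.embed(response))
        return (c[-1] @ self.M * r[-1]).sum(dim=-1)          # match score

# At retrieval time, score every candidate response and take the
# top-k for Recall@k evaluation.
```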