8 research outputs found

    Compositional Distributional Semantics with Long Short Term Memory

    Get PDF
    We are proposing an extension of the recursive neural network that makes use of a variant of the long short-term memory architecture. The extension allows information low in parse trees to be stored in a memory register (the `memory cell') and used much later higher up in the parse tree. This provides a solution to the vanishing gradient problem and allows the network to capture long range dependencies. Experimental results show that our composition outperformed the traditional neural-network composition on the Stanford Sentiment Treebank.Comment: 10 pages, 7 figure

    Grammar Variational Autoencoder

    Get PDF
    Deep generative models have been wildly successful at learning coherent latent representations for continuous data such as video and audio. However, generative modeling of discrete data such as arithmetic expressions and molecular structures still poses significant challenges. Crucially, state-of-the-art methods often produce outputs that are not valid. We make the key observation that frequently, discrete data can be represented as a parse tree from a context-free grammar. We propose a variational autoencoder which encodes and decodes directly to and from these parse trees, ensuring the generated outputs are always valid. Surprisingly, we show that not only does our model more often generate valid outputs, it also learns a more coherent latent space in which nearby points decode to similar discrete outputs. We demonstrate the effectiveness of our learned models by showing their improved performance in Bayesian optimization for symbolic regression and molecular synthesis

    A Framework for End-to-End Learning on Semantic Tree-Structured Data

    Get PDF
    While learning models are typically studied for inputs in the form of a fixed dimensional feature vector, real world data is rarely found in this form. In order to meet the basic requirement of traditional learning models, structural data generally have to be converted into fix-length vectors in a handcrafted manner, which is tedious and may even incur information loss. A common form of structured data is what we term "semantic tree-structures", corresponding to data where rich semantic information is encoded in a compositional manner, such as those expressed in JavaScript Object Notation (JSON) and eXtensible Markup Language (XML). For tree-structured data, several learning models have been studied to allow for working directly on raw tree-structure data, However such learning models are limited to either a specific tree-topology or a specific tree-structured data format, e.g., synthetic parse trees. In this paper, we propose a novel framework for end-to-end learning on generic semantic tree-structured data of arbitrary topology and heterogeneous data types, such as data expressed in JSON, XML and so on. Motivated by the works in recursive and recurrent neural networks, we develop exemplar neural implementations of our framework for the JSON format. We evaluate our approach on several UCI benchmark datasets, including ablation and data-efficiency studies, and on a toy reinforcement learning task. Experimental results suggest that our framework yields comparable performance to use of standard models with dedicated feature-vectors in general, and even exceeds baseline performance in cases where compositional nature of the data is particularly important. The source code for a JSON-based implementation of our framework along with experiments can be downloaded at https://github.com/EndingCredits/json2vec.Comment: This is a preliminary version of our work. The project is still ongoing. The source code for a JSON-based implementation of our framework along with experiments can be downloaded at https://github.com/EndingCredits/json2ve

    Leveraging contextual embeddings and self-attention neural networks with bi-attention for sentiment analysis

    Get PDF
    People express their opinions and views in different and often ambiguous ways, hence the meaning of their words is often not explicitly stated and frequently depends on the context. Therefore, it is difficult for machines to process and understand the information conveyed in human languages. This work addresses the problem of sentiment analysis (SA). We propose a simple yet comprehensive method which uses contextual embeddings and a self-attention mechanism to detect and classify sentiment. We perform experiments on reviews from different domains, as well as on languages from three different language families, including morphologically rich Polish and German. We show that our approach is on a par with state-of-the-art models or even outperforms them in several cases. Our work also demonstrates the superiority of models leveraging contextual embeddings. In sum, in this paper we make a step towards building a universal, multilingual sentiment classifier.Peer ReviewedPostprint (published version

    Detection of Dementia-Related Abnormal Behaviour Using Recursive Auto-Encoders.

    Get PDF
    Age-related health issues have been increasing with the rise of life expectancy all over the world. One of these problems is cognitive impairment, which causes elderly people to have problems performing their daily activities. Detection of cognitive impairment at an early stage would enable medical doctors to deepen diagnosis and follow-up on patient status. Recent studies show that daily activities can be used to assess the cognitive status of elderly people. Additionally, the intrinsic structure of activities and the relationships between their sub-activities are important clues for capturing the cognitive abilities of seniors. Existing methods perceive each activity as a stand-alone unit while ignoring their inner structural relationships. This study investigates such relationships by modelling activities hierarchically from their sub-activities, with the overall goal of detecting abnormal activities linked to cognitive impairment. For this purpose, recursive auto-encoders (RAE) and their linear vs. greedy and supervised vs. semi-supervised variants are adopted to model the activities. Then, abnormal activities are systematically detected using RAE's reconstruction error. Moreover, to apply RAEs for this problem, we introduce a new sensor representation called raw sensor measurement (RSM) that captures the intrinsic structure of activities, such as the frequency and the order of sensor activations. As real-world data are not accessible, we generated data by simulating abnormal behaviour, which reflects on cognitive impairment. Extensive experiments show that RAEs can be used as a decision-supporting tool, especially when the training set is not labelled to detect early indicators of dementia

    Automatic inference of causal reasoning chains from student essays

    Get PDF
    While there has been an increasing focus on higher-level thinking skills arising from the Common Core Standards, many high-school and middle-school students struggle to combine and integrate information from multiple sources when writing essays. Writing is an important learning skill, and there is increasing evidence that writing about a topic develops a deeper understanding in the student. However, grading essays is time consuming for teachers, resulting in an increasing focus on shallower forms of assessment that are easier to automate, such as multiple-choice tests. Existing essay grading software has attempted to ease this burden but relies on shallow lexico-syntactic features and is unable to understand the structure or validity of a student’s arguments or explanations. Without the ability to understand a student’s reasoning processes, it is impossible to write automated formative assessment systems to assist students with improving their thinking skills through essay writing. In order to understand the arguments put forth in an explanatory essay in the science domain, we need a method of representing the causal structure of a piece of explanatory text. Psychologists use a representation called a causal model to represent a student\u27s understanding of an explanatory text. This consists of a number of core concepts, and a set of causal relations linking them into one or more causal chains, forming a causal model. In this thesis I present a novel system for automatically constructing causal models from student scientific essays using Natural Language Processing (NLP) techniques. The problem was decomposed into 4 sub-problems - assigning essay concepts to words, detecting causal-relations between these concepts, resolving coreferences within each essay, and using the structure of the whole essay to reconstruct a causal model. Solutions to each of these sub-problems build upon the predictions from the solutions to earlier problems, forming a sequential pipeline of models. Designing a system in this way allows later models to correct for false positive predictions from downstream models. However, this also has the disadvantage that errors made in earlier models can propagate through the system, negatively impacting the upstream models, and limiting their accuracy. Producing robust solutions for the initial 2 sub problems, detecting concepts, and parsing causal relations between them, was critical in building a robust system. A number of sequence labeling models were trained to classify the concepts associated with each word, with the most effective approach being a bidirectional recurrent neural network (RNN), a deep learning model commonly applied to word labeling problems. This is because the RNN used pre-trained word embeddings to better generalize to rarer words, and was able to use information from both ends of each sentence to infer a word\u27s concept. The concepts predicted by this model were then used to develop causal relation parsing models for detecting causal connections between these concepts. A shift-reduce dependency parsing model was trained using the SEARN algorithm and out-performed a number of other approaches by better utilizing the structure of the problem and directly optimizing the error metric used. Two pre-trained coreference resolution systems were used to resolve coreferences within the essays. However a word tagging model trained to predict anaphors combined with a heuristic for determining the antecedent out-performed these two systems. Finally, a model was developed for parsing a causal model from an entire essay, utilizing the solutions to the three previous problems. A beam search algorithm was used to produce multiple parses for each sentence, which in turn were combined to generate multiple candidate causal models for each student essay. A reranking algorithm was then used to select the optimal causal model from all of the generated candidates. An important contribution of this work is that it represents a system for parsing a complete causal model of a scientific essay from a student\u27s written answer. Existing systems have been developed to parse individual causal relations, but no existing system attempts to parse a sequence of linked causal relations forming a causal model from an explanatory scientific essay. It is hoped that this work can lead to the development of more robust essay grading software and formative assessment tools, and can be extended to build solutions for extracting causality from text in other domains. In addition, I also present 2 novel approaches for optimizing the micro-F1 score within the design of two of the algorithms studied: the dependency parser and the reranking algorithm. The dependency parser uses a custom cost function to estimate the impact of parsing mistakes on the overall micro-F1 score, while the reranking algorithm allows the micro-F1 score to be optimized by tuning the beam search parameter to balance recall and precision
    corecore