52,660 research outputs found

    Higher Order Recurrent Neural Networks

    In this paper, we study novel neural network structures to better model long-term dependency in sequential data. We propose to use more memory units to keep track of more preceding states in recurrent neural networks (RNNs), which are all recurrently fed back to the hidden layers through different weighted paths. By extending the popular recurrent structure in RNNs, we provide the models with a better short-term memory mechanism for learning long-term dependency in sequences. Analogous to digital filters in signal processing, we call these structures higher order RNNs (HORNNs). Like regular RNNs, HORNNs can be learned using the back-propagation through time method. HORNNs are generally applicable to a variety of sequence modeling tasks. In this work, we examine HORNNs on the language modeling task using two popular data sets, namely the Penn Treebank (PTB) and English text8 data sets. Experimental results show that the proposed HORNNs yield state-of-the-art performance on both data sets, significantly outperforming regular RNNs as well as the popular LSTMs. Comment: 9 pages
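    A minimal sketch of the recurrence this abstract describes, assuming an order-K HORNN in which the hidden state is computed from the current input and the K most recent hidden states, each through its own feedback weight matrix (all names and sizes below are illustrative, not from the paper):

```python
import numpy as np

def hornn_step(x_t, prev_states, W_in, U_list, b):
    """One step of a higher-order RNN (HORNN) as sketched above.

    prev_states: the K most recent hidden states, newest first.
    U_list: one feedback weight matrix per preceding state (the
            'different weighted paths' in the abstract).
    """
    pre_act = W_in @ x_t + b
    for U_k, h_k in zip(U_list, prev_states):
        pre_act += U_k @ h_k          # each past state feeds back separately
    return np.tanh(pre_act)

# Toy usage: an order-3 HORNN run over a random sequence.
rng = np.random.default_rng(0)
d_in, d_h, K = 4, 8, 3
W_in = rng.normal(size=(d_h, d_in)) * 0.1
U_list = [rng.normal(size=(d_h, d_h)) * 0.1 for _ in range(K)]
b = np.zeros(d_h)

states = [np.zeros(d_h)] * K
for t in range(10):
    h_t = hornn_step(rng.normal(size=d_in), states, W_in, U_list, b)
    states = [h_t] + states[:-1]      # shift the window of kept states
```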

    Generating equilibrium molecules with deep neural networks

    Discovery of atomistic systems with desirable properties is a major challenge in chemistry and materials science. Here we introduce a novel, autoregressive, convolutional deep neural network architecture that generates molecular equilibrium structures by sequentially placing atoms in three-dimensional space. The model estimates the joint probability over molecular configurations with tractable conditional probabilities which only depend on distances between atoms and their nuclear charges. It combines concepts from state-of-the-art atomistic neural networks with auto-regressive generative models for images and speech. We demonstrate that the architecture is capable of generating molecules close to equilibrium for constitutional isomers of C7O2H10.
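    A hedged illustration of the autoregressive factorization described above: atoms are placed one at a time, so the joint probability over a configuration factorizes into tractable conditionals. The next_atom_distribution stand-in below is hypothetical; the paper's model parameterizes this conditional with an atomistic neural network over interatomic distances and nuclear charges:

```python
import numpy as np

def next_atom_distribution(positions, charges):
    """Placeholder for the learned conditional p(atom_i | atoms_<i).

    In the paper's scheme this depends only on interatomic distances and
    nuclear charges; here we just return a dummy proposal for illustration.
    """
    center = positions.mean(axis=0) if len(positions) else np.zeros(3)
    pos = center + np.random.normal(scale=1.4, size=3)
    z = np.random.choice([1, 6, 8])          # H, C, or O
    return pos, z

# Autoregressive sampling: each atom is conditioned on everything placed
# so far, so the joint over the molecule factorizes into conditionals.
positions, charges = np.empty((0, 3)), []
for _ in range(19):                          # e.g. the 19 atoms of C7O2H10
    pos, z = next_atom_distribution(positions, charges)
    positions = np.vstack([positions, pos])
    charges.append(z)
```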

    A model of cortical cognitive function using hierarchical interactions of gating matrices in internal agents coding relational representations

    Flexible cognition requires the ability to rapidly detect systematic functions of variables and guide future behavior based on predictions. The model described here proposes a potential framework for patterns of neural activity to detect systematic functions and relations between components of sensory input and apply them in a predictive manner. This model includes multiple internal gating agents that operate within the state space of neural activity, in analogy to external agents behaving in the external environment. The multiple internal gating agents represent patterns of neural activity that detect and gate patterns of matrix connectivity representing the relations between different neural populations. The patterns of gating matrix connectivity represent functions that can be used to predict future components of a series of sensory inputs or the relationship between different features of a static sensory stimulus. The model is applied to the prediction of dynamical trajectories, to the internal relationship between features of different sensory stimuli, and to the prediction of affine transformations that could be useful for solving cognitive tasks such as the Raven's progressive matrices task. Comment: 6 figures; version 2 simplifies notation, changes notation from row vectors to column vectors for clarity in the equations, and fixes some typos
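    One speculative toy reading of the gating idea, under the assumption that an internal agent selects, from a bank of candidate connectivity matrices, the relation that best explains an observed transition and then applies it predictively (the bank, sizes, and selection rule here are illustrative only, not the paper's mechanism):

```python
import numpy as np

rng = np.random.default_rng(1)
bank = [rng.normal(size=(3, 3)) for _ in range(5)]   # candidate relation matrices
x_prev = rng.normal(size=3)
x_curr = bank[2] @ x_prev                            # observed transition

# The internal agent "gates" the connectivity pattern whose action best
# explains the observed transition, then reuses it to predict forward.
errors = [np.linalg.norm(M @ x_prev - x_curr) for M in bank]
gated = bank[int(np.argmin(errors))]
x_pred = gated @ x_curr                              # predicted next state
```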

    StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing

    Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures. Annotating NL utterances with their corresponding MRs is expensive and time-consuming, and thus the limited availability of labeled data often becomes the bottleneck of data-driven, supervised models. We introduce StructVAE, a variational auto-encoding model for semi-supervised semantic parsing, which learns both from limited amounts of parallel data and from readily-available unlabeled NL utterances. StructVAE models latent MRs not observed in the unlabeled data as tree-structured latent variables. Experiments on semantic parsing in the ATIS domain and on Python code generation show that with extra unlabeled data, StructVAE outperforms strong supervised models. Comment: ACL 2018
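    A hedged sketch of the kind of semi-supervised objective this setup implies (notation mine, not the paper's): a supervised likelihood term on labeled pairs plus an evidence lower bound on unlabeled utterances, with the parser q doubling as the inference network over latent tree-structured MRs z:

```latex
\mathcal{L}(\theta,\phi) \;=\;
  \sum_{(x,z)\in\mathbb{L}} \log q_\phi(z \mid x)
  \;+\; \sum_{x\in\mathbb{U}} \Big(
      \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x \mid z)\big]
      \;-\; \lambda\,\mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p(z)\big)
    \Big)
```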

    Automatic Configuration of Deep Neural Networks with EGO

    Designing the architecture for an artificial neural network is a cumbersome task because of the numerous parameters to configure, including activation functions, layer types, and hyper-parameters. With the large number of parameters for most networks nowadays, it is intractable to find a good configuration for a given task by hand. In this paper an Efficient Global Optimization (EGO) algorithm is adapted to automatically optimize and configure convolutional neural network architectures. A configurable neural network architecture based solely on convolutional layers is proposed for the optimization. Without using any knowledge of the target problem and without data augmentation techniques, it is shown that on several image classification tasks this approach finds network architectures competitive in prediction accuracy with the best hand-crafted ones in the literature. In addition, only a very small training budget (200 evaluations and 10 epochs of training) is spent on each optimized architecture, in contrast to the usual long training time of hand-crafted networks. Moreover, instead of the standard sequential evaluation in EGO, several candidate architectures are proposed and evaluated in parallel, which significantly reduces execution overhead and leads to efficient automation of deep neural network design.
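    The EGO loop at the heart of this approach is Bayesian optimization with a Gaussian-process surrogate and the expected-improvement acquisition. A minimal sketch on a toy objective (standing in for "train one architecture, return its validation error"; all sizes illustrative):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X_cand, gp, y_best):
    """EI acquisition for minimization, the core of EGO."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Toy objective standing in for "evaluate a CNN config, return its error".
f = lambda x: np.sin(3 * x[:, 0]) + 0.1 * x[:, 0] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(5, 1)); y = f(X)
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):                              # small budget, as in the paper
    gp.fit(X, y)
    cand = rng.uniform(-3, 3, size=(256, 1))
    # Parallel EGO would take several top-EI candidates here; we take one.
    x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))][None]
    X, y = np.vstack([X, x_next]), np.append(y, f(x_next))
```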

    A Data-Driven CO2 Leakage Detection Using Seismic Data and Spatial-Temporal Densely Connected Convolutional Neural Networks

    In carbon capture and sequestration, effective monitoring methods are needed to detect and respond to CO2 leakage. CO2 leakage detection methods rely on geophysical observations and monitoring sensor networks. However, traditional methods usually require the development of site-specific physical models and expert interpretation, and their effectiveness may not carry over across different application locations, operational scenarios, and conditions. In this paper, we develop a novel data-driven leakage detection method based on densely connected convolutional neural networks. Our method differs from conventional leakage monitoring methods by directly learning a mapping from seismic data to the CO2 leakage mass. To account for the spatial and temporal characteristics of seismic data, our novel network architecture combines 1D and 2D convolutional neural networks. To reduce the computational expense of the optimization, we apply a densely-connecting strategy in our network architecture that reduces the number of network parameters. Based on the features generated by our convolutional neural networks, we further incorporate a long short-term memory network to utilize time-sequential information, which further improves detection accuracy. Finally, we apply our detection method to synthetic seismic datasets generated from flow simulations of a hypothetical CO2 storage scenario with injection into a partially compartmentalized sandstone storage reservoir. To evaluate performance, we conducted multiple experiments, including a random leakage test, a sequential test, and a robustness test. Numerical results show that our CO2 leakage detection method successfully detects the leakage and accurately predicts the leakage mass, suggesting its potential for application in monitoring real CO2 storage sites.
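    A hedged sketch of the spatial-temporal idea as described: a 2D convolutional branch over each seismic frame, a 1D convolutional branch over traces, feature concatenation, and an LSTM over the frame sequence. Layer sizes are illustrative and the dense (DenseNet-style) connectivity is reduced to a single concatenation:

```python
import torch
import torch.nn as nn

class LeakageDetector(nn.Module):
    """Illustrative combination of 2D and 1D conv features with an LSTM
    head that regresses the leaked CO2 mass; not the paper's exact net."""
    def __init__(self, receivers=32):
        super().__init__()
        self.conv2d = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(4))
        self.conv1d = nn.Sequential(nn.Conv1d(receivers, 8, 5, padding=2), nn.ReLU(),
                                    nn.AdaptiveAvgPool1d(4))
        self.lstm = nn.LSTM(input_size=8 * 16 + 8 * 4, hidden_size=32,
                            batch_first=True)
        self.head = nn.Linear(32, 1)           # regress leaked CO2 mass

    def forward(self, seismic):                # (batch, frames, receivers, samples)
        b, t, r, s = seismic.shape
        f2 = self.conv2d(seismic.reshape(b * t, 1, r, s)).reshape(b, t, -1)
        f1 = self.conv1d(seismic.reshape(b * t, r, s)).reshape(b, t, -1)
        feats = torch.cat([f2, f1], dim=-1)    # dense-style feature reuse
        out, _ = self.lstm(feats)              # temporal modeling over frames
        return self.head(out[:, -1])

mass = LeakageDetector()(torch.randn(2, 10, 32, 64))   # toy forward pass
```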

    Deep Learning on Graphs: A Survey

    Deep learning has been shown to be successful in a number of domains, ranging from acoustics and images to natural language processing. However, applying deep learning to ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, substantial research effort has been devoted to applying deep learning methods to graphs, resulting in beneficial advances in graph analysis techniques. In this survey, we comprehensively review the different types of deep learning methods on graphs. We divide the existing methods into five categories based on their model architectures and training strategies: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods. We then provide a comprehensive overview of these methods in a systematic manner, mainly following their development history. We also analyze the differences and compositions of different methods. Finally, we briefly outline the applications in which they have been used and discuss potential future research directions. Comment: Accepted by Transactions on Knowledge and Data Engineering. 24 pages, 11 figures
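    As a concrete instance of the graph convolutional network category the survey covers, here is the standard propagation rule H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W) in a minimal NumPy sketch (this is the widely used Kipf-Welling variant, chosen for illustration, not something specific to the survey):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: propagate node features over the
    symmetrically normalized adjacency (with self-loops), then transform."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)   # toy path graph
H = np.eye(3)                                             # one-hot node features
W = np.random.default_rng(0).normal(size=(3, 4))
H_next = gcn_layer(A, H, W)                               # (3, 4) node embeddings
```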

    A Recurrent Latent Variable Model for Sequential Data

    In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamic hidden state.
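    A minimal generative rollout of the structure the abstract describes, assuming the usual VRNN factorization: a prior over the latent z_t conditioned on h_{t-1}, an emission of x_t given z_t and h_{t-1}, and a deterministic recurrence over all three. Plain linear maps stand in for the neural networks a real VRNN parameterizes these with:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z, d_h = 2, 2, 8
# Illustrative parameter matrices; a real VRNN learns these maps as nets.
W_pz, W_xz, W_h = (rng.normal(size=s) * 0.1 for s in
                   [(d_z, d_h), (d_x, d_z + d_h), (d_h, d_x + d_z + d_h)])

h = np.zeros(d_h)
for t in range(5):
    mu_z = W_pz @ h                          # prior over z_t depends on h_{t-1}
    z = mu_z + rng.normal(size=d_z)          # sample the latent variable
    x = W_xz @ np.concatenate([z, h])        # emit x_t given z_t and h_{t-1}
    h = np.tanh(W_h @ np.concatenate([x, z, h]))   # deterministic recurrence
```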

    Design optimization of structures including repetitive patterns

    It is becoming common practice to use surrogate models instead of finite element (FE) models in most structural optimization problems. The main advantage of these surrogate models is to reduce computation time and to make design optimization of complex structures feasible. For surrogate modeling, input-target pairs (the training set) are first required, obtained by running the FE model for varying values of the design parameter set. The relationship between these pairs is then defined via curve fitting, and the fitted curve is called a surrogate model. Once the surrogate model is found, it replaces the FE model in the optimization problem. Finally, the optimization is performed using suitably chosen algorithm(s). Since solving an FE model may take a very long time for certain applications, gathering the training set is usually the most time-consuming part of the overall optimization process. Therefore, in this research the merits of the Component Mode Synthesis (CMS) method are utilized to gather this set for structures containing repetitive patterns (e.g. a fan inlet case). The reduced FE model of only one repeating pattern is created using CMS and the obtained information is shared with the rest of the repeating patterns, so the model of the entire structure is obtained without modeling all the repetitive patterns. In the developed design optimization strategy, backpropagation neural networks are used for surrogate modeling. The optimization is performed using two techniques: genetic algorithms (GAs) are utilized to increase the chance of finding the location of the global optimum, and since the optimum attained by GAs may not be exact, Sequential Quadratic Programming is employed afterwards to improve the solution. An academic test problem is used to demonstrate the strategy.
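    A compact sketch of the two-stage global-then-local hand-off described above, with a cheap analytic function standing in for the trained neural-network surrogate and SciPy's differential evolution standing in for the GA (the paper's actual components differ; this only illustrates the strategy):

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

# Toy multimodal surrogate standing in for the neural-network model of
# the FE response; multiple local minima make the two stages meaningful.
surrogate = lambda p: np.sin(5 * p[0]) * np.cos(3 * p[1]) + 0.1 * (p[0]**2 + p[1]**2)
bounds = [(-2.0, 2.0), (-2.0, 2.0)]

# Stage 1: population-based global search to locate the basin of the
# global optimum (differential evolution here, in place of the GA).
coarse = differential_evolution(surrogate, bounds, seed=0, maxiter=50)

# Stage 2: gradient-based SQP refinement from the global result, since
# the population-based optimum may not be exact.
refined = minimize(surrogate, coarse.x, method="SLSQP", bounds=bounds)
print(coarse.x, refined.x, refined.fun)
```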

    Constrained Bayesian Optimization for Automatic Chemical Design

    Automatic Chemical Design is a framework for generating novel molecules with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathology that it tends to produce invalid molecular structures. First, we demonstrate empirically that this pathology arises when the Bayesian optimization scheme queries latent points far away from the data on which the variational autoencoder has been trained. Second, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathology can be mitigated, yielding marked improvements in the validity of the generated molecules. We posit that constrained Bayesian optimization is a good approach for addressing this class of training-set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder. Comment: Previous versions accepted to the NIPS 2017 Workshop on Bayesian Optimization (BayesOpt 2017) and the NIPS 2017 Workshop on Machine Learning for Molecules and Materials
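    A hedged sketch of how such a constrained acquisition could look: ordinary expected improvement weighted by an estimated probability that a latent point decodes to a valid molecule, discouraging queries far from the training data. This weighting form is a common choice for constrained Bayesian optimization, not necessarily the paper's exact formulation:

```python
import numpy as np
from scipy.stats import norm

def constrained_ei(mu, sigma, y_best, p_valid):
    """Constraint-weighted expected improvement (for minimization).

    mu, sigma: GP posterior mean/std at the candidate latent points.
    p_valid:   a model's estimated probability that each point decodes
               to a valid molecule; invalid regions are downweighted.
    """
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    ei = (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return ei * p_valid

# Toy check: identical EI at two points, but only the likely-valid one
# keeps a high acquisition value.
mu, sigma = np.array([0.2, 0.2]), np.array([0.5, 0.5])
print(constrained_ei(mu, sigma, y_best=0.4, p_valid=np.array([0.05, 0.95])))
```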