Higher Order Recurrent Neural Networks
In this paper, we study novel neural network structures to better model
long-term dependency in sequential data. We propose to use more memory units to keep
track of more preceding states in recurrent neural networks (RNNs), which are
all recurrently fed to the hidden layers as feedback through different weighted
paths. By extending the popular recurrent structure in RNNs, we provide the
models with a better short-term memory mechanism to learn long-term dependency in
sequences. Analogous to digital filters in signal processing, we call these
structures higher order RNNs (HORNNs). Similar to RNNs, HORNNs can also be
learned using the back-propagation through time method. HORNNs are generally
applicable to a variety of sequence modeling tasks. In this work, we have
examined HORNNs for the language modeling task using two popular data sets,
namely the Penn Treebank (PTB) and English text8 data sets. Experimental
results have shown that the proposed HORNNs yield the state-of-the-art
performance on both data sets, significantly outperforming the regular RNNs as
well as the popular LSTMs.
Comment: 9 pages
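As a rough sketch (not the authors' released code), the higher-order recurrence described above can be written with K weighted feedback paths, one per preceding hidden state; all names and dimensions below are illustrative:

```python
import numpy as np

def hornn_step(x_t, prev_states, W, U_list):
    """One higher-order recurrent step: the new hidden state receives
    feedback from the K most recent hidden states through separate
    weight matrices, instead of only h_{t-1} as in a plain RNN."""
    pre = W @ x_t
    for U_k, h_k in zip(U_list, prev_states):
        pre += U_k @ h_k
    return np.tanh(pre)

rng = np.random.default_rng(0)
d_in, d_h, K, T = 4, 8, 3, 6
W = rng.normal(scale=0.1, size=(d_h, d_in))
U_list = [rng.normal(scale=0.1, size=(d_h, d_h)) for _ in range(K)]

states = [np.zeros(d_h) for _ in range(K)]  # the K most recent states
for t in range(T):
    x_t = rng.normal(size=d_in)
    h_t = hornn_step(x_t, states, W, U_list)
    states = [h_t] + states[:-1]            # shift the state window
print(states[0].shape)  # (8,)
```

Setting K = 1 recovers the standard RNN update; larger K corresponds to the higher filter orders in the signal-processing analogy.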
Generating equilibrium molecules with deep neural networks
Discovery of atomistic systems with desirable properties is a major challenge
in chemistry and material science. Here we introduce a novel, autoregressive,
convolutional deep neural network architecture that generates molecular
equilibrium structures by sequentially placing atoms in three-dimensional
space. The model estimates the joint probability over molecular configurations
with tractable conditional probabilities which only depend on distances between
atoms and their nuclear charges. It combines concepts from state-of-the-art
atomistic neural networks with auto-regressive generative models for images and
speech. We demonstrate that the architecture is capable of generating molecules
close to equilibrium for constitutional isomers of COH
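A toy sketch of the kind of conditional the abstract describes (the grid, weights, and feature form are invented for illustration, not the paper's network): the score for placing the next atom at a candidate point depends only on distances to already-placed atoms and their nuclear charges, with a sort to keep it invariant to atom ordering:

```python
import numpy as np

def placement_logits(candidates, placed_positions, placed_charges, weights):
    """Toy conditional for the next atom's position: one score per
    candidate point, computed only from distances to placed atoms and
    their nuclear charges (the invariances named in the abstract)."""
    scores = []
    for c in candidates:
        d = np.linalg.norm(placed_positions - c, axis=1)  # distances only
        feats = placed_charges / (1.0 + d)                # charge-weighted
        scores.append(weights @ np.sort(feats))           # order-invariant
    return np.array(scores)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

placed = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]])  # toy C and O sites
charges = np.array([6.0, 8.0])
grid = np.array([[0.5, 0.9, 0.0], [2.0, 0.0, 0.0], [-1.0, 0.5, 0.0]])
w = np.array([0.3, 0.7])
p = softmax(placement_logits(grid, placed, charges, w))
print(p)  # a distribution over candidate placements
```

Sampling one placement at a time from such conditionals is what makes the joint probability over configurations tractable.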
A model of cortical cognitive function using hierarchical interactions of gating matrices in internal agents coding relational representations
Flexible cognition requires the ability to rapidly detect systematic
functions of variables and guide future behavior based on predictions. The
model described here proposes a potential framework for patterns of neural
activity to detect systematic functions and relations between components of
sensory input and apply them in a predictive manner. This model includes
multiple internal gating agents that operate within the state space of neural
activity, in analogy to external agents behaving in the external environment.
The multiple internal gating agents represent patterns of neural activity that
detect and gate patterns of matrix connectivity representing the relations
between different neural populations. The patterns of gating matrix
connectivity represent functions that can be used to predict future components
of a series of sensory inputs or the relationship between different features of
a static sensory stimulus. The model is applied to the prediction of dynamical
trajectories, the internal relationship between features of different sensory
stimuli and to the prediction of affine transformations that could be useful
for solving cognitive tasks such as the Raven's progressive matrices task.
Comment: 6 figures; version 2 simplifies notation, changes notation from
row vector to column vector for clarity in the equations, and fixes some
typos
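One way to make the gating idea concrete (a minimal sketch, not the paper's model): an internal gating pattern selects among candidate relation matrices, and the gated matrix transforms the current activity pattern into a prediction of the next input:

```python
import numpy as np

def gated_prediction(state, gate, matrices):
    """A gating pattern (here a weighting over candidate relation
    matrices) selects which matrix transformation is applied to the
    current activity pattern to predict the next element."""
    M = sum(g * Mk for g, Mk in zip(gate, matrices))
    return M @ state

# Two candidate "relation" matrices: identity and a 90-degree rotation
# (a simple affine transformation of the kind the abstract mentions).
matrices = [np.eye(2), np.array([[0.0, -1.0], [1.0, 0.0]])]
gate = np.array([0.0, 1.0])        # the agent fully gates the rotation
state = np.array([1.0, 0.0])
pred = gated_prediction(state, gate, matrices)
print(pred)  # the rotated activity pattern
```

Learning which gate to open, given a history of inputs, is the part the internal agents perform in the model.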
StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing
Semantic parsing is the task of transducing natural language (NL) utterances
into formal meaning representations (MRs), commonly represented as tree
structures. Annotating NL utterances with their corresponding MRs is expensive
and time-consuming, and thus the limited availability of labeled data often
becomes the bottleneck of data-driven, supervised models. We introduce
StructVAE, a variational auto-encoding model for semi-supervised semantic
parsing, which learns both from limited amounts of parallel data, and
readily-available unlabeled NL utterances. StructVAE models latent MRs not
observed in the unlabeled data as tree-structured latent variables. Experiments
on semantic parsing on the ATIS domain and Python code generation show that
with extra unlabeled data, StructVAE outperforms strong supervised models.
Comment: ACL 201
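Schematically, and with invented placeholder numbers and weighting, such a semi-supervised objective combines a supervised likelihood on the parallel data with a variational bound on unlabeled utterances, where the meaning representation plays the role of the latent variable:

```python
def elbo_sample(log_p_x_given_z, log_p_z, log_q_z_given_x):
    """Single-sample lower bound on log p(x) for one unlabeled utterance,
    where z is a sampled latent meaning representation (a tree in
    StructVAE; a placeholder scalar log-probability here)."""
    return log_p_x_given_z + log_p_z - log_q_z_given_x

def semi_supervised_objective(log_p_labeled, elbos_unlabeled, alpha=0.1):
    """Hypothetical combined objective: direct log-likelihood on labeled
    (NL, MR) pairs plus a weighted variational bound on unlabeled NL."""
    return sum(log_p_labeled) + alpha * sum(elbos_unlabeled)

labeled = [-2.1, -1.7]                       # log-likelihoods, parallel data
unlabeled = [elbo_sample(-3.0, -1.2, -1.5)]  # bound for one raw utterance
obj = semi_supervised_objective(labeled, unlabeled)
print(obj)
```

Maximizing the unlabeled term trains both the parser (as the inference model q) and the reconstruction model, which is how raw utterances contribute signal.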
Automatic Configuration of Deep Neural Networks with EGO
Designing the architecture for an artificial neural network is a cumbersome
task because of the numerous parameters to configure, including activation
functions, layer types, and hyper-parameters. With the large number of
parameters for most networks nowadays, it is intractable to find a good
configuration for a given task by hand. In this paper, an Efficient Global
Optimization (EGO) algorithm is adapted to automatically optimize and configure
convolutional neural network architectures. A configurable neural network
architecture based solely on convolutional layers is proposed for the
optimization. Without using any knowledge of the target problem and without using
any data augmentation techniques, it is shown that on several image
classification tasks this approach is able to find competitive network
architectures in terms of prediction accuracy, compared to the best
hand-crafted ones in literature. In addition, a very small training budget (200
evaluations and 10 epochs of training) is spent on each optimized architecture
in contrast to the usual long training time of hand-crafted networks. Moreover,
instead of the standard sequential evaluation in EGO, several candidate
architectures are proposed and evaluated in parallel, which significantly
reduces execution overhead and leads to efficient automation of deep neural
network design.
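The core of EGO is the expected-improvement acquisition over the surrogate's predictions; a minimal sketch for error-rate minimization (candidate means and uncertainties are invented for illustration):

```python
import math

def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization: how much a candidate
    configuration is predicted to improve on the best error seen so far,
    given the surrogate's mean mu and uncertainty sigma."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

# Pick the candidate architecture with the highest acquisition value.
candidates = [(0.30, 0.05), (0.25, 0.01), (0.35, 0.20)]  # (mu, sigma)
best_so_far = 0.28
scores = [expected_improvement(mu, s, best_so_far) for mu, s in candidates]
print(max(range(len(scores)), key=scores.__getitem__))  # 2
```

Note the uncertain third candidate wins despite its worse mean; that exploration bonus is what EI trades off. The parallel variant in the abstract simply proposes several high-acquisition candidates per iteration instead of one.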
A Data-Driven CO2 Leakage Detection Using Seismic Data and Spatial-Temporal Densely Connected Convolutional Neural Networks
In carbon capture and sequestration, developing effective monitoring methods
is needed to detect and respond to CO2 leakage. CO2 leakage detection methods
rely on geophysical observations and monitoring sensor networks. However,
traditional methods usually require the development of site-specific physical
models and expert interpretation, and the effectiveness of these methods can be
limited to different application locations, operational scenarios, and
conditions. In this paper, we developed a novel data-driven leakage detection
method based on densely connected convolutional neural networks. Our method
differs from conventional leakage monitoring methods by directly learning a
mapping relationship between seismic data and the CO2 leakage mass. To account
for the spatial and temporal characteristics of seismic data, our novel
network architecture combines 1D and 2D convolutional neural networks. To
overcome the computational expense of solving optimization problems, we apply a
densely-connecting strategy in our network architecture that reduces the number
of network parameters. Based on the features generated by our convolutional
neural networks, we further incorporate a long short-term memory network to
utilize time-sequential information, which further improves the detection
accuracy. Finally, we apply our detection method to synthetic seismic datasets
generated based on flow simulations of a hypothetical CO2 storage scenario with
injection into a partially compartmentalized sandstone storage reservoir. To
evaluate method performance, we conducted multiple experiments including a
random leakage test, a sequential test, and a robustness test. Numerical
results show that our CO2 leakage detection method successfully detects the
leakage and accurately predicts the leakage mass, suggesting that it has the
potential for application in monitoring real CO2 storage sites.
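The densely-connecting strategy can be sketched as follows (fully-connected layers stand in for the paper's convolutions; widths are illustrative): each layer consumes the concatenation of the input and all earlier outputs, so individual layers can stay narrow and the parameter count stays small.

```python
import numpy as np

def dense_block(x, layer_weights, activation=np.tanh):
    """Densely connected block (schematic): every layer sees the
    concatenation of the input and all previous layers' outputs,
    enabling feature reuse with few parameters per layer."""
    features = [x]
    for W in layer_weights:
        inp = np.concatenate(features)
        features.append(activation(W @ inp))
    return np.concatenate(features)

rng = np.random.default_rng(1)
d, growth = 6, 4
in_widths = [d, d + growth, d + 2 * growth]   # input width per layer
weights = [rng.normal(scale=0.1, size=(growth, s)) for s in in_widths]
out = dense_block(rng.normal(size=d), weights)
print(out.shape)  # (18,) = 6 input + 3 layers x growth 4
```

In the paper's setting the features from such blocks would then feed the LSTM that exploits the time-sequential structure of the seismic data.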
Deep Learning on Graphs: A Survey
Deep learning has been shown to be successful in a number of domains, ranging
from acoustics and images to natural language processing. However, applying deep
learning to the ubiquitous graph data is non-trivial because of the unique
characteristics of graphs. Recently, substantial research efforts have been
devoted to applying deep learning methods to graphs, resulting in beneficial
advances in graph analysis techniques. In this survey, we comprehensively
review the different types of deep learning methods on graphs. We divide the
existing methods into five categories based on their model architectures and
training strategies: graph recurrent neural networks, graph convolutional
networks, graph autoencoders, graph reinforcement learning, and graph
adversarial methods. We then provide a comprehensive overview of these methods
in a systematic manner mainly by following their development history. We also
analyze the differences and compositions of different methods. Finally, we
briefly outline the applications in which they have been used and discuss
potential future research directions.
Comment: Accepted by Transactions on Knowledge and Data Engineering. 24 pages,
11 figures
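As a concrete instance of the graph convolutional networks category, here is a single layer in the widely used Kipf-and-Welling formulation (a standard construction, not code from this survey): node features are averaged through a symmetrically normalized adjacency with self-loops.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolutional layer: aggregate neighbor features
    through the normalized adjacency, then apply weights and ReLU."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)          # a 3-node path graph
H = np.eye(3)                                   # one-hot node features
W = np.ones((3, 2))                             # toy weight matrix
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 2): one new feature vector per node
```

Stacking such layers grows each node's receptive field one hop at a time, which is the graph analogue of spatial convolution.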
A Recurrent Latent Variable Model for Sequential Data
In this paper, we explore the inclusion of latent random variables into the
dynamic hidden state of a recurrent neural network (RNN) by combining elements
of the variational autoencoder. We argue that through the use of high-level
latent random variables, the variational RNN (VRNN) can model the kind of
variability observed in highly structured sequential data such as natural
speech. We empirically evaluate the proposed model against related sequential
models on four speech datasets and one handwriting dataset. Our results show
the important roles that latent random variables can play in the RNN dynamic
hidden state.
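A schematic VRNN step under simplifying assumptions (diagonal Gaussian prior, invented dimensions): the latent z_t is drawn from a prior conditioned on the previous hidden state, and both x_t and z_t feed the recurrence, so the stochastic variable becomes part of the dynamic hidden state:

```python
import numpy as np

def vrnn_step(x_t, h_prev, params, rng):
    """One generative VRNN step (sketch): sample z_t from a state-
    dependent prior, then update the recurrence with both x_t and z_t."""
    Wp, Wh = params
    mu, log_sigma = np.split(Wp @ h_prev, 2)   # prior p(z_t | h_{t-1})
    z_t = mu + np.exp(log_sigma) * rng.normal(size=mu.shape)
    h_t = np.tanh(Wh @ np.concatenate([x_t, z_t, h_prev]))
    return h_t, z_t

rng = np.random.default_rng(0)
d_x, d_z, d_h = 3, 2, 5
Wp = rng.normal(scale=0.1, size=(2 * d_z, d_h))
Wh = rng.normal(scale=0.1, size=(d_h, d_x + d_z + d_h))
h = np.zeros(d_h)
for _ in range(4):
    h, z = vrnn_step(rng.normal(size=d_x), h, (Wp, Wh), rng)
print(h.shape, z.shape)
```

Because z_t varies per step, the model can capture the structured variability (e.g. in natural speech) that a deterministic hidden state alone would have to absorb.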
Design optimization of structures including repetitive patterns
It is becoming common practice to use surrogate models instead of finite element (FE) models in most structural optimization problems. The main advantage of these surrogate models is to reduce computation time as well as to make design optimization of complex structures possible. For surrogate modeling, input-target pairs (the training set) are first required, which are obtained by running the FE model for varying values of the design parameter set. Then the relationship between these pairs is defined via curve fitting, where the resulting curve is called a surrogate model. Once the surrogate model is found, it replaces the FE model in the optimization problem. Finally, the optimization is performed using suitably chosen algorithm(s). Since solving an FE model may take a very long time for certain applications, gathering the training set is usually the most time-consuming part of the overall optimization process. Therefore, in this research the merits of the Component Mode Synthesis (CMS) method are utilized to gather this set for structures including repetitive patterns (e.g. a fan inlet case). The reduced FE model of only one repeating pattern is created using CMS, and the obtained information is shared with the rest of the repeating patterns. Therefore, the model of an entire structure is obtained without modeling all the repetitive patterns. In the developed design optimization strategy, Backpropagation Neural Networks are used for surrogate modeling. The optimization is performed using two techniques. Genetic Algorithms (GAs) are utilized to increase the chance of finding the location of the global optimum. Since the optimum attained by GAs may not be exact, Sequential Quadratic Programming is employed afterwards to improve the solution. An academic test problem is used to demonstrate the strategy.
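The pipeline (sample the expensive model, fit a cheap surrogate, search globally, refine locally) can be sketched in miniature; here a 1-D quadratic stands in for the FE model, piecewise-linear interpolation for the neural-network surrogate, and random sampling plus selection for the GA, so every piece is an illustrative stand-in:

```python
import random

# Hypothetical 1-D stand-in for an expensive finite element analysis.
def fe_model(x):
    return (x - 1.3) ** 2 + 0.5

# Step 1: training set -- a handful of (slow) FE runs over the design range.
xs = [i / 4.0 for i in range(-8, 9)]
ys = [fe_model(x) for x in xs]

# Step 2: cheap surrogate -- piecewise-linear interpolation stands in for
# the neural-network fit.
def surrogate(x):
    for j in range(len(xs) - 1):
        if xs[j] <= x <= xs[j + 1]:
            t = (x - xs[j]) / (xs[j + 1] - xs[j])
            return (1 - t) * ys[j] + t * ys[j + 1]
    return ys[0] if x < xs[0] else ys[-1]

# Step 3: global search on the surrogate -- random sampling plus
# selection as a toy stand-in for the genetic algorithm.
random.seed(0)
population = [random.uniform(-2.0, 2.0) for _ in range(50)]
x_ga = min(population, key=surrogate)

# Step 4: local refinement around the GA optimum, a stand-in for the
# Sequential Quadratic Programming polish step.
x_opt = min((x_ga + k * 0.01 for k in range(-60, 61)), key=surrogate)
print(round(x_opt, 2))
```

All optimization calls hit only the cheap surrogate; the expensive model is evaluated just 17 times to build the training set, which is the point of the approach.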
Constrained Bayesian Optimization for Automatic Chemical Design
Automatic Chemical Design is a framework for generating novel molecules with
optimized properties. The original scheme, featuring Bayesian optimization over
the latent space of a variational autoencoder, suffers from the pathology that
it tends to produce invalid molecular structures. First, we demonstrate
empirically that this pathology arises when the Bayesian optimization scheme
queries latent points far away from the data on which the variational
autoencoder has been trained. Secondly, by reformulating the search procedure
as a constrained Bayesian optimization problem, we show that the effects of
this pathology can be mitigated, yielding marked improvements in the validity
of the generated molecules. We posit that constrained Bayesian optimization is
a good approach for solving this class of training set mismatch in many
generative tasks involving Bayesian optimization over the latent space of a
variational autoencoder.
Comment: Previous versions accepted to the NIPS 2017 Workshop on Bayesian
Optimization (BayesOpt 2017) and the NIPS 2017 Workshop on Machine Learning
for Molecules and Materials
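One common way to formulate such a constraint (a sketch, not necessarily the paper's exact acquisition) is to multiply expected improvement by a modeled probability that the latent point decodes to a valid molecule, discounting promising-looking points that lie far from the training data:

```python
import math

def constrained_ei(mu, sigma, best, p_valid):
    """Constrained expected improvement for minimization (schematic):
    the usual EI is weighted by the probability that the candidate
    latent point decodes to a *valid* molecular structure."""
    if sigma <= 0.0:
        return max(best - mu, 0.0) * p_valid
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return ((best - mu) * cdf + sigma * pdf) * p_valid

# A far-from-data point may look promising (lower predicted objective)
# but is unlikely to decode to a valid structure; the constraint
# discounts it relative to a safer near-data point.
near = constrained_ei(mu=0.40, sigma=0.05, best=0.45, p_valid=0.95)
far = constrained_ei(mu=0.30, sigma=0.05, best=0.45, p_valid=0.05)
print(near > far)  # True
```

This is the mechanism by which the reformulated search is steered back toward regions the variational autoencoder was actually trained on.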