A hierarchy of recurrent networks for speech recognition
Generative models for sequential data based on directed graphs of Restricted Boltzmann Machines (RBMs) have recently been shown to model high-dimensional sequences accurately. In these models, temporal dependencies in the input are discovered either by buffering previous visible variables or through recurrent connections of the hidden variables. Here we propose a modification of these models, the Temporal Reservoir Machine (TRM). It uses a recurrent artificial neural network (ANN) to integrate information from the input over time. This information is then fed into an RBM at each time step. To avoid the difficulties of recurrent network learning, the ANN remains untrained and can hence be thought of as a random feature extractor. Using the architecture of multi-layer RBMs (Deep Belief Networks), TRMs can be used as building blocks for complex hierarchical models. This approach unifies RBM-based approaches to sequential data modeling with the Echo State Network, a powerful approach for black-box system identification. The TRM is tested on a spoken digits task under noisy conditions, and its performance is competitive with that of previous models.
Morphological Priors for Probabilistic Neural Word Embeddings
Word embeddings allow natural language processing systems to share
statistical information across related words. These embeddings are typically
based on distributional statistics, making it difficult for them to generalize
to rare or unseen words. We propose to improve word embeddings by incorporating
morphological information, capturing shared sub-word features. Unlike previous
work that constructs word embeddings directly from morphemes, we combine
morphological and distributional information in a unified probabilistic
framework, in which the word embedding is a latent variable. The morphological
information provides a prior distribution on the latent word embeddings, which
in turn condition a likelihood function over an observed corpus. This approach
yields improvements on intrinsic word similarity evaluations, and also in the
downstream task of part-of-speech tagging.
Comment: Appeared at the Conference on Empirical Methods in Natural Language
Processing (EMNLP 2016, Austin
Efficient Optimization of Echo State Networks for Time Series Datasets
Echo State Networks (ESNs) are recurrent neural networks that only train
their output layer, thereby precluding the need to backpropagate gradients
through time, which leads to significant computational gains. Nevertheless, a
common issue in ESNs is determining their hyperparameters, which are crucial in
instantiating a well-performing reservoir, but are often set manually or using
heuristics. In this work we optimize the ESN hyperparameters using Bayesian
optimization which, given a limited budget of function evaluations, outperforms
a grid search strategy. In the context of large volumes of time series data,
such as light curves in the field of astronomy, we can further reduce the
optimization cost of ESNs. In particular, we wish to avoid tuning
hyperparameters per individual time series as this is costly; instead, we want
to find ESNs with hyperparameters that perform well not just on individual time
series but rather on groups of similar time series without sacrificing
predictive performance significantly. This naturally leads to a notion of
clusters, where each cluster is represented by an ESN tuned to model a group of
time series of similar temporal behavior. We demonstrate this approach both on
synthetic datasets and real-world light curves from the MACHO survey. We show
that our approach results in a significant reduction in the number of ESN
models required to model a whole dataset, while retaining predictive
performance for the series in each cluster.
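
The idea that an ESN trains only its output layer can be illustrated with a minimal sketch. It is not the method of the abstract above: it leaves out the Bayesian optimization and the clustering entirely, and every name in it (`make_reservoir`, `run_reservoir`, `train_readout`, `norm_bound`, `washout`) is hypothetical. It also assumes a simple stability heuristic, rescaling the random recurrent weights so that the largest absolute row sum stays below 1, in place of the usual spectral-radius tuning. The reservoir weights stay fixed; only the linear readout is fitted, here by ridge regression on a toy sine series.

```python
import math
import random


def make_reservoir(n_units, norm_bound=0.9, seed=0):
    """Draw fixed random input and recurrent weights; these are never trained."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(n_units)]
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_units)] for _ in range(n_units)]
    # Assumption: rescale so the largest absolute row sum equals norm_bound (< 1).
    largest_row_sum = max(sum(abs(v) for v in row) for row in w)
    w = [[v * norm_bound / largest_row_sum for v in row] for row in w]
    return w_in, w


def run_reservoir(w_in, w, inputs):
    """Drive the reservoir with a scalar input series and collect its states."""
    state = [0.0] * len(w_in)
    states = []
    for u in inputs:
        state = [
            math.tanh(w_in[i] * u + sum(w[i][j] * state[j] for j in range(len(state))))
            for i in range(len(state))
        ]
        states.append(state)
    return states


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def train_readout(states, targets, ridge=1e-4):
    """Fit the linear readout by ridge regression; the only trained part."""
    n = len(states[0])
    gram = [
        [sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    rhs = [sum(s[i] * t for s, t in zip(states, targets)) for i in range(n)]
    return solve_linear(gram, rhs)


def predict(states, w_out):
    """Apply the trained readout to each reservoir state."""
    return [sum(wi * si for wi, si in zip(w_out, s)) for s in states]


# Toy task: one-step-ahead prediction of a sine wave.
series = [math.sin(0.2 * t) for t in range(301)]
w_in, w = make_reservoir(n_units=10)
states = run_reservoir(w_in, w, series[:-1])
washout = 50  # discard early states that still depend on the zero initial state
targets = series[washout + 1:]
w_out = train_readout(states[washout:], targets)
predictions = predict(states[washout:], w_out)
mse = sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)
```

In this sketch the hyperparameters that would be candidates for tuning are `n_units`, `norm_bound`, `ridge`, and the input weight range; they are set by hand here purely for illustration.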
Architectural designs of Echo State Network
This thesis systematically investigates the reservoir construction of the Echo State Network (ESN). It proposes two very simple deterministic ESN organisations: the Simple Cycle Reservoir (SCR) and the Cycle Reservoir with Jumps (CRJ). The SCR is sufficient to obtain performance comparable to that of the classical ESN, while the CRJ significantly outperforms the classical ESN.
This thesis also studies three reservoir characterisations (short-term memory capacity (MC), the eigen-spectrum of the reservoir weight matrix, and the Lyapunov exponent) and discusses their relation to ESN performance. It also designs and uses an ensemble of ESNs with diverse reservoirs whose collective readout is obtained through Negative Correlation Learning (NCL) of an ensemble of Multi-Layer Perceptrons (MLPs), where each individual MLP realises the readout from a single ESN.
Finally, this thesis investigates the relation between two quantitative measures characterising short-term memory in input-driven dynamical systems, namely the short-term memory capacity (MC) and the Fisher memory curve (FMC).
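
The two deterministic reservoir organisations named above can be sketched as weight matrices. The abstract does not spell out the construction, so the following rests on assumptions: units form a unidirectional cycle with one shared weight, and the jump variant adds bidirectional shortcuts of a fixed length with a second shared weight, starting from unit 0. The function and parameter names (`simple_cycle_reservoir`, `cycle_reservoir_with_jumps`, `cycle_weight`, `jump_weight`, `jump_size`) are hypothetical, as is the restriction on `jump_size`, which is there only to keep jumps from overwriting cycle edges.

```python
def simple_cycle_reservoir(n_units, cycle_weight):
    """Weight matrix in which unit i feeds unit i + 1, wrapping around at the end."""
    w = [[0.0] * n_units for _ in range(n_units)]
    for i in range(n_units):
        w[(i + 1) % n_units][i] = cycle_weight
    return w


def cycle_reservoir_with_jumps(n_units, cycle_weight, jump_weight, jump_size):
    """Simple cycle plus bidirectional jumps of fixed length (assumed layout)."""
    if not 2 <= jump_size <= n_units // 2:
        raise ValueError("jump_size must lie between 2 and n_units // 2")
    w = simple_cycle_reservoir(n_units, cycle_weight)
    # Place jumps 0 <-> jump_size, jump_size <-> 2 * jump_size, and so on;
    # a jump is only added if it does not overshoot unit 0.
    for start in range(0, n_units - jump_size + 1, jump_size):
        end = (start + jump_size) % n_units
        w[end][start] = jump_weight
        w[start][end] = jump_weight
    return w


scr = simple_cycle_reservoir(4, 0.5)
crj = cycle_reservoir_with_jumps(6, 0.5, 0.2, 2)
```

Both constructions are fully determined by a handful of scalars, which is what makes them deterministic alternatives to a randomly drawn reservoir matrix.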
RadiX-Net: Structured Sparse Matrices for Deep Neural Networks
The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity
of hardware to store and train them. Research over the past few decades has
explored the prospect of sparsifying DNNs before, during, and after training by
pruning edges from the underlying topology. The resulting neural network is
known as a sparse neural network. More recent work has demonstrated the
remarkable result that certain sparse DNNs can train to the same precision as
dense DNNs at lower runtime and storage cost. An intriguing class of these
sparse DNNs is the X-Nets, which are initialized and trained upon a sparse
topology with neither reference to a parent dense DNN nor subsequent pruning.
We present an algorithm that deterministically generates RadiX-Nets: sparse DNN
topologies that, as a whole, are much more diverse than X-Net topologies, while
preserving X-Nets' desired characteristics. We further present a
functional-analytic conjecture based on the longstanding observation that
sparse neural network topologies can attain the same expressive power as dense
counterparts.
Comment: 7 pages, 8 figures, accepted at IEEE IPDPS 2019 GrAPL workshop. arXiv
admin note: substantial text overlap with arXiv:1809.0524
Concentric ESN: Assessing the Effect of Modularity in Cycle Reservoirs
The paper introduces the concentric Echo State Network, an approach to designing
reservoir topologies that tries to bridge the gap between deterministically
constructed simple cycle models and deep reservoir computing approaches. We
show how to modularize the reservoir into simple unidirectional and concentric
cycles with pairwise bidirectional jump connections between adjacent loops. We
provide a preliminary experimental assessment showing how concentric reservoirs
yield superior predictive accuracy and memory capacity with respect to
single cycle reservoirs and deep reservoir models.