5,625 research outputs found

    The Power of Linear Recurrent Neural Networks

    Full text link
    Recurrent neural networks are a powerful means to cope with time series. We show how a type of linearly activated recurrent neural networks, which we call predictive neural networks, can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, the network size can be reduced by taking only most relevant components. Thus, in contrast to others, our approach not only learns network weights but also the network architecture. The networks have interesting properties: They end up in ellipse trajectories in the long run and allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and predicting stock prices. Predictive neural networks outperform the previous state-of-the-art for the MSO task with a minimal number of units.Comment: 22 pages, 14 figures and tables, revised implementatio

    Nonparametric Weight Initialization of Neural Networks via Integral Representation

    Full text link
    A new initialization method for hidden parameters in a neural network is proposed. Derived from the integral representation of the neural network, a nonparametric probability distribution of hidden parameters is introduced. In this proposal, hidden parameters are initialized by samples drawn from this distribution, and output parameters are fitted by ordinary linear regression. Numerical experiments show that backpropagation with proposed initialization converges faster than uniformly random initialization. Also it is shown that the proposed method achieves enough accuracy by itself without backpropagation in some cases.Comment: For ICLR2014, revised into 9 pages; revised into 12 pages (with supplements

    Layer-wise learning of deep generative models

    Full text link
    When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure admitting a performance guarantee compared to the global optimum. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret auto-encoders in this setting as generative models, by showing that they train a lower bound of this criterion. We test the new learning procedure against a state of the art method (stacked RBMs), and find it to improve performance. Both theory and experiments highlight the importance, when training deep architectures, of using an inference model (from data to hidden variables) richer than the generative model (from hidden variables to data)
    corecore