
    ADADELTA: An Adaptive Learning Rate Method

    We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first-order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large-scale voice dataset in a distributed cluster environment. Comment: 6 page
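
The per-dimension update described in this abstract can be sketched roughly as follows. The abstract itself gives no formulas, so the update rule here follows the commonly cited form of ADADELTA (running averages of squared gradients and squared updates); the function names, the decay `rho`, the constant `eps`, and the toy quadratic objective are all assumptions chosen for illustration, not values taken from the listing.

```python
import math


def adadelta_minimize(grad, x0, rho=0.95, eps=1e-6, steps=5000):
    """Minimise a function with ADADELTA-style updates (illustrative sketch).

    grad: callable returning the gradient as a list, one entry per dimension.
    rho and eps are assumed defaults, not values from the abstract.
    """
    x = list(x0)
    avg_sq_grad = [0.0] * len(x)   # running average of squared gradients
    avg_sq_delta = [0.0] * len(x)  # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            avg_sq_grad[i] = rho * avg_sq_grad[i] + (1 - rho) * g[i] ** 2
            # Step size is the ratio of the two RMS terms; no global learning rate.
            rms_delta = math.sqrt(avg_sq_delta[i] + eps)
            rms_grad = math.sqrt(avg_sq_grad[i] + eps)
            delta = -(rms_delta / rms_grad) * g[i]
            avg_sq_delta[i] = rho * avg_sq_delta[i] + (1 - rho) * delta ** 2
            x[i] += delta
    return x


def loss(x):
    # Hypothetical toy objective with its minimum at (3, -1).
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


def loss_grad(x):
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


start = [0.0, 0.0]
final = adadelta_minimize(loss_grad, start)
```

Each dimension keeps its own pair of running averages, which is what makes the step size per-dimension. With a small `eps` the first steps are tiny, so the sketch runs for several thousand iterations even on this simple objective.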

    Unsupervised Generative Modeling Using Matrix Product States

    Generative modeling, which learns a joint probability distribution from data and generates samples according to it, is an important task in machine learning and artificial intelligence. Inspired by the probabilistic interpretation of quantum physics, we propose a generative model using matrix product states, a tensor network originally proposed for describing (particularly one-dimensional) entangled quantum states. Our model enjoys efficient learning analogous to the density matrix renormalization group method, which allows dynamic adjustment of the tensor dimensions and offers an efficient direct sampling approach for generative tasks. We apply our method to generative modeling of several standard datasets, including Bars and Stripes, random binary patterns and the MNIST handwritten digits, to illustrate the abilities, features and drawbacks of our model relative to popular generative models such as the Hopfield model, Boltzmann machines and generative adversarial networks. Our work sheds light on many interesting directions for future exploration in the development of quantum-inspired algorithms for unsupervised machine learning, which could plausibly be realized on quantum devices. Comment: 11 pages, 12 figures (not including the TNs) GitHub Page: https://congzlwag.github.io/UnsupGenModbyMPS
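
As a rough illustration of how a matrix product state can define a probability distribution over binary strings, the sketch below assumes the usual Born-rule reading of the "probabilistic interpretation" mentioned in the abstract: the probability of a string is its squared amplitude divided by a normalisation constant. The helper names, the random real-valued tensors, and the brute-force normalisation over all strings are assumptions made for brevity; the abstract describes efficient learning and direct sampling, which this toy does not attempt.

```python
import itertools
import random


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def random_mps(n_sites, bond_dim, seed=0):
    """Build a hypothetical random MPS: one matrix per site and per bit value."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    mps = []
    for i in range(n_sites):
        rows = 1 if i == 0 else bond_dim            # open boundary on the left
        cols = 1 if i == n_sites - 1 else bond_dim  # open boundary on the right
        mps.append([mat(rows, cols), mat(rows, cols)])
    return mps


def amplitude(mps, bits):
    """Amplitude of a bit string: the product of the matrices it selects."""
    result = mps[0][bits[0]]
    for site, bit in zip(mps[1:], bits[1:]):
        result = matmul(result, site[bit])
    return result[0][0]


def born_probabilities(mps):
    """Squared amplitudes normalised by brute force (only viable for tiny systems)."""
    n = len(mps)
    weights = {bits: amplitude(mps, bits) ** 2
               for bits in itertools.product((0, 1), repeat=n)}
    z = sum(weights.values())
    return {bits: w / z for bits, w in weights.items()}


probs = born_probabilities(random_mps(n_sites=4, bond_dim=2))
```

Enumerating all strings costs time exponential in the number of sites, so this is only meant to make the definition concrete for a handful of bits.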

    Modal Learning Neural Networks

    This paper explores the integration of learning modes into a single neural network structure in which layers of neurons or individual neurons adopt different modes. There are several reasons to explore modal learning. One motivation is to overcome the inherent limitations of any given mode (for example, some modes memorise specific features, others average across features, and both approaches may be relevant according to the circumstances); another is inspiration from neuroscience, cognitive science and human learning, where it is impossible to build a serious model without consideration of multiple modes; and a third is non-stationary input data, or time-variant learning objectives, where the required mode is a function of time. Two modal learning ideas are presented: the Snap-Drift Neural Network (SDNN), which toggles its learning between two modes and is incorporated into an on-line system to provide carefully targeted guidance and feedback to students; and an adaptive function neural network (ADFUNN), in which adaptation applies simultaneously to both the weights and the individual neuron activation functions. The combination of the two modal learning methods, in the form of the Snap-drift ADaptive FUnction Neural Network (SADFUNN), is then applied to optical and pen-based recognition of handwritten digits, with results that demonstrate the effectiveness of the approach.

    VIGAN: Missing View Imputation with Generative Adversarial Networks

    In an era when big data are becoming the norm, there is less concern with the quantity of the data and more with their quality and completeness. In many disciplines, data are collected from heterogeneous sources, resulting in multi-view or multi-modal datasets. The missing data problem has been challenging to address in multi-view data analysis. In particular, when certain samples lack an entire view of the data, the missing view problem arises. Classic multiple imputation or matrix completion methods are hardly effective here, because the specific view offers no information on which to base imputation for such samples. The commonly used simple method of removing samples with a missing view can dramatically reduce sample size, thus diminishing the statistical power of a subsequent analysis. In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings via a GAN using randomly sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data across the views. Then, by optimizing the GAN and DAE jointly, our model integrates the knowledge of domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach by comparing against the state of the art. The evaluation of VIGAN in a genetic study of substance use disorders further proves the effectiveness and usability of this approach in life science. Comment: 10 pages, 8 figures, conferenc