753 research outputs found

    Learning Unitary Operators with Help From u(n)

    Full text link
    A major challenge in the training of recurrent neural networks is the so-called vanishing or exploding gradient problem. The use of a norm-preserving transition operator can address this issue, but parametrization is challenging. In this work we focus on unitary operators and describe a parametrization using the Lie algebra u(n)\mathfrak{u}(n) associated with the Lie group U(n)U(n) of n×nn \times n unitary matrices. The exponential map provides a correspondence between these spaces, and allows us to define a unitary matrix using n2n^2 real coefficients relative to a basis of the Lie algebra. The parametrization is closed under additive updates of these coefficients, and thus provides a simple space in which to do gradient descent. We demonstrate the effectiveness of this parametrization on the problem of learning arbitrary unitary operators, comparing to several baselines and outperforming a recently-proposed lower-dimensional parametrization. We additionally use our parametrization to generalize a recently-proposed unitary recurrent neural network to arbitrary unitary matrices, using it to solve standard long-memory tasks.Comment: 9 pages, 3 figures, 5 figures inc. subfigures, to appear at AAAI-1

    A Generative Model of Words and Relationships from Multiple Sources

    Full text link
    Neural language models are a powerful tool to embed words into semantic vector spaces. However, learning such models generally relies on the availability of abundant and diverse training examples. In highly specialised domains this requirement may not be met due to difficulties in obtaining a large corpus, or the limited range of expression in average use. Such domains may encode prior knowledge about entities in a knowledge base or ontology. We propose a generative model which integrates evidence from diverse data sources, enabling the sharing of semantic information. We achieve this by generalising the concept of co-occurrence from distributional semantics to include other relationships between entities or words, which we model as affine transformations on the embedding space. We demonstrate the effectiveness of this approach by outperforming recent models on a link prediction task and demonstrating its ability to profit from partially or fully unobserved data training labels. We further demonstrate the usefulness of learning from different data sources with overlapping vocabularies.Comment: 8 pages, 5 figures; incorporated feedback from reviewers; to appear in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence 201

    Boosting Variational Inference: an Optimization Perspective

    Full text link
    Variational inference is a popular technique to approximate a possibly intractable Bayesian posterior with a more tractable one. Recently, boosting variational inference has been proposed as a new paradigm to approximate the posterior by a mixture of densities by greedily adding components to the mixture. However, as is the case with many other variational inference algorithms, its theoretical properties have not been studied. In the present work, we study the convergence properties of this approach from a modern optimization viewpoint by establishing connections to the classic Frank-Wolfe algorithm. Our analyses yields novel theoretical insights regarding the sufficient conditions for convergence, explicit rates, and algorithmic simplifications. Since a lot of focus in previous works for variational inference has been on tractability, our work is especially important as a much needed attempt to bridge the gap between probabilistic models and their corresponding theoretical properties

    Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees

    Full text link
    Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear (O(1/t)\mathcal{O}(1/t)) convergence on general smooth and convex objectives, and linear convergence (O(e−t)\mathcal{O}(e^{-t})) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature. Our novel algorithms and analyses target general atom sets and general objective functions, and hence are directly applicable to a large variety of learning settings.Comment: NIPS 201

    Wavelet frame accelerated reduced vector machine for efficient image analysis

    Get PDF
    We propose a new approach for face and facial feature detection combined with the advantages of the Morphable Model. The presented method reduces the runtime complexity of a Support Vector Machine classifier and the new training algorithm is fast and simple. This is achieved by an Over-Complete Wavelet Transform that finds optimally sparse approximations of the Support Set Vectors. The wavelet-based approach provides an upper bound on the distance between the decision function of the Support Vector Machine and our classifier. The obtained classifier is fast since the used Haar wavelet approximations of the Support Set Vectors allow efficient Integral Image-based kernel evaluations. This provides a set of double-cascaded classifiers of increasing accuracy for an early rejection. The algorithm yields an excellent runtime performance that is achieved by hierarchically discriminating with respect to the number and approximation accuracy of incorporated Reduced Set Vectors. The proposed algorithm is applied to the problem of face and facial feature detection, but it can also be used for other image-based classifications. The algorithm presented, provides a 530-fold speed-up over the Support Vector Machine, enabling face detection at more than 25 fps on a standard PC. Summarizing, we propose very fast and efficient to train classifiers that improve the detection performance by involving the advantages of the Morphable Model. On one hand to improve the fitting algorithm of the Morphable Model by automatic anchor point detection and on the other hand to use the Morphable Model for improving the training by synthetic data sets and to reduced the False Acceptance Rate

    SOM-VAE: Interpretable Discrete Representation Learning on Time Series

    Full text link
    High-dimensional time series are common in many domains. Since human cognition is not optimized to work well in high-dimensional spaces, these areas could benefit from interpretable low-dimensional representations. However, most representation learning algorithms for time series data are difficult to interpret. This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose a new representation learning framework building on ideas from interpretable discrete dimensionality reduction and deep generative modeling. This framework allows us to learn discrete representations of time series, which give rise to smooth and interpretable embeddings with superior clustering performance. We introduce a new way to overcome the non-differentiability in discrete representation learning and present a gradient-based version of the traditional self-organizing map algorithm that is more performant than the original. Furthermore, to allow for a probabilistic interpretation of our method, we integrate a Markov model in the representation space. This model uncovers the temporal transition structure, improves clustering performance even further and provides additional explanatory insights as well as a natural representation of uncertainty. We evaluate our model in terms of clustering performance and interpretability on static (Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST images, a chaotic Lorenz attractor system with two macro states, as well as on a challenging real world medical time series application on the eICU data set. Our learned representations compare favorably with competitor methods and facilitate downstream tasks on the real world data.Comment: Accepted for publication at the Seventh International Conference on Learning Representations (ICLR 2019
    • …
    corecore