
    Kalman-filtering using local interactions

    There is growing interest in using Kalman-filter models for brain modelling. It is therefore of considerable importance to represent the Kalman filter in connectionist form with local Hebbian learning rules. To the best of our knowledge, the Kalman filter has not previously been given such a local representation; the main obstacle appears to be the dynamic adaptation of the Kalman gain. Here, a connectionist representation is presented, derived by means of the recursive prediction error method. We show that this method gives rise to attractive local learning rules and can adapt the Kalman gain.
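    For reference, a minimal sketch of the classical recursion that the paper reformulates is given below; the connectionist, Hebbian derivation itself is not reproduced here, and all variable names are illustrative.

```python
import numpy as np

def kalman_step(x_est, P, A, C, Q, R, y):
    """One predict/update step of a textbook Kalman filter for
    x' = A x + w,  y = C x + v,  with cov(w) = Q, cov(v) = R.
    The gain K is the quantity whose dynamic adaptation the paper
    derives with local learning rules."""
    # Predict
    x_pred = A @ x_est
    P_pred = A @ P @ A.T + Q
    # Gain from the innovation covariance
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    # Correct with the prediction error (innovation)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(P.shape[0]) - K @ C) @ P_pred
    return x_new, P_new
```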

    Undercomplete Blind Subspace Deconvolution

    We introduce the blind subspace deconvolution (BSSD) problem, which extends both the blind source deconvolution (BSD) and the independent subspace analysis (ISA) tasks. We examine the undercomplete case (uBSSD). Applying temporal concatenation, we reduce this problem to ISA. The associated 'high-dimensional' ISA problem can be handled by a recent technique called joint f-decorrelation (JFD). Similar decorrelation methods have been used previously for kernel independent component analysis (kernel-ICA). More precisely, the kernel canonical correlation (KCCA) technique is a member of this family, and, as shown in this paper, the kernel generalized variance (KGV) method can also be seen as a decorrelation method in the feature space. These kernel-based algorithms are adapted to the ISA task. In the numerical examples, we (i) examine how efficiently the emerging higher-dimensional ISA tasks can be tackled, and (ii) explore the workings and advantages of the derived kernel-ISA methods. Comment: Final version, appeared in the Journal of Machine Learning Research.
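    The temporal-concatenation reduction mentioned in the abstract can be sketched as follows; the stacking depth L and the array shapes are assumptions for illustration, and the subsequent ISA step is omitted.

```python
import numpy as np

def temporal_concatenation(X, L):
    """Stack L + 1 time-shifted copies of the observations X (shape
    (dim, T)) into a (dim * (L + 1), T - L) matrix. This turns a
    convolutive (uBSSD) mixture into an instantaneous one, which an
    ISA algorithm can then unmix."""
    dim, T = X.shape
    return np.vstack([X[:, l : T - L + l] for l in range(L + 1)])
```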

    Equivariance Through Parameter-Sharing

    We propose to study equivariance in deep neural networks through parameter symmetries. In particular, given a group G that acts discretely on the input and output of a standard neural network layer φ_W: ℝ^M → ℝ^N, we show that φ_W is equivariant with respect to the G-action iff G explains the symmetries of the network parameters W. Inspired by this observation, we then propose two parameter-sharing schemes to induce the desirable symmetry on W. Our procedures for tying the parameters achieve G-equivariance and, under some conditions on the action of G, they guarantee sensitivity to all other permutation groups outside G. Comment: ICML'17.
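    As one concrete, well-known instance of such parameter tying (an assumption of this sketch: G is the full symmetric group acting by permuting coordinates), a layer whose weights are tied as W = λI + γ11ᵀ commutes with every permutation matrix:

```python
import numpy as np

def perm_equivariant_layer(x, lam=1.0, gam=0.1):
    """Apply W @ x with the tied weights W = lam * I + gam * ones((n, n)).
    Tying forces P @ W = W @ P for every permutation matrix P, so the
    layer is equivariant: f(P @ x) == P @ f(x)."""
    return lam * x + gam * x.sum() * np.ones_like(x)

x = np.array([3.0, 1.0, 2.0])
P = np.eye(3)[[2, 0, 1]]          # a permutation matrix
assert np.allclose(perm_equivariant_layer(P @ x), P @ perm_equivariant_layer(x))
```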

    Boolean Matrix Factorization and Noisy Completion via Message Passing

    Boolean matrix factorization and Boolean matrix completion from noisy observations are desirable unsupervised data-analysis methods due to their interpretability, but are hard to perform due to their NP-hardness. We treat these problems as maximum a posteriori inference problems in a graphical model and present a message-passing approach that scales linearly with the number of observations and factors. Our empirical study demonstrates that message passing is able to recover low-rank Boolean matrices within the boundaries of theoretically possible recovery, and compares favorably with the state of the art in real-world applications such as collaborative filtering with large-scale Boolean data.
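    The generative model being inverted is the Boolean (OR-of-ANDs) product; a minimal sketch follows, with the noise channel assumed here to be independent bit flips:

```python
import numpy as np

def boolean_product(U, V):
    """Rank-k Boolean product of 0/1 matrices:
    X[i, j] = OR_k ( U[i, k] AND V[k, j] )."""
    return ((U.astype(int) @ V.astype(int)) > 0).astype(int)

def noisy_observation(X, flip_prob, seed=0):
    """Flip each entry independently with probability flip_prob; the
    message-passing algorithm performs MAP inference of U and V from
    (a subset of) such observations."""
    rng = np.random.default_rng(seed)
    flips = rng.random(X.shape) < flip_prob
    return np.where(flips, 1 - X, X)
```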

    High Dimensional Bayesian Optimisation and Bandits via Additive Models

    Bayesian Optimisation (BO) is a technique for optimising a D-dimensional function that is typically expensive to evaluate. While there have been many successes for BO in low dimensions, scaling it to high dimensions has been notoriously difficult. The existing literature on the topic assumes very restrictive settings. In this paper, we identify two key challenges in this endeavour. We tackle these challenges by assuming an additive structure for the function. This setting is substantially more expressive and contains a richer class of functions than previous work. We prove that, for additive functions, the regret has only linear dependence on D even though the function depends on all D dimensions. We also demonstrate several other statistical and computational benefits of our framework. Via synthetic examples, a scientific simulation and a face detection problem, we demonstrate that our method outperforms naive BO on additive functions and on several examples where the function is not additive. Comment: Proceedings of the 32nd International Conference on Machine Learning, 2015.
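    The computational payoff of additivity is that an additive acquisition function can be maximised group by group; a sketch under an assumed interface (the per-group acquisition callables and candidate sub-vectors are supplied by the caller):

```python
import numpy as np

def maximise_additive_acquisition(acqs, groups, candidates, D):
    """If f(x) = sum_i f_i(x[group_i]) over disjoint coordinate groups,
    the acquisition decomposes too, so one D-dimensional inner
    optimisation becomes several low-dimensional ones.
    acqs[i]: scores a sub-vector for group i (hypothetical callables);
    groups[i]: coordinate indices; candidates[i]: array of sub-vectors."""
    x = np.empty(D)
    for acq, group, cands in zip(acqs, groups, candidates):
        scores = [acq(c) for c in cands]
        x[np.asarray(group)] = cands[int(np.argmax(scores))]
    return x
```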

    Separation Theorem for Independent Subspace Analysis with Sufficient Conditions

    Here, a separation theorem about Independent Subspace Analysis (ISA), a generalization of Independent Component Analysis (ICA), is proven. According to the theorem, ISA estimation can be executed in two steps under certain conditions. In the first step, 1-dimensional ICA estimation is executed. In the second step, the optimal permutation of the ICA elements is searched for. We present sufficient conditions for the ISA Separation Theorem. Namely, we show that, among others, (i) elliptically symmetric sources and (ii) 2-dimensional sources invariant to 90-degree rotation satisfy the conditions of the theorem. Comment: 11 pages, 0 figures.
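    The two-step recipe licensed by the theorem looks roughly as follows; the grouping cue used here (correlation of component amplitudes) is only one simple stand-in for the permutation search, which the theorem itself does not prescribe:

```python
import numpy as np
from sklearn.decomposition import FastICA

def isa_two_step(X, n_components):
    """Step 1: plain one-dimensional ICA on X (shape: samples x features).
    Step 2: estimate which ICA components belong to a common subspace,
    here via a residual-dependence proxy to be clustered afterwards."""
    S = FastICA(n_components=n_components, random_state=0).fit_transform(X)
    amp = np.abs(S - S.mean(axis=0))
    residual_dependence = np.abs(np.corrcoef(amp.T))
    return S, residual_dependence   # group components with high mutual entries
```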

    Copula-based Kernel Dependency Measures

    The paper presents a new copula-based method for measuring dependence between random variables. Our approach extends the Maximum Mean Discrepancy to the copula of the joint distribution. We prove that this approach has several advantageous properties. Similarly to Shannon mutual information, the proposed dependence measure is invariant to any strictly increasing transformation of the marginal variables. This is important in many applications, for example in feature selection. The estimator is consistent, robust to outliers, and uses rank statistics only. We derive upper bounds on the convergence rate and propose independence tests too. We illustrate the theoretical contributions through a series of experiments in feature selection and low-dimensional embedding of distributions. Comment: ICML 2012.
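    A minimal version of the idea: map each marginal to its normalised ranks (the empirical copula transform) and compare the result against the independence copula with an MMD estimate. The RBF kernel and its bandwidth are assumptions of this sketch:

```python
import numpy as np

def rank_transform(X):
    """Empirical copula transform: replace each column of X
    (samples x variables) by its normalised ranks in (0, 1). This is
    what makes the measure invariant to strictly increasing
    transformations of the marginals."""
    n = X.shape[0]
    return (np.argsort(np.argsort(X, axis=0), axis=0) + 1.0) / (n + 1.0)

def mmd2_rbf(X, Y, sigma=0.2):
    """Biased squared-MMD estimate with an RBF kernel."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[:, 1] += X[:, 0]                                       # dependent columns
dep = mmd2_rbf(rank_transform(X), rng.random((200, 2)))  # copula vs independence
```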

    The Statistical Recurrent Unit

    Sophisticated gated recurrent neural network architectures like LSTMs and GRUs have been shown to be highly effective in a myriad of applications. We develop an un-gated unit, the statistical recurrent unit (SRU), that is able to learn long-term dependencies in data by keeping only moving averages of statistics. The SRU's architecture is simple, un-gated, and contains a comparable number of parameters to LSTMs; yet, SRUs perform favorably compared to more sophisticated LSTM and GRU alternatives, often outperforming one or both in various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an unbiased manner by optimizing the respective architectures' hyperparameters in a Bayesian optimization scheme for both synthetic and real-world tasks.
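    The core of the unit is a recurrence that stores nothing but exponential moving averages of a feed-forward summary; a sketch of that idea, where the layer sizes, decay rates, and the omission of the paper's extra projections are all simplifications:

```python
import numpy as np

def sru_forward(X, W_in, W_rec, alphas=(0.0, 0.5, 0.9, 0.99)):
    """Un-gated SRU-style recurrence. For each input x_t:
      summary_t = relu(W_in @ x_t + W_rec @ concat(moving averages))
      mu_a     <- a * mu_a + (1 - a) * summary_t   for each decay a
    Multiple decay rates give the unit views of the past at several
    time scales, which is how it captures long-term dependencies.
    Shapes: W_in is (d, input_dim), W_rec is (d, d * len(alphas))."""
    d = W_in.shape[0]
    mus = [np.zeros(d) for _ in alphas]
    outputs = []
    for x in X:
        summary = np.maximum(W_in @ x + W_rec @ np.concatenate(mus), 0.0)
        mus = [a * mu + (1.0 - a) * summary for a, mu in zip(alphas, mus)]
        outputs.append(np.concatenate(mus))
    return np.array(outputs)
```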

    D-optimal Bayesian Interrogation for Parameter and Noise Identification of Recurrent Neural Networks

    We introduce a novel online Bayesian method for the identification of a family of noisy recurrent neural networks (RNNs). We develop a Bayesian active learning technique to optimize the interrogating stimuli given past experiences. In particular, we consider the unknown parameters as stochastic variables and use the D-optimality principle, also known as the 'infomax' method, to choose optimal stimuli. We apply a greedy technique to maximize the information gain concerning network parameters at each time step. We also derive the D-optimal estimation of the additive noise that perturbs the dynamical system of the RNN. Our analytical results are approximation-free, and the analytic derivation gives rise to attractive quadratic update rules.
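    To illustrate the D-optimality (infomax) principle in its simplest setting (a linear-Gaussian model rather than the paper's RNN), greedily picking the stimulus that most increases the log-determinant of the posterior precision reduces, via the matrix determinant lemma, to maximising a quadratic form:

```python
import numpy as np

def d_optimal_stimulus(candidates, Sigma, noise_var=1.0):
    """For y = theta . u + noise with Gaussian posterior cov Sigma,
    observing stimulus u adds u u^T / noise_var to the precision, and
    log det grows by log(1 + u^T Sigma u / noise_var). The greedy
    infomax choice therefore maximises u^T Sigma u."""
    gains = [u @ Sigma @ u / noise_var for u in candidates]
    return candidates[int(np.argmax(gains))]

Sigma = np.diag([2.0, 0.5])                   # most uncertainty along axis 0
cands = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
assert np.allclose(d_optimal_stimulus(cands, Sigma), [1.0, 0.0])
```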

    Multi-fidelity Bayesian Optimisation with Continuous Approximations

    Bandit methods for black-box optimisation, such as Bayesian optimisation, are used in a variety of applications including hyper-parameter tuning and experiment design. Recently, multi-fidelity methods have garnered considerable attention, since function evaluations have become increasingly expensive in such applications. Multi-fidelity methods use cheap approximations to the function of interest to speed up the overall optimisation process. However, most multi-fidelity methods assume only a finite number of approximations. In many practical applications, however, a continuous spectrum of approximations might be available. For instance, when tuning an expensive neural network, one might choose to approximate the cross-validation performance using less data N and/or fewer training iterations T. Here, the approximations are best viewed as arising out of a continuous two-dimensional space (N, T). In this work, we develop a Bayesian optimisation method, BOCA, for this setting. We characterise its theoretical properties and show that it achieves better regret than strategies which ignore the approximations. BOCA outperforms several other baselines in synthetic and real experiments.
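    The continuous-fidelity setting can be captured by a small interface sketch; the normalisation constants and the multiplicative cost model below are assumptions, and BOCA's actual fidelity-selection rule (which also models how the approximation error shrinks with fidelity) is not shown:

```python
def query(g, x, n_data, n_iters, full_n=50_000, full_t=100):
    """Evaluate an approximation g(z, x) of the target f(x) = g((1, 1), x)
    at fidelity z = (n_data / full_n, n_iters / full_t) in (0, 1]^2.
    Cheap corners of the fidelity space cost far less than a full
    evaluation, which is what a continuous-fidelity method exploits."""
    z = (n_data / full_n, n_iters / full_t)
    cost = z[0] * z[1]          # assumed cost model: data size x iterations
    return g(z, x), cost
```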