Kalman-filtering using local interactions
There is growing interest in using Kalman filter models for brain
modelling. In turn, it is of considerable importance to represent the Kalman
filter in connectionist form, with local Hebbian learning rules. To the best
of our knowledge, the Kalman filter has not been given such a local
representation. The main obstacle seems to be the dynamic adaptation of the
Kalman gain. Here, a connectionist representation is presented, which is
derived by means of the recursive prediction error method. We show that this
method gives rise to attractive local learning rules and can adapt the Kalman gain.
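For context, a minimal sketch of the standard linear Kalman filter recursion whose gain the paper seeks to adapt with local rules; the matrix names (A, C, Q, R) follow textbook convention and are not taken from the paper.

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One predict/update step of a linear Kalman filter.

    x, P : state estimate and its covariance
    y    : new observation
    A, C : state-transition and observation matrices
    Q, R : process and observation noise covariances
    """
    # Predict.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Kalman gain -- the quantity whose adaptation the paper localizes.
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    # Update with the prediction error (the innovation y - C x_pred).
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new
```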
Undercomplete Blind Subspace Deconvolution
We introduce the blind subspace deconvolution (BSSD) problem, which is the
extension of both the blind source deconvolution (BSD) and the independent
subspace analysis (ISA) tasks. We examine the case of the undercomplete BSSD
(uBSSD). Applying temporal concatenation, we reduce this problem to ISA. The
associated 'high-dimensional' ISA problem can be handled by a recent technique
called joint f-decorrelation (JFD). Similar decorrelation methods have been
used previously for kernel independent component analysis (kernel-ICA). More
precisely, the kernel canonical correlation (KCCA) technique is a member of
this family, and, as is shown in this paper, the kernel generalized variance
(KGV) method can also be seen as a decorrelation method in the feature space.
These kernel based algorithms will be adapted to the ISA task. In the numerical
examples, we (i) examine how efficiently the emerging higher dimensional ISA
tasks can be tackled, and (ii) explore the working and advantages of the
derived kernel-ISA methods.
Comment: Final version, appeared in Journal of Machine Learning Research
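The temporal-concatenation step described above can be sketched as follows; the stacking depth and array shapes are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def temporal_concatenation(X, depth):
    """Stack `depth` consecutive observations into one tall vector,
    turning a convolutive mixture into an (approximately) instantaneous
    one that ISA techniques can then handle.

    X : array of shape (T, d), the observed time series.
    Returns an array of shape (T - depth + 1, d * depth).
    """
    T, _ = X.shape
    return np.hstack([X[i : T - depth + 1 + i] for i in range(depth)])
```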
Equivariance Through Parameter-Sharing
We propose to study equivariance in deep neural networks through parameter
symmetries. In particular, given a group $\mathcal{G}$ that acts discretely on
the input and output of a standard neural network layer $\phi_W$, we show that
$\phi_W$ is equivariant with respect to the $\mathcal{G}$-action iff
$\mathcal{G}$ explains the symmetries of the network parameters $W$. Inspired
by this observation, we then propose two parameter-sharing schemes to induce
the desirable symmetry on $W$. Our procedures for tying the parameters achieve
$\mathcal{G}$-equivariance and, under some conditions on the action of
$\mathcal{G}$, they guarantee sensitivity to all other permutation groups
outside $\mathcal{G}$.
Comment: ICML'17
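A toy instance of parameter tying in the spirit of the paper: a circulant weight matrix makes a linear layer equivariant to the cyclic group acting by index shifts. This is a hand-picked special case, not the paper's general scheme.

```python
import numpy as np

n = 5
w = np.random.randn(n)                           # one free parameter vector
W = np.stack([np.roll(w, i) for i in range(n)])  # tied (circulant) weights

x = np.random.randn(n)
shift = lambda v: np.roll(v, 1)                  # generator of the cyclic group

# Equivariance check: applying the group action before the layer gives the
# same result as applying it after, phi(g.x) == g.phi(x).
assert np.allclose(W @ shift(x), shift(W @ x))
```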
Boolean Matrix Factorization and Noisy Completion via Message Passing
Boolean matrix factorization and Boolean matrix completion from noisy
observations are desirable unsupervised data-analysis methods due to their
interpretability, but hard to perform due to their NP-hardness. We treat these
problems as maximum a posteriori inference problems in a graphical model and
present a message passing approach that scales linearly with the number of
observations and factors. Our empirical study demonstrates that message passing
is able to recover low-rank Boolean matrices, up to the boundaries of
theoretically possible recovery, and compares favorably with the state of the
art in real-world applications such as collaborative filtering with
large-scale Boolean data.
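As a sketch of the underlying model, the Boolean product and a MAP objective under an assumed independent bit-flip noise channel (rate p_flip) look roughly as follows; the paper's algorithm optimizes this by message passing on a factor graph rather than by direct evaluation.

```python
import numpy as np

def boolean_product(U, Z):
    """Boolean product: X[i, j] = OR_k ( U[i, k] AND Z[k, j] )."""
    return (U.astype(int) @ Z.astype(int)) > 0

def log_posterior(U, Z, O, p_flip=0.1):
    """Unnormalized log-posterior of the factors given a noisy observed
    matrix O, assuming each entry of the Boolean product is flipped
    independently with probability p_flip (flat prior on the factors)."""
    X = boolean_product(U, Z)
    agree = (X == O.astype(bool)).sum()
    return agree * np.log(1 - p_flip) + (O.size - agree) * np.log(p_flip)

rng = np.random.default_rng(0)
U = rng.integers(0, 2, size=(8, 3))
Z = rng.integers(0, 2, size=(3, 10))
O = boolean_product(U, Z)            # noiseless observation in this toy run
print(log_posterior(U, Z, O))        # maximal for the generating factors
```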
High Dimensional Bayesian Optimisation and Bandits via Additive Models
Bayesian Optimisation (BO) is a technique used in optimising a
$D$-dimensional function which is typically expensive to evaluate. While there
have been many successes for BO in low dimensions, scaling it to high
dimensions has been notoriously difficult. The existing literature on the
topic operates under very restrictive settings. In this paper, we identify two
key challenges in this endeavour. We tackle these challenges by assuming an
additive structure for the function. This setting is substantially more
expressive and contains a richer class of functions than previous work. We
prove that, for additive functions, the regret has only linear dependence on
$D$ even though the function depends on all $D$ dimensions. We also
demonstrate several other statistical
and computational benefits in our framework. Via synthetic examples, a
scientific simulation and a face detection problem we demonstrate that our
method outperforms naive BO on additive functions and on several examples where
the function is not additive.
Comment: Proceedings of the 32nd International Conference on Machine Learning, 2015
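A toy illustration of why the additive assumption helps: when f(x) = sum_j f_j(x[group_j]) over disjoint groups, each low-dimensional sub-problem can be optimised on its own, so cost grows with the number of groups rather than exponentially in D. The group structure and component functions below are invented for the sketch; in BO the groups are unknown and f is only accessible through expensive evaluations.

```python
import numpy as np

D = 6
groups = [[0, 1], [2, 3], [4, 5]]    # assumed decomposition (unknown in practice)
parts = [lambda z: -np.sum((z - 0.3) ** 2),
         lambda z: -np.sum((z - 0.7) ** 2),
         lambda z: np.sum(np.sin(3 * z))]

# Optimise each 2-D group on its own grid: 3 * 101**2 evaluations instead
# of the 101**6 a joint grid over all D dimensions would need.
grid = np.linspace(0, 1, 101)
pairs = np.array([[a, b] for a in grid for b in grid])
x_best = np.empty(D)
for fj, g in zip(parts, groups):
    x_best[g] = pairs[np.argmax([fj(p) for p in pairs])]
print(x_best)                        # maximiser assembled group by group
```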
Separation Theorem for Independent Subspace Analysis with Sufficient Conditions
Here, a separation theorem about Independent Subspace Analysis (ISA), a
generalization of Independent Component Analysis (ICA), is proven. According to
the theorem, ISA estimation can be executed in two steps under certain
conditions. In the first step, 1-dimensional ICA estimation is executed. In the
second step, optimal permutation of the ICA elements is searched for. We
present sufficient conditions for the ISA Separation Theorem. Namely, we shall
show that (i) elliptically symmetric sources, (ii) 2-dimensional sources
invariant to 90-degree rotation, among others, satisfy the conditions of the
theorem.
Comment: 11 pages, 0 figures
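A rough sketch of the two-step procedure, assuming scikit-learn's FastICA for step one; the grouping criterion (correlation of squared components) is an illustrative stand-in for the paper's permutation search.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
T = 2000
# One genuinely 2-D source (uniform on a disc, dependent coordinates) and
# two independent 1-D Laplacian sources.
theta = rng.uniform(0, 2 * np.pi, T)
rad = np.sqrt(rng.uniform(0, 1, T))
disc = np.column_stack([rad * np.cos(theta), rad * np.sin(theta)])
S_true = np.hstack([disc, rng.laplace(size=(T, 2))])
X = S_true @ rng.normal(size=(4, 4))          # instantaneous mixing

# Step 1: plain one-dimensional ICA.
S = FastICA(n_components=4, random_state=0).fit_transform(X)

# Step 2: group components into subspaces; the correlation of squared
# components is near zero across independent sources and large within
# the 2-D subspace.
dep = np.abs(np.corrcoef((S ** 2).T))
np.fill_diagonal(dep, 0)
print(np.round(dep, 2))
```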
Copula-based Kernel Dependency Measures
The paper presents a new copula-based method for measuring dependence between
random variables. Our approach extends the Maximum Mean Discrepancy to the
copula of the joint distribution. We prove that this approach has several
advantageous properties. Similarly to Shannon mutual information, the proposed
dependence measure is invariant to any strictly increasing transformation of
the marginal variables. This is important in many applications, for example in
feature selection. The estimator is consistent, robust to outliers, and uses
rank statistics only. We derive upper bounds on the convergence rate and
propose independence tests too. We illustrate the theoretical contributions
through a series of experiments in feature selection and low-dimensional
embedding of distributions.
Comment: ICML 2012
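A minimal sketch of the measure: map each marginal to its normalized ranks (the empirical copula) and compare against the independence copula with a Gaussian-kernel MMD. The kernel choice and bandwidth here are assumptions.

```python
import numpy as np

def empirical_copula(X):
    """Map each marginal to normalized ranks in (0, 1); this is what makes
    the measure invariant to strictly increasing marginal transforms."""
    n = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)
    return (ranks + 1) / (n + 1)

def mmd2(A, B, gamma=1.0):
    """Biased squared MMD between samples A and B, Gaussian kernel."""
    k = lambda P, Q: np.exp(-gamma * ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1))
    return k(A, A).mean() + k(B, B).mean() - 2 * k(A, B).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
X[:, 1] += 0.8 * X[:, 0]                      # a dependent pair
U = empirical_copula(X)
V = rng.uniform(size=(500, 2))                # the independence copula
print(mmd2(U, V))                             # larger => stronger dependence
```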
The Statistical Recurrent Unit
Sophisticated gated recurrent neural network architectures like LSTMs and
GRUs have been shown to be highly effective in a myriad of applications. We
develop an un-gated unit, the statistical recurrent unit (SRU), that is able to
learn long term dependencies in data by only keeping moving averages of
statistics. The SRU's architecture is simple, un-gated, and contains a
comparable number of parameters to LSTMs; yet, SRUs perform favorably against
more sophisticated LSTM and GRU alternatives, often outperforming one or both
in various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs
in an unbiased manner by optimizing the respective architectures'
hyperparameters with Bayesian optimization on both synthetic and real-world tasks.
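A simplified, single-stack sketch of the un-gated recurrence: summary statistics are kept as exponential moving averages at several decay rates, with no gates anywhere. Dimensions and the exact wiring are simplified relative to the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sru_step(mu, x, W_r, W_phi, W_x, alphas):
    """One un-gated SRU step. mu holds one moving average per decay rate
    in `alphas`; everything is updated by plain averaging, no gates."""
    r = relu(W_r @ mu.ravel())          # recurrent read-out of the averages
    phi = relu(W_phi @ r + W_x @ x)     # new statistics from input + context
    return np.array([a * m + (1 - a) * phi for a, m in zip(alphas, mu)])

d, h, dx, k = 8, 16, 4, 3
alphas = [0.0, 0.5, 0.99]               # several time scales of memory
rng = np.random.default_rng(0)
W_r = rng.normal(size=(h, k * d))
W_phi = rng.normal(size=(d, h))
W_x = rng.normal(size=(d, dx))
mu = np.zeros((k, d))
for x in rng.normal(size=(20, dx)):     # run over a short input sequence
    mu = sru_step(mu, x, W_r, W_phi, W_x, alphas)
```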
D-optimal Bayesian Interrogation for Parameter and Noise Identification of Recurrent Neural Networks
We introduce a novel online Bayesian method for the identification of a
family of noisy recurrent neural networks (RNNs). We develop a Bayesian active
learning technique in order to optimize the interrogating stimuli given past
experiences. In particular, we consider the unknown parameters as stochastic
variables and use the D-optimality principle, also known as the 'infomax
method', to choose optimal stimuli. We apply a greedy technique to maximize
the information gain concerning network parameters at each time step. We also
derive the D-optimal estimation of the additive noise that perturbs the
dynamical system of the RNN. Our analytical results are approximation-free. The
analytic derivation gives rise to attractive quadratic update rules.
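A toy surrogate for the D-optimal (infomax) selection step, using a linear-Gaussian model in place of the paper's RNN: the next stimulus is the candidate that maximizes the log-determinant of the updated posterior precision.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.eye(3)                           # current posterior precision of theta
sigma2 = 0.5                            # observation noise variance
candidates = rng.normal(size=(50, 3))   # candidate interrogation stimuli

# For y = theta . u + noise, observing at u updates the precision to
# P + u u^T / sigma2; D-optimality greedily maximizes its log-determinant.
gains = [np.linalg.slogdet(P + np.outer(u, u) / sigma2)[1] for u in candidates]
u_star = candidates[int(np.argmax(gains))]
print(u_star)
```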
Multi-fidelity Bayesian Optimisation with Continuous Approximations
Bandit methods for black-box optimisation, such as Bayesian optimisation, are
used in a variety of applications including hyper-parameter tuning and
experiment design. Recently, multi-fidelity methods have garnered
considerable attention since function evaluations have become increasingly
expensive in such applications. Multi-fidelity methods use cheap approximations
to the function of interest to speed up the overall optimisation process.
However, most multi-fidelity methods assume only a finite number of
approximations. In many practical applications, however, a continuous spectrum
of approximations might be available. For instance, when tuning an expensive
neural network, one might choose to approximate the cross-validation
performance using less data and/or fewer training iterations. Here, the
approximations are best viewed as arising out of a continuous two-dimensional
space. In this work, we develop a Bayesian optimisation method, BOCA, for this
setting. We characterise its theoretical properties and show that it achieves
better regret than strategies which ignore the approximations. BOCA
outperforms several other baselines in synthetic and real experiments.
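A hypothetical sketch of a continuous two-dimensional fidelity space for the neural-network example above; all functional forms (cost model, bias decay) are invented for illustration.

```python
import numpy as np

z_max = np.array([1.0, 100.0])          # full data, full training iterations

def cost(z):
    """Assumed cost model: proportional to data fraction times iterations."""
    return z[0] * z[1]

def g(z, x):
    """Hypothetical cheap approximation whose bias vanishes as z -> z_max,
    so g(z_max, x) equals the expensive target f(x)."""
    bias = 0.05 * np.sum(1.0 - np.asarray(z) / z_max)
    return -np.sum((np.asarray(x) - 0.5) ** 2) - bias

x = np.array([0.4, 0.6, 0.5])
print(cost([0.2, 20.0]), g([0.2, 20.0], x), g(z_max, x))
```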