Network self-organization explains the statistics and dynamics of synaptic connection strengths in cortex
The information processing abilities of neural circuits arise from their synaptic connection patterns. Understanding the laws governing these connectivity patterns is essential for understanding brain function. The overall distribution of synaptic strengths of local excitatory connections in cortex and hippocampus is long-tailed, exhibiting a small number of synaptic connections of very large efficacy. At the same time, new synaptic connections are constantly being created and individual synaptic connection strengths show substantial fluctuations across time. It remains unclear through what mechanisms these properties of neural circuits arise and how they contribute to learning and memory. In this study we show that fundamental characteristics of excitatory synaptic connections in cortex and hippocampus can be explained as a consequence of self-organization in a recurrent network combining spike-timing-dependent plasticity (STDP), structural plasticity and different forms of homeostatic plasticity. In the network, associative synaptic plasticity in the form of STDP induces rich-get-richer dynamics among synapses, while homeostatic mechanisms induce competition. Under distinctly different initial conditions, the ensuing self-organization produces long-tailed synaptic strength distributions matching experimental findings. We show that this self-organization can take place with a purely additive STDP mechanism and that multiplicative weight dynamics emerge as a consequence of network interactions. The observed patterns of fluctuation of synaptic strengths, including elimination and generation of synaptic connections and long-term persistence of strong connections, are consistent with the dynamics of dendritic spines found in rat hippocampus. Beyond this, the model predicts an approximately power-law scaling of the lifetimes of newly established synaptic connection strengths during development. Our results suggest that the combined action of multiple forms of neuronal plasticity plays an essential role in the formation and maintenance of cortical circuits.
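To make the proposed mechanism concrete, here is a minimal toy sketch, not the paper's spiking network model: additive fixed-step STDP-like updates whose potentiation probability grows with the current weight stand in for the rich-get-richer dynamics, and per-neuron weight normalisation stands in for homeostatic competition. All parameter values and the pairing heuristic are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100          # neurons (toy scale)
STEPS = 3000     # plasticity updates
A_PLUS = 0.010   # additive potentiation step
A_MINUS = 0.012  # additive depression step

# Sparse random initial weights; W[i, j] is the synapse from j onto i,
# and zero entries are absent synapses.
W = rng.random((N, N)) * (rng.random((N, N)) < 0.1)
np.fill_diagonal(W, 0.0)

for _ in range(STEPS):
    # Stand-in for correlated spiking: stronger synapses see causal
    # pre-before-post pairings more often ("rich get richer").
    p_pot = W / (W.max() + 1e-12)
    pot = rng.random((N, N)) < 0.5 * p_pot
    dep = rng.random((N, N)) < 0.3

    # Purely additive STDP: fixed-size steps, independent of weight.
    W += A_PLUS * pot
    W -= A_MINUS * (dep & (W > 0))
    W = np.clip(W, 0.0, None)

    # Homeostatic scaling: normalise each neuron's total incoming
    # weight, so synapses onto the same neuron compete.
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

    # Structural plasticity: occasionally create new weak synapses.
    W[(W == 0) & (rng.random((N, N)) < 1e-4)] = 0.01
    np.fill_diagonal(W, 0.0)

weights = W[W > 0]
print(f"{np.mean(weights > 5 * weights.mean()):.2%} of synapses "
      "exceed 5x the mean weight (long tail)")
```

Even in this crude stand-in, additive potentiation biased toward already-strong synapses combined with divisive normalisation tends to concentrate weight in a small fraction of synapses, the qualitative signature of a long-tailed distribution.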
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high-dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with least squares policy iteration.
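As a rough sketch of the Thompson-sampling ingredient, the following replaces the paper's generalised context tree with a single Bayesian linear-Gaussian dynamics model, which likewise admits the closed-form posterior updates the abstract refers to; the class and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

class BayesLinearDynamics:
    """Linear-Gaussian dynamics model s' ~ N(theta^T phi(s, a), noise^2 I),
    with a conjugate Gaussian prior so the posterior updates in closed form.
    A simplified stand-in for one local model of the context tree."""

    def __init__(self, d_feat, d_state, noise=0.1, prior_var=1.0):
        self.P = np.eye(d_feat) / prior_var    # posterior precision
        self.B = np.zeros((d_feat, d_state))   # precision-weighted mean
        self.noise = noise

    def update(self, phi, s_next):
        # Closed-form Bayesian update from one observed transition.
        self.P += np.outer(phi, phi) / self.noise**2
        self.B += np.outer(phi, s_next) / self.noise**2

    def sample(self):
        # One posterior draw of the parameters: the Thompson-sampling step.
        cov = np.linalg.inv(self.P)
        mean = cov @ self.B
        return mean + np.linalg.cholesky(cov) @ rng.standard_normal(mean.shape)

# Feed the model synthetic transitions, then draw a plausible dynamics
# matrix; planning (approximate dynamic programming) would use this draw.
theta_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
model = BayesLinearDynamics(d_feat=3, d_state=2)
for _ in range(100):
    phi = rng.standard_normal(3)
    model.update(phi, phi @ theta_true + 0.1 * rng.standard_normal(2))
print("posterior sample of dynamics parameters:\n", model.sample().round(2))
```

In the full method, a cover-tree-indexed context tree holds many such local models, and each posterior sample of the dynamics is handed to approximate dynamic programming to derive an exploration policy.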
Probabilistic inverse reinforcement learning in unknown environments
We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2013).
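The convexity claim can be illustrated with a simplified stand-in for the paper's demonstrator models: if the demonstrator is modelled as acting softmax-greedily on a linear function of observed features, MAP estimation under a flat prior reduces to multinomial logistic regression, which is convex. The data and model below are synthetic and invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy demonstrations: feature vectors per (state, action) and the action
# the demonstrator chose. phi has shape (T, n_actions, d).
T, n_actions, d = 500, 4, 6
phi = rng.standard_normal((T, n_actions, d))
w_true = rng.standard_normal(d)
logits = phi @ w_true
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
actions = np.array([rng.choice(n_actions, p=p) for p in probs])

# MAP under a flat prior = maximum likelihood of a softmax demonstrator
# policy, a convex problem; plain gradient ascent suffices.
w = np.zeros(d)
for _ in range(200):
    z = phi @ w                                    # (T, n_actions)
    p = np.exp(z - z.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    chosen = phi[np.arange(T), actions]            # (T, d)
    grad = (chosen - (p[..., None] * phi).sum(axis=1)).mean(axis=0)
    w += 0.5 * grad                                # log-likelihood ascent

print("correlation with true preference vector:",
      round(float(np.corrcoef(w, w_true)[0, 1]), 3))
```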
Algorithms for Differentially Private Multi-Armed Bandits
We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem is relevant for applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our major contribution is to show that there exist differentially private variants of Upper Confidence Bound algorithms which have optimal regret. This is a significant improvement over previous results, which only achieve poly-logarithmic regret, because of our use of a novel interval-based mechanism. We also substantially improve the bounds of a previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.
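As a generic illustration of the private-bandit idea only (explicitly not the paper's interval-based mechanism, and without the privacy composition accounting its guarantees require), the sketch below perturbs empirical arm means with Laplace noise inside a standard UCB index; all constants are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

K, T, EPS = 5, 20000, 1.0
means = np.linspace(0.2, 0.8, K)     # unknown Bernoulli arm means

counts = np.zeros(K)
sums = np.zeros(K)

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1                  # play each arm once to initialise
    else:
        # Laplace noise calibrated to the sensitivity (1/n) of an
        # average of n bounded rewards. NOTE: drawing fresh noise each
        # round ignores composition, so this is illustrative only.
        noisy_mean = sums / counts + rng.laplace(0.0, 1.0 / (EPS * counts))
        bonus = np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(noisy_mean + bonus))
    reward = float(rng.random() < means[arm])
    counts[arm] += 1
    sums[arm] += reward

print(f"empirical regret after {T} rounds: {means.max() * T - sums.sum():.1f}")
```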
Generalised Entropy MDPs and Minimax Regret
Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.
Comment: 7 pages, NIPS workshop "From bad models to good policies".
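A worst-case prior can be illustrated in a finite zero-sum game: nature picks a prior over candidate models to maximise the agent's Bayes loss, and the agent best-responds. The small example below (an abstract loss matrix rather than an MDP, with invented numbers) approximates the equilibrium with multiplicative weights.

```python
import numpy as np

rng = np.random.default_rng(4)

# Loss of each policy (columns) in each candidate model (rows). Nature's
# maximin mixture over rows is the worst-case prior; the agent's mixture
# over columns is the corresponding minimax-Bayes policy.
L = rng.random((6, 4))

eta, ROUNDS = 0.5, 5000
p = np.ones(6) / 6               # nature's prior over models
q = np.ones(4) / 4               # agent's mixed policy
p_avg, q_avg = np.zeros(6), np.zeros(4)

for _ in range(ROUNDS):
    p *= np.exp(eta * (L @ q))       # nature shifts mass to bad models
    p /= p.sum()
    q *= np.exp(-eta * (L.T @ p))    # agent shifts mass to robust policies
    q /= q.sum()
    p_avg += p
    q_avg += q

p_avg /= ROUNDS                      # averaged iterates approximate
q_avg /= ROUNDS                      # the equilibrium of the game
print("worst-case prior:", p_avg.round(3))
print("game value (minimax-Bayes loss):", round(float(p_avg @ L @ q_avg), 3))
```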
Phoneme and sentence-level ensembles for speech recognition
We address the question of whether and how boosting and bagging can be used for speech recognition. In order to do this, we compare two different boosting schemes, one at the phoneme level and one at the utterance level, with a phoneme-level bagging scheme. We control for many parameters and other choices, such as the state inference scheme used. In an unbiased experiment, we clearly show that the gain of boosting methods compared to a single hidden Markov model is in all cases only marginal, while bagging significantly outperforms all other methods. We thus conclude that bagging methods, which have so far been overlooked in favour of boosting, should be examined more closely as a potentially useful ensemble learning technique for speech recognition.
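For readers unfamiliar with bagging, the toy sketch below shows the core recipe evaluated in the paper, applied to a synthetic classification task rather than HMM-based phoneme recognition: train each base model on a bootstrap resample and combine by majority vote. The data and the weak nearest-centroid learner are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in for phoneme classification: noisy Gaussian clusters.
n_classes, d, n_train, n_test = 5, 10, 400, 200
centres = rng.standard_normal((n_classes, d)) * 2.0
y_train = rng.integers(n_classes, size=n_train)
X_train = centres[y_train] + rng.standard_normal((n_train, d)) * 2.0
y_test = rng.integers(n_classes, size=n_test)
X_test = centres[y_test] + rng.standard_normal((n_test, d)) * 2.0

def fit_centroids(X, y):
    # Weak base learner: nearest class centroid (fallback if a class is
    # missing from a bootstrap sample).
    return np.stack([X[y == c].mean(axis=0) if (y == c).any()
                     else X.mean(axis=0) for c in range(n_classes)])

def predict(centroids, X):
    dists = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

# Bagging: train each base model on a bootstrap resample of the training
# set, then combine the ensemble's predictions by majority vote.
votes = np.zeros((n_test, n_classes))
for _ in range(25):
    idx = rng.integers(n_train, size=n_train)      # bootstrap sample
    c = fit_centroids(X_train[idx], y_train[idx])
    votes[np.arange(n_test), predict(c, X_test)] += 1

bagged = votes.argmax(axis=1)
single = predict(fit_centroids(X_train, y_train), X_test)
print("single model accuracy:", (single == y_test).mean())
print("bagged accuracy:      ", (bagged == y_test).mean())
```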
