CaMKII activation supports reward-based neural network optimization through Hamiltonian sampling
Synaptic plasticity is implemented and controlled through over a thousand
different types of molecules in the postsynaptic density and presynaptic
boutons that assume a staggering array of different states through
phosphorylation and other mechanisms. One of the most prominent molecules in the
postsynaptic density is CaMKII, which is described in molecular biology as a
"memory molecule" that can integrate Ca-influx signals through
auto-phosphorylation on a relatively large time scale of dozens of seconds. The functional
impact of this memory mechanism is largely unknown. We show that the
experimental data on the specific role of CaMKII activation in dopamine-gated
spine consolidation suggest a general functional role in speeding up
reward-guided search for network configurations that maximize reward
expectation. Our theoretical analysis shows that stochastic search could in
principle even attain optimal network configurations by emulating one of the
most well-known nonlinear optimization methods, simulated annealing. But this
optimization is usually impeded by slowness of stochastic search at a given
temperature. We propose that CaMKII contributes a momentum term that
substantially speeds up this search. In particular, it allows the network to
overcome saddle points of the fitness function. The resulting improved
stochastic policy search can be understood on a more abstract level as
Hamiltonian sampling, which is known to be one of the most efficient stochastic
search methods. Comment: 27 pages, 5 figures
Scalable Bayesian Optimization Using Deep Neural Networks
Bayesian optimization is an effective methodology for the global optimization
of functions with expensive evaluations. It relies on querying a distribution
over functions defined by a relatively cheap surrogate model. An accurate model
for this distribution over functions is critical to the effectiveness of the
approach, and is typically fit using Gaussian processes (GPs). However, since
GPs scale cubically with the number of observations, it has been challenging to
handle objectives whose optimization requires many evaluations, and as such,
massively parallelizing the optimization.
In this work, we explore the use of neural networks as an alternative to GPs
to model distributions over functions. We show that performing adaptive basis
function regression with a neural network as the parametric form performs
competitively with state-of-the-art GP-based approaches, but scales linearly
with the number of data rather than cubically. This allows us to achieve a
previously intractable degree of parallelism, which we apply to large scale
hyperparameter optimization, rapidly finding competitive models on benchmark
object recognition tasks using convolutional networks, and image caption
generation using neural language models
Trajectory-Based Off-Policy Deep Reinforcement Learning
Policy gradient methods are powerful reinforcement learning algorithms and
have been demonstrated to solve many complex tasks. However, these methods are
also data-inefficient, afflicted with high variance gradient estimates, and
frequently get stuck in local optima. This work addresses these weaknesses by
combining recent improvements in the reuse of off-policy data and exploration
in parameter space with deterministic behavioral policies. The resulting
objective is amenable to standard neural network optimization strategies like
stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo.
Incorporation of previous rollouts via importance sampling greatly improves
data-efficiency, whilst stochastic optimization schemes facilitate the escape
from local optima. We evaluate the proposed approach on a series of continuous
control benchmark tasks. The results show that the proposed algorithm is able
to successfully and reliably learn solutions using fewer system interactions
than standard policy gradient methods. Comment: Includes appendix. Accepted for ICML 201
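The reuse of earlier rollouts through importance sampling, mentioned in the abstract above, can be sketched generically as follows. This is a minimal illustration of a self-normalized importance-sampling estimate, not the paper's algorithm; the function name `importance_weighted_return` and its arguments are hypothetical.

```python
import math


def importance_weighted_return(returns, logp_target, logp_behavior):
    """Self-normalized importance-sampling estimate of the expected return.

    returns       -- returns observed in rollouts gathered under an older
                     (behavior) distribution
    logp_target   -- log-probability of each rollout under the current
                     (target) distribution
    logp_behavior -- log-probability of each rollout under the distribution
                     that actually generated it
    All names are illustrative assumptions, not taken from the paper.
    """
    if not (len(returns) == len(logp_target) == len(logp_behavior)):
        raise ValueError("all inputs must have the same length")
    if not returns:
        raise ValueError("need at least one rollout")
    # Importance weight = p_target / p_behavior, computed in log space.
    log_w = [lt - lb for lt, lb in zip(logp_target, logp_behavior)]
    # Subtract the maximum before exponentiating for numerical stability;
    # the shift cancels in the normalized ratio below.
    shift = max(log_w)
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, returns)) / total
```

When the target and behavior distributions coincide, all weights are equal and the estimate reduces to the plain average of the returns.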
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Reinforcement learning can acquire complex behaviors from high-level
specifications. However, defining a cost function that can be optimized
effectively and encodes the correct task is challenging in practice. We explore
how inverse optimal control (IOC) can be used to learn behaviors from
demonstrations, with applications to torque control of high-dimensional robotic
systems. Our method addresses two key challenges in inverse optimal control:
first, the need for informative features and effective regularization to impose
structure on the cost, and second, the difficulty of learning the cost function
under unknown dynamics for high-dimensional continuous systems. To address the
former challenge, we present an algorithm capable of learning arbitrary
nonlinear cost functions, such as neural networks, without meticulous feature
engineering. To address the latter challenge, we formulate an efficient
sample-based approximation for MaxEnt IOC. We evaluate our method on a series
of simulated tasks and real-world robotic manipulation problems, demonstrating
substantial improvement over prior methods both in terms of task complexity and
sample efficiency. Comment: International Conference on Machine Learning (ICML), 2016, to appear
Reconciling meta-learning and continual learning with online mixtures of tasks
Learning-to-learn or meta-learning leverages data-driven inductive bias to
increase the efficiency of learning on a novel task. This approach encounters
difficulty when transfer is not advantageous, for instance, when tasks are
considerably dissimilar or change over time. We use the connection between
gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet
process mixture of hierarchical Bayesian models over the parameters of an
arbitrary parametric model such as a neural network. In contrast to
consolidating inductive biases into a single set of hyperparameters, our
approach of task-dependent hyperparameter selection better handles latent
distribution shift, as demonstrated on a set of evolving, image-based, few-shot
learning benchmarks. Comment: updated experimental results
Cyclical Learning Rates for Training Neural Networks
It is known that the learning rate is the most important hyper-parameter to
tune for training deep neural networks. This paper describes a new method for
setting the learning rate, named cyclical learning rates, which practically
eliminates the need to experimentally find the best values and schedule for the
global learning rates. Instead of monotonically decreasing the learning rate,
this method lets the learning rate cyclically vary between reasonable boundary
values. Training with cyclical learning rates instead of fixed values achieves
improved classification accuracy without a need to tune and often in fewer
iterations. This paper also describes a simple way to estimate "reasonable
bounds" -- linearly increasing the learning rate of the network for a few
epochs. In addition, cyclical learning rates are demonstrated on the CIFAR-10
and CIFAR-100 datasets with ResNets, Stochastic Depth networks, and DenseNets,
and the ImageNet dataset with the AlexNet and GoogLeNet architectures. These
are practical tools for everyone who trains neural networks. Comment: Presented at WACV 2017; see https://github.com/bckenstler/CLR for
instructions to implement CLR in Keras
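A learning rate that varies cyclically between two boundary values can be sketched as below. The abstract does not fix the shape of the cycle; this sketch assumes a triangular (linear up, linear down) schedule, and the names `triangular_lr`, `base_lr`, `max_lr` and `step_size` are illustrative choices rather than the paper's.

```python
import math


def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Learning rate at a given iteration under a triangular cycle.

    step_size is the number of iterations in half a cycle (assumed name):
    the rate climbs linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr over the next step_size.
    """
    # Index of the current cycle, starting at 1.
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, scaled to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

For example, with `base_lr=0.001`, `max_lr=0.006` and `step_size=2000`, the rate starts at the lower bound, reaches the upper bound at iteration 2000, and returns to the lower bound at iteration 4000.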
Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space
Most existing deep reinforcement learning (DRL) frameworks consider either
discrete action space or continuous action space solely. Motivated by
applications in computer games, we consider the scenario with
discrete-continuous hybrid action space. To handle hybrid action space,
previous works either approximate the hybrid space by discretization, or relax
it into a continuous set. In this paper, we propose a parametrized deep
Q-network (P-DQN) framework for the hybrid action space without approximation
or relaxation. Our algorithm combines the spirits of both DQN (dealing with
discrete action space) and DDPG (dealing with continuous action space) by
seamlessly integrating them. Empirical results on a simulation example, scoring
a goal in simulated RoboCup soccer, and the solo mode of the game King of Glory
(KOG) validate the efficiency and effectiveness of our method
Reinforcement Learning for Batch Bioprocess Optimization
Bioprocesses have received a lot of attention as a means to produce clean and
sustainable alternatives to fossil-based materials. However, they are generally
difficult to optimize due to their unsteady-state operation modes and
stochastic behaviours. Furthermore, biological systems are highly complex,
so plant-model mismatch is often present. To address the aforementioned
challenges, we propose a reinforcement-learning-based optimization strategy for
batch processes.
In this work, we applied the Policy Gradient method from batch-to-batch to
update a control policy parametrized by a recurrent neural network. We assume
that a preliminary process model is available, which is exploited to obtain a
preliminary optimal control policy. Subsequently, this policy is updated based
on measurements from the true plant. The capabilities of our proposed approach
were tested on three case studies (one of which is nonsmooth) using a more
complex process model for the true system embedded with adequate process
disturbance. Lastly, we discussed the advantages and disadvantages of this
strategy compared against current existing approaches such as nonlinear model
predictive control
Semantics, Representations and Grammars for Deep Learning
Deep learning is currently the subject of intensive study. However,
fundamental concepts such as representations are not formally defined --
researchers "know them when they see them" -- and there is no common language
for describing and analyzing algorithms. This essay proposes an abstract
framework that identifies the essential features of current practice and may
provide a foundation for future developments.
The backbone of almost all deep learning algorithms is backpropagation, which
is simply a gradient computation distributed over a neural network. The main
ingredients of the framework are thus, unsurprisingly: (i) game theory, to
formalize distributed optimization; and (ii) communication protocols, to track
the flow of zeroth and first-order information. The framework allows natural
definitions of semantics (as the meaning encoded in functions), representations
(as functions whose semantics is chosen to optimize a criterion) and grammars
(as communication protocols equipped with first-order convergence guarantees).
Much of the essay is spent discussing examples taken from the literature. The
ultimate aim is to develop a graphical language for describing the structure of
deep learning algorithms that backgrounds the details of the optimization
procedure and foregrounds how the components interact. Inspiration is taken
from probabilistic graphical models and factor graphs, which capture the
essential structural features of multivariate distributions. Comment: 20 pages, many diagrams
CNNs are Globally Optimal Given Multi-Layer Support
Stochastic Gradient Descent (SGD) is the central workhorse for training
modern CNNs. Although it gives impressive empirical performance, it can be slow to
converge. In this paper we explore a novel strategy for training a CNN using an
alternation strategy that offers substantial speedups during training. We make
the following contributions: (i) replace the ReLU non-linearity within a CNN
with positive hard-thresholding, (ii) reinterpret this non-linearity as a
binary state vector making the entire CNN linear if the multi-layer support is
known, and (iii) demonstrate that under certain conditions a global optimum of
the CNN can be found through local descent. We then employ a novel alternation
strategy (between weights and support) for CNN training that leads to
substantially faster convergence rates, nice theoretical properties, and
state-of-the-art results across large-scale datasets (e.g. ImageNet)
as well as other standard benchmarks
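The idea that a thresholding non-linearity becomes linear once its support is known can be sketched as below. This assumes that "positive hard-thresholding" keeps entries strictly above a threshold `lam` and zeroes the rest; that definition, and the helper names `hard_threshold`, `support` and `apply_support`, are assumptions made for illustration and are not taken from the paper.

```python
def hard_threshold(x, lam):
    """Keep entries strictly above lam, set all others to zero (assumed form)."""
    return [v if v > lam else 0.0 for v in x]


def support(x, lam):
    """Binary state vector marking which entries survive the threshold."""
    return [1 if v > lam else 0 for v in x]


def apply_support(x, s):
    """Elementwise mask by a fixed support s; this map is linear in x."""
    return [si * v for si, v in zip(s, x)]


x = [-1.0, 0.5, 2.0]
s = support(x, lam=1.0)
# With the support taken from x itself, masking reproduces the non-linearity.
assert apply_support(x, s) == hard_threshold(x, lam=1.0)
```

Once `s` is held fixed, `apply_support` is an ordinary diagonal linear map, which is the sense in which a layer is linear given its support.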