3,931 research outputs found
Smoothing Policies and Safe Policy Gradients
Policy gradient algorithms are among the best candidates for the much
anticipated application of reinforcement learning to real-world control tasks,
such as the ones arising in robotics. However, the trial-and-error nature of
these methods introduces safety issues whenever the learning phase itself must
be performed on a physical system. In this paper, we address a specific safety
formulation, where danger is encoded in the reward signal and the learning
agent is constrained to never worsen its performance. By studying actor-only
policy gradient from a stochastic optimization perspective, we establish
improvement guarantees for a wide class of parametric policies, generalizing
existing results on Gaussian policies. This, together with novel upper bounds
on the variance of policy gradient estimators, allows to identify those
meta-parameter schedules that guarantee monotonic improvement with high
probability. The two key meta-parameters are the step size of the parameter
updates and the batch size of the gradient estimators. By a joint, adaptive
selection of these meta-parameters, we obtain a safe policy gradient algorithm
Training Input-Output Recurrent Neural Networks through Spectral Methods
We consider the problem of training input-output recurrent neural networks
(RNN) for sequence labeling tasks. We propose a novel spectral approach for
learning the network parameters. It is based on decomposition of the
cross-moment tensor between the output and a non-linear transformation of the
input, based on score functions. We guarantee consistent learning with
polynomial sample and computational complexity under transparent conditions
such as non-degeneracy of model parameters, polynomial activations for the
neurons, and a Markovian evolution of the input sequence. We also extend our
results to Bidirectional RNN which uses both previous and future information to
output the label at each time point, and is employed in many NLP tasks such as
POS tagging
- …