On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation
Reinforcement learning, mathematically described by Markov Decision Problems,
may be approached either through dynamic programming or policy search.
Actor-critic algorithms combine the merits of both approaches by alternating
between value function estimation steps and policy gradient updates. Because
the updates exhibit correlated noise and biased gradients, only the asymptotic
behavior of actor-critic is known, obtained by connecting its behavior to
dynamical systems. This work puts forth a new variant of
actor-critic that employs Monte Carlo rollouts during the policy search
updates, which results in controllable bias that depends on the number of
critic evaluations. As a result, we are able to provide for the first time the
convergence rate of actor-critic algorithms when the policy search step employs
policy gradient, agnostic to the choice of policy evaluation technique. In
particular, we establish conditions under which the sample complexity is
comparable to that of the stochastic gradient method for non-convex problems,
or slower as a result of the critic estimation error, which is the main
complexity bottleneck.
These results hold in continuous state and action spaces with linear function
approximation for the value function. We then specialize these conceptual
results to the case where the critic is estimated by Temporal Difference,
Gradient Temporal Difference, and Accelerated Gradient Temporal Difference.
These learning rates are then corroborated on a navigation problem involving an
obstacle, providing insight into the interplay between optimization and
generalization in reinforcement learning.
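The alternation between critic estimation and policy gradient updates described in this abstract can be illustrated with a minimal sketch. This is a generic one-step actor-critic loop with a TD(0) critic and linear value features, not the Monte Carlo rollout variant the paper proposes. The scalar toy environment, the feature map `features`, the Gaussian policy, and all step sizes are assumptions chosen for illustration only.

```python
import numpy as np

def features(s):
    # Hypothetical feature map for a scalar state: [1, s, s^2].
    return np.array([1.0, s, s * s])

def actor_critic(num_steps=2000, gamma=0.9, alpha_critic=0.02,
                 alpha_actor=0.005, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(3)      # critic weights: V(s) is approximated by w . features(s)
    theta = np.zeros(2)  # actor weights: mean action = theta . [1, s]
    s = rng.uniform(-1.0, 1.0)
    for _ in range(num_steps):
        psi = np.array([1.0, s])
        mean = theta @ psi
        a = mean + sigma * rng.standard_normal()  # sample from Gaussian policy

        # Toy dynamics and reward (assumptions): the action shifts the state,
        # the state is kept in [-2, 2], and distance from the origin is penalised.
        s_next = float(np.clip(s + 0.1 * a, -2.0, 2.0))
        r = -s_next ** 2

        # Critic step: TD(0) update of the linear value estimate.
        delta = r + gamma * (w @ features(s_next)) - w @ features(s)
        w += alpha_critic * delta * features(s)

        # Actor step: policy gradient update, using the TD error as a
        # (biased) advantage estimate.
        grad_log_pi = (a - mean) / sigma ** 2 * psi
        theta += alpha_actor * delta * grad_log_pi

        s = s_next
    return theta, w

theta, w = actor_critic()
```

Swapping the TD(0) line for a different policy evaluation rule changes only the critic step, which is the sense in which the actor update here does not depend on how the critic is estimated.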
Model based learning for accelerated, limited-view 3D photoacoustic tomography
Recent advances in deep learning for tomographic reconstructions have shown
great potential to create accurate and high quality images with a considerable
speed-up. In this work we present a deep neural network that is specifically
designed to provide high resolution 3D images from restricted photoacoustic
measurements. The network is designed to represent an iterative scheme and
incorporates gradient information of the data fit to compensate for limited
view artefacts. Due to the high complexity of the photoacoustic forward
operator, we separate training and computation of the gradient information. A
suitable prior for the desired image structures is learned as part of the
training. The resulting network is trained and tested on a set of segmented
vessels from lung CT scans and then applied to in-vivo photoacoustic
measurement data.
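The idea of a network that represents an iterative scheme and uses the gradient of the data fit can be sketched as an unrolled loop. This is only a schematic under stated assumptions: the forward operator is a small random matrix `A` standing in for the photoacoustic operator, the data fit is taken to be least squares, and the `update` callable is a hypothetical placeholder for a trained network (here a plain gradient step, so the code runs without any training).

```python
import numpy as np

def data_fit_gradient(A, x, y):
    # Gradient of the least-squares data fit 0.5 * ||A x - y||^2 with respect to x.
    return A.T @ (A @ x - y)

def unrolled_reconstruction(A, y, update, num_iters=5):
    # Fixed number of iterations; each one combines the current image
    # with the data-fit gradient through an update function.
    x = np.zeros(A.shape[1])
    for k in range(num_iters):
        g = data_fit_gradient(A, x, y)
        x = x + update(k, x, g)
    return x

def make_gradient_step(A):
    # Placeholder for a learned update (assumption): a plain gradient
    # step with step size 1 / ||A||_2^2.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    def update(k, x, g):
        return -step * g
    return update

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))  # fewer measurements than unknowns
x_true = rng.standard_normal(50)
y = A @ x_true
x_rec = unrolled_reconstruction(A, y, make_gradient_step(A), num_iters=5)
```

In a learned scheme the `update` function would be replaced by a trained network for each iteration, while the gradient could still be computed outside the network as in `data_fit_gradient`.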