How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies
Using deep neural nets as function approximators for reinforcement learning
tasks has recently been shown to be very powerful for solving problems
approaching real-world complexity. Using these results as a benchmark, we
discuss the role that the discount factor may play in the quality of the
learning process of a deep Q-network (DQN). When the discount factor
progressively increases up to its final value, we empirically show that it is
possible to significantly reduce the number of learning steps. When used in
conjunction with a varying learning rate, we empirically show that it
outperforms the original DQN on several experiments. We relate this phenomenon to
the instabilities of neural networks when they are used in an approximate
Dynamic Programming setting. We also describe the possibility of falling into a
local optimum during the learning process, thus connecting our discussion with
the exploration/exploitation dilemma.
Comment: NIPS 2015 Deep Reinforcement Learning Workshop
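To illustrate what a discount factor that "progressively increases up to its final value" could look like, here is a minimal sketch of one possible schedule. The update rule (geometrically shrinking the gap 1 - gamma each epoch), the function name `discount_schedule`, and all parameter values are assumptions chosen for illustration; the abstract does not specify the schedule used in the paper.

```python
def discount_schedule(gamma_start, gamma_final, rate, num_epochs):
    """Return one discount factor per epoch, rising from gamma_start to gamma_final.

    Hypothetical schedule: the gap (1 - gamma) is multiplied by `rate`
    after every epoch, and the result is capped at gamma_final.
    """
    if not 0.0 <= gamma_start <= gamma_final < 1.0:
        raise ValueError("need 0 <= gamma_start <= gamma_final < 1")
    if not 0.0 < rate < 1.0:
        raise ValueError("need 0 < rate < 1")

    gammas = []
    gamma = gamma_start
    for _ in range(num_epochs):
        gammas.append(gamma)
        # Shrink the distance to 1, but never exceed the final value.
        gamma = min(gamma_final, 1.0 - rate * (1.0 - gamma))
    return gammas


# Illustrative values only (assumed, not taken from the abstract).
schedule = discount_schedule(gamma_start=0.95, gamma_final=0.99,
                             rate=0.98, num_epochs=200)
```

In a DQN-style training loop, the value for the current epoch would be the one used when forming the bootstrapped target, so early updates look at a shorter effective horizon than later ones.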
Differentiable Algorithm Networks for Composable Robot Learning
This paper introduces the Differentiable Algorithm Network (DAN), a
composable architecture for robot learning systems. A DAN is composed of neural
network modules, each encoding a differentiable robot algorithm and an
associated model; and it is trained end-to-end from data. DAN combines the
strengths of model-driven modular system design and data-driven end-to-end
learning. The algorithms and models act as structural assumptions to reduce the
data requirements for learning; end-to-end learning allows the modules to adapt
to one another and compensate for imperfect models and algorithms, in order to
achieve the best overall system performance. We illustrate the DAN methodology
through a case study on a simulated robot system, which learns to navigate in
complex 3-D environments with only local visual observations and an image of a
partially correct 2-D floor map.
Comment: RSS 2019 camera ready. Video is available at
https://youtu.be/4jcYlTSJF4
On overfitting and asymptotic bias in batch reinforcement learning with partial observability
This paper provides an analysis of the tradeoff between asymptotic bias
(suboptimality with unlimited data) and overfitting (additional suboptimality
due to limited data) in the context of reinforcement learning with partial
observability. Our theoretical analysis formally shows that, while
potentially increasing the asymptotic bias, a smaller state representation
decreases the risk of overfitting. This analysis relies on expressing the
quality of a state representation by bounding L1 error terms of the associated
belief states. Theoretical results are empirically illustrated when the state
representation is a truncated history of observations, both on synthetic POMDPs
and on a large-scale POMDP in the context of smartgrids, with real-world data.
Finally, similarly to known results in the fully observable setting, we also
briefly discuss and empirically illustrate how using function approximators and
adapting the discount factor may enhance the tradeoff between asymptotic bias
and overfitting in the partially observable context.
Comment: Accepted at the Journal of Artificial Intelligence Research (JAIR) -
31 pages
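As a concrete picture of a state representation built from "a truncated history of observations", the sketch below maps an observation sequence to sliding windows of the last `h` observations. The function name `truncated_histories` and the choice to return shorter tuples at the start of an episode are assumptions made for illustration, not details from the paper.

```python
from collections import deque


def truncated_histories(observations, h):
    """Yield, for each time step, a tuple of the last h observations.

    At the start of the sequence, fewer than h observations exist,
    so the yielded tuple is shorter (an illustrative choice).
    """
    if h < 1:
        raise ValueError("history length h must be at least 1")
    window = deque(maxlen=h)  # older observations are dropped automatically
    for obs in observations:
        window.append(obs)
        yield tuple(window)


# A smaller h gives a coarser state representation; a larger h keeps more history.
states = list(truncated_histories([1, 2, 3, 4], h=2))
```

In the terms of the abstract, reducing `h` shrinks the state representation, which may raise asymptotic bias while lowering the risk of overfitting on limited data.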