Generic rank-one corrections for value iteration in Markovian decision problems
By Dimitri P. Bertsekas. Caption title. Includes bibliographical references (p. 12-13). Supported by the NSF under grant CCR-9103804.
A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using L-λ Smoothness
Gradient Temporal Difference (GTD) algorithms (Sutton et al., 2008, 2009) are the first O(d) (d is the number of features) algorithms that have convergence guarantees for off-policy learning with linear function approximation. Liu et al. (2015) and Dalal et al. (2018) proved that the convergence rates of GTD, GTD2 and TDC are O(t^(-α/2)) for some α ∈ (0, 1). This bound is tight (Dalal et al., 2020), and slower than O(1/√t). GTD algorithms also have two step-size parameters, which are difficult to tune. In the literature, there is a "single-time-scale" formulation of GTD; however, this formulation still has two step-size parameters.
This paper presents a truly single-time-scale GTD algorithm for minimizing the Norm of Expected TD Update (NEU) objective, and it has only one step-size parameter. We prove that the new algorithm, called Impression GTD, converges at least as fast as O(1/t). Furthermore, based on a generalization of expected smoothness (Gower et al., 2019), called L-λ smoothness, we are able to prove that the new GTD converges even faster, in fact, with a linear rate. Our rate actually also improves Gower et al.'s result with a tighter bound under a weaker assumption. Besides Impression GTD, we also prove the rates of three other GTD algorithms: one by Yao and Liu (2008), another called A-transpose-TD (Sutton et al., 2008), and a counterpart of A-transpose-TD. The convergence rates of all four GTD algorithms are proved in a single generic GTD framework to which L-λ smoothness applies. Empirical results on random walks, the Boyan chain, and Baird's counterexample show that Impression GTD converges much faster than existing GTD algorithms for both on-policy and off-policy learning problems, with well-performing step-sizes over a wide range.
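To make the NEU objective concrete: it is the squared norm of the expected TD update, NEU(θ) = ||E[δφ]||², where δ = r + γφ'ᵀθ − φᵀθ is the TD error and φ, φ' are the current and next feature vectors. The following is a rough, illustrative sketch of single-step-size gradient descent on an empirical NEU estimate; it is not the paper's Impression GTD update itself, and all function names, the toy data, and the two-batch sampling scheme are assumptions made for illustration.

```python
import numpy as np

def neu(theta, phi, phi_next, r, gamma):
    """Empirical NEU objective: || mean(delta * phi) ||^2 (illustrative)."""
    delta = r + gamma * phi_next @ theta - phi @ theta   # TD errors
    w = (phi * delta[:, None]).mean(axis=0)              # estimate of E[delta * phi]
    return w @ w

def neu_gradient_step(theta, batch_a, batch_b, gamma, alpha):
    """One gradient step on NEU with a single step-size alpha.

    Two batches are used so the product of the two expectations in the
    NEU gradient can be estimated from independent samples (a generic
    double-sampling sketch, not the exact Impression GTD update).
    """
    phi_a, phi_next_a, r_a = batch_a
    phi_b, phi_next_b, r_b = batch_b
    delta = r_a + gamma * phi_next_a @ theta - phi_a @ theta
    w = (phi_a * delta[:, None]).mean(axis=0)            # E[delta * phi]
    # grad NEU = 2 * E[(gamma*phi' - phi) phi^T] @ E[delta * phi]
    J = ((gamma * phi_next_b - phi_b).T @ phi_b) / len(r_b)
    return theta - alpha * 2.0 * (J @ w)

# Tiny deterministic two-state example (made up for illustration).
phi = np.array([[1.0, 0.0], [0.0, 1.0]])       # features of current states
phi_next = np.array([[0.0, 1.0], [1.0, 0.0]])  # features of next states
r = np.array([1.0, 0.0])
gamma, alpha = 0.9, 0.05

theta = np.zeros(2)
before = neu(theta, phi, phi_next, r, gamma)
for _ in range(200):
    theta = neu_gradient_step(theta, (phi, phi_next, r),
                              (phi, phi_next, r), gamma, alpha)
after = neu(theta, phi, phi_next, r, gamma)
print(before, after)  # NEU should shrink toward zero
```

Note that with a single fixed batch, as in the toy example, reusing the same data in both roles introduces the usual double-sampling bias; in a streaming setting one would draw two independent samples (or minibatches) per step.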
Approaches to Safety in Inverse Reinforcement Learning
As the capabilities of robotic systems increase, we move closer to the vision of ubiquitous robotic assistance throughout our everyday lives. In transitioning robots and autonomous systems from traditional factory and industrial settings, it is critical that these systems are able to adapt to uncertain environments and the humans who populate them. In order to better understand and predict the behavior of these humans, Inverse Reinforcement Learning (IRL) uses demonstrations to infer the underlying motivations driving human actions. The information gained from IRL can be used to improve a robot's understanding of the environment as well as to allow the robot to better interact with or assist humans.

In this dissertation, we address the challenge of incorporating safety into the application of IRL. We first consider safety in the context of using IRL for assisting humans in shared control tasks. Through a user study, we show how incorporating haptic feedback into human assistance can increase humans' sense of control while improving safety in the presence of imperfect learning. Further, we present our method for using IRL to automatically create such haptic feedback policies from task demonstrations.

We further address safety in IRL by incorporating notions of safety directly into the learning process. Currently, most work on IRL focuses on learning explanatory rewards that humans are modeled as optimizing. However, pure reward optimization can fail to effectively capture hard requirements, such as safety constraints. We draw on the definition of safety from Hamilton-Jacobi reachability analysis to infer human perceptions of safety and to modify robot behavior to respect these learned safety constraints. We also extend this work on learning constraints by adapting the framework of Maximum Entropy IRL in order to learn hard constraints given nominal task rewards, and we show how this technique infers the most likely constraints to align expected behavior with observed demonstrations.
Conciliating accuracy and efficiency to empower engineering based on performance: a short journey
This paper revisits the different arts of engineering. The art of modeling describes the behavior of complex systems through the solution of the partial differential equations that are expected to govern their responses. The art of simulation then concerns the ability to solve these complex mathematical objects, which are expected to describe physical reality, as accurately as possible (accuracy with respect to the exact solution of the models) and as fast as possible. Finally, the art of decision making needs to ensure accurate and fast predictions for efficient diagnosis and prognosis. For that purpose, physics-informed digital twins (also known as Hybrid Twins) are employed, combining real-time physics (where complex models are solved using advanced model order reduction techniques) with physics-informed data-driven models that fill the gap between reality and the physics-based model predictions. The use of physics-aware data-driven models in tandem with physics-based reduced order models allows very fast prediction without compromising accuracy, which is compulsory for diagnosis and prognosis purposes.