
    Generic rank-one corrections for value iteration in Markovian decision problems

    Caption title. Includes bibliographical references (p. 12-13). Supported by the NSF (CCR-9103804). By Dimitri P. Bertsekas

    A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using L-λ Smoothness

    Gradient Temporal Difference (GTD) algorithms (Sutton et al., 2008, 2009) are the first O(d) algorithms (d is the number of features) with convergence guarantees for off-policy learning with linear function approximation. Liu et al. (2015) and Dalal et al. (2018) proved that the convergence rates of GTD, GTD2 and TDC are O(t^{-α/2}) for some α ∈ (0,1). This bound is tight (Dalal et al., 2020), and slower than O(1/√t). GTD algorithms also have two step-size parameters, which are difficult to tune. In the literature, there is a "single-time-scale" formulation of GTD; however, this formulation still has two step-size parameters. This paper presents a truly single-time-scale GTD algorithm for minimizing the Norm of Expected TD Update (NEU) objective, and it has only one step-size parameter. We prove that the new algorithm, called Impression GTD, converges at least as fast as O(1/t). Furthermore, based on a generalization of the expected smoothness (Gower et al., 2019), called L-λ smoothness, we are able to prove that the new GTD converges even faster, in fact, with a linear rate. Our rate also improves Gower et al.'s result with a tighter bound under a weaker assumption. Besides Impression GTD, we also prove the rates of three other GTD algorithms: one by Yao and Liu (2008), another called A-transpose-TD (Sutton et al., 2008), and a counterpart of A-transpose-TD. The convergence rates of all four GTD algorithms are proved in a single generic GTD framework to which L-λ smoothness applies. Empirical results on random walks, the Boyan chain, and Baird's counterexample show that Impression GTD converges much faster than existing GTD algorithms for both on-policy and off-policy learning problems, with well-performing step-sizes over a wide range.
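
    For context, the following is a minimal illustrative sketch, not the paper's Impression GTD (whose update is not given here): the classic two-step-size GTD update of Sutton et al. (2008) for off-policy evaluation with linear function approximation, which targets the same NEU objective ||E[δφ]||^2 and exhibits the two-step-size tuning burden that a single-step-size algorithm is designed to remove. The function name and the importance-sampling handling are illustrative assumptions.

    import numpy as np

    def gtd_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
        """One classic GTD update from a single off-policy transition (illustrative sketch).

        theta    : main weights; the value estimate is phi @ theta
        w        : auxiliary weights tracking the expected TD update E[rho * delta * phi]
        phi      : feature vector of the current state
        phi_next : feature vector of the next state
        rho      : importance-sampling ratio pi(a|s) / mu(a|s) (1.0 for on-policy)
        alpha    : step-size for the main weights
        beta     : step-size for the auxiliary weights (the second parameter to tune)
        """
        delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
        # Auxiliary iterate: running estimate of the expected TD update E[rho * delta * phi].
        w = w + beta * (rho * delta * phi - w)
        # Main iterate: stochastic gradient step on the NEU objective ||E[rho * delta * phi]||^2.
        theta = theta + alpha * rho * (phi - gamma * phi_next) * (phi @ w)
        return theta, w

    Because alpha and beta live on different time scales, both must be tuned jointly; a single-step-size method of the kind the abstract describes collapses this to one parameter.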

    Conciliating accuracy and efficiency to empower engineering based on performance: a short journey

    This paper revisits the different arts of engineering. The art of modeling describes the behavior of complex systems through the solution of the partial differential equations that are expected to govern their responses. The art of simulation concerns the ability to solve these complex mathematical objects, which are expected to describe the physical reality, as accurately as possible (accuracy with respect to the exact solution of the models) and as fast as possible. Finally, the art of decision making needs to ensure accurate and fast predictions for efficient diagnosis and prognosis. For that purpose, physics-informed digital twins (also known as Hybrid Twins) are employed, combining real-time physics (where complex models are solved by using advanced model order reduction techniques) with physics-informed data-driven models that fill the gap between reality and the physics-based model predictions. Using physics-aware data-driven models in tandem with physics-based reduced order models allows very fast prediction without compromising accuracy, which is compulsory for diagnosis and prognosis purposes.
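
    As a minimal sketch of the hybrid-twin idea described above (assumed interfaces and function names; a ridge-regressed polynomial correction stands in for the paper's physics-informed data-driven models): a fast reduced-order physics model supplies the bulk of the prediction, and a small data-driven term is fit to the residual between measurements and that model, so the combined prediction stays fast while tracking reality.

    import numpy as np

    def fit_residual_correction(inputs, measurements, rom_predict, degree=2, ridge=1e-6):
        """Fit a polynomial ridge-regression correction to the reduced-order model's error."""
        inputs = np.asarray(inputs, dtype=float)
        residuals = np.asarray(measurements, dtype=float) - np.array([rom_predict(x) for x in inputs])
        X = np.vander(inputs, degree + 1)  # simple polynomial features of a scalar input
        coeffs = np.linalg.solve(X.T @ X + ridge * np.eye(degree + 1), X.T @ residuals)
        return lambda x: float(np.vander(np.atleast_1d(float(x)), degree + 1) @ coeffs)

    def hybrid_twin_predict(x, rom_predict, correction):
        """Hybrid prediction = real-time physics (ROM) + data-driven gap-filling term."""
        return rom_predict(x) + correction(x)

    # Example usage with a crude stand-in for a reduced-order model:
    # rom = lambda x: 2.0 * x                       # fast physics-based approximation
    # truth = lambda x: 2.0 * x + 0.3 * x**2        # "reality", unknown to the ROM
    # xs = np.linspace(0.0, 1.0, 20)
    # corr = fit_residual_correction(xs, [truth(x) for x in xs], rom)
    # hybrid_twin_predict(0.5, rom, corr)           # close to truth(0.5)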