
    Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts

    Traditional model-based reinforcement learning (RL) methods generate forward rollout traces with the learnt dynamics model to reduce interactions with the real environment. Recent model-based RL work additionally learns a backward model, which specifies the conditional probability of the previous state given the previous action and the current state, so that backward rollout trajectories can also be generated. However, in this type of model-based method, the samples derived from backward rollouts and those from forward rollouts are simply aggregated and used to optimize the policy with a model-free RL algorithm, which may hurt both sample efficiency and convergence rate. Such an approach ignores the fact that backward rollout traces are often generated starting from high-value states and are therefore more instructive for improving the agent's behavior. In this paper, we propose the backward imitation and forward reinforcement learning (BIFRL) framework, where the agent treats backward rollout traces as expert demonstrations for imitating excellent behaviors and then collects forward rollout transitions for policy reinforcement. Consequently, BIFRL enables the agent to both reach and explore from high-value states more efficiently, and it further reduces interactions with the real environment, making it potentially more suitable for real-robot learning. Moreover, a value-regularized generative adversarial network is introduced to augment the valuable states that the agent encounters infrequently. Theoretically, we provide the condition under which BIFRL is superior to the baseline methods. Experimentally, we demonstrate that BIFRL achieves better sample efficiency and competitive asymptotic performance on various MuJoCo locomotion tasks compared against state-of-the-art model-based methods. Comment: Accepted by IROS202
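    To make the loop concrete, the sketch below gives one plausible reading of the training scheme the abstract describes: backward rollouts started from high-value states become imitation data, while forward rollouts feed the ordinary RL update. It is a minimal sketch under these assumptions; all names (forward_model, backward_model, policy, value, backward_rollout, forward_rollout) are hypothetical stand-ins with toy dynamics, not the authors' implementation.

        # Illustrative BIFRL-style data collection (hypothetical names, toy dynamics).
        import numpy as np

        rng = np.random.default_rng(0)
        STATE_DIM, ACTION_DIM = 4, 2

        def forward_model(s, a):
            # Stand-in for the learnt forward dynamics p(s' | s, a).
            return s + 0.1 * np.tanh(a).sum() * np.ones(STATE_DIM)

        def backward_model(s, a_prev):
            # Stand-in for the learnt backward dynamics p(s_prev | a_prev, s).
            return s - 0.1 * np.tanh(a_prev).sum() * np.ones(STATE_DIM)

        def policy(s):
            # Current policy; random actions keep the sketch self-contained.
            return rng.normal(size=ACTION_DIM)

        def value(s):
            # Stand-in value function used to select high-value starting states.
            return float(s.sum())

        def backward_rollout(start, horizon=5):
            # Roll the backward model from a high-value state, then reverse the
            # trace so the (s, a, s') tuples read forward in time.
            trace, s = [], start
            for _ in range(horizon):
                a = policy(s)
                s_prev = backward_model(s, a)
                trace.append((s_prev, a, s))
                s = s_prev
            return list(reversed(trace))

        def forward_rollout(start, horizon=5):
            trace, s = [], start
            for _ in range(horizon):
                a = policy(s)
                s_next = forward_model(s, a)
                trace.append((s, a, s_next))
                s = s_next
            return trace

        # Backward traces from high-value states feed an imitation-style update;
        # forward traces feed an ordinary model-free RL update.
        candidates = [rng.normal(size=STATE_DIM) for _ in range(32)]
        high_value = sorted(candidates, key=value, reverse=True)[:4]
        demo_buffer = [t for s in high_value for t in backward_rollout(s)]
        rl_buffer = [t for s in high_value for t in forward_rollout(s)]
        print(len(demo_buffer), "imitation transitions;", len(rl_buffer), "RL transitions")
        # In a full system: imitation_update(policy, demo_buffer); rl_update(policy, rl_buffer)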

    On a conjecture of Ghorpade, Datta and Beelen for the number of points of varieties over finite fields

    Consider a finite field $\mathbb{F}_q$ and positive integers $d,m,r$ with $1\leq r\leq \binom{m+d}{d}$. Let $S_d(m)$ be the $\mathbb{F}_q$-vector space of all homogeneous polynomials of degree $d$ in $X_0,\dots,X_m$. Let $e_r(d,m)$ be the maximum number of $\mathbb{F}_q$-rational points in the vanishing set of $W$ as $W$ varies over all subspaces of $S_d(m)$ of dimension $r$. Ghorpade, Datta and Beelen conjectured an exact formula for $e_r(d,m)$ when $q\geq d+1$. We prove that their conjectured formula is true when $q$ is sufficiently large in terms of $m,d,r$. The problem of determining $e_r(d,m)$ is equivalent to computing the $r^{\mathrm{th}}$ generalized Hamming weight of the projective Reed-Muller code $PRM_q(d,m)$. It is also equivalent to determining the maximum number of points on sections of Veronese varieties by linear subvarieties of codimension $r$.
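    For orientation, the central quantity can be written in display form. The first line only re-expresses the definition above; the second records the standard relation to generalized Hamming weights for projective Reed-Muller codes, where $p_m$ (introduced here only for this restatement) denotes the number of $\mathbb{F}_q$-rational points of $\mathbb{P}^m$ and the vanishing set is taken in $\mathbb{P}^m$, as is usual in this setting.

        % Display-form restatement of e_r(d,m) and its standard link to the
        % r-th generalized Hamming weight d_r of PRM_q(d,m).
        \[
          e_r(d,m) \;=\; \max_{\substack{W \subseteq S_d(m)\\ \dim_{\mathbb{F}_q} W = r}}
          \bigl|\{\, P \in \mathbb{P}^m(\mathbb{F}_q) : f(P) = 0 \ \text{for all } f \in W \,\}\bigr|,
          \qquad 1 \le r \le \binom{m+d}{d},
        \]
        \[
          d_r\bigl(PRM_q(d,m)\bigr) \;=\; p_m - e_r(d,m),
          \qquad p_m := \bigl|\mathbb{P}^m(\mathbb{F}_q)\bigr| = \frac{q^{m+1}-1}{q-1}.
        \]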