21 research outputs found

    Model-free trajectory optimization for reinforcement learning

    Many recent trajectory optimization algorithms alternate between a local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local, quadratic, time-dependent Q-function, allowing the policy update to be derived in closed form. Our policy update ensures exact satisfaction of the KL constraint without simplifying assumptions on the system dynamics, and it demonstrates improved performance compared with related trajectory optimization algorithms that linearize the dynamics.
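
    The closed-form, exactly KL-constrained update described above can be illustrated in a deliberately simplified setting. The sketch below assumes a single time step, a state-independent Gaussian policy N(mu, Sigma) and a quadratic Q-function Q(a) = 0.5 aᵀHa + gᵀa with negative-definite H; these assumptions and the function names `tilted_gaussian`, `gaussian_kl` and `kl_constrained_update` are hypothetical, not the authors' algorithm. Under these assumptions, the maximizer of E[Q] subject to KL(π_new ‖ π_old) ≤ ε has the form π_new ∝ π_old · exp(Q/η), which is again Gaussian, and the multiplier η can be found by bisection so that the bound holds with equality up to numerical tolerance.

```python
import numpy as np

# Hypothetical sketch: one time step, state-independent Gaussian policy,
# quadratic Q(a) = 0.5 * a^T H a + g^T a with H negative definite.


def tilted_gaussian(mu, sigma, H, g, eta):
    """Return mean and covariance of pi_new ∝ pi_old * exp(Q / eta)."""
    prec_old = np.linalg.inv(sigma)
    prec_new = prec_old - H / eta  # positive definite because H is negative definite
    sigma_new = np.linalg.inv(prec_new)
    mu_new = sigma_new @ (prec_old @ mu + g / eta)
    return mu_new, sigma_new


def gaussian_kl(mu1, sigma1, mu0, sigma0):
    """KL(N(mu1, sigma1) || N(mu0, sigma0)) in nats."""
    d = mu0.shape[0]
    prec0 = np.linalg.inv(sigma0)
    diff = mu0 - mu1
    _, logdet0 = np.linalg.slogdet(sigma0)
    _, logdet1 = np.linalg.slogdet(sigma1)
    return 0.5 * (np.trace(prec0 @ sigma1) + diff @ prec0 @ diff - d + logdet0 - logdet1)


def kl_constrained_update(mu, sigma, H, g, epsilon, eta_lo=1e-6, eta_hi=1e6, iters=200):
    """Find eta so that KL(pi_new || pi_old) ≈ epsilon, always returning a feasible eta."""
    # KL(pi_eta || pi_old) decreases monotonically as eta grows, so bisection works.
    for _ in range(iters):
        eta = np.sqrt(eta_lo * eta_hi)  # bisect in log space over a wide range
        mu_new, sigma_new = tilted_gaussian(mu, sigma, H, g, eta)
        if gaussian_kl(mu_new, sigma_new, mu, sigma) > epsilon:
            eta_lo = eta  # update too aggressive: raise the temperature
        else:
            eta_hi = eta  # constraint satisfied: try a smaller temperature
    mu_new, sigma_new = tilted_gaussian(mu, sigma, H, g, eta_hi)
    return mu_new, sigma_new, eta_hi


mu_old, sigma_old = np.zeros(2), np.eye(2)
H, g = -np.eye(2), np.array([1.0, -0.5])
mu_new, sigma_new, eta = kl_constrained_update(mu_old, sigma_old, H, g, epsilon=0.1)
print(mu_new, gaussian_kl(mu_new, sigma_new, mu_old, sigma_old))
```

    Because the old policy is itself feasible (KL = 0), the returned policy can only increase E[Q]; the bisection keeps the upper end of the bracket so the KL bound is never exceeded.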

    Empowered skills

    Robot reinforcement learning (RL) algorithms return a policy that maximizes a global cumulative reward signal but typically do not create diverse behaviors. Hence, the policy will typically capture only a single solution of a task. However, many motor tasks have a large variety of solutions, and knowledge about these solutions can have several advantages. For example, in an adversarial setting such as robot table tennis, a lack of diversity makes the behavior predictable and hence easy for the opponent to counter. In an interactive setting such as learning from human feedback, an emphasis on diversity gives the human more opportunity to guide the robot and helps prevent the robot from getting stuck in local optima of the task. To increase the diversity of the learned behaviors, we leverage prior work on intrinsic motivation and empowerment. We derive a new intrinsic motivation signal by enriching the description of a task with an outcome space, representing interesting aspects of a sensorimotor stream. For example, in table tennis, the outcome space could be given by the return position and return ball speed. The intrinsic motivation is then given by the diversity of future outcomes, a concept also known as empowerment. We derive a new policy search algorithm that maximizes a trade-off between the extrinsic reward and this intrinsic motivation criterion. Experiments on a planar reaching task and simulated robot table tennis demonstrate that our algorithm can learn a diverse set of behaviors within the area of interest of the tasks.
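
    To make the reward/diversity trade-off concrete, the sketch below scores a batch of rollouts by their mean extrinsic reward plus beta times the entropy of their outcomes (for example, return position and ball speed in table tennis). It assumes a Gaussian fit to the outcome samples as a simple proxy for outcome diversity; this proxy, the weight `beta` and the function names are hypothetical and are not the empowerment estimator used in the paper.

```python
import numpy as np


def gaussian_entropy(outcomes):
    """Differential entropy (nats) of a Gaussian fitted to outcome samples of shape (n, d)."""
    d = outcomes.shape[1]
    # reshape handles d == 1, where np.cov returns a 0-d array;
    # the small jitter guards against a singular covariance.
    cov = np.cov(outcomes, rowvar=False).reshape(d, d) + 1e-9 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * (1.0 + np.log(2.0 * np.pi)) + logdet)


def tradeoff_objective(rewards, outcomes, beta):
    """Mean extrinsic reward plus beta times the (proxy) outcome-diversity bonus."""
    return float(np.mean(rewards) + beta * gaussian_entropy(outcomes))


rng = np.random.default_rng(0)
rewards = np.ones(500)  # identical extrinsic reward for both batches
narrow = rng.normal([1.0, 5.0], [0.01, 0.01], size=(500, 2))  # near-identical returns
wide = rng.normal([1.0, 5.0], [0.3, 1.0], size=(500, 2))      # diverse returns
print(tradeoff_objective(rewards, narrow, beta=0.1), tradeoff_objective(rewards, wide, beta=0.1))
```

    With equal extrinsic reward, the batch with more varied outcomes scores higher; setting beta to zero recovers the purely reward-driven objective.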

    Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

    Many recent trajectory optimization algorithms alternate between a linear approximation of the system dynamics around the mean trajectory and a conservative policy update. One way of constraining the policy change is to bound the Kullback-Leibler (KL) divergence between successive policies. These approaches have already demonstrated great experimental success in challenging problems such as end-to-end control of physical systems. However, the linear approximation of the system dynamics can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new model-free trajectory-based policy optimization algorithm with guaranteed monotonic improvement. The algorithm backpropagates a local, quadratic and time-dependent Q-function learned from trajectory data instead of a model of the system dynamics. Our policy update ensures exact satisfaction of the KL constraint without simplifying assumptions on the system dynamics. We experimentally demonstrate, on highly non-linear control tasks, that our algorithm improves performance compared with approaches that linearize the system dynamics. To show the monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of our policy update scheme and derive a lower bound on the change in policy return between successive iterations.
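
    The abstract states that the Q-function is learned from trajectory data rather than from a dynamics model. One minimal way to obtain a local quadratic Q-function at a single time step is ridge regression on all monomials of degree at most two in the concatenated state-action vector; in a time-dependent scheme one would fit a separate weight vector per time step, with targets such as r_t + V_{t+1}(s_{t+1}). The feature map, the ridge term and the function names below are illustrative assumptions rather than the paper's exact estimator.

```python
import numpy as np


def quadratic_features(s, a):
    """Features [1, z, z_i * z_j for i <= j] with z = concat(s, a); s: (n, ds), a: (n, da)."""
    z = np.hstack([s, a])
    n, d = z.shape
    rows, cols = np.triu_indices(d)
    quad = (z[:, :, None] * z[:, None, :])[:, rows, cols]  # upper-triangular products
    return np.hstack([np.ones((n, 1)), z, quad])


def fit_quadratic_q(s, a, q_targets, ridge=1e-6):
    """Ridge-regularised least-squares fit of a quadratic Q-function for one time step."""
    phi = quadratic_features(s, a)
    gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(gram, phi.T @ q_targets)


def predict_q(w, s, a):
    return quadratic_features(s, a) @ w


def true_q(s, a):
    """Known quadratic Q used only to generate synthetic targets for this check."""
    return 1.0 + s[:, 0] - 2.0 * a[:, 0] - 0.5 * a[:, 0] ** 2 + 0.3 * s[:, 1] * a[:, 0]


rng = np.random.default_rng(0)
s, a = rng.normal(size=(500, 2)), rng.normal(size=(500, 1))
w = fit_quadratic_q(s, a, true_q(s, a))
```

    Since the target function lies in the span of the features, the fit recovers it almost exactly; with real rollouts the targets are noisy and the ridge term matters more.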

    Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

    Many recent trajectory optimization algorithms alternate between a linear approximation of the system dynamics around the mean trajectory and a conservative policy update. One way of constraining the policy change is to bound the Kullback-Leibler (KL) divergence between successive policies. These approaches have already demonstrated great experimental success in challenging problems such as end-to-end control of physical systems. However, these approaches lack any improvement guarantee, as the linear approximation of the system dynamics can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new model-free trajectory-based policy optimization algorithm with guaranteed monotonic improvement. The algorithm backpropagates a local, quadratic and time-dependent Q-function learned from trajectory data instead of a model of the system dynamics. Our policy update ensures exact satisfaction of the KL constraint without simplifying assumptions on the system dynamics. We experimentally demonstrate, on highly non-linear control tasks, that our algorithm improves performance compared with approaches that linearize the system dynamics. To show the monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of our policy update scheme and derive a lower bound on the change in policy return between successive iterations.
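
    One step that backpropagating a quadratic Q-function can rely on has a closed form: if Q_t is quadratic in z = [s; a] and the policy is linear-Gaussian, a ~ N(Ks + k, Σ), then V_t(s) = E_a[Q_t(s, a)] is again quadratic in s and can serve in the regression targets for time step t-1. The sketch below assumes the parameterisation Q(s, a) = ½ zᵀMz + mᵀz + c with symmetric M; the names and numbers are hypothetical, and the closed form is checked against a Monte Carlo estimate.

```python
import numpy as np


def expected_q(M, m, c, K, k, Sigma, s):
    """E_{a ~ N(Ks + k, Sigma)}[0.5 z^T M z + m^T z + c] with z = [s; a] and M symmetric."""
    ds = s.shape[0]
    z_mean = np.concatenate([s, K @ s + k])
    M_aa = M[ds:, ds:]  # action-action block of M
    # Only the quadratic action term picks up the policy noise: E[eps^T M_aa eps] = tr(M_aa Sigma).
    return 0.5 * z_mean @ M @ z_mean + m @ z_mean + c + 0.5 * np.trace(M_aa @ Sigma)


M = np.array([[-1.0, 0.2, 0.1], [0.2, -0.5, 0.3], [0.1, 0.3, -2.0]])
m, c = np.array([0.5, -0.2, 1.0]), 0.3
K, k, Sigma = np.array([[0.1, -0.4]]), np.array([0.2]), np.array([[0.5]])
s = np.array([1.0, -0.5])

# Monte Carlo check of the closed form.
rng = np.random.default_rng(0)
acts = rng.multivariate_normal(K @ s + k, Sigma, size=200_000)
z = np.hstack([np.tile(s, (len(acts), 1)), acts])
mc = np.mean(0.5 * np.einsum("ni,ij,nj->n", z, M, z) + z @ m + c)
print(expected_q(M, m, c, K, k, Sigma, s), mc)
```

    With a concave action block (negative-definite M_aa), exploration noise lowers the expected value, which is one reason the KL-bounded update trades off mean improvement against covariance.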

    Porous electrodes based on platinum capped electrocatalyst: Combining thermal treatment XPS analysis and electrochemistry give evidence for the stabilizing role of the thiol capping agent on the Pt dispersion and core feature

    In previous work, we reported oxygen reduction reaction (ORR) studies on porous electrodes based on a capped platinum (Pt) electrocatalyst and carbon nanotubes. These structures exhibited significant activity but a very low Pt electrochemically active surface area (Pt-EASA), due to the molecules grafted onto the surface of the Pt core nanoparticles. The present paper reports on thermal pre-treatment of such electrodes at moderate temperature, aiming to degrade the organic capping without changing the Pt core feature of the nanoparticles. Using X-ray diffraction (XRD), transmission electron microscopy (TEM) and X-ray photoelectron spectroscopy (XPS), it is shown that a treatment at 100 °C in air fits these requirements. This results in a strong increase of the Pt-EASA. However, TEM and XRD show that, as soon as the organic capping is modified by partial oxidation of the sulfur atoms involved in the initial strong Pt-S bond, the electrochemical measurement triggers dramatic changes in the Pt dispersion and Pt core feature. Much larger Pt regions are formed, complete oxidation of the sulfur atoms is observed, and a significant amount of the organic capping molecules is eliminated into the electrolyte. Finally, although it was not possible to prove systematically that the organic contamination was removed after these treatments, the dramatic changes of the Pt catalyst nanoparticles compared with the initial organically capped ones are clearly established. These original results demonstrate the essential stabilizing role of the grafted thiol molecules in the initial system and allow us to propose a scenario for the ageing of these capped electrocatalysts under prolonged ORR operation.
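
    The abstract relies on XRD, alongside TEM, to follow changes in the Pt core. A common way to turn an XRD peak into an approximate crystallite size is the Scherrer equation, τ = Kλ / (β cos θ), with β the peak full width at half maximum in radians and θ the Bragg angle. The abstract does not say which analysis the authors used, so the sketch below, including the shape factor K = 0.9 and the Cu Kα wavelength of 0.15406 nm, is an assumption for illustration, and the peak widths in the example are hypothetical.

```python
import math


def scherrer_size_nm(fwhm_deg, two_theta_deg, wavelength_nm=0.15406, shape_factor=0.9):
    """Approximate crystallite size (nm) from an XRD peak via the Scherrer equation.

    fwhm_deg: peak full width at half maximum in degrees 2-theta (instrumental
    broadening assumed already subtracted); two_theta_deg: peak position.
    """
    beta = math.radians(fwhm_deg)
    theta = math.radians(two_theta_deg / 2.0)
    return shape_factor * wavelength_nm / (beta * math.cos(theta))


# Hypothetical Pt(111)-like peak near 2-theta = 40 degrees: a narrower peak means larger crystallites.
print(scherrer_size_nm(fwhm_deg=1.0, two_theta_deg=40.0))
print(scherrer_size_nm(fwhm_deg=0.25, two_theta_deg=40.0))
```

    Peak narrowing after electrochemical cycling would be consistent with the formation of larger Pt regions, although the Scherrer estimate is only a lower-bound-style approximation that ignores strain broadening.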

    Combining ability and heterosis of maize (Zea mays L.) populations from the Algerian Sahara Desert under Mediterranean drought conditions

    Drought causes significant yield reduction in maize (Zea mays L.), and germplasm from the Saharan Desert offers potential sources of drought tolerance. Our objectives were to estimate heterosis and combining ability among Algerian maize populations under drought conditions and to identify populations and crosses as sources of drought tolerance for breeding programs in temperate environments. A diallel design of six populations without reciprocals was used. The populations per se, their respective crosses, and checks were evaluated in Algiers (Algeria) in 2016, 2017 and 2018. Algerian maize populations exhibited high phenotypic variability and genetic divergence under water stress. The populations IGS and AOR per se could provide favorable alleles for higher early vigor under drought, MST for reducing the anthesis-silking interval (ASI), and both AOR and SHH for increasing yield under water stress. Among all crosses, IGS × MST was the most outstanding cross for reducing ASI, and IGS × SHH and BAH × SHH for increasing yield under water stress. Our results confirm the existence of heterotic relationships among Algerian maize populations from diverse origins under water stress.
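
    For readers outside plant breeding, heterosis is usually expressed relative to the parents. The sketch below uses the standard mid-parent and better-parent definitions, MPH = 100 (F1 − MP) / MP with MP = (P1 + P2) / 2, and BPH = 100 (F1 − BP) / BP. The abstract does not state which estimators or diallel model the authors used, so this is illustrative only, and the trait values in the example are hypothetical. For traits where lower is better, such as ASI, the better parent is the one with the lower value and negative heterosis is the desirable direction.

```python
def mid_parent_heterosis(p1, p2, f1):
    """Mid-parent heterosis in percent: 100 * (F1 - MP) / MP, with MP = (P1 + P2) / 2."""
    mp = (p1 + p2) / 2.0
    return 100.0 * (f1 - mp) / mp


def better_parent_heterosis(p1, p2, f1, higher_is_better=True):
    """Better-parent heterosis in percent; for traits like ASI, pass higher_is_better=False."""
    bp = max(p1, p2) if higher_is_better else min(p1, p2)
    return 100.0 * (f1 - bp) / bp


# Hypothetical grain yields (t/ha) of two parental populations and their cross.
print(mid_parent_heterosis(4.0, 6.0, 6.5))     # 30.0
print(better_parent_heterosis(4.0, 6.0, 6.5))  # about 8.33
# Hypothetical ASI (days): lower is better, so a negative value is favourable.
print(better_parent_heterosis(4.0, 6.0, 3.0, higher_is_better=False))  # -25.0
```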

    Evidence for high performances of low Pt loading electrodes based on capped platinum electrocatalyst and carbon nanotubes in fuel cell devices

    Recently, we reported the preparation and electrochemical behaviour of porous electrodes based on the controlled combination of carbon nanotubes and capped platinum nanoparticles for oxygen reduction. Owing to the organic crown of the nanoparticles, the electrodes exhibited low hydrogen underpotential deposition (H upd) electroactive surface areas, but significant activity towards oxygen reduction was recorded down to very low platinum loadings of a few µg/cm². While the presence of organic stabilizing material at the surface of an electrocatalyst synthesized by wet chemistry may be considered a potential drawback in the fuel cell community, we present in this paper results showing that our capped electrocatalyst associated with carbon nanotubes can be used without any pre-treatment and exhibits high performance in fuel cell devices, in spite of low platinum loadings. Beyond the practical interest of such capped nanoparticles in fuel cell technology demonstrated here, fundamental questions related to the high performance of the capped electrocatalyst remain open and are currently under investigation.
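
    The H upd surface area mentioned above is commonly obtained by integrating the hydrogen-desorption charge in a cyclic voltammogram and dividing it by a reference charge density, often taken as 210 µC per cm² of polycrystalline Pt. The abstract does not give the authors' conversion factor or raw charges, so the factor, the function name and the example numbers below are assumptions for illustration.

```python
def pt_ecsa_m2_per_g(q_h_uC, pt_mass_g, charge_density_uC_per_cm2=210.0):
    """Specific Pt electrochemical surface area (m^2 per g Pt) from the H upd charge.

    q_h_uC: integrated hydrogen-desorption charge in microcoulombs (double-layer
    contribution already subtracted); pt_mass_g: Pt mass on the electrode in grams.
    """
    area_cm2 = q_h_uC / charge_density_uC_per_cm2  # real Pt surface area in cm^2
    return area_cm2 * 1e-4 / pt_mass_g            # cm^2 -> m^2, then normalise by Pt mass


# Hypothetical electrode: 1 cm^2 geometric area at 10 µg Pt/cm^2 (1e-5 g Pt), 1260 µC of H upd charge.
print(pt_ecsa_m2_per_g(1260.0, 1e-5))
```

    A capping layer that blocks hydrogen adsorption sites lowers the measured charge, and therefore the apparent surface area, even when the catalyst remains active for oxygen reduction.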

    Nycturie du patient âgé : en pratique [Nocturia in aged patient: in practice]

    Nocturia is defined as the complaint that the individual has to wake at night to urinate. In older persons, this functional urinary disorder is most often of multifactorial origin and/or the symptom (sometimes the only one) of a chronic disease. Nocturia is very bothersome, and its impact on health and quality of life is related to the disturbance of sleep cycles. In aged patients, who often have multiple conditions and take multiple medications, the interaction between nocturia and geriatric syndromes, as well as comorbidities, deserves particular emphasis. The impact on the health of informal caregivers and on the decision to admit the patient to an institution must also be considered. Appropriate management of nocturia improves quality of life and reduces morbidity in aged patients.