
    Importance mixing: Improving sample reuse in evolutionary policy search methods

    Deep neuroevolution, that is, evolutionary policy search based on deep neural networks, has recently emerged as a competitor to deep reinforcement learning algorithms due to its better parallelization capabilities. However, these methods still suffer from far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and we explain how it can be extended to reuse more samples. Then, from an empirical comparison based on a simple benchmark, we show that, although importance mixing does improve sample efficiency, the result is still far from the sample efficiency of deep reinforcement learning, though it is more stable.
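    To make the mechanism concrete, here is a minimal sketch of importance mixing for a Gaussian search distribution, following the two-step rejection scheme commonly described in the literature; the function names and the alpha parameter are illustrative assumptions, not taken from the paper.

    ```python
    # Minimal sketch of importance mixing between two consecutive Gaussian
    # search distributions; 'alpha' is the minimal-refresh rate (assumed).
    import numpy as np
    from scipy.stats import multivariate_normal

    def importance_mixing(old_samples, old_dist, new_dist, pop_size,
                          alpha=0.1, rng=None):
        """Build the new population, reusing old samples where possible."""
        rng = rng if rng is not None else np.random.default_rng()
        population = []
        # Step 1: keep an old sample with prob min(1, (1-alpha) * p_new/p_old).
        for x in old_samples:
            ratio = new_dist.pdf(x) / old_dist.pdf(x)
            if rng.random() < min(1.0, (1.0 - alpha) * ratio):
                population.append(x)
        # Step 2: fill remaining slots with fresh draws, accepted with
        # prob max(alpha, 1 - p_old/p_new), so the final population is
        # distributed according to the new Gaussian.
        while len(population) < pop_size:
            x = new_dist.rvs(random_state=rng)
            if rng.random() < max(alpha, 1.0 - old_dist.pdf(x) / new_dist.pdf(x)):
                population.append(x)
        return np.array(population[:pop_size])

    # Example with two consecutive generations (illustrative values).
    old = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))
    new = multivariate_normal(mean=[0.2, 0.1], cov=0.9 * np.eye(2))
    pop = importance_mixing(old.rvs(size=50, random_state=1), old, new, 50)
    ```

    Only the slots refilled in step 2 require fresh fitness evaluations; the reused samples keep their old scores, which is the source of the sample-efficiency gain.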

    Agile low phase noise radio-frequency sine wave generator applied to experiments on ultracold atoms

    We report on the frequency performance of a low-cost (~$500) radio-frequency sine wave generator, using direct digital synthesis (DDS) and a field-programmable gate array (FPGA). The output frequency of the device may be changed dynamically to any arbitrary value ranging from DC to 10 MHz without any phase slip. Sampling effects are substantially reduced by a high sample rate, up to 1 MHz, and by a large memory length, more than 2×10^5 samples. By using a low-noise external oscillator to clock the DDS, we demonstrate a phase noise as low as that of the master clock, that is, at the level of -113 dB rad^2/Hz at 1 Hz from the carrier for an output frequency of 3.75 MHz. The device is successfully used to confine an ultracold atomic cloud of rubidium 87 in an RF-based trap, with no extra heating from the RF source.
    Comment: 10 pages, 6 figures
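    The phase-continuity claim follows from how a DDS works: a phase accumulator keeps running while only the per-sample increment (the frequency tuning word) changes. Below is a minimal software model of that behavior; the 32-bit accumulator width, the clock frequency, and all names are assumptions for illustration, not specifications of the device described above.

    ```python
    # Software model of a DDS phase accumulator: changing frequency swaps the
    # per-sample increment but never resets the phase, hence no phase slip.
    import numpy as np

    ACC_BITS = 32          # accumulator width (assumed)
    F_CLOCK = 10e6         # DDS clock frequency in Hz (assumed)

    def tuning_word(f_out):
        """Frequency tuning word: per-clock phase increment."""
        return int(f_out / F_CLOCK * 2**ACC_BITS)

    def dds_samples(freq_schedule):
        """freq_schedule: list of (n_samples, f_out) segments."""
        phase = 0
        out = []
        for n, f_out in freq_schedule:
            ftw = tuning_word(f_out)
            for _ in range(n):
                out.append(np.sin(2 * np.pi * phase / 2**ACC_BITS))
                # Phase carries over across frequency changes.
                phase = (phase + ftw) % 2**ACC_BITS
        return np.array(out)

    # Switching from 1 MHz to 3.75 MHz mid-stream: the waveform stays
    # phase-continuous at the switching point.
    wave = dds_samples([(100, 1e6), (100, 3.75e6)])
    ```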

    Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration

    Deep Reinforcement Learning has been successfully applied to learn robotic control. However, the corresponding algorithms struggle when applied to problems where the agent is only rewarded after achieving a complex task. In this context, using demonstrations can significantly speed up the learning process, but demonstrations can be costly to acquire. In this paper, we propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration. To do so, our method learns a goal-conditioned policy to control a system between successive low-dimensional goals. This sequential goal-reaching approach raises a problem of compatibility between successive goals: we need to ensure that the state resulting from reaching a goal is compatible with the achievement of the following goals. To tackle this problem, we present a new algorithm called DCIL-II. We show that DCIL-II can solve, with unprecedented sample efficiency, challenging simulated tasks such as humanoid locomotion and stand-up, as well as fast running with a simulated Cassie robot. By leveraging sequentiality, our method is a step towards solving complex robotic tasks under minimal specification effort, a key feature for the next generation of autonomous robots.
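    As an illustration of the sequential goal-reaching idea only (not of DCIL-II itself), here is a schematic sketch in which low-dimensional goals are extracted from the demonstration and a goal-conditioned policy is chained from one goal to the next; env, policy, project_to_goal and is_success are assumed gym-like interfaces.

    ```python
    # Sketch: chain goals extracted from a single demonstration. The state
    # reached at one goal becomes the start state for the next, which is
    # exactly where the compatibility problem mentioned above arises.
    def extract_goals(demo_states, stride, project_to_goal):
        """Subsample the demo into a sequence of low-dimensional goals."""
        return [project_to_goal(s) for s in demo_states[::stride]]

    def rollout_goal_chain(env, policy, goals, max_steps_per_goal, is_success):
        """Run the goal-conditioned policy through the goal sequence."""
        state = env.reset()
        for goal in goals:
            for _ in range(max_steps_per_goal):
                state, _, done, _ = env.step(policy(state, goal))
                if is_success(state, goal):
                    break          # proceed to the next goal in the chain
            else:
                return False       # goal not reached: the chain is broken
        return True
    ```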

    Timed Specification For Web Services Compatibility Analysis

    Web services are becoming one of the main technologies for designing and building complex inter-enterprise business applications. Usually, a business application cannot be fulfilled by one Web service alone but by coordinating a set of them. In particular, one of the important investigations when performing such a coordination is compatibility analysis. Two Web services are said to be compatible if they can interact correctly. In the literature, the proposed frameworks for checking service compatibility rely on the supported sequences of messages. However, the interaction of services also depends on other properties, such as the exchanged data flow, so considering only supported sequences of messages is insufficient. Temporal constraints are another property on which service interaction can depend. In this paper, we focus on the compatibility analysis of Web services with regard to (1) their supported sequences of messages, (2) the exchanged data flow, (3) constraints related to the exchanged data flow, and (4) temporal requirements. Based on these properties, we study three compatibility classes: (i) absolute compatibility, (ii) likely compatibility and (iii) absolute incompatibility.
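    To illustrate the message-sequence part of the analysis only (data flow and timing omitted), here is a hypothetical encoding in which each service is a finite automaton with send ('!m') and receive ('?m') transitions, and the three classes correspond to all, some, or no joint runs reaching final states; this encoding is an assumption for illustration, not the paper's timed formalism.

    ```python
    # Sketch: classify two services by exploring their synchronized product.
    # svc_x maps a state to a list of (label, next_state); '!m' sends message
    # m, '?m' receives it. A run succeeds if both automata reach final states.
    def classify(svc_a, svc_b, finals_a, finals_b, start=('s0', 's0')):
        def runs(qa, qb, seen):
            if (qa, qb) in seen:
                return [False]                 # loop without termination
            seen = seen | {(qa, qb)}
            if qa in finals_a and qb in finals_b:
                return [True]                  # both reached a final state
            results = []
            for la, na in svc_a.get(qa, []):
                for lb, nb in svc_b.get(qb, []):
                    # Synchronize a send with the matching receive.
                    if la[1:] == lb[1:] and {la[0], lb[0]} == {'!', '?'}:
                        results += runs(na, nb, seen)
            return results or [False]          # no joint move: deadlock
        outcomes = runs(start[0], start[1], frozenset())
        if all(outcomes):
            return 'absolute compatibility'
        return 'likely compatibility' if any(outcomes) else 'absolute incompatibility'

    buyer  = {'s0': [('!order', 's1')], 's1': [('?invoice', 's2')]}
    seller = {'s0': [('?order', 's1')], 's1': [('!invoice', 's2')]}
    print(classify(buyer, seller, {'s2'}, {'s2'}))  # absolute compatibility
    ```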

    The problem with DDPG: understanding failures in deterministic environments with sparse rewards

    In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet they can also fail in environments that seem trivial, and the reason for such failures is still poorly understood. In this paper, we contribute a formal explanation of these failures in the particular case of sparse-reward, deterministic environments. First, using a very elementary control problem, we illustrate that the learning process can get stuck in a fixed point corresponding to a poor solution. Then, generalizing from the studied example, we provide a detailed analysis of the underlying mechanisms, which results in a new understanding of one of the convergence regimes of these algorithms. The resulting perspective casts a new light on already existing solutions to the issues we have highlighted, and suggests other potential approaches.
    Comment: 19 pages, submitted to ICLR 2020
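    The failure mode can be reproduced in a few lines: if the sparse reward is never observed, the critic of a deterministic actor-critic method regresses toward zero, its action-gradient vanishes, and the actor update stops moving the policy. The toy linear critic and scalar policy below are illustrative assumptions, not the paper's formal model.

    ```python
    # Toy illustration of the poor fixed point: a linear critic Q(s,a)=w.phi
    # and a scalar deterministic policy a = theta, trained with DDPG-style
    # updates in a deterministic environment whose reward is always unseen.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=3) * 0.01      # small random critic weights
    theta = 0.0                         # deterministic policy parameter

    def phi(s, a):
        return np.array([s, a, 1.0])

    def Q(s, a):
        return w @ phi(s, a)

    for _ in range(1000):
        s = rng.uniform(-1, 1)
        a = theta                       # no exploration noise, for clarity
        r = 0.0                         # sparse reward: never observed
        # Critic regresses towards r + gamma * Q(s', pi(s')), which is ~0
        # everywhere, so w collapses and the action-gradient dQ/da = w[1] -> 0.
        target = r + 0.99 * Q(s + a, theta)
        w -= 0.1 * (Q(s, a) - target) * phi(s, a)
        # Actor ascends dQ/da: once w ~ 0, theta stops moving.
        theta += 0.1 * w[1]

    print(w, theta)   # w near 0, theta essentially unchanged: stuck
    ```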

    PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning

    The exploration-exploitation trade-off is at the heart of reinforcement learning (RL). However, most continuous control benchmarks used in recent RL research only require local exploration. This has led to the development of algorithms that have basic exploration capabilities, and that behave poorly in benchmarks requiring more versatile exploration. For instance, as demonstrated in our empirical study, state-of-the-art RL algorithms such as DDPG and TD3 are unable to steer a point mass even in small 2D mazes. In this paper, we propose a new algorithm called "Plan, Backplay, Chain Skills" (PBCS) that combines motion planning and reinforcement learning to solve hard-exploration environments. In a first phase, a motion planning algorithm is used to find a single good trajectory; then an RL algorithm is trained using a curriculum derived from that trajectory, combining a variant of the Backplay algorithm with skill chaining. We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes, and is able to improve on the trajectory obtained by the motion planning phase.
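    As a rough illustration of the second phase, here is a Backplay-style curriculum sketch in which the agent starts from states taken ever closer to the beginning of the planned trajectory; env.reset_to, agent.act/learn and solved are assumed interfaces, and skill chaining is omitted, so this is not the full PBCS algorithm.

    ```python
    # Backplay-style curriculum: train the RL agent starting from states
    # progressively earlier along the trajectory found by the planner, so
    # every stage begins close to a state from which success is reachable.
    def backplay_curriculum(env, agent, plan_states, solved, max_episodes=100):
        """plan_states: states along the planned trajectory, start to end."""
        for start in reversed(plan_states):        # curriculum: end -> start
            for _ in range(max_episodes):
                state = env.reset_to(start)        # assumed resettable sim
                done = False
                while not done:
                    action = agent.act(state)
                    next_state, reward, done, _ = env.step(action)
                    agent.learn(state, action, reward, next_state, done)
                    state = next_state
                if solved(agent, start):
                    break                          # move the start earlier
    ```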