17 research outputs found

    Reinforcement Learning with Random Delays

    Full text link
    Action and observation delays commonly occur in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark.
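    The randomly delayed setting described above can be emulated around any standard environment. The following toy wrapper is only an illustrative sketch, not the paper's delay-augmented benchmark; the class name, the max_delay parameter, and the use of Pendulum-v1 are assumptions made for the example.

        import random
        from collections import deque

        import gymnasium as gym


        class RandomActionDelayWrapper(gym.Wrapper):
            """Toy illustration of a randomly delayed control channel: the action
            executed at step t was sent by the agent up to `max_delay` steps earlier.
            Not the authors' implementation."""

            def __init__(self, env: gym.Env, max_delay: int = 2):
                super().__init__(env)
                self.max_delay = max_delay
                self.sent_actions = deque(maxlen=max_delay + 1)

            def reset(self, **kwargs):
                self.sent_actions.clear()
                return self.env.reset(**kwargs)

            def step(self, action):
                self.sent_actions.append(action)
                # Randomly pick how stale the executed action is (0 = no delay).
                age = random.randint(0, min(self.max_delay, len(self.sent_actions) - 1))
                executed = self.sent_actions[-1 - age]
                return self.env.step(executed)


        # Example: delay-augment a classic control task.
        env = RandomActionDelayWrapper(gym.make("Pendulum-v1"), max_delay=2)

    Under this toy model, the action executed at each step can be any of the last few actions the agent sent, which is the kind of delayed interaction DCAC is designed to handle.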

    When Being Soft Makes You Tough: A Collision Resilient Quadcopter Inspired by Arthropod Exoskeletons

    Full text link
    Flying robots are usually rather delicate, and require protective enclosures when facing the risk of collision. High complexity and reduced payload are recurrent problems with collision-tolerant flying robots. Inspired by arthropods' exoskeletons, we design a simple, easily manufactured, semi-rigid structure with flexible joints that can withstand high-velocity impacts. With an exoskeleton, the protective shell becomes part of the main robot structure, thereby minimizing the loss in payload capacity. Our design is simple to build and customize using cheap components and consumer-grade 3D printers. Our results show we can build a sub-250 g autonomous quadcopter with visual navigation that can survive multiple collisions at speeds of up to 7 m/s, is suitable for automated battery swapping, and has enough computing power to run deep neural network models. This structure makes for an ideal platform for high-risk activities (such as flying in a cluttered environment or reinforcement learning training) without damage to the hardware or the environment.

    Deep Reinforcement Learning in Real-Time Environments

    No full text
    RÉSUMÉ: Reinforcement Learning encompasses a family of algorithms for discovering performant controllers. These algorithms work by trial and error, maximizing the cumulative sum of a so-called "reward" signal designed by the practitioner. In simple, idealized environments satisfying certain properties, many of these algorithms are mathematically guaranteed to discover an optimal controller. Recent progress in Deep Learning has made it possible to successfully extend Reinforcement Learning algorithms to far more complex environments. Nevertheless, these successes mostly remain confined to tightly controlled applications such as the games of Go and chess, or computer simulations. In this thesis, we seek to extend the applicability of the discipline to real-world environments, for example to facilitate the learning of controllers for robotics. It is indeed difficult to successfully apply Reinforcement Learning algorithms in such environments. In particular, the environments used in the literature are designed to be assimilable to Markov Decision Processes, on which the entire theory of Reinforcement Learning is based. However, the real world is in general far too complex to be naively treated as such an idealized object: it is presumably impossible to observe the entire universe, the real world is non-stationary, and events unfold continuously in real time. In this thesis, our objective is specifically to extend the theory and practice of Reinforcement Learning to the real-time setting.

    ABSTRACT: Whereas all environments commonly used in the Reinforcement Learning (RL) literature are paused between transitions, it is simply not possible to pause the real world. Thus, action and observation delays commonly occur in many practical RL applications. In our central contribution, we study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for unbiased and low-variance off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark. This contribution, presented in the central chapter of this thesis, has been accepted as a conference paper at the International Conference on Learning Representations (ICLR 2021). In our second, more practical contribution, we develop RL environments for real-time applications. We provide a Python helper, Real-Time Gym, that enables implementing delayed RL environments in the real world with minimal effort. We demonstrate this helper on applications such as robotics and real-time video games. We further introduce a framework that we developed in order to train our real systems remotely on a High Performance Computing server, and present promising results on autonomous car racing tasks.
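    The Real-Time Gym helper mentioned in the abstract is the rtgym Python package that also underlies the tmrl releases listed below. The sketch that follows shows roughly how a custom real-time environment is declared with it; the method names, config keys, and environment id are written from memory of the rtgym documentation and should be treated as assumptions that may differ between versions.

        from copy import deepcopy

        import gymnasium
        import numpy as np
        from rtgym import RealTimeGymInterface, DEFAULT_CONFIG_DICT  # assumed imports


        class DummyInterface(RealTimeGymInterface):
            """Minimal interface to a fictitious real system (illustration only)."""

            def get_observation_space(self):
                # rtgym expects a Tuple of spaces (assumption)
                return gymnasium.spaces.Tuple(
                    (gymnasium.spaces.Box(low=-1.0, high=1.0, shape=(3,)),))

            def get_action_space(self):
                return gymnasium.spaces.Box(low=-1.0, high=1.0, shape=(1,))

            def get_default_action(self):
                return np.zeros(1, dtype=np.float32)

            def send_control(self, control):
                pass  # send `control` to the real device here

            def reset(self, seed=None, options=None):
                # gymnasium-style signature, as required by rtgym>=0.9
                return [np.zeros(3, dtype=np.float32)], {}

            def get_obs_rew_terminated_info(self):
                # read sensors; return observation, reward, terminated flag, info
                return [np.zeros(3, dtype=np.float32)], 0.0, False, {}


        config = deepcopy(DEFAULT_CONFIG_DICT)
        config["interface"] = DummyInterface
        config["time_step_duration"] = 0.05  # 50 ms real-time steps (assumed key)

        env = gymnasium.make("real-time-gym-v1", config=config)  # assumed env id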

    trackmania-rl/tmrl: Release 0.4.2

    No full text
    Minor release 0.4.2

    This update improves TrackMania support (thanks @LaurensNeinders!).

    Perform a clean install by deleting the TmrlData folder AND Plugin_GrabData_0_1.as from OpenPlanetNext\Scripts.

    Release 0.4.2 ships with an optimized version of the OpenPlanet script for the default TrackMania environments. In particular, it should alleviate a common issue where the environment randomly times out for a roughly constant amount of time even on high-end PCs.

    trackmania-rl/tmrl: Release 0.5.2

    No full text
    Minor release

    Release 0.5.2 fixes a bug in the SAC implementation provided in version 0.5.1, which had been inadvertently pushed from a development branch.

    trackmania-rl/tmrl: Release 0.4.0

    No full text
    Major release 0.4.0

    Warning: This release is backward-incompatible!

    tmrl 0.4.0 introduces network security and many changes to the API. In particular:
    - network security is now handled by tlspyo (https://github.com/MISTLab/tls-python-object)
    - the API is now compatible with any framework (i.e., we do not force the use of PyTorch anymore)
    - the framework now supports color images
    - the default config.json file is tuned for fast training of CNN (grayscale) policies

    tmrl 0.4.0 also "officially" launches the first Beta of the TrackMania Roborace League competition (https://github.com/trackmania-rl/tmrl/blob/master/readme/competition.md). The competition tutorial script (https://github.com/trackmania-rl/tmrl/blob/master/tmrl/tuto/competition/custom_actor_module.py) is particularly useful for ML developers who want to code advanced training pipelines in TrackMania.

    trackmania-rl/tmrl: Release 0.4.1

    No full text
    Minor release 0.4.1

    This release uses a new feature of tlspyo 0.2.5 to stop randomly deserializing objects in the background during episode collection.

    trackmania-rl/tmrl: Release 0.5.1

    No full text
    Release 0.5.1

    This release complies with rtgym>=0.9, which in turn complies with the gymnasium signature of the reset function.

    In case you are using custom rtgym interfaces in tmrl, you will want to update your reset implementations. This is straightforward: you can just replace

        def reset(self):

    with

        def reset(self, seed=None, options=None):
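    A fully updated override then typically looks like the sketch below; the body is illustrative only, and _read_initial_observation is a hypothetical helper standing in for whatever your interface does when the real system is reset.

        def reset(self, seed=None, options=None):
            # gymnasium-style signature required by rtgym>=0.9; seed and
            # options can be ignored if the real system does not use them
            obs = self._read_initial_observation()  # hypothetical helper
            info = {}
            return obs, info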

    trackmania-rl/tmrl: Release 0.2.0

    No full text
    This release introduces new flavors of the TrackMania Gym environments: one featuring raw screenshots, and one featuring a track-progress observation along with the LIDAR (you can use this by ending the "ENV" entry with "LIDARPROGRESS" in config.json).

    This release also fixes the latest breaking changes that the new Gym maintainers team keeps introducing...

    This version is backward-incompatible!

    Please perform a clean install by deleting your TmrlData folder before pip-installing.