
    Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

    Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection by thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale, resulting in longer wall-clock training times. This paper presents a Parallel Q-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining the superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Unlike prior works on distributed off-policy learning, such as Ape-X, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single workstation. In experiments, we demonstrate that Q-learning can be scaled to tens of thousands of parallel environments and investigate important factors affecting learning speed. The code is available at https://github.com/Improbable-AI/pql.
    Comment: Accepted by ICML 2023
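
    The abstract's key idea is that a GPU-vectorized simulator lets one tensor operation advance tens of thousands of environments, so off-policy updates can keep pace with collection. Below is a minimal, hedged PyTorch sketch of that data flow; the batched env_step is a toy stand-in rather than Isaac Gym or the authors' code, and PQL itself additionally runs data collection, actor updates, and critic updates in separate parallel workers.

    import torch

    num_envs, obs_dim, act_dim = 16384, 8, 2
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Q-network over concatenated (observation, action) pairs.
    q_net = torch.nn.Sequential(
        torch.nn.Linear(obs_dim + act_dim, 256), torch.nn.ReLU(),
        torch.nn.Linear(256, 1)).to(device)
    optim = torch.optim.Adam(q_net.parameters(), lr=3e-4)

    def env_step(obs, act):
        # Toy batched transition: one tensor op steps ALL environments at once.
        next_obs = obs + 0.01 * torch.randn_like(obs)
        reward = -(obs ** 2).sum(dim=1, keepdim=True)
        return next_obs, reward

    obs = torch.randn(num_envs, obs_dim, device=device)
    for _ in range(100):
        act = torch.tanh(torch.randn(num_envs, act_dim, device=device))  # exploration
        next_obs, reward = env_step(obs, act)
        with torch.no_grad():  # one-step bootstrapped TD target
            next_act = torch.tanh(torch.randn(num_envs, act_dim, device=device))
            target = reward + 0.99 * q_net(torch.cat([next_obs, next_act], dim=1))
        q = q_net(torch.cat([obs, act], dim=1))
        loss = torch.nn.functional.mse_loss(q, target)
        optim.zero_grad(); loss.backward(); optim.step()
        obs = next_obs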

    Towards a decision-support framework for reducing ramp-up effort in plug-and-produce systems

    Nowadays, shorter and more flexible production cycles are vital to meet increasing demand for customized products. Because delays and downtime on the way to market mean substantial financial losses, manufacturers have an interest in bringing the production system to full utilization as quickly as possible. The concept of plug-and-produce manufacturing systems facilitates an easy integration process through embedded intelligence in the devices. However, a human still needs to validate the functionality of the system and, more importantly, must ensure that the required quality and performance are delivered. This is done during the ramp-up phase, in which the system is assembled and tested for the first time. System adaptations and a lack of standard procedures make the ramp-up process still largely dependent on the operator’s experience level. A major problem that currently occurs during ramp-up is a loss of knowledge and information due to a lack of means to capture the human’s experience. The captured information can be used to facilitate future ramp-up cases, as additional insights about change actions and their effect on the system could be revealed. Hence, this paper proposes a decision-support framework for plug-and-produce assembly systems that will help to reduce the ramp-up effort and ultimately shorten ramp-up time. As an illustrative example, a gluing station developed for the European project openMOS is considered.
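
    As a hedged illustration of the knowledge-capture idea (not the paper's actual data model), the sketch below logs an operator's change actions together with their observed effects so that later ramp-up cases can query what was tried before; all names here are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class ChangeAction:
        device: str           # e.g. the gluing station's dispenser
        parameter: str        # what the operator adjusted
        old_value: float
        new_value: float
        observed_effect: str  # quality/performance outcome noted by the operator

    @dataclass
    class RampUpCase:
        system_id: str
        actions: list = field(default_factory=list)

        def record(self, action: ChangeAction):
            self.actions.append(action)

        def suggest(self, device: str, parameter: str):
            # Decision support: surface past adjustments of the same parameter.
            return [a for a in self.actions
                    if a.device == device and a.parameter == parameter]

    case = RampUpCase("gluing-station-01")
    case.record(ChangeAction("dispenser", "flow_rate", 1.0, 1.2,
                             "bead width within tolerance"))
    print(case.suggest("dispenser", "flow_rate"))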

    Advances in Condition Monitoring, Optimization and Control for Complex Industrial Processes

    The book documents 25 papers collected from the Special Issue “Advances in Condition Monitoring, Optimization and Control for Complex Industrial Processes”, highlighting recent research trends in complex industrial processes. The book aims to stimulate the research field and to benefit readers from both academic institutions and industrial sectors.

    OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control

    In this work, we introduce OmniDrones, an efficient and flexible platform tailored for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim. It employs a bottom-up design approach that allows users to easily design and experiment with various application scenarios on top of GPU-parallelized simulations. It also offers a range of benchmark tasks, presenting challenges that range from single-drone hovering to over-actuated system tracking. In summary, we propose an open-source drone simulation platform equipped with an extensive suite of tools for drone learning. It includes 4 drone models, 5 sensor modalities, 4 control modes, over 10 benchmark tasks, and a selection of widely used RL baselines. To showcase the capabilities of OmniDrones and to support future research, we also provide preliminary results on these benchmark tasks. We hope this platform will encourage further studies on applying RL to practical drone systems.
    Comment: Submitted to IEEE RA-L
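
    To make the GPU-parallel, bottom-up design concrete, here is a minimal rollout sketch in PyTorch. The environment below is a toy stand-in and its step interface is an assumption, not the OmniDrones API; it only shows how a single forward pass can control thousands of simulated drones at once.

    import torch

    class ToyHoverEnv:
        # Toy batched hover task; NOT the OmniDrones interface.
        def __init__(self, num_envs, device):
            self.pos = torch.zeros(num_envs, 3, device=device)
            self.target = torch.tensor([0.0, 0.0, 1.0], device=device)

        def step(self, thrust):
            self.pos += 0.02 * thrust  # crude position integrator
            reward = -torch.norm(self.pos - self.target, dim=1)
            return self.pos.clone(), reward

    device = "cuda" if torch.cuda.is_available() else "cpu"
    env = ToyHoverEnv(num_envs=4096, device=device)
    policy = torch.nn.Linear(3, 3).to(device)

    obs = env.pos.clone()
    with torch.no_grad():  # rollout only; training is not shown
        for _ in range(100):
            act = torch.tanh(policy(obs))  # one forward pass for all 4096 drones
            obs, reward = env.step(act)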

    Dispatching AGVs with Battery Constraints using Deep Reinforcement Learning

    This paper considers the problem of real-time dispatching of a fleet of automated guided vehicles (AGVs) with battery constraints. AGVs must be immediately assigned to transport requests, which arrive randomly. In addition, the AGVs must be repositioned and recharged while awaiting future transport requests. Each transport request has a soft time window, with late delivery incurring a tardiness cost. This research aims to minimize the total costs, consisting of the tardiness costs of transport requests and the travel costs of AGVs. We extend the existing literature by distinguishing between parking nodes and charging nodes, where AGVs wait idle for incoming transport requests and satisfy their charging needs, respectively. We formulate this online decision-making problem as a Markov decision process and propose a solution approach based on deep reinforcement learning. To assess the quality of the proposed approach, we compare it with the optimal solution of a mixed-integer linear programming model that assumes full knowledge of transport requests in hindsight and hence serves as a lower bound on the costs. We also compare our solution with a heuristic policy used in practice. We evaluate the performance of the proposed solutions in an industry case study using real-world data.
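
    The cost structure described above is easy to state concretely. The following sketch is illustrative only: the linear tardiness term, the weights, and the battery-reserve feasibility check are assumptions, and it implements a greedy immediate-cost rule in the spirit of a heuristic baseline rather than the paper's learned policy.

    def dispatch_cost(arrival_time, due_time, travel_distance,
                      tardiness_weight=1.0, travel_weight=0.1):
        # Soft time window: tardiness cost accrues only after the due time.
        tardiness = max(0.0, arrival_time - due_time)
        return tardiness_weight * tardiness + travel_weight * travel_distance

    def assign(request, agvs):
        # Greedy rule: among AGVs keeping enough battery in reserve,
        # pick the one with the lowest immediate dispatch cost.
        feasible = [a for a in agvs if a["battery"] >= a["reserve"]]
        return min(feasible,
                   key=lambda a: dispatch_cost(a["eta"], request["due"], a["dist"]))

    agvs = [{"battery": 0.8, "reserve": 0.2, "eta": 10.0, "dist": 5.0},
            {"battery": 0.5, "reserve": 0.2, "eta": 7.0, "dist": 9.0}]
    print(assign({"due": 8.0}, agvs))  # -> the second AGV (on time despite longer travel)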