
    Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

    Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection by thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale, resulting in longer wall-clock training times. This paper presents a Parallel Q-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining the superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Unlike prior works on distributed off-policy learning, such as Ape-X, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single workstation. In experiments, we demonstrate that Q-learning can be scaled to tens of thousands of parallel environments and investigate important factors affecting learning speed. The code is available at https://github.com/Improbable-AI/pql.
    Comment: Accepted by ICML 2023
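
    The abstract's key idea is that a GPU-vectorized simulator lets one tensor operation advance tens of thousands of environments, so off-policy updates can keep pace with collection. Below is a minimal, hedged PyTorch sketch of that data flow; the batched env_step is a toy stand-in rather than Isaac Gym or the authors' code, and PQL itself additionally runs data collection, actor updates, and critic updates in separate parallel workers.

    import torch

    num_envs, obs_dim, act_dim = 16384, 8, 2
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Q-network over concatenated (observation, action) pairs.
    q_net = torch.nn.Sequential(
        torch.nn.Linear(obs_dim + act_dim, 256), torch.nn.ReLU(),
        torch.nn.Linear(256, 1)).to(device)
    optim = torch.optim.Adam(q_net.parameters(), lr=3e-4)

    def env_step(obs, act):
        # Toy batched transition: one tensor op steps ALL environments at once.
        next_obs = obs + 0.01 * torch.randn_like(obs)
        reward = -(obs ** 2).sum(dim=1, keepdim=True)
        return next_obs, reward

    obs = torch.randn(num_envs, obs_dim, device=device)
    for _ in range(100):
        act = torch.tanh(torch.randn(num_envs, act_dim, device=device))  # exploration
        next_obs, reward = env_step(obs, act)
        with torch.no_grad():  # one-step bootstrapped TD target
            next_act = torch.tanh(torch.randn(num_envs, act_dim, device=device))
            target = reward + 0.99 * q_net(torch.cat([next_obs, next_act], dim=1))
        q = q_net(torch.cat([obs, act], dim=1))
        loss = torch.nn.functional.mse_loss(q, target)
        optim.zero_grad(); loss.backward(); optim.step()
        obs = next_obs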

    Towards a decision-support framework for reducing ramp-up effort in plug-and-produce systems

    Nowadays, shorter and more flexible production cycles are vital to meet increasing demand for customized products. Because delays and downtime on the way to market mean substantial financial losses, manufacturers have an interest in bringing the production system to full utilization as quickly as possible. The concept of plug-and-produce manufacturing systems facilitates an easy integration process through embedded intelligence in the devices. However, a human still needs to validate the functionality of the system and, more importantly, must ensure that the required quality and performance are delivered. This is done during the ramp-up phase, in which the system is assembled and tested for the first time. System adaptations and a lack of standard procedures make the ramp-up process still largely dependent on the operator’s experience level. A major problem that currently occurs during ramp-up is a loss of knowledge and information due to a lack of means to capture the human’s experience. The captured information can be used to facilitate future ramp-up cases, as additional insights about change actions and their effect on the system could be revealed. Hence, this paper proposes a decision-support framework for plug-and-produce assembly systems that will help to reduce the ramp-up effort and ultimately shorten ramp-up time. As an illustrative example, a gluing station developed for the European project openMOS is considered.
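
    As a hedged illustration of the knowledge-capture idea (not the paper's actual data model), the sketch below logs an operator's change actions together with their observed effects so that later ramp-up cases can query what was tried before; all names here are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class ChangeAction:
        device: str           # e.g. the gluing station's dispenser
        parameter: str        # what the operator adjusted
        old_value: float
        new_value: float
        observed_effect: str  # quality/performance outcome noted by the operator

    @dataclass
    class RampUpCase:
        system_id: str
        actions: list = field(default_factory=list)

        def record(self, action: ChangeAction):
            self.actions.append(action)

        def suggest(self, device: str, parameter: str):
            # Decision support: surface past adjustments of the same parameter.
            return [a for a in self.actions
                    if a.device == device and a.parameter == parameter]

    case = RampUpCase("gluing-station-01")
    case.record(ChangeAction("dispenser", "flow_rate", 1.0, 1.2,
                             "bead width within tolerance"))
    print(case.suggest("dispenser", "flow_rate"))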

    Advances in Condition Monitoring, Optimization and Control for Complex Industrial Processes

    The book documents 25 papers collected from the Special Issue “Advances in Condition Monitoring, Optimization and Control for Complex Industrial Processes”, highlighting recent research trends in complex industrial processes. The book aims to stimulate the research field and to benefit readers from both academic institutions and industrial sectors.

    OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control

    In this work, we introduce OmniDrones, an efficient and flexible platform tailored for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim. It employs a bottom-up design approach that allows users to easily design and experiment with various application scenarios on top of GPU-parallelized simulations. It also offers a range of benchmark tasks, presenting challenges that range from single-drone hovering to over-actuated system tracking. In summary, we propose an open-source drone simulation platform equipped with an extensive suite of tools for drone learning. It includes 4 drone models, 5 sensor modalities, 4 control modes, over 10 benchmark tasks, and a selection of widely used RL baselines. To showcase the capabilities of OmniDrones and to support future research, we also provide preliminary results on these benchmark tasks. We hope this platform will encourage further studies on applying RL to practical drone systems.
    Comment: Submitted to IEEE RA-L
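
    To make the GPU-parallel, bottom-up design concrete, here is a minimal rollout sketch in PyTorch. The environment below is a toy stand-in and its step interface is an assumption, not the OmniDrones API; it only shows how a single forward pass can control thousands of simulated drones at once.

    import torch

    class ToyHoverEnv:
        # Toy batched hover task; NOT the OmniDrones interface.
        def __init__(self, num_envs, device):
            self.pos = torch.zeros(num_envs, 3, device=device)
            self.target = torch.tensor([0.0, 0.0, 1.0], device=device)

        def step(self, thrust):
            self.pos += 0.02 * thrust  # crude position integrator
            reward = -torch.norm(self.pos - self.target, dim=1)
            return self.pos.clone(), reward

    device = "cuda" if torch.cuda.is_available() else "cpu"
    env = ToyHoverEnv(num_envs=4096, device=device)
    policy = torch.nn.Linear(3, 3).to(device)

    obs = env.pos.clone()
    with torch.no_grad():  # rollout only; training is not shown
        for _ in range(100):
            act = torch.tanh(policy(obs))  # one forward pass for all 4096 drones
            obs, reward = env.step(act)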

    Dispatching AGVs with Battery Constraints using Deep Reinforcement Learning

    This paper considers the problem of real-time dispatching of a fleet of automated guided vehicles (AGVs) with battery constraints. AGVs must be immediately assigned to transport requests, which arrive randomly. In addition, the AGVs must be repositioned and recharged while awaiting future transport requests. Each transport request has a soft time window, with late delivery incurring a tardiness cost. This research aims to minimize the total costs, consisting of the tardiness costs of transport requests and the travel costs of AGVs. We extend the existing literature by distinguishing between parking nodes and charging nodes, where AGVs wait idle for incoming transport requests and satisfy their charging needs, respectively. We formulate this online decision-making problem as a Markov decision process and propose a solution approach based on deep reinforcement learning. To assess the quality of the proposed approach, we compare it with the optimal solution of a mixed-integer linear programming model that assumes full knowledge of transport requests in hindsight and hence serves as a lower bound on the costs. We also compare our solution with a heuristic policy used in practice. We evaluate the performance of the proposed solutions in an industry case study using real-world data.
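
    The cost structure described above is easy to state concretely. The following sketch is illustrative only: the linear tardiness term, the weights, and the battery-reserve feasibility check are assumptions, and it implements a greedy immediate-cost rule in the spirit of a heuristic baseline rather than the paper's learned policy.

    def dispatch_cost(arrival_time, due_time, travel_distance,
                      tardiness_weight=1.0, travel_weight=0.1):
        # Soft time window: tardiness cost accrues only after the due time.
        tardiness = max(0.0, arrival_time - due_time)
        return tardiness_weight * tardiness + travel_weight * travel_distance

    def assign(request, agvs):
        # Greedy rule: among AGVs keeping enough battery in reserve,
        # pick the one with the lowest immediate dispatch cost.
        feasible = [a for a in agvs if a["battery"] >= a["reserve"]]
        return min(feasible,
                   key=lambda a: dispatch_cost(a["eta"], request["due"], a["dist"]))

    agvs = [{"battery": 0.8, "reserve": 0.2, "eta": 10.0, "dist": 5.0},
            {"battery": 0.5, "reserve": 0.2, "eta": 7.0, "dist": 9.0}]
    print(assign({"due": 8.0}, agvs))  # -> the second AGV (on time despite longer travel)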