LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation
To assist with everyday human activities, robots must solve complex
long-horizon tasks and generalize to new settings. Recent deep reinforcement
learning (RL) methods show promise in fully autonomous learning, but they
struggle to reach long-term goals in large environments. On the other hand,
Task and Motion Planning (TAMP) approaches excel at solving and generalizing
across long-horizon tasks, thanks to their powerful state and action
abstractions. But they assume predefined skill sets, which limits their
real-world applications. In this work, we combine the benefits of these two
paradigms and propose an integrated task planning and skill learning framework
named LEAGUE (Learning and Abstraction with Guidance). LEAGUE leverages the
symbolic interface of a task planner to guide RL-based skill learning and
creates abstract state space to enable skill reuse. More importantly, LEAGUE
learns manipulation skills in-situ of the task planning system, continuously
growing its capability and the set of tasks that it can solve. We evaluate
LEAGUE on four challenging simulated task domains and show that LEAGUE
outperforms baselines by large margins. We also show that the learned skills
can be reused to accelerate learning in new task domains and transfer to a
physical robot platform. Comment: Accepted to RA-L 202
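For illustration, a minimal sketch of the planner-guided skill-learning loop described above, assuming hypothetical names (Operator, ToyEnv, plan, train_skill) rather than the authors' actual API: the task planner proposes operators over an abstract symbolic state, and any operator whose skill is missing is trained in place before execution.

```python
import random
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    skill: object = None  # RL skill policy; None until learned in-situ

class ToyEnv:
    """Tiny stand-in environment exposing a symbolic (abstract) state."""
    def __init__(self):
        self.achieved = set()
    def symbolic_state(self):
        return frozenset(self.achieved)
    def execute(self, op):
        self.achieved.add(op.name)  # pretend the skill achieved its effect

def plan(state, goal, operators):
    """Stand-in task planner: return the operators still needed."""
    return [op for op in operators if op.name in goal - state]

def train_skill(op):
    """Stand-in for RL training scoped to one operator's subgoal."""
    op.skill = lambda obs: random.random()  # placeholder learned policy

def league_episode(env, goal, operators):
    for op in plan(env.symbolic_state(), goal, operators):
        if op.skill is None:        # grow the skill set as planning demands
            train_skill(op)
        env.execute(op)             # run the skill until its subgoal holds

ops = [Operator("pick"), Operator("place")]
league_episode(ToyEnv(), {"pick", "place"}, ops)
```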
Vehicle Dispatching and Routing of On-Demand Intercity Ride-Pooling Services: A Multi-Agent Hierarchical Reinforcement Learning Approach
The integrated development of city clusters has given rise to an increasing
demand for intercity travel. Intercity ride-pooling service exhibits
considerable potential in upgrading traditional intercity bus services by
implementing demand-responsive enhancements. Nevertheless, its online
operations suffer from inherent complexities due to the coupling of vehicle
resource allocation among cities and pooled-ride vehicle routing. To tackle
these challenges, this study proposes a two-level framework designed to
facilitate online fleet management. Specifically, a novel multi-agent feudal
reinforcement learning model is proposed at the upper level of the framework to
cooperatively assign idle vehicles to different intercity lines, while the
lower level updates the routes of vehicles using an adaptive large neighborhood
search heuristic. Numerical studies based on the realistic dataset of Xiamen
and its surrounding cities in China show that the proposed framework
effectively mitigates supply and demand imbalances and achieves significant
improvements in both the average daily system profit and the order
fulfillment ratio.
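A minimal sketch of the two-level structure described above, with all names hypothetical: the upper level (a stand-in for the feudal multi-agent RL policy, random here) assigns idle vehicles to intercity lines, while the lower level improves a vehicle's route with a simple destroy-and-repair loop in the spirit of adaptive large neighborhood search (real ALNS would adaptively weight several destroy and repair operators).

```python
import random

def assign_idle_vehicles(idle_vehicles, lines):
    """Upper level (stand-in for the feudal multi-agent RL policy):
    map each idle vehicle to an intercity line. Random here."""
    return {v: random.choice(lines) for v in idle_vehicles}

def route_cost(route):
    """Toy cost: total distance along a 1-D sequence of stops."""
    return sum(abs(a - b) for a, b in zip(route, route[1:]))

def alns_update(route, iters=200):
    """Lower level: minimal destroy-and-repair (segment reversal) loop."""
    best = route[:]
    for _ in range(iters):
        cand = best[:]
        i, j = sorted(random.sample(range(len(cand)), 2))
        cand[i:j + 1] = reversed(cand[i:j + 1])   # destroy + repair move
        if route_cost(cand) < route_cost(best):   # accept improving moves
            best = cand
    return best

assignment = assign_idle_vehicles(["v1", "v2"], ["A-B", "B-C"])
print(assignment, alns_update([0, 7, 2, 9, 4, 1]))
```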
Latent Plans for Task-Agnostic Offline Reinforcement Learning
Everyday long-horizon tasks comprising a sequence of multiple implicit
subtasks still pose a major challenge in offline robot control. While a
number of prior methods aimed to address this setting with variants of
imitation and offline reinforcement learning, the learned behavior is typically
narrow and often struggles to reach configurable long-horizon goals. As both
paradigms have complementary strengths and weaknesses, we propose a novel
hierarchical approach that combines the strengths of both methods to learn
task-agnostic long-horizon policies from high-dimensional camera observations.
Concretely, we combine a low-level policy that learns latent skills via
imitation learning and a high-level policy learned from offline reinforcement
learning for skill-chaining the latent behavior priors. Experiments in various
simulated and real robot control tasks show that our formulation enables
producing previously unseen combinations of skills to reach temporally extended
goals by "stitching" together latent skills through goal chaining with an
order-of-magnitude improvement in performance over state-of-the-art baselines.
We even learn one multi-task visuomotor policy for 25 distinct manipulation
tasks in the real world, which outperforms both imitation learning and offline
reinforcement learning techniques. Comment: CoRL 2022. Project website: http://tacorl.cs.uni-freiburg.de
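A minimal sketch of the hierarchy described above, using toy linear stand-ins for the learned networks (all class and function names are hypothetical): a high-level selector (the offline-RL part) picks a latent skill z from observation and goal, and a low-level decoder (the imitation-learned part) executes it for a few steps before the next skill is chained.

```python
import numpy as np

rng = np.random.default_rng(0)

class LatentSkillDecoder:
    """Low-level policy: (observation, latent skill z) -> action.
    Stands in for the imitation-learned decoder; random weights here."""
    def __init__(self, obs_dim, z_dim, act_dim):
        self.W = rng.normal(size=(act_dim, obs_dim + z_dim)) * 0.1
    def act(self, obs, z):
        return self.W @ np.concatenate([obs, z])

class SkillSelector:
    """High-level policy: (observation, goal) -> latent skill z.
    Stands in for the offline-RL-trained selector."""
    def __init__(self, obs_dim, goal_dim, z_dim):
        self.W = rng.normal(size=(z_dim, obs_dim + goal_dim)) * 0.1
    def select(self, obs, goal):
        return np.tanh(self.W @ np.concatenate([obs, goal]))

def rollout(step, obs, goal, high, low, n_skills=4, skill_len=3):
    """'Stitch' latent skills: pick z, execute for a few steps, repeat."""
    for _ in range(n_skills):
        z = high.select(obs, goal)
        for _ in range(skill_len):
            obs = step(obs, low.act(obs, z))
    return obs

low = LatentSkillDecoder(obs_dim=4, z_dim=2, act_dim=4)
high = SkillSelector(obs_dim=4, goal_dim=4, z_dim=2)
step = lambda obs, act: obs + 0.01 * act          # toy dynamics
rollout(step, np.zeros(4), np.ones(4), high, low)
```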
EASpace: Enhanced Action Space for Policy Transfer
Formulating expert policies as macro actions promises to alleviate the
long-horizon issue via structured exploration and efficient credit assignment.
However, traditional option-based multi-policy transfer methods suffer from
inefficient exploration of macro-action lengths and insufficient exploitation
of useful long-duration macro actions. In this paper, a novel algorithm named
EASpace (Enhanced Action Space) is proposed, which formulates macro actions in
an alternative form to accelerate the learning process using multiple available
sub-optimal expert policies. Specifically, EASpace formulates each expert
policy into multiple macro actions with different execution times. All the
macro actions are then integrated into the primitive action space directly. An
intrinsic reward, which is proportional to the execution time of macro actions,
is introduced to encourage the exploitation of useful macro actions. The
corresponding learning rule, which is similar to intra-option Q-learning, is
employed to improve data efficiency. A theoretical analysis is presented to
show the convergence of the proposed learning rule. The efficiency of EASpace
is illustrated by a grid-based game and a multi-agent pursuit problem. The
proposed algorithm is also implemented in physical systems to validate its
effectiveness. Comment: 15 Page
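A minimal sketch of the enhanced action space itself, with toy expert policies and durations as hypothetical placeholders: each expert policy becomes several macro actions, one per candidate execution time, placed alongside the primitives in a single flat action space, and macros earn an intrinsic bonus proportional to their duration.

```python
import itertools

PRIMITIVES = [0, 1, 2, 3]                 # primitive action ids
EXPERTS = {"chase": lambda s: 0,          # toy sub-optimal expert
           "patrol": lambda s: 1}         # policies: state -> action
DURATIONS = [2, 4, 8]                     # candidate execution times

# Each expert yields one macro action per duration, and macros live in
# the same flat space as primitives -- the "enhanced action space".
MACROS = list(itertools.product(EXPERTS, DURATIONS))
ACTION_SPACE = PRIMITIVES + MACROS

def intrinsic_reward(action, c=0.01):
    """Bonus proportional to a macro's execution time (0 for primitives),
    encouraging exploitation of useful long-duration macros."""
    return c * action[1] if isinstance(action, tuple) else 0.0

print(len(ACTION_SPACE), intrinsic_reward(("chase", 8)))
```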
Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics
Skill-based reinforcement learning (RL) has emerged as a promising strategy
to leverage prior knowledge for accelerated robot learning. Skills are
typically extracted from expert demonstrations and are embedded into a latent
space from which they can be sampled as actions by a high-level RL agent.
However, this skill space is expansive, and not all skills are relevant for a
given robot state, making exploration difficult. Furthermore, the downstream RL
agent is limited to learning structurally similar tasks to those used to
construct the skill space. We first propose accelerating exploration in the
skill space using state-conditioned generative models to directly bias the
high-level agent towards only sampling skills relevant to a given state based
on prior experience. Next, we propose a low-level residual policy for
fine-grained skill adaptation enabling downstream RL agents to adapt to unseen
task variations. Finally, we validate our approach across four challenging
manipulation tasks that differ from those used to build the skill space,
demonstrating our ability to learn across task variations while significantly
accelerating exploration, outperforming prior works. Code and videos are
available on our project website: https://krishanrana.github.io/reskill. Comment: 6th Conference on Robot Learning (CoRL), 202
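A minimal sketch of the two proposed components, with toy linear stand-ins for the learned models (all names hypothetical): a state-conditioned prior biases which latent skills are sampled, and a residual policy adds a fine-grained correction to the decoded skill action.

```python
import numpy as np

rng = np.random.default_rng(0)

class SkillPrior:
    """State-conditioned model biasing which latent skills are sampled,
    standing in for the learned generative prior."""
    def __init__(self, state_dim, z_dim):
        self.W = rng.normal(size=(z_dim, state_dim)) * 0.1
    def sample(self, state):
        mean = self.W @ state              # skills relevant to this state
        return mean + 0.1 * rng.normal(size=mean.shape)

class ResidualPolicy:
    """Low-level correction added to the decoded skill action for
    fine-grained adaptation to unseen task variations."""
    def __init__(self, state_dim, z_dim, act_dim):
        self.W = rng.normal(size=(act_dim, state_dim + z_dim)) * 0.01
    def delta(self, state, z):
        return self.W @ np.concatenate([state, z])

def act(state, decode, prior, residual):
    z = prior.sample(state)                         # biased skill sampling
    return decode(state, z) + residual.delta(state, z)

decode = lambda s, z: np.tanh(z).repeat(2)          # toy skill decoder
action = act(np.ones(4), decode, SkillPrior(4, 3), ResidualPolicy(4, 3, 6))
```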
Prioritizing starting states for reinforcement learning
Online, off-policy reinforcement learning algorithms are able to use an experience memory to remember and replay past experiences. In prior work, this approach was used to stabilize training by breaking the temporal correlations of the updates and avoiding the rapid forgetting of possibly rare experiences. In this work, we propose a conceptually simple framework that uses an experience memory to help exploration by prioritizing the starting states from which the agent begins acting in the environment, importantly, in a fashion that is also compatible with on-policy algorithms. Given the capacity to restart the agent in states corresponding to its past observations, we achieve this objective by (i) enabling the agent to restart in states belonging to significant past experiences (e.g., nearby goals), and (ii) promoting faster coverage of the state space through starting from a more diverse set of states. While we expect case (i), given a good priority measure for identifying significant past transitions, to help exploration more considerably in certain domains (e.g., sparse-reward tasks), we hypothesize that case (ii) will generally be beneficial, even without any prioritization. We show empirically that our approach improves learning performance for both off-policy and on-policy deep reinforcement learning methods, with the most notable gains in highly sparse-reward tasks.
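A minimal sketch of the proposed restart mechanism, with hypothetical names: past states are stored with priorities, and episode starting states are sampled in proportion to those priorities (uniform priorities recover the diversity-only case (ii)).

```python
import random

class StartStateMemory:
    """Memory of past states with priorities, used to pick restart states."""
    def __init__(self, capacity=10_000):
        self.states, self.priorities = [], []
        self.capacity = capacity
    def add(self, state, priority=1.0):
        if len(self.states) >= self.capacity:   # drop the oldest entry
            self.states.pop(0)
            self.priorities.pop(0)
        self.states.append(state)
        self.priorities.append(priority)
    def sample_start(self):
        # Case (i): weight by a priority measure (e.g., proximity to a
        # goal); uniform priorities recover case (ii)'s diversity variant.
        return random.choices(self.states, weights=self.priorities, k=1)[0]

mem = StartStateMemory()
for s, p in [((0, 0), 1.0), ((3, 4), 5.0)]:
    mem.add(s, p)
start = mem.sample_start()   # reset the environment here in a real loop
```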
System Optimisation for Multi-access Edge Computing Based on Deep Reinforcement Learning
Multi-access edge computing (MEC) is an emerging and important distributed computing paradigm that aims to extend cloud services to the network edge to reduce network traffic and service latency. Proper system optimisation and maintenance are crucial to maintaining a high quality of service (QoS) for end-users. However, with the increasing complexity of MEC architectures and mobile applications, effectively optimising MEC systems is non-trivial. Traditional optimisation methods are generally based on simplified mathematical models and fixed heuristics, which rely heavily on expert knowledge. As a consequence, when facing dynamic MEC scenarios, considerable human effort and expertise are required to redesign the model and tune the heuristics, which is time-consuming.
This thesis aims to develop deep reinforcement learning (DRL) methods to handle system optimisation problems in MEC. Instead of developing fixed heuristic algorithms for these problems, this thesis aims to design DRL-based methods that enable systems to learn optimal solutions on their own. This research demonstrates the effectiveness of DRL-based methods on two crucial system optimisation problems: task offloading and service migration. Specifically, this thesis first investigates the dependent task offloading problem, which accounts for the inner dependencies of tasks. This research builds a DRL-based method combining a sequence-to-sequence (seq2seq) neural network to address the problem. Experimental results demonstrate that our method outperforms the existing heuristic algorithms and achieves near-optimal performance. To further enhance the learning efficiency of the DRL-based task offloading method for unseen learning tasks, this thesis then integrates meta reinforcement learning to handle the task offloading problem. Our method can adapt quickly to new environments with a small number of gradient updates and samples. Finally, this thesis develops a DRL-based solution for the service migration problem in MEC that considers user mobility. This research models the service migration problem as a Partially Observable Markov Decision Process (POMDP) and proposes a tailored actor-critic algorithm combining Long Short-Term Memory (LSTM) to solve the POMDP. Results from extensive experiments based on real-world mobility traces demonstrate that our method consistently outperforms both heuristic and state-of-the-art learning-driven algorithms in various MEC scenarios.
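As a minimal illustration of the dependent task offloading setting the thesis studies (all names hypothetical; the thesis uses a seq2seq network, not the random stand-in below): tasks form a dependency DAG, and a policy emits one local-versus-edge decision per task in topological order.

```python
import random

def topological_order(deps):
    """deps: task -> set of prerequisite tasks (assumed acyclic)."""
    order, done = [], set()
    while len(order) < len(deps):
        for t, pre in deps.items():
            if t not in done and pre <= done:
                order.append(t)
                done.add(t)
    return order

def offload_decision(task, state):
    """Stand-in for the learned policy: 0 = run locally, 1 = offload."""
    return random.randint(0, 1)

deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
decisions = {t: offload_decision(t, None) for t in topological_order(deps)}
print(decisions)
```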
Adaptive Railway Traffic Control using Approximate Dynamic Programming
Railway networks around the world have become challenging to operate in recent decades, with a mixture of track layouts running several different classes of trains at varying operational speeds. This complexity has come about as a result of the sustained increase in passenger numbers; in many countries, railways are now more popular than ever before as a means of commuting to cities. To address operational challenges, governments and railway undertakings are encouraging the development of intelligent and digital transport systems that regulate and optimise train operations in real-time, increasing capacity and customer satisfaction through improved usage of existing railway infrastructure. Accordingly, this thesis presents an adaptive railway traffic control system for real-time operations based on a data-based approximate dynamic programming (ADP) approach with integrated reinforcement learning (RL). By assessing requirements and opportunities, the controller aims to reduce delays resulting from trains that entered a control area behind schedule by re-scheduling control plans in real-time at critical locations in a timely manner. The present data-based approach depends on an approximation to the value function of dynamic programming after optimisation from a specified state, which is estimated dynamically from operational experience using RL techniques. By using this approximation, ADP avoids extensive explicit evaluation of performance and so reduces the computational burden substantially. In this thesis, formulations of the approximation function and variants of the RL techniques used to estimate it are explored. Evaluation of this controller shows considerable reductions in delays in comparison with current industry practices.
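A minimal sketch of the data-based ADP idea described above, with a toy linear value-function approximation (all names hypothetical): the approximation is updated from operational experience by TD(0), and rescheduling plans are ranked by one-step lookahead against it rather than by explicit full evaluation.

```python
import numpy as np

def features(state):
    """Feature map phi(s); identity over a numeric state vector here."""
    return np.asarray(state, dtype=float)

def td_update(w, s, r, s_next, alpha=0.05, gamma=0.95):
    """One TD(0) step: move V(s) = w . phi(s) toward r + gamma * V(s')."""
    target = r + gamma * w @ features(s_next)
    return w + alpha * (target - w @ features(s)) * features(s)

def choose_plan(state, plans, w, transition, gamma=0.95):
    """Pick the rescheduling plan with the best one-step lookahead value,
    avoiding explicit evaluation of the full dynamic program."""
    best, best_val = None, -np.inf
    for p in plans:
        r, s_next = transition(state, p)   # simulated reward, next state
        val = r + gamma * w @ features(s_next)
        if val > best_val:
            best, best_val = p, val
    return best

w = np.zeros(2)
# Toy model: reward is the negative delay induced by a plan.
transition = lambda s, p: (-abs(p), [s[0] + p, s[1]])
plan = choose_plan([0.0, 1.0], [-1, 0, 1], w, transition)
w = td_update(w, [0.0, 1.0], -1.0, [1.0, 1.0])
```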