A Framework for Reinforcement Learning and Planning
Sequential decision making, commonly formalized as Markov Decision Process
(MDP) optimization, is a key challenge in artificial intelligence. Two successful
approaches to MDP optimization are planning and reinforcement learning. Both
research fields largely have their own research communities. However, if both
research fields solve the same problem, then we should be able to disentangle
the common factors in their solution approaches. Therefore, this paper presents
a unifying framework for reinforcement learning and planning (FRAP), which
identifies the underlying dimensions on which any planning or learning
algorithm has to decide. At the end of the paper, we compare - in a single
table - a variety of well-known planning, model-free and model-based RL
algorithms along the dimensions of our framework, illustrating the validity of
the framework. Altogether, FRAP provides deeper insight into the algorithmic
space of planning and reinforcement learning, and also suggests new approaches
to the integration of both fields.
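Both fields ultimately target the same optimization problem; as a point of reference (a standard textbook formulation, not notation drawn from the FRAP paper itself), the discounted MDP objective and its Bellman optimality condition can be written as:

```latex
% Standard discounted MDP objective shared by planning and RL
% (textbook notation; not taken from the FRAP paper itself)
\[
  \pi^{*} \;=\; \arg\max_{\pi}\;
  \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],
  \qquad
  V^{*}(s) \;=\; \max_{a}\Big[\, r(s,a) \;+\; \gamma \sum_{s'} p(s' \mid s,a)\, V^{*}(s') \,\Big].
\]
```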
PlaNet-ClothPick: Effective Fabric Flattening Based on Latent Dynamic Planning
Why do Recurrent State Space Models such as PlaNet fail at cloth manipulation
tasks? Recent work has attributed this to the blurry prediction of the
observation, which makes it difficult to plan directly in the latent space.
This paper explores the reasons behind this by applying PlaNet in the
pick-and-place fabric-flattening domain. We find that the sharp discontinuity
of the transition function on the contour of the fabric makes it difficult to
learn an accurate latent dynamic model, causing the MPC planner to produce pick
actions slightly outside of the article. By limiting the picking space to the
cloth mask and training on specially engineered trajectories, our mesh-free
PlaNet-ClothPick surpasses visual planning and policy learning methods on
principal metrics in simulation, achieving similar performance as
state-of-the-art mesh-based planning approaches. Notably, our model exhibits
faster action inference and requires fewer transition-model parameters than
the state-of-the-art robotic systems in this domain. Other supplementary
materials are available at: https://sites.google.com/view/planet-clothpick.
Comment: 12 pages, 2 tables, and 14 figures. Accepted to the 2024 16th
IEEE/SICE International Symposium on System Integration, Ha Long, Vietnam,
8-11 January 2024.
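For intuition, the masked-picking idea described above can be sketched as a sample-based planner that scores candidate pick-and-place actions with a learned latent model and snaps every candidate pick onto a cloth segmentation mask. This is a minimal illustrative sketch: the model functions (encode, latent_step, predict_reward) and the mask source are hypothetical placeholders, not the authors' PlaNet-ClothPick implementation.

```python
import numpy as np

def plan_pick_and_place(obs_image, cloth_mask, encode, latent_step, predict_reward,
                        n_candidates=1000, n_elite=100, n_iters=5):
    """CEM-style search over (pick, place) pixel actions, with picks restricted to the cloth."""
    valid_pixels = np.argwhere(cloth_mask > 0)            # pixel coordinates lying on the cloth
    h, w = cloth_mask.shape
    z = encode(obs_image)                                  # latent state of the current scene

    mean = np.zeros(4)                                     # (pick_y, pick_x, place_y, place_x) in [-1, 1]
    std = np.ones(4) * 0.5
    for _ in range(n_iters):
        actions = np.clip(np.random.randn(n_candidates, 4) * std + mean, -1.0, 1.0)
        # Snap each candidate pick location to the nearest pixel on the cloth mask.
        pick_px = ((actions[:, :2] + 1.0) / 2.0 * [h - 1, w - 1]).astype(int)
        for i, p in enumerate(pick_px):
            nearest = valid_pixels[np.abs(valid_pixels - p).sum(axis=1).argmin()]
            actions[i, :2] = nearest / [h - 1, w - 1] * 2.0 - 1.0
        # Score candidates by the reward predicted after one latent rollout step.
        rewards = np.array([predict_reward(latent_step(z, a)) for a in actions])
        elite = actions[np.argsort(rewards)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean                                            # refined (pick, place) action
```

Restricting the sampling distribution to the mask is what prevents the planner from proposing picks just outside the fabric contour, where the latent dynamics are hardest to learn.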
Learning Team-Based Navigation: A Review of Deep Reinforcement Learning Techniques for Multi-Agent Pathfinding
Multi-agent pathfinding (MAPF) is a critical problem in many large-scale
robotic applications, often forming the fundamental step in multi-agent systems.
The growing complexity of MAPF in crowded environments, however, critically
diminishes the effectiveness of existing solutions. In contrast to
other studies that have either presented a general overview of the recent
advancements in MAPF or extensively reviewed Deep Reinforcement Learning (DRL)
within multi-agent system settings independently, this review focuses on the
integration of DRL-based approaches into MAPF. Moreover, we aim to bridge the
current gap in evaluating MAPF solutions
by addressing the lack of unified evaluation metrics and providing
comprehensive clarification of these metrics. Finally, our paper discusses the
potential of model-based DRL as a promising future direction and provides the
foundational understanding required to address current challenges in MAPF. Our
objective is to assist readers in gaining insight into the current research
direction, providing unified metrics for comparing different MAPF algorithms
and expanding their knowledge of model-based DRL to address the existing
challenges in MAPF.
Comment: 36 pages, 10 figures. Published in Artif Intell Rev 57, 41 (2024).
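As a concrete reference for the evaluation discussion above, the sketch below implements a few metrics commonly reported for MAPF: success rate, makespan, and sum of costs. The definitions are the standard ones from the MAPF literature; the data layout and function names are illustrative, not taken from the paper's benchmark code.

```python
from typing import Dict, List, Tuple

Path = List[Tuple[int, int]]  # sequence of grid cells visited by one agent

def path_cost(path: Path) -> int:
    """Number of timesteps an agent needs to reach its goal."""
    return len(path) - 1

def makespan(paths: Dict[str, Path]) -> int:
    """Time at which the last agent reaches its goal."""
    return max(path_cost(p) for p in paths.values())

def sum_of_costs(paths: Dict[str, Path]) -> int:
    """Total timesteps summed over all agents (a standard MAPF objective)."""
    return sum(path_cost(p) for p in paths.values())

def success_rate(solved_instances: int, total_instances: int) -> float:
    """Fraction of benchmark instances solved within the time/step budget."""
    return solved_instances / total_instances

# Example usage with two toy agents on a grid:
paths = {
    "agent_0": [(0, 0), (0, 1), (0, 2)],
    "agent_1": [(2, 2), (1, 2)],
}
print(makespan(paths), sum_of_costs(paths), success_rate(8, 10))  # 2 3 0.8
```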
A Review of Symbolic, Subsymbolic and Hybrid Methods for Sequential Decision Making
The field of Sequential Decision Making (SDM) provides tools for solving
Sequential Decision Processes (SDPs), where an agent must make a series of
decisions in order to complete a task or achieve a goal. Historically, two
competing SDM paradigms have vied for supremacy. Automated Planning (AP)
proposes to solve SDPs by performing a reasoning process over a model of the
world, often represented symbolically. Conversely, Reinforcement Learning (RL)
proposes to learn the solution of the SDP from data, without a world model, and
to represent the learned knowledge subsymbolically. In the spirit of
reconciliation, we provide a review of symbolic, subsymbolic and hybrid methods
for SDM. We cover both methods for solving SDPs (e.g., AP, RL and techniques
that learn to plan) and for learning aspects of their structure (e.g., world
models, state invariants and landmarks). To the best of our knowledge, no other
review in the field provides the same scope. As an additional contribution, we
discuss what properties an ideal method for SDM should exhibit and argue that
neurosymbolic AI is the current approach which most closely resembles this
ideal method. Finally, we outline several proposals to advance the field of SDM
via the integration of symbolic and subsymbolic AI.
Robust online planning with imperfect models
Environment models are not always known a priori, and approximating stochastic transition dynamics may introduce errors, especially if only a small amount of data is available and/or model misspecification is
Data-driven robotic manipulation of cloth-like deformable objects: the present, challenges and future prospects
Manipulating cloth-like deformable objects (CDOs) is a long-standing problem in the robotics community. CDOs are flexible (non-rigid) objects that do not show a detectable level of compression strength when two points on the article are pushed towards each other, and include objects such as ropes (1D), fabrics (2D) and bags (3D). In general, CDOs’ many degrees of freedom (DoF) introduce severe self-occlusion and complex state–action dynamics as significant obstacles to perception and manipulation systems. These challenges exacerbate existing issues of modern robotic control methods such as imitation learning (IL) and reinforcement learning (RL). This review focuses on the application details of data-driven control methods across four major task families in this domain: cloth shaping, knot tying/untying, dressing and bag manipulation. Furthermore, we identify specific inductive biases in these four domains that present challenges for more general IL and RL algorithms.