Learning and Reasoning for Robot Sequential Decision Making under Uncertainty
Robots frequently face complex tasks that require more than one action, where
sequential decision-making (SDM) capabilities become necessary. The key
contribution of this work is a robot SDM framework, called LCORPP, that
supports the simultaneous capabilities of supervised learning for passive state
estimation, automated reasoning with declarative human knowledge, and planning
under uncertainty toward achieving long-term goals. In particular, we use a
hybrid reasoning paradigm to refine the state estimator, and provide
informative priors for the probabilistic planner. In experiments, a mobile
robot is tasked with estimating human intentions using their motion
trajectories, declarative contextual knowledge, and human-robot interaction
(dialog-based and motion-based). Results suggest that, in efficiency and
accuracy, our framework performs better than its no-learning and no-reasoning
counterparts in an office environment.
Comment: In proceedings of the 34th AAAI Conference on Artificial Intelligence,
202
Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal
Model-free reinforcement learning has recently been shown to be effective at
learning navigation policies from complex image input. However, these
algorithms tend to require large amounts of interaction with the environment,
which can be prohibitively costly to obtain on robots in the real world. We
present an approach for efficiently learning goal-directed navigation policies
on a mobile robot, from only a single coverage traversal of recorded data. The
navigation agent learns an effective policy over a diverse action space in a
large heterogeneous environment consisting of more than 2km of travel, through
buildings and outdoor regions that collectively exhibit large variations in
visual appearance, self-similarity, and connectivity. We compare pretrained
visual encoders that enable precomputation of visual embeddings to achieve a
throughput of tens of thousands of transitions per second at training time on a
commodity desktop computer, allowing agents to learn from millions of
trajectories of experience in a matter of hours. We propose multiple forms of
computationally efficient stochastic augmentation to enable the learned policy
to generalise beyond these precomputed embeddings, and demonstrate successful
deployment of the learned policy on the real robot without fine-tuning, despite
environmental appearance differences at test time. The dataset and code
required to reproduce these results and apply the technique to other datasets
and robots are made publicly available at rl-navigation.github.io/deployable
Autonomous Reinforcement of Behavioral Sequences in Neural Dynamics
We introduce a dynamic neural algorithm called Dynamic Neural (DN)
SARSA(\lambda) for learning a behavioral sequence from delayed reward.
DN-SARSA(\lambda) combines Dynamic Field Theory models of behavioral sequence
representation, classical reinforcement learning, and a computational
neuroscience model of working memory, called Item and Order working memory,
which serves as an eligibility trace. DN-SARSA(\lambda) is implemented on both
a simulated and a real robot that must learn a specific rewarding sequence of
elementary behaviors from exploration. Results show DN-SARSA(\lambda) performs
on par with discrete SARSA(\lambda), validating the feasibility of
general reinforcement learning without compromising neural dynamics.
Comment: Sohrob Kazerounian and Matthew Luciw are joint first authors
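For orientation, the discrete SARSA(\lambda) baseline named in this abstract can be sketched in tabular form. This is the textbook algorithm with accumulating eligibility traces, not the DN-SARSA(\lambda) neural-dynamics model. The chain environment, the function names, and every hyperparameter below are assumptions chosen for illustration; the only reward arrives at the end of the sequence, mimicking delayed reward.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(q[state])
    return rng.choice([a for a in range(n_actions) if q[state][a] == best])


def step(state, action, n_states):
    """Hypothetical chain task: action 1 moves right, action 0 moves left.

    The only reward (1.0) is delivered on reaching the last state, so credit
    for earlier actions has to be assigned through the eligibility trace.
    """
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def sarsa_lambda(n_states=5, n_actions=2, episodes=500, alpha=0.1,
                 gamma=0.8, lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) with accumulating traces; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        trace = [[0.0] * n_actions for _ in range(n_states)]  # reset each episode
        state = 0
        action = epsilon_greedy(q, state, n_actions, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action, n_states)
            next_action = epsilon_greedy(q, next_state, n_actions, epsilon, rng)
            # On-policy TD error: bootstrap from the action actually chosen next.
            target = reward if done else reward + gamma * q[next_state][next_action]
            delta = target - q[state][action]
            trace[state][action] += 1.0  # accumulating trace
            for s in range(n_states):
                for a in range(n_actions):
                    q[s][a] += alpha * delta * trace[s][a]
                    trace[s][a] *= gamma * lam  # decay all traces
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    q_table = sarsa_lambda()
    greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
    print("greedy action per non-terminal state:", greedy)
```

The trace table plays the role that the abstract assigns to Item and Order working memory: it keeps a decaying record of recently taken state-action pairs so that the single delayed reward can update the whole sequence at once.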
Hierarchical Policy Learning for Mechanical Search
Retrieving objects from clutter is a complex task, which requires multiple
interactions with the environment until the target object can be extracted.
These interactions involve executing action primitives like grasping or pushing
as well as setting priorities for the objects to manipulate and the actions to
execute. Mechanical Search (MS) is a framework for object retrieval, which uses
a heuristic algorithm for pushing and rule-based algorithms for high-level
planning. While rule-based policies benefit from human intuition in their
design, they often perform sub-optimally. Deep reinforcement
learning (RL) has shown strong performance in complex tasks such as making
decisions directly from pixel input, which makes it suitable for training
policies in the context of object retrieval. In this work, we first formulate
the MS problem in a principled way as a hierarchical POMDP. Based on
this formulation, we propose a hierarchical policy learning approach for the MS
problem. For demonstration, we present two main parameterized sub-policies: a
push policy and an action selection policy. When integrated into the
hierarchical POMDP's policy, our proposed sub-policies increase the success
rate of retrieving the target object from less than 32% to nearly 80%, while
reducing the computation time for push actions from multiple seconds to less
than 10 milliseconds.
Comment: ICRA 202
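As background for the POMDP formulation mentioned in this abstract, the generic discrete belief update that any POMDP relies on can be sketched as follows. This is the standard Bayes filter, not the paper's hierarchical policy. The three-location example, the function name, and the likelihood values are assumptions made up for illustration.

```python
def belief_update(belief, transition, observation_likelihood):
    """Generic discrete POMDP belief update for one action and one observation.

    belief[s]                  : prior probability of hidden state s
    transition[s][s2]          : P(s2 | s, action) for the action just taken
    observation_likelihood[s2] : P(observation | s2)

    Returns b'(s2) proportional to P(o | s2) * sum_s P(s2 | s, a) * b(s).
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [sum(transition[s][s2] * belief[s] for s in range(n)) for s2 in range(n)]
    # Correction step: weight by how well each state explains the observation.
    unnormalized = [observation_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Hypothetical scenario: the target is hidden at one of three locations
    # and does not move (identity transition). After uncovering location 0
    # the target was not seen, which is unlikely if it were actually there.
    prior = [1 / 3, 1 / 3, 1 / 3]
    stay_put = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    not_seen_at_0 = [0.1, 1.0, 1.0]
    print(belief_update(prior, stay_put, not_seen_at_0))
```

In this toy setting the update shifts probability mass away from the location that was just uncovered, which is the kind of information a high-level policy could use when choosing which object to manipulate next.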