Inferring Actions, Intentions, and Causal Relations in a Deep Neural Network
From a young age, we can select actions to achieve desired goals, infer the goals of other agents, and learn causal relations in our environment through social interactions. Crucially, these abilities are productive and generative: we can impute desires to others that we have never held ourselves. These abilities are often captured by only partially overlapping models, each requiring substantial changes to fit combinations of abilities. Here, in an attempt to unify previous models, we present a neural network underpinned by the linearly solvable Markov Decision Process (LMDP) framework, which permits a distributed representation of tasks. The network contains two pathways: one captures the desirability of states, and another encodes the passive dynamics of state transitions in the absence of control. Interactions between pathways are bound by a principle of rational action, enabling generative inference of actions, goals, and causal relations supported by gradient updates to parts of the network.
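The LMDP machinery underlying this network can be made concrete in a few lines: in a first-exit LMDP, the desirability z(s) = exp(-v(s)) satisfies a linear fixed-point equation under the passive dynamics, z = exp(-q) * (P z), with z pinned at the terminal states. A minimal sketch follows; the chain world, its costs, and the power-iteration solver are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Toy first-exit LMDP on a 5-state chain (all sizes and costs are
# illustrative assumptions). Passive dynamics P: unbiased random walk;
# state 4 is the absorbing goal.
n = 5
P = np.zeros((n, n))
for s in range(1, n - 1):
    P[s, s - 1] = P[s, s + 1] = 0.5
P[0, 1] = 1.0          # reflect at the left boundary
P[n - 1, n - 1] = 1.0  # absorbing goal state

q = np.ones(n)         # per-step state cost on interior states
q[n - 1] = 0.0         # no cost at the goal

# Desirability z(s) = exp(-v(s)) satisfies the linear fixed point
# z = exp(-q) * (P @ z) on interior states, with z = exp(-q) at the goal.
z = np.ones(n)
for _ in range(500):
    z = np.exp(-q) * (P @ z)
    z[n - 1] = np.exp(-q[n - 1])  # boundary condition at the terminal state

v = -np.log(z)  # optimal cost-to-go, decreasing toward the goal
```

Because the fixed-point equation is linear in z, solving an LMDP reduces to linear algebra rather than nonlinear Bellman backups, which is what makes the distributed task representations in the paper tractable.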
Danger-aware Adaptive Composition of DRL Agents for Self-navigation
Self-navigation, the capability of automatically reaching a goal while avoiding collisions with obstacles, is a fundamental skill for mobile robots. Recently, deep reinforcement learning (DRL) has shown great potential in the development of robot navigation algorithms. However, it is still difficult to train a robot to learn goal-reaching and obstacle-avoidance skills simultaneously. On the other hand, although many DRL-based obstacle-avoidance algorithms have been proposed, few of them are reused for more complex navigation tasks. In this paper, a novel danger-aware adaptive composition (DAAC) framework is proposed to combine two individually DRL-trained agents, obstacle avoidance and goal reaching, into a navigation agent without any redesign or retraining. The key to this adaptive composition approach is that the value function output by the obstacle-avoidance agent serves as an indicator of the risk level of the current situation, which in turn determines the contribution of each agent to the next move. Simulation and real-world testing results show that the composed navigation network can control the robot to accomplish difficult navigation tasks, e.g., reaching a series of successive goals in an unknown and complex environment safely and quickly.
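The value-as-risk-indicator idea can be sketched in a few lines. The gating rule below (a sigmoid on the avoidance agent's value) and the toy actions are assumptions for illustration, not the DAAC paper's actual formula.

```python
import numpy as np

# Illustrative sketch of danger-aware action composition: the
# obstacle-avoidance agent's value estimate gates how much each
# agent contributes to the next move. The sigmoid gate, v_safe
# threshold, and temperature are assumed, not taken from the paper.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def compose_action(a_goal, a_avoid, v_avoid, v_safe=0.0, temperature=1.0):
    """Blend two agents' actions: a low avoidance value signals high
    risk, so the avoidance agent's action dominates."""
    risk = sigmoid((v_safe - v_avoid) / temperature)  # risk weight in (0, 1)
    return (1.0 - risk) * np.asarray(a_goal) + risk * np.asarray(a_avoid)

# Far from obstacles (high avoidance value): follow the goal-reaching agent.
a_safe = compose_action(a_goal=[1.0, 0.0], a_avoid=[0.0, 1.0], v_avoid=5.0)
# Near an obstacle (low avoidance value): the avoidance agent dominates.
a_risky = compose_action(a_goal=[1.0, 0.0], a_avoid=[0.0, 1.0], v_avoid=-5.0)
```

Gating on the avoidance value, rather than on raw sensor readings, is what lets the two frozen policies be reused without retraining: the value network already summarizes how dangerous the current state is.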
Hierarchy through composition with multitask LMDPs
Hierarchical architectures are critical to the scalability of reinforcement learning methods. Most current hierarchical frameworks execute actions serially, with macro-actions comprising sequences of primitive actions. We propose a novel alternative to these control hierarchies based on concurrent execution of many actions in parallel. Our scheme exploits the guaranteed concurrent compositionality provided by the linearly solvable Markov decision process (LMDP) framework, which naturally enables a learning agent to draw on several macro-actions simultaneously to solve new tasks. We introduce the Multitask LMDP module, which maintains a parallel distributed representation of tasks and may be stacked to form deep hierarchies abstracted in space and time.
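The concurrent compositionality the module exploits follows from linearity: for tasks that share passive dynamics and interior costs and differ only in terminal (boundary) desirability, the optimal desirability of a linear blend of tasks is the same blend of the base-task solutions. A toy sketch of this property, with made-up dynamics and task sizes:

```python
import numpy as np

# Sketch of LMDP task compositionality (toy sizes and the random
# passive dynamics below are illustrative assumptions).
rng = np.random.default_rng(0)
ni, nt = 6, 3                                # interior / terminal state counts
Pii = rng.random((ni, ni))                   # interior -> interior dynamics
Pit = rng.random((ni, nt))                   # interior -> terminal dynamics
norm = (Pii.sum(1) + Pit.sum(1))[:, None]
Pii /= norm
Pit /= norm                                  # passive-dynamics rows sum to 1
G = np.diag(np.exp(-np.ones(ni)))            # unit interior state costs

def solve(zt):
    """Interior desirability for terminal desirability zt:
    zi = G (Pii zi + Pit zt)  =>  (I - G Pii) zi = G Pit zt."""
    return np.linalg.solve(np.eye(ni) - G @ Pii, G @ Pit @ zt)

# Two base tasks: reach terminal state 0 vs. terminal state 1.
t1 = np.array([1.0, 0.0, 0.0])
t2 = np.array([0.0, 1.0, 0.0])
z1, z2 = solve(t1), solve(t2)

# A new task whose terminal desirability blends the base tasks is
# solved "for free" by the same blend of their solutions.
w1, w2 = 0.3, 0.7
z_blend = solve(w1 * t1 + w2 * t2)
```

Because `solve` is linear in the terminal desirability, `z_blend` equals `w1 * z1 + w2 * z2` exactly; this is the property that lets a stacked Multitask LMDP draw on many macro-actions simultaneously rather than serially.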