20,587 research outputs found
A Developmental Organization for Robot Behavior
This paper explores how learning and development can be structured in synthetic (robot) systems. We present a developmental assembler for constructing reusable and temporally extended actions in sequence. The discussion adopts the traditions of dynamic pattern theory, in which behavior is an artifact of coupled dynamical systems with a number of controllable degrees of freedom. In our model, the events that delineate control decisions are derived from the pattern of (dis)equilibria on a working subset of sensorimotor policies. We show how this architecture can be used to accomplish sequential knowledge-gathering and representation tasks, and provide examples of the kind of developmental milestones that this approach has already produced in our lab.
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
We propose to directly map raw visual observations and text input to actions for instruction execution. While existing approaches assume access to structured environment representations or use a pipeline of separately trained models, we learn a single model to jointly reason about linguistic and visual input. We use reinforcement learning in a contextual bandit setting to train a neural network agent. To guide the agent's exploration, we use reward shaping with different forms of supervision. Our approach does not require intermediate representations, planning procedures, or training different models. We evaluate in a simulated environment, and show significant improvements over supervised learning and common reinforcement learning variants.
Comment: In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 201
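The training signal described in this abstract, reinforcement learning in a contextual bandit setting with reward shaping, can be sketched as a single-step policy-gradient update. This is a minimal illustrative sketch: the linear policy, the shaping bonus, and all names here are assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_actions = 8, 4
W = np.zeros((n_features, n_actions))  # linear policy parameters (assumed form)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def shaped_reward(action, goal_action, raw_reward):
    # Reward shaping: add a small dense bonus on top of the
    # sparse task reward to guide exploration.
    return raw_reward + 0.1 * (1.0 if action == goal_action else 0.0)

def bandit_update(context, goal_action, lr=0.1):
    # Contextual bandit: one observation, one action, one reward,
    # no multi-step credit assignment.
    probs = softmax(context @ W)
    action = rng.choice(n_actions, p=probs)
    raw = 1.0 if action == goal_action else 0.0
    r = shaped_reward(action, goal_action, raw)
    # REINFORCE gradient of log pi(action | context) for a softmax policy:
    grad_log = -probs
    grad_log[action] += 1.0
    W[:] += lr * r * np.outer(context, grad_log)
    return action, r
```

Because every update uses only the immediate reward, the setting avoids the temporal credit-assignment problem of full RL while still learning from exploration rather than supervised demonstrations.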
Reinforcement Learning With Temporal Logic Rewards
Reinforcement learning (RL) depends critically on the choice of reward functions used to capture the desired behavior and constraints of a robot. Usually, these are handcrafted by an expert designer and represent heuristics for relatively simple tasks. Real-world applications typically involve more complex tasks with rich temporal and logical structure. In this paper we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow, and incorporate domain knowledge into learning. We propose Truncated Linear Temporal Logic (TLTL) as a specification language that is arguably well suited for robotics applications, together with quantitative semantics, i.e., a robustness degree. We propose an RL approach to learn tasks expressed as TLTL formulae that uses their associated robustness degree as the reward function, instead of manually crafted heuristics that try to capture the same specifications. We show in simulated trials that learning is faster and that policies obtained with the proposed approach outperform those learned with heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied. Furthermore, we demonstrate the proposed RL approach in a toast-placing task learned by a Baxter robot.
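The abstract's central idea, scoring a trajectory by the robustness degree of a temporal-logic formula and using that score as the reward, can be sketched for a toy reach-and-avoid specification. The spec, the distance-based predicates, and every name below are illustrative assumptions, not the paper's TLTL implementation.

```python
import numpy as np

def eventually(rho_per_step):
    # Quantitative semantics of "eventually phi" over a finite trace:
    # the best robustness of phi at any step.
    return max(rho_per_step)

def always(rho_per_step):
    # "always phi": the worst robustness of phi over the trace.
    return min(rho_per_step)

def robustness_reward(traj, goal, obstacle, safe_dist=0.5):
    # Spec: eventually reach the goal AND always stay clear of the obstacle.
    # Predicate robustness is positive when satisfied, negative when violated,
    # and its magnitude measures by how much.
    reach = [-np.linalg.norm(x - goal) for x in traj]
    avoid = [np.linalg.norm(x - obstacle) - safe_dist for x in traj]
    # Conjunction = min of the subformula robustness values.
    return min(eventually(reach), always(avoid))
```

A trajectory that reaches the goal while keeping its distance scores non-negatively; one that grazes the obstacle is penalized in proportion to the violation, which gives the learner a dense, graded signal instead of a hand-tuned heuristic.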