3 research outputs found
Data integration strategies for distributed reinforcement learning in robotics
The field of reinforcement learning, developed during the 1980s and 1990s, is a branch of machine learning that has consistently shown wide potential. Using this theory, it is possible to design computer programs able to learn which actions must be taken, in a given environment, to maximise a cumulative reward function. In other words, by rewarding the program, it learns how to behave in order to solve a problem. Originally the field was mainly applied to discrete and finite environments, although continuous environments could be handled using traditional function approximators. Recently the field has experienced a revolution: increased computational capacity has enabled the use of artificial neural networks as function approximators. This has produced surprising results previously thought unfeasible, and the number of fields where reinforcement learning may be applied has drastically increased. Robotics is one of them, and in the past few years the results achieved have been very promising. In general, and in robotics in particular, one topic still to be deeply explored is the distribution of learning. Distributing the learning means parallelising it: many workers face the problem and share information, instead of one isolated worker. With it, the learning can be optimised, yielding shorter learning times and better knowledge of the environment, among many other advantages. To contribute to this topic, this project designs and implements three different distributed architectures based on state-of-the-art algorithms. The learning will be distributed across many simulated robotic arms that work in parallel performing the same task.
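The idea of many workers sharing knowledge instead of one isolated learner can be illustrated with a minimal sketch. The environment, names, and the tabular Q-learning setup below are illustrative assumptions standing in for the project's simulated robotic arms, not its actual architecture; the workers run sequentially here, whereas a real deployment would run them in parallel.

```python
# Minimal sketch of distributed learning via a shared Q-table, assuming a toy
# one-dimensional "reach the goal" task in place of a simulated robotic arm.
# All names (ToyArmEnv, worker, ...) are hypothetical.
import random
from collections import defaultdict

random.seed(0)

class ToyArmEnv:
    """One joint abstracted to a line: move left/right to reach the goal."""
    def __init__(self, size=5):
        self.size, self.goal = size, size - 1
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):  # action: 0 = left, 1 = right
        self.pos = max(0, min(self.size - 1, self.pos + (1 if action else -1)))
        done = self.pos == self.goal
        return self.pos, (1.0 if done else -0.01), done

def worker(env, q, episodes=200, alpha=0.1, gamma=0.9, eps=0.1):
    """One of many workers; all update the *shared* Q-table in place."""
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[(s, x)])
            s2, r, done = env.step(a)
            best_next = max(q[(s2, 0)], q[(s2, 1)])
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2

q_table = defaultdict(float)     # knowledge shared by every worker
for _ in range(3):               # three workers (sequential stand-ins for parallel ones)
    worker(ToyArmEnv(), q_table)
policy = [max((0, 1), key=lambda a: q_table[(s, a)]) for s in range(5)]
```

Because all workers write into the same table, each benefits from the others' experience, which is the optimisation the abstract describes: shorter learning times and broader coverage of the environment.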
Obedience-based Multi-Agent Cooperation for Sequential Social Dilemmas
We propose a mechanism for achieving cooperation and communication in Multi-Agent Reinforcement Learning (MARL) settings by intrinsically rewarding agents for obeying the commands of other agents. At every timestep, agents exchange commands through a cheap-talk channel. During the following timestep, agents are rewarded both for taking actions that conform to commands received as well as for giving successful commands. We refer to this approach as obedience-based learning.
We demonstrate the potential for obedience-based approaches to enhance coordination and communication in challenging sequential social dilemmas, where traditional MARL approaches often collapse without centralized training or specialized architectures. We also demonstrate the flexibility of this approach with regard to population heterogeneity and vocabulary size.
Obedience-based learning stands out as an intuitive form of cooperation with minimal complexity and overhead that can be applied to heterogeneous populations. In contrast, previous work on sequential social dilemmas is often restricted to homogeneous populations and requires complete knowledge of every player's reward structure. Obedience-based learning is a promising direction for exploration in the field of cooperative MARL.
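The intrinsic reward described above (agents rewarded for conforming to received commands and for issuing commands that were followed) can be sketched as a small shaping function. The function name, data layout, and weights below are illustrative assumptions, not the paper's API.

```python
# Hedged sketch of the obedience-based intrinsic reward: at each timestep,
# commands issued at the previous timestep are compared against the actions
# actually taken now. Weights w_obey / w_command are hypothetical.
def obedience_rewards(commands, actions, w_obey=0.5, w_command=0.5):
    """commands[giver] = (target_agent, commanded_action) sent at timestep t
    over the cheap-talk channel; actions[j] = action agent j took at t+1.
    Returns per-agent intrinsic rewards."""
    rewards = [0.0] * len(actions)
    for giver, (target, cmd_action) in commands.items():
        if actions[target] == cmd_action:
            rewards[target] += w_obey      # reward for obeying a command
            rewards[giver] += w_command    # reward for a successful command
    return rewards

# Example: agent 0 commands agent 1 to take action 2, and agent 1 complies,
# so both the giver and the obeyer receive an intrinsic reward.
r = obedience_rewards(commands={0: (1, 2)}, actions=[0, 2])
```

In a full MARL loop this intrinsic term would be added to each agent's environment reward before the learning update, which is what lets the mechanism work without centralized training.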
Advancing Robot Autonomy for Long-Horizon Tasks
Autonomous robots have real-world applications in diverse fields, such as
mobile manipulation and environmental exploration, and many such tasks benefit
from a hands-off approach in terms of human user involvement over a long task
horizon. However, the level of autonomy achievable by a deployment is limited
in part by the problem definition or task specification required by the system.
Task specifications often require technical, low-level information that is
unintuitive to describe and may result in generic solutions, burdening the user
technically both before and after task completion. In this thesis, we aim to
advance task specification abstraction toward the goal of increasing robot
autonomy in real-world scenarios. We do so by tackling problems that address
several different angles of this goal. First, we develop a way for the
automatic discovery of optimal transition points between subtasks in the
context of constrained mobile manipulation, removing the need for the human to
hand-specify these in the task specification. We further propose a way to
automatically describe constraints on robot motion by using demonstrated data
as opposed to manually-defined constraints. Then, within the context of
environmental exploration, we propose a flexible task specification framework
that requires from the user only a set of quantiles of interest, allowing the
robot to directly suggest locations in the environment for the user to study.
We next systematically study the effect of including a robot team in the task
specification and show that multirobot teams can improve performance under
certain specification conditions, including when inter-robot communication is
enabled. Finally, we propose methods for a communication
protocol that autonomously selects useful but limited information to share with
the other robots.
Comment: PhD dissertation, 160 pages.
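The quantile-based specification described in this abstract can be illustrated with a small sketch: the user names quantiles of an environmental quantity, and the robot proposes the sampled locations whose measurements fall nearest those quantiles. The function, nearest-rank quantile rule, and data layout below are assumptions for illustration, not the thesis's method.

```python
# Illustrative sketch of a quantile-driven suggestion step. The user supplies
# only quantiles of interest; the robot maps them onto concrete locations.
def suggest_locations(samples, quantiles):
    """samples: list of (location, measured_value) pairs collected so far;
    quantiles: fractions in (0, 1). Returns one location per quantile."""
    values = sorted(v for _, v in samples)
    suggestions = []
    for q in quantiles:
        # nearest-rank quantile of the observed values
        target = values[min(len(values) - 1, int(q * len(values)))]
        # suggest the sampled location whose measurement is closest to it
        loc = min(samples, key=lambda s: abs(s[1] - target))[0]
        suggestions.append(loc)
    return suggestions

# Example: four grid cells with hypothetical sensor readings; the user asks
# for the median and a high quantile of the measured quantity.
readings = [((0, 0), 1.2), ((0, 1), 3.4), ((1, 0), 2.2), ((1, 1), 5.0)]
sites = suggest_locations(readings, quantiles=[0.5, 0.9])
```

The appeal of this abstraction is that the user never specifies low-level motion or sensing parameters, only which parts of the value distribution matter to them.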