On Partially Controlled Multi-Agent Systems
Motivated by the control theoretic distinction between controllable and
uncontrollable events, we distinguish between two types of agents within a
multi-agent system: controllable agents, which are directly controlled by the
system's designer, and uncontrollable agents, which are not under the
designer's direct control. We refer to such systems as partially controlled
multi-agent systems, and we investigate how one might influence the behavior of
the uncontrolled agents through appropriate design of the controlled agents. In
particular, we wish to understand which problems are naturally described in
these terms, what methods can be applied to influence the uncontrollable
agents, the effectiveness of such methods, and whether similar methods work
across different domains. Using a game-theoretic framework, this paper studies
the design of partially controlled multi-agent systems in two contexts: in one
context, the uncontrollable agents are expected utility maximizers, while in
the other they are reinforcement learners. We suggest different techniques for
controlling agents' behavior in each domain, assess their success, and examine
their relationship.
Comment: See http://www.jair.org/ for any accompanying file
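As an illustration of the kind of influence studied in this setting (not an example taken from the paper itself), the sketch below has a controlled agent play tit-for-tat in a repeated Prisoner's Dilemma against an uncontrollable reinforcement-learning agent. The payoff matrix, learning parameters, and state encoding are all assumptions made for the sketch; the point is only that the learner is steered toward cooperation purely through the controlled agent's fixed design.

```python
import random

# Learner's payoffs in the Prisoner's Dilemma, keyed by
# (learner_action, opponent_action): standard T > R > P > S values.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def train_learner(steps=20000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Q-learning against a tit-for-tat controlled agent."""
    rng = random.Random(seed)
    # The learner's state is the controlled agent's upcoming move,
    # which under tit-for-tat is the learner's own previous move.
    Q = {(s, a): 0.0 for s in 'CD' for a in 'CD'}
    state = 'C'
    for _ in range(steps):
        if rng.random() < eps:                       # explore
            action = rng.choice('CD')
        else:                                        # exploit current values
            action = max('CD', key=lambda a: Q[(state, a)])
        reward = PAYOFF[(action, state)]             # opponent plays `state`
        next_state = action                          # tit-for-tat copies us
        best_next = max(Q[(next_state, a)] for a in 'CD')
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = next_state
    return Q

Q = train_learner()
```

With a sufficiently patient learner (discount 0.9), sustained mutual cooperation (payoff 3 per round) outvalues grabbing the one-shot temptation payoff and then facing retaliation, so the learned values favour cooperating in the cooperative state.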
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
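The exploration/exploitation trade-off the survey identifies as a central issue can be illustrated in its simplest form, an epsilon-greedy agent on a multi-armed bandit. This is a generic sketch, not code from the survey; the arm means and parameters are invented for illustration.

```python
import random

def run_bandit(true_means, steps=5000, eps=0.1, seed=1):
    """Epsilon-greedy action selection with incremental sample averages."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(steps):
        if rng.random() < eps:                         # explore uniformly
            arm = rng.randrange(len(true_means))
        else:                                          # exploit best estimate
            arm = max(range(len(true_means)), key=lambda i: estimates[i])
        reward = rng.gauss(true_means[arm], 1.0)       # noisy payoff
        counts[arm] += 1
        # incremental update of the running-average value estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 1.0])
```

Occasional exploration keeps every arm's estimate converging toward its true mean, while exploitation concentrates pulls on the arm currently believed best.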
Asimovian Adaptive Agents
The goal of this research is to develop agents that are adaptive, predictable,
and timely. At first blush, these three requirements seem
contradictory. For example, adaptation risks introducing undesirable side
effects, thereby making agents' behavior less predictable. Furthermore,
although formal verification can assist in ensuring behavioral predictability,
it is known to be time-consuming. Our solution to the challenge of satisfying
all three requirements is the following. Agents have finite-state automaton
plans, which are adapted online via evolutionary learning (perturbation)
operators. To ensure that critical behavioral constraints are always satisfied,
agents' plans are first formally verified. They are then reverified after every
adaptation. If reverification concludes that constraints are violated, the
plans are repaired. The main objective of this paper is to improve the
efficiency of reverification after learning, so that agents have a sufficiently
rapid response time. We present two solutions: positive results that certain
learning operators are a priori guaranteed to preserve useful classes of
behavioral assurance constraints (which implies that no reverification is
needed for these operators), and efficient incremental reverification
algorithms for those learning operators that have negative a priori results.
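The verify/adapt/reverify loop described above can be sketched as follows. This is a hypothetical toy, not the paper's algorithms: the plan is a small transition table, the behavioural constraint is a simple safety property (no unsafe state reachable), and the check is a full reachability reverification rather than the paper's incremental algorithms.

```python
from collections import deque

def reachable(transitions, start):
    """States reachable from `start` in a finite-state-automaton plan."""
    seen, frontier = {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        for nxt in transitions.get(s, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def verify(transitions, start, unsafe):
    """Safety constraint: no unsafe state is reachable."""
    return not (reachable(transitions, start) & unsafe)

def adapt_and_repair(transitions, start, unsafe, mutation):
    candidate = mutation(transitions)       # evolutionary perturbation
    if verify(candidate, start, unsafe):    # reverify after every adaptation
        return candidate                    # adaptation accepted
    return transitions                      # repair: keep the verified plan

# Toy plan: patrol, dodge obstacles, resume patrolling.
plan = {'patrol': {'obstacle': 'avoid'}, 'avoid': {'clear': 'patrol'}}

def bad_mutation(t):                        # perturbation routing into 'danger'
    c = {s: dict(a) for s, a in t.items()}
    c['patrol']['obstacle'] = 'danger'
    return c

safe_plan = adapt_and_repair(plan, 'patrol', {'danger'}, bad_mutation)
```

Because the mutated plan makes the unsafe state reachable, reverification rejects it and the previously verified plan is retained.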
Toward Agent Programs with Circuit Semantics
New ideas are presented for computing and organizing actions for autonomous agents in dynamic environments: environments in which the agent's current situation cannot always be accurately discerned and in which the effects of actions cannot always be reliably predicted. The notion of 'circuit semantics' for programs based on 'teleo-reactive trees' is introduced. Program execution builds a combinational circuit which receives sensory inputs and controls actions. These formalisms embody a high degree of inherent conditionality and thus yield programs that are suitably reactive to their environments. At the same time, the actions computed by the programs are guided by the overall goals of the agent. The paper also speculates about how programs using these ideas could be automatically generated by artificial intelligence planning systems and adapted by learning methods.
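A minimal sketch of the teleo-reactive idea: an ordered list of condition-action rules, continuously re-evaluated, where the first rule whose condition holds fires. The toy goal, rule set, and state encoding below are invented for illustration and are not the paper's formalism; they only show how higher rules sit closer to the goal while lower rules work toward enabling them.

```python
def tr_step(rules, state):
    """Fire the first rule whose condition holds in the current state."""
    for condition, action in rules:
        if condition(state):
            return action
    raise RuntimeError('no applicable rule')

# Toy goal: be holding the block. Each action achieves the condition
# of a rule above it, so repeated evaluation climbs toward the goal.
rules = [
    (lambda s: s['holding'],  lambda s: s),                        # goal holds
    (lambda s: s['at_block'], lambda s: {**s, 'holding': True}),   # grasp
    (lambda s: True,          lambda s: {**s, 'at_block': True}),  # go to block
]

state = {'at_block': False, 'holding': False}
trace = []
for _ in range(3):
    state = tr_step(rules, state)(state)
    trace.append(dict(state))
```

Because conditions are re-tested on every cycle against fresh state, the program stays reactive: if an external event knocked `at_block` back to false, control would simply drop to the lower rule again.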
Artificial societies and information theory: modelling of sub-system formation based on Luhmann's autopoietic theory
This thesis develops a theoretical framework for the generation of artificial societies. In particular, it shows how sub-systems emerge when the agents are able to learn and have the ability to communicate.
This novel theoretical framework integrates the autopoietic hypothesis of human societies, formulated originally by the German sociologist Luhmann, with concepts of Shannon's information theory applied to adaptive learning agents.
Simulations were executed using Multi-Agent-Based Modelling (ABM), a relatively new computational modelling paradigm involving the modelling of phenomena as dynamical systems of interacting agents. In particular, the thesis investigates the functions and properties necessary to reproduce the paradigm of society using this ABM approach.
Luhmann proposed that subsystems form in society to reduce uncertainty. Subsystems can then be composed of agents with a reduced behavioural complexity. For example, in society there are people who produce goods and others who distribute them.
Both the behaviour and the communication are learned by the agents rather than imposed. The simulated task is to collect food, keep it, and eat it until sated. Every agent communicates its energy state to the neighbouring agents. This results in two subsystems: agents in the first collect food, while agents in the second steal food from others. The ratio between the number of agents belonging to the first system and to the second depends on the number of food resources.
The simulations are in accordance with Luhmann, who suggested that adaptive agents self-organise by reducing the amount of sensory information or, equivalently, by reducing the complexity of the perceived environment from the agent's perspective. Shannon's information theory is used to assess the performance of the simulated learning agents. A practical measure, based on the concept of Shannon's information flow, is developed and applied to adaptive controllers which use Hebbian learning, input correlation learning (ICO/ISO), and temporal difference learning. The behavioural complexity is measured with a novel information measure, called Predictive Performance, which measures at a subjective level how well an agent performs a task. This is then used to quantify the social division of tasks in a social group of honest, cooperative, food-foraging, communicating agents.
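The idea that specialisation into a subsystem shows up as reduced behavioural complexity can be illustrated with a much simpler measure than the thesis's Predictive Performance: the Shannon entropy of an agent's action stream. The action names and streams below are invented for the sketch.

```python
import math
from collections import Counter

def behavioural_entropy(actions):
    """Shannon entropy (bits) of the empirical action distribution."""
    counts = Counter(actions)
    n = len(actions)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A generalist spreads its time over all behaviours; a specialist in the
# food-collecting subsystem concentrates on a few.
generalist = ['collect', 'steal', 'eat', 'collect', 'steal', 'eat']
specialist = ['collect', 'collect', 'eat', 'collect', 'collect', 'eat']
```

The generalist's uniform behaviour over three actions yields the maximal entropy log2(3) ≈ 1.58 bits, while the specialist's skewed distribution yields less, matching the intuition that subsystem formation reduces complexity.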
Resource-aware plan recognition in instrumented environments
This thesis addresses the problem of plan recognition in instrumented environments, which is to infer an agent's plans by observing its behavior. In instrumented environments such observations are made by physical sensors. This introduces specific challenges, of which the following two are considered in this thesis:
- Physical sensors often observe state information instead of actions. As classical plan recognition approaches usually can only deal with action observations, this requires a cumbersome and error-prone inference of executed actions from observed states.
- Due to limited physical resources of the environment it is often not possible to run all sensors at the same time, thus sensor selection techniques have to be applied. Current plan recognition approaches are not able to support the environment in selecting relevant subsets of sensors.
This thesis proposes a two-stage approach to solve the problems described above. Firstly, a DBN-based plan recognition approach is presented which allows for the explicit representation and consideration of state knowledge. Secondly, a POMDP-based utility model for observation sources is presented which can be used with generic utility-based sensor selection algorithms. Further contributions include the presentation of a software toolkit that realizes plan recognition and sensor selection in instrumented environments, and an empirical evaluation of the validity and performance of the proposed models.
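A generic utility-based sensor selection step of the kind the second stage feeds into can be sketched as a greedy choice under a resource budget. The sensor names, utilities (which in the thesis's setting would come from the POMDP observation-source model), and costs below are all invented for illustration.

```python
def select_sensors(sensors, budget):
    """Greedy selection by utility-per-cost ratio under a resource budget.

    sensors: {name: (utility, cost)}; returns the chosen sensor names.
    """
    chosen, spent = [], 0.0
    ranked = sorted(sensors,
                    key=lambda s: sensors[s][0] / sensors[s][1],
                    reverse=True)
    for name in ranked:
        utility, cost = sensors[name]
        if spent + cost <= budget:      # activate only if resources remain
            chosen.append(name)
            spent += cost
    return chosen

# Hypothetical instrumented-environment sensors: (utility, cost).
sensors = {'rfid': (0.9, 2.0), 'camera': (0.8, 4.0), 'pressure': (0.3, 1.0)}
picked = select_sensors(sensors, budget=3.0)
```

Here the camera, despite high utility, is skipped because its cost exceeds the remaining budget, so the cheaper rfid and pressure sensors are activated instead.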
Practical strategies for agent-based negotiation in complex environments
Agent-based negotiation, whereby the negotiation is automated by software programs, can be applied to many different negotiation situations, including negotiations between friends, businesses or countries. A key benefit of agent-based negotiation over human negotiation is that it can be used to negotiate effectively in complex negotiation environments, which consist of multiple negotiation issues, time constraints, and multiple unknown opponents. While automated negotiation has been an active area of research in the past twenty years, existing work has a number of limitations. Specifically, most of the existing literature has considered time constraints in terms of the number of rounds of negotiation that take place. In contrast, in this work we consider time constraints which are based on the amount of time that has elapsed. This requires a different approach, since the time spent computing the next action has an effect on the utility of the outcome, whereas the actual number of offers exchanged does not. In addition to these time constraints, in the complex negotiation environments which we consider, there are multiple negotiation issues, and we assume that the opponents' preferences over these issues and the behaviour of those opponents are unknown. Finally, in our environment there can be concurrent negotiations between many participants.
Against this background, in this thesis we present the design of a range of practical negotiation strategies, the most advanced of which uses Gaussian process regression to coordinate its concession against its various opponents, whilst considering the behaviour of those opponents and the time constraints. In more detail, the strategy uses observations of the offers made by each opponent to predict the future concession of that opponent. By considering the discounting factor, it predicts the future time which maximises the utility of the offers, and we then use this in setting our rate of concession.
Furthermore, we evaluate the negotiation agents that we have developed, which use our strategies, and show that, particularly in the more challenging scenarios, our most advanced strategy outperforms other state-of-the-art agents from the Automated Negotiating Agent Competition, which provides an international benchmark for this work. In more detail, our results show that, in one-to-one negotiation, in the highly discounted scenarios, our agent reaches outcomes which, on average, are 2.3% higher than those of the next best agent. Furthermore, using empirical game theoretic analysis we show the robustness of our strategy in a variety of tournament settings. This analysis shows that, in the highly discounted scenarios, no agent can benefit by choosing a different strategy (taken from the top four strategies in that setting) than ours. Finally, in the many-to-many negotiations, we show how our strategy is particularly effective in highly competitive scenarios, where it outperforms the state-of-the-art many-to-many negotiation strategy by up to 45%.
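The core timing decision described above, trading predicted opponent concession against time discounting, can be sketched with a simple grid search. The linear concession curve below stands in for the thesis's Gaussian-process prediction and is purely an assumption for the sketch, as are the discount value and horizon.

```python
def best_concession_time(predicted_utility, discount, horizon=1.0, steps=100):
    """Time in [0, horizon] maximising predicted_utility(t) * discount**t.

    predicted_utility: callable giving the predicted utility to us of the
    opponent's offer at normalised time t (a stand-in for a fitted
    concession model).
    """
    times = [horizon * i / steps for i in range(steps + 1)]
    return max(times, key=lambda t: predicted_utility(t) * discount ** t)

# Assumed linear concession: the opponent's offer to us improves from
# utility 0.3 at the start to 0.9 at the deadline.
concession = lambda t: 0.3 + 0.6 * t
t_star = best_concession_time(concession, discount=0.2)
```

With heavy discounting, waiting for the opponent's full concession is not worthwhile: the maximiser sits early in the negotiation, and a real strategy would pace its own concession to reach agreement around that time.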