480 research outputs found

    On Partially Controlled Multi-Agent Systems

    Motivated by the control-theoretic distinction between controllable and uncontrollable events, we distinguish between two types of agents within a multi-agent system: controllable agents, which are directly controlled by the system's designer, and uncontrollable agents, which are not under the designer's direct control. We refer to such systems as partially controlled multi-agent systems, and we investigate how one might influence the behavior of the uncontrolled agents through appropriate design of the controlled agents. In particular, we wish to understand which problems are naturally described in these terms, what methods can be applied to influence the uncontrollable agents, the effectiveness of such methods, and whether similar methods work across different domains. Using a game-theoretic framework, this paper studies the design of partially controlled multi-agent systems in two contexts: in one context, the uncontrollable agents are expected utility maximizers, while in the other they are reinforcement learners. We suggest different techniques for controlling agents' behavior in each domain, assess their success, and examine their relationship. Comment: See http://www.jair.org/ for any accompanying file.

    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file.
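As a rough illustration of the exploration/exploitation trade-off mentioned in this abstract, the following sketch runs an epsilon-greedy learner on a small multi-armed bandit. It is not taken from the survey: the function name `epsilon_greedy`, the Bernoulli reward probabilities, and the parameter values are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate arm values on a Bernoulli bandit with epsilon-greedy selection.

    reward_probs, steps, epsilon and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy([0.1, 0.5, 0.9])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

A larger `epsilon` spends more pulls on exploration and a smaller one commits earlier to the current best estimate, which is the tension the abstract refers to.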

    Asimovian Adaptive Agents

    The goal of this research is to develop agents that are adaptive, predictable, and timely. At first blush, these three requirements seem contradictory. For example, adaptation risks introducing undesirable side effects, thereby making agents' behavior less predictable. Furthermore, although formal verification can assist in ensuring behavioral predictability, it is known to be time-consuming. Our solution to the challenge of satisfying all three requirements is the following. Agents have finite-state automaton plans, which are adapted online via evolutionary learning (perturbation) operators. To ensure that critical behavioral constraints are always satisfied, agents' plans are first formally verified. They are then reverified after every adaptation. If reverification concludes that constraints are violated, the plans are repaired. The main objective of this paper is to improve the efficiency of reverification after learning, so that agents have a sufficiently rapid response time. We present two solutions: positive results that certain learning operators are a priori guaranteed to preserve useful classes of behavioral assurance constraints (which implies that no reverification is needed for these operators), and efficient incremental reverification algorithms for those learning operators that have negative a priori results.
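The verify, adapt, reverify cycle described in this abstract can be sketched in a much simplified form. The example below assumes a safety constraint of the form "a forbidden state is never reachable" and checks it by breadth-first search over a finite-state plan; the paper's actual constraint classes and incremental algorithms are not reproduced here, and the plan, state names, and `violates_invariant` helper are hypothetical.

```python
from collections import deque


def violates_invariant(transitions, start, forbidden):
    """Return True if any forbidden state is reachable from `start`.

    `transitions` maps state -> {event: next_state}. This full
    reachability check stands in for formal verification; it is an
    assumption, not the paper's algorithm.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in forbidden:
            return True
        for nxt in transitions.get(state, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical finite-state plan for a delivery agent.
plan = {
    "idle": {"see_target": "approach"},
    "approach": {"arrived": "deliver"},
    "deliver": {"done": "idle"},
}
forbidden = {"collide"}

# Initial verification before any learning takes place.
ok_before = not violates_invariant(plan, "idle", forbidden)

# A hypothetical learning (perturbation) operator adds one transition,
# after which the plan is reverified.
adapted = {state: dict(edges) for state, edges in plan.items()}
adapted["approach"]["blocked"] = "collide"
ok_after = not violates_invariant(adapted, "idle", forbidden)
```

Here reverification repeats the whole search after each change. The abstract's point is that this cost can be avoided for some operators and reduced for others.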

    Toward Agent Programs with Circuit Semantics

    New ideas are presented for computing and organizing actions for autonomous agents in dynamic environments, that is, environments in which the agent's current situation cannot always be accurately discerned and in which the effects of actions cannot always be reliably predicted. The notion of 'circuit semantics' for programs based on 'teleo-reactive trees' is introduced. Program execution builds a combinational circuit which receives sensory inputs and controls actions. These formalisms embody a high degree of inherent conditionality and thus yield programs that are suitably reactive to their environments. At the same time, the actions computed by the programs are guided by the overall goals of the agent. The paper also speculates about how programs using these ideas could be automatically generated by artificial intelligence planning systems and adapted by learning methods.

    Artificial societies and information theory: modelling of sub system formation based on Luhmann's autopoietic theory

    This thesis develops a theoretical framework for the generation of artificial societies. In particular, it shows how subsystems emerge when the agents are able to learn and have the ability to communicate. This novel theoretical framework integrates the autopoietic hypothesis of human societies, formulated originally by the German sociologist Luhmann, with concepts of Shannon's information theory applied to adaptive learning agents. Simulations were executed using Multi-Agent-Based Modelling (ABM), a relatively new computational modelling paradigm involving the modelling of phenomena as dynamical systems of interacting agents. In particular, the thesis investigates the functions and properties necessary to reproduce the paradigm of society using the ABM approach. Luhmann has proposed that in society subsystems are formed to reduce uncertainty. Subsystems can then be composed of agents with a reduced behavioural complexity. For example, in society there are people who produce goods and others who distribute them. Both the behaviour and the communication are learned by the agent and not imposed. The simulated task is to collect food, keep it and eat it until sated. Every agent communicates its energy state to the neighbouring agents. This results in two subsystems: agents in the first collect food, while agents in the second steal food from others. The ratio between the number of agents that belong to the first subsystem and the number that belong to the second depends on the number of food resources. Simulations are in accordance with Luhmann, who suggested that adaptive agents self-organise by reducing the amount of sensory information or, equivalently, reducing the complexity of the perceived environment from the agent's perspective. Shannon's information theorem is used to assess the performance of the simulated learning agents.
A practical measure, based on the concept of Shannon's information flow, is developed and applied to adaptive controllers which use Hebbian learning, input correlation learning (ICO/ISO) and temporal difference learning. The behavioural complexity is measured with a novel information measure, called Predictive Performance, which is able to measure at a subjective level how well an agent is performing a task. This is then used to quantify the social division of tasks in a social group of honest, cooperative, food-foraging, communicating agents.

    Resource-aware plan recognition in instrumented environments

    This thesis addresses the problem of plan recognition in instrumented environments, which is to infer an agent's plans by observing its behavior. In instrumented environments such observations are made by physical sensors. This introduces specific challenges, of which the following two are considered in this thesis. First, physical sensors often observe state information instead of actions. As classical plan recognition approaches usually can only deal with action observations, this requires a cumbersome and error-prone inference of executed actions from observed states. Second, due to limited physical resources of the environment it is often not possible to run all sensors at the same time, so sensor selection techniques have to be applied. Current plan recognition approaches are not able to support the environment in selecting relevant subsets of sensors. This thesis proposes a two-stage approach to solve the problems described above. Firstly, a DBN-based plan recognition approach is presented which allows for the explicit representation and consideration of state knowledge. Secondly, building on this approach, a POMDP-based utility model for observation sources is presented which can be used with generic utility-based sensor selection algorithms. Further contributions include the presentation of a software toolkit that realizes plan recognition and sensor selection in instrumented environments, and an empirical evaluation of the validity and performance of the proposed models.

    Practical strategies for agent-based negotiation in complex environments

    Agent-based negotiation, whereby the negotiation is automated by software programs, can be applied to many different negotiation situations, including negotiations between friends, businesses or countries. A key benefit of agent-based negotiation over human negotiation is that it can be used to negotiate effectively in complex negotiation environments, which consist of multiple negotiation issues, time constraints, and multiple unknown opponents. While automated negotiation has been an active area of research over the past twenty years, existing work has a number of limitations. Specifically, most of the existing literature has considered time constraints in terms of the number of rounds of negotiation that take place. In contrast, in this work we consider time constraints which are based on the amount of time that has elapsed. This requires a different approach, since the time spent computing the next action has an effect on the utility of the outcome, whereas the actual number of offers exchanged does not. In addition to these time constraints, in the complex negotiation environments which we consider, there are multiple negotiation issues, and we assume that the opponents' preferences over these issues and the behaviour of those opponents are unknown. Finally, in our environment there can be concurrent negotiations between many participants. Against this background, in this thesis we present the design of a range of practical negotiation strategies, the most advanced of which uses Gaussian process regression to coordinate its concession against its various opponents, whilst considering the behaviour of those opponents and the time constraints. In more detail, the strategy uses observations of the offers made by each opponent to predict the future concession of that opponent.
By considering the discounting factor, it predicts the future time which maximises the utility of the offers, and we then use this in setting our rate of concession. Furthermore, we evaluate the negotiation agents that we have developed, which use our strategies, and show that, particularly in the more challenging scenarios, our most advanced strategy outperforms other state-of-the-art agents from the Automated Negotiating Agent Competition, which provides an international benchmark for this work. In more detail, our results show that, in one-to-one negotiation, in the highly discounted scenarios, our agent reaches outcomes which, on average, are 2.3% higher than those of the next best agent. Furthermore, using empirical game theoretic analysis we show the robustness of our strategy in a variety of tournament settings. This analysis shows that, in the highly discounted scenarios, no agent can benefit by choosing a different strategy (taken from the top four strategies in that setting) than ours. Finally, in the many-to-many negotiations, we show how our strategy is particularly effective in highly competitive scenarios, where it outperforms the state-of-the-art many-to-many negotiation strategy by up to 45%.
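The step of choosing a future time that maximises discounted utility can be illustrated with a small sketch. It assumes that normalised time `t` runs from 0 to 1, that discounted utility takes the form `u(t) * discount ** t`, and that the opponent's predicted concession is a simple linear function. The thesis itself predicts concession with Gaussian process regression, which is not reproduced here, and the function `best_agreement_time` is hypothetical.

```python
def best_agreement_time(predicted_utility, discount, n_points=101):
    """Return the time in [0, 1] that maximises discounted predicted utility.

    predicted_utility: callable giving the utility we expect the
        opponent's offer to be worth to us at normalised time t
        (an assumed stand-in for a learned prediction).
    discount: assumed discount factor in (0, 1]; 1.0 means no discounting.
    """
    times = [i / (n_points - 1) for i in range(n_points)]
    return max(times, key=lambda t: predicted_utility(t) * discount ** t)


def linear_concession(t):
    # Illustrative prediction: the opponent concedes linearly over time.
    return 0.4 + 0.5 * t


t_discounted = best_agreement_time(linear_concession, discount=0.5)
t_undiscounted = best_agreement_time(linear_concession, discount=1.0)
```

With no discounting, waiting until the deadline is best under this prediction. With a strong discount the best time moves earlier, which is why the discount factor affects how quickly an agent should concede.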