Learning to Coordinate Efficiently: A Model-based Approach
In common-interest stochastic games, all players receive an identical payoff.
Players participating in such games must learn to coordinate with each other in
order to receive the highest possible value. A number of reinforcement learning
algorithms have been proposed for this problem, and some have been shown to
converge to good solutions in the limit. In this paper we show that very simple
model-based algorithms can attain much better (i.e., polynomial) convergence
rates. Moreover, our model-based algorithms are guaranteed to converge to the
optimal value, unlike many of the existing algorithms.
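The abstract does not spell out the algorithm, so the sketch below is only a rough illustration of the model-based idea, under the assumption that the common-interest game can be treated as a single MDP over joint actions: estimate an empirical model from observed transitions, then run value iteration on it with an optimistic bonus for unvisited joint actions. Every name here (ModelBasedLearner, the optimistic reward of 1.0) is an assumption for illustration, not the paper's construction.

    import numpy as np

    # A rough sketch, assuming rewards lie in [0, 1] and that the
    # common-interest game is treated as one MDP over joint actions.
    class ModelBasedLearner:
        def __init__(self, n_states, n_joint_actions, gamma=0.95):
            self.n_s, self.n_a, self.gamma = n_states, n_joint_actions, gamma
            # Counts for the empirical transition model and reward sums.
            self.counts = np.zeros((n_states, n_joint_actions, n_states))
            self.reward_sum = np.zeros((n_states, n_joint_actions))

        def observe(self, s, a, r, s_next):
            self.counts[s, a, s_next] += 1
            self.reward_sum[s, a] += r

        def plan(self, iters=200):
            # Value iteration on the estimated model; unvisited pairs get
            # the optimistic reward 1.0 to drive coordinated exploration.
            n = self.counts.sum(axis=2)                     # visit counts
            visited = n > 0
            P = np.where(visited[..., None],
                         self.counts / np.maximum(n, 1)[..., None],
                         1.0 / self.n_s)                    # uniform prior
            R = np.where(visited, self.reward_sum / np.maximum(n, 1), 1.0)
            Q = np.zeros((self.n_s, self.n_a))
            for _ in range(iters):
                Q = R + self.gamma * P @ Q.max(axis=1)
            return Q.argmax(axis=1)                         # greedy joint policy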
Structure and Complexity in Planning with Unary Operators
Unary operator domains -- i.e., domains in which operators have a single
effect -- arise naturally in many control problems. In its most general form,
the problem of STRIPS planning in unary operator domains is known to be as hard
as the general STRIPS planning problem -- both are PSPACE-complete. However,
unary operator domains induce a natural structure, called the domain's causal
graph. This graph relates the preconditions and the effect of each domain
operator. Causal graphs were exploited by Williams and Nayak in order to
analyze plan generation for one of the controllers in NASA's Deep-Space One
spacecraft. There, they utilized the fact that when this graph is acyclic, a
serialization ordering over any subgoal can be obtained quickly. In this paper
we conduct a comprehensive study of the relationship between the structure of a
domain's causal graph and the complexity of planning in this domain. On the
positive side, we show that a non-trivial polynomial time plan generation
algorithm exists for domains whose causal graph induces a polytree with a
constant bound on its node indegree. On the negative side, we show that even
plan existence is hard when the graph is a directed-path singly connected DAG.
More generally, we show that the number of paths in the causal graph is closely
related to the complexity of planning in the associated domain. Finally, we
relate our results to the question of the complexity of planning with
serializable subgoals.
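To make the structural condition concrete, here is a minimal sketch that builds the causal graph of a toy unary-operator domain and tests the polytree-with-bounded-indegree condition from the positive result. The operator encoding (precondition-variable set, effect variable) and the variable names are illustrative assumptions, not the paper's formal definitions.

    import networkx as nx

    def causal_graph(operators):
        # operators: iterable of (precondition_vars, effect_var) pairs,
        # one per unary operator; edges run precondition -> effect.
        g = nx.DiGraph()
        for pre_vars, eff_var in operators:
            g.add_node(eff_var)
            for v in pre_vars:
                if v != eff_var:
                    g.add_edge(v, eff_var)
        return g

    def tractable_by_structure(g, indegree_bound):
        # The positive result's condition: the causal graph is a polytree
        # (a DAG whose underlying undirected graph is a tree) with node
        # indegree bounded by a constant.
        is_polytree = (nx.is_directed_acyclic_graph(g)
                       and nx.is_tree(g.to_undirected()))
        max_indeg = max((d for _, d in g.in_degree()), default=0)
        return is_polytree and max_indeg <= indegree_bound

    # Two directed paths lead from "power" to "thrust", so the underlying
    # undirected graph contains a cycle and the polytree test fails.
    ops = [({"power"}, "engine"), ({"power"}, "valve"),
           ({"engine", "valve"}, "thrust")]
    print(tractable_by_structure(causal_graph(ops), indegree_bound=2))  # False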
On Partially Controlled Multi-Agent Systems
Motivated by the control theoretic distinction between controllable and
uncontrollable events, we distinguish between two types of agents within a
multi-agent system: controllable agents, which are directly controlled by the
system's designer, and uncontrollable agents, which are not under the
designer's direct control. We refer to such systems as partially controlled
multi-agent systems, and we investigate how one might influence the behavior of
the uncontrollable agents through appropriate design of the controllable agents. In
particular, we wish to understand which problems are naturally described in
these terms, what methods can be applied to influence the uncontrollable
agents, the effectiveness of such methods, and whether similar methods work
across different domains. Using a game-theoretic framework, this paper studies
the design of partially controlled multi-agent systems in two contexts: in one
context, the uncontrollable agents are expected utility maximizers, while in
the other they are reinforcement learners. We suggest different techniques for
controlling agents' behavior in each domain, assess their success, and examine
their relationship.
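The abstract stays at the framework level, but the reinforcement-learning case can be given a toy flavor: a controlled agent punishes every deviation of an uncontrollable Q-learning agent from a desired action, reshaping the learner's incentives. All payoffs and names below are made-up assumptions, not the paper's experimental setup.

    import random

    DESIRED = "cooperate"
    ACTIONS = ["cooperate", "defect"]
    BASE_PAYOFF = {"cooperate": 3.0, "defect": 5.0}  # defecting looks better...
    PUNISHMENT = -6.0                   # ...until the controlled agent reacts

    # Stateless (bandit-style) Q-learner standing in for the
    # uncontrollable agent.
    q = {a: 0.0 for a in ACTIONS}
    alpha, epsilon = 0.1, 0.1
    for step in range(5000):
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(q, key=q.get))
        r = BASE_PAYOFF[a] + (PUNISHMENT if a != DESIRED else 0.0)
        q[a] += alpha * (r - q[a])      # incremental Q-update

    print(max(q, key=q.get))  # typically "cooperate": punishment flips the incentive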
LTLf/LDLf Non-Markovian Rewards
In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
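To illustrate the compilation idea, the sketch below hand-codes a two-state automaton for the "coffee only following a request" reward mentioned in the abstract: the reward is non-Markovian over MDP states alone, but becomes Markovian in the product state (MDP state, automaton state). The automaton is written by hand here for illustration; the paper derives such automata automatically from LDLf formulas.

    DFA_START = "idle"

    def dfa_step(q, label):
        # Advance the reward automaton on the event observed this step.
        if q == "idle" and label == "request":
            return "requested"
        if q == "requested" and label == "serve_coffee":
            return "idle"
        return q

    def product_reward(q, label):
        # Reward only when coffee is served while a request is pending;
        # this is Markovian in the extended state (mdp_state, q).
        return 1.0 if (q == "requested" and label == "serve_coffee") else 0.0

    trace = ["noop", "serve_coffee", "request", "noop", "serve_coffee"]
    q, total = DFA_START, 0.0
    for label in trace:
        total += product_reward(q, label)
        q = dfa_step(q, label)
    print(total)  # 1.0: only the second coffee follows a request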