7,757 research outputs found
MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs
We present multi-agent A* (MAA*), the first complete and optimal heuristic
search algorithm for solving decentralized partially-observable Markov decision
problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for
computing optimal plans for a cooperative group of agents that operate in a
stochastic environment such as multirobot coordination, network traffic
control, `or distributed resource allocation. Solving such problems efiectively
is a major challenge in the area of planning under uncertainty. Our solution is
based on a synthesis of classical heuristic search and decentralized control
theory. Experimental results show that MAA* has significant advantages. We
introduce an anytime variant of MAA* and conclude with a discussion of
promising extensions such as an approach to solving infinite horizon problems.Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty
in Artificial Intelligence (UAI2005
A Theoretical Analysis of Two-Stage Recommendation for Cold-Start Collaborative Filtering
In this paper, we present a theoretical framework for tackling the cold-start
collaborative filtering problem, where unknown targets (items or users) keep
coming to the system, and there is a limited number of resources (users or
items) that can be allocated and related to them. The solution requires a
trade-off between exploitation and exploration as with the limited
recommendation opportunities, we need to, on one hand, allocate the most
relevant resources right away, but, on the other hand, it is also necessary to
allocate resources that are useful for learning the target's properties in
order to recommend more relevant ones in the future. In this paper, we study a
simple two-stage recommendation combining a sequential and a batch solution
together. We first model the problem with the partially observable Markov
decision process (POMDP) and provide an exact solution. Then, through an
in-depth analysis over the POMDP value iteration solution, we identify that an
exact solution can be abstracted as selecting resources that are not only
highly relevant to the target according to the initial-stage information, but
also highly correlated, either positively or negatively, with other potential
resources for the next stage. With this finding, we propose an approximate
solution to ease the intractability of the exact solution. Our initial results
on synthetic data and the Movie Lens 100K dataset confirm the performance gains
of our theoretical development and analysis
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.Comment: See http://www.jair.org/ for any accompanying file
An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits
In this paper, we propose an information-theoretic exploration strategy for
stochastic, discrete multi-armed bandits that achieves optimal regret. Our
strategy is based on the value of information criterion. This criterion
measures the trade-off between policy information and obtainable rewards. High
amounts of policy information are associated with exploration-dominant searches
of the space and yield high rewards. Low amounts of policy information favor
the exploitation of existing knowledge. Information, in this criterion, is
quantified by a parameter that can be varied during search. We demonstrate that
a simulated-annealing-like update of this parameter, with a sufficiently fast
cooling schedule, leads to an optimal regret that is logarithmic with respect
to the number of episodes.Comment: Entrop
Approximate Policy Iteration for Generalized Semi-Markov Decision Processes: an Improved Algorithm
In the context of time-dependent problems of planning under uncertainty, most of the problem's complexity comes from the concurrent interaction of simultaneous processes. Generalized Semi-Markov Decision Processes represent an efficient formalism to capture both concurrency of events and actions and uncertainty. We introduce GSMDP with observable time and hybrid state space and present an new algorithm based on Approximate Policy Iteration to generate efficient policies. This algorithm relies on simulation-based exploration and makes use of SVM regression. We experimentally illustrate the strengths and weaknesses of this algorithm and propose an improved version based on the weaknesses highlighted by the experiments
- …