Learning Conventions in Multiagent Stochastic Domains using Likelihood Estimates
Fully cooperative multiagent systems - those in which agents share a joint
utility model - are of special interest in AI. A key problem is that of ensuring
that the actions of individual agents are coordinated, especially in settings
where the agents are autonomous decision makers. We investigate approaches to
learning coordinated strategies in stochastic domains where an agent's actions
are not directly observable by others. Much recent work in game theory has
adopted a Bayesian learning perspective to the more general problem of
equilibrium selection, but tends to assume that actions can be observed. We
discuss the special problems that arise when actions are not observable,
including effects on rates of convergence, and the effect of action failure
probabilities and asymmetries. We also use likelihood estimates as a means of
generalizing fictitious play learning models in our setting. Finally, we
propose the use of maximum likelihood as a means of removing strategies from
consideration, with the aim of convergence to a conventional equilibrium, at
which point learning and deliberation can cease.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI1996)
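As a rough illustration of the likelihood-based idea (this is a minimal sketch, not the paper's algorithm), the code below has an agent in a two-action coordination game infer which convention ("left" or "right") its unseen partner follows, using only noisy joint outcomes and a known action-failure probability. The game, failure rate, and candidate conventions are invented for illustration.

```python
import math
import random

FAIL = 0.2                      # assumed action-failure probability
STRATEGIES = ["left", "right"]  # hypothesized conventions for the partner

def outcome(a, b):
    """Joint success iff both effective actions match; each action fails w.p. FAIL."""
    flip = lambda x: "right" if x == "left" else "left"
    eff_a = a if random.random() > FAIL else flip(a)
    eff_b = b if random.random() > FAIL else flip(b)
    return eff_a == eff_b

def outcome_likelihood(my_action, s, success):
    """P(observed success | partner follows pure strategy s)."""
    if my_action == s:
        p_match = (1 - FAIL) ** 2 + FAIL ** 2   # both succeed or both fail
    else:
        p_match = 2 * FAIL * (1 - FAIL)         # exactly one action fails
    return p_match if success else 1 - p_match

def learn(T=200):
    partner = "left"                            # the partner's fixed convention
    logl = {s: 0.0 for s in STRATEGIES}         # log-likelihood per hypothesis
    for _ in range(T):
        my_action = max(STRATEGIES, key=logl.get)   # best-respond to the ML hypothesis
        success = outcome(my_action, partner)
        for s in STRATEGIES:
            logl[s] += math.log(outcome_likelihood(my_action, s, success))
    return max(STRATEGIES, key=logl.get)

print(learn())   # converges to "left" with high probability
```

Because only the success/failure signal is observed, the update works through outcome likelihoods rather than observed actions, which is the distinction the abstract draws from standard fictitious play.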
Modal Logics for Qualitative Possibility and Beliefs
Possibilistic logic has been proposed as a numerical formalism for reasoning
with uncertainty. There has been interest in developing qualitative accounts of
possibility, as well as an explanation of the relationship between possibility
and modal logics. We present two modal logics that can be used to represent and
reason with qualitative statements of possibility and necessity. Within this
modal framework, we are able to identify interesting relationships between
possibilistic logic, beliefs and conditionals. In particular, the most natural
conditional definable via possibilistic means for default reasoning is
identical to Pearl's conditional for ε-semantics.
Comment: Appears in Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence (UAI1992)
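For readers unfamiliar with the possibilistic side, here is the standard possibility/necessity duality that the logics capture, over an invented three-world example (the worlds and possibility degrees are illustrative assumptions):

```python
WORLDS = {"w1": 1.0, "w2": 0.7, "w3": 0.2}   # possibility degrees, max = 1

def poss(event):
    """Possibility of an event (a set of worlds): Pi(A) = max over w in A of pi(w)."""
    return max((WORLDS[w] for w in event), default=0.0)

def nec(event):
    """Necessity is dual to possibility: N(A) = 1 - Pi(complement of A)."""
    complement = set(WORLDS) - set(event)
    return 1.0 - poss(complement)

A = {"w1", "w2"}
print(poss(A), nec(A))   # Pi(A) = 1.0, N(A) = 1 - Pi({w3}) = 0.8
```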
The Probability of a Possibility: Adding Uncertainty to Default Rules
We present a semantics for adding uncertainty to conditional logics for
default reasoning and belief revision. We are able to treat conditional
sentences as statements of conditional probability, and express rules for
revision such as "If A were believed, then B would be believed to degree p."
This method of revision extends conditionalization by allowing meaningful
revision by sentences whose probability is zero. This is achieved through the
use of counterfactual probabilities. Thus, our system accounts for the best
properties of qualitative methods of update (in particular, the AGM theory of
revision) and probabilistic methods. We also show how our system can be viewed
as a unification of probability theory and possibility theory, highlighting
their orthogonality and providing a means for expressing the probability of a
possibility. We also demonstrate the connection to Lewis's method of imaging.
Comment: Appears in Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (UAI1993)
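A toy rendering of Lewis-style imaging, which the abstract connects to: rather than conditioning (which is undefined when P(A) = 0), each world's probability mass is shifted to its closest A-world. The worlds, distribution, and similarity map below are invented for illustration.

```python
P = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
A = {"w1", "w3"}                                          # worlds where A holds
closest_A_world = {"w1": "w1", "w2": "w3", "w3": "w3"}    # assumed similarity order

def image(P, A, closest):
    """Shift all mass onto A-worlds via the closest-world map (no renormalizing)."""
    imaged = {w: 0.0 for w in P}
    for w, p in P.items():
        target = w if w in A else closest[w]
        imaged[target] += p
    return imaged

print(image(P, A, closest_A_world))   # {'w1': 0.5, 'w2': 0.0, 'w3': 0.5}
```

Unlike conditionalization, the operation is well defined even when the total mass on A is zero, which is what makes revision by zero-probability sentences possible.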
Eliciting Forecasts from Self-interested Experts: Scoring Rules for Decision Makers
Scoring rules for eliciting expert predictions of random variables are
usually developed assuming that experts derive utility only from the quality of
their predictions (e.g., score awarded by the rule, or payoff in a prediction
market). We study a more realistic setting in which (a) the principal is a
decision maker and will take a decision based on the expert's prediction; and
(b) the expert has an inherent interest in the decision. For example, in a
corporate decision market, the expert may derive different levels of utility
from the actions taken by her manager. As a consequence the expert will usually
have an incentive to misreport her forecast to influence the choice of the
decision maker if typical scoring rules are used. We develop a general model
for this setting and introduce the concept of a compensation rule. When
combined with the expert's inherent utility for decisions, a compensation rule
induces a net scoring rule that behaves like a normal scoring rule. Assuming
full knowledge of expert utility, we provide a complete characterization of all
(strictly) proper compensation rules. We then analyze the situation where the
expert's utility function is not fully known to the decision maker. We show
bounds on: (a) expert incentive to misreport; (b) the degree to which an expert
will misreport; and (c) decision maker loss in utility due to such uncertainty.
These bounds depend in natural ways on the degree of uncertainty, the local
degree of convexity of net scoring function, and natural properties of the
decision maker's utility function. They also suggest optimization procedures
for the design of compensation rules. Finally, we briefly discuss the use of
compensation rules as market scoring rules for self-interested experts in a
prediction market.
Comment: 11 pages, 4 figures, pdflatex. See http://www.cs.toronto.edu/~cebly/papers.htm
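One simple way to see the net-scoring-rule idea (a sketch under the abstract's full-knowledge assumption, not the paper's characterization): if the expert's inherent utility for the induced decision is known, the compensation can offset it, so the net payoff reduces to an ordinary strictly proper score. The Brier score, threshold decision rule, and utilities below are illustrative.

```python
def brier(q, outcome):
    """Strictly proper quadratic score for reported probability q of a binary event."""
    return 1.0 - (q - outcome) ** 2

def decision(q):
    """The principal acts iff the report clears a threshold."""
    return "act" if q >= 0.5 else "wait"

EXPERT_UTILITY = {"act": 2.0, "wait": 0.0}   # assumed inherent stake in the decision

def compensation(q, outcome):
    """Offset the expert's decision utility so the net payoff is the Brier score."""
    return brier(q, outcome) - EXPERT_UTILITY[decision(q)]

def net_payoff(q, outcome):
    # compensation + inherent utility == brier(q, outcome), a proper rule, so
    # truthful reporting is again optimal despite the stake in the decision
    return compensation(q, outcome) + EXPERT_UTILITY[decision(q)]
```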
Value-Directed Belief State Approximation for POMDPs
We consider the problem of belief-state monitoring for the purposes of
implementing a policy for a partially-observable Markov decision process
(POMDP), specifically how one might approximate the belief state. Other schemes
for belief-state approximation (e.g., based on minimizing a measure such as the
KL-divergence between the true and estimated belief state) are not necessarily
appropriate for POMDPs. Instead we propose a framework for analyzing
value-directed approximation schemes, where approximation quality is determined
by the expected error in utility rather than by the error in the belief state
itself. We propose heuristic methods for finding good projection schemes for
belief state estimation - exhibiting anytime characteristics - given a POMDP
value function. We also describe several algorithms for constructing bounds on
the error in decision quality (expected utility) associated with acting in
accordance with a given belief state approximation.
Comment: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)
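The value-directed criterion can be stated in a few lines: with a POMDP value function represented as alpha-vectors, what matters is the loss in expected value from acting on the approximate belief, not the distance between beliefs. The alpha-vectors and beliefs below are invented for illustration.

```python
import numpy as np

alphas = np.array([[1.0, 0.0, 0.5],    # one alpha-vector per candidate action/policy
                   [0.2, 0.9, 0.1]])

def value(b):
    return alphas @ b                  # expected value of each action at belief b

b_true = np.array([0.5, 0.3, 0.2])
b_approx = np.array([0.6, 0.2, 0.2])   # e.g. a projection onto fewer factors

best_true = int(np.argmax(value(b_true)))
best_approx = int(np.argmax(value(b_approx)))
# Loss in decision quality: value forgone at the true belief by acting on the
# approximate one -- zero whenever the argmax action is unchanged.
loss = value(b_true)[best_true] - value(b_true)[best_approx]
print("decision-quality loss:", loss)
```

Two beliefs can be far apart in KL-divergence yet induce the same maximizing action, and hence zero loss; that observation is what motivates the framework.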
Practical Linear Value-approximation Techniques for First-order MDPs
Recent work on approximate linear programming (ALP) techniques for
first-order Markov Decision Processes (FOMDPs) represents the value function
linearly w.r.t. a set of first-order basis functions and uses linear
programming techniques to determine suitable weights. This approach offers the
advantage that it does not require simplification of the first-order value
function, and allows one to solve FOMDPs independent of a specific domain
instantiation. In this paper, we address several questions to enhance the
applicability of this work: (1) Can we extend the first-order ALP framework to
approximate policy iteration to address performance deficiencies of previous
approaches? (2) Can we automatically generate basis functions and evaluate
their impact on value function quality? (3) How can we decompose intractable
problems with universally quantified rewards into tractable subproblems? We
propose answers to these questions along with a number of novel optimizations
and provide a comparative empirical evaluation on logistics problems from the
ICAPS 2004 Probabilistic Planning Competition.
Comment: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)
Approximate Linear Programming for First-order MDPs
We introduce a new approximate solution technique for first-order Markov
decision processes (FOMDPs). Representing the value function linearly w.r.t. a
set of first-order basis functions, we compute suitable weights by casting the
corresponding optimization as a first-order linear program and show how
off-the-shelf theorem prover and LP software can be effectively used. This
technique allows one to solve FOMDPs independent of a specific domain
instantiation; furthermore, it allows one to determine bounds on approximation
error that apply equally to all domain instantiations. We apply this solution
technique to the task of elevator scheduling with a rich feature space and
multi-criteria additive reward, and demonstrate that it outperforms a number of
intuitive, heuristically guided policies.
Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)
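To make the formulation concrete at the propositional level (the paper works with first-order structures; this ground sketch on an invented three-state MDP shows only the LP being lifted): the value function is a weighted sum of basis functions, and the LP enforces V ≥ Bellman backup at every state-action pair.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
R = np.array([0.0, 0.0, 1.0])                              # reward per state
P = np.array([                                             # P[a, s, s']
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],   # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],   # action 1
])
Phi = np.array([[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])       # basis: constant + "progress"

# Objective: minimize the sum over states of V(s) = (Phi w)(s), i.e. c = 1^T Phi.
c = Phi.sum(axis=0)
# Constraints: (Phi w)(s) >= R(s) + gamma * sum_s' P(s'|s,a) (Phi w)(s'),
# rewritten as (gamma * P[a] @ Phi - Phi) w <= -R for every action a.
A_ub = np.vstack([gamma * P[a] @ Phi - Phi for a in range(2)])
b_ub = np.concatenate([-R, -R])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
print("weights:", res.x, "approx V:", Phi @ res.x)
```

The first-order version replaces the enumerated state constraints with quantified constraints discharged by a theorem prover, which is what makes the solution independent of the domain instantiation.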
Vector-space Analysis of Belief-state Approximation for POMDPs
We propose a new approach to value-directed belief state approximation for
POMDPs. The value-directed model allows one to choose approximation methods for
belief state monitoring that have a small impact on decision quality. Using a
vector space analysis of the problem, we devise two new search procedures for
selecting an approximation scheme that have much better computational
properties than existing methods. Though these provide looser error bounds, we
show empirically that they have a similar impact on decision quality in
practice, and run up to two orders of magnitude more quickly.
Comment: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)
Accelerating Reinforcement Learning through Implicit Imitation
Imitation can be viewed as a means of enhancing learning in multiagent
environments. It augments an agent's ability to learn useful behaviors by
making intelligent use of the knowledge implicit in behaviors demonstrated by
cooperative teachers or other more experienced agents. We propose and study a
formal model of implicit imitation that can accelerate reinforcement learning
dramatically in certain cases. Roughly, by observing a mentor, a
reinforcement-learning agent can extract information about its own capabilities
in, and the relative value of, unvisited parts of the state space. We study two
specific instantiations of this model, one in which the learning agent and the
mentor have identical abilities, and one designed to deal with agents and
mentors with different action sets. We illustrate the benefits of implicit
imitation by integrating it with prioritized sweeping, and demonstrating
improved performance and convergence through observation of single and multiple
mentors. Though we make some stringent assumptions regarding observability and
possible interactions, we briefly comment on extensions of the model that relax
these restrictions.
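A condensed sketch of the implicit-imitation mechanism in the identical-abilities case, as I read the abstract: the learner augments its Bellman backups with a value estimate derived from the mentor's observed state transitions (the mentor's actions themselves are never seen). All quantities below are illustrative.

```python
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.95
R = np.zeros(n_states)
R[-1] = 1.0                     # single goal state
V = np.zeros(n_states)

# Empirical transition counts from the learner's own experience and from
# watching the mentor, both Laplace-smoothed for this sketch.
own_counts = np.ones((n_states, n_actions, n_states))
mentor_counts = np.ones((n_states, n_states))

def backup(s):
    own_P = own_counts[s] / own_counts[s].sum(axis=1, keepdims=True)
    own_vals = R[s] + gamma * own_P @ V          # one value per own action
    mentor_P = mentor_counts[s] / mentor_counts[s].sum()
    mentor_val = R[s] + gamma * mentor_P @ V     # value of "do as the mentor does"
    # Implicit imitation: when abilities match, the mentor's observed behavior
    # yields a lower bound on achievable value in states the learner hasn't tried.
    return max(own_vals.max(), mentor_val)

for _ in range(100):                             # sweep until (near) convergence
    V = np.array([backup(s) for s in range(n_states)])
print(V)
```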
Regret-based Reward Elicitation for Markov Decision Processes
The specification of a Markov decision process (MDP) can be difficult. Reward
function specification is especially problematic; in practice, it is often
cognitively complex and time-consuming for users to precisely specify rewards.
This work casts the problem of specifying rewards as one of preference
elicitation and aims to minimize the degree of precision with which a reward
function must be specified while still allowing optimal or near-optimal
policies to be produced. We first discuss how robust policies can be computed
for MDPs given only partial reward information using the minimax regret
criterion. We then demonstrate how regret can be reduced by efficiently
eliciting reward information using bound queries, using regret-reduction as a
means for choosing suitable queries. Empirical results demonstrate that
regret-based reward elicitation offers an effective way to produce near-optimal
policies without resorting to the precise specification of the entire reward
function.
Comment: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)
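In the finite case, the minimax-regret criterion over partial reward information reduces to a small max-min computation. The candidate rewards and policy values below stand in for the feasible reward set and policy space of the paper.

```python
import numpy as np

# value[i, j] = value of policy i if the true reward is candidate j
value = np.array([[10.0, 2.0],
                  [ 6.0, 6.0],
                  [ 1.0, 9.0]])

best_per_reward = value.max(axis=0)     # best achievable under each reward
regret = best_per_reward - value        # regret[i, j] of policy i under reward j
max_regret = regret.max(axis=1)         # worst case over the feasible rewards
robust = int(np.argmin(max_regret))
print("minimax-regret policy:", robust, "regret:", max_regret[robust])
# A bound query that rules out a reward candidate shrinks the max over j,
# which is how elicitation reduces regret in the abstract's scheme.
```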