Simple Regret Optimization in Online Planning for Markov Decision Processes
We consider online planning in Markov decision processes (MDPs). In online
planning, the agent focuses on its current state only, deliberates about the
set of possible policies from that state onwards and, when interrupted, uses
the outcome of that exploratory deliberation to choose what action to perform
next. The performance of algorithms for online planning is assessed in terms of
simple regret, which is the agent's expected performance loss when the chosen
action, rather than an optimal one, is followed.
To date, state-of-the-art algorithms for online planning in general MDPs are
either best-effort or guarantee only polynomial-rate reduction of simple
regret over time. Here we introduce a new Monte-Carlo tree search algorithm,
BRUE, that guarantees exponential-rate reduction of simple regret and error
probability. This algorithm is based on a simple yet non-standard state-space
sampling scheme, MCTS2e, in which different parts of each sample are dedicated
to different exploratory objectives. Our empirical evaluation shows that BRUE
not only provides superior performance guarantees, but is also very effective
in practice and compares favorably with the state of the art. We then extend BRUE
with a variant of "learning by forgetting." The resulting set of algorithms,
BRUE(alpha), generalizes BRUE, improves the exponential factor in the upper
bound on its reduction rate, and exhibits even more attractive empirical
performance.
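To make the simple-regret criterion concrete, here is a minimal Python sketch on a single decision with two actions (effectively a two-armed bandit at the root). It is not BRUE or MCTS2e: the round-robin sample allocation, the function names `simple_regret` and `uniform_then_recommend`, and the Bernoulli action values are hypothetical choices made only for illustration.

```python
import random
from typing import Callable, Dict, List


def simple_regret(q_values: Dict[str, float], chosen: str) -> float:
    """Expected loss from following `chosen` instead of an optimal action."""
    return max(q_values.values()) - q_values[chosen]


def uniform_then_recommend(
    actions: List[str], sample: Callable[[str], float], budget: int
) -> str:
    """Spend `budget` samples round-robin over `actions`, then recommend
    the action with the highest empirical mean reward."""
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for i in range(budget):
        a = actions[i % len(actions)]
        totals[a] += sample(a)
        counts[a] += 1
    # max(..., 1) guards against division by zero for unsampled actions.
    return max(actions, key=lambda a: totals[a] / max(counts[a], 1))


if __name__ == "__main__":
    rng = random.Random(0)
    true_q = {"a": 0.5, "b": 0.7}  # hypothetical action values

    def bernoulli(action: str) -> float:
        return 1.0 if rng.random() < true_q[action] else 0.0

    for budget in (10, 100, 1000):
        runs = 200
        total = sum(
            simple_regret(true_q, uniform_then_recommend(list(true_q), bernoulli, budget))
            for _ in range(runs)
        )
        print(f"budget={budget:5d}  mean simple regret={total / runs:.3f}")
```

Averaging `simple_regret` over repeated runs, as in the loop at the bottom, estimates the expected simple regret for each sample budget. The metric scores only the final recommendation, not the rewards collected while exploring, which is what distinguishes it from cumulative regret.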
Structure and Complexity in Planning with Unary Operators
Unary operator domains -- i.e., domains in which operators have a single
effect -- arise naturally in many control problems. In its most general form,
the problem of STRIPS planning in unary operator domains is known to be as hard
as the general STRIPS planning problem -- both are PSPACE-complete. However,
unary operator domains induce a natural structure, called the domain's causal
graph. This graph relates the preconditions and the effect of each domain
operator. Causal graphs were exploited by Williams and Nayak in order to
analyze plan generation for one of the controllers in NASA's Deep-Space One
spacecraft. There, they utilized the fact that when this graph is acyclic, a
serialization ordering over any subgoal can be obtained quickly. In this paper
we conduct a comprehensive study of the relationship between the structure of a
domain's causal graph and the complexity of planning in this domain. On the
positive side, we show that a non-trivial polynomial-time plan generation
algorithm exists for domains whose causal graph induces a polytree with a
constant bound on its node indegree. On the negative side, we show that even
plan existence is hard when the graph is a directed-path singly connected DAG.
More generally, we show that the number of paths in the causal graph is closely
related to the complexity of planning in the associated domain. Finally, we
relate our results to the complexity of planning with serializable
subgoals.
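The causal graph described above can be built mechanically from the operator set. The sketch below is an illustration under assumptions, not the paper's planning algorithm: it encodes each unary operator as a (preconditions, single effect) pair, adds an edge p -> v whenever p is a precondition of an operator that affects v, and checks the structural properties the abstract discusses (acyclicity, polytree shape, node indegree). The encoding and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A unary STRIPS-style operator: (preconditions, single effect).
Operator = Tuple[Dict[str, bool], Tuple[str, bool]]
Graph = Dict[str, Set[str]]


def causal_graph(ops: List[Operator]) -> Graph:
    """Edge p -> v whenever p is a precondition of an operator whose effect is on v."""
    succ: Graph = {}
    for pre, (eff_var, _value) in ops:
        succ.setdefault(eff_var, set())
        for p in pre:
            succ.setdefault(p, set())
            if p != eff_var:  # a precondition on the effect variable is not an edge
                succ[p].add(eff_var)
    return succ


def indegrees(g: Graph) -> Dict[str, int]:
    deg = {v: 0 for v in g}
    for v in g:
        for w in g[v]:
            deg[w] += 1
    return deg


def is_acyclic(g: Graph) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    deg = indegrees(g)
    stack = [v for v, d in deg.items() if d == 0]
    removed = 0
    while stack:
        v = stack.pop()
        removed += 1
        for w in g[v]:
            deg[w] -= 1
            if deg[w] == 0:
                stack.append(w)
    return removed == len(g)


def is_polytree(g: Graph) -> bool:
    """A DAG whose underlying undirected graph has no cycle (a forest of polytrees)."""
    if not is_acyclic(g):
        return False
    root = {v: v for v in g}

    def find(v: str) -> str:
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    for v in g:
        for w in g[v]:
            rv, rw = find(v), find(w)
            if rv == rw:
                return False  # this edge closes an undirected cycle
            root[rv] = rw
    return True
```

Recognizing that a domain falls into the polytree class with bounded indegree is cheap with checks like these; the paper's contribution is the plan generation algorithm for that class, which this sketch does not attempt.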
Landmarks, Critical Paths and Abstractions: What's the Difference Anyway?
Current heuristic estimators for classical domain-independent planning are usually based on one of four ideas: delete relaxation, abstraction, critical paths, and, most recently, landmarks.
Previously, these different ideas for deriving heuristic functions were largely unconnected. In my talk, I will show that these heuristics are in fact very closely related. Moreover, I will introduce a new admissible heuristic called the landmark cut heuristic, which exploits this relationship. In our experiments, the landmark cut heuristic provides better estimates than other current admissible planning heuristics, especially on large problem instances.
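As a concrete reference point for two of the ideas named above, delete relaxation and critical paths, the sketch below computes h_max, which ignores delete effects and is the m = 1 member of the critical-path family h^m. The landmark cut heuristic builds on h_max values when it looks for its cuts, but this code does not implement the landmark cut heuristic itself; the `Action` tuple and the example task are hypothetical.

```python
import math
from typing import FrozenSet, Iterable, NamedTuple


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    cost: float = 1.0  # no delete list: the relaxation ignores deletes


def h_max(state: Iterable[str], goal: Iterable[str], actions: Iterable[Action]) -> float:
    """Admissible estimate: the cost of the costliest goal atom, where an atom's cost
    is the cheapest achiever's cost plus that achiever's costliest precondition."""
    actions = list(actions)
    h = {p: 0.0 for p in state}
    changed = True
    while changed:  # Bellman-Ford-style fixed point; converges for non-negative costs
        changed = False
        for a in actions:
            if all(p in h for p in a.pre):
                reach = a.cost + max((h[p] for p in a.pre), default=0.0)
                for q in a.add:
                    if reach < h.get(q, math.inf):
                        h[q] = reach
                        changed = True
    # Unreachable goal atoms keep cost infinity, so the estimate is infinite.
    return max((h.get(g, math.inf) for g in goal), default=0.0)


if __name__ == "__main__":
    acts = [
        Action("a-to-b", frozenset({"a"}), frozenset({"b"})),
        Action("b-to-c", frozenset({"b"}), frozenset({"c"})),
        Action("a-to-c", frozenset({"a"}), frozenset({"c"}), cost=3.0),
    ]
    print(h_max({"a"}, {"b", "c"}, acts))
```

Replacing the inner `max` with a sum would give the (inadmissible) additive heuristic instead; keeping the `max` is what makes this estimate admissible.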
Probabilistic Planning via Heuristic Forward Search and Weighted Model Counting
We present a new algorithm for probabilistic planning with no observability.
Our algorithm, called Probabilistic-FF, extends the heuristic forward-search
machinery of Conformant-FF to problems with probabilistic uncertainty about
both the initial state and action effects. Specifically, Probabilistic-FF
combines Conformant-FF's techniques with powerful machinery for weighted model
counting in (weighted) CNFs, serving to elegantly define both the search space
and the heuristic function. Our evaluation of Probabilistic-FF shows that it
scales well across a range of probabilistic domains, improving on previous
results in this area by several orders of magnitude. We use a
problematic case to point out the main open issue to be addressed by further
research.
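Weighted model counting sums, over all satisfying assignments of a CNF, the product of the weights of the literals in each assignment. The brute-force sketch below is exponential in the number of variables and is meant only to show what the count computes; practical systems use dedicated model counters. The DIMACS-style clause encoding, the `(weight_if_true, weight_if_false)` convention, and the toy encoding of an uncertain initial fact and a probabilistic action outcome are assumptions, not Probabilistic-FF's actual encoding.

```python
from itertools import product
from typing import Dict, List, Tuple

Clause = List[int]  # DIMACS-style literals: +v means "v is true", -v means "v is false"
Weights = Dict[int, Tuple[float, float]]  # var -> (weight if true, weight if false)


def wmc(clauses: List[Clause], weights: Weights, n_vars: int) -> float:
    """Sum, over all satisfying assignments, of the product of literal weights.
    Variables missing from `weights` get weight 1 for both values."""
    total = 0.0
    for bits in product((False, True), repeat=n_vars):
        assign = {v: bits[v - 1] for v in range(1, n_vars + 1)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            w = 1.0
            for v, val in assign.items():
                w_true, w_false = weights.get(v, (1.0, 1.0))
                w *= w_true if val else w_false
            total += w
    return total


# Hypothetical toy encoding:
#   x1: an initially uncertain fact, true with probability 0.7
#   x2: a probabilistic action outcome, succeeding with probability 0.9
#   x3: the goal, constrained by x3 <-> (x1 and x2)
WEIGHTS: Weights = {1: (0.7, 0.3), 2: (0.9, 0.1)}
PHI: List[Clause] = [[-3, 1], [-3, 2], [3, -1, -2]]

if __name__ == "__main__":
    # Probability that the goal holds: WMC(phi and goal) / WMC(phi).
    print(wmc(PHI + [[3]], WEIGHTS, 3) / wmc(PHI, WEIGHTS, 3))
```

Because x3 is fully determined by x1 and x2 and carries weight 1 either way, WMC(phi) is 1 here, and conditioning on the goal clause leaves only the assignment in which both the uncertain fact and the probabilistic outcome hold.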
CP-nets: A Tool for Representing and Reasoning with Conditional Ceteris Paribus Preference Statements
Information about user preferences plays a key role in automated decision
making. In many domains it is desirable to assess such preferences in a
qualitative rather than quantitative way. In this paper, we propose a
qualitative graphical representation of preferences that reflects conditional
dependence and independence of preference statements under a ceteris paribus
(all else being equal) interpretation. Such a representation is often compact
and arguably quite natural in many circumstances. We provide a formal semantics
for this model, and describe how the structure of the network can be exploited
in several inference tasks, such as determining whether one outcome dominates
(is preferred to) another, ordering a set of outcomes according to the preference
relation, and constructing the best outcome subject to available evidence.
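For an acyclic network, one way to construct the best outcome is a forward sweep: visit the variables in a topological order and give each its most preferred value given the values already chosen for its parents, keeping any evidence variables fixed. The sketch below assumes Python 3.9+ for `graphlib`; the `CPNet` class, its method names, and the dinner example are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Tuple

# For each parent assignment (values in the order given by `parents`), a CPT
# lists the variable's values from most to least preferred.
CPT = Dict[Tuple[str, ...], List[str]]


class CPNet:
    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = {}
        self.cpts: Dict[str, CPT] = {}

    def add(self, var: str, parents: List[str], cpt: CPT) -> None:
        self.parents[var] = parents
        self.cpts[var] = cpt

    def best_outcome(self, evidence: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        """Forward sweep over an acyclic CP-net; evidence variables keep their values.
        TopologicalSorter raises graphlib.CycleError if the network is cyclic."""
        evidence = evidence or {}
        graph = {v: set(ps) for v, ps in self.parents.items()}
        outcome: Dict[str, str] = {}
        for var in TopologicalSorter(graph).static_order():  # parents come first
            if var in evidence:
                outcome[var] = evidence[var]
            else:
                context = tuple(outcome[p] for p in self.parents[var])
                outcome[var] = self.cpts[var][context][0]
        return outcome


if __name__ == "__main__":
    net = CPNet()
    net.add("main", [], {(): ["fish", "meat"]})
    net.add("wine", ["main"], {("fish",): ["white", "red"], ("meat",): ["red", "white"]})
    print(net.best_outcome())
    print(net.best_outcome({"main": "meat"}))
```

Outcome construction is the easy case; dominance testing between two arbitrary outcomes is considerably harder in general and is not covered by this sketch.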
Graphically structured value-function compilation
Classical work on eliciting and representing preferences over multi-attribute alternatives has attempted to recognize conditions under which value functions take on a particularly simple and compact form, making their elicitation much easier. In this paper we consider preferences over discrete domains, and show that for a certain class of simple and intuitive qualitative preference statements, one can always generate compact value functions consistent with these statements. These value functions maintain the independence structure implicit in the original statements. For discrete domains, these representation theorems are much more general than previous results. However, we also show that it is not always possible to maintain this compact structure if we add explicit ordering constraints among the available outcomes.
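One way to see how a compact value function can respect such statements is the following sketch for an acyclic CP-net; it is an illustrative construction, not necessarily the one used in the paper. Each variable X gets a factor that depends only on X and its parents: a weight w_X times the rank of X's value in the CPT row selected by its parents. Choosing w_X = 1 + Σ_c w_c (|D_c| - 1), summed over X's children c, means that improving X raises its own term by at least w_X while the children's terms can fall by at most w_X - 1, so every improving flip increases the total value. The domains, CPTs, and function names are hypothetical.

```python
from functools import lru_cache
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical acyclic CP-net: each CPT row lists values from most to least preferred.
DOMAINS: Dict[str, List[str]] = {"main": ["fish", "meat"], "wine": ["white", "red"]}
PARENTS: Dict[str, List[str]] = {"main": [], "wine": ["main"]}
CPT: Dict[str, Dict[Tuple[str, ...], List[str]]] = {
    "main": {(): ["fish", "meat"]},
    "wine": {("fish",): ["white", "red"], ("meat",): ["red", "white"]},
}


@lru_cache(maxsize=None)
def weight(var: str) -> int:
    """w_X = 1 + sum over children c of w_c * (|D_c| - 1); recursion ends at leaves."""
    children = [c for c, ps in PARENTS.items() if var in ps]
    return 1 + sum(weight(c) * (len(DOMAINS[c]) - 1) for c in children)


def value(outcome: Dict[str, str]) -> int:
    """V(o) = sum over X of w_X * rank of o[X] in the CPT row selected by o's parents.
    The best value in a row gets rank |D_X| - 1, the worst gets 0."""
    total = 0
    for var, ps in PARENTS.items():
        row = CPT[var][tuple(outcome[p] for p in ps)]
        total += weight(var) * (len(row) - 1 - row.index(outcome[var]))
    return total


def respects_all_flips() -> bool:
    """Check every outcome and every single-variable flip to a more preferred value."""
    names = list(DOMAINS)
    for vals in product(*(DOMAINS[n] for n in names)):
        o = dict(zip(names, vals))
        for var in names:
            row = CPT[var][tuple(o[p] for p in PARENTS[var])]
            for better in row[: row.index(o[var])]:
                if value({**o, var: better}) <= value(o):
                    return False
    return True


if __name__ == "__main__":
    for vals in product(*DOMAINS.values()):
        o = dict(zip(DOMAINS, vals))
        print(o, value(o))
    print("all improving flips increase V:", respects_all_flips())
```

The brute-force check in `respects_all_flips` is only feasible for tiny networks, but the weight argument above holds for any acyclic CP-net of this plain ceteris paribus form. As the abstract notes, adding explicit ordering constraints between outcomes can rule out such compact structure.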