Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-sum Objectives
Partially-observable Markov decision processes (POMDPs) with discounted-sum
payoff are a standard framework to model a wide range of problems related to
decision making under uncertainty. Traditionally, the goal has been to obtain
policies that optimize the expectation of the discounted-sum payoff. A key
drawback of the expectation measure is that even low probability events with
extreme payoff can significantly affect the expectation, and thus the obtained
policies are not necessarily risk-averse. An alternative approach is to optimize
the probability that the payoff is above a certain threshold, which yields
risk-averse policies but ignores optimization of the expectation. We
consider the expectation optimization with probabilistic guarantee (EOPG)
problem, where the goal is to optimize the expectation ensuring that the payoff
is above a given threshold with at least a specified probability. We present
several results on the EOPG problem, including the first algorithm to solve it.
Comment: Full version of a paper published at IJCAI/ECAI 201
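The EOPG criterion described above can be illustrated with a minimal Monte Carlo sketch: among candidate policies, keep only those whose estimated probability of exceeding the payoff threshold meets the required level, then pick the feasible policy with the highest estimated expectation. The names (`eopg_select`, `simulate`) and the sampling-based evaluation are illustrative assumptions, not the paper's algorithm.

```python
def discounted_payoff(rewards, gamma=0.95):
    """Discounted sum of a reward sequence (the payoff being optimized)."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

def eopg_select(policies, simulate, threshold, alpha, n_samples=1000):
    """Among policies whose payoff exceeds `threshold` with estimated
    probability at least `alpha`, return the one with the highest
    estimated expected payoff; None if no policy is feasible.
    `simulate(pi)` is a hypothetical rollout returning one sampled payoff."""
    best, best_mean = None, float("-inf")
    for pi in policies:
        payoffs = [simulate(pi) for _ in range(n_samples)]
        p_above = sum(p >= threshold for p in payoffs) / n_samples
        mean = sum(payoffs) / n_samples
        if p_above >= alpha and mean > best_mean:
            best, best_mean = pi, mean
    return best
```

Note how a high-variance policy with a large expectation can be rejected here: its probability of falling below the threshold disqualifies it even if its mean dominates, which is exactly the risk-aversion the EOPG formulation adds over plain expectation maximization.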
Constrained Hierarchical Monte Carlo Belief-State Planning
Optimal plans in Constrained Partially Observable Markov Decision Processes
(CPOMDPs) maximize reward objectives while satisfying hard cost constraints,
generalizing safe planning under state and transition uncertainty.
Unfortunately, online CPOMDP planning is extremely difficult in large or
continuous problem domains. In many large robotic domains, hierarchical
decomposition can simplify planning by using tools for low-level control given
high-level action primitives (options). We introduce Constrained Options Belief
Tree Search (COBeTS) to leverage this hierarchy and scale online search-based
CPOMDP planning to large robotic problems. We show that if primitive option
controllers are defined to satisfy assigned constraint budgets, then COBeTS
will satisfy constraints anytime. Otherwise, COBeTS will guide the search
towards a safe sequence of option primitives, and hierarchical monitoring can
be used to achieve runtime safety. We demonstrate COBeTS in several
safety-critical, constrained partially observable robotic domains, showing that
it can plan successfully in continuous CPOMDPs while non-hierarchical baselines
cannot.
Comment: Under review for the 2024 IEEE International Conference on Robotics
and Automation (ICRA)
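The anytime constraint-satisfaction claim above can be sketched in a few lines: if each executed option is handed a cost budget carved out of the remaining total, and each primitive controller respects its budget, cumulative cost never exceeds the total budget at any point. This is a hedged toy model with made-up option records and a greedy stand-in for tree search, not the COBeTS implementation.

```python
def run_constrained_options(options, total_budget, horizon):
    """Execute options until the horizon or the budget is exhausted.
    Each option receives a budget slice from the remaining total; its
    controller is assumed to keep its cost within that slice, which
    makes the running total safe at every step (the anytime property)."""
    remaining = total_budget
    total_cost = 0.0
    trace = []
    for _ in range(horizon):
        if remaining <= 0:
            break
        # Stand-in for belief-tree search: pick the highest-value option.
        opt = max(options, key=lambda o: o["value"])
        budget = min(opt["requested_budget"], remaining)
        cost = min(opt["cost"], budget)  # controller respects its budget
        total_cost += cost
        remaining -= budget
        trace.append((opt["name"], cost))
        assert total_cost <= total_budget  # invariant holds at every step
    return trace, total_cost
```

The design point mirrored here is that constraint satisfaction is delegated downward: the high-level search only partitions the budget, so safety holds even if the search is cut off early.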
Learning environment properties in Partially Observable Monte Carlo Planning
We tackle the problem of learning state-variable relationships in Partially Observable Markov Decision Processes to improve planning performance on mobile robots. The proposed approach extends Partially Observable Monte Carlo Planning (POMCP) and represents state-variable relationships with Markov Random Fields. A ROS-based implementation of the approach is proposed and evaluated in rocksample, a standard benchmark for probabilistic planning under uncertainty. Experiments have been performed in simulation with Gazebo. Results show that the proposed approach effectively learns state-variable probabilistic constraints on ROS-based robotic platforms and uses them in subsequent episodes to outperform standard POMCP.
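One way the learned Markov Random Field could inform a particle-based planner like POMCP is by reweighting belief particles with pairwise potentials over state variables, so that particles violating learned constraints lose weight. The function below is a minimal sketch under that assumption; the `pairwise` potential table and its indexing scheme are hypothetical, not the paper's data structures.

```python
def mrf_reweight(particles, pairwise):
    """Reweight belief particles (tuples of state-variable values) by
    pairwise MRF potentials. `pairwise[(i, j)][(vi, vj)]` scores the
    joint assignment of variables i and j; unseen pairs get a small
    default weight. Returns normalized weights."""
    weights = []
    for x in particles:
        w = 1.0
        for (i, j), potential in pairwise.items():
            w *= potential.get((x[i], x[j]), 1e-6)
        weights.append(w)
    z = sum(weights)
    if z == 0:
        return [1.0 / len(particles)] * len(particles)
    return [w / z for w in weights]
```

In a rocksample-style setting, the variables would be rock values and the potentials would encode correlations learned in earlier episodes, biasing the belief toward consistent configurations.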
Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms typically assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of some
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.
Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17)