4,562 research outputs found
Perseus: Randomized Point-based Value Iteration for POMDPs
Partially observable Markov decision processes (POMDPs) form an attractive
and principled framework for agent planning under uncertainty. Point-based
approximate techniques for POMDPs compute a policy based on a finite set of
points collected in advance from the agents belief space. We present a
randomized point-based value iteration algorithm called Perseus. The algorithm
performs approximate value backup stages, ensuring that in each backup stage
the value of each point in the belief set is improved; the key observation is
that a single backup may improve the value of many belief points. Contrary to
other point-based methods, Perseus backs up only a (randomly selected) subset
of points in the belief set, sufficient for improving the value of each belief
point in the set. We show how the same idea can be extended to dealing with
continuous action spaces. Experimental results show the potential of Perseus in
large scale POMDP problems
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs
Strategy Synthesis for Autonomous Agents Using PRISM
We present probabilistic models for autonomous agent search and retrieve missions derived from Simulink models for an Unmanned Aerial Vehicle (UAV) and show how probabilistic model checking and the probabilistic model checker PRISM can be used for optimal controller generation. We introduce a sequence of scenarios relevant to UAVs and other autonomous agents such as underwater and ground vehicles. For each scenario we demonstrate how it can be modelled using the PRISM language, give model checking statistics and present the synthesised optimal controllers. We conclude with a discussion of the limitations when using probabilistic model checking and PRISM in this context and what steps can be taken to overcome them. In addition, we consider how the controllers can be returned to the UAV and adapted for use on larger search areas
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.Comment: See http://www.jair.org/ for any accompanying file
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
asks to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level must remain
positive in all steps till the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, developing on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that automatically extracts
important decisions of the policy allowing us to compute succinct human
readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.Comment: Technical report accompanying a paper published in proceedings of
AAMAS 201
Observer-based correct-by-design controller synthesis
Current state-of-the-art correct-by-design controllers are designed for
full-state measurable systems. This work first extends the applicability of
correct-by-design controllers to partially observable LTI systems. Leveraging
2nd order bounds we give a design method that has a quantifiable robustness to
probabilistic disturbances on state transitions and on output measurements. In
a case study from smart buildings we evaluate the new output-based
correct-by-design controller on a physical system with limited sensor
information
Online algorithms for POMDPs with continuous state, action, and observation spaces
Online solvers for partially observable Markov decision processes have been
applied to problems with large discrete state spaces, but continuous state,
action, and observation spaces remain a challenge. This paper begins by
investigating double progressive widening (DPW) as a solution to this
challenge. However, we prove that this modification alone is not sufficient
because the belief representations in the search tree collapse to a single
particle causing the algorithm to converge to a policy that is suboptimal
regardless of the computation time. This paper proposes and evaluates two new
algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using
weighted particle filtering. Simulation results show that these modifications
allow the algorithms to be successful where previous approaches fail.Comment: Added Multilane sectio
- …