Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes
Partially-observable Markov decision processes (POMDPs) are especially good at modeling real-world problems because they allow for sensor and effector uncertainty. Unfortunately, such uncertainty makes solving a POMDP computationally challenging. Traditional approaches, which are based on value iteration, can be slow because they find optimal actions for every possible situation. With the help of the Fast Forward (FF) planner, FF-Replan and FF-Hindsight have shown success in quickly solving fully-observable Markov decision processes (MDPs) by solving classical planning translations of the problem. This thesis extends the concept of problem determinization to POMDPs by sampling action observations (similar to how FF-Replan samples action outcomes) and guiding the construction of policy trajectories with a conformant (as opposed to classical) planning heuristic. The resultant planner is called POND-Hindsight.
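To make the hindsight idea concrete, here is a minimal sketch for the fully observable case only. It is not the POND-Hindsight or FF-Hindsight implementation: the function names, the toy betting problem, and the closed-form "planner" are all assumptions made for illustration. Each sampled future fixes the random outcomes in advance, the resulting deterministic problem is solved, and the first action with the best average value is chosen.

```python
import random


def hindsight_action(state, actions, sample_future, solve_determinized,
                     num_samples=30, rng=None):
    """Pick the first action with the best average value over sampled futures."""
    rng = rng or random.Random(0)
    # Each sampled future fixes every random outcome in advance, which turns
    # the stochastic problem into a deterministic one.
    futures = [sample_future(rng) for _ in range(num_samples)]

    def estimate(action):
        # Average the deterministic optimum over all sampled futures.
        total = sum(solve_determinized(state, action, f) for f in futures)
        return total / len(futures)

    return max(actions, key=estimate)


# Toy problem (hypothetical): at each of HORIZON steps, "safe" pays 1 and
# "risky" pays 3 only if that step's coin flip comes up True.
HORIZON = 3
ACTIONS = ["safe", "risky"]


def payoff(action, coin):
    if action == "safe":
        return 1.0
    return 3.0 if coin else 0.0


def sample_coins(rng, p_heads=0.25):
    return [rng.random() < p_heads for _ in range(HORIZON)]


def solve_with_known_coins(state, first_action, coins):
    # `state` is the current step index. Once the coins are known, the best
    # later action at each step is simply the one with the larger payoff.
    value = payoff(first_action, coins[state])
    for coin in coins[state + 1:]:
        value += max(payoff(a, coin) for a in ACTIONS)
    return value


best = hindsight_action(0, ACTIONS, sample_coins, solve_with_known_coins)
```

Because every sampled future is solved with full knowledge of its outcomes, the averaged values are optimistic estimates; the sketch uses them only to rank first actions.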
REBA: A Refinement-Based Architecture for Knowledge Representation and Reasoning in Robotics
This paper describes an architecture for robots that combines the
complementary strengths of probabilistic graphical models and declarative
programming to represent and reason with logic-based and probabilistic
descriptions of uncertainty and domain knowledge. An action language is
extended to support non-boolean fluents and non-deterministic causal laws. This
action language is used to describe tightly-coupled transition diagrams at two
levels of granularity, with a fine-resolution transition diagram defined as a
refinement of a coarse-resolution transition diagram of the domain. The
coarse-resolution system description, and a history that includes (prioritized)
defaults, are translated into an Answer Set Prolog (ASP) program. For any given
goal, inference in the ASP program provides a plan of abstract actions. To
implement each such abstract action, the robot automatically zooms to the part
of the fine-resolution transition diagram relevant to this action. A
probabilistic representation of the uncertainty in sensing and actuation is
then included in this zoomed fine-resolution system description, and used to
construct a partially observable Markov decision process (POMDP). The policy
obtained by solving the POMDP is invoked repeatedly to implement the abstract
action as a sequence of concrete actions, with the corresponding observations
being recorded in the coarse-resolution history and used for subsequent
reasoning. The architecture is evaluated in simulation and on a mobile robot
moving objects in an indoor domain, to show that it supports reasoning with
violation of defaults, noisy observations and unreliable actions, in complex
domains.
Comment: 72 pages, 14 figures
Optimized Bacteria are Environmental Prediction Engines
Experimentalists have observed phenotypic variability in isogenic bacteria
populations. We explore the hypothesis that in fluctuating environments this
variability is tuned to maximize a bacterium's expected log growth rate,
potentially aided by epigenetic markers that store information about past
environments. We show that, in a complex, memoryful environment, the maximal
expected log growth rate is linear in the instantaneous predictive
information---the mutual information between a bacterium's epigenetic markers
and future environmental states. Hence, under resource constraints, optimal
epigenetic markers are causal states---the minimal sufficient statistics for
prediction. This is the minimal amount of information about the past needed to
predict the future as well as possible. We suggest new theoretical
investigations into and new experiments on bacteria phenotypic bet-hedging in
fluctuating complex environments.
Comment: 7 pages, 1 figure;
http://csc.ucdavis.edu/~cmg/compmech/pubs/obepe.ht
Contributions on complexity bounds for Deterministic Partially Observed Markov Decision Process
Markov Decision Processes (Mdps) form a versatile framework used to model a
wide range of optimization problems. The Mdp model consists of sets of states,
actions, time steps, rewards, and probability transitions. When in a given
state and at a given time, the decision maker's action generates a reward and
determines the state at the next time step according to the probability
transition function. However, Mdps assume that the decision maker knows the
state of the controlled dynamical system. Hence, when one needs to optimize
controlled dynamical systems under partial observation, one often turns toward
the formalism of Partially Observed Markov Decision Processes (Pomdp). Pomdps
are often intractable in the general case as Dynamic Programming suffers from
the curse of dimensionality. Instead of focusing on the general Pomdps, we
present a subclass where the transition and observation mappings are
deterministic: Deterministic Partially Observed Markov Decision Processes
(Det-Pomdp). That subclass of problems has been studied by (Littman, 1996) and
(Bonet, 2009). It was first considered as a limit case of Pomdps by Littman,
mainly used to illustrate the complexity of Pomdps when considering as few
sources of uncertainty as possible. In this paper, we improve on Littman's
complexity bounds. We then introduce and study an even simpler class: Separated
Det-Pomdps and give some new complexity bounds for this class. This new class
of problems uses a property of the dynamics and observation to push back the
curse of dimensionality.
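As an illustration of why deterministic transitions and observations help, the sketch below shows one belief update in a Det-Pomdp. It is an assumption-laden toy, not the paper's construction: the names `update_belief`, `step`, and `parity` are hypothetical, and the observation is assumed to depend on the state only. Since the transition is a function, the belief's support can only shrink or stay the same size after each step.

```python
from collections import defaultdict


def update_belief(belief, action, observation, transition, observe):
    """One belief update when transitions and observations are deterministic.

    belief: dict mapping state -> probability
    transition(state, action) -> next state (a function, no randomness)
    observe(state) -> observation (a function, no randomness)
    """
    # Push the belief forward: each state maps to exactly one successor,
    # so probability mass can merge but never spread out.
    pushed = defaultdict(float)
    for state, prob in belief.items():
        pushed[transition(state, action)] += prob

    # Keep only the successors that would produce the received observation.
    consistent = {s: p for s, p in pushed.items() if observe(s) == observation}
    total = sum(consistent.values())
    if total == 0.0:
        raise ValueError("observation is impossible under this belief")
    return {s: p / total for s, p in consistent.items()}


# Toy example (hypothetical): four positions on a ring, one action that moves
# one step clockwise, and a sensor that reports only the parity of the position.
def step(state, action):
    return (state + 1) % 4 if action == "move" else state


def parity(state):
    return state % 2


prior = {s: 0.25 for s in range(4)}
posterior = update_belief(prior, "move", 1, step, parity)
```

Here the only uncertainty is in the initial state, so the posterior keeps just the odd positions, each with probability 0.5.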