
    Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes

    Partially-observable Markov decision processes (POMDPs) are well suited to modeling real-world problems because they account for sensor and effector uncertainty. Unfortunately, that same uncertainty makes solving a POMDP computationally challenging. Traditional approaches, based on value iteration, can be slow because they compute optimal actions for every possible situation. With the help of the Fast Forward (FF) planner, FF-Replan and FF-Hindsight have shown success in quickly solving fully-observable Markov decision processes (MDPs) by solving classical planning translations of the problem. This thesis extends the concept of problem determinization to POMDPs by sampling action observations (much as FF-Replan samples action outcomes) and guiding the construction of policy trajectories with a conformant (as opposed to classical) planning heuristic. The resultant planner is called POND-Hindsight.
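    The core idea, sampling fixed futures and solving each determinized problem, can be sketched as follows. This is a minimal illustration, not the thesis's implementation; `sample_observation` and `solve_determinized` are hypothetical helpers standing in for the observation sampler and the conformant-heuristic-guided determinized solver.

    ```python
    def hindsight_action_value(belief, action, horizon, num_samples,
                               sample_observation, solve_determinized):
        """Estimate an action's value by sampling observation sequences
        ("futures") and solving each resulting deterministic problem."""
        total = 0.0
        for _ in range(num_samples):
            # Fix one future: a concrete observation outcome for each step.
            future = [sample_observation(step) for step in range(horizon)]
            # With observations fixed, the problem is deterministic and
            # can be scored by a classical or conformant planner.
            total += solve_determinized(belief, action, future)
        return total / num_samples

    def choose_action(belief, actions, horizon, num_samples,
                      sample_observation, solve_determinized):
        # Pick the action with the best average hindsight value.
        return max(actions, key=lambda a: hindsight_action_value(
            belief, a, horizon, num_samples,
            sample_observation, solve_determinized))
    ```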

    REBA: A Refinement-Based Architecture for Knowledge Representation and Reasoning in Robotics

    This paper describes an architecture for robots that combines the complementary strengths of probabilistic graphical models and declarative programming to represent and reason with logic-based and probabilistic descriptions of uncertainty and domain knowledge. An action language is extended to support non-Boolean fluents and non-deterministic causal laws. This action language is used to describe tightly-coupled transition diagrams at two levels of granularity, with a fine-resolution transition diagram defined as a refinement of a coarse-resolution transition diagram of the domain. The coarse-resolution system description, together with a history that includes (prioritized) defaults, is translated into an Answer Set Prolog (ASP) program. For any given goal, inference in the ASP program provides a plan of abstract actions. To implement each such abstract action, the robot automatically zooms to the part of the fine-resolution transition diagram relevant to that action. A probabilistic representation of the uncertainty in sensing and actuation is then included in this zoomed fine-resolution system description and used to construct a partially observable Markov decision process (POMDP). The policy obtained by solving the POMDP is invoked repeatedly to implement the abstract action as a sequence of concrete actions, with the corresponding observations recorded in the coarse-resolution history and used for subsequent reasoning. The architecture is evaluated in simulation and on a mobile robot moving objects in an indoor domain, showing that it supports reasoning with violated defaults, noisy observations, and unreliable actions in complex domains.
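    The coarse-to-fine control loop described above can be summarized schematically. This is a sketch of the architecture's flow, not the paper's code; every helper (`asp_plan`, `zoom`, `build_pomdp`, `solve_pomdp`, `execute_policy`) is an assumed stand-in for the corresponding component.

    ```python
    def reba_control_loop(goal, coarse_history,
                          asp_plan, zoom, build_pomdp,
                          solve_pomdp, execute_policy):
        """One pass of the coarse-to-fine loop: plan abstractly with ASP,
        then realize each abstract action via a zoomed POMDP policy."""
        # Coarse resolution: ASP inference yields a plan of abstract actions.
        for abstract_action in asp_plan(goal, coarse_history):
            # Zoom to the fine-resolution fragment relevant to this action.
            fragment = zoom(abstract_action)
            # Add probabilistic models of sensing and actuation, then
            # build and solve a POMDP over the zoomed fragment.
            policy = solve_pomdp(build_pomdp(fragment))
            # Execute concrete actions; record the outcome in the
            # coarse-resolution history for subsequent reasoning.
            coarse_history.append(execute_policy(policy))
        return coarse_history
    ```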

    Optimized Bacteria are Environmental Prediction Engines

    Get PDF
    Experimentalists have observed phenotypic variability in isogenic bacterial populations. We explore the hypothesis that in fluctuating environments this variability is tuned to maximize a bacterium's expected log growth rate, potentially aided by epigenetic markers that store information about past environments. We show that, in a complex, memoryful environment, the maximal expected log growth rate is linear in the instantaneous predictive information: the mutual information between a bacterium's epigenetic markers and future environmental states. Hence, under resource constraints, optimal epigenetic markers are causal states, the minimal sufficient statistics for prediction. These carry the minimal amount of information about the past needed to predict the future as well as possible. We suggest new theoretical investigations into, and new experiments on, bacterial phenotypic bet-hedging in fluctuating complex environments.
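    The linearity claim can be written schematically. The notation below is assumed rather than taken from the paper: M_t denotes a bacterium's epigenetic markers, E_{t+1:} the future environmental states, and c_0, c_1 environment-dependent constants.

    ```latex
    % Schematic form of the claim (notation assumed, not the paper's):
    % the maximal expected log growth rate is linear in the instantaneous
    % predictive information I[M_t ; E_{t+1:}].
    \max_{\text{strategies}} \mathbb{E}\big[\log(\text{growth rate})\big]
      \;=\; c_0 + c_1 \, I[M_t ; E_{t+1:}]
    ```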

    Contributions on complexity bounds for Deterministic Partially Observed Markov Decision Process

    Markov Decision Processes (Mdps) form a versatile framework used to model a wide range of optimization problems. The Mdp model consists of sets of states, actions, time steps, rewards, and probability transitions. When in a given state at a given time, the decision maker's action generates a reward and determines the state at the next time step according to the probability transition function. However, Mdps assume that the decision maker knows the state of the controlled dynamical system. Hence, to optimize controlled dynamical systems under partial observation, one often turns to the formalism of Partially Observed Markov Decision Processes (Pomdps). Pomdps are generally intractable, as Dynamic Programming suffers from the curse of dimensionality. Instead of focusing on general Pomdps, we present a subclass where the transition and observation mappings are deterministic: Deterministic Partially Observed Markov Decision Processes (Det-Pomdps). This subclass has been studied by Littman (1996) and Bonet (2009); Littman first considered it as a limit case of Pomdps, mainly to illustrate their complexity even with as few sources of uncertainty as possible. In this paper, we improve on Littman's complexity bounds. We then introduce and study an even simpler class, Separated Det-Pomdps, and give new complexity bounds for it. This new class of problems uses a property of the dynamics and observations to push back the curse of dimensionality.
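    What makes the Det-Pomdp subclass amenable to counting arguments is that deterministic dynamics keep the belief a finite set of candidate states rather than a probability distribution, so the reachable beliefs can be enumerated. A minimal sketch, assuming deterministic transition and observation maps f(s, a) and g(s) (our notation, not necessarily the paper's):

    ```python
    def det_pomdp_belief_update(belief, action, observation, f, g):
        """Belief update in a Det-Pomdp.

        Because the transition f(s, a) and observation g(s) are
        deterministic, the belief stays a plain set of states
        consistent with the history, not a probability distribution.
        """
        # Advance each candidate state, then keep only those whose
        # predicted observation matches the one actually received.
        return {f(s, action) for s in belief
                if g(f(s, action)) == observation}
    ```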