
    Energy Efficient Execution of POMDP Policies

    Recent advances in planning techniques for partially observable Markov decision processes have focused on online search techniques and offline point-based value iteration. While these techniques allow practitioners to obtain policies for fairly large problems, they assume that a non-negligible amount of computation can be done between each decision point. In contrast, the recent proliferation of mobile and embedded devices has led to a surge of applications that could benefit from state-of-the-art planning techniques if they can operate under severe constraints on computational resources. To that end, we describe two techniques to compile policies into controllers that can be executed by a mere table lookup at each decision point. The first approach compiles policies induced by a set of alpha vectors (such as those obtained by point-based techniques) into approximately equivalent controllers, while the second approach performs a simulation to compile arbitrary policies into approximately equivalent controllers. We also describe an approach to compress controllers by removing redundant and dominated nodes, often yielding smaller and yet better controllers. Further compression and higher value can sometimes be obtained by considering stochastic controllers. The compilation and compression techniques are demonstrated on benchmark problems as well as a mobile application that helps persons with Alzheimer's disease find their way. The battery consumption of several POMDP policies is compared against finite-state controllers learned using methods introduced in this paper. Experiments performed on the Nexus 4 phone show that finite-state controllers are the least battery-consuming POMDP policies.
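    To illustrate what "executed by a mere table lookup at each decision point" means in practice, the sketch below runs a finite-state controller as two lookup tables: an action per controller node and a successor node per (node, observation) pair. The table layout, class name, and the toy 2-node controller are illustrative assumptions, not the paper's exact compiled format.

```python
# Minimal sketch of executing a finite-state POMDP controller by table lookup.
# The table layout is an assumption for illustration purposes only.

class FiniteStateController:
    def __init__(self, action_of_node, successor, start_node=0):
        self.action_of_node = action_of_node  # node -> action
        self.successor = successor            # (node, observation) -> next node
        self.node = start_node

    def act(self):
        # Selecting the action is a single table lookup -- no belief update,
        # no alpha-vector maximization at run time.
        return self.action_of_node[self.node]

    def observe(self, observation):
        # Advancing the controller is also a single table lookup.
        self.node = self.successor[(self.node, observation)]


# Hypothetical 2-node controller for a toy problem with actions
# {"listen", "open-left"} and observations {"growl-left", "growl-right"}.
fsc = FiniteStateController(
    action_of_node={0: "listen", 1: "open-left"},
    successor={(0, "growl-left"): 0, (0, "growl-right"): 1,
               (1, "growl-left"): 0, (1, "growl-right"): 0},
)
a = fsc.act()               # "listen"
fsc.observe("growl-right")  # controller moves to node 1
```

    Because each decision reduces to constant-time dictionary lookups, this kind of execution is well suited to battery- and CPU-constrained devices, which is the motivation stated in the abstract.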

    Planification robuste pour la collaboration homme-robot (Robust Planning for Human-Robot Collaboration)

    From the robot's point of view, a major issue in human-robot collaboration is how to be robust against uncertain human objectives, and against uncertain human behaviors given a known objective. A key preliminary question is then: how can realistic human behaviors be derived given a known objective? Indeed, to allow for collaboration, such behaviors should also account for the robot's behavior, which is not known in the first place. In this paper, we rely on Markov decision models, representing the uncertainty over the human objective as a probability distribution over a finite set of reward functions (which induces a distribution over human behaviors). Based on this, we propose two contributions: 1. a robot planning algorithm that is robust to the uncertain human behavior and relies on solving a POMDP obtained by reasoning on the distribution over human behaviors; and 2. an approach to automatically generate an uncertain human behavior (a policy) for each provided reward function while accounting for the possible robot behavior. A co-working scenario allows us to conduct experiments and present qualitative and quantitative results to evaluate our approach.
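    The modelling idea at the core of this abstract, a distribution over a finite set of reward functions that induces a distribution over human policies, can be sketched as follows. The softmax policy derivation, the variable names, and the random Q-tables are illustrative assumptions standing in for the human MDP solutions; they are not the paper's exact algorithm.

```python
# Minimal sketch: uncertainty over the human objective as a distribution over
# reward functions, inducing a distribution over human behaviors (policies).
import numpy as np

def softmax_policy(q_values, temperature=1.0):
    """Turn a state-action value table into a stochastic human policy."""
    z = q_values / temperature
    z -= z.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# Finite set of candidate human reward functions with prior probabilities.
# In practice q_tables[i] would come from solving the MDP for reward i;
# here they are random placeholders for 4 states x 3 actions.
rng = np.random.default_rng(0)
prior = np.array([0.5, 0.3, 0.2])
q_tables = [rng.normal(size=(4, 3)) for _ in prior]

# Induced distribution over human behaviors: one policy per reward function,
# weighted by the prior. A robot planner can then reason over this mixture,
# e.g. by treating the human's reward function as hidden state in a POMDP.
human_policies = [softmax_policy(q) for q in q_tables]
expected_policy = sum(w * p for w, p in zip(prior, human_policies))
```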

    Quick and Automatic Selection of POMDP Implementations on Mobile Platform Based on Battery Consumption Estimation

    Partially Observable Markov Decision Processes (POMDPs) are widely used to model sequential decision making under uncertainty and incomplete knowledge of the environment. Solving them requires strong computational capability, so they are usually deployed on powerful machines. However, as mobile platforms become more advanced and more popular, combining POMDPs with mobile devices has been studied as a way to provide a broader range of services. A question comes with this trend: how should we implement POMDPs on mobile platforms so that we can take advantage of mobile features while avoiding mobile limitations such as short battery life, weak CPUs, unstable network connections, and other limited resources? In response, we first point out that the answer varies with the nature of the problem, the accuracy requirements, and the mobile device model. Rather than relying on pure mathematical analysis, our approach is to run experiments on a mobile device and concentrate on a more specific question: which POMDP implementation is the "best" for a particular problem on a particular kind of device? Second, we propose and justify a POMDP implementation criterion, based mainly on battery consumption, that quantifies the "goodness" of POMDP implementations in terms of the mobile battery depletion rate. Then, we present a mobile battery consumption model that translates CPU and WiFi usage into part of the battery depletion rate in order to greatly accelerate the experimental process. With this model, we combine a set of simple benchmark experiments with CPU and WiFi usage data from each POMDP implementation candidate to generate estimated battery depletion rates, as opposed to conducting hours of real battery experiments for each implementation individually. The final result is a ranking of POMDP implementations based on their estimated battery depletion rates, which serves as guidance for POMDP implementation selection by mobile developers. We develop a mobile software toolkit to automate the above process. Given basic POMDP problem specifications, a set of POMDP implementation candidates, and a simple press of the "start" button, the toolkit automatically performs benchmark experiments on the target device on which it is installed and records CPU and WiFi statistics for each POMDP implementation candidate. It then feeds the data to its embedded mobile battery consumption model and produces an estimated battery depletion rate for each candidate. Finally, the toolkit visualizes the ranking of POMDP implementations for mobile developers' reference. Evaluation is carried out by comparing the ranking obtained from estimated battery depletion rates against the ranking obtained from experimentally measured battery depletion rates. We observe the same ranking from both, as expected. Moreover, the cosine similarity between the estimated and experimental battery depletion rates is almost 0.999, where 1 indicates that they are identical.
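    The kind of battery model described above, translating CPU and WiFi usage into an estimated depletion rate so that implementation candidates can be ranked without hours-long battery runs, can be sketched as a simple linear model. The coefficients, the linear form, and the candidate names below are illustrative assumptions, not the fitted model from this work.

```python
# Minimal sketch of a battery consumption model: CPU and WiFi usage are mapped
# to an estimated battery depletion rate, and candidate POMDP implementations
# are ranked by that estimate. All numbers here are placeholders.

def estimated_depletion_rate(cpu_utilisation, wifi_bytes_per_s,
                             cpu_coeff=0.9, wifi_coeff=4e-7, idle_rate=0.05):
    """Return an estimated battery depletion rate (fraction of battery per hour)."""
    return (idle_rate
            + cpu_coeff * cpu_utilisation      # cost of on-device computation
            + wifi_coeff * wifi_bytes_per_s)   # cost of network traffic

# Hypothetical POMDP implementation candidates with measured CPU and WiFi usage.
candidates = {
    "offline-controller":    {"cpu": 0.02, "wifi": 0.0},
    "onboard-online-search": {"cpu": 0.60, "wifi": 0.0},
    "server-offloaded":      {"cpu": 0.05, "wifi": 2.0e5},
}
ranking = sorted(candidates,
                 key=lambda name: estimated_depletion_rate(
                     candidates[name]["cpu"], candidates[name]["wifi"]))
print(ranking)  # least battery-consuming candidate first
```

    A usage note: in the workflow described in the abstract, the CPU and WiFi figures would come from short benchmark runs of each candidate on the target device, and the resulting ranking would be compared against ground-truth battery measurements for validation.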

    Journal Track Paper Abstracts

    This edited compilation contains abstracts of the journal paper summaries presented at the 25th ICAPS conference Journal Paper Track. Papers include Simple Regret Optimization in Online Planning for Markov Decision Processes, by Zohar Feldman and Carmel Domshlak; Energy Efficient Execution of POMDP Policies, by Marek Grzes, Pascal Poupart, Xiao Yang, and Jesse Hoey; Envisioning the Qualitative Effects of Robot Manipulation Actions Using Simulation-Based Projections, by Lars Kunze and Michael Beetz; and Distributed Heuristic Forward Search for Multi-Agent Planning, by Raz Nissim and Ronen Brafman.