
138 research outputs found

Planning in Partially Observable Domains with Fuzzy Epistemic States and Probabilistic Dynamics

Author: Drougard Nicolas
Dubois Didier
Farges Jean-Loup
Teichteil-Königsbuch Florent
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2015

A new translation from a Partially Observable MDP (POMDP) into a Fully Observable MDP is described here. Unlike the classical translation, the resulting problem has a finite state space, so MDP solvers can solve this simplified version of the initial partially observable problem: the approach encodes agent beliefs as possibility distributions over states, leading to an MDP whose state space is a finite set of epistemic states. After a short description of the POMDP framework and of notions from Possibility Theory, the translation is described formally, with semantic arguments. The actual computations of this transformation are then detailed, so as to take full advantage of the factored structure of the initial POMDP in the size reduction and structure of the final MDP. Finally, the size reduction and tractability of the resulting MDP are illustrated on a simple POMDP problem.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Deep Variational Reinforcement Learning for POMDPs

Author: Igl Maximilian
Le Tuan Anh
Whiteson Shimon
Wood Frank
Zintgraf Luisa
Publication date: 01/01/2018

Many real-world sequential decision-making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is great need for reinforcement learning methods that can tackle such problems given only a stream of incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information. We develop an n-step approximation to the evidence lower bound (ELBO), allowing the model to be trained jointly with the policy. This ensures that the latent state representation is suitable for the control task. In experiments on Mountain Hike and flickering Atari, we show that our method outperforms previous approaches relying on recurrent neural networks to encode the past.

arXiv.org e-Print Archive

Oxford University Research Archive

Data exfiltration detection and prevention: Virtually distributed POMDPs for practically safer networks

Author: MANADHATA Pratyusa
MC CARTHY Sara Marie
SINHA Arunesh
TAMBE Milind
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/11/2016

Institutional Knowledge at Singapore Management University

Influence of State-Variable Constraints on Partially Observable Monte Carlo Planning

Author: Castellini A.
Chalkiadakis Georgios
Farinelli A.
Publication venue: 'International Joint Conferences on Artificial Intelligence'
Publication date: 01/01/2019

Online planning methods for partially observable Markov decision processes (POMDPs) have recently gained much interest. In this paper, we propose the introduction of prior knowledge in the form of (probabilistic) relationships among discrete state-variables, for online planning based on the well-known POMCP algorithm. In particular, we propose the use of hard constraint networks and probabilistic Markov random fields to formalize state-variable constraints and we extend the POMCP algorithm to take advantage of these constraints. Results on a case study based on Rocksample show that the usage of this knowledge provides significant improvements to the performance of the algorithm. The extent of this improvement depends on the amount of knowledge encoded in the constraints and reaches 50% of the average discounted return in the most favorable cases that we analyzed.

Crossref

Catalogo dei prodotti della ricerca