80 research outputs found

    Hilbert Space Embeddings of POMDPs

    A nonparametric approach to policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes' rule to these distribution embeddings. Policies and value functions are defined on the feature space over states, which leads to a feature space expression for the Bellman equation. Value iteration may then be used to estimate the optimal value function and associated policy. Experimental results confirm that the correct policy is learned using the feature space representation.
    Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)
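
    The kernel Bayes' rule step above admits a compact sample-based form. Below is a minimal sketch of its simpler relative, the kernel conditional mean embedding, where the posterior embedding is a weighted sum of feature maps of training states; the RBF kernel, toy data, and regularizer value are our illustrative assumptions, not the paper's setup.
```python
# Minimal sketch: kernel conditional mean embedding, a simplified
# relative of the full kernel Bayes' rule used in the paper.
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    """Gram matrix k(a_i, b_j) for an RBF kernel."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Toy joint samples (x_i = latent state, y_i = observation).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y = X + 0.3 * rng.normal(size=(200, 1))     # noisy observations

G_Y = rbf_gram(Y, Y)
lam = 1e-3                                  # ridge regularizer (assumed)
y_new = np.array([[0.5]])                   # observed value

# Weights alpha such that the embedding of P(X | Y = y_new)
# is approximated by sum_i alpha_i * phi(x_i).
alpha = np.linalg.solve(G_Y + len(Y) * lam * np.eye(len(Y)),
                        rbf_gram(Y, y_new)).ravel()

# Any expectation E[f(X) | Y = y_new] becomes a weighted sum:
posterior_mean = alpha @ X                  # here f(x) = x
print(posterior_mean)
```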

    A New Distribution-Free Concept for Representing, Comparing, and Propagating Uncertainty in Dynamical Systems with Kernel Probabilistic Programming

    This work presents the concepts of kernel mean embedding and kernel probabilistic programming in the context of stochastic systems. We propose formulations to represent, compare, and propagate uncertainties for fairly general stochastic dynamics in a distribution-free manner. The new tools enjoy sound theory rooted in functional analysis and wide applicability, as demonstrated in distinct numerical examples. The implication of this new concept is a new mode of thinking about the statistical nature of uncertainty in dynamical systems.
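
    As a hedged illustration of the distribution-free comparison the abstract mentions: the squared maximum mean discrepancy (MMD) between two sample sets is the RKHS distance between their kernel mean embeddings. The sample sources and kernel width below are invented for the example.
```python
# Sketch: comparing two uncertainty distributions via the MMD,
# the distance between their kernel mean embeddings.
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples."""
    return (rbf_gram(X, X, sigma).mean()
            - 2 * rbf_gram(X, Y, sigma).mean()
            + rbf_gram(Y, Y, sigma).mean())

rng = np.random.default_rng(1)
# States of a stochastic system at some time t under two dynamics
# (toy stand-ins for propagated uncertainties).
samples_a = rng.normal(0.0, 1.0, size=(500, 1))
samples_b = rng.normal(0.5, 1.0, size=(500, 1))
print(mmd2(samples_a, samples_b))   # clearly > 0: distributions differ
```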

    Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

    We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play similar roles to classical value functions in fully observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain a PAC result, which implies that our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states and Bellman completeness holds. Finally, we extend our methods to the learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.
    Comment: This paper was accepted at NeurIPS 2023
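
    Schematically (our notation, not necessarily the paper's exact conventions), the future-dependent Bellman equation can be written as a conditional moment restriction with history proxies as instruments, and estimation proceeds by a regularized minimax criterion:
```latex
% Schematic only; V, F_t, H_t, \mu and the discounting convention are
% our illustrative notation, not verbatim from the paper.
\begin{align*}
  % Conditional moment restriction: the history proxy H_t serves as an
  % instrument; \mu is an importance ratio for the evaluation policy.
  \mathbb{E}\!\left[\mu(O_t, A_t)\bigl(R_t + \gamma\, V(F_{t+1})\bigr)
      - V(F_t) \,\middle|\, H_t\right] &= 0, \\[4pt]
  % Minimax estimator over function classes \mathcal{V} and \Xi:
  \widehat{V} \in \arg\min_{V \in \mathcal{V}} \max_{\xi \in \Xi}\;
  \mathbb{E}_n\!\left[\xi(H_t)\Bigl\{\mu(O_t, A_t)\bigl(R_t
      + \gamma\, V(F_{t+1})\bigr) - V(F_t)\Bigr\}\right]
      - \lambda \|\xi\|_{\Xi}^2 .
\end{align*}
```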

    Reinforcement Learning in Non-Markovian Environments

    Following the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation inspired by classical stochastic control that reduces the problem to the recursive computation of approximate sufficient statistics.
    Comment: 15 pages, submitted to Systems & Control Letters
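
    A minimal sketch of the recursive-statistic idea: rather than conditioning on the full history, the agent maintains a statistic updated recursively from each action/observation pair, and the policy and value function read only that statistic. The fixed linear-tanh update below is a placeholder assumption, not the paper's construction of approximate sufficient statistics.
```python
# Sketch: the agent state s_t is a recursively computed statistic
# that replaces the full action/observation history.
import numpy as np

def update_statistic(s, a, o, W_s, W_a, W_o):
    """s_{t+1} = tanh(W_s s_t + W_a a_t + W_o o_{t+1}) (placeholder g)."""
    return np.tanh(W_s @ s + W_a @ a + W_o @ o)

rng = np.random.default_rng(2)
dim_s, dim_a, dim_o = 8, 2, 4
W_s = rng.normal(scale=0.3, size=(dim_s, dim_s))
W_a = rng.normal(scale=0.3, size=(dim_s, dim_a))
W_o = rng.normal(scale=0.3, size=(dim_s, dim_o))

s = np.zeros(dim_s)
for t in range(100):                      # dummy interaction loop
    a = rng.normal(size=dim_a)            # action from some policy
    o = rng.normal(size=dim_o)            # next observation
    s = update_statistic(s, a, o, W_s, W_a, W_o)
# s now summarizes the history; downstream computations use only s.
```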

    Deep reinforcement learning for attacking wireless sensor networks

    Recent advances in Deep Reinforcement Learning allow solving increasingly complex problems. In this work, we show how current defense mechanisms in Wireless Sensor Networks are vulnerable to attacks that use these advances. We use a Deep Reinforcement Learning attacker architecture that supports one or more attacking agents that learn to attack using only partial observations. We then subject our architecture to a test bench consisting of two defense mechanisms: one against a distributed spectrum sensing attack and one against a backoff attack. Our simulations show that our attacker learns to exploit these systems without a priori information about the defense mechanism used or its concrete parameters. Since our attacker requires minimal hyper-parameter tuning, scales with the number of attackers, and learns only by interacting with the defense mechanism, it poses a significant threat to current defense procedures.
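
    As a toy illustration of the learning setup (not the paper's architecture), the sketch below trains a tabular Q-learning attacker that sees only a coarse partial observation of the channel. The "defense" environment and its reward structure are invented for the example; the paper instead uses deep RL against concrete WSN defense mechanisms.
```python
# Sketch: an attacker that learns purely from interaction, observing
# only a coarse (partial) channel state. Environment is invented.
import numpy as np

rng = np.random.default_rng(3)
N_OBS, N_ACT = 4, 2          # coarse channel observation; {wait, attack}
Q = np.zeros((N_OBS, N_ACT))
eps, alpha, gamma = 0.1, 0.1, 0.95

def step(obs, act):
    """Toy defense: attacking a busy-looking channel risks a penalty."""
    busy = obs >= 2
    if act == 1:                            # attack
        reward = -1.0 if (busy and rng.random() < 0.7) else 1.0
    else:                                   # wait
        reward = 0.0
    return rng.integers(N_OBS), reward      # next partial observation

obs = rng.integers(N_OBS)
for t in range(20_000):
    act = rng.integers(N_ACT) if rng.random() < eps else int(Q[obs].argmax())
    nxt, r = step(obs, act)
    Q[obs, act] += alpha * (r + gamma * Q[nxt].max() - Q[obs, act])
    obs = nxt
print(Q)   # attacker learns to attack only when the channel looks idle
```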

    Learning Causal State Representations of Partially Observable Environments

    Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose mechanisms to approximate causal states, which optimally compress the joint history of actions and observations in partially observable Markov decision processes. Our proposed algorithm extracts causal state representations from RNNs that are trained to predict subsequent observations given the history. We demonstrate that these learned task-agnostic state abstractions can be used to efficiently learn policies for reinforcement learning problems with rich observation spaces. We evaluate our agents on multiple partially observable navigation tasks, with both discrete (GridWorld) and continuous (VizDoom, ALE) observation processes, that cannot be solved by traditional memory-limited methods. Our experiments demonstrate systematic improvements of DQN and tabular models that use approximate causal state representations over recurrent-DQN baselines trained on raw inputs.
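
    A minimal sketch of the predictive-RNN recipe the abstract describes: train a GRU to predict the next observation from the action/observation history, then reuse its hidden state as an approximate causal state for a downstream policy such as DQN. Dimensions, the random training data, and hyperparameters are illustrative assumptions.
```python
# Sketch: GRU trained for next-observation prediction; its hidden
# state approximates the causal state of the POMDP.
import torch
import torch.nn as nn

obs_dim, act_dim, hid = 8, 3, 32
rnn = nn.GRU(obs_dim + act_dim, hid, batch_first=True)
head = nn.Linear(hid, obs_dim)                # next-observation predictor
opt = torch.optim.Adam([*rnn.parameters(), *head.parameters()], lr=1e-3)

# Dummy trajectories (batch, time, features); real data comes from the env.
acts = torch.randn(16, 50, act_dim)
obss = torch.randn(16, 51, obs_dim)

for epoch in range(5):
    inp = torch.cat([acts, obss[:, :-1]], dim=-1)
    states, _ = rnn(inp)                      # approximate causal states
    loss = ((head(states) - obss[:, 1:]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# states.detach() would now feed a DQN in place of raw observations.
```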