
17 research outputs found

Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning

Authors: How Jonathan P; Shen Macheng
Publication date: 18/09/2019

We pose an active perception problem in which an autonomous agent actively interacts with a second agent whose behavior is potentially adversarial. Given the uncertainty in the intent of the other agent, the objective is to collect further evidence to help discriminate potential threats. The main technical challenges are the partial observability of the agent intent, the adversary modeling, and the corresponding uncertainty modeling. Note that an adversary agent may act to mislead the autonomous agent by using a deceptive strategy that is learned from past experiences. We propose an approach that combines belief space planning, generative adversary modeling, and maximum entropy reinforcement learning to obtain a stochastic belief space policy. By accounting for various adversarial behaviors in the simulation framework and minimizing the predictability of the autonomous agent's actions, the resulting policy is more robust to unmodeled adversarial strategies. This improved robustness is shown empirically against an adversary that adapts to and exploits the autonomous agent's policy, in comparison with a standard Chance-Constraint Partially Observable Markov Decision Process robust approach.

arXiv.org e-Print Archive

Crossref

DSpace@MIT
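
To illustrate the idea of a stochastic, hard-to-predict policy mentioned in the abstract above, here is a minimal sketch of a softmax (Boltzmann) policy, which is the maximizer of expected value plus temperature-weighted entropy in the single-step case. It is not the authors' implementation: the function names, the Q-values, and the temperature settings are all assumptions chosen for illustration.

```python
import math

def softmax_policy(q_values, temperature=1.0):
    """Action probabilities proportional to exp(Q / temperature).

    Hypothetical helper: higher temperature gives a more random
    (less predictable) policy; lower temperature approaches greedy.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Illustrative Q-values for three actions (made up for this sketch).
q = [1.0, 0.5, 0.0]
near_greedy = softmax_policy(q, temperature=0.05)
stochastic = softmax_policy(q, temperature=2.0)

print("near-greedy:", near_greedy, "entropy:", entropy(near_greedy))
print("stochastic: ", stochastic, "entropy:", entropy(stochastic))
```

The higher-temperature policy spreads probability across actions, so an observer who has learned the agent's preferences can predict its next action less reliably.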

Learning environment properties in Partially Observable Monte Carlo Planning

Authors: A. Castellini; A. Farinelli; E. Marchesini; M. Piccinelli; M. Zuccotto
Publication date: 01/01/2022

We tackle the problem of learning state-variable relationships in Partially Observable Markov Decision Processes to improve planning performance on mobile robots. The proposed approach extends Partially Observable Monte Carlo Planning (POMCP) and represents state-variable relationships with Markov Random Fields. A ROS-based implementation of the approach is proposed and evaluated in RockSample, a standard benchmark for probabilistic planning under uncertainty. Experiments have been performed in simulation with Gazebo. Results show that the proposed approach makes it possible to learn state-variable probabilistic constraints effectively on ROS-based robotic platforms and to use them in subsequent episodes to outperform standard POMCP.

Catalogo dei prodotti della ricerca

Deep Variational Reinforcement Learning for POMDPs

Authors: Igl Maximilian; Le Tuan Anh; Whiteson Shimon; Wood Frank; Zintgraf Luisa
Publication date: 01/01/2018

Many real-world sequential decision-making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is great need for reinforcement learning methods that can tackle such problems given only a stream of incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information. We develop an n-step approximation to the evidence lower bound (ELBO), allowing the model to be trained jointly with the policy. This ensures that the latent state representation is suitable for the control task. In experiments on Mountain Hike and flickering Atari, we show that our method outperforms previous approaches that rely on recurrent neural networks to encode the past.

arXiv.org e-Print Archive

Oxford University Research Archive

Navigation between states in ecological communities by taking shortcuts, with application to control

Authors: Blonder Benjamin W.; Lim Michael H.; Sunberg Zachary; Tomlin Claire
Publication date: 15/04/2022

Many community ecology problems can be framed in terms of controlling the transition from an initial state to a desired state. However, it is often unclear what action sequence (if any) would yield the desired state. Here we develop a simple approach for navigating to desired states, applicable when the costs and outcomes of actions are known. We find lowest-cost action sequences (adding a species, removing a species, changing the environment, waiting) via A* search on a state diagram. Lowest-cost sequences are usually indirect and leverage waiting for natural transitions caused by competitive exclusion. In tests on simulated and empirical data across taxa, our approach provides ~50% probability of substantial cost improvement relative to nominal approaches. As an example, numerous successes are predicted in gut microbial communities for removing the pathogen Clostridium difficile. This work thus provides a conceptual foundation for efficient state transitions in species-rich communities.

arXiv.org e-Print Archive
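
The lowest-cost action-sequence search described in the abstract above (for the ecological-communities paper) can be sketched as a generic A* search over a state diagram. This is only an illustrative sketch, not the authors' code: the toy states, action costs, and the `a_star` function are hypothetical, and the heuristic defaults to zero, which reduces A* to uniform-cost search.

```python
import heapq

def a_star(graph, start, goal, heuristic=lambda state: 0.0):
    """Return (total_cost, actions) for the cheapest path, or None.

    graph maps a state to a list of (action, next_state, cost) tuples.
    The heuristic must never overestimate the remaining cost.
    """
    frontier = [(heuristic(start), 0.0, start, [])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        if cost > best_cost.get(state, float("inf")):
            continue  # stale entry: a cheaper route was already found
        for action, nxt, step_cost in graph.get(state, []):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(nxt), new_cost, nxt, path + [action]),
                )
    return None

# Hypothetical state diagram: letters are species, "*" marks an altered
# environment. All costs are made up for illustration.
toy_graph = {
    "ABC": [("remove B", "AC", 5.0), ("change environment", "ABC*", 2.0)],
    "AC": [("remove C", "A", 5.0)],
    "ABC*": [("wait", "A*", 1.0)],  # competitive exclusion does the work
    "A*": [("change environment", "A", 2.0)],
}

result = a_star(toy_graph, "ABC", "A")
print(result)
```

In this toy diagram the direct route (two removals) costs 10.0, while the indirect route that waits for a natural transition costs 5.0, so the search returns the indirect sequence.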

Sensor Control for Information Gain in Dynamic, Sparse and Partially Observed Environments

Authors: Burns J. Brian; Sadhu Vidyasagar; Sequeira Pedro; Sundaresan Aravind
Publication date: 02/11/2022

We present an approach for autonomous sensor control for information gathering in partially observable, dynamic, and sparsely sampled environments. We consider the problem of controlling a sensor that makes partial observations in some space of interest such that it maximizes information about entities present in that space. We describe our approach for the task of Radio-Frequency (RF) spectrum monitoring, where the goal is to search for and track unknown, dynamic signals in the environment. To this end, we develop and demonstrate enhancements of the Deep Anticipatory Network (DAN) Reinforcement Learning (RL) framework that uses prediction and information-gain rewards to learn information-maximization policies in reward-sparse environments. We also extend this problem to situations in which taking samples from the actual RF spectrum/field is limited and expensive, and propose a model-based version of the original RL algorithm that fine-tunes the controller using a model of the environment that is iteratively improved from limited samples taken from the RF field. Our approach was thoroughly validated by testing against baseline expert-designed controllers in simulated RF environments of different complexity, using different reward schemes and evaluation metrics. The results show that our system outperforms the standard DAN architecture and is more flexible and robust than several hand-coded agents. We also show that our approach is adaptable to non-stationary environments where the agent has to learn to adapt to changes in the emitting sources.

arXiv.org e-Print Archive
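
The notion of an information-gain reward mentioned in the sensor-control abstract above can be illustrated with a small sketch: the reward is taken here as the reduction in entropy of a discrete belief after a Bayesian update. This is an assumption made for illustration and is not the reward definition used in the DAN framework; the channel count, likelihood values, and function names are all hypothetical.

```python
import math

def entropy_bits(belief):
    """Shannon entropy (in bits) of a discrete belief."""
    return -sum(p * math.log2(p) for p in belief if p > 0)

def bayes_update(belief, likelihood):
    """Posterior proportional to prior * likelihood of the observation."""
    unnormalized = [p * l for p, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the belief")
    return [u / total for u in unnormalized]

def information_gain(prior, likelihood):
    """Entropy reduction achieved by one observation, plus the posterior."""
    posterior = bayes_update(prior, likelihood)
    return entropy_bits(prior) - entropy_bits(posterior), posterior

# Hypothetical setup: a signal sits in one of four channels and the sensor
# reports a detection that is most likely if the signal is in channel 0.
prior = [0.25, 0.25, 0.25, 0.25]
detection_likelihood = [0.9, 0.1, 0.1, 0.1]

gain, posterior = information_gain(prior, detection_likelihood)
print("posterior:", posterior)
print("information gain (bits):", gain)
```

Under this definition, an observation that is equally likely in every channel yields zero reward, and the realized gain for a single surprising observation can even be negative; only its expectation over observations is guaranteed to be non-negative.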