Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning
We pose an active perception problem where an autonomous agent actively
interacts with a second agent with potentially adversarial behaviors. Given the
uncertainty in the intent of the other agent, the objective is to collect
further evidence to help discriminate potential threats. The main technical
challenges are the partial observability of the agent intent, the adversary
modeling, and the corresponding uncertainty modeling. Note that an adversary
agent may act to mislead the autonomous agent by using a deceptive strategy
that is learned from past experiences. We propose an approach that combines
belief space planning, generative adversary modeling, and maximum entropy
reinforcement learning to obtain a stochastic belief space policy. By
accounting for various adversarial behaviors in the simulation framework and
minimizing the predictability of the autonomous agent's action, the resulting
policy is more robust to unmodeled adversarial strategies. This improved
robustness is empirically shown against an adversary that adapts to and
exploits the autonomous agent's policy when compared with a standard
Chance-Constraint Partially Observable Markov Decision Process robust approach.
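As a rough illustration of why a maximum entropy objective makes an agent's actions harder to predict, the sketch below builds a Boltzmann (softmax) policy over a set of action values and measures its entropy. It is not the method of the paper above: the action values, the temperature parameter, and the function names are all hypothetical, chosen only to show the trade-off between greedy and stochastic action selection.

```python
import math

def soft_policy(q_values, temperature=1.0):
    """Boltzmann (softmax) distribution over actions for given action values.

    The temperature is a hypothetical knob: higher values flatten the
    distribution and so make the chosen action less predictable.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical action values for three candidate actions.
q = [1.0, 0.5, 0.0]
near_greedy = soft_policy(q, temperature=0.05)
stochastic = soft_policy(q, temperature=5.0)

print("near-greedy policy:", near_greedy, "entropy:", entropy(near_greedy))
print("stochastic policy: ", stochastic, "entropy:", entropy(stochastic))
```

Both policies still rank the actions in the same order, but the higher-temperature one spreads probability across them, which is the property an adapting adversary would find harder to exploit.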
Learning environment properties in Partially Observable Monte Carlo Planning
We tackle the problem of learning state-variable relationships in Partially Observable Markov Decision Processes to improve planning performance on mobile robots. The proposed approach extends Partially Observable Monte Carlo Planning (POMCP) and represents state-variable relationships with Markov Random Fields. A ROS-based implementation of the approach is proposed and evaluated in RockSample, a standard benchmark for probabilistic planning under uncertainty. Experiments were performed in simulation with Gazebo. Results show that the proposed approach effectively learns state-variable probabilistic constraints on ROS-based robotic platforms and uses them in subsequent episodes to outperform standard POMC
Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision making problems are partially observable
by nature, and the environment model is typically unknown. Consequently, there
is great need for reinforcement learning methods that can tackle such problems
given only a stream of incomplete and noisy observations. In this paper, we
propose deep variational reinforcement learning (DVRL), which introduces an
inductive bias that allows an agent to learn a generative model of the
environment and perform inference in that model to effectively aggregate the
available information. We develop an n-step approximation to the evidence lower
bound (ELBO), allowing the model to be trained jointly with the policy. This
ensures that the latent state representation is suitable for the control task.
In experiments on Mountain Hike and flickering Atari we show that our method
outperforms previous approaches relying on recurrent neural networks to encode
the past.
Navigation between states in ecological communities by taking shortcuts, with application to control
Many community ecology problems can be framed in terms of controlling the
transition from an initial state to a desired state. However, it is often
unclear what action sequence (if any) would yield the desired state. Here we
develop a simple approach for navigating to desired states, applicable when the
costs and outcomes of actions are known. We find lowest-cost action sequences
(adding a species, removing a species, changing the environment, waiting) via
A* search on a state diagram. Lowest-cost sequences are usually indirect and
leverage waiting for natural transitions caused by competitive exclusion. In
tests on simulated and empirical data across taxa, our approach provides ~50%
probability of substantial cost improvement relative to nominal approaches. As
an example, numerous successes are predicted in gut microbial communities for
removing the pathogen Clostridium difficile. This work thus provides a
conceptual foundation for efficient state transitions in species-rich
communities.
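The search described in the abstract above can be sketched on a toy example. The state diagram below is entirely hypothetical (three species A, B, C with made-up action costs), and the heuristic defaults to zero, in which case A* reduces to uniform-cost search. It is meant only to show how an indirect sequence that includes waiting can be cheaper than the direct action.

```python
import heapq
from itertools import count

def a_star(graph, start, goal, heuristic=lambda state: 0.0):
    """Return (total_cost, actions) for a lowest-cost path, or None.

    graph maps a state to a list of (action, next_state, cost) tuples.
    The heuristic must never overestimate the remaining cost.
    """
    tie = count()  # tie-breaker so the heap never compares states or paths
    frontier = [(heuristic(start), next(tie), 0.0, start, [])]
    best = {start: 0.0}
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        if cost > best.get(state, float("inf")):
            continue  # stale entry; a cheaper route was already found
        for action, nxt, step_cost in graph.get(state, []):
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(nxt), next(tie), new_cost, nxt, path + [action]),
                )
    return None

# Hypothetical state diagram: each state names the species present.
# Waiting in "ABC" lets C competitively exclude B (an assumed natural transition).
graph = {
    "AB": [("remove B", "A", 5.0), ("add C", "ABC", 1.0)],
    "ABC": [("wait", "AC", 0.5)],
    "AC": [("remove C", "A", 1.0)],
}

result = a_star(graph, "AB", "A")
print(result)  # (2.5, ['add C', 'wait', 'remove C'])
```

In this toy diagram the direct removal of B costs 5.0, while adding C, waiting, and then removing C costs 2.5 in total.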
Sensor Control for Information Gain in Dynamic, Sparse and Partially Observed Environments
We present an approach for autonomous sensor control for information
gathering under partially observable, dynamic and sparsely sampled
environments. We consider the problem of controlling a sensor that makes
partial observations in some space of interest such that it maximizes
information about entities present in that space. We describe our approach for
the task of Radio-Frequency (RF) spectrum monitoring, where the goal is to
search for and track unknown, dynamic signals in the environment. To this end,
we develop and demonstrate enhancements of the Deep Anticipatory Network (DAN)
Reinforcement Learning (RL) framework that uses prediction and information-gain
rewards to learn information-maximization policies in reward-sparse
environments. We also extend this problem to situations in which taking samples
from the actual RF spectrum/field is limited and expensive, and propose a
model-based version of the original RL algorithm that fine-tunes the controller
using a model of the environment that is iteratively improved from limited
samples taken from the RF field. Our approach was thoroughly validated by
testing against baseline expert-designed controllers in simulated RF
environments of different complexity, using different reward schemes and
evaluation metrics. The results show that our system outperforms the standard
DAN architecture and is more flexible and robust than several hand-coded
agents. We also show that our approach is adaptable to non-stationary
environments where the agent has to learn to adapt to changes from the emitting
sources.
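To make the notion of an information-gain reward concrete, the sketch below scores a single sensor reading by how much it reduces the entropy of a belief over where a signal is located. This is a generic Bayesian illustration, not the DAN framework or the authors' reward: the four frequency bands, the detection likelihoods, and all function names are assumptions made for the example.

```python
import math

def entropy(belief):
    """Shannon entropy (in nats) of a discrete belief."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)

def bayes_update(belief, likelihoods):
    """Posterior over hypotheses given per-hypothesis observation likelihoods."""
    unnormalized = [p * l for p, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def information_gain(belief, likelihoods):
    """Entropy reduction produced by one observation, plus the posterior."""
    posterior = bayes_update(belief, likelihoods)
    return entropy(belief) - entropy(posterior), posterior

# Hypothetical setup: a signal sits in one of four bands, prior is uniform.
prior = [0.25, 0.25, 0.25, 0.25]
# The sensor tunes to band 2 and reports a detection. Assumed sensor model:
# P(detect | signal in band 2) = 0.9, P(detect | signal elsewhere) = 0.1.
detect_likelihoods = [0.1, 0.1, 0.9, 0.1]

gain, posterior = information_gain(prior, detect_likelihoods)
print("posterior:", posterior)
print("information gain (nats):", gain)
```

A controller trained on a reward like this is encouraged to point the sensor where a reading is expected to be most informative, even when task reward is sparse.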