Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning
We pose an active perception problem where an autonomous agent actively
interacts with a second agent with potentially adversarial behaviors. Given the
uncertainty in the intent of the other agent, the objective is to collect
further evidence to help discriminate potential threats. The main technical
challenges are the partial observability of the agent's intent, the modeling of
the adversary, and the modeling of the corresponding uncertainty. Note that an
adversarial agent may act to mislead the autonomous agent by using a deceptive
strategy learned from past experience. We propose an approach that combines
belief space planning, generative adversary modeling, and maximum entropy
reinforcement learning to obtain a stochastic belief space policy. By
accounting for various adversarial behaviors in the simulation framework and
minimizing the predictability of the autonomous agent's action, the resulting
policy is more robust to unmodeled adversarial strategies. This improved
robustness is demonstrated empirically against an adversary that adapts to and
exploits the autonomous agent's policy, in comparison with a standard
Chance-Constrained Partially Observable Markov Decision Process robust approach.
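The point about minimizing predictability can be made concrete with a small sketch: a maximum-entropy formulation yields a stochastic policy that is a softmax over action values, where a temperature parameter (here called `alpha`, a name chosen for this sketch rather than taken from the paper) trades return against entropy. Higher temperature means a more random, harder-to-exploit policy.

```python
import numpy as np

def soft_policy(q_values, alpha=1.0):
    """Stochastic policy from action values under the maximum-entropy
    (soft) formulation: pi(a) is proportional to exp(Q(a) / alpha).
    Larger alpha -> higher entropy -> less predictable actions."""
    logits = np.asarray(q_values, dtype=float) / alpha
    logits -= logits.max()          # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

q = [1.0, 0.9, 0.2]
print(soft_policy(q, alpha=0.1))    # near-greedy: mass on the best action
print(soft_policy(q, alpha=5.0))    # near-uniform: hard for an adversary to predict
```

The low-temperature policy concentrates on the highest-value action, while the high-temperature policy spreads probability mass and so reveals less about the agent's value estimates.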
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability
A trustworthy reinforcement learning algorithm should be competent in solving
challenging real-world problems, including robustly handling uncertainties,
satisfying safety constraints to avoid catastrophic failures, and
generalizing to unseen scenarios during deployment. This study aims to
overview these main perspectives of trustworthy reinforcement learning
considering its intrinsic vulnerabilities on robustness, safety, and
generalizability. In particular, we give rigorous formulations, categorize
corresponding methodologies, and discuss benchmarks for each perspective.
Moreover, we provide an outlook section to spur promising future directions
with a brief discussion on extrinsic vulnerabilities considering human
feedback. We hope this survey can bring together separate threads of study
in a unified framework and promote the trustworthiness of reinforcement
learning.
Configurable Markov Decision Processes
In many real-world problems, there is the possibility to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision Processes (Conf-MDPs), to model this new type of interaction with the environment. Furthermore, we provide a new learning algorithm, Safe Policy-Model Iteration (SPMI), to jointly and adaptively optimize the policy and the environment configuration. After introducing our approach and deriving some theoretical results, we present an experimental evaluation on two illustrative problems to show the benefits of environment configurability on the performance of the learned policy.
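A toy illustration of the Conf-MDP idea (not the SPMI algorithm itself): score each candidate environment configuration by solving the MDP it induces, then keep the configuration-policy pair with the best value. The two-state chain and the `slip` parameter below are invented for this sketch.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=200):
    """P: (A, S, S) transition tensor, R: (S, A) rewards.
    Returns the value function and the greedy policy."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

def build_P(slip):
    """Toy 2-state chain; `slip` is the configurable environment parameter."""
    P = np.zeros((2, 2, 2))
    P[0] = [[1 - slip, slip], [slip, 1 - slip]]   # action 0: try to stay
    P[1] = [[slip, 1 - slip], [1 - slip, slip]]   # action 1: try to switch state
    return P

R = np.array([[0.0, 0.0], [1.0, 1.0]])            # state 1 is rewarding

# Jointly choose configuration and policy: evaluate each candidate slip value.
best = max(((value_iteration(build_P(s), R)[0].mean(), s)
            for s in (0.0, 0.1, 0.3)), key=lambda t: t[0])
print(best)  # the least-noisy configuration gives the highest value
```

This brute-force search over configurations conveys the interaction model; SPMI instead updates the policy and the model incrementally with safe-improvement guarantees.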
Distributionally Robust Optimization in Sequential Decision Making
Distributionally robust optimization (DRO) is an effective modeling paradigm for making optimal decisions under uncertainty, where distributional information about the random parameters in a problem of interest is hardly available at the time when decisions are made. DRO encompasses conventional modeling approaches such as stochastic programming and robust optimization for decision making under uncertainty. The former requires perfect or near-perfect knowledge about the statistics of the random parameters for accurate decision making, while the latter only assumes that the supports of the random parameters are known, which often leads to overly conservative solutions. DRO overcomes these concerns by optimizing the expected value or a risk measure of the worst-case distribution in a set of distributions where the true distribution is contained with high probability. In this dissertation, we apply the DRO techniques to various types of sequential decision-making models and explore the capability of the new models for producing reliable and also economic decisions under different settings of data-decision interactions.
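The worst-case idea can be sketched on a finite support: choose the decision minimizing the expected cost under the least favorable distribution in the ambiguity set. The newsvendor-style instance, cost coefficients, and candidate distributions below are invented for illustration, not taken from the dissertation.

```python
import numpy as np

# Ambiguity set: candidate distributions over three demand scenarios.
ambiguity_set = [np.array([0.5, 0.3, 0.2]),
                 np.array([0.3, 0.4, 0.3]),
                 np.array([0.2, 0.3, 0.5])]
demand = np.array([10.0, 20.0, 30.0])

def loss(order, d):
    """Newsvendor-style cost: overage costs 1/unit, underage costs 3/unit."""
    return np.maximum(order - d, 0) + 3 * np.maximum(d - order, 0)

def worst_case_cost(order):
    """Expected cost under the least favorable distribution in the set."""
    return max(float(p @ loss(order, demand)) for p in ambiguity_set)

orders = range(10, 31)
best_order = min(orders, key=worst_case_cost)
print(best_order, worst_case_cost(best_order))
```

Because the inner maximization hedges over all candidate distributions, the chosen order is less aggressive than a stochastic-programming solution under any single distribution, yet less conservative than robust optimization over the full support.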
In Chapter II, we consider a distributionally robust variant of a partially observable Markov decision process (POMDP) in which the transition-observation probabilities are uncertain. We assume that these parameters differ over time and are revealed at the end of each time step. We construct an algorithm that finds an optimal policy by iteratively updating upper and lower bounds on the value function. We demonstrate the use of the distributionally robust POMDP in an epidemic-control application in which the probability of true infection status is unknown and prevention and/or intervention decisions must be made sequentially and robustly with updated information. In Chapter III, we derive a Wasserstein-distance bound between the true and an empirical distribution when the states and actions of a dynamic sequential decision-making process are finite. We further apply the approach to a regret-based reinforcement learning problem that uses the principle of optimism under uncertainty, and compare the empirical performance of the optimal solution of our model with the conventional approach on instances of an ambulance dispatch problem. Finally, in Chapter IV, we focus on a multistage mixed-integer stochastic programming model and employ a dual decomposition algorithm for solving a distributionally robust variant of the model. We analyze the numerical performance on instances of a transmission expansion problem in power systems under uncertainty in loads and renewable generation capabilities.
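For the finite-support setting mentioned in Chapter III, the 1-Wasserstein distance between two distributions on a common sorted one-dimensional support has a simple closed form: the integral of the absolute gap between their CDFs. A sketch with toy numbers (not data from the dissertation):

```python
import numpy as np

def wasserstein_1d(p, q, support):
    """1-Wasserstein distance between two distributions p and q on a
    sorted finite 1-D support: integral of |CDF_p - CDF_q|."""
    support = np.asarray(support, dtype=float)
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(support)))

true_p    = np.array([0.25, 0.5, 0.25])
empirical = np.array([0.40, 0.4, 0.20])
print(wasserstein_1d(true_p, empirical, [0.0, 1.0, 2.0]))
```

A bound on this distance between the true and empirical distributions yields a data-driven ambiguity set: a Wasserstein ball around the empirical distribution that contains the true one with high probability.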
Overall, the contributions of this dissertation are threefold. First, we develop mathematical models of various distributionally robust sequential decision-making problems, some of which involve discrete decision variables and are generally NP-hard. Second, we derive efficient solution algorithms for the proposed models via relaxation and decomposition techniques. Third, we evaluate the performance of the solution approaches and their results via extensive numerical experiments based on epidemic control, healthcare, and energy applications. The models and solution algorithms developed in this work can be used by practitioners to solve a variety of sequential decision-making problems in different business contexts, and thus can generate significant societal and economic impact.
PhD dissertation, Industrial & Operations Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/167947/1/nakaoh_1.pd
Formal Methods for Autonomous Systems
Formal methods refer to rigorous, mathematical approaches to system
development and have played a key role in establishing the correctness of
safety-critical systems. The main building blocks of formal methods are models
and specifications, which are analogous to behaviors and requirements in system
design and give us the means to verify and synthesize system behaviors with
formal guarantees.
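As a minimal illustration of verifying a model against a specification (the finite transition system and state names below are invented for this sketch, not drawn from the monograph), reachability of a target set can be computed as a backward fixed point over the transition relation:

```python
def states_reaching(transitions, target):
    """Return the set of states from which some path reaches `target`
    (backward fixed-point computation over the transition relation)."""
    reach = set(target)
    changed = True
    while changed:
        changed = False
        for state, successors in transitions.items():
            if state not in reach and successors & reach:
                reach.add(state)
                changed = True
    return reach

# Hypothetical model: each state maps to its set of possible successors.
ts = {'init': {'work'},
      'work': {'work', 'done', 'error'},
      'done': set(),
      'error': set()}

print(states_reaching(ts, {'done'}))   # liveness target is reachable from 'init'
print(states_reaching(ts, {'error'}))  # safety check: 'error' is also reachable
```

Checking whether the initial state belongs to the computed set answers a reachability query; industrial model checkers implement far richer logics on the same fixed-point principle.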
This monograph provides a survey of the current state of the art on
applications of formal methods in the autonomous systems domain. We consider
correct-by-construction synthesis under various formulations, including closed
systems, reactive, and probabilistic settings. Beyond synthesizing systems in
known environments, we address the concept of uncertainty and bound the
behavior of systems that employ learning using formal methods. Further, we
examine the synthesis of systems with monitoring, a mitigation technique for
ensuring that once a system deviates from expected behavior, it knows a way of
returning to normalcy. We also show how to overcome some limitations of formal
methods themselves with learning. We conclude with future directions for formal
methods in reinforcement learning, uncertainty, privacy, explainability of
formal methods, and regulation and certification.