    Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning

    We pose an active perception problem in which an autonomous agent actively interacts with a second agent whose behavior may be adversarial. Given the uncertainty about the other agent's intent, the objective is to collect further evidence that helps discriminate potential threats. The main technical challenges are the partial observability of the agent's intent, adversary modeling, and the corresponding uncertainty modeling. Note that an adversarial agent may act to mislead the autonomous agent by using a deceptive strategy learned from past experience. We propose an approach that combines belief space planning, generative adversary modeling, and maximum entropy reinforcement learning to obtain a stochastic belief space policy. By accounting for various adversarial behaviors in the simulation framework and minimizing the predictability of the autonomous agent's actions, the resulting policy is more robust to unmodeled adversarial strategies. This improved robustness is shown empirically against an adversary that adapts to and exploits the autonomous agent's policy, in comparison with a standard chance-constrained partially observable Markov decision process (POMDP) robust approach.
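
    As a point of reference for the kind of objective involved, the sketch below writes the standard entropy-regularized (maximum entropy) RL objective over belief states; the symbols (belief b_t, temperature alpha, discount gamma) are generic notation assumed for illustration, not taken from the paper.

        \pi^{\star} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\,\sum_{t} \gamma^{t}\Big(r(b_t, a_t) \;+\; \alpha\,\mathcal{H}\big(\pi(\cdot \mid b_t)\big)\Big)\right],
        \qquad
        \mathcal{H}\big(\pi(\cdot \mid b)\big) \;=\; -\sum_{a}\pi(a \mid b)\,\log\pi(a \mid b)

    The entropy term rewards keeping the policy stochastic, which is what makes the agent's actions less predictable to an adapting adversary.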

    Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability

    A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including robustly handling uncertainties, satisfying safety constraints to avoid catastrophic failures, and generalizing to unseen scenarios at deployment. This study surveys these main perspectives of trustworthy reinforcement learning with respect to its intrinsic vulnerabilities in robustness, safety, and generalizability. In particular, we give rigorous formulations, categorize the corresponding methodologies, and discuss benchmarks for each perspective. Moreover, we provide an outlook section that points to promising future directions, with a brief discussion of extrinsic vulnerabilities arising from human feedback. We hope this survey brings separate threads of study together in a unified framework and promotes the trustworthiness of reinforcement learning.
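
    For orientation, two of the standard formulations a survey of this kind typically covers are the robust MDP objective (worst case over an uncertainty set of transition models) and the constrained MDP criterion for safety; the notation below (uncertainty set U, cost c, budget d) is generic and assumed here, not the survey's own.

        \text{robustness:}\quad \max_{\pi}\ \min_{P \in \mathcal{U}}\ \mathbb{E}_{\pi,P}\Big[\textstyle\sum_{t}\gamma^{t} r_t\Big]
        \qquad
        \text{safety (CMDP):}\quad \max_{\pi}\ \mathbb{E}_{\pi}\Big[\textstyle\sum_{t}\gamma^{t} r_t\Big]\ \ \text{s.t.}\ \ \mathbb{E}_{\pi}\Big[\textstyle\sum_{t}\gamma^{t} c_t\Big] \le d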

    Configurable Markov Decision Processes

    In many real-world problems, it is possible to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision Processes (Conf-MDPs), to model this new type of interaction with the environment. Furthermore, we provide a new learning algorithm, Safe Policy-Model Iteration (SPMI), to jointly and adaptively optimize the policy and the environment configuration. After introducing our approach and deriving some theoretical results, we present an experimental evaluation on two illustrative problems that shows the benefits of environment configurability for the performance of the learned policy.
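
    A minimal sketch of the Conf-MDP idea follows: alternate a greedy policy-improvement step with a greedy choice of a configurable environment parameter, each evaluated exactly on a toy chain MDP. This is an assumption-laden illustration only, not the SPMI algorithm from the paper (SPMI uses safe, bounded updates); the chain MDP, the `slip` parameter, and all function names are invented for the example.

# Toy Conf-MDP: jointly pick a policy and a configurable environment parameter.
# NOT the paper's SPMI algorithm; a hypothetical greedy alternation for illustration.
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.95
REWARD = np.zeros((N_STATES, N_ACTIONS))
REWARD[N_STATES - 1, :] = 1.0  # reaching/staying in the last state pays 1

def transition(slip):
    # Chain MDP: action 1 tries to move right, action 0 tries to move left;
    # with probability `slip` (the configurable parameter) the move goes the other way.
    P = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    for s in range(N_STATES):
        right, left = min(s + 1, N_STATES - 1), max(s - 1, 0)
        P[s, 1, right] += 1 - slip
        P[s, 1, left] += slip
        P[s, 0, left] += 1 - slip
        P[s, 0, right] += slip
    return P

def evaluate(P, policy):
    # Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi.
    P_pi = np.einsum('san,sa->sn', P, policy)
    r_pi = np.einsum('sa,sa->s', REWARD, policy)
    return np.linalg.solve(np.eye(N_STATES) - GAMMA * P_pi, r_pi)

policy = np.full((N_STATES, N_ACTIONS), 1.0 / N_ACTIONS)  # start from the uniform policy
slip = 0.4                                                 # initial environment configuration
for _ in range(20):
    P = transition(slip)
    q = REWARD + GAMMA * np.einsum('san,n->sa', P, evaluate(P, policy))
    policy = np.eye(N_ACTIONS)[q.argmax(axis=1)]           # greedy policy improvement
    # Greedy "model improvement": choose the configuration under which the
    # current policy achieves the highest value from the start state.
    slip = max([0.0, 0.1, 0.2, 0.3, 0.4],
               key=lambda c: evaluate(transition(c), policy)[0])

print("chosen slip:", slip,
      "| start-state value:", round(evaluate(transition(slip), policy)[0], 3))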

    Distributionally Robust Optimization in Sequential Decision Making

    Distributionally robust optimization (DRO) is an effective modeling paradigm for making optimal decisions under uncertainty when distributional information about the random parameters of the problem is scarce at the time decisions are made. DRO encompasses conventional approaches to decision making under uncertainty such as stochastic programming and robust optimization. The former requires perfect or near-perfect knowledge of the statistics of the random parameters for accurate decision making, while the latter assumes only that the supports of the random parameters are known, which often leads to overly conservative solutions. DRO overcomes these concerns by optimizing the expected value or a risk measure under the worst-case distribution in a set of distributions that contains the true distribution with high probability. In this dissertation, we apply DRO techniques to various types of sequential decision-making models and explore the capability of the new models to produce reliable yet economical decisions under different settings of data-decision interaction.

    In Chapter II, we consider a distributionally robust variant of a partially observable Markov decision process (POMDP) in which the transition-observation probabilities are uncertain. We assume that these parameters differ over time and are revealed at the end of each time step. We construct an algorithm that finds an optimal policy by iteratively updating upper and lower bounds on the value function. We demonstrate the use of the distributionally robust POMDP in an epidemic-control application in which the probability of true infection status is unknown and prevention and/or intervention decisions must be made sequentially and robustly as information is updated.

    In Chapter III, we derive a bound on the Wasserstein distance between the true and an empirical distribution when the states and actions of a dynamic sequential decision-making process are finite. We then apply the approach to a regret-based reinforcement learning problem that uses the principle of optimism under uncertainty, and compare the empirical performance of the optimal solution of our model with the conventional approach on instances of an ambulance dispatch problem.

    Finally, in Chapter IV, we focus on a multistage mixed-integer stochastic programming model and employ a dual decomposition algorithm to solve a distributionally robust variant of the model. We analyze the numerical performance on instances of a transmission expansion problem in power systems under uncertainty in loads and renewable generation capabilities.

    Overall, the contributions of this dissertation are threefold. First, we develop mathematical models of various distributionally robust sequential decision-making problems, some of which involve discrete decision variables and are generally NP-hard. Second, we derive efficient solution algorithms for the proposed models via relaxation and decomposition techniques. Third, we evaluate the performance of the solution approaches and their results through extensive numerical experiments based on epidemic-control, healthcare, and energy applications. The models and solution algorithms developed in this work can be used by practitioners to solve a variety of sequential decision-making problems in different business contexts, and thus can generate significant societal and economic impact.

    PhD, Industrial & Operations Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/167947/1/nakaoh_1.pd
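
    For reference, the generic DRO formulation described in the opening of this abstract can be written as below; the Wasserstein ball around the empirical distribution is shown only as one common choice of ambiguity set, and all symbols are generic notation assumed for illustration rather than the dissertation's.

        \min_{x \in X}\ \sup_{P \in \mathcal{P}}\ \mathbb{E}_{P}\big[f(x,\xi)\big],
        \qquad
        \mathcal{P} \;=\; \big\{\,P \;:\; W\big(P, \widehat{P}_N\big) \le \varepsilon \,\big\}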

    Formal Methods for Autonomous Systems

    Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. This monograph surveys the current state of the art in applications of formal methods to autonomous systems. We consider correct-by-construction synthesis under various formulations, including closed-system, reactive, and probabilistic settings. Beyond synthesizing systems for known environments, we address uncertainty and use formal methods to bound the behavior of systems that employ learning. Further, we examine synthesis with monitoring, a mitigation technique ensuring that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how learning can overcome some limitations of formal methods themselves. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification.
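
    To illustrate the kind of specification that correct-by-construction synthesis consumes, the following is a textbook-style linear temporal logic (LTL) requirement combining a safety clause with a request-response clause; it is a generic example assumed here, not one taken from the monograph.

        \varphi \;=\; \square\,\neg\,\mathit{collision} \;\wedge\; \square\big(\mathit{request} \rightarrow \lozenge\,\mathit{grant}\big)

    Synthesis then asks for a controller such that, for every admissible environment behavior, the closed-loop system satisfies the specification.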