10,680 research outputs found

    Distributionally Robust Optimization for Sequential Decision Making

    The distributionally robust Markov Decision Process (MDP) approach asks for a distributionally robust policy that achieves the maximal expected total reward under the most adversarial distribution of uncertain parameters. In this paper, we study distributionally robust MDPs whose ambiguity sets for the uncertain parameters can easily incorporate both generalized moment and statistical distance information. In this way, we generalize existing work on distributionally robust MDPs with generalized-moment-based and statistical-distance-based ambiguity sets: information from the former class, such as moments and dispersions, is combined with the latter class, which critically depends on empirical observations of the uncertain parameters. We show that, under this format of ambiguity sets, the resulting distributionally robust MDP remains tractable under mild technical conditions. More specifically, a distributionally robust policy can be constructed by solving a sequence of one-stage convex optimization subproblems.
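
    As a concrete illustration of the one-stage structure (a minimal sketch, assuming the ambiguity set is a finite set of candidate transition models; the paper's moment- and distance-based sets are richer, and all names below are hypothetical), the inner worst-case problem then reduces to a minimum over candidates inside a standard value-iteration loop:

        # Minimal sketch of robust value iteration, not the paper's exact method:
        # the ambiguity set here is a finite set of candidate transition models,
        # so each one-stage inner problem is a minimum over scenarios.
        import numpy as np

        def robust_value_iteration(P_candidates, R, gamma=0.9, iters=200):
            """P_candidates: list of (S, A, S) transition arrays (ambiguity set).
            R: (S, A) rewards. Returns the robust value function and policy."""
            V = np.zeros(R.shape[0])
            for _ in range(iters):
                # Worst-case expected continuation value over the ambiguity set.
                Q = np.stack([R + gamma * P @ V for P in P_candidates]).min(axis=0)
                V = Q.max(axis=1)  # robust Bellman optimality backup
            return V, Q.argmax(axis=1)

        # Hypothetical toy instance: 2 states, 2 actions, 2 candidate models.
        P1 = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.7, 0.3]]])
        P2 = np.array([[[0.6, 0.4], [0.3, 0.7]], [[0.4, 0.6], [0.8, 0.2]]])
        R = np.array([[1.0, 0.0], [0.0, 1.0]])
        V, pi = robust_value_iteration([P1, P2], R)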

    Robustness - a challenge also for the 21st century: A review of robustness phenomena in technical, biological and social systems as well as robust approaches in engineering, computer science, operations research and decision aiding

    Notions of robustness exist in many facets. They come from different disciplines and reflect different worldviews. Consequently, they very often contradict each other, which makes the term less applicable in a general context. Robustness approaches are often limited to the specific problems for which they were developed; notions and definitions may turn out to be wrong when put into another domain of validity, i.e. another context. A definition may be correct in a specific context but need not hold in another. Therefore, in order to be able to speak of robustness, we need to specify the domain of validity, i.e. the system, property and uncertainty of interest. As proved by Ho et al. in an optimization context with finite and discrete domains, without prior knowledge about the problem there exists no solution whatsoever that is more robust than any other. As with the No Free Lunch Theorems of Optimization (NFLTs), we have to exploit the problem structure in order to make a solution more robust. This optimization problem is directly linked to a robustness/fragility tradeoff that has been observed in many contexts, e.g. the 'robust, yet fragile' property of HOT (Highly Optimized Tolerance) systems. Another issue is that robustness is tightly bound to other phenomena, such as complexity, for which no clear definition or theoretical framework exists either. Consequently, this review tries to find common aspects across many different approaches and phenomena rather than to build a general theorem for robustness, which in any case might not exist, because complex phenomena often need to be described from a pluralistic view to address as many aspects of a phenomenon as possible. First, many different robustness problems from many different disciplines are reviewed. Second, common aspects are discussed, in particular the relationship between functional and structural properties. This paper argues that robustness phenomena are also a challenge for the 21st century. Robustness is a useful quality of a model or system in terms of the 'maintenance of some desired system characteristics despite fluctuations in the behaviour of its component parts or its environment' (see [Carlson and Doyle, 2002], p. 2). We define robustness phenomena as solutions with balanced tradeoffs, and robust design principles and robustness measures as means to balance tradeoffs.

    Differentiable Algorithm Networks for Composable Robot Learning

    This paper introduces the Differentiable Algorithm Network (DAN), a composable architecture for robot learning systems. A DAN is composed of neural network modules, each encoding a differentiable robot algorithm and an associated model, and the network is trained end-to-end from data. DAN combines the strengths of model-driven modular system design and data-driven end-to-end learning. The algorithms and models act as structural assumptions to reduce the data requirements for learning; end-to-end learning allows the modules to adapt to one another and compensate for imperfect models and algorithms, in order to achieve the best overall system performance. We illustrate the DAN methodology through a case study on a simulated robot system, which learns to navigate in complex 3-D environments with only local visual observations and an image of a partially correct 2-D floor map.
    Comment: RSS 2019 camera ready. Video is available at https://youtu.be/4jcYlTSJF4
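
    To illustrate the composition idea (a rough sketch, not the paper's architecture; module names and shapes are hypothetical, and PyTorch is assumed), two differentiable modules, an observation encoder standing in for the "model" and a recurrent filter standing in for the "algorithm", can be chained and trained through a single end-to-end loss:

        # Illustrative DAN-style composition: gradients from one loss flow
        # through both modules, letting them adapt to one another.
        import torch
        import torch.nn as nn

        class ObservationEncoder(nn.Module):  # the "model" module
            def __init__(self, obs_dim, feat_dim):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                         nn.Linear(64, feat_dim))
            def forward(self, obs):
                return self.net(obs)

        class FilterPolicy(nn.Module):  # the "algorithm" module (recurrent filter)
            def __init__(self, feat_dim, act_dim):
                super().__init__()
                self.rnn = nn.GRU(feat_dim, 32, batch_first=True)
                self.head = nn.Linear(32, act_dim)
            def forward(self, feats):
                h, _ = self.rnn(feats)
                return self.head(h)

        encoder, policy = ObservationEncoder(16, 8), FilterPolicy(8, 4)
        params = list(encoder.parameters()) + list(policy.parameters())
        opt = torch.optim.Adam(params, lr=1e-3)

        obs = torch.randn(32, 10, 16)    # hypothetical batch of observation sequences
        target = torch.randn(32, 10, 4)  # hypothetical supervision (e.g. expert actions)
        loss = nn.functional.mse_loss(policy(encoder(obs)), target)
        opt.zero_grad()
        loss.backward()
        opt.step()  # one end-to-end update through both modules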

    Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability

    A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including robustly handling uncertainties, satisfying safety constraints to avoid catastrophic failures, and generalizing to unseen scenarios during deployment. This study aims to overview these main perspectives of trustworthy reinforcement learning in light of its intrinsic vulnerabilities in robustness, safety, and generalizability. In particular, we give rigorous formulations, categorize corresponding methodologies, and discuss benchmarks for each perspective. Moreover, we provide an outlook section to spur promising future directions, with a brief discussion of extrinsic vulnerabilities arising from human feedback. We hope this survey brings separate threads of research together in a unified framework and promotes the trustworthiness of reinforcement learning.
    Comment: 36 pages, 5 figures
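
    To make the safety perspective concrete (a sketch of one standard formulation, a constrained MDP handled via a Lagrangian; this is not a method taken from the survey, and all numbers below are hypothetical), the learner maximizes return minus a penalty on constraint violation while dual ascent adapts the multiplier:

        # Sketch of a constrained-MDP Lagrangian: maximize return subject to
        # an expected-cost constraint E[C] <= cost_limit.
        def lagrangian_objective(returns, costs, lam, cost_limit):
            """Batch objective: average return minus penalized constraint violation."""
            avg_return = sum(returns) / len(returns)
            avg_cost = sum(costs) / len(costs)
            return avg_return - lam * (avg_cost - cost_limit)

        def update_multiplier(lam, costs, cost_limit, lr=0.05):
            """Dual ascent: increase lam while the constraint is violated."""
            avg_cost = sum(costs) / len(costs)
            return max(0.0, lam + lr * (avg_cost - cost_limit))

        # Hypothetical batch statistics from two rollouts.
        lam = update_multiplier(lam=1.0, costs=[3.2, 2.8], cost_limit=2.0)
        obj = lagrangian_objective([10.0, 12.0], [3.2, 2.8], lam, cost_limit=2.0)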

    A review of domain adaptation without target labels

    Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into what we refer to as sample-based, feature-based and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods revolve around mapping, projecting and representing features such that a source classifier performs well on the target domain. Inference-based methods incorporate adaptation into the parameter estimation procedure, for instance through constraints on the optimization procedure. Additionally, we review a number of conditions that allow for formulating bounds on the cross-domain generalization error. Our categorization highlights recurring ideas and raises questions important to further research.
    Comment: 20 pages, 5 figures
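
    A minimal sketch of the sample-based idea (assuming scikit-learn is available; this is the generic density-ratio trick via a domain discriminator, not a specific paper's method, and the data below are synthetic): train a classifier to separate source from target samples, turn its probabilities into importance weights, and fit a weighted source classifier.

        # Sample-based domain adaptation via importance weighting: the weight
        # p_target(x)/p_source(x) is estimated with a domain discriminator.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def importance_weights(X_src, X_tgt):
            X = np.vstack([X_src, X_tgt])
            d = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]  # 0=source, 1=target
            disc = LogisticRegression(max_iter=1000).fit(X, d)
            p = np.clip(disc.predict_proba(X_src)[:, 1], 1e-6, 1 - 1e-6)
            # The discriminator's odds ratio is proportional to the density ratio.
            return (p / (1 - p)) * (len(X_src) / len(X_tgt))

        # Synthetic example: the target domain is shifted relative to the source.
        rng = np.random.default_rng(0)
        X_src, y_src = rng.normal(0.0, 1.0, (200, 2)), rng.integers(0, 2, 200)
        X_tgt = rng.normal(0.5, 1.0, (150, 2))
        w = importance_weights(X_src, X_tgt)
        clf = LogisticRegression().fit(X_src, y_src, sample_weight=w)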