10,680 research outputs found
Distributionally Robust Optimization for Sequential Decision Making
The distributionally robust Markov Decision Process (MDP) approach asks for a
distributionally robust policy that achieves the maximal expected total reward
under the most adversarial distribution of uncertain parameters. In this paper,
we study distributionally robust MDPs where the ambiguity sets for the
uncertain parameters have a format that can easily incorporate both
generalized-moment and statistical-distance information about the uncertainty.
In this way, we generalize existing works on distributionally robust MDPs with
generalized-moment-based and statistical-distance-based ambiguity sets by
bringing information from the former class, such as moments and dispersions,
into the latter class, which critically depends on empirical observations of
the uncertain parameters. We show that, under this format of ambiguity sets, the
resulting distributionally robust MDP remains tractable under mild technical
conditions. To be more specific, a distributionally robust policy can be
constructed by solving a sequence of one-stage convex optimization subproblems.
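The stage-by-stage construction described in this abstract can be illustrated with a deliberately simplified sketch. It is not the paper's method: the ambiguity set here is assumed to be a finite list of candidate next-state distributions per state-action pair, so the inner worst case is a plain minimum rather than a convex program over moment or statistical-distance constraints. The function name and data layout are hypothetical.

```python
def robust_backward_induction(rewards, candidates, horizon):
    """Finite-horizon robust backward induction (illustrative sketch).

    rewards[s][a]    -- immediate reward for action a in state s
    candidates[s][a] -- ASSUMED finite ambiguity set: a list of candidate
                        next-state distributions (lists of probabilities)
    Returns the stage-0 value function and one greedy policy per stage.
    """
    n_states = len(rewards)
    value = [0.0] * n_states      # terminal value is zero
    policy = []                   # collected from the last stage backwards
    for _ in range(horizon):
        new_value, stage_policy = [], []
        for s in range(n_states):
            best_action, best_q = None, float("-inf")
            for a in range(len(rewards[s])):
                # One-stage subproblem: the adversary picks the candidate
                # distribution that minimizes the expected continuation value.
                worst = min(
                    sum(p * v for p, v in zip(dist, value))
                    for dist in candidates[s][a]
                )
                q = rewards[s][a] + worst
                if q > best_q:
                    best_action, best_q = a, q
            new_value.append(best_q)
            stage_policy.append(best_action)
        value = new_value
        policy.append(stage_policy)
    policy.reverse()              # policy[0] is now the first decision stage
    return value, policy
```

Replacing the inner `min` with a convex solver over a moment- or distance-based set would bring the sketch closer to the setting the abstract describes; that step is omitted here.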
Robustness - a challenge also for the 21st century: A review of robustness phenomena in technical, biological and social systems as well as robust approaches in engineering, computer science, operations research and decision aiding
Notions of robustness come in many facets. They originate in different disciplines and reflect different worldviews. Consequently, they very often contradict each other, which makes the term less applicable in a general context. Robustness approaches are often limited to the specific problems for which they have been developed. This means that notions and definitions might prove wrong if transferred to another domain of validity, i.e. context. A definition might be correct in a specific context but need not hold in another. Therefore, in order to be able to speak of robustness, we need to specify the domain of validity, i.e. the system, property and uncertainty of interest. As proved by Ho et al. in an optimization context with finite and discrete domains, without prior knowledge about the problem there exists no solution whatsoever that is more robust than any other. Similar to the results of the No Free Lunch Theorems of Optimization (NFLTs), we have to exploit the problem structure in order to make a solution more robust. This optimization problem is directly linked to a robustness/fragility tradeoff which has been observed in many contexts, e.g. the 'robust, yet fragile' property of HOT (Highly Optimized Tolerance) systems. Another issue is that robustness is tightly bound to other phenomena such as complexity, for which no clear definition or theoretical framework exists either. Consequently, this review tries to find common aspects within many different approaches and phenomena rather than to build a general theorem for robustness, which might not exist anyhow, because complex phenomena often need to be described from a pluralistic view to address as many aspects of a phenomenon as possible. First, many different robustness problems from many different disciplines are reviewed. Second, different common aspects are discussed, in particular the relationship between functional and structural properties.
This paper argues that robustness phenomena are also a challenge for the 21st century. Robustness is a useful quality of a model or system in terms of the 'maintenance of some desired system characteristics despite fluctuations in the behaviour of its component parts or its environment' (see [Carlson and Doyle, 2002], p. 2). We define robustness phenomena as solutions with balanced tradeoffs, and robust design principles and robustness measures as means to balance tradeoffs.
Differentiable Algorithm Networks for Composable Robot Learning
This paper introduces the Differentiable Algorithm Network (DAN), a
composable architecture for robot learning systems. A DAN is composed of neural
network modules, each encoding a differentiable robot algorithm and an
associated model; and it is trained end-to-end from data. DAN combines the
strengths of model-driven modular system design and data-driven end-to-end
learning. The algorithms and models act as structural assumptions to reduce the
data requirements for learning; end-to-end learning allows the modules to adapt
to one another and compensate for imperfect models and algorithms, in order to
achieve the best overall system performance. We illustrate the DAN methodology
through a case study on a simulated robot system, which learns to navigate in
complex 3-D environments with only local visual observations and an image of a
partially correct 2-D floor map. Comment: RSS 2019 camera ready. Video is available at
https://youtu.be/4jcYlTSJF4
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability
A trustworthy reinforcement learning algorithm should be competent in solving
challenging real-world problems, including robustly handling uncertainties,
satisfying safety constraints to avoid catastrophic failures, and
generalizing to unseen scenarios during deployments. This study aims to
overview these main perspectives of trustworthy reinforcement learning
considering its intrinsic vulnerabilities on robustness, safety, and
generalizability. In particular, we give rigorous formulations, categorize
corresponding methodologies, and discuss benchmarks for each perspective.
Moreover, we provide an outlook section to spur promising future directions
with a brief discussion on extrinsic vulnerabilities considering human
feedback. We hope this survey could bring separate threads of studies
together in a unified framework and promote the trustworthiness of
reinforcement learning. Comment: 36 pages, 5 figures
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research. Comment: 20 pages, 5 figures
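The sample-based idea mentioned in this abstract, weighting source observations by their importance to the target domain, can be sketched as follows. This is an illustration under strong assumptions and not the review's own procedure: both domains are assumed to be one-dimensional Gaussians with known parameters, so the density ratio can be computed exactly, whereas in practice it has to be estimated. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source, target):
    """Weight each source sample by p_target(x) / p_source(x).

    source and target are (mean, std) pairs; knowing them exactly is an
    ASSUMPTION made only to keep the sketch self-contained.
    """
    return [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in xs]


def weighted_risk(losses, weights):
    """Self-normalized importance-weighted average of per-sample losses."""
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total
```

With identical source and target parameters every weight equals 1 and `weighted_risk` reduces to the ordinary mean loss; shifting the target mean upweights source samples that lie closer to the target distribution.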