Woodroofe's one-armed bandit problem revisited
We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist.
Assoc. 74 (1979) 799--806], which involves sequential sampling from two
populations: one whose characteristics are known, and one which depends on an
unknown parameter and incorporates a covariate. The goal is to maximize
cumulative expected reward. We study this problem in a minimax setting, and
develop rate-optimal policies that involve suitable modifications of the myopic
rule. It is shown that the regret, as well as the rate of sampling from the
inferior population, can be finite or grow at various rates with the time
horizon of the problem, depending on "local" properties of the covariate
distribution. Proofs rely on martingale methods and information theoretic
arguments.
Comment: Published at http://dx.doi.org/10.1214/08-AAP589 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
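The setting above can be made concrete with a toy simulation. The model below is a hypothetical stand-in (not Woodroofe's exact formulation): the unknown arm's mean reward is `theta * x` for a uniform covariate `x`, the known arm pays a fixed `mu_known`, and the "inflated myopic" rule adds a small optimism margin `eps` to the plain myopic comparison so that sampling from the unknown arm does not stop prematurely. All parameter names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(T=10000, mu_known=0.5, theta_true=1.0, eps=0.05):
    """Toy one-armed bandit with a covariate (hypothetical model).

    Each round we observe a covariate x, estimate theta by least squares
    from past pulls of the unknown arm, and pull the unknown arm whenever
    the inflated myopic estimate theta_hat * x + eps beats mu_known.
    """
    sx2 = 0.0   # sum of x_t^2 over rounds where the unknown arm was pulled
    sxy = 0.0   # sum of x_t * y_t over those rounds
    pulls_unknown = 0
    total_reward = 0.0
    for _ in range(T):
        x = rng.uniform()  # covariate observed before acting
        # Infinite optimism before any data forces an initial pull.
        theta_hat = sxy / sx2 if sx2 > 0 else np.inf
        if theta_hat * x + eps >= mu_known:  # inflated myopic rule
            y = theta_true * x + rng.normal(scale=0.1)
            sx2 += x * x
            sxy += x * y
            pulls_unknown += 1
            total_reward += y
        else:
            total_reward += mu_known
    return total_reward, pulls_unknown
```

With `eps = 0` this reduces to the plain myopic rule, which can lock onto the known arm too early; the margin keeps a trickle of exploration alive, which is the spirit of the modifications the abstract describes.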
Resource Constrained Structured Prediction
We study the problem of structured prediction under test-time budget
constraints. We propose a novel approach applicable to a wide range of
structured prediction problems in computer vision and natural language
processing. Our approach seeks to adaptively generate computationally costly
features during test-time in order to reduce the computational cost of
prediction while maintaining prediction performance. We show that training the
adaptive feature generation system can be reduced to a series of structured
learning problems, resulting in efficient training using existing structured
learning algorithms. This framework provides theoretical justification for
several existing heuristic approaches found in the literature. We evaluate our
proposed adaptive system on two structured prediction tasks, optical character
recognition (OCR) and dependency parsing, and show a strong reduction in
feature costs without degrading accuracy.
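The adaptive test-time policy the abstract describes can be sketched with a deliberately simplified greedy rule. This is an illustration of the budgeted feature-acquisition idea, not the paper's learned system: features carry a cost and a (toy) margin contribution, and we keep acquiring the cheapest remaining feature until either a confidence threshold is reached or the budget runs out. The `Feature` fields and the threshold rule are assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    cost: float   # computational cost of generating this feature
    value: float  # toy stand-in for the margin gain it contributes

def adaptive_predict(features, budget, tau):
    """Greedy budgeted feature acquisition (illustrative only).

    Acquire the cheapest remaining feature until the accumulated
    prediction margin reaches the confidence threshold tau or the
    next acquisition would exceed the budget.
    """
    margin, spent, used = 0.0, 0.0, []
    for f in sorted(features, key=lambda f: f.cost):
        if margin >= tau or spent + f.cost > budget:
            break
        margin += f.value
        spent += f.cost
        used.append(f.name)
    return margin, spent, used
```

The paper's contribution is to *learn* such an acquisition policy via structured learning rather than hard-coding a greedy order; the sketch only conveys the cost/confidence trade-off being optimized.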
Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning
Skilled robotic manipulation benefits from complex synergies between
non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing
can help rearrange cluttered objects to make space for arms and fingers;
likewise, grasping can help displace objects to make pushing movements more
precise and collision-free. In this work, we demonstrate that it is possible to
discover and learn these synergies from scratch through model-free deep
reinforcement learning. Our method involves training two fully convolutional
networks that map from visual observations to actions: one infers the utility
of pushes for a dense pixel-wise sampling of end effector orientations and
locations, while the other does the same for grasping. Both networks are
trained jointly in a Q-learning framework and are entirely self-supervised by
trial and error, where rewards are provided from successful grasps. In this
way, our policy learns pushing motions that enable future grasps, while
learning grasps that can leverage past pushes. During picking experiments in
both simulation and real-world scenarios, we find that our system quickly
learns complex behaviors amid challenging cases of clutter, and achieves better
grasping success rates and picking efficiencies than baseline alternatives
after only a few hours of training. We further demonstrate that our method is
capable of generalizing to novel objects. Qualitative results (videos), code,
pre-trained models, and simulation environments are available at
http://vpg.cs.princeton.edu
Comment: To appear at the International Conference on Intelligent Robots and
Systems (IROS) 2018. Project webpage: http://vpg.cs.princeton.edu Summary
video: https://youtu.be/-OkyX7Zlhi
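The greedy action selection over the two dense Q maps described above can be sketched in a few lines. The array shapes are an assumption consistent with the abstract's description (one Q value per end-effector rotation and pixel location); everything else about the networks is omitted.

```python
import numpy as np

def select_action(q_push, q_grasp):
    """Pick the single highest-Q action across both primitive maps.

    q_push and q_grasp are arrays of shape (R, H, W): one Q value per
    end-effector rotation bin and pixel location. Returns the primitive
    name plus the (rotation, row, col) index of the chosen action.
    """
    best_push = np.unravel_index(np.argmax(q_push), q_push.shape)
    best_grasp = np.unravel_index(np.argmax(q_grasp), q_grasp.shape)
    if q_push[best_push] >= q_grasp[best_grasp]:
        return ("push",) + best_push
    return ("grasp",) + best_grasp
```

Because both maps are trained in one Q-learning framework, their values are directly comparable, which is what makes this single argmax across primitives meaningful.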
The Sequencing Problem in Sequential Investigation Processes
Many decision problems in various fields of application can be characterized as diagnostic problems that try to assess the true state (of the world) of given cases. Investigating assessment criteria improves the initial information according to the observed signal outcomes, which are related to the possible states. Such sequential investigation processes can be analyzed within the framework of statistical decision theory, in which prior probability distributions over classes of cases are updated, allowing particular cases to be sorted into ever smaller subclasses. Receiving such information, however, causes investigation costs. Besides the question of the set of relevant criteria, this raises two additional problems of statistical decision making: the optimal stopping of investigations and the optimal sequence in which to investigate a given set of criteria. Unfortunately, no method exists with which the optimal sequence can be determined in general. The paper therefore characterizes the associated problems and analyzes existing heuristics that try to approximate an optimal solution.
Keywords: Decision-Making, Uncertainty, Information, Bayesian Analysis, Statistical Decision Theory
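The Bayesian updating and stopping steps the abstract describes can be sketched for a two-state case. The threshold stopping rule below is a simple illustrative heuristic, not a result from the paper; `observations` lists, for each signal actually observed in investigation order, its likelihood under each state.

```python
def posterior(prior, likelihoods):
    """One Bayes update of a belief over states given a signal outcome."""
    joint = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(joint)  # probability of the observed signal
    return [j / z for j in joint]

def investigate(prior, observations, threshold):
    """Run investigations in the given sequence, stopping once one
    state's posterior probability exceeds the threshold.

    Returns the final belief and the number of (costly) investigations
    actually performed.
    """
    belief = list(prior)
    performed = 0
    for lik in observations:
        if max(belief) >= threshold:
            break  # confident enough: stop paying investigation costs
        belief = posterior(belief, lik)
        performed += 1
    return belief, performed
```

The sequencing problem the paper studies is precisely the choice of the *order* of `observations` so as to reach the stopping threshold at minimal expected cost; the sketch takes that order as given.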
Bayesian fairness
We consider the problem of how decision making can be fair when the
underlying probabilistic model of the world is not known with certainty. We
argue that recent notions of fairness in machine learning need to explicitly
incorporate parameter uncertainty, hence we introduce the notion of {\em
Bayesian fairness} as a suitable candidate for fair decision rules. Using
balance, a definition of fairness introduced by Kleinberg et al. (2016), we show
how a Bayesian perspective can lead to well-performing, fair decision rules
even under high uncertainty.
Comment: 13 pages, 8 figures, to appear at AAAI 201
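The core idea, scoring decision rules under the full posterior over models rather than a single point estimate, can be sketched abstractly. Every name below is illustrative: `violation` stands in for how far an action falls from a balance-style fairness criterion under a given model, and `lam` trades off utility against fairness. This is a sketch of the Bayesian-averaging principle, not the paper's algorithm.

```python
def bayesian_fair_decision(models, weights, actions, utility, violation, lam):
    """Choose the action maximizing posterior-expected utility minus a
    fairness penalty, averaged over candidate models.

    models/weights: candidate parameter values and their posterior weights.
    utility(m, a):   utility of action a if model m is true.
    violation(m, a): fairness-violation score of a under model m.
    """
    def score(a):
        return sum(w * (utility(m, a) - lam * violation(m, a))
                   for m, w in zip(models, weights))
    return max(actions, key=score)
```

A rule tuned to the single most likely model can look fair under that model yet violate fairness badly under a nearby one; averaging the penalty over the posterior is what makes the chosen rule robust to that parameter uncertainty.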