242 research outputs found
A Unified View of Large-scale Zero-sum Equilibrium Computation
The task of computing approximate Nash equilibria in large zero-sum
extensive-form games has received a tremendous amount of attention due mainly
to the Annual Computer Poker Competition. Immediately after its inception, two
competing and seemingly different approaches emerged---one an application of
no-regret online learning, the other a sophisticated gradient method applied to
a convex-concave saddle-point formulation. Since then, both approaches have
grown in relative isolation with advancements on one side not effecting the
other. In this paper, we rectify this by dissecting and, in a sense, unify the
two views.Comment: AAAI Workshop on Computer Poker and Imperfect Informatio
Generalized Boosting Algorithms for Convex Optimization
Boosting is a popular way to derive powerful learners from simpler hypothesis
classes. Following previous work (Mason et al., 1999; Friedman, 2000) on
general boosting frameworks, we analyze gradient-based descent algorithms for
boosting with respect to any convex objective and introduce a new measure of
weak learner performance into this setting which generalizes existing work. We
present the weak to strong learning guarantees for the existing gradient
boosting work for strongly-smooth, strongly-convex objectives under this new
measure of performance, and also demonstrate that this work fails for
non-smooth objectives. To address this issue, we present new algorithms which
extend this boosting approach to arbitrary convex loss functions and give
corresponding weak to strong convergence results. In addition, we demonstrate
experimental results that support our analysis and demonstrate the need for the
new algorithms we present.Comment: Extended version of paper presented at the International Conference
on Machine Learning, 2011. 9 pages + appendix with proof
Shared Autonomy via Hindsight Optimization
In shared autonomy, user input and robot autonomy are combined to control a
robot to achieve a goal. Often, the robot does not know a priori which goal the
user wants to achieve, and must both predict the user's intended goal, and
assist in achieving that goal. We formulate the problem of shared autonomy as a
Partially Observable Markov Decision Process with uncertainty over the user's
goal. We utilize maximum entropy inverse optimal control to estimate a
distribution over the user's goal based on the history of inputs. Ideally, the
robot assists the user by solving for an action which minimizes the expected
cost-to-go for the (unknown) goal. As solving the POMDP to select the optimal
action is intractable, we use hindsight optimization to approximate the
solution. In a user study, we compare our method to a standard
predict-then-blend approach. We find that our method enables users to
accomplish tasks more quickly while utilizing less input. However, when asked
to rate each system, users were mixed in their assessment, citing a tradeoff
between maintaining control authority and accomplishing tasks quickly
- …