A Deterministic Analysis of an Online Convex Mixture of Expert Algorithms
We analyze an online learning algorithm that adaptively
combines outputs of two constituent algorithms (or the
experts) running in parallel to model an unknown desired signal.
This online learning algorithm is shown to achieve (and in some
cases outperform) the mean-square error (MSE) performance of
the best constituent algorithm in the mixture in steady state.
However, the MSE analysis of this algorithm in the literature
uses approximations and relies on statistical models of the
underlying signals and systems. Hence, such an analysis may not
be useful or valid for signals generated by real-life systems
that show high degrees of nonstationarity or limit cycles and
that are, in many cases, even chaotic. In this paper, we produce
results in an individual sequence manner. In particular, we relate
the time-accumulated squared estimation error of this online
algorithm at any time over any interval to the time-accumulated
squared estimation error of the optimal convex mixture of the
constituent algorithms directly tuned to the underlying signal
in a deterministic sense without any statistical assumptions. Our
analysis thus characterizes the transient, steady-state, and
tracking behavior of this algorithm in a strong sense, without any
approximations in the derivations or statistical assumptions on
the underlying signals, so that our results are guaranteed to
hold. We illustrate the introduced results through examples. © 2012 IEEE
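As a rough illustration of the kind of algorithm this abstract describes, the sketch below combines two expert outputs with a convex weight and adapts that weight by stochastic gradient descent on the squared error. The abstract does not give the update rule, so the sigmoid parameterization, the step size `mu`, the clipping bound `rho_max`, and the function name `convex_mixture` are all assumptions made for illustration, not the authors' stated method.

```python
import math

def convex_mixture(desired, expert1, expert2, mu=0.5, rho_max=4.0):
    """Adaptively mix two expert outputs with a weight lam in (0, 1).

    Assumed form: lam = sigmoid(rho), with rho updated by a gradient
    step on the instantaneous squared error. mu and rho_max are
    illustrative choices, not values taken from the abstract.
    """
    rho = 0.0
    lam = 0.5
    outputs = []
    for d, y1, y2 in zip(desired, expert1, expert2):
        lam = 1.0 / (1.0 + math.exp(-rho))      # convex weight in (0, 1)
        y = lam * y1 + (1.0 - lam) * y2          # mixture output
        outputs.append(y)
        err = d - y
        # Gradient descent on 0.5 * err**2 with respect to rho.
        rho += mu * err * (y1 - y2) * lam * (1.0 - lam)
        # Clip rho so the weight never saturates completely (assumption).
        rho = max(-rho_max, min(rho_max, rho))
    return outputs, lam

# Usage: expert 1 tracks the signal exactly, expert 2 always outputs 0.
signal = [1.0] * 200
mixed, final_weight = convex_mixture(signal, signal, [0.0] * 200)
```

In this toy run the weight drifts toward the better expert, so the accumulated squared error of the mixture stays below that of the worse expert.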
A Second-order Bound with Excess Losses
We study online aggregation of the predictions of experts, and first show new
second-order regret bounds in the standard setting, which are obtained via a
version of the Prod algorithm (and also a version of the polynomially weighted
average algorithm) with multiple learning rates. These bounds are in terms of
excess losses, the differences between the instantaneous losses suffered by the
algorithm and the ones of a given expert. We then demonstrate the interest of
these bounds in the context of experts that report their confidences as a
number in the interval [0,1] using a generic reduction to the standard setting.
We conclude with two other applications in the standard setting, which improve
the known bounds in case of small excess losses and show a bounded regret
against i.i.d. sequences of losses.
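To make the idea of a Prod-type update with one learning rate per expert concrete, here is a minimal sketch. It assumes losses in [0, 1] and constant learning rates of at most 1/2; the abstract does not specify how the rates are tuned, so the fixed `etas` and the function name `prod_multiple_rates` are simplifying assumptions rather than the paper's exact procedure.

```python
def prod_multiple_rates(loss_rounds, etas):
    """Prod-style aggregation with one fixed learning rate per expert.

    loss_rounds: iterable of per-round loss lists, each loss in [0, 1].
    etas: per-expert learning rates, assumed constant and <= 0.5 so
          that every multiplicative factor stays positive.
    """
    k = len(etas)
    weights = [1.0 / k] * k
    norm = sum(eta * w for eta, w in zip(etas, weights))
    probs = [eta * w / norm for eta, w in zip(etas, weights)]
    total_loss = 0.0
    for losses in loss_rounds:
        # Play a mixture proportional to eta_k * w_k.
        norm = sum(eta * w for eta, w in zip(etas, weights))
        probs = [eta * w / norm for eta, w in zip(etas, weights)]
        alg_loss = sum(p * l for p, l in zip(probs, losses))
        total_loss += alg_loss
        # Multiplicative update driven by the excess loss against each expert.
        weights = [w * (1.0 + eta * (alg_loss - l))
                   for w, eta, l in zip(weights, etas, losses)]
    return probs, total_loss

# Usage: expert 0 never errs, expert 1 always suffers loss 1.
final_probs, cumulative = prod_multiple_rates([[0.0, 1.0]] * 100, [0.5, 0.5])
```

The update uses the excess loss `alg_loss - l` directly, which is the quantity the second-order bounds in the abstract are stated in.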
Relax and Localize: From Value to Algorithms
We show a principled way of deriving online learning algorithms from a
minimax analysis. Various upper bounds on the minimax value, previously thought
to be non-constructive, are shown to yield algorithms. This allows us to
seamlessly recover known methods and to derive new ones. Our framework also
captures such "unorthodox" methods as Follow the Perturbed Leader and the R^2
forecaster. We emphasize that understanding the inherent complexity of the
learning problem leads to the development of algorithms.
We define local sequential Rademacher complexities and associated algorithms
that allow us to obtain faster rates in online learning, similarly to
statistical learning theory. Based on these localized complexities we build a
general adaptive method that can take advantage of the suboptimality of the
observed sequence.
We present a number of new algorithms, including a family of randomized
methods that use the idea of a "random playout". Several new versions of the
Follow-the-Perturbed-Leader algorithms are presented, as well as methods based
on the Littlestone dimension, efficient methods for matrix completion with
trace norm, and algorithms for the problems of transductive learning and
prediction with static experts.
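For readers unfamiliar with Follow the Perturbed Leader, the sketch below shows the classic experts version: each round, pick the expert whose cumulative loss minus a fresh random perturbation is smallest. It is not one of the new variants the abstract refers to; the exponential perturbation, the `scale` parameter, and the function name are assumptions chosen for illustration.

```python
import random

def follow_the_perturbed_leader(loss_rounds, n_experts, scale=5.0, seed=0):
    """Classic Follow the Perturbed Leader over a finite set of experts.

    Each round draws fresh exponential perturbations with mean `scale`
    (an illustrative choice) and follows the perturbed leader.
    """
    rng = random.Random(seed)
    cumulative = [0.0] * n_experts
    choices = []
    total_loss = 0.0
    for losses in loss_rounds:
        # Subtract noise from each cumulative loss, then follow the minimum.
        perturbed = [c - rng.expovariate(1.0 / scale) for c in cumulative]
        choice = min(range(n_experts), key=perturbed.__getitem__)
        choices.append(choice)
        total_loss += losses[choice]
        # Losses are revealed only after the choice is made.
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return choices, total_loss

# Usage: expert 0 never errs, expert 1 always suffers loss 1.
picks, loss = follow_the_perturbed_leader([[0.0, 1.0]] * 200, n_experts=2)
```

Once the gap in cumulative loss exceeds the typical size of the perturbation, the method settles on the better expert almost every round.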
Massively Scalable Inverse Reinforcement Learning in Google Maps
Optimizing for humans' latent preferences is a grand challenge in route
recommendation, where globally-scalable solutions remain an open problem.
Although past work created increasingly general solutions for the application
of inverse reinforcement learning (IRL), these have not been successfully
scaled to world-sized MDPs, large datasets, and highly parameterized models;
respectively hundreds of millions of states, trajectories, and parameters. In
this work, we surpass previous limitations through a series of advancements
focused on graph compression, parallelization, and problem initialization based
on dominant eigenvectors. We introduce Receding Horizon Inverse Planning
(RHIP), which generalizes existing work and enables control of key performance
trade-offs via its planning horizon. Our policy achieves a 16-24% improvement
in global route quality, and, to our knowledge, represents the largest instance
of IRL in a real-world setting to date. Our results show critical benefits to
more sustainable modes of transportation (e.g. two-wheelers), where factors
beyond journey time (e.g. route safety) play a substantial role. We conclude
with ablations of key components, negative results on state-of-the-art
eigenvalue solvers, and future opportunities to improve scalability
via IRL-specific batching strategies.