Adaptive Hedge
Most methods for decision-theoretic online learning are based on the Hedge
algorithm, which takes a parameter called the learning rate. In most previous
analyses the learning rate was carefully tuned to obtain optimal worst-case
performance, leading to suboptimal performance on easy instances, for example
when there exists an action that is significantly better than all others. We
propose a new way of setting the learning rate, which adapts to the difficulty
of the learning problem: in the worst case our procedure still guarantees
optimal performance, but on easy instances it achieves much smaller regret. In
particular, our adaptive method achieves constant regret in a probabilistic
setting, when there exists an action that on average obtains strictly smaller
loss than all other actions. We also provide a simulation study comparing our
approach to existing methods.
Comment: This is the full version of the paper with the same name that will appear in Advances in Neural Information Processing Systems 24 (NIPS 2011), 2012. The two papers are identical, except that this version contains an extra section of Additional Material.
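To make the adaptive idea concrete, here is a minimal Python sketch in the spirit of AdaHedge: instead of fixing the learning rate in advance, it is tuned from the mixability gap accumulated so far. This is an illustrative reconstruction under simple assumptions (losses in [0, 1]), not the authors' reference implementation.

```python
import numpy as np

def adahedge(losses):
    """Hedge with a self-tuned learning rate, in the spirit of AdaHedge.

    losses: (T, K) array of per-round losses in [0, 1] for K actions.
    Illustrative sketch under these assumptions, not the authors' code.
    """
    T, K = losses.shape
    L = np.zeros(K)        # cumulative losses of the K actions
    delta = 0.0            # cumulative mixability gap so far
    alg_loss = 0.0
    for t in range(T):
        eta = np.inf if delta == 0 else np.log(K) / delta
        if np.isinf(eta):
            w = (L == L.min()).astype(float)   # follow the leader(s)
        else:
            w = np.exp(-eta * (L - L.min()))   # shifted for stability
        w /= w.sum()
        ell = losses[t]
        h = w @ ell                            # Hedge's mixture loss
        if np.isinf(eta):
            m = ell[w > 0].min()               # mix loss as eta -> infinity
        else:
            m = ell.min() - np.log(w @ np.exp(-eta * (ell - ell.min()))) / eta
        delta += h - m        # the gap drives the next learning rate
        alg_loss += h
        L += ell
    return alg_loss - L.min()                  # regret w.r.t. the best action
```

On an "easy" instance with one clearly best action, the gap stops growing, the learning rate stabilises, and the returned regret stays bounded, which is the behaviour the abstract describes.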
Optimal Allocation Strategies for the Dark Pool Problem
We study the problem of allocating stocks to dark pools. We propose and
analyze an optimal allocation approach for the case where continuous-valued allocations are allowed, and we propose a modification for the case when only integer-valued allocations are possible. We extend previous work on this problem to adversarial scenarios, while also improving on its results in the i.i.d. setup. The resulting algorithms are efficient and perform well in simulations under both stochastic and adversarial inputs.
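As a rough illustration of the setting, the sketch below splits an order across venues with a multiplicative weight update under censored feedback, where only the filled portion of each allocation is observed. This is a generic, hypothetical sketch of the problem setup, not the allocation strategy analyzed in the paper; the liquidity input and the update rule are assumptions for illustration only.

```python
import numpy as np

def dark_pool_sketch(fills_available, V, eta=0.1):
    """Hypothetical multiplicative-weights allocator for the dark pool setting.

    fills_available[t, k] is venue k's (unknown in advance) liquidity at
    round t; the allocator only observes how much of its own allocation
    was filled (censored feedback). Not the paper's algorithm.
    """
    T, K = fills_available.shape
    x = np.ones(K) / K                  # allocation weights over the K venues
    total_filled = 0.0
    for t in range(T):
        alloc = V * x                   # continuous-valued allocation of V shares
        filled = np.minimum(alloc, fills_available[t])  # censored observation
        total_filled += filled.sum()
        # Reward proxy: fraction of each venue's allocation that was filled.
        g = filled / np.maximum(alloc, 1e-12)
        x *= np.exp(eta * g)            # shift mass toward venues that fill
        x /= x.sum()
    return total_filled
```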
Generalized Mixability via Entropic Duality
Mixability is a property of a loss which characterizes when fast convergence
is possible in the game of prediction with expert advice. We show that a key
property of mixability generalizes, and the exp and log operations present in
the usual theory are not as special as one might have thought. In doing this we
introduce a more general notion of $\Phi$-mixability, where $\Phi$ is a general entropy (i.e., any convex function on probabilities). We show how a property shared by the convex dual of any such entropy yields a natural algorithm (the minimizer of a regret bound) which, analogous to the classical aggregating algorithm, is guaranteed a constant regret when used with $\Phi$-mixable losses. We characterize precisely which $\Phi$ have $\Phi$-mixable losses and put forward a number of conjectures about the optimality and relationships between different choices of entropy.
Comment: 20 pages, 1 figure. Supersedes the work in arXiv:1403.2433 [cs.LG].
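For context, the classical aggregating algorithm referenced above can be sketched concretely for log loss, which is mixable with learning rate $\eta = 1$: the weighted mixture of expert probabilities achieves the mix loss exactly, giving regret at most $\ln K$. A minimal binary-outcome sketch, not tied to the paper's generalized setting:

```python
import numpy as np

def aggregating_algorithm_log_loss(expert_probs, outcomes):
    """Vovk's classical Aggregating Algorithm for binary log loss (eta = 1).

    expert_probs[t, k] is expert k's predicted probability of outcome 1
    at round t; outcomes[t] is 0 or 1. Minimal illustrative sketch.
    """
    T, K = expert_probs.shape
    log_w = np.zeros(K)                     # log-weights = -cumulative loss
    total_loss = 0.0
    for t in range(T):
        w = np.exp(log_w - log_w.max())     # shift for numerical stability
        w /= w.sum()
        # For log loss the weighted mean of the experts' probabilities
        # attains the mix loss exactly, so it serves as the prediction.
        p = w @ expert_probs[t]
        y = outcomes[t]
        total_loss += -np.log(p if y == 1 else 1.0 - p)
        ell = -np.log(np.where(y == 1, expert_probs[t], 1.0 - expert_probs[t]))
        log_w -= ell                        # multiplicative weight update
    return total_loss
```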
Prediction with Expert Advice under Discounted Loss
We study prediction with expert advice in the setting where the losses are
accumulated with some discounting---the impact of old losses may gradually
vanish. We generalize the Aggregating Algorithm and the Aggregating Algorithm
for Regression to this case, propose a suitable new variant of exponential
weights algorithm, and prove the respective loss bounds.
Comment: 26 pages; expanded (2 remarks -> theorems), some misprints corrected.
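A minimal sketch of an exponential weights update under geometric discounting, which is one natural reading of the setting; the paper's exact variant and its tuning of the learning rate may differ:

```python
import numpy as np

def discounted_exp_weights(losses, alpha=0.99, eta=1.0):
    """Exponential weights over geometrically discounted cumulative losses.

    losses: (T, K) array of per-round losses for K experts. After each
    round the accumulated losses are multiplied by alpha, so the impact
    of old losses gradually vanishes. Illustrative sketch only.
    """
    T, K = losses.shape
    L = np.zeros(K)                  # discounted cumulative losses
    weight_history = []
    for t in range(T):
        w = np.exp(-eta * (L - L.min()))   # shifted for numerical stability
        w /= w.sum()
        weight_history.append(w)     # weights used to mix expert predictions
        L = alpha * L + losses[t]    # discount old losses, add the new ones
    return np.array(weight_history)
```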
Sparse Regression Learning by Aggregation and Langevin Monte-Carlo
We consider the problem of regression learning for deterministic design and
independent random errors. We start by proving a sharp PAC-Bayesian type bound
for the exponentially weighted aggregate (EWA) under the expected squared
empirical loss. For a broad class of noise distributions the presented bound is
valid whenever the temperature parameter $\beta$ of the EWA is larger than or equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable
feature of this result is that it is valid even for unbounded regression
functions and the choice of the temperature parameter depends exclusively on
the noise level. Next, we apply this general bound to the problem of
aggregating the elements of a finite-dimensional linear space spanned by a
dictionary of functions $\phi_1, \dots, \phi_M$. We allow $M$ to be much larger than the sample size $n$, but we assume that the true regression function can be well approximated by a sparse linear combination of the functions $\phi_j$. Under
this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show
that it satisfies a sparsity oracle inequality with leading constant one.
Finally, we propose several Langevin Monte-Carlo algorithms to approximately
compute such an EWA when the number $M$ of aggregated functions can be large.
We discuss in some detail the convergence of these algorithms and present
numerical experiments that confirm our theoretical findings.
Comment: Short version published in COLT 2009.
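To illustrate the computational side, here is a generic unadjusted Langevin algorithm for approximately sampling from a density proportional to $\exp(-V(\theta))$, as one would use to approximate the EWA by averaging the trajectory. The potential gradient is an assumed input combining the temperature-scaled empirical loss and the prior; step-size tuning and the paper's specific algorithm variants are omitted.

```python
import numpy as np

def langevin_ewa_sample(grad_potential, theta0, step=1e-3, n_steps=10_000,
                        rng=None):
    """Unadjusted Langevin algorithm for approximating the EWA.

    grad_potential(theta) must return the gradient of the potential V,
    where the EWA pseudo-posterior density is proportional to exp(-V).
    This is an assumed interface for illustration; convergence checks
    and step-size schedules are deliberately left out.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    samples = []
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        # Langevin step: gradient descent on V plus Gaussian exploration.
        theta = theta - step * grad_potential(theta) + np.sqrt(2 * step) * noise
        samples.append(theta.copy())
    # Averaging the trajectory approximates the EWA (the posterior mean).
    return np.mean(samples, axis=0)
```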
Universal Codes from Switching Strategies
We discuss algorithms for combining sequential prediction strategies, a task
which can be viewed as a natural generalisation of the concept of universal
coding. We describe a graphical language based on Hidden Markov Models for
defining prediction strategies, and we provide both existing and new models as
examples. These include efficient, parameterless models for switching between the input strategies over time, among them a model for the case where switches tend to occur in clusters, and a new model for the scenario where the prediction strategies have a known relationship and jumps are typically between strongly related ones. This last model is relevant for coding time series data where parameter drift is expected. As theoretical contributions
we introduce an interpolation construction that is useful in the development
and analysis of new algorithms, and we establish a new sophisticated lemma for
analysing the individual sequence regret of parameterised models.
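As a point of reference for the switching models discussed here, the classical fixed-share update (Herbster and Warmuth) can be written in a few lines; the paper's HMM language expresses this model as well as the richer cluster- and relationship-aware variants. A minimal sketch of the standard model, not one of the paper's new ones:

```python
import numpy as np

def fixed_share(losses, eta=1.0, alpha=0.01):
    """Classical fixed-share mixture over K base strategies.

    losses[t, k] is strategy k's loss at round t; alpha is the prior
    probability of a switch per round. Minimal reference sketch.
    """
    T, K = losses.shape
    w = np.ones(K) / K
    total_mix_loss = 0.0
    for t in range(T):
        # Mix loss of the current weights at learning rate eta.
        total_mix_loss += -np.log(w @ np.exp(-eta * losses[t])) / eta
        # Loss update: downweight strategies that did badly ...
        w = w * np.exp(-eta * losses[t])
        w /= w.sum()
        # ... then the share step: redistribute mass alpha uniformly,
        # which keeps every strategy recoverable after a switch.
        w = (1 - alpha) * w + alpha / K
    return total_mix_loss
```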