We consider a variant of the classical online linear
optimization problem in which at every step,
the online player receives a “hint” vector before
choosing the action for that round. Rather surprisingly,
it was shown that if the hint vector is
guaranteed to have a positive correlation with the
cost vector, then the online player can achieve
a regret of O(log T), thus significantly improving
over the O(√T) regret in the general setting.
However, the result and analysis require the correlation
property at all time steps, thus raising the
natural question: can we design online learning
algorithms that are resilient to bad hints?
In this paper we develop algorithms and nearly
matching lower bounds for online learning with
imperfect directional hints. Our algorithms are
oblivious to the quality of the hints, and the regret
bounds interpolate between the always-correlated
hints case and the no-hints case. Our results also
generalize, simplify, and improve upon previous
results on optimistic regret bounds, which can be
viewed as an additive version of hints.

Published version: http://proceedings.mlr.press/v119/bhaskara20a.html