We present NEWTRON, an efficient algorithm for online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar α. We prove that the regret of NEWTRON is O(log T) when α is a constant that does not vary with the horizon T, and at most O(T^{2/3}) if α is allowed to increase to infinity with T. For α = O(log T), the regret is bounded by O(√T), thus solving the open problem of [KSST08, AR09]. Our algorithm is based on a novel application of the online Newton method [HAK07]. In experiments, the algorithm performs well even when α is a small constant.