7 research outputs found
Parameter-Free Online Convex Optimization with Sub-Exponential Noise
We consider the problem of unconstrained online convex optimization (OCO)
with sub-exponential noise, a strictly more general problem than the standard
OCO. In this setting, the learner receives a subgradient of the loss functions
corrupted by sub-exponential noise and strives to achieve optimal regret
guarantee, without knowledge of the competitor norm, i.e., in a parameter-free
way. Recently, Cutkosky and Boahen (COLT 2017) proved that, given unbounded
subgradients, it is impossible to guarantee a sublinear regret due to an
exponential penalty. This paper shows that it is possible to go around the
lower bound by allowing the observed subgradients to be unbounded via
stochastic noise. However, the presence of unbounded noise in unconstrained OCO
is challenging; existing algorithms do not provide near-optimal regret bounds
or fail to have a guarantee. So, we design a novel parameter-free OCO algorithm
for Banach space, which we call BANCO, via a reduction to betting on noisy
coins. We show that BANCO achieves the optimal regret rate in our problem.
Finally, we show the application of our results to obtain a parameter-free
locally private stochastic subgradient descent algorithm, and the connection to
the law of iterated logarithms.
Comment: v1: Accepted to COLT'19; v2: adjusted Theorem 3, w_t closed-form
solution, and typo
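The "reduction to betting on coins" mentioned in the abstract can be illustrated with the standard Krichevsky-Trofimov (KT) bettor in one dimension. This is a minimal sketch under simplifying assumptions (noise-free subgradients bounded in [-1, 1], scalar iterates); it is not BANCO, and the function and parameter names are hypothetical.

```python
def kt_coin_betting(gradients, initial_wealth=1.0):
    """One-dimensional parameter-free OCO via KT coin betting.

    Assumes every subgradient g_t lies in [-1, 1] (no noise), which is
    a simplification relative to the sub-exponential setting above.
    """
    wealth = initial_wealth
    coin_sum = 0.0            # running sum of past coin outcomes c_i = -g_i
    iterates = []
    for t, g in enumerate(gradients, start=1):
        beta = coin_sum / t   # KT betting fraction, always in (-1, 1)
        w = beta * wealth     # the signed bet doubles as the iterate w_t
        iterates.append(w)
        c = -g                # the "coin" is the negative subgradient
        wealth += c * w       # win or lose the amount that was bet
        coin_sum += c
    return iterates, wealth


if __name__ == "__main__":
    ws, final_wealth = kt_coin_betting([-1.0, -1.0, -1.0])
    print(ws, final_wealth)
```

No learning rate or bound on the competitor norm appears anywhere in the update, which is the sense in which such algorithms are called parameter-free.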
Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion
Logistic bandit is a ubiquitous framework of modeling users' choices, e.g.,
click vs. no click for advertisement recommender system. We observe that the
prior works overlook or neglect dependencies in $S \geq \lVert \theta_\star \rVert_2$,
where $\theta_\star \in \mathbb{R}^d$ is the unknown parameter
vector, which is particularly problematic when $S$ is large, e.g., $S \geq d$.
In this work, we improve the dependency on $S$ via a novel approach called
regret-to-confidence-set conversion (R2CS), which allows us to construct a
convex confidence set based on only the existence of an online
learning algorithm with a regret guarantee. Using R2CS, we obtain a strict
improvement in the regret bound w.r.t. $S$ in logistic bandits while retaining
computational feasibility and the dependence on other factors such as $d$ and
$T$. We apply our new confidence set to the regret analyses of logistic bandits
with a new martingale concentration step that circumvents an additional factor
of $S$. We then extend this analysis to multinomial logistic bandits and obtain
similar improvements in the regret, showing the efficacy of R2CS. While we
applied R2CS to the (multinomial) logistic model, R2CS is a generic approach
for developing confidence sets that can be used for various models, which can
be of independent interest.
Comment: 32 pages, 2 figures, 1 table
Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback
Online gradient descent (OGD) is well known to be doubly optimal under strong
convexity or monotonicity assumptions: (1) in the single-agent setting, it
achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost
functions; and (2) in the multi-agent setting of strongly monotone games, with
each agent employing OGD, we obtain last-iterate convergence of the joint
action to a unique Nash equilibrium at an optimal rate of
$\Theta(1/T)$. While these finite-time guarantees highlight its merits,
OGD has the drawback that it requires knowing the strong convexity/monotonicity
parameters. In this paper, we design a fully adaptive OGD algorithm,
AdaOGD, that does not require a priori knowledge of these parameters.
In the single-agent setting, our algorithm achieves $O(\log^2 T)$ regret under
strong convexity, which is optimal up to a log factor. Further, if each agent
employs AdaOGD in strongly monotone games, the joint action converges
in a last-iterate sense to a unique Nash equilibrium at a rate of
$O(\log^3 T / T)$, again optimal up to log factors. We illustrate our
algorithms in a learning version of the classical newsvendor problem, where due
to lost sales, only (noisy) gradient feedback can be observed. Our results
immediately yield the first feasible and near-optimal algorithm for both the
single-retailer and multi-retailer settings. We also extend our results to the
more general setting of exp-concave cost functions and games, using the online
Newton step (ONS) algorithm.
Comment: Accepted by Operations Research; 47 pages
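To make the drawback noted in the abstract concrete, here is a minimal sketch of classical projected OGD with the step size 1/(mu * t), which needs the strong convexity parameter mu up front. It is not AdaOGD; the function name, the interval constraint, and the quadratic cost in the usage example are assumptions chosen for illustration.

```python
def ogd_strongly_convex(grad, x0, mu, num_rounds, lo=-1.0, hi=1.0):
    """Projected online gradient descent on the interval [lo, hi].

    Uses the classical step size eta_t = 1 / (mu * t), so the strong
    convexity parameter mu must be known in advance.
    """
    x = x0
    iterates = [x]
    for t in range(1, num_rounds + 1):
        eta = 1.0 / (mu * t)
        x = x - eta * grad(x)      # gradient step
        x = min(max(x, lo), hi)    # Euclidean projection onto [lo, hi]
        iterates.append(x)
    return iterates


if __name__ == "__main__":
    # f(x) = (x - 0.3)^2 is 2-strongly convex with gradient 2 * (x - 0.3).
    xs = ogd_strongly_convex(lambda x: 2.0 * (x - 0.3), x0=1.0, mu=2.0,
                             num_rounds=50)
    print(xs[-1])
```

If mu is misspecified, the step-size schedule no longer matches the curvature of the cost, which is the gap an adaptive variant is meant to close.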
Estimating means of bounded random variables by betting
This paper derives confidence intervals (CI) and time-uniform confidence
sequences (CS) for the classical problem of estimating an unknown mean from
bounded observations. We present a general approach for deriving concentration
bounds, that can be seen as a generalization (and improvement) of the
celebrated Chernoff method. At its heart, it is based on deriving a new class
of composite nonnegative martingales, with strong connections to testing by
betting and the method of mixtures. We show how to extend these ideas to
sampling without replacement, another heavily studied problem. In all cases,
our bounds are adaptive to the unknown variance, and empirically vastly
outperform existing approaches based on Hoeffding or empirical Bernstein
inequalities and their recent supermartingale generalizations. In short, we
establish a new state-of-the-art for four fundamental problems: CSs and CIs for
bounded means, when sampling with and without replacement.
Comment: 68 pages, 18 figures; Python implementation:
https://github.com/wannabesmith/confse
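The "testing by betting" idea behind the abstract above can be sketched with a capital process for observations in [0, 1]. This is a simplified illustration, not the paper's method or the linked implementation: it assumes a fixed betting fraction `lam` (a hypothetical parameter) instead of an adaptive one, and it only detects means larger than the candidate `m`.

```python
def capital(xs, m, lam):
    """Capital after betting a fixed fraction lam against candidate mean m.

    Assumes each x is in [0, 1], m is in [0, 1], and 0 <= lam <= 1, so every
    factor 1 + lam * (x - m) is nonnegative. If the true mean equals m, the
    capital is a nonnegative martingale starting at 1.
    """
    k = 1.0
    for x in xs:
        k *= 1.0 + lam * (x - m)
    return k


def rejected(xs, m, lam, alpha=0.05):
    """True if the running capital ever reaches 1 / alpha.

    By Ville's inequality this happens with probability at most alpha when
    the true mean is m, so the candidates m that are never rejected form a
    time-uniform confidence set.
    """
    k = 1.0
    for x in xs:
        k *= 1.0 + lam * (x - m)
        if k >= 1.0 / alpha:
            return True
    return False


if __name__ == "__main__":
    data = [1.0] * 20
    print(rejected(data, m=0.5, lam=0.5))           # candidate far below the data
    print(rejected([0.5] * 20, m=0.5, lam=0.5))     # candidate matching the data
```

A fixed `lam` keeps the sketch short; choosing the bet from past observations is what lets such bounds adapt to the unknown variance.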