3 research outputs found
Efficient Constrained Regret Minimization
Online learning is a compelling mathematical framework for analyzing sequential decision-making problems in adversarial environments. The learner repeatedly chooses an action, the environment responds with an outcome, and the learner receives a reward for the played action. The learner's goal is to maximize its total reward. In some situations, however, in addition to maximizing the cumulative reward, the learner must also satisfy constraints on the sequence of decisions on average. In this paper we study an extension of online learning in which the learner aims to maximize the total reward while satisfying such additional constraints. Leveraging the Lagrangian method from constrained optimization, we propose the Lagrangian exponentially weighted average (LEWA) algorithm, a primal-dual variant of the well-known exponentially weighted average algorithm, to efficiently solve constrained online decision-making problems. Using a novel theoretical analysis, we establish regret and constraint-violation bounds in both the full-information and bandit feedback models.
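The primal-dual idea behind such an algorithm can be sketched as follows: the primal player keeps exponentially weighted log-weights over a finite action set and updates them with the Lagrangian payoff (reward minus multiplier times constraint), while the dual player runs projected gradient ascent on the multiplier. This is a minimal illustrative sketch, not the paper's exact LEWA update; the function name, step sizes `eta`/`mu`, and the sign convention (constraint value ≤ 0 means satisfied) are all assumptions.

```python
import numpy as np

def lewa_sketch(rewards, constraints, eta=0.1, mu=0.1):
    """Illustrative primal-dual exponentially-weighted-average loop.

    rewards[t], constraints[t]: arrays of shape (K,) giving, for each of
    the K actions, the reward and the constraint value at round t
    (constraint <= 0 means "satisfied").  Names and step sizes are
    illustrative, not the paper's exact parameters.
    """
    T, K = rewards.shape
    log_w = np.zeros(K)   # primal: log-weights over the K actions
    lam = 0.0             # dual: Lagrange multiplier for the constraint
    total_reward = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                       # exponentially weighted distribution
        total_reward += p @ rewards[t]
        # primal step: exponential update on the Lagrangian payoff
        log_w += eta * (rewards[t] - lam * constraints[t])
        # dual step: projected gradient ascent, multiplier kept nonnegative
        lam = max(0.0, lam + mu * (p @ constraints[t]))
    return total_reward, lam
```

In the full-information model the whole vectors `rewards[t]` and `constraints[t]` are observed each round; in the bandit model only the played action's entries would be available, which the sketch above does not handle.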
Adaptive Algorithms for Online Convex Optimization with Long-term Constraints
We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, i.e., constraints that need to be satisfied when accumulated over a finite number of rounds T, but can be violated in intermediate rounds. For a user-defined trade-off parameter β ∈ (0, 1), the proposed algorithm achieves cumulative regret bounds of O(T^max{β, 1−β}) and O(T^(1−β/2)) for the loss and the constraint violations, respectively. Our results hold for convex losses, can handle arbitrary convex constraints, and do not require knowledge of the number of rounds in advance. Our contributions improve over the best known cumulative regret bounds of Mahdavi et al. (2012), which are respectively O(T^1/2) and O(T^3/4) for general convex domains, and respectively O(T^2/3) and O(T^2/3) when further restricting to polyhedral domains. We supplement the analysis with experiments validating the performance of our algorithm in practice.
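The long-term-constraint setting described above is commonly handled with a primal-dual online gradient descent loop: the primal step descends the instantaneous Lagrangian f_t(x) + λ·g(x), and the dual step ascends in λ so that accumulated violations push the iterate back toward feasibility. The sketch below illustrates that generic scheme, not this paper's adaptive algorithm; the function name, the Euclidean-ball domain, the step sizes, and the explicit constraint-gradient argument are all assumptions.

```python
import numpy as np

def pd_ogd_sketch(loss_grads, g, g_grad, x0, eta=0.1, mu=0.1, radius=1.0):
    """Sketch of primal-dual online gradient descent for a long-term
    constraint g(x) <= 0.

    loss_grads: iterable of callables, the t-th returning the gradient of
    the round-t loss at x.  g / g_grad: the constraint and its gradient.
    Names, domain, and step sizes are illustrative assumptions.
    """
    x = np.asarray(x0, dtype=float)
    lam = 0.0
    cumulative_violation = 0.0
    for grad in loss_grads:
        # primal: gradient step on the instantaneous Lagrangian
        x = x - eta * (grad(x) + lam * g_grad(x))
        nrm = np.linalg.norm(x)
        if nrm > radius:                   # project back onto the ball
            x *= radius / nrm
        # dual: ascend on the violation, multiplier kept nonnegative
        lam = max(0.0, lam + mu * g(x))
        cumulative_violation += max(0.0, g(x))
    return x, cumulative_violation
```

Intuitively, λ grows while g(x) > 0 and shrinks otherwise, so individual rounds may violate the constraint while the accumulated violation stays sub-linear; the paper's trade-off parameter tunes how these two regret terms are balanced.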
Online DR-Submodular Maximization with Stochastic Cumulative Constraints
In this paper, we consider online continuous DR-submodular maximization with linear stochastic long-term constraints. Compared to prior work on online submodular maximization, our setting introduces the extra complication of stochastic linear constraint functions that are i.i.d. generated at each round. To be precise, at step t, a DR-submodular utility function f_t and a constraint vector p_t, i.i.d. generated from an unknown distribution with mean p̄, are revealed after committing to an action x_t, and we aim to maximize the overall utility while keeping the expected cumulative resource consumption below a fixed budget. Stochastic long-term constraints arise naturally in applications where a limited budget or resource is available and resource consumption at each step is governed by a stochastically time-varying environment. We propose the Online Lagrangian Frank-Wolfe (OLFW) algorithm to solve this class of online problems. We analyze the performance of the OLFW algorithm and obtain sub-linear regret bounds as well as sub-linear cumulative constraint violation bounds, both in expectation and with high probability.
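A Lagrangian Frank-Wolfe step combines the two ingredients named above: a linear-maximization oracle over the feasible set applied to the Lagrangian payoff (utility gradient minus multiplier times the constraint vector), followed by a convex-combination move toward the returned vertex and a dual ascent step on the budget overshoot. The sketch below is a generic illustration of that combination, not the paper's OLFW algorithm; the box domain [0,1]^n, the per-round budget interpretation, and the step sizes `gamma`/`mu` are all assumptions.

```python
import numpy as np

def olfw_sketch(util_grads, cons_vectors, budget, n, gamma=0.1, mu=0.05):
    """Illustrative Lagrangian Frank-Wolfe loop on the box [0,1]^n.

    util_grads[t](x): gradient of the round-t DR-submodular utility f_t.
    cons_vectors[t]: the stochastic constraint vector p_t; consumption
    <p_t, x> should stay below `budget` per round on average.  The
    domain, names, and step sizes are illustrative assumptions.
    """
    x = np.zeros(n)
    lam = 0.0
    consumed = 0.0
    for grad, p in zip(util_grads, cons_vectors):
        # linear maximization oracle over [0,1]^n for the Lagrangian payoff
        score = grad(x) - lam * p
        v = (score > 0).astype(float)      # vertex maximizing <score, v>
        x = x + gamma * (v - x)            # Frank-Wolfe averaging step keeps x in the box
        consumed += p @ x
        # dual ascent on the per-round budget overshoot
        lam = max(0.0, lam + mu * (p @ x - budget))
    return x, consumed
```

Because each iterate is a convex combination of the previous iterate and a vertex of the box, no projection is needed, which is the usual appeal of Frank-Wolfe-style methods for continuous DR-submodular objectives.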