3 research outputs found

    Efficient Constrained Regret Minimization

    Online learning constitutes a compelling mathematical framework for analyzing sequential decision-making problems in adversarial environments. The learner repeatedly chooses an action, the environment responds with an outcome, and the learner then receives a reward for the played action. The goal of the learner is to maximize the total reward. However, there are situations in which, in addition to maximizing the cumulative reward, some additional constraints on the sequence of decisions must be satisfied on average by the learner. In this paper we study an extension of online learning in which the learner aims to maximize the total reward subject to such constraints. Leveraging the theory of Lagrangian methods in constrained optimization, we propose the Lagrangian exponentially weighted average (LEWA) algorithm, a primal-dual variant of the well-known exponentially weighted average algorithm, to efficiently solve constrained online decision-making problems. Using novel theoretical analysis, we establish bounds on the regret and on the constraint violation in the full-information and bandit feedback models.
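    The abstract does not include pseudocode; the following is a minimal, hypothetical sketch of a primal-dual exponentially weighted average update in the experts setting, assuming K actions, per-round reward and cost vectors, and a single constraint expressed as a cost that should be non-positive on average. The function name, the step sizes eta and mu, and the exact update order are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def lewa_sketch(rewards, costs, eta=0.1, mu=0.1, seed=0):
    """Hypothetical primal-dual exponentially weighted average sketch.

    rewards, costs: arrays of shape (T, K); the long-term constraint is that
    the average of costs[t, a_t] over the played actions stays <= 0.
    """
    T, K = rewards.shape
    log_w = np.zeros(K)          # log-weights over the K actions
    lam = 0.0                    # Lagrange multiplier for the constraint
    rng = np.random.default_rng(seed)
    total_reward, total_violation = 0.0, 0.0

    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                         # exponentially weighted distribution
        a = rng.choice(K, p=p)               # sample and play an action
        total_reward += rewards[t, a]
        total_violation += max(costs[t, a], 0.0)

        # Primal step: exponential update on the Lagrangian payoff r_t - lam * g_t.
        log_w += eta * (rewards[t] - lam * costs[t])
        # Dual step: projected gradient ascent on the multiplier.
        lam = max(0.0, lam + mu * float(p @ costs[t]))

    return total_reward, total_violation
```

    Under bandit feedback, the full reward and cost vectors used in the primal step would have to be replaced by importance-weighted estimates built from the single observed entry.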

    Adaptive Algorithms for Online Convex Optimization with Long-term Constraints

    We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, i.e., constraints that need to be satisfied when accumulated over a finite number of rounds $T$, but can be violated in intermediate rounds. For a user-defined trade-off parameter $\beta \in (0, 1)$, the proposed algorithm achieves cumulative regret bounds of $O(T^{\max\{\beta, 1-\beta\}})$ and $O(T^{1-\beta/2})$ for the loss and the constraint violations, respectively. Our results hold for convex losses and can handle arbitrary convex constraints without requiring knowledge of the number of rounds in advance. Our contributions improve over the best known cumulative regret bounds of Mahdavi et al. (2012), which are $O(T^{1/2})$ and $O(T^{3/4})$ for general convex domains, and $O(T^{2/3})$ and $O(T^{2/3})$ when further restricting to polyhedral domains. We supplement the analysis with experiments validating the performance of our algorithm in practice.
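    As a rough illustration of the kind of primal-dual online gradient scheme the abstract describes, here is a hypothetical sketch with a single known convex constraint $g(x) \le 0$ and a simple Euclidean-ball domain; the step size eta_t and the damping term theta_t are illustrative placeholders parameterized by beta, not the paper's adaptive schedule, and all names are assumptions.

```python
import numpy as np

def primal_dual_ogd_sketch(grad_fs, g, grad_g, x0, beta=0.5, radius=1.0):
    """Hypothetical online gradient descent sketch with a long-term constraint.

    grad_fs: list of callables, grad_fs[t](x) is the gradient of the round-t loss.
    g, grad_g: the constraint function g(x) <= 0 and its gradient.
    """
    x = np.array(x0, dtype=float)
    lam = 0.0
    iterates = []
    for t, grad_f in enumerate(grad_fs, start=1):
        eta = t ** (-beta)            # illustrative step size, not the paper's schedule
        theta = t ** (beta - 1)       # illustrative damping on the multiplier
        # Primal step on the Lagrangian f_t(x) + lam * g(x).
        x = x - eta * (grad_f(x) + lam * grad_g(x))
        # Project back onto a Euclidean ball of the given radius.
        nrm = np.linalg.norm(x)
        if nrm > radius:
            x *= radius / nrm
        # Dual ascent on the multiplier, damped and kept non-negative.
        lam = max(0.0, lam + eta * (g(x) - theta * lam))
        iterates.append(x.copy())
    return iterates
```

    For example, with linear losses $f_t(x) = \langle c_t, x \rangle$ and the constraint $g(x) = \sum_i x_i - 1$, one would pass the constant gradients $c_t$ as grad_fs and `lambda x: np.ones_like(x)` as grad_g.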

    Online DR-Submodular Maximization with Stochastic Cumulative Constraints

    In this paper, we consider online continuous DR-submodular maximization with linear stochastic long-term constraints. Compared to prior work on online submodular maximization, our setting introduces the extra complication of stochastic linear constraint functions that are i.i.d. generated at each round. To be precise, at step $t \in \{1, \dots, T\}$, a DR-submodular utility function $f_t(\cdot)$ and a constraint vector $p_t$, i.i.d. generated from an unknown distribution with mean $p$, are revealed after committing to an action $x_t$, and we aim to maximize the overall utility while keeping the expected cumulative resource consumption $\sum_{t=1}^T \langle p, x_t \rangle$ below a fixed budget $B_T$. Stochastic long-term constraints arise naturally in applications where there is a limited budget or resource available and resource consumption at each step is governed by a stochastically time-varying environment. We propose the Online Lagrangian Frank-Wolfe (OLFW) algorithm to solve this class of online problems. We analyze the performance of the OLFW algorithm and obtain sub-linear regret bounds as well as sub-linear cumulative constraint violation bounds, both in expectation and with high probability.
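    The OLFW algorithm itself is not reproduced in the abstract; the following is a heavily simplified, hypothetical sketch of a Lagrangian Frank-Wolfe style update, assuming the feasible set is the box $[0, 1]^n$, a per-round budget of $B_T / T$, and a single linear-maximization step per round. Function names and the step-size choices are assumptions made for illustration only.

```python
import numpy as np

def olfw_sketch(grad_fs, p_vectors, n, budget_total, mu=0.1):
    """Simplified online Lagrangian Frank-Wolfe sketch over the box [0, 1]^n.

    grad_fs[t](x): gradient of the round-t DR-submodular utility f_t at x.
    p_vectors[t]:  the stochastic constraint vector revealed at round t.
    """
    T = len(grad_fs)
    budget_per_round = budget_total / T
    x = np.zeros(n)
    lam = 0.0
    plays = []
    for t in range(T):
        plays.append(x.copy())                    # commit to the action before feedback
        # Linear maximization of the Lagrangian payoff over the box:
        # pick v[i] = 1 exactly where the payoff coordinate is positive.
        payoff = grad_fs[t](x) - lam * p_vectors[t]
        v = (payoff > 0.0).astype(float)
        # Frank-Wolfe style averaging step toward the chosen vertex.
        x = x + (v - x) / (t + 2)
        # Dual update: penalize consumption above the per-round budget.
        lam = max(0.0, lam + mu * (float(p_vectors[t] @ x) - budget_per_round))
    return plays
```

    This sketch only conveys the primal-dual Frank-Wolfe structure; the step sizes and the number of inner steps that yield the paper's sub-linear regret and violation bounds are not captured here.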