MaxHedge: Maximising a Maximum Online
We introduce a new online learning framework where, at each trial, the
learner is required to select a subset of actions from a given known action
set. Each action is associated with an energy value, a reward and a cost. The
sum of the energies of the actions selected cannot exceed a given energy
budget. The goal is to maximise the cumulative profit, where the profit
obtained on a single trial is defined as the difference between the maximum
reward among the selected actions and the sum of their costs. Action energy
values and the budget are known and fixed. All rewards and costs associated
with each action change over time and are revealed at each trial only after the
learner's selection of actions. Our framework encompasses several online
learning problems where the environment changes over time, and where the solution
trades off minimising the costs against maximising the maximum reward of
the selected subset of actions, subject to an action energy
budget. The algorithm that we propose is efficient and general in that it may
be specialised to multiple natural online combinatorial problems.
Comment: Published in AISTATS 201
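The per-trial objective described above is concrete enough to state in code. The following is a minimal sketch of the profit definition only, not the paper's algorithm; all names and the toy numbers are illustrative assumptions:

```python
def trial_profit(selected, rewards, costs, energies, budget):
    """Profit of one trial in the MaxHedge framework: the maximum reward
    among the selected actions minus the sum of their costs, where the
    selection must respect the energy budget (hypothetical helper)."""
    assert sum(energies[a] for a in selected) <= budget, "energy budget exceeded"
    if not selected:
        return 0.0
    return max(rewards[a] for a in selected) - sum(costs[a] for a in selected)

# Toy example: three actions, energy budget 5.
rewards  = {0: 4.0, 1: 7.0, 2: 2.0}
costs    = {0: 1.0, 1: 2.0, 2: 0.5}
energies = {0: 2,   1: 3,   2: 1}
print(trial_profit({0, 1}, rewards, costs, energies, budget=5))  # max(4, 7) - (1 + 2) = 4.0
```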
Online Combinatorial Linear Optimization via a Frank-Wolfe-based Metarounding Algorithm
Metarounding is an approach that converts an approximation algorithm for linear
optimization over a combinatorial class into an online linear optimization
algorithm for the same class. We propose a new metarounding algorithm under a
natural assumption that a relaxation-based approximation algorithm exists for the
combinatorial class. Our algorithm is significantly more efficient, both
theoretically and practically.
Online Improper Learning with an Approximation Oracle
We revisit the question of reducing online learning to approximate
optimization of the offline problem. In this setting, we give two algorithms
with near-optimal performance in the full information setting: they guarantee
optimal regret and require only poly-logarithmically many calls to the
approximation oracle per iteration. Furthermore, these algorithms extend to the
more general setting of improper learning. In the bandit setting, our algorithm
also significantly improves on the best previously known oracle complexity while
maintaining the same regret.
Online Learning of Facility Locations
In this paper, we provide a rigorous theoretical investigation of an online learning version of the Facility Location problem which is motivated by emerging problems in real-world applications. In our formulation, we are given a set of sites and an online sequence of user requests. At each trial, the learner selects a subset of sites and then incurs a cost for each selected site and an additional cost, which is the price of the user's connection to the nearest site in the selected subset. The problem may be solved by an application of the well-known Hedge algorithm. This would, however, require time and space exponential in the number of the given sites, which motivates our design of a novel quasi-linear time algorithm for this problem, with good theoretical guarantees on its performance.
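The per-trial cost described above can be stated directly. The toy below (all names and the Manhattan distance are illustrative assumptions, not the paper's algorithm) computes the cost of one selection:

```python
def trial_cost(selected_sites, site_costs, user, distance):
    """Per-trial cost in the online facility-location formulation:
    a cost for each selected site, plus the connection price of the
    user to the nearest selected site (hypothetical helper)."""
    opening = sum(site_costs[s] for s in selected_sites)
    connection = min(distance(user, s) for s in selected_sites)
    return opening + connection

# Toy instance: three sites in the plane, Manhattan distances.
sites = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
site_costs = {0: 1.0, 1: 2.0, 2: 1.5}
dist = lambda u, s: abs(u[0] - sites[s][0]) + abs(u[1] - sites[s][1])
print(trial_cost({0, 1}, site_costs, (1.0, 0.0), dist))  # (1 + 2) + 1 = 4.0
```

The exponential blow-up mentioned in the abstract comes from running Hedge with one expert per subset of sites, i.e. 2^n experts for n sites.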
Combinatorial Online Prediction
We present a short survey of recent results on combinatorial online prediction in the adversarial setting. Index Terms: online learning, combinatorial online prediction, online convex optimization, combinatorial optimization.
Combinatorial Online Prediction via Metarounding
We consider online prediction problems over combinatorial concepts. Examples of such concepts include s-t paths, permutations, truth assignments, set covers, and so on. The goal of the online prediction algorithm is to compete with the best fixed combinatorial concept in hindsight. A generic approach to this problem is to design an online prediction algorithm that uses the corresponding offline (approximation) algorithm as an oracle. The current state-of-the-art method, however, is not efficient enough. In this paper we propose a more efficient online prediction algorithm for the case where the offline approximation algorithm comes with a guarantee on the integrality gap.