We introduce a new online learning framework where, at each trial, the
learner is required to select a subset of actions from a given known action
set. Each action is associated with an energy value, a reward and a cost. The
sum of the energies of the actions selected cannot exceed a given energy
budget. The goal is to maximise the cumulative profit, where the profit
obtained on a single trial is defined as the difference between the maximum
reward among the selected actions and the sum of their costs. Action energy
values and the budget are known and fixed. The rewards and costs associated
with each action change over time and are revealed at each trial only after the
learner has selected its actions. Our framework encompasses several online
learning problems in which the environment changes over time and the solution
trades off minimising the costs against maximising the maximum reward of
the selected subset of actions, subject to the energy
budget. The algorithm that we propose is efficient and general in that it may
be specialised to multiple natural online combinatorial problems.

Comment: Published in AISTATS 201
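
To make the per-trial quantities in the abstract concrete, the following is a minimal sketch of the feasibility constraint and the profit of a single trial. It is not the proposed algorithm. The function names `is_feasible` and `trial_profit` and the example numbers are hypothetical, and the convention that an empty selection yields zero profit is an assumption not stated in the abstract.

```python
from typing import Sequence, Set


def is_feasible(selected: Set[int], energy: Sequence[float], budget: float) -> bool:
    """Check that the total energy of the selected actions stays within the budget."""
    return sum(energy[a] for a in selected) <= budget


def trial_profit(selected: Set[int], reward: Sequence[float], cost: Sequence[float]) -> float:
    """Profit of one trial: maximum reward among selected actions minus the sum of their costs."""
    if not selected:
        # Assumption (not from the abstract): selecting nothing yields zero profit.
        return 0.0
    return max(reward[a] for a in selected) - sum(cost[a] for a in selected)


# Hypothetical instance with three actions, indexed 0..2.
energy = [1.0, 2.0, 3.0]   # known and fixed
budget = 3.0               # known and fixed
reward = [0.5, 0.9, 0.4]   # revealed only after the selection on this trial
cost = [0.1, 0.2, 0.3]     # revealed only after the selection on this trial

selection = {0, 1}
assert is_feasible(selection, energy, budget)
profit = trial_profit(selection, reward, cost)  # 0.9 - (0.1 + 0.2), up to rounding
```

In the online setting the learner must commit to `selection` before `reward` and `cost` are observed; the sketch only evaluates a selection after the fact.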