Sequential decision making with vector outcomes
We study a multi-round optimization setting in which, in each round, a player selects one of several actions, and each action produces an outcome vector that is not observable to the player until the round ends. The final payoff for the player is computed by applying a known function f to the sum of all outcome vectors (e.g., the minimum of all coordinates of the sum). We show that standard performance measures (such as comparison to the best single action) used in related expert and bandit settings (in which the payoff in each round is scalar) are not useful in our vector setting. Instead, we propose a different performance measure and design algorithms that have vanishing regret with respect to this new measure.
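One way to see why the best single action can be a weak benchmark here is a toy instance, sketched below. It is an illustration only and is not taken from the paper: the two actions, their fixed outcome vectors, and the horizon T are all assumptions, and f is taken to be the coordinate-wise minimum mentioned in the abstract.

```python
from itertools import cycle

# Hypothetical two-action instance: each action yields a fixed outcome vector.
OUTCOMES = {"a": (1, 0), "b": (0, 1)}

def payoff(actions):
    """Apply f = min over coordinates to the sum of the outcome vectors."""
    totals = [0, 0]
    for action in actions:
        for i, value in enumerate(OUTCOMES[action]):
            totals[i] += value
    return min(totals)

T = 10  # assumed horizon
# Best payoff achievable by repeating one action for all T rounds.
best_single = max(payoff([a] * T) for a in OUTCOMES)
# Payoff of simply alternating between the two actions.
alternating = payoff([a for a, _ in zip(cycle("ab"), range(T))])
```

In this instance every single action leaves one coordinate of the sum at zero, so the single-action benchmark is trivial to match, while a mixed sequence earns a payoff that grows with T.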
Approachability in unknown games: Online learning meets multi-objective optimization
In the standard setting of approachability there are two players and a target
set. The players repeatedly play a known vector-valued game in which the first
player wants to have the average vector-valued payoff converge to the target
set, while the other player tries to exclude it from this set. We revisit this
setting in the spirit of online learning and do not assume that the first
player knows the game structure: she receives an arbitrary vector-valued
reward at every round. She wishes to approach the smallest ("best") possible
set given the observed average payoffs in hindsight. This extension of the
standard setting has implications even when the original target set is not
approachable and when it is not obvious which expansion of it should be
approached instead. We show that it is impossible, in general, to approach the
best target set in hindsight and propose achievable though ambitious
alternative goals. We further propose a concrete strategy to approach these
goals. Our method does not require projection onto a target set and amounts to
switching between scalar regret minimization algorithms that are performed in
episodes. Applications to global cost minimization and to approachability under
sample path constraints are considered.
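The quantity being tracked in approachability, the distance from the average vector payoff to a target set, can be sketched as follows. This only measures that distance after each round and is not the strategy proposed in the paper (which, per the abstract, avoids projections). The target set (the nonpositive orthant), the function names, and the reward sequence are assumptions chosen for illustration.

```python
import math

def distance_to_orthant(x):
    """Euclidean distance from x to the set {y : y_i <= 0 for all i}."""
    return math.sqrt(sum(max(xi, 0.0) ** 2 for xi in x))

def average_distances(rewards):
    """Distance of the running average payoff to the target set after each round."""
    totals = [0.0] * len(rewards[0])
    distances = []
    for t, r in enumerate(rewards, start=1):
        totals = [s + ri for s, ri in zip(totals, r)]
        distances.append(distance_to_orthant([s / t for s in totals]))
    return distances

# Hypothetical sequence of vector-valued rewards, one per round.
rewards = [(1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
distances = average_distances(rewards)
```

A target set is approached along a play when these distances tend to zero as the number of rounds grows.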