Scalarization is a general technique that can be deployed in any
multiobjective setting to reduce multiple objectives into one, such as in
RLHF, where it has recently been used to train reward models that align with
human preferences. Yet some have
dismissed this classical approach because linear scalarizations are known to
miss concave regions of the Pareto frontier. In response, we seek simple
non-linear scalarizations that provably explore a diverse set of solutions
across the Pareto frontier of k objectives, as measured by the dominated
hypervolume. We show that hypervolume scalarizations with uniformly random
weights are, surprisingly, optimal for minimizing hypervolume regret: they
achieve a sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds
that preclude any algorithm from doing asymptotically better. As a theoretical case
study, we consider the multiobjective stochastic linear bandits problem and
demonstrate that by exploiting the sublinear regret bounds of the hypervolume
scalarizations, we can derive a novel non-Euclidean analysis that produces
improved hypervolume regret bounds of $\tilde{O}(dT^{-1/2} + T^{-1/k})$. We
support our theory with strong empirical results: simple hypervolume
scalarizations consistently outperform both the linear and Chebyshev
scalarizations, as well as standard multiobjective algorithms in Bayesian
optimization, such as EHVI.
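For intuition, the sketch below (not from the paper) illustrates one standard form of the hypervolume scalarization, $s_w(y) = \min_j \max(0, (y_j - z_j)/w_j)^k$ with weights $w$ drawn uniformly from the positive unit sphere, together with the Monte Carlo identity that the dominated hypervolume equals a known constant times the expected maximum scalarized value. The function names and example points are illustrative assumptions.

```python
import numpy as np
from math import gamma, pi

def sample_positive_directions(num, k, rng):
    # Uniformly random unit vectors on the positive orthant of the sphere:
    # take |Gaussian| coordinates and normalize (a standard sampling trick).
    w = np.abs(rng.standard_normal((num, k)))
    return w / np.linalg.norm(w, axis=1, keepdims=True)

def hv_scalarization(Y, w, ref):
    # s_w(y) = min_j max(0, (y_j - ref_j) / w_j)^k, for each row y of Y.
    k = Y.shape[1]
    ratios = np.clip((Y - ref) / w, 0.0, None)
    return ratios.min(axis=1) ** k

def mc_hypervolume(Y, ref, num_weights=20000, seed=0):
    # Monte Carlo estimate of the hypervolume dominated by Y w.r.t. ref,
    # using HV(Y) = c_k * E_w[ max_y s_w(y) ] over uniform directions w,
    # where c_k = pi^(k/2) / (2^k * Gamma(k/2 + 1)).
    k = Y.shape[1]
    rng = np.random.default_rng(seed)
    W = sample_positive_directions(num_weights, k, rng)
    best = np.array([hv_scalarization(Y, w, ref).max() for w in W])
    c_k = pi ** (k / 2) / (2 ** k * gamma(k / 2 + 1))
    return c_k * best.mean()

# Three points on a 2-objective frontier; their exact dominated
# hypervolume w.r.t. the origin is 0.61, which the estimate approaches.
Y = np.array([[1.0, 0.2], [0.7, 0.7], [0.2, 1.0]])
print(mc_hypervolume(Y, ref=np.zeros(2)))
```

Because each weight vector yields an independent single-objective problem, maximizing the scalarized value for many random weights is embarrassingly parallel, which is part of the appeal of scalarization-based multiobjective optimization.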