Many modern machine learning applications, such as multi-task learning,
require finding model parameters that optimally trade off multiple objective
functions, which may conflict with each other. The notion of the Pareto set
allows us to focus on the set of (often infinitely many) models that cannot
be strictly improved. But it does not provide an actionable procedure for
picking one or a few special models to return to practitioners. In this
paper, we consider \emph{optimization in Pareto set (OPT-in-Pareto)}, the
problem of finding Pareto models that optimize an extra reference criterion
function within the Pareto set. This function can either encode a specific
preference from the users, or represent a generic diversity measure for
obtaining a set of diversified Pareto models that are representative of the
whole Pareto set. Unfortunately, although OPT-in-Pareto is a highly useful
framework, efficient algorithms for it have been largely missing, especially
for the large-scale, non-convex, and non-linear objectives found in deep learning. A
naive approach is to apply Riemannian manifold gradient descent on the Pareto
set, which incurs a high computational cost due to the need for
eigen-decompositions of Hessian matrices. We propose a first-order algorithm that
approximately solves OPT-in-Pareto using only gradient information, with both
high practical efficiency and theoretically guaranteed convergence.
Empirically, we demonstrate that our method works efficiently for a variety of
challenging multi-task-related problems.
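
For concreteness, a minimal sketch of the OPT-in-Pareto problem is given below; the symbols $L_1,\dots,L_m$ (task objectives), $F$ (reference criterion), and $\mathcal{P}$ (Pareto set) are notation introduced here for illustration and are not taken from the text above.
% Notation introduced for illustration only; not part of the original abstract.
\[
  \min_{\theta \in \mathcal{P}} \; F(\theta),
  \qquad
  \mathcal{P} \;=\; \bigl\{\theta \;:\; \text{no } \theta' \text{ satisfies }
  L_i(\theta') \le L_i(\theta) \text{ for all } i
  \text{ and } L_j(\theta') < L_j(\theta) \text{ for some } j \bigr\},
\]
where $L_1,\dots,L_m$ are the conflicting task objectives, $\mathcal{P}$ is their Pareto set, and $F$ is the reference criterion, which may encode a user preference or a diversity measure over Pareto models.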