Multi-Task Learning (MTL) is a widely used and powerful learning paradigm for
training deep neural networks that allows a single backbone to learn more than
one objective. Compared to training tasks separately, MTL significantly
reduces computational costs, improves data efficiency, and potentially enhances
model performance by leveraging knowledge across tasks. Hence, it has been
adopted in a variety of applications, ranging from computer vision to natural
language processing and speech recognition. Among these efforts, an emerging
line of work in MTL focuses on manipulating per-task gradients to derive a
single gradient descent direction that benefits all tasks. Despite achieving
impressive results on many benchmarks, directly applying these approaches
without using appropriate regularization techniques might lead to suboptimal
solutions on real-world problems. In particular, standard training that
minimizes the empirical loss on the training data can easily suffer from
overfitting to low-resource tasks or be spoiled by tasks with noisy labels,
which can cause negative transfer between tasks and an overall performance drop. To
alleviate such problems, we propose to leverage a recently introduced training
method, Sharpness-Aware Minimization (SAM), which can enhance model
generalization ability in single-task learning. Accordingly, we present a novel
MTL training methodology, encouraging the model to find task-based flat minima
for coherently improving its generalization capability on all tasks. Finally,
we conduct comprehensive experiments on a variety of applications to
demonstrate the merit of our proposed approach over existing gradient-based MTL
methods, as suggested by our developed theory.

Comment: 29 pages, 11 figures, 6 tables