While multi-task learning (MTL) has gained significant attention in recent
years, its underlying mechanisms remain poorly understood. Recent methods have
not yielded consistent performance improvements over single-task learning (STL)
baselines, underscoring the importance of gaining deeper insight into the
challenges specific to MTL. In our study, we challenge common assumptions in
MTL in the context of STL: First, the choice of optimizer has been only lightly
investigated in MTL. We show the pivotal role of common STL tools such as the
Adam optimizer in MTL. We attribute the effectiveness of Adam to its partial
loss-scale invariance. Second, the notion of gradient conflicts has often been
framed as a problem specific to MTL. We examine the role of gradient
conflicts in MTL and compare it to STL. For angular gradient alignment, we find
no evidence that this is a problem unique to MTL. We instead emphasize
differences in gradient magnitude as the main distinguishing factor. Lastly, we
compare the
transferability of features learned through MTL and STL on common image
corruptions, and find no conclusive evidence that MTL leads to superior
transferability. Overall, we find surprising similarities between STL and MTL,
suggesting that methods from both fields should be considered in a broader
context.
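
As an illustration of the loss-scale invariance mentioned above, the following is a minimal sketch, not the experimental setup of this study. It assumes a hypothetical toy quadratic loss and a hand-written Adam update in NumPy; the function names, the target vector, and the hyperparameters are all illustrative choices. It only covers uniform rescaling of a single loss, so it does not show in what sense the invariance is partial.

```python
import numpy as np

def grad(w, scale):
    # Gradient of the hypothetical toy loss: scale * 0.5 * ||w - target||^2.
    target = np.array([1.0, -2.0])
    return scale * (w - target)

def run_sgd(scale, steps=50, lr=0.01):
    # Plain gradient descent: the step size grows with the loss scale.
    w = np.zeros(2)
    for _ in range(steps):
        w = w - lr * grad(w, scale)
    return w

def run_adam(scale, steps=50, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update with bias correction, written out by hand.
    w = np.zeros(2)
    m = np.zeros(2)
    v = np.zeros(2)
    for t in range(1, steps + 1):
        g = grad(w, scale)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        # The ratio m_hat / sqrt(v_hat) cancels a constant factor in g,
        # up to the small eps term in the denominator.
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Largest difference in the final parameters when the loss is scaled by 100.
sgd_gap = np.max(np.abs(run_sgd(1.0) - run_sgd(100.0)))
adam_gap = np.max(np.abs(run_adam(1.0) - run_adam(100.0)))
print("SGD gap:", sgd_gap)
print("Adam gap:", adam_gap)
```

In this toy setting the two gradient descent runs end at clearly different parameters, while the two Adam runs agree up to the effect of `eps`.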