Challenging Common Assumptions in Multi-task Learning
While multi-task learning (MTL) has gained significant attention in recent
years, its underlying mechanisms remain poorly understood. Recent methods have
not yielded consistent performance improvements over single-task learning (STL)
baselines, underscoring the importance of gaining deeper insight into
challenges specific to MTL. In our study, we challenge common assumptions in
MTL in the context of STL: First, the choice of optimizer has only been mildly
investigated in MTL. We show the pivotal role of common STL tools such as the
Adam optimizer in MTL. We attribute the effectiveness of Adam to its partial
loss-scale invariance. Second, the notion of gradient conflicts has often been
phrased as a specific problem in MTL. We delve into the role of gradient
conflicts in MTL and compare it to STL. For angular gradient alignment, we find
no evidence that it is a problem unique to MTL. We emphasize differences in
gradient magnitude as the main distinguishing factor. Lastly, we compare the
transferability of features learned through MTL and STL on common image
corruptions, and find no conclusive evidence that MTL leads to superior
transferability. Overall, we find surprising similarities between STL and MTL,
suggesting that methods from both fields be considered in a broader context.
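The partial loss-scale invariance of Adam noted in the abstract can be illustrated numerically: scaling the loss (and hence its gradient) by a constant leaves the Adam update almost unchanged, because the scale cancels in the ratio of bias-corrected first moment to the square root of the second moment, up to the epsilon term. A minimal sketch (not the paper's code; function and variable names are ours):

```python
import numpy as np

def adam_step(g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update for gradient g with first/second moment state (m, v).
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)  # bias correction
    v_hat = v / (1 - b2 ** t)
    return lr * m_hat / (np.sqrt(v_hat) + eps), m, v

rng = np.random.default_rng(0)
g = rng.normal(size=5)

# Same gradient direction, but the loss (and gradient) scaled by 100x.
step_a, _, _ = adam_step(g, np.zeros(5), np.zeros(5), t=1)
step_b, _, _ = adam_step(100 * g, np.zeros(5), np.zeros(5), t=1)

# The scale cancels in m_hat / sqrt(v_hat); the two updates differ only
# through the epsilon term in the denominator.
print(np.max(np.abs(step_a - step_b)))  # nearly zero
```

The invariance is exact only in the limit eps → 0, hence "partial": a plain SGD step, by contrast, would scale linearly with the loss.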
Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors
Representing scenes at the granularity of objects is a prerequisite for scene
understanding and decision making. We propose PriSMONet, a novel approach based
on Prior Shape knowledge for learning Multi-Object 3D scene decomposition and
representations from single images. Our approach learns to decompose images of
synthetic scenes with multiple objects on a planar surface into their constituent
scene objects and to infer their 3D properties from a single view. A recurrent
encoder regresses a latent representation of 3D shape, pose and texture of each
object from an input RGB image. By differentiable rendering, we train our model
to decompose scenes from RGB-D images in a self-supervised way. The 3D shapes
are represented continuously in function-space as signed distance functions,
which we pre-train from example shapes in a supervised way. These shape priors
provide weak supervision signals to better condition the challenging overall
learning task. We evaluate the accuracy of our model in inferring 3D scene
layout, demonstrate its generative capabilities, assess its generalization to
real images, and point out benefits of the learned representation.
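The continuous function-space representation mentioned above can be made concrete with the simplest signed distance function, a sphere: an SDF maps a 3D point to its signed distance from the shape's surface, negative inside, zero on the surface, positive outside. A minimal analytic sketch (not the learned shape priors from the paper):

```python
import numpy as np

def sphere_sdf(points, center, radius):
    # Signed distance to a sphere: negative inside, zero on the
    # surface, positive outside.
    return np.linalg.norm(points - center, axis=-1) - radius

center = np.zeros(3)
pts = np.array([[0.0, 0.0, 0.0],   # at the center  -> -radius
                [1.0, 0.0, 0.0],   # on the surface -> 0
                [2.0, 0.0, 0.0]])  # outside        -> +1
print(sphere_sdf(pts, center, radius=1.0))  # [-1.  0.  1.]
```

In the learned setting, a neural network replaces this analytic formula, so shapes of arbitrary topology can be represented at continuous resolution and queried at any 3D point.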