Many Task Learning with Task Routing
Typical multi-task learning (MTL) methods rely on architectural adjustments
and a large trainable parameter set to jointly optimize over several tasks.
However, as the number of tasks increases, so does the complexity of the
architectural adjustments and the resource requirements. In this paper, we
introduce a method which applies a conditional feature-wise transformation over
the convolutional activations that enables a model to successfully perform a
large number of tasks. To distinguish from regular MTL, we introduce Many Task
Learning (MaTL) as a special case of MTL where more than 20 tasks are performed
by a single model. Our method, dubbed Task Routing (TR), is encapsulated in a
layer we call the Task Routing Layer (TRL), which, when applied in an MaTL
scenario, successfully fits hundreds of classification tasks in one model. We evaluate
our method on 5 datasets against strong baselines and state-of-the-art
approaches.
Comment: 8 pages, 5 figures, 2 tables
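The conditional feature-wise transformation described above can be caricatured with a fixed per-task binary mask over convolutional channels. This is a minimal numpy sketch, not the paper's implementation; the class name, the random binary masks, and the `keep_ratio` parameter are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

class TaskRoutingLayer:
    """Sketch: per-task binary channel masks gating conv activations."""

    def __init__(self, num_tasks, num_channels, keep_ratio=0.5):
        # One fixed random unit mask per task, drawn at construction time.
        self.masks = (rng.random((num_tasks, num_channels)) < keep_ratio).astype(np.float32)

    def __call__(self, activations, task_id):
        # activations: (batch, channels, H, W); broadcast the mask over space.
        return activations * self.masks[task_id][None, :, None, None]

trl = TaskRoutingLayer(num_tasks=20, num_channels=8)
x = rng.standard_normal((2, 8, 4, 4)).astype(np.float32)
y0 = trl(x, task_id=0)  # only task 0's channel subset survives
```

Because each task sees a different fixed channel subset, a single backbone can host many tasks without per-task branches, which is the resource argument the abstract makes.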
Maximum Roaming Multi-Task Learning
Multi-task learning has gained popularity due to the advantages it provides
with respect to resource usage and performance. Nonetheless, the joint
optimization of parameters with respect to multiple tasks remains an active
research topic. Sub-partitioning the parameters between different tasks has
proven to be an efficient way to relax the optimization constraints over the
shared weights, whether the partitions are disjoint or overlapping. However, one
drawback of this approach is that it can weaken the inductive bias generally
set up by the joint task optimization. In this work, we present a novel way to
partition the parameter space without weakening the inductive bias.
Specifically, we propose Maximum Roaming, a dropout-inspired method that
randomly varies the parameter partitioning while forcing each parameter to visit
as many tasks as possible at a regulated frequency, so that the network fully adapts to
each update. We study the properties of our method through experiments on a
variety of visual multi-task data sets. Experimental results suggest that the
regularization brought by roaming has more impact on performance than usual
partitioning optimization strategies. The overall method is flexible, easily
applicable, provides superior regularization and consistently achieves improved
performance compared to recent multi-task learning formulations.
Comment: Accepted at the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)
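The roaming idea, as described, can be reduced to a toy: each shared parameter carries a task assignment, and at each update it moves to a task it has not yet visited, so that after enough updates every parameter has served every task. The numpy sketch below assumes a uniform-choice schedule; the `roam` function and its update rule are illustrative, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def roam(assignment, visited, num_tasks):
    """Move each parameter to a task it has not visited yet (toy schedule)."""
    new_assignment = assignment.copy()
    for p in range(len(assignment)):
        unvisited = [t for t in range(num_tasks) if not visited[p, t]]
        if unvisited:
            new_assignment[p] = rng.choice(unvisited)
            visited[p, new_assignment[p]] = True
    return new_assignment

num_params, num_tasks = 6, 3
assignment = rng.integers(0, num_tasks, size=num_params)   # initial partition
visited = np.zeros((num_params, num_tasks), dtype=bool)
visited[np.arange(num_params), assignment] = True
for _ in range(num_tasks - 1):   # after these roams, every task is visited
    assignment = roam(assignment, visited, num_tasks)
```

The "regulated frequency" in the abstract corresponds to how often such a roaming step is triggered during training, which this toy collapses into a simple loop.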
Variational Multi-Task Learning with Gumbel-Softmax Priors
Multi-task learning aims to explore task relatedness to improve individual
tasks, which is of particular significance in the challenging scenario that
only limited data is available for each task. To tackle this challenge, we
propose variational multi-task learning (VMTL), a general probabilistic
inference framework for learning multiple related tasks. We cast multi-task
learning as a variational Bayesian inference problem, in which task relatedness
is explored in a unified manner by specifying priors. To incorporate shared
knowledge into each task, we design the prior of a task to be a learnable
mixture of the variational posteriors of other related tasks, which is learned
by the Gumbel-Softmax technique. In contrast to previous methods, our VMTL can
exploit task relatedness for both representations and classifiers in a
principled way by jointly inferring their posteriors. This enables individual
tasks to fully leverage inductive biases provided by related tasks, thereby
improving the overall performance of all tasks. Experimental results
demonstrate that the proposed VMTL is able to effectively tackle a variety of
challenging multi-task learning settings with limited training data for both
classification and regression. Our method consistently surpasses previous
methods, including strong Bayesian approaches, and achieves state-of-the-art
performance on five benchmark datasets.
Comment: 19 pages, 6 figures, accepted by NeurIPS 2021
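The mixture-of-posteriors prior can be illustrated in a few lines of numpy: the prior mean for one task is a Gumbel-Softmax-weighted combination of the other tasks' variational posterior means. The function names, temperature, and toy dimensions below are assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def gumbel_softmax(logits, tau=0.5):
    """Relaxed categorical sample via the Gumbel-Softmax trick."""
    g = -np.log(-np.log(rng.random(logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = y - y.max()                                 # numerical stability
    e = np.exp(y)
    return e / e.sum()

def mixture_prior(post_means, task, logits):
    """Prior mean for `task`: Gumbel-Softmax-weighted mixture of the
    variational posterior means of the *other* tasks (sketch)."""
    others = [t for t in range(len(post_means)) if t != task]
    w = gumbel_softmax(logits[others])
    return sum(wi * post_means[t] for wi, t in zip(w, others))

post_means = rng.standard_normal((4, 3))   # 4 tasks, 3-dim posterior means
logits = rng.standard_normal(4)            # mixture logits over tasks
prior0 = mixture_prior(post_means, task=0, logits=logits)
```

Because the weights come from a Gumbel-Softmax relaxation, they stay differentiable, which is what lets the mixture weights be learned jointly with the posteriors.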
Towards Impartial Multi-task Learning
Multi-task learning (MTL) has been widely used in representation learning. However, naively training all tasks simultaneously may lead to the partial training issue, where specific tasks are trained more adequately than others. In this paper, we propose to learn multiple tasks impartially. Specifically, for the task-shared parameters, we optimize the scaling factors via a closed-form solution, such that the aggregated gradient (the sum of raw gradients weighted by the scaling factors) has equal projections onto the individual tasks. For the task-specific parameters, we dynamically weigh the task losses so that all of them are kept at a comparable scale. Further, we find that the above gradient balance and loss balance are complementary, and thus propose a hybrid balance method to further improve performance. Our impartial multi-task learning (IMTL) can be trained end-to-end without any heuristic hyper-parameter tuning, and is general enough to be applied to all kinds of losses without any distribution assumption. Moreover, IMTL converges to similar results even when the task losses are designed to have different scales, and is thus scale-invariant. We extensively evaluate IMTL on the standard MTL benchmarks, including Cityscapes, NYUv2 and CelebA. It outperforms existing loss weighting methods under the same experimental settings.
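The equal-projection condition on the aggregated gradient can be written out directly. The sketch below solves it as a small linear system rather than using the paper's closed form (the two agree for non-degenerate gradients); all names and the toy gradients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def imtl_g_weights(grads):
    """Gradient-balance weights (sketch): find a, with sum(a) = 1, such
    that g = sum_i a_i * g_i has equal projection onto every task's
    gradient direction u_i = g_i / ||g_i||."""
    G = np.stack(grads)                                # (T, d) raw gradients
    U = G / np.linalg.norm(G, axis=1, keepdims=True)   # unit directions
    T = len(grads)
    A = np.zeros((T, T))
    b = np.zeros(T)
    for i in range(T - 1):
        # g . (u_i - u_{i+1}) = 0  <=>  equal projections onto u_i, u_{i+1}
        A[i] = G @ (U[i] - U[i + 1])
    A[-1] = 1.0                                        # normalization: sum a_i = 1
    b[-1] = 1.0
    return np.linalg.solve(A, b)

grads = [rng.standard_normal(5) for _ in range(3)]     # 3 toy task gradients
a = imtl_g_weights(grads)
g = a @ np.stack(grads)                                # balanced aggregated gradient
```

Equal projections mean no single task's direction dominates the shared update, which is the "impartial" property the abstract claims for the task-shared parameters.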
Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection
Algorithmic bias often arises as a result of differential subgroup validity,
in which predictive relationships vary across groups. For example, in toxic
language detection, comments targeting different demographic groups can vary
markedly across groups. In such settings, trained models can be dominated by
the relationships that best fit the majority group, leading to disparate
performance. We propose framing toxicity detection as multi-task learning
(MTL), allowing a model to specialize on the relationships that are relevant to
each demographic group while also leveraging shared properties across groups.
With toxicity detection, each task corresponds to identifying toxicity against
a particular demographic group. However, traditional MTL requires labels for
all tasks to be present for every data point. To address this, we propose
Conditional MTL (CondMTL), wherein only training examples relevant to the given
demographic group are considered by the loss function. This lets us learn
group-specific representations in each branch that are not cross-contaminated by
irrelevant labels. Results on synthetic and real data show that using CondMTL
improves predictive recall over various baselines in general, and for the
minority demographic group in particular, while maintaining similar overall
accuracy.
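The conditional loss is straightforward to sketch: a branch's binary cross-entropy is computed only over examples of its own group. A minimal numpy version, with hypothetical names and toy data:

```python
import numpy as np

def cond_mtl_loss(logits, labels, group_ids, group):
    """CondMTL branch loss (sketch): only examples whose group matches
    this branch contribute to the binary cross-entropy; everything else
    is masked out, so irrelevant labels never reach the branch."""
    mask = group_ids == group
    if not mask.any():
        return 0.0                              # no relevant examples in batch
    p = 1.0 / (1.0 + np.exp(-logits[mask]))     # sigmoid probabilities
    y = labels[mask].astype(float)
    eps = 1e-9
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

logits = np.array([2.0, -1.0, 0.5, -2.0])
labels = np.array([1, 0, 1, 1])
groups = np.array([0, 0, 1, 1])
loss_g0 = cond_mtl_loss(logits, labels, groups, group=0)
```

Because the mask removes the other groups' examples entirely, relabeling them cannot change this branch's loss, which is exactly the "no cross-contamination" property claimed above.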
Association Graph Learning for Multi-Task Classification with Category Shifts
In this paper, we focus on multi-task classification, where related
classification tasks share the same label space and are learned simultaneously.
In particular, we tackle a new setting, more realistic than those currently
addressed in the literature, in which categories shift from training to test data.
Hence, individual tasks do not contain complete training data for the
categories in the test set. To generalize to such test data, it is crucial for
individual tasks to leverage knowledge from related tasks. To this end, we
propose learning an association graph to transfer knowledge among tasks for
missing classes. We construct the association graph with nodes representing
tasks, classes and instances, and encode the relationships among the nodes in
the edges to guide their mutual knowledge transfer. By message passing on the
association graph, our model enhances the categorical information of each
instance, making it more discriminative. To avoid spurious correlations between
task and class nodes in the graph, we introduce an assignment entropy
maximization that encourages each class node to balance its edge weights. This
enables all tasks to fully utilize the categorical information from related
tasks. An extensive evaluation on three general benchmarks and a medical
dataset for skin lesion classification reveals that our method consistently
performs better than representative baselines.
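One instance-from-class message-passing step, together with the entropy term over class-node edge weights, can be sketched in numpy. The attention-style edge weights and all names here are assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def message_pass(instance_feats, class_feats):
    """One instance <- class message-passing step (sketch): edge weights
    form an attention distribution over class nodes, and each instance
    feature is enhanced with its weighted sum of class-node features."""
    W = softmax(instance_feats @ class_feats.T, axis=1)   # (N, C) edge weights
    return instance_feats + W @ class_feats, W

def assignment_entropy(W):
    """Entropy of each class node's normalized edge weights, averaged over
    classes; maximizing it pushes class nodes to balance their edges."""
    col = W / (W.sum(axis=0, keepdims=True) + 1e-9)
    return float(-(col * np.log(col + 1e-9)).sum(axis=0).mean())

inst = rng.standard_normal((6, 4))    # 6 instance nodes, 4-dim features
cls = rng.standard_normal((3, 4))     # 3 class nodes
enhanced, W = message_pass(inst, cls)
ent = assignment_entropy(W)
```

Maximizing `ent` discourages a class node from attaching almost all its weight to instances of one task, which is the spurious-correlation failure mode the abstract describes.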