Hierarchical Multitask Learning for CTC-based Speech Recognition
Previous work has shown that neural encoder-decoder speech recognition can be
improved with hierarchical multitask learning, where auxiliary tasks are added
at intermediate layers of a deep encoder. We explore the effect of hierarchical
multitask learning in the context of connectionist temporal classification
(CTC)-based speech recognition, and investigate several aspects of this
approach. Consistent with previous work, we observe performance improvements on
telephone conversational speech recognition (specifically the Eval2000 test
sets) when training a subword-level CTC model with an auxiliary phone loss at
an intermediate layer. We analyze the effects of a number of experimental
variables (such as the interpolation constant and the position of the auxiliary loss
function), performance in lower-resource settings, and the relationship between
pretraining and multitask learning. We observe that the hierarchical multitask
approach improves over standard multitask training in our higher-data
experiments, while in the low-resource settings standard multitask training
works well. The best results are obtained by combining hierarchical multitask
learning and pretraining, which improves word error rates by 3.4% absolute on
the Eval2000 test sets. Comment: Technical Report
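As a rough illustration of the setup described above, here is a minimal PyTorch-style sketch of a deep CTC encoder with a subword loss on the final layer and an auxiliary phone CTC loss attached to an intermediate layer, combined through an interpolation constant. The layer count, vocabulary sizes, `aux_layer` position, and `lam` value are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HierarchicalCTCEncoder(nn.Module):
    """Deep recurrent encoder with a subword CTC head on the final layer and
    an auxiliary phone CTC head tapped from an intermediate layer."""

    def __init__(self, feat_dim=40, hidden=320, n_layers=5, aux_layer=3,
                 n_subwords=1000, n_phones=46):
        super().__init__()
        self.rnns = nn.ModuleList(
            [nn.LSTM(feat_dim if i == 0 else hidden, hidden, batch_first=True)
             for i in range(n_layers)])
        self.aux_layer = aux_layer
        self.subword_head = nn.Linear(hidden, n_subwords + 1)  # +1 for the CTC blank
        self.phone_head = nn.Linear(hidden, n_phones + 1)

    def forward(self, x):
        aux_logits = None
        for i, rnn in enumerate(self.rnns):
            x, _ = rnn(x)
            if i + 1 == self.aux_layer:          # tap the intermediate layer
                aux_logits = self.phone_head(x)
        return self.subword_head(x), aux_logits


ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def hierarchical_mtl_loss(model, feats, feat_lens, subwords, sub_lens,
                          phones, phone_lens, lam=0.3):
    """Interpolate the main subword CTC loss with the auxiliary phone CTC loss."""
    sub_logits, phone_logits = model(feats)
    sub_lp = sub_logits.log_softmax(-1).transpose(0, 1)      # CTCLoss expects (T, N, C)
    phone_lp = phone_logits.log_softmax(-1).transpose(0, 1)
    main = ctc(sub_lp, subwords, feat_lens, sub_lens)
    aux = ctc(phone_lp, phones, feat_lens, phone_lens)
    return (1.0 - lam) * main + lam * aux
```

The `(1 - lam) * main + lam * aux` interpolation is one common way to weight the auxiliary loss; pretraining would correspond to optimizing the phone loss alone before switching to this combined objective.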
The Benefit of Multitask Representation Learning
We discuss a general method to learn data representations from multiple
tasks. We provide a justification for this method in both settings of multitask
learning and learning-to-learn. The method is illustrated in detail in the
special case of linear feature learning. Conditions on the theoretical
advantage offered by multitask representation learning over independent task
learning are established. In particular, focusing on the important example of
half-space learning, we derive the regime in which multitask representation
learning is beneficial over independent task learning, as a function of the
sample size, the number of tasks and the intrinsic data dimensionality. Other
potential applications of our results include multitask feature learning in
reproducing kernel Hilbert spaces and multilayer, deep networks. Comment: To appear in the Journal of Machine Learning Research (JMLR). 31 pages
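A minimal sketch of the linear feature learning special case discussed above: a low-dimensional feature map `M` is shared across all tasks, each task keeps its own linear predictor, and everything is trained jointly. The dimensions, the hinge loss (chosen to match the half-space example), and the Adam trainer are illustrative choices, not the paper's formulation.

```python
import torch
import torch.nn as nn

class SharedLinearFeatures(nn.Module):
    """Linear feature learning: every task t predicts via w_t^T (M x),
    with the low-dimensional feature map M shared across all tasks."""

    def __init__(self, d=100, k=5, n_tasks=20):
        super().__init__()
        self.M = nn.Linear(d, k, bias=False)            # shared representation
        self.heads = nn.Linear(k, n_tasks, bias=False)  # one linear predictor per task

    def forward(self, x, task_ids):
        scores = self.heads(self.M(x))                  # (n, n_tasks)
        return scores.gather(1, task_ids.unsqueeze(1)).squeeze(1)

def train(model, x, y, task_ids, epochs=500, lr=1e-2):
    # hinge loss on labels y in {-1, +1}: the half-space (linear classifier) setting
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.clamp(1.0 - y * model(x, task_ids), min=0).mean()
        loss.backward()
        opt.step()
```

The benefit analyzed in the paper appears when the per-task sample size is small relative to the input dimension but many tasks share the same low-dimensional representation.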
Massively Multitask Networks for Drug Discovery
Massively multitask neural architectures provide a learning framework for
drug discovery that synthesizes information from many distinct biological
sources. To train these architectures at scale, we gather large amounts of data
from public sources to create a dataset of nearly 40 million measurements
across more than 200 biological targets. We investigate several aspects of the
multitask framework by performing a series of empirical studies and obtain some
interesting results: (1) massively multitask networks obtain predictive
accuracies significantly better than single-task methods, (2) the predictive
power of multitask networks improves as additional tasks and data are added,
(3) the total amount of data and the total number of tasks both contribute
significantly to multitask improvement, and (4) multitask networks afford
limited transferability to tasks not in the training set. Our results
underscore the need for greater data sharing and further algorithmic innovation
to accelerate the drug discovery process. Comment: Preliminary work. Under review by the International Conference on
Machine Learning (ICML).
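A hedged sketch of what a massively multitask architecture of this kind can look like: a shared fully connected trunk with one output logit per biological target, trained with a masked loss because most compounds are measured against only a few targets. The layer sizes and the masking scheme are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class MassivelyMultitaskNet(nn.Module):
    """Shared fully connected trunk with one binary logit per biological target."""

    def __init__(self, n_features=1024, n_tasks=200, hidden=(2000, 100)):
        super().__init__()
        layers, prev = [], n_features
        for h in hidden:
            layers += [nn.Linear(prev, h), nn.ReLU()]
            prev = h
        self.trunk = nn.Sequential(*layers)
        self.heads = nn.Linear(prev, n_tasks)    # task-specific output layer

    def forward(self, x):
        return self.heads(self.trunk(x))         # (batch, n_tasks) logits

def masked_multitask_loss(logits, labels, mask):
    """labels holds 0/1 activity per (compound, target) pair; mask is 1 only where
    a measurement exists, so unmeasured pairs do not contribute to the loss."""
    bce = nn.functional.binary_cross_entropy_with_logits(logits, labels,
                                                         reduction="none")
    return (bce * mask).sum() / mask.sum().clamp(min=1)
```

Adding tasks in this design simply widens the output layer while the trunk, and therefore most of the learned representation, stays shared.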
Multitask Learning with Single Gradient Step Update for Task Balancing
Multitask learning is a methodology to boost generalization performance and
also reduce computational intensity and memory usage. However, learning
multiple tasks simultaneously can be more difficult than learning a single task
because it can cause imbalance among tasks. To address the imbalance problem,
we propose an algorithm to balance between tasks at the gradient level by
applying gradient-based meta-learning to multitask learning. The proposed
method trains shared layers and task-specific layers separately so that these two
groups of layers, which play different roles in a multitask network, can each be
fitted to their own purpose. In particular, the shared layers, which contain
informative knowledge shared among tasks, are trained with a single gradient step
update and
inner/outer loop training to mitigate the imbalance problem at the gradient
level. We apply the proposed method to various multitask computer vision
problems and achieve state-of-the-art performance.
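The following sketch illustrates the inner/outer-loop idea under simplifying assumptions: each task head is a `(weight, bias)` tensor pair with `requires_grad=True`, `batches` holds one mini-batch per task, `trunk_opt` and `head_opt` are optimizers over the trunk parameters and the head tensors respectively, heads are updated on their ordinary losses, and the shared trunk is updated through the loss obtained after a single gradient step on each head. It is not the paper's exact algorithm.

```python
import torch
import torch.nn as nn

def task_loss(features, w, b, y):
    # task-specific linear head applied functionally, so it can be evaluated
    # with either its current weights or a one-step-updated copy
    return nn.functional.mse_loss(features @ w.t() + b, y)

def balanced_mtl_step(trunk, heads, batches, trunk_opt, head_opt, inner_lr=0.01):
    """One optimisation step: task heads are trained on their pre-update losses,
    while the shared trunk is trained through the single-gradient-step (outer) losses."""
    head_params = [p for wb in heads for p in wb]
    inner_total, outer_total = 0.0, 0.0
    for (x, y), (w, b) in zip(batches, heads):
        feats = trunk(x)
        inner = task_loss(feats, w, b, y)
        # single gradient step on the head, kept in the graph (create_graph=True)
        gw, gb = torch.autograd.grad(inner, (w, b), create_graph=True)
        outer_total = outer_total + task_loss(feats, w - inner_lr * gw,
                                              b - inner_lr * gb, y)
        inner_total = inner_total + inner
    # compute both sets of gradients before any in-place parameter update
    trunk_grads = torch.autograd.grad(outer_total, list(trunk.parameters()),
                                      retain_graph=True)
    head_grads = torch.autograd.grad(inner_total, head_params)
    for p, g in zip(list(trunk.parameters()) + head_params, trunk_grads + head_grads):
        p.grad = g
    trunk_opt.step()   # shared layers: outer-loop update
    head_opt.step()    # task-specific layers: ordinary update
```

Routing only the post-update (outer) loss into the shared layers is what gives the gradient-level balancing: a task whose head can already fix its own error in one step contributes less pressure on the shared representation.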
Bayesian Multitask Learning with Latent Hierarchies
We learn multiple hypotheses for related tasks under a latent hierarchical
relationship between tasks. We exploit the intuition that for domain
adaptation, we wish to share classifier structure, but for multitask learning,
we wish to share covariance structure. Our hierarchical model is seen to
subsume several previously proposed multitask learning models and performs well
on three distinct real-world data sets.
A multitask deep learning model for real-time deployment in embedded systems
We propose an approach to Multitask Learning (MTL) to make deep learning
models faster and lighter for applications in which multiple tasks need to be
solved simultaneously, which is particularly useful in embedded, real-time
systems. We develop a multitask model for both Object Detection and Semantic
Segmentation and analyze the challenges that appear during its training. Our
multitask network is 1.6x faster, lighter and uses less memory than deploying
the single-task models in parallel. We conclude that MTL has the potential to
give superior performance in exchange for a more complex training process that
introduces challenges not present in single-task models. Comment: 2 pages, 5 figures. Poster presentation at Swedish Symposium on Deep
Learning SSDL2017, Stockholm, Sweden. June 20-21, 2017.
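As a rough sketch of the shared-backbone design, the toy model below runs one convolutional trunk and feeds its features to a detection head and a segmentation head, so both tasks share a single forward pass instead of two separate models. The layer sizes, anchor count, and upsampling factor are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DetSegNet(nn.Module):
    """Single shared backbone feeding a detection head and a segmentation head."""

    def __init__(self, n_classes=10, n_anchors=9):
        super().__init__()
        self.backbone = nn.Sequential(            # placeholder convolutional trunk
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # detection head: per-location box offsets + objectness per anchor
        self.det_head = nn.Conv2d(64, n_anchors * 5, 1)
        # segmentation head: per-pixel class logits, upsampled back to input size
        self.seg_head = nn.Sequential(
            nn.Conv2d(64, n_classes, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        feats = self.backbone(x)      # computed once, shared by both tasks
        return self.det_head(feats), self.seg_head(feats)
```

The speed and memory savings come precisely from that shared backbone: the expensive feature extraction is paid once per frame, and only the lightweight heads are task-specific. During training, the two task losses are typically combined as a weighted sum, which is where the extra tuning complexity mentioned above arises.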
jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
We introduce jiant, an open source toolkit for conducting multitask and
transfer learning experiments on English NLU tasks. jiant enables modular and
configuration-driven experimentation with state-of-the-art models and
implements a broad set of tasks for probing, transfer learning, and multitask
training experiments. jiant implements over 50 NLU tasks, including all GLUE
and SuperGLUE benchmark tasks. We demonstrate that jiant reproduces published
performance on a variety of tasks and models, including BERT and RoBERTa. jiant
is available at https://jiant.info
Sparse coding for multitask and transfer learning
We investigate the use of sparse coding and dictionary learning in the
context of multitask and transfer learning. The central assumption of our
learning method is that the task parameters are well approximated by sparse
linear combinations of the atoms of a dictionary on a high or infinite
dimensional space. This assumption, together with the large quantity of
available data in the multitask and transfer learning settings, allows a
principled choice of the dictionary. We provide bounds on the generalization
error of this approach, for both settings. Numerical experiments on one
synthetic and two real datasets show the advantage of our method over single
task learning, a previous method based on orthogonal and dense representation
of the tasks, and a related method learning task grouping. Comment: International Conference on Machine Learning 2013
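A minimal sketch of the central assumption: each task's weight vector is modeled as `D @ c_t`, a sparse combination of shared dictionary atoms, and the dictionary and codes are fit jointly with an L1 penalty on the codes. The optimizer, penalty weight, and column-norm constraint below are illustrative choices, not the paper's algorithm or generalization bounds.

```python
import torch

def sparse_coding_mtl(Xs, ys, k=20, lam=0.1, lr=1e-2, steps=2000):
    """Learn a shared dictionary D (d x k) and a sparse code c_t per task so that
    each task's weight vector is approximately D @ c_t."""
    d = Xs[0].shape[1]
    D = torch.randn(d, k, requires_grad=True)
    Cs = [torch.zeros(k, requires_grad=True) for _ in Xs]
    opt = torch.optim.Adam([D] + Cs, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for X, y, c in zip(Xs, ys, Cs):
            w = D @ c                              # task weights = sparse combo of atoms
            loss = loss + ((X @ w - y) ** 2).mean() + lam * c.abs().sum()
        loss.backward()
        opt.step()
        with torch.no_grad():                      # keep dictionary atoms norm-bounded
            D /= D.norm(dim=0, keepdim=True).clamp(min=1.0)
    return D, [c.detach() for c in Cs]
```

In the transfer setting, the learned dictionary `D` would be kept fixed and only a new sparse code fitted for the incoming task.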
Partly Supervised Multitask Learning
Semi-supervised learning has recently been attracting attention as an
alternative to fully supervised models that require large pools of labeled
data. Moreover, optimizing a model for multiple tasks can provide better
generalizability than single-task learning. Leveraging self-supervision and
adversarial training, we propose a novel general-purpose semi-supervised,
multiple-task model, namely self-supervised, semi-supervised, multitask
learning (SMTL), for accomplishing two important tasks in medical imaging:
segmentation and diagnostic classification. Experimental results on chest and
spine X-ray datasets suggest that our SMTL model significantly outperforms
semi-supervised single task, semi/fully-supervised multitask, and
fully-supervised single task models, even with a 50% reduction of class and
segmentation labels. We hypothesize that our proposed model can be effective in
tackling limited annotation problems for joint training, not only in medical
imaging domains, but also for general-purpose vision tasks. Comment: 10 pages, 8 figures, 3 tables
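A generic sketch of how such a semi-supervised multitask objective can be assembled: supervised classification and segmentation losses on labeled images, plus a self-supervised rotation-prediction loss on unlabeled images so they still shape the shared encoder. The three-headed `model` interface, the rotation pretext task, and the loss weights are assumptions; the paper's SMTL also uses adversarial training, which is omitted here.

```python
import torch
import torch.nn as nn

def semi_supervised_mtl_loss(model, labeled, unlabeled,
                             w_cls=1.0, w_seg=1.0, w_self=0.1):
    """model(x) is assumed to return (classification logits, segmentation logits,
    rotation logits) from a shared encoder."""
    x, cls_y, seg_y = labeled
    cls_logits, seg_logits, _ = model(x)
    loss = (w_cls * nn.functional.cross_entropy(cls_logits, cls_y)
            + w_seg * nn.functional.cross_entropy(seg_logits, seg_y))

    # self-supervised pretext task on unlabeled images: predict the rotation applied
    xu = unlabeled
    k = torch.randint(0, 4, (xu.shape[0],), device=xu.device)
    xr = torch.stack([torch.rot90(img, int(r), dims=(1, 2)) for img, r in zip(xu, k)])
    _, _, rot_logits = model(xr)
    return loss + w_self * nn.functional.cross_entropy(rot_logits, k)
```

The point of the unlabeled term is that the shared encoder keeps receiving training signal even when class and segmentation labels are scarce, which is consistent with the 50% label-reduction result reported above.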
Memory Constraint Online Multitask Classification
We investigate online kernel algorithms which simultaneously process multiple
classification tasks while a fixed constraint is imposed on the size of their
active sets. We focus in particular on the design of algorithms that can
efficiently deal with problems where the number of tasks is extremely high and
the task data are large scale. Two new projection-based algorithms are
introduced to efficiently tackle those issues while presenting different
trade-offs in how the available memory is managed with respect to the prior
information about the learning tasks. Theoretically sound budget algorithms are
devised by coupling the Randomized Budget Perceptron and the Forgetron
algorithms with the multitask kernel. We show how the two seemingly contrasting
properties of learning from multiple tasks and keeping a constant memory
footprint can be balanced, and how the sharing of the available space among
different tasks is automatically taken care of. We propose and discuss new
insights on the multitask kernel. Experiments show that online kernel multitask
algorithms running on a budget can efficiently tackle real world learning
problems involving multiple tasks.
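To make the budget mechanism concrete, here is a small sketch of a mistake-driven kernel perceptron over (example, task) pairs that keeps a fixed-size active set and evicts a uniformly random support vector when the budget is full. The multitask kernel used here (same-task pairs interact more strongly) and the RBF base kernel are common choices assumed for illustration, not necessarily the paper's construction.

```python
import math
import random

def rbf(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def mt_kernel(x, s, z, t, base=rbf, b=1.0):
    # assumed multitask kernel: examples from the same task interact more strongly
    return (1.0 + b * (s == t)) * base(x, z)

class BudgetMultitaskPerceptron:
    """Kernel perceptron over (example, task) pairs with a fixed-size active set.
    When the budget is full, a uniformly random support vector is evicted."""

    def __init__(self, budget=100):
        self.budget = budget
        self.support = []                # list of (x, task, label) triples

    def predict(self, x, task):
        score = sum(y * mt_kernel(sx, st, x, task) for sx, st, y in self.support)
        return 1 if score >= 0 else -1

    def update(self, x, task, y):
        if self.predict(x, task) != y:   # mistake-driven update
            if len(self.support) >= self.budget:
                self.support.pop(random.randrange(len(self.support)))
            self.support.append((x, task, y))
```

Because every stored support vector is scored through the multitask kernel, examples from one task can influence predictions on another, which is how the single fixed-size active set ends up being shared across tasks automatically.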