TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning
Learning good feature embeddings for images often requires substantial
training data. As a consequence, in settings where training data is limited
(e.g., few-shot and zero-shot learning), we are typically forced to use a
generic feature embedding across various tasks. Ideally, we want to construct
feature embeddings that are tuned for the given task. In this work, we propose
Task-Aware Feature Embedding Networks (TAFE-Nets) to learn how to adapt the
image representation to a new task in a meta learning fashion. Our network is
composed of a meta learner and a prediction network. Based on a task input, the
meta learner generates parameters for the feature layers in the prediction
network so that the feature embedding can be accurately adjusted for that task.
We show that TAFE-Net is highly effective in generalizing to new tasks or
concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and
few-shot learning. Our model matches or exceeds the state-of-the-art on all
tasks. In particular, our approach improves the prediction accuracy of unseen
attribute-object pairs by 4 to 15 points on the challenging visual
attribute-object composition task.
Comment: Accepted at CVPR 201
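The meta learner / prediction network split can be illustrated with a toy weight generator. The dimensions, the linear generator `G`, and the per-channel scaling below are illustrative stand-ins, not the paper's architecture:

```python
import random

random.seed(0)
D_TASK, D_FEAT = 4, 6  # toy dimensions, not from the paper

# Meta learner: here just a linear map from a task embedding to
# per-channel scales for one feature layer of the prediction network
# (a FiLM-style stand-in for TAFE-Net's weight generator).
G = [[random.uniform(-1.0, 1.0) for _ in range(D_TASK)] for _ in range(D_FEAT)]

def task_aware_features(task_emb, img_feat):
    # Generate feature-layer parameters conditioned on the task...
    scales = [sum(g * t for g, t in zip(row, task_emb)) for row in G]
    # ...and apply them to the generic image features.
    return [s * f for s, f in zip(scales, img_feat)]

img = [0.5] * D_FEAT
fa = task_aware_features([1.0, 0.0, 0.0, 0.0], img)
fb = task_aware_features([0.0, 1.0, 0.0, 0.0], img)
print(fa != fb)  # the same image embeds differently per task
```

The point of the sketch is only the dataflow: the task input determines the feature-layer parameters, so one prediction network yields a different embedding per task.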
Zero-Shot Cross-lingual Classification Using Multilingual Neural Machine Translation
Transferring representations from large supervised tasks to downstream tasks
has shown promising results in AI fields such as Computer Vision and Natural
Language Processing (NLP). In parallel, the recent progress in Machine
Translation (MT) has enabled one to train multilingual Neural MT (NMT) systems
that can translate between multiple languages and are also capable of
performing zero-shot translation. However, little attention has been paid to
leveraging representations learned by a multilingual NMT system to enable
zero-shot multilinguality in other NLP tasks. In this paper, we demonstrate a
simple framework, a multilingual Encoder-Classifier, for cross-lingual transfer
learning by reusing the encoder from a multilingual NMT system and stitching it
with a task-specific classifier component. Our proposed model achieves
significant improvements in the English setup on three benchmark tasks - Amazon
Reviews, SST and SNLI. Further, our system can perform classification in a new
language for which no classification data was seen during training, showing
that zero-shot classification is possible and remarkably competitive. In order
to understand the underlying factors contributing to this finding, we conducted
a series of analyses on the effect of the shared vocabulary, the training data
type for NMT, classifier complexity, encoder representation power, and model
generalization on zero-shot performance. Our results provide strong evidence
that the representations learned from multilingual NMT systems are widely
applicable across languages and tasks.
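The stitching idea — reuse a frozen multilingual encoder, add a task-specific head — can be sketched in miniature. The mean-pooled embedding table below stands in for a real NMT encoder, and all weights are made-up placeholders:

```python
import random

random.seed(0)
VOCAB, D = 10, 5  # toy sizes (illustrative)

# Stand-in for a frozen multilingual NMT encoder: a shared embedding
# table mean-pooled over tokens (the real encoder is a trained
# sequence model; these weights are random placeholders).
emb = [[random.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(VOCAB)]

def encode(token_ids):
    pooled = [0.0] * D
    for t in token_ids:
        for i, v in enumerate(emb[t]):
            pooled[i] += v
    return [v / len(token_ids) for v in pooled]

# Task-specific classifier head "stitched" on top (hypothetical weights;
# in the paper's setup it is trained only on English data).
w = [0.3, -0.2, 0.5, 0.1, -0.4]

def classify(token_ids):
    h = encode(token_ids)
    return 1 if sum(wi * hi for wi, hi in zip(w, h)) > 0 else 0

# Because every language shares the encoder, sentences in a language
# never seen by the classifier land in the same representation space.
label = classify([1, 4, 7])
print(label)
```

Zero-shot classification falls out of this structure: only the encoder sees multiple languages, so the head transfers to unseen ones for free.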
Recent Advances in Zero-shot Recognition
With the recent renaissance of deep convolutional neural networks, encouraging
breakthroughs have been achieved in supervised recognition tasks, where each
class has sufficient, fully annotated training data. However, scaling
recognition to a large number of classes with few or no training samples per
class remains an unsolved problem. One approach to scaling up recognition is to
develop models capable of recognizing unseen categories without any training
instances, i.e., zero-shot recognition/learning. This article provides a
comprehensive review of existing zero-shot recognition techniques, covering
aspects ranging from model representations to datasets and evaluation settings.
We also overview related recognition tasks, including one-shot and open-set
recognition, which can be used as natural extensions of zero-shot recognition
when a limited number of class samples becomes available or when zero-shot
recognition is implemented in a real-world setting. Importantly, we highlight
the limitations of existing approaches and point out future research directions
in this exciting new research area.
Comment: accepted by IEEE Signal Processing Magazin
Integrating Propositional and Relational Label Side Information for Hierarchical Zero-Shot Image Classification
Zero-shot learning (ZSL) is one of the most extreme forms of learning from
scarce labeled data. It enables predicting that images belong to classes for
which no labeled training instances are available. In this paper, we present a
new ZSL framework that leverages both label attribute side information and a
semantic label hierarchy. We present two methods, lifted zero-shot prediction
and a custom conditional random field (CRF) model, that integrate both forms of
side information. We propose benchmark tasks for this framework that focus on
making predictions across a range of semantic levels. We show that lifted
zero-shot prediction can dramatically outperform baseline methods when making
predictions within specified semantic levels, and that the probability
distribution provided by the CRF model can be leveraged to yield further
performance improvements when making unconstrained predictions over the
hierarchy.
Self-Normalizing Neural Networks
Deep Learning has revolutionized vision via convolutional neural networks
(CNNs) and natural language processing via recurrent neural networks (RNNs).
However, success stories of Deep Learning with standard feed-forward neural
networks (FNNs) are rare. FNNs that perform well are typically shallow and
therefore cannot exploit many levels of abstract representations. We introduce
self-normalizing neural networks (SNNs) to enable high-level abstract
representations. While batch normalization requires explicit normalization,
neuron activations of SNNs automatically converge towards zero mean and unit
variance. The activation functions of SNNs are "scaled exponential linear
units" (SELUs), which induce self-normalizing properties. Using the Banach fixed-point
theorem, we prove that activations close to zero mean and unit variance that
are propagated through many network layers will converge towards zero mean and
unit variance -- even under the presence of noise and perturbations. This
convergence property of SNNs allows us to (1) train deep networks with many
layers, (2) employ strong regularization, and (3) make learning highly
robust. Furthermore, for activations not close to unit variance, we prove an
upper and lower bound on the variance, thus, vanishing and exploding gradients
are impossible. We compared SNNs on (a) 121 tasks from the UCI machine learning
repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with
standard FNNs and other machine learning methods such as random forests and
support vector machines. SNNs significantly outperformed all competing FNN
methods at 121 UCI tasks, outperformed all competing methods at the Tox21
dataset, and set a new record at an astronomy data set. The winning SNN
architectures are often very deep. Implementations are available at:
github.com/bioinf-jku/SNNs.
Comment: 9 pages (+ 93 pages appendix
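The SELU activation itself is simple to state. This sketch uses the α and λ constants from the paper and checks the fixed-point behavior empirically on standard-normal inputs:

```python
import math
import random
import statistics

# SELU constants from the paper, chosen so that zero mean / unit
# variance is a fixed point of the activation's moment map.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit."""
    return LAMBDA * (x if x > 0 else ALPHA * (math.exp(x) - 1.0))

# Empirical check of the self-normalizing property: roughly
# standard-normal inputs stay near zero mean and unit variance.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]
ys = [selu(x) for x in xs]
print(statistics.mean(ys), statistics.pvariance(ys))
```

The printed moments land close to (0, 1), which is exactly the property the Banach fixed-point argument establishes for deep stacks of such layers.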
The Natural Language Decathlon: Multitask Learning as Question Answering
Deep learning has improved performance on many natural language processing
(NLP) tasks individually. However, general NLP models cannot emerge within a
paradigm that focuses on the particularities of a single metric, dataset, and
task. We introduce the Natural Language Decathlon (decaNLP), a challenge that
spans ten tasks: question answering, machine translation, summarization,
natural language inference, sentiment analysis, semantic role labeling,
zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and
commonsense pronoun resolution. We cast all tasks as question answering over a
context. Furthermore, we present a new Multitask Question Answering Network
(MQAN) that jointly learns all tasks in decaNLP without any task-specific modules or
parameters in the multitask setting. MQAN shows improvements in transfer
learning for machine translation and named entity recognition, domain
adaptation for sentiment analysis and natural language inference, and zero-shot
capabilities for text classification. We demonstrate that the MQAN's
multi-pointer-generator decoder is key to this success and performance further
improves with an anti-curriculum training strategy. Though designed for
decaNLP, MQAN also achieves state-of-the-art results on the WikiSQL semantic
parsing task in the single-task setting. We also release code for procuring and
processing data, training and evaluating models, and reproducing all
experiments for decaNLP.
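Casting every task as question answering amounts to a uniform data format: each example becomes a (question, context, answer) triple. The question phrasings below are illustrative placeholders, not decaNLP's exact prompts:

```python
# Every task becomes a (question, context, answer) triple, so a single
# QA model can handle, e.g., sentiment analysis and summarization
# uniformly. These question strings are illustrative placeholders.
QUESTIONS = {
    "summarization": "What is the summary?",
    "sentiment": "Is this review negative or positive?",
    "nli": "Does the premise entail the hypothesis?",
}

def as_qa(task, context, answer):
    return {"question": QUESTIONS[task], "context": context, "answer": answer}

ex = as_qa("sentiment", "A wonderful, heartfelt film.", "positive")
print(ex["question"], "->", ex["answer"])
```

Once every dataset is reshaped this way, a single pointer-generator QA model can train on all ten tasks with no task-specific heads.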
Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer
Knowledge transfer between tasks can improve the performance of learned
models, but requires an accurate estimate of the inter-task relationships to
identify the relevant knowledge to transfer. These inter-task relationships are
typically estimated based on training data for each task, which is inefficient
in lifelong learning settings where the goal is to learn each consecutive task
rapidly from as little data as possible. To reduce this burden, we develop a
lifelong learning method based on coupled dictionary learning that utilizes
high-level task descriptions to model the inter-task relationships. We show
that using task descriptors improves the performance of the learned task
policies, providing both theoretical justification for the benefit and
empirical demonstration of the improvement across a variety of learning
problems. Given only the descriptor for a new task, the lifelong learner is
also able to accurately predict a model for the new task through zero-shot
learning using the coupled dictionary, eliminating the need to gather training
data before addressing the task.
Comment: 28 page
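The zero-shot step can be sketched with toy 2-D coupled dictionaries: recover the shared code from the task descriptor alone, then reconstruct the model. The dictionary values and the exact linear solve are illustrative; the method learns both dictionaries jointly and uses sparse coding:

```python
# Toy coupled dictionaries over a 2-dimensional shared code
# (illustrative numbers; the method learns L and D jointly and uses
# sparse coding rather than an exact solve).
L = [[1.0, 0.0], [0.0, 2.0]]  # model dictionary:      theta = L @ s
D = [[2.0, 1.0], [1.0, 3.0]]  # descriptor dictionary: phi   = D @ s

def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def zero_shot_model(phi):
    s = solve2(D, phi)  # recover the code from the descriptor alone
    return [sum(L[i][j] * s[j] for j in range(2)) for i in range(2)]

# Descriptor of a task whose code is s = [1, 1], i.e. phi = D @ [1, 1]:
theta = zero_shot_model([3.0, 4.0])
print(theta)  # [1.0, 2.0] == L @ [1, 1], predicted without training data
```

Because model and descriptor share one code, the descriptor suffices to predict the model parameters before any task data is gathered.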
Zero Shot Learning on Simulated Robots
In this work we present a method for leveraging data from one source to learn
how to do multiple new tasks. Task transfer is achieved using a self-model that
encapsulates the dynamics of a system and serves as an environment for
reinforcement learning. To study this approach, we train self-models on
various robot morphologies, using randomly sampled actions. Using a self-model,
an initial state and corresponding actions, we can predict the next state. This
predictive self-model is then used by a standard reinforcement learning
algorithm to accomplish tasks without ever seeing a state from the "real"
environment. These trained policies allow the robots to successfully achieve
their goals in the "real" environment. We demonstrate that not only is training
on the self-model far more data efficient than learning even a single task, but
also that it allows for learning new tasks without necessitating any additional
data collection, essentially allowing zero-shot learning of new tasks.
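The core loop — plan against the learned self-model instead of the real environment — can be sketched with a hand-coded 1-D point mass standing in for the learned dynamics network. The dynamics, action set, and greedy planner are all illustrative:

```python
# A self-model predicts the next state from (state, action). This
# hand-coded 1-D point mass stands in for the learned dynamics network;
# the dynamics, action set, and greedy planner are all illustrative.
def self_model(state, action):
    pos, vel = state
    vel = vel + 0.1 * action
    return (pos + vel, vel)

def rollout_to_goal(start, goal, steps=50):
    """Reach the goal purely inside the self-model: no transition from
    the "real" environment is ever consumed."""
    state = start
    for _ in range(steps):
        # One-step greedy planning against the predictive model.
        action = min((-1.0, 0.0, 1.0),
                     key=lambda a: abs(self_model(state, a)[0] - goal))
        state = self_model(state, action)
    return state

final = rollout_to_goal((0.0, 0.0), goal=2.0)
print(final)
```

A policy trained this way touches only predicted states, which is why new goals cost no additional real-world data.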
Transforming task representations to perform novel tasks
An important aspect of intelligence is the ability to adapt to a novel task
without any direct experience (zero-shot), based on its relationship to
previous tasks. Humans can exhibit this cognitive flexibility. By contrast,
models that achieve superhuman performance in specific tasks often fail to
adapt to even slight task alterations. To address this, we propose a general
computational framework for adapting to novel tasks based on their relationship
to prior tasks. We begin by learning vector representations of tasks. To adapt
to new tasks, we propose meta-mappings, higher-order tasks that transform basic
task representations. We demonstrate the effectiveness of this framework across
a wide variety of tasks and computational paradigms, ranging from regression to
image classification and reinforcement learning. We compare to both human
adaptability and language-based approaches to zero-shot learning. Across these
domains, meta-mapping is successful, often achieving 80-90% performance,
without any data, on a novel task, even when the new task directly contradicts
prior experience. We further show that meta-mapping can not only generalize to
new tasks via learned relationships, but can also generalize using novel
relationships unseen during training. Finally, using meta-mapping as a starting
point can dramatically accelerate later learning on a new task, and reduce
learning time and cumulative error substantially. Our results provide insight
into a possible computational basis of intelligent adaptability and offer a
possible framework for modeling cognitive flexibility and building more
flexible artificial intelligence systems.
Comment: 45 page
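A minimal sketch of a meta-mapping: tasks are vectors, and a higher-order map fitted from example (task, transformed-task) pairs is applied to a held-out task with no new data. The linear tasks and scalar map below are toy stand-ins for the paper's learned networks:

```python
# Task representations are vectors; a linear task w classifies x by the
# sign of w.x. The meta-mapping "do the opposite task" is fitted from
# example (task, transformed-task) pairs. All numbers are illustrative.
train_pairs = [([1.0, 0.0], [-1.0, 0.0]),
               ([0.0, 1.0], [0.0, -1.0])]

def fit_meta_mapping(pairs):
    """Least-squares fit of a scalar map src -> c*src, a minimal
    stand-in for the paper's learned meta-mapping network."""
    num = sum(s * o for src, out in pairs for s, o in zip(src, out))
    den = sum(s * s for src, _ in pairs for s in src)
    return num / den

c = fit_meta_mapping(train_pairs)  # comes out as -1.0 here

def classify(task, x):
    return 1 if sum(t * xi for t, xi in zip(task, x)) > 0 else 0

# Zero-shot: transform a held-out task's representation, with no data
# from the transformed task itself.
held_out = [0.6, 0.8]
flipped = [c * t for t in held_out]
x = [1.0, 1.0]
print(classify(held_out, x), classify(flipped, x))
```

The held-out task and its transform give opposite decisions, mirroring the paper's point that the mapping, not new data, carries the adaptation.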
Worst-Case-Aware Curriculum Learning for Zero and Few Shot Transfer
Multi-task transfer learning based on pre-trained language encoders achieves
state-of-the-art performance across a range of tasks. Standard approaches
implicitly assume the tasks, for which we have training data, are equally
representative of the tasks we are interested in, an assumption which is often
hard to justify. This paper presents a more agnostic approach to multi-task
transfer learning, which uses automated curriculum learning to minimize a new
family of worst-case-aware losses across tasks. Not only do these losses lead
to better performance on outlier tasks; they also lead to better performance in
zero-shot and few-shot transfer settings.
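One simple member of a worst-case-aware family interpolates between the average and the maximum task loss. This is a hedged illustration of the idea only, not the paper's exact objective, which is minimized via automated curriculum learning:

```python
# Illustrative loss values for three tasks; the third is an outlier.
losses = [0.2, 0.3, 1.5]

def average_loss(task_losses):
    return sum(task_losses) / len(task_losses)

def worst_case_loss(task_losses):
    return max(task_losses)

def worst_case_aware(task_losses, alpha=0.5):
    """Interpolate between the average and the worst task loss, so the
    outlier task contributes more than under plain averaging."""
    return (alpha * worst_case_loss(task_losses)
            + (1 - alpha) * average_loss(task_losses))

print(average_loss(losses), worst_case_aware(losses), worst_case_loss(losses))
```

Relative to plain averaging, any such objective shifts optimization pressure toward whichever task is currently worst, which is why outlier tasks benefit.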