13,373 research outputs found
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
An important goal of computer vision is to build systems that learn visual
representations over time that can be applied to many tasks. In this paper, we
investigate a vision-language embedding as a core representation and show that
it leads to better cross-task transfer than standard multi-task learning. In
particular, the task of visual recognition is aligned to the task of visual
question answering by forcing each to use the same word-region embeddings. We
show this leads to greater inductive transfer from recognition to VQA than
standard multitask learning. Visual recognition also improves, especially for
categories that have relatively few recognition training labels but appear
often in the VQA setting. Thus, our paper takes a small step towards creating
more general vision systems by showing the benefit of interpretable, flexible,
and trainable core representations.Comment: Accepted in ICCV 2017. The arxiv version has an extra analysis on
correlation with human attentio
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied in the name of `model adaptation'.
Recent advance in deep learning shows that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research towards this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.Comment: 13 pages, APSIPA 201
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner-workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category
Transfer Learning from Audio-Visual Grounding to Speech Recognition
Transfer learning aims to reduce the amount of data required to excel at a
new task by re-using the knowledge acquired from learning other related tasks.
This paper proposes a novel transfer learning scenario, which distills robust
phonetic features from grounding models that are trained to tell whether a pair
of image and speech are semantically correlated, without using any textual
transcripts. As semantics of speech are largely determined by its lexical
content, grounding models learn to preserve phonetic information while
disregarding uncorrelated factors, such as speaker and channel. To study the
properties of features distilled from different layers, we use them as input
separately to train multiple speech recognition models. Empirical results
demonstrate that layers closer to input retain more phonetic information, while
following layers exhibit greater invariance to domain shift. Moreover, while
most previous studies include training data for speech recognition for feature
extractor training, our grounding models are not trained on any of those data,
indicating more universal applicability to new domains.Comment: Accepted to Interspeech 2019. 4 pages, 2 figure
Getting aligned on representational alignment
Biological and artificial information processing systems form representations
that they can use to categorize, reason, plan, navigate, and make decisions.
How can we measure the extent to which the representations formed by these
diverse systems agree? Do similarities in representations then translate into
similar behavior? How can a system's representations be modified to better
match those of another system? These questions pertaining to the study of
representational alignment are at the heart of some of the most active research
areas in cognitive science, neuroscience, and machine learning. For example,
cognitive scientists measure the representational alignment of multiple
individuals to identify shared cognitive priors, neuroscientists align fMRI
responses from multiple individuals into a shared representational space for
group-level analyses, and ML researchers distill knowledge from teacher models
into student models by increasing their alignment. Unfortunately, there is
limited knowledge transfer between research communities interested in
representational alignment, so progress in one field often ends up being
rediscovered independently in another. Thus, greater cross-field communication
would be advantageous. To improve communication between these fields, we
propose a unifying framework that can serve as a common language between
researchers studying representational alignment. We survey the literature from
all three fields and demonstrate how prior work fits into this framework.
Finally, we lay out open problems in representational alignment where progress
can benefit all three of these fields. We hope that our work can catalyze
cross-disciplinary collaboration and accelerate progress for all communities
studying and developing information processing systems. We note that this is a
working paper and encourage readers to reach out with their suggestions for
future revisions.Comment: Working paper, changes to be made in upcoming revision
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, substantial local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then,
advantages and disadvantages for every category are analyzed deeply, leaving
room for exploration. Finally, future research directions about local
mechanisms have also been discussed that may benefit future works. To the best
our knowledge, this is the first survey about local mechanisms on computer
vision. We hope that this survey can shed light on future research in the
computer vision field
- …