Few-shot Learning with Multi-scale Self-supervision
Learning concepts from a limited number of data points is a challenging task,
usually addressed by so-called one- or few-shot learning. Recently, an
application of second-order pooling in few-shot learning demonstrated its
superior performance due to the aggregation step handling varying image
resolutions without the need to modify CNNs to fit specific image sizes,
while still capturing highly descriptive co-occurrences. However, using a single
resolution per image (even if the resolution varies across a dataset) is
suboptimal, as the importance of image content varies across coarse-to-fine
levels depending on the object and its class label: e.g., generic objects and
scenes rely on their global appearance, while fine-grained objects rely more on
their localized texture patterns. Multi-scale representations are popular in
image deblurring, super-resolution and image recognition but they have not been
investigated in few-shot learning, whose relational nature complicates the
use of standard techniques. In this paper, we propose a novel multi-scale
relation network based on the properties of second-order pooling to estimate
image relations in the few-shot setting. To optimize the model, we leverage a scale
selector to re-weight scale-wise representations based on their second-order
features. Furthermore, we propose to apply self-supervised scale prediction.
Specifically, we leverage an extra discriminator to predict the scale labels
and the scale discrepancy between pairs of images. Our model achieves
state-of-the-art results on standard few-shot learning datasets.
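A minimal sketch of why second-order pooling is resolution-agnostic, as the abstract claims (the function and shapes below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def second_order_pool(feat):
    """Second-order pooling: average outer product of local descriptors.

    feat: CNN feature map of shape (H, W, D). The output is (D, D)
    regardless of spatial resolution, which is why varying image sizes
    need no resizing before aggregation.
    """
    h, w, d = feat.shape
    x = feat.reshape(h * w, d)   # one D-dim descriptor per spatial location
    return x.T @ x / (h * w)     # (D, D) feature co-occurrence matrix

# Two different input resolutions yield descriptors of identical shape.
a = second_order_pool(np.random.rand(8, 8, 64))
b = second_order_pool(np.random.rand(16, 16, 64))
```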
Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data
Few-shot keyword spotting (FS-KWS) models usually require large-scale
annotated datasets to generalize to unseen target keywords. However, existing
KWS datasets are limited in scale, and gathering keyword-like labeled data is
a costly undertaking. To mitigate this issue, we propose a framework that uses
easily collectible, unlabeled reading speech data as an auxiliary source.
Self-supervised learning has been widely adopted for learning representations
from unlabeled data; however, it is known to be suitable for large models with
enough capacity and is not practical for training a small footprint FS-KWS
model. Instead, we automatically annotate and filter the data to construct a
keyword-like dataset, LibriWord, enabling supervision on auxiliary data. We
then adopt multi-task learning that helps the model to enhance the
representation power from out-of-domain auxiliary data. Our method notably
improves the performance over competitive methods in the FS-KWS benchmark.
Comment: Interspeech 202
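The annotate-and-filter step described above can be sketched roughly as follows; the duration and frequency thresholds are assumptions for illustration, not values from the paper:

```python
from collections import Counter

def build_keyword_dataset(aligned_words, min_dur=0.3, max_dur=1.0, min_count=3):
    """Keep word-level clips that look keyword-like: a typical spoken-keyword
    duration and enough occurrences to train on. Thresholds are assumed."""
    counts = Counter(w["label"] for w in aligned_words)
    return [
        w for w in aligned_words
        if min_dur <= w["end"] - w["start"] <= max_dur
        and counts[w["label"]] >= min_count
    ]

# Toy example: frequent, well-sized "hello" clips survive; a short,
# rare function word is filtered out.
words = [{"label": "hello", "start": 0.0, "end": 0.5} for _ in range(5)]
words.append({"label": "a", "start": 0.0, "end": 0.1})
kept = build_keyword_dataset(words)
```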
Learning to Reconstruct Shapes from Unseen Classes
From a single image, humans are able to perceive the full 3D shape of an
object by exploiting learned shape priors from everyday life. Contemporary
single-image 3D reconstruction algorithms aim to solve this task in a similar
fashion, but often end up with priors that are highly biased by training
classes. Here we present an algorithm, Generalizable Reconstruction (GenRe),
designed to capture more generic, class-agnostic shape priors. We achieve this
with an inference network and training procedure that combine 2.5D
representations of visible surfaces (depth and silhouette), spherical shape
representations of both visible and non-visible surfaces, and 3D voxel-based
representations, in a principled manner that exploits the causal structure of
how 3D shapes give rise to 2D images. Experiments demonstrate that GenRe
performs well on single-view shape reconstruction, and generalizes to diverse
novel objects from categories not seen during training.
Comment: NeurIPS 2018 (Oral). The first two authors contributed equally to
this paper. Project page: http://genre.csail.mit.edu
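The staged inference described in the abstract (2.5D visible surfaces, then spherical completion, then voxel refinement) can be sketched as a simple chain; the callables stand in for learned networks, and the names are illustrative, not the authors' API:

```python
def genre_pipeline(image, estimate_25d, complete_sphere, refine_voxels):
    """Chain GenRe's three stages; each callable is a stand-in for a
    learned module."""
    visible = estimate_25d(image)           # 2.5D: depth + silhouette of visible surfaces
    full_sphere = complete_sphere(visible)  # inpaint non-visible surfaces on the sphere
    return refine_voxels(full_sphere)       # final 3D voxel representation

# Toy stubs showing only the data flow between representations.
result = genre_pipeline(
    "rgb_image",
    lambda img: "depth+silhouette",
    lambda d: "spherical_map",
    lambda s: "voxels",
)
```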
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and look up a possible
solution accordingly.