Cross-Modulation Networks for Few-Shot Learning
A family of recent successful approaches to few-shot learning relies on
learning an embedding space in which predictions are made by computing
similarities between examples. This corresponds to combining information
between support and query examples at a very late stage of the prediction
pipeline. Inspired by this observation, we hypothesize that there may be
benefits to combining the information at various levels of abstraction along
the pipeline. We present an architecture called Cross-Modulation Networks which
allows support and query examples to interact throughout the feature extraction
process via a feature-wise modulation mechanism. We adapt the Matching Networks
architecture to take advantage of these interactions and show encouraging
initial results on miniImageNet in the 5-way, 1-shot setting, where we close
the gap with the state of the art.
Comment: Accepted at NIPS 2018 Workshop on Meta-Learning. Source code
available at https://github.com/hprop/cross-modulation-net
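As a rough illustration of the feature-wise modulation mechanism described above, the sketch below shows FiLM-style cross-modulation between a support branch and a query branch. The block structure, global pooling, and all names are illustrative assumptions, not the authors' released code (see the repository above for that).

```python
# Minimal sketch (PyTorch): support and query activations modulate each
# other via predicted per-channel scales and shifts. Design choices here
# are assumptions for illustration, not the paper's exact block.
import torch
import torch.nn as nn

class CrossModulationBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch, affine=False)
        # Maps pooled activations of one branch to (gamma, beta) for the other.
        self.film = nn.Linear(out_ch, 2 * out_ch)

    def _modulate(self, h, context):
        gamma, beta = self.film(context.mean(dim=(2, 3))).chunk(2, dim=1)
        return torch.relu((1 + gamma[:, :, None, None]) * h
                          + beta[:, :, None, None])

    def forward(self, support, query):
        # Assumes support and query are paired (matching batch sizes).
        hs = self.bn(self.conv(support))
        hq = self.bn(self.conv(query))
        # Cross-modulation: each branch is conditioned on the other one.
        return self._modulate(hs, hq), self._modulate(hq, hs)
```

Stacking such blocks would let support and query examples interact at every level of abstraction, with similarities still computed at the end of the pipeline as in Matching Networks.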
A Multi-task Learning Framework for Grasping-Position Detection and Few-Shot Classification
A major problem is that a deep learning model for a picking robot requires
many labeled images. The operating cost of retraining a model becomes very
expensive because the shape of a product or part often changes in a factory,
so it is important to reduce the number of labeled images required to train a
model for a picking robot. In this study, we propose a multi-task
learning framework for few-shot classification using feature vectors from an
intermediate layer of a model that detects grasping positions. In the field of
manufacturing, picking robots often require both shape classification and
grasping-position detection. Prior multi-task learning
studies include methods to learn one task with feature vectors from a deep
neural network (DNN) trained for another task. However, the DNN used to
detect grasping positions has two problems with respect to extracting feature
vectors from a layer for shape classification: (1) Because each layer of the
grasping-position detection DNN is activated by all objects in the input
image, it is necessary to refine the features for each grasping position. (2)
It is necessary to select a layer to extract the features suitable for shape
classification. To tackle these issues, we propose a method to refine the
features for each grasping position and to select features from the optimal
layer of the DNN. We then evaluated the shape classification accuracy using
these features from the grasping positions. Our results confirm that our
proposed framework can classify object shapes even when the input image
includes multiple objects and the number of images available for training is
small.
Comment: 7 pages
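To make the two steps concrete, here is a hypothetical sketch of (1) refining features around each grasping position and (2) selecting the intermediate layer whose features classify best. The windowed pooling and the accuracy-based selection are assumptions, not the paper's exact procedure.

```python
# Minimal sketch (PyTorch): per-grasp feature refinement and layer
# selection. Function names and the pooling window are illustrative.
import torch

def grasp_local_features(feature_map, grasp_xy, window=3):
    """Refine features per grasping position: average-pool a small window
    of the feature map around each position (in feature-map coordinates).
    feature_map: (C, H, W); grasp_xy: list of (x, y) integer positions."""
    C, H, W = feature_map.shape
    vecs = []
    for x, y in grasp_xy:
        x0, x1 = max(x - window // 2, 0), min(x + window // 2 + 1, W)
        y0, y1 = max(y - window // 2, 0), min(y + window // 2 + 1, H)
        vecs.append(feature_map[:, y0:y1, x0:x1].mean(dim=(1, 2)))
    return torch.stack(vecs)  # (num_grasps, C)

def select_layer(per_layer_features, labels, accuracy_fn):
    """Layer selection: keep the intermediate layer whose refined features
    yield the best few-shot classification accuracy on held-out data."""
    scores = {name: accuracy_fn(feats, labels)
              for name, feats in per_layer_features.items()}
    return max(scores, key=scores.get)
```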
Attentive Feature Reuse for Multi Task Meta learning
We develop new algorithms for simultaneous learning of multiple tasks (e.g.,
image classification, depth estimation), and for adapting to unseen task/domain
distributions within those high-level tasks (e.g., different environments).
First, we learn common representations underlying all tasks. We then propose an
attention mechanism to dynamically specialize the network, at runtime, for each
task. Our approach weights each feature map of the backbone network
according to its relevance to a particular task. To achieve this, we
enable the attention module to learn task representations during training,
which are used to obtain attention weights. Our method improves performance on
new, previously unseen environments, and is 1.5x faster than existing
meta-learning methods using similar architectures. We highlight performance
improvements for Multi-Task Meta Learning of 4 tasks (image classification,
depth, vanishing point, and surface normal estimation), each over 10 to 25 test
domains/environments, a result that could not be achieved with standard
meta-learning techniques like MAML.
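One plausible realization of this per-task channel attention is sketched below; the task-embedding lookup and sigmoid gate are assumptions about the design, not the authors' exact module.

```python
# Minimal sketch (PyTorch): weight each channel of shared backbone
# features by its relevance to the current task. The task-embedding
# lookup and sigmoid gating are illustrative assumptions.
import torch
import torch.nn as nn

class TaskChannelAttention(nn.Module):
    def __init__(self, num_tasks, channels, hidden=64):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, hidden)
        self.gate = nn.Sequential(
            nn.Linear(hidden, channels),
            nn.Sigmoid(),  # per-channel relevance weights in (0, 1)
        )

    def forward(self, features, task_id):
        # features: (B, C, H, W); task_id: (B,) long tensor
        w = self.gate(self.task_emb(task_id))   # (B, C)
        return features * w[:, :, None, None]   # specialize at runtime
```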
Deep Object Co-segmentation via Spatial-Semantic Network Modulation
Object co-segmentation aims to segment the shared objects in multiple relevant
images, which has numerous applications in computer vision. This paper presents
a spatial and semantic modulated deep network framework for object
co-segmentation. A backbone network is adopted to extract multi-resolution
image features. With the multi-resolution features of the relevant images as
input, we design a spatial modulator to learn a mask for each image. The
spatial modulator captures the correlations of image feature descriptors via
unsupervised learning. The learned mask can roughly localize the shared
foreground object while suppressing the background. We model the semantic
modulator as a supervised image classification task. We propose a
hierarchical second-order pooling module to transform the image features for
classification use. The outputs of the two modulators manipulate the
multi-resolution features by a shift-and-scale operation so that the features
focus on segmenting co-object regions. The proposed model is trained end-to-end
without any intricate post-processing. Extensive experiments on four image
co-segmentation benchmark datasets demonstrate the superior accuracy of the
proposed method compared to state-of-the-art methods.
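The shift-and-scale operation itself can be sketched as follows, assuming the spatial modulator outputs a mask and the semantic modulator outputs per-channel scale and shift parameters; the exact fusion order in the paper may differ.

```python
# Minimal sketch (PyTorch): modulate backbone features with a spatial
# mask and per-channel semantic scale/shift. Fusion details are assumed.
import torch

def modulate(features, spatial_mask, semantic_vec):
    """features: (B, C, H, W); spatial_mask: (B, 1, H, W) in [0, 1];
    semantic_vec: (B, 2*C) holding per-channel scale and shift."""
    C = features.shape[1]
    scale = semantic_vec[:, :C, None, None]
    shift = semantic_vec[:, C:, None, None]
    # Scale-and-shift, then gate spatially toward the shared foreground.
    return spatial_mask * (scale * features + shift)
```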
Domain Conditional Predictors for Domain Adaptation
Learning guarantees often rely on assumptions of i.i.d. data, which will
likely be violated in practice once predictors are deployed to perform
real-world tasks. Domain adaptation approaches thus appeared as a useful
framework yielding extra flexibility in that distinct train and test data
distributions are supported, provided that other assumptions are satisfied such
as covariate shift, which assumes that the conditional distribution over
labels is independent of the underlying data distribution. Several approaches
have been
introduced in order to induce generalization across varying train and test data
sources, and these often rely on the general idea of domain invariance,
whereby the data-generating distributions are disregarded by the prediction
model. In this contribution, we tackle the problem of generalizing
across data sources by approaching it from the opposite direction: we consider
a conditional modeling approach in which predictions, in addition to being
dependent on the input data, use information relative to the underlying
data-generating distribution. That is, the model has an explicit mechanism
to adapt to changing environments and/or new data sources. We argue that such
an approach is more generally applicable than current domain adaptation methods
since it does not require extra assumptions such as covariate shift and further
yields simpler training algorithms that avoid a common source of training
instabilities caused by minimax formulations, often employed in
domain-invariant methods.
Comment: Part of the pre-registration workshop at NeurIPS 2020:
https://preregister.science
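As a sketch of what such a conditional model could look like, the predictor below consumes both input features and a soft domain representation inferred from those same features; every architectural choice here is an assumption consistent with the abstract, not the paper's definitive design.

```python
# Minimal sketch (PyTorch): the label head is conditioned on a learned
# domain representation, so no domain-invariance (minimax) term is needed.
# All names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class DomainConditionalPredictor(nn.Module):
    def __init__(self, feat_dim, num_domains, num_classes, dom_dim=32):
        super().__init__()
        self.domain_head = nn.Linear(feat_dim, num_domains)
        self.domain_emb = nn.Embedding(num_domains, dom_dim)
        self.classifier = nn.Linear(feat_dim + dom_dim, num_classes)

    def forward(self, feats):
        dom_logits = self.domain_head(feats)  # trained with domain labels
        # Soft assignment lets the model interpolate for unseen sources.
        dom_repr = dom_logits.softmax(dim=-1) @ self.domain_emb.weight
        return self.classifier(torch.cat([feats, dom_repr], dim=-1)), dom_logits
```

Training would add a standard cross-entropy term on the domain logits alongside the task loss, avoiding the minimax instabilities mentioned above.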