132 research outputs found
Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment
Recent advances have indicated the strengths of self-supervised pre-training
for improving representation learning on downstream tasks. Existing works often
utilize self-supervised pre-trained models by fine-tuning on downstream tasks.
However, fine-tuning does not generalize to the case where one needs to build a
customized model architecture different from the self-supervised model. In this
work, we formulate a new knowledge distillation framework to transfer the
knowledge from self-supervised pre-trained models to any other student network
by a novel approach named Embedding Graph Alignment. Specifically, inspired by
the spirit of instance discrimination in self-supervised learning, we model the
instance-instance relations by a graph formulation in the feature embedding
space and distill the self-supervised teacher knowledge to a student network by
aligning the teacher graph and the student graph. Our distillation scheme can
be flexibly applied to transfer the self-supervised knowledge to enhance
representation learning on various student networks. We demonstrate that our
model outperforms multiple representative knowledge distillation methods on
three benchmark datasets: CIFAR100, STL10, and TinyImageNet. Code is available
at https://github.com/yccm/EGA. Comment: British Machine Vision Conference (BMVC 2022)
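The abstract describes distillation as aligning a teacher graph with a student graph built from instance-instance relations in embedding space, but does not give the graph construction or the loss. The sketch below assumes cosine-similarity edges over a mini-batch and a mean-squared alignment loss. `embedding_graph` and `graph_alignment_loss` are hypothetical names, not taken from the EGA code.

```python
import numpy as np


def embedding_graph(z: np.ndarray) -> np.ndarray:
    """Instance-instance graph for a batch of embeddings z with shape (N, D).

    Assumption: edge weights are pairwise cosine similarities.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z @ z.T  # (N, N) weighted adjacency matrix


def graph_alignment_loss(teacher_z: np.ndarray, student_z: np.ndarray) -> float:
    """Mean squared difference between teacher and student graphs.

    Teacher and student embedding widths may differ: both graphs are (N, N),
    so only the relations between instances are compared, not raw features.
    """
    g_t = embedding_graph(teacher_z)
    g_s = embedding_graph(student_z)
    return float(np.mean((g_t - g_s) ** 2))


# Usage: a 128-d teacher and a 64-d student on the same batch of 8 images.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 128))
student = rng.normal(size=(8, 64))
loss = graph_alignment_loss(teacher, student)
```

Because the loss compares N x N relation matrices, a student whose embeddings are any rotation of the teacher's gets zero loss, which is what makes this kind of scheme architecture-agnostic.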
SFNet: Learning Object-aware Semantic Correspondence
We address the problem of semantic correspondence, that is, establishing a
dense flow field between images depicting different instances of the same
object or scene category. We propose to use images annotated with binary
foreground masks and subjected to synthetic geometric deformations to train a
convolutional neural network (CNN) for this task. Using these masks as part of
the supervisory signal offers a good compromise between semantic flow methods,
where the amount of training data is limited by the cost of manually selecting
point correspondences, and semantic alignment ones, where the regression of a
single global geometric transformation between images may be sensitive to
image-specific details such as background clutter. We propose a new CNN
architecture, dubbed SFNet, which implements this idea. It leverages a new and
differentiable version of the argmax function for end-to-end training, with a
loss that combines mask and flow consistency with smoothness terms.
Experimental results demonstrate the effectiveness of our approach, which
significantly outperforms the state of the art on standard benchmarks. Comment: CVPR 2019 oral paper
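A differentiable argmax is typically obtained by replacing the hard maximum with the expected coordinate under a softmax over matching scores. The sketch below shows that soft-argmax on a 2D score map, plus an optional Gaussian kernel centred on the hard argmax that damps distant secondary peaks. It is an illustration under these assumptions, not SFNet's exact operator; `soft_argmax_2d`, `beta`, and `sigma` are hypothetical names.

```python
from typing import Optional, Tuple

import numpy as np


def soft_argmax_2d(
    scores: np.ndarray, beta: float = 50.0, sigma: Optional[float] = None
) -> Tuple[float, float]:
    """Smooth surrogate for argmax over an (H, W) matching-score map.

    Returns the expected (x, y) location under softmax(beta * scores).
    NumPy does not track gradients; the point is that every step below is
    smooth in `scores`, so an autodiff framework could backpropagate through it.
    If sigma is given (an assumed kernel variant), the softmax weights are
    multiplied by a Gaussian centred on the hard argmax before normalising.
    """
    h, w = scores.shape
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index grids
    logits = beta * scores
    weights = np.exp(logits - logits.max())  # numerically stable softmax numerator
    if sigma is not None:
        y0, x0 = np.unravel_index(np.argmax(scores), scores.shape)
        kernel = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma**2))
        weights = weights * kernel
    p = weights / weights.sum()
    return float((p * xs).sum()), float((p * ys).sum())
```

With two nearly equal peaks, the plain soft-argmax lands between them, while the kernel variant stays on the stronger one. This is the usual motivation for adding a kernel when a single flow vector per pixel is wanted.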
Hyperpixel Flow: Semantic Correspondence with Multi-layer Neural Features
Establishing visual correspondences under large intra-class variations requires
analyzing images at different levels, from features linked to semantics and
context to local patterns, while being invariant to instance-specific details.
To tackle these challenges, we represent images by "hyperpixels" that leverage a
small number of relevant features selected among early to late layers of a
convolutional neural network. Taking advantage of the condensed features of
hyperpixels, we develop an effective real-time matching algorithm based on Hough
geometric voting. The proposed method, hyperpixel flow, sets a new state of the
art on three standard benchmarks as well as a new dataset, SPair-71k, which
contains a significantly larger number of image pairs than existing datasets,
with more accurate and richer annotations for in-depth analysis.
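The hyperpixel idea can be sketched as resizing a few selected CNN layers to a common resolution and concatenating them channel-wise, giving one multi-layer descriptor per location. The sketch assumes nearest-neighbour resizing and L2 normalisation; `hyperpixels`, `resize_nearest`, and `nn_match` are hypothetical helpers. `nn_match` is a plain nearest-neighbour baseline, not the paper's Hough voting.

```python
import numpy as np


def resize_nearest(feat: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return feat[:, rows[:, None], cols[None, :]]


def hyperpixels(layer_feats, selected, out_hw=None) -> np.ndarray:
    """Concatenate selected layers into per-location 'hyperpixel' descriptors.

    layer_feats: list of (C_i, H_i, W_i) arrays ordered from early to late layers.
    selected: indices of the small subset of layers to keep.
    Returns an (H * W, sum of C_i) array of L2-normalised descriptors.
    """
    chosen = [layer_feats[i] for i in selected]
    out_h, out_w = out_hw or chosen[0].shape[1:]  # default: first chosen layer's size
    stacked = np.concatenate([resize_nearest(f, out_h, out_w) for f in chosen], axis=0)
    desc = stacked.reshape(stacked.shape[0], -1).T  # one row per spatial location
    return desc / (np.linalg.norm(desc, axis=1, keepdims=True) + 1e-8)


def nn_match(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Baseline matcher: index of the most cosine-similar target for each source."""
    return np.argmax(desc_a @ desc_b.T, axis=1)
```

Keeping only a few layers is what keeps the descriptors "condensed": the width is the sum of the chosen layers' channels, not of every layer in the network.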
Correspondence Networks with Adaptive Neighbourhood Consensus
In this paper, we tackle the task of establishing dense visual
correspondences between images containing objects of the same category. This is
a challenging task due to large intra-class variations and a lack of dense
pixel-level annotations. To handle this challenge, we propose a convolutional
neural network architecture, called adaptive neighbourhood consensus network
(ANC-Net), that can be trained end-to-end with sparse key-point annotations.
At the core of ANC-Net is our proposed non-isotropic 4D convolution
kernel, which forms the building block for the adaptive neighbourhood consensus
module for robust matching. We also introduce a simple and efficient
multi-scale self-similarity module in ANC-Net to make the learned feature
robust to intra-class variations. Furthermore, we propose a novel orthogonal
loss that can enforce the one-to-one matching constraint. We thoroughly
evaluate the effectiveness of our method on various benchmarks, where it
substantially outperforms state-of-the-art methods. Comment: CVPR 2020. Project page: https://ancnet.avlcode.org
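The abstract states that the orthogonal loss enforces one-to-one matching but gives no formula. One plausible form, assumed here and not necessarily ANC-Net's, softmax-normalises each row of a source-to-target correlation matrix and penalises deviation of P P^T from the identity. A one-to-one matching has one-hot rows at distinct columns, so P P^T = I exactly. `orthogonality_loss` and `temperature` are hypothetical names.

```python
import numpy as np


def row_softmax(x: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax over each row."""
    z = x / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def orthogonality_loss(corr: np.ndarray, temperature: float = 0.1) -> float:
    """Penalise many-to-one matches in an (N, M) correlation matrix.

    Row i of P is a soft assignment of source point i to target points.
    Off-diagonal entries of P @ P.T grow when two sources pick the same target;
    diagonal entries fall below 1 when an assignment is not confident.
    """
    p = row_softmax(corr, temperature)
    gram = p @ p.T
    n = gram.shape[0]
    return float(np.sum((gram - np.eye(n)) ** 2) / n)
```

For example, if all five source points collapse onto the same target, every off-diagonal entry of P P^T is about 1 and the loss is about (5 * 4) / 5 = 4, whereas a permutation-like correlation gives a loss near 0.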
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks
While the use of bottom-up local operators in convolutional neural networks
(CNNs) matches some of the statistics of natural images well, it may also
prevent such models from capturing contextual long-range feature interactions.
In this work, we propose a simple, lightweight approach for better context
exploitation in CNNs. We do so by introducing a pair of operators: gather,
which efficiently aggregates feature responses from a large spatial extent, and
excite, which redistributes the pooled information to local features. The
operators are cheap, both in terms of number of added parameters and
computational complexity, and can be integrated directly in existing
architectures to improve their performance. Experiments on several datasets
show that gather-excite can bring benefits comparable to increasing the depth
of a CNN at a fraction of the cost. For example, we find ResNet-50 with
gather-excite operators is able to outperform its 101-layer counterpart on
ImageNet with no additional learnable parameters. We also propose a parametric
gather-excite operator pair which yields further performance gains, relate it
to the recently-introduced Squeeze-and-Excitation Networks, and analyse the
effects of these changes on the CNN feature activation statistics. Comment: NeurIPS 201
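A parameter-free gather-excite pair can be sketched as follows, assuming gather is average pooling over extent x extent windows (global pooling when no extent is given) and excite is nearest-neighbour upsampling followed by a sigmoid gate that rescales the input. This matches the "no additional learnable parameters" setting in the abstract; the parametric variant is not shown. `gather_excite` and `extent` are hypothetical names.

```python
from typing import Optional

import numpy as np


def gather_excite(x: np.ndarray, extent: Optional[int] = None) -> np.ndarray:
    """Parameter-free gather-excite on a (C, H, W) feature map.

    gather: average-pool each channel over extent x extent windows,
            or over the whole map when extent is None.
    excite: upsample the pooled map back to (H, W) by repetition
            (nearest neighbour), squash it with a sigmoid, and rescale x.
    """
    c, h, w = x.shape
    if extent is None:
        gathered = x.mean(axis=(1, 2), keepdims=True)  # (C, 1, 1)
        upsampled = np.broadcast_to(gathered, x.shape)
    else:
        if h % extent or w % extent:
            raise ValueError("this sketch assumes H and W are divisible by extent")
        blocks = x.reshape(c, h // extent, extent, w // extent, extent)
        gathered = blocks.mean(axis=(2, 4))  # (C, H / extent, W / extent)
        upsampled = gathered.repeat(extent, axis=1).repeat(extent, axis=2)
    gate = 1.0 / (1.0 + np.exp(-upsampled))  # sigmoid, values in (0, 1)
    return x * gate
```

Both operators are fixed pooling and interpolation steps, so the cost is a few element-wise passes over the feature map, which is consistent with the abstract's claim that the operators are cheap.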
- …