Knowledge Distillation with Adversarial Samples Supporting Decision Boundary
Many recent works on knowledge distillation have provided ways to transfer
the knowledge of a trained network for improving the learning process of a new
one, but finding a good technique for knowledge distillation is still an open
problem. In this paper, we provide a new perspective based on the decision
boundary, one of the most important components of a classifier. The
generalization performance of a classifier is closely related to the adequacy
of its decision boundary, so a good classifier has a good decision boundary.
Transferring information closely related to the decision boundary is therefore
a promising approach to knowledge distillation. To realize this goal, we
utilize an adversarial attack to discover samples supporting a decision
boundary. Based on this idea, the proposed algorithm trains a student
classifier on the adversarial samples supporting the decision boundary,
thereby transferring more accurate boundary information. Experiments show that
the proposed method indeed improves knowledge distillation and achieves
state-of-the-art performance.
Comment: Accepted to AAAI 2019
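As a rough PyTorch-style sketch of this idea (an illustrative assumption, not
the authors' implementation): an iterative gradient attack moves an input
toward the teacher's decision boundary between the true class and a target
class, and the student is then distilled on the resulting samples. The
function names, attack schedule, and hyperparameters (eps, steps, temperature
T) are all assumed for illustration.

import torch
import torch.nn.functional as F

def boundary_supporting_sample(teacher, x, y, target, eps=0.02, steps=10):
    # Iteratively move x toward the teacher's decision boundary between the
    # true class y and a chosen target class by shrinking the logit margin.
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logits = teacher(x_adv)
        margin = logits.gather(1, y[:, None]) - logits.gather(1, target[:, None])
        grad, = torch.autograd.grad(margin.sum(), x_adv)
        x_adv = (x_adv - eps * grad.sign()).detach().requires_grad_(True)
    return x_adv.detach()

def distillation_loss(student, teacher, x, y, target, T=4.0):
    # Distill the teacher's softened predictions on the boundary-supporting
    # samples, plus the usual cross-entropy on the clean inputs.
    x_bss = boundary_supporting_sample(teacher, x, y, target)
    with torch.no_grad():
        t_soft = F.softmax(teacher(x_bss) / T, dim=1)
    s_log = F.log_softmax(student(x_bss) / T, dim=1)
    kd = F.kl_div(s_log, t_soft, reduction="batchmean") * (T * T)
    return kd + F.cross_entropy(student(x), y)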
Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons
An activation boundary for a neuron refers to a separating hyperplane that
determines whether the neuron is activated or deactivated. It has long been
held that the activations of neurons, rather than their exact output values,
play the most important role in forming classification-friendly partitions of
the hidden feature space. However, as far as we know, this aspect of neural
networks has not been considered in the knowledge-transfer literature. In this
paper, we propose a knowledge
transfer method via distillation of activation boundaries formed by hidden
neurons. For the distillation, we propose an activation transfer loss that has
the minimum value when the boundaries generated by the student coincide with
those by the teacher. Since the activation transfer loss is not differentiable,
we design a piecewise differentiable loss approximating the activation transfer
loss. With the proposed method, the student learns the separating boundary
between the activation and deactivation regions formed by each neuron in the
teacher. Through experiments on various aspects of knowledge transfer, we
verify that the proposed method outperforms the current state-of-the-art.
Comment: Accepted to AAAI 2019
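To make the surrogate concrete, here is a minimal PyTorch-style sketch of a
piecewise-differentiable loss in the spirit described above. The margin value
and the assumption that the student's pre-activations have already been
projected to the teacher's dimensionality (e.g., by a small connector layer)
are ours, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def activation_boundary_loss(t_pre, s_pre, margin=1.0):
    # t_pre, s_pre: pre-ReLU hidden responses of teacher and student with
    # matching shapes. Where a teacher neuron is active (t_pre > 0), the
    # hinge pushes the student's pre-activation above +margin; where it is
    # inactive, below -margin, so the student's activation boundary is
    # driven to coincide with the teacher's.
    active = (t_pre > 0).float()
    per_neuron = active * F.relu(margin - s_pre) ** 2 \
               + (1.0 - active) * F.relu(margin + s_pre) ** 2
    return per_neuron.mean()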
Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification
In the person re-identification (ReID) task, because of the shortage of
training data, it is common to fine-tune a classification network pre-trained
on a large dataset. However, it is relatively difficult to sufficiently
fine-tune the low-level layers of the network due to the vanishing-gradient
problem. In this work, we propose a novel fine-tuning strategy that
allows low-level layers to be sufficiently trained by rolling back the weights
of high-level layers to their initial pre-trained weights. Our strategy
alleviates the problem of gradient vanishing in low-level layers and robustly
trains the low-level layers to fit the ReID dataset, thereby increasing the
performance on ReID tasks. The improvement from the proposed strategy is
validated via several experiments. Furthermore, without any add-ons such as
pose estimation or segmentation, our strategy achieves state-of-the-art
performance using only a vanilla deep convolutional neural network
architecture.
Comment: Accepted to AAAI 2019
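A minimal sketch of how such a rolling-back schedule might look in PyTorch;
the stage count, the layer grouping (ResNet-style "layer4"/"fc" names), and
the helper train_one_stage are illustrative assumptions, not the authors'
code.

import copy

def rollback_finetune(model, train_one_stage,
                      high_level_prefixes=("layer4", "fc"), num_stages=3):
    # Keep a copy of the pre-trained weights, fine-tune for a stage, then
    # roll the high-level layers back to their pre-trained values so that
    # later stages keep delivering useful gradients to the low-level layers.
    pretrained = copy.deepcopy(model.state_dict())
    for stage in range(num_stages):
        train_one_stage(model)  # ordinary fine-tuning on the ReID data
        if stage < num_stages - 1:
            state = model.state_dict()
            for name in state:
                if name.startswith(high_level_prefixes):
                    state[name] = pretrained[name].clone()
            model.load_state_dict(state)
    return model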
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage
We need billion-scale images to achieve more generalizable and
ground-breaking vision models, as well as massive dataset storage to ship the
images (e.g., the LAION-4B dataset needs 240TB of storage space). However, it
has become challenging to handle ever-growing datasets with limited storage
infrastructure. A number of storage-efficient training methods have been
proposed to tackle the problem, but they are rarely scalable or suffer severe
performance degradation. In this paper, we propose a storage-efficient
training strategy for vision classifiers for large-scale datasets (e.g.,
ImageNet) that uses only 1024 tokens per instance without the raw-level
pixels; our token storage requires less than 1% of the space of the original
JPEG-compressed pixels. We also propose token augmentations and a Stem-adaptor
module so that our approach can use the same architecture as pixel-based
approaches with only minimal modifications to the stem layer and carefully
tuned optimization settings. Our experimental results on ImageNet-1k show that
our method outperforms other storage-efficient training methods by a large
margin. We further show the effectiveness of our method in other practical
scenarios: storage-efficient pre-training and continual learning. Code is
available at https://github.com/naver-ai/seit.
Comment: Accepted to ICCV 2023; first two authors contributed equally; 17 pages
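As a rough sketch of what a Stem-adaptor could look like, the module below
embeds the stored discrete tokens and adds positional embeddings so that a
standard transformer body can consume them in place of pixel patch
embeddings. The vocabulary size, token count, and embedding dimension are
illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class TokenStemAdaptor(nn.Module):
    # Replaces a pixel patch-embedding stem: stored token ids are embedded
    # and given positional information, producing the (B, N, D) sequence a
    # transformer body expects, so the rest of the architecture is unchanged.
    def __init__(self, vocab_size=8192, num_tokens=1024, embed_dim=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.pos = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))

    def forward(self, token_ids):  # token_ids: (B, 1024) int64 indices
        return self.embed(token_ids) + self.pos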
Match me if you can: Semantic Correspondence Learning with Unpaired Images
Recent approaches to semantic correspondence have focused on obtaining
high-quality correspondences using a complicated network that refines
ambiguous or noisy matching points. Despite their performance improvements,
they remain constrained by limited training pairs owing to costly point-level
annotations. This paper proposes a simple yet effective method that trains
with unlabeled pairs to complement both the limited image pairs and the sparse
point pairs, requiring neither extra labeled keypoints nor trainable modules.
We fundamentally extend the data quantity and variety by augmenting new
unannotated pairs not originally provided as training pairs in the benchmarks.
Using a simple teacher-student framework, we provide reliable
pseudo-correspondences to the student network via machine supervision.
Finally, the performance of our network is steadily improved by the proposed
iterative training, which puts the student back as a teacher to generate
refined labels and repeatedly trains a new student. Our models outperform the
milestone baselines, including state-of-the-art methods, on semantic
correspondence benchmarks.
Comment: 12 pages
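The iterative loop can be summarized in a few lines. The sketch below is an
assumption-laden outline (the confidence filter, threshold tau, and all
helper names are ours), not the authors' code.

import torch

def iterative_self_training(make_student, teacher, unlabeled_pairs,
                            train, confidence, rounds=3, tau=0.9):
    # Each round: the teacher predicts correspondences on unlabeled image
    # pairs, confident matches become pseudo-labels, a fresh student is
    # trained on them, and the student then replaces the teacher.
    for _ in range(rounds):
        pseudo = []
        with torch.no_grad():
            for src, trg in unlabeled_pairs:
                matches = teacher(src, trg)       # candidate point matches
                keep = confidence(matches) > tau  # keep only reliable ones
                pseudo.append((src, trg, matches[keep]))
        student = make_student()
        train(student, pseudo)  # machine supervision via pseudo-labels
        teacher = student       # the student becomes the next teacher
    return teacher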