A Note on Knowledge Distillation Loss Function for Object Classification
This research note provides a quick introduction to the knowledge
distillation loss function used in object classification. In particular, we
discuss its connection to a previously proposed logits matching loss function.
We further treat knowledge distillation as a specific form of output
regularization and demonstrate its connection to label smoothing and
entropy-based regularization.
Comment: Research Note, 4 pages
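As a point of reference for the connection discussed above, here is a minimal PyTorch-style sketch (not code from the note) of the temperature-scaled distillation loss and the logits-matching loss it relates to; in the high-temperature, zero-mean-logits limit the former approaches a scaled version of the latter, and replacing the teacher distribution with a uniform one recovers label smoothing.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Temperature-scaled KL divergence between softened teacher and student
    # distributions; the T**2 factor keeps gradient magnitudes comparable across T.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)

def logit_matching_loss(student_logits, teacher_logits):
    # Mean-squared error between raw logits; kd_loss above tends toward a
    # scaled version of this objective as T grows (with zero-mean logits).
    return F.mse_loss(student_logits, teacher_logits)
```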
Knowledge Translation: A New Pathway for Model Compression
Deep learning has witnessed significant advancements in recent years at the
cost of increasing training, inference, and model storage overhead. While
existing model compression methods strive to reduce the number of model
parameters while maintaining high accuracy, they inevitably necessitate the
re-training of the compressed model or impose architectural constraints. To
overcome these limitations, this paper presents a novel framework, termed
\textbf{K}nowledge \textbf{T}ranslation (KT), wherein a ``translation'' model
is trained to receive the parameters of a larger model and generate compressed
parameters. The concept of KT draws inspiration from language translation,
which employs neural networks to convert between languages while preserving
meaning. Accordingly, we explore the potential of neural networks to convert
between models of disparate sizes while preserving their
functionality. We propose a comprehensive framework for KT, introduce data
augmentation strategies to enhance model performance despite restricted
training data, and successfully demonstrate the feasibility of KT on the MNIST
dataset. Code is available at \url{https://github.com/zju-SWJ/KT}.
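As a rough illustration of the framework's core idea (with a plain MLP and illustrative layer sizes that are not taken from the paper), a "translation" model can be sketched as a module that maps flattened large-model parameters to compressed parameters:

```python
import torch
import torch.nn as nn

class WeightTranslator(nn.Module):
    # Toy "translation" model: flattened parameters of a large layer in,
    # parameters for a smaller layer out. Sizes here are illustrative only.
    def __init__(self, large_dim=256 * 128, small_dim=64 * 32, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(large_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, small_dim),
        )

    def forward(self, large_params):           # (batch, large_dim)
        return self.net(large_params)          # (batch, small_dim)

# Training would pair large-model weights with corresponding compressed weights
# and regress the latter, e.g. with an MSE loss.
translator = WeightTranslator()
fake_large = torch.randn(4, 256 * 128)
compressed = translator(fake_large)            # (4, 64 * 32)
```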
Online Knowledge Distillation with Diverse Peers
Distillation is an effective knowledge-transfer technique that uses predicted
distributions of a powerful teacher model as soft targets to train a
less-parameterized student model. A pre-trained high capacity teacher, however,
is not always available. Recently proposed online variants use the aggregated
intermediate predictions of multiple student models as targets to train each
student model. Although group-derived targets give a good recipe for
teacher-free distillation, group members are homogenized quickly with simple
aggregation functions, leading to early saturated solutions. In this work, we
propose Online Knowledge Distillation with Diverse peers (OKDDip), which
performs two-level distillation during training with multiple auxiliary peers
and one group leader. In the first-level distillation, each auxiliary peer
holds an individual set of aggregation weights generated with an
attention-based mechanism to derive its own targets from predictions of other
auxiliary peers. Learning from distinct target distributions helps to boost
peer diversity for effectiveness of group-based distillation. The second-level
distillation is performed to transfer the knowledge in the ensemble of
auxiliary peers further to the group leader, i.e., the model used for
inference. Experimental results show that the proposed framework consistently
gives better performance than state-of-the-art approaches without sacrificing
training or inference complexity, demonstrating the effectiveness of the
proposed two-level distillation framework.
Comment: Accepted to AAAI-2020
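The first-level step can be pictured roughly as follows; the projections, tensor shapes, and temperature below are illustrative stand-ins rather than the exact OKDDip design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def first_level_targets(peer_feats, peer_logits, query_proj, key_proj, T=3.0):
    # peer_feats:  (P, B, D) features of P auxiliary peers for a batch of B examples
    # peer_logits: (P, B, C) corresponding class logits
    q = query_proj(peer_feats)                      # (P, B, d)
    k = key_proj(peer_feats)                        # (P, B, d)
    # Per-example attention of each peer over all peers -> individual aggregation weights.
    attn = F.softmax(torch.einsum("pbd,qbd->bpq", q, k), dim=-1)   # (B, P, P)
    soft = F.softmax(peer_logits / T, dim=-1)       # softened peer predictions (P, B, C)
    # Each peer distills from its own attention-weighted mixture of peer predictions.
    return torch.einsum("bpq,qbc->pbc", attn, soft)  # per-peer targets (P, B, C)

# Example with made-up sizes:
P, B, D, C, d = 3, 8, 64, 10, 32
targets = first_level_targets(torch.randn(P, B, D), torch.randn(P, B, C),
                              nn.Linear(D, d), nn.Linear(D, d))
```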
Accelerating Diffusion Sampling with Classifier-based Feature Distillation
Although diffusion models have shown great potential for generating higher
quality images than GANs, slow sampling speed hinders their wide application in
practice. Progressive distillation is thus proposed for fast sampling by
progressively aligning the output images of an $N$-step teacher sampler with
those of an $N/2$-step student sampler. In this paper, we argue that this
distillation-based acceleration method can be further improved, especially for
few-step samplers, with our proposed \textbf{C}lassifier-based \textbf{F}eature
\textbf{D}istillation (CFD). Instead of aligning output images, we distill
teacher's sharpened feature distribution into the student with a
dataset-independent classifier, making the student focus on those important
features to improve performance. We also introduce a dataset-oriented loss to
further optimize the model. Experiments on CIFAR-10 show the superiority of our
method in achieving high quality and fast sampling. Code will be released soon.
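Setting aside the sampler details and the dataset-oriented loss, the feature-distillation term can be sketched as below; the fixed classifier head, the sharpening temperature, and the input shapes are assumptions for illustration, not the paper's exact choices:

```python
import torch
import torch.nn.functional as F

def classifier_feature_distill(student_feat, teacher_feat, classifier, sharpen_T=0.5):
    # A fixed (frozen) classifier maps intermediate features to class distributions;
    # the teacher's distribution is sharpened (temperature < 1) and the student
    # is trained to match it via KL divergence.
    with torch.no_grad():
        p_teacher = F.softmax(classifier(teacher_feat) / sharpen_T, dim=1)
    log_p_student = F.log_softmax(classifier(student_feat), dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```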
Domain-Specific Bias Filtering for Single Labeled Domain Generalization
Conventional Domain Generalization (CDG) utilizes multiple labeled source
datasets to train a generalizable model for unseen target domains. However, due
to expensive annotation costs, the requirement of labeling all the source data
is hard to meet in real-world applications. In this paper, we investigate a
Single Labeled Domain Generalization (SLDG) task with only one source domain
being labeled, which is more practical and challenging than the CDG task. A
major obstacle in the SLDG task is the discriminability-generalization bias:
the discriminative information in the labeled source dataset may contain
domain-specific bias, constraining the generalization of the trained model. To
tackle this challenging task, we propose a novel framework called
Domain-Specific Bias Filtering (DSBF), which initializes a discriminative model
with the labeled source data and then filters out its domain-specific bias with
the unlabeled source data for generalization improvement. We divide the
filtering process into (1) feature extractor debiasing via k-means
clustering-based semantic feature re-extraction and (2) classifier
rectification through attention-guided semantic feature projection. DSBF
unifies the exploration of the labeled and the unlabeled source data to enhance
the discriminability and generalization of the trained model, resulting in a
highly generalizable model. We further provide theoretical analysis to verify
the proposed domain-specific bias filtering process. Extensive experiments on
multiple datasets show the superior performance of DSBF in tackling both the
challenging SLDG task and the CDG task.
Comment: Accepted by International Journal of Computer Vision (IJCV)
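The clustering step in (1) above can be pictured roughly as follows; this is a simplified sketch using scikit-learn, and the number of clusters and the downstream use of the centroids are assumptions rather than the paper's exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def semantic_centroids(unlabeled_feats, num_clusters):
    # Cluster features of the unlabeled source data; the centroids then act as
    # semantic anchors for re-extracting (debiasing) features downstream.
    # unlabeled_feats: (num_samples, feat_dim) array.
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=0).fit(unlabeled_feats)
    return km.cluster_centers_, km.labels_   # (num_clusters, feat_dim), per-sample cluster ids

centroids, assignments = semantic_centroids(np.random.randn(1000, 128), num_clusters=7)
```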
Cross-Layer Distillation with Semantic Calibration
Recently proposed knowledge distillation approaches based on feature-map
transfer validate that intermediate layers of a teacher model can serve as
effective targets for training a student model to obtain better generalization
ability. Existing studies mainly focus on particular representation forms for
knowledge transfer between manually specified pairs of teacher-student
intermediate layers. However, semantics of intermediate layers may vary in
different networks and manual association of layers might lead to negative
regularization caused by semantic mismatch between certain teacher-student
layer pairs. To address this problem, we propose Semantic Calibration for
Cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper
target layers of the teacher model for each student layer with an attention
mechanism. With a learned attention distribution, each student layer distills
knowledge contained in multiple layers rather than a single fixed intermediate
layer from the teacher model for appropriate cross-layer supervision in
training. Consistent improvements over state-of-the-art approaches are observed
in extensive experiments with various network architectures for teacher and
student models, demonstrating the effectiveness and flexibility of the proposed
attention-based soft layer association mechanism for cross-layer distillation.
Comment: AAAI-2021
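A bare-bones version of attention-weighted cross-layer supervision might look like the following; the attention parametrization and the per-pair projection heads are illustrative assumptions rather than the exact SemCKD modules:

```python
import torch
import torch.nn.functional as F

def cross_layer_distill(student_feats, teacher_feats, attn_logits, projections):
    # student_feats: list of S student feature maps; teacher_feats: list of T teacher feature maps.
    # attn_logits:   (S, T) learnable association scores; projections[s][t] aligns feature shapes.
    attn = F.softmax(attn_logits, dim=1)           # soft assignment of teacher layers per student layer
    loss = 0.0
    for s, f_s in enumerate(student_feats):
        for t, f_t in enumerate(teacher_feats):
            aligned = projections[s][t](f_s)       # project student feature to the teacher layer's shape
            loss = loss + attn[s, t] * F.mse_loss(aligned, f_t.detach())
    return loss
```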
Knowledge Distillation with Refined Logits
Recent research on knowledge distillation has increasingly focused on logit
distillation because of its simplicity, effectiveness, and versatility in model
compression. In this paper, we introduce Refined Logit Distillation (RLD) to
address the limitations of current logit distillation methods. Our approach is
motivated by the observation that even high-performing teacher models can make
incorrect predictions, creating a conflict between the standard distillation
loss and the cross-entropy loss. This conflict can undermine the consistency of
the student model's learning objectives. Previous attempts to use labels to
empirically correct teacher predictions may undermine the class correlation. In
contrast, our RLD employs labeling information to dynamically refine teacher
logits. In this way, our method can effectively eliminate misleading
information from the teacher while preserving crucial class correlations, thus
enhancing the value and efficiency of distilled knowledge. Experimental results
on CIFAR-100 and ImageNet demonstrate its superiority over existing methods.
The code is provided at \url{https://github.com/zju-SWJ/RLD}.
Comment: 11 pages, 7 figures
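The paper's actual refinement procedure is more nuanced, but a deliberately naive stand-in illustrates the goal of reconciling teacher logits with the label while leaving the remaining class correlations untouched:

```python
import torch

def naively_refined_teacher(teacher_logits, labels):
    # NOT the paper's method: when the teacher's top-1 class disagrees with the
    # ground-truth label, swap the two logits so the true class ranks first,
    # leaving the relative ordering of all other classes unchanged.
    refined = teacher_logits.clone()
    pred = teacher_logits.argmax(dim=1)
    idx = torch.nonzero(pred != labels, as_tuple=False).squeeze(1)
    if idx.numel() > 0:
        true_c, pred_c = labels[idx], pred[idx]
        refined[idx, true_c] = teacher_logits[idx, pred_c]
        refined[idx, pred_c] = teacher_logits[idx, true_c]
    return refined
```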
Super-Necking Crystal Growth and Structural and Magnetic Properties of SrTb$_2$O$_4$ Single Crystals
We report on the single-crystal growth of the SrTb$_2$O$_4$ compound by a
super-necking technique with a laser-floating-zone furnace and study its
stoichiometry, growth mode, and structural and magnetic properties by scanning
electron microscopy, neutron Laue diffraction, X-ray powder diffraction, and a physical
property measurement system. We optimized the growth parameters, mainly the
growth speed, atmosphere, and the addition of a TbO raw material.
Neutron Laue diffraction displays the characteristic feature of a single
crystal. Our study determines the Sr:Tb atomic ratio and indicates
a possible layer-by-layer crystal growth mode. Our X-ray powder diffraction
study determines the crystal structure, lattice constants, and atomic positions.
The paramagnetic (PM) Curie--Weiss (CW) temperature is $\theta_{\mathrm{CW}} =$ 5.00(4) K,
and the effective PM moment is $\mu_{\mathrm{eff}} =$ 10.97(1) $\mu_{\mathrm{B}}$ per Tb ion.
The data of magnetization versus
temperature can be divided into three regimes, showing a coexistence of
antiferromagnetic and ferromagnetic interactions. This probably leads to the
magnetic frustration in the SrTb$_2$O$_4$ compound. The magnetization at 2 K
and 14 T originates from both the Tb1 and Tb2 sites and is strongly frustrated,
with an expected saturation field of about 41.5 T, displaying an intricate
phase diagram with three ranges.
Comment: 19 pages, 13 figures
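For orientation, the quoted Curie--Weiss temperature and effective moment come from a standard fit of the paramagnetic susceptibility; a textbook form of that analysis (not necessarily the authors' exact parametrization) is:

```latex
% Curie--Weiss fit of the paramagnetic susceptibility and the effective moment
% extracted from the Curie constant C (molar cgs units):
\chi(T) = \chi_0 + \frac{C}{T - \theta_{\mathrm{CW}}}, \qquad
\mu_{\mathrm{eff}} = \sqrt{\frac{3 k_{\mathrm{B}} C}{N_{\mathrm{A}}}}
\approx \sqrt{8\,C}\;\mu_{\mathrm{B}} \quad (C~\text{in emu K mol}^{-1}).
```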