Heterogeneous Collaborative Learning for Personalized Healthcare Analytics via Messenger Distillation
In this paper, we propose a Similarity-Quality-based Messenger Distillation
(SQMD) framework for heterogeneous asynchronous on-device healthcare analytics.
By introducing a preloaded reference dataset, SQMD enables all participant
devices to distill knowledge from peers via messengers (i.e., the soft labels
of the reference dataset generated by clients) without assuming the same model
architecture. Furthermore, the messengers also carry important auxiliary
information to calculate the similarity between clients and evaluate the
quality of each client model, based on which the central server creates and
maintains a dynamic collaboration graph (communication graph) to improve the
personalization and reliability of SQMD under asynchronous conditions.
Extensive experiments on three real-life datasets show that SQMD achieves
superior performance.
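As a minimal sketch of the messenger mechanism this abstract describes: each client publishes soft labels over the shared reference set, the server can score client similarity on those labels, and a peer distills from the aggregated labels. The names (make_messenger, client_similarity, distill_from_peers) and the uniform peer averaging are illustrative stand-ins, assuming PyTorch models and a fixed-order reference DataLoader; SQMD's actual similarity/quality weighting and graph maintenance are more involved.

```python
# Illustrative sketch only, not the SQMD algorithm itself.
import torch
import torch.nn.functional as F

@torch.no_grad()
def make_messenger(model, reference_loader, T=3.0):
    """A client's messenger: soft labels over the shared reference dataset."""
    model.eval()
    return torch.cat([F.softmax(model(x) / T, dim=1) for x, _ in reference_loader])

def client_similarity(m_i, m_j):
    """Similarity between two clients, measured on their messengers; the
    server can use such scores to build the collaboration graph."""
    return F.cosine_similarity(m_i.flatten(), m_j.flatten(), dim=0)

def distill_from_peers(student, reference_loader, peer_messengers, T=3.0, lr=1e-3):
    """One distillation pass: the student matches the averaged peer soft labels.
    Peers may have different architectures; only the label space must match."""
    target = torch.stack(peer_messengers).mean(dim=0)
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    student.train()
    offset = 0
    for x, _ in reference_loader:
        t = target[offset:offset + x.size(0)]
        offset += x.size(0)
        loss = F.kl_div(F.log_softmax(student(x) / T, dim=1), t,
                        reduction="batchmean") * T * T
        opt.zero_grad()
        loss.backward()
        opt.step()
```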
CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation
Knowledge distillation (KD) is an effective tool for compressing deep
classification models for edge devices. However, the performance of KD is
affected by the large capacity gap between the teacher and student networks.
Recent methods have resorted to a multiple teacher assistant (TA) setting for
KD, which sequentially decreases the size of the teacher model to gradually
bridge the size gap between teacher and student. This paper proposes a new technique
called Curriculum Expert Selection for Knowledge Distillation (CES-KD) to
efficiently enhance the learning of a compact student under the capacity gap
problem. This technique is built upon the hypothesis that a student network
should be guided gradually using a stratified teaching curriculum, since it
learns easy (hard) data samples better and faster from a lower- (higher-)
capacity teacher network. Specifically, our method is a gradual TA-based KD technique
that selects a single teacher per input image based on a curriculum driven by
the difficulty of classifying the image. In this work, we empirically verify
our hypothesis and rigorously experiment with the CIFAR-10, CIFAR-100, CINIC-10,
and ImageNet datasets, showing improved accuracy on VGG-like, ResNet,
and WideResNet architectures.
Comment: ICPR202
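The per-image selection rule lends itself to a short illustration. The sketch below is hedged, not the paper's exact criterion: prediction entropy under the largest teacher stands in for image difficulty, and uniform binning maps each image to one teacher in a capacity-ordered list (teachers[0] smallest).

```python
# Hedged sketch of curriculum-style teacher selection; the difficulty score
# and binning scheme are stand-ins, not the exact CES-KD procedure.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_teachers(teachers, x):
    """Route each image to one teacher index: harder images (higher prediction
    entropy under the largest teacher) go to larger teachers."""
    p = F.softmax(teachers[-1](x), dim=1)
    entropy = -(p * p.clamp_min(1e-9).log()).sum(dim=1)
    # Normalize difficulties into [0, 1) and bin them to teacher indices.
    d = (entropy - entropy.min()) / (entropy.max() - entropy.min() + 1e-9)
    return (d * len(teachers)).long().clamp_max(len(teachers) - 1)

def ces_kd_loss(student, teachers, x, y, T=4.0, alpha=0.5):
    """Cross-entropy on labels plus KL to the per-sample selected teacher."""
    idx = select_teachers(teachers, x)
    s_logits = student(x)
    with torch.no_grad():
        # One forward pass per sample for clarity; grouping samples by
        # selected teacher would be faster in practice.
        t_logits = torch.stack([teachers[i](xi.unsqueeze(0)).squeeze(0)
                                for xi, i in zip(x, idx)])
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    return alpha * kd + (1 - alpha) * F.cross_entropy(s_logits, y)
```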
Unlimited Knowledge Distillation for Action Recognition in the Dark
Dark videos often lose essential information, so the knowledge learned by
networks is insufficient to recognize actions accurately. Existing
knowledge assembling methods require massive GPU memory to distill the
knowledge from multiple teacher models into a student model. In action
recognition, this drawback becomes serious because of the heavy computation
that video processing requires; constrained by limited computational
resources, these approaches are infeasible.
infeasible. To address this issue, we propose an unlimited knowledge
distillation (UKD) in this paper. Compared with existing knowledge assembling
methods, our UKD can effectively assemble different knowledge without
introducing high GPU memory consumption; thus, the number of teacher models
used for distillation is unlimited. With our UKD, the network's learned knowledge
can be remarkably enriched. Our experiments show that a single-stream network
distilled with our UKD even surpasses a two-stream network. Extensive
experiments are conducted on the ARID dataset.
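The abstract does not spell out UKD's mechanism, but its memory argument suggests a serial pattern worth sketching: load one teacher at a time, cache its soft labels, and then train the student against the aggregate with no teacher resident on the GPU. The build_model factory, checkpoint list, and uniform averaging below are assumptions for illustration, not the paper's procedure.

```python
# Hedged sketch of a memory-flat multi-teacher pipeline; GPU memory stays
# constant no matter how many teachers contribute.
import torch
import torch.nn.functional as F

@torch.no_grad()
def cache_soft_labels(teacher_ckpts, build_model, loader, T=3.0, device="cuda"):
    """Average soft labels over all teachers; only one model is ever on GPU.
    `loader` must iterate in a fixed order (shuffle=False)."""
    total = None
    for ckpt in teacher_ckpts:
        teacher = build_model(ckpt).to(device).eval()
        probs = torch.cat([F.softmax(teacher(x.to(device)) / T, dim=1).cpu()
                           for x, _ in loader])
        total = probs if total is None else total + probs
        del teacher
        torch.cuda.empty_cache()  # free GPU memory before loading the next teacher
    return total / len(teacher_ckpts)

def distill_step(student, x, soft_targets, opt, T=3.0):
    """One teacher-free student update against the cached ensemble labels."""
    loss = F.kl_div(F.log_softmax(student(x) / T, dim=1), soft_targets,
                    reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```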