74 research outputs found

    Heterogeneous Collaborative Learning for Personalized Healthcare Analytics via Messenger Distillation

    In this paper, we propose a Similarity-Quality-based Messenger Distillation (SQMD) framework for heterogeneous asynchronous on-device healthcare analytics. By introducing a preloaded reference dataset, SQMD enables all participant devices to distill knowledge from peers via messengers (i.e., the soft labels of the reference dataset generated by clients) without assuming the same model architecture. Furthermore, the messengers also carry important auxiliary information to calculate the similarity between clients and evaluate the quality of each client model, based on which the central server creates and maintains a dynamic collaboration graph (communication graph) to improve the personalization and reliability of SQMD under asynchronous conditions. Extensive experiments on three real-life datasets show that SQMD achieves superior performance.
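
    The core mechanism described above (each client publishes soft labels computed on a shared reference set, and peers are weighted by messenger similarity) can be sketched roughly as follows. This is an illustrative PyTorch sketch, not the authors' released code: the helper names, the cosine-similarity measure, the temperature, and the fixed, unshuffled iteration order of the reference loader are all assumptions made here.

        # Illustrative sketch of messenger-style distillation on a shared reference set.
        # Hypothetical helper names; not the SQMD authors' implementation.
        import torch
        import torch.nn.functional as F

        def make_messenger(model, reference_loader, temperature=3.0):
            """Soft labels ("messenger") of a local model on the preloaded reference dataset."""
            model.eval()
            probs = []
            with torch.no_grad():
                for x, _ in reference_loader:                 # fixed, unshuffled order assumed
                    probs.append(F.softmax(model(x) / temperature, dim=1))
            return torch.cat(probs)                           # [num_reference_samples, num_classes]

        def messenger_similarity(msg_a, msg_b):
            """Similarity between two clients' messengers, usable to weight the collaboration graph."""
            return F.cosine_similarity(msg_a.flatten(), msg_b.flatten(), dim=0)

        def distill_from_peers(model, reference_loader, peer_messengers, weights, optimizer,
                               temperature=3.0):
            """One KL-distillation pass toward a similarity-weighted average of peer messengers."""
            target = sum(w * m for w, m in zip(weights, peer_messengers)) / sum(weights)
            model.train()
            offset = 0
            for x, _ in reference_loader:
                logits = model(x)
                soft_target = target[offset:offset + x.size(0)]
                offset += x.size(0)
                loss = F.kl_div(F.log_softmax(logits / temperature, dim=1), soft_target,
                                reduction="batchmean") * temperature ** 2
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

    Because only the messengers (soft labels) are exchanged, each device may run a different model architecture, which matches the heterogeneity the abstract emphasizes.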

    CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation

    Knowledge distillation (KD) is an effective tool for compressing deep classification models for edge devices. However, the performance of KD is affected by the large capacity gap between the teacher and student networks. Recent methods have resorted to a multiple teacher assistant (TA) setting for KD, which sequentially decreases the size of the teacher model to gradually bridge the size gap between these models. This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation (CES-KD) to efficiently enhance the learning of a compact student under the capacity gap problem. The technique is built on the hypothesis that a student network should be guided gradually by a stratified teaching curriculum, since it learns easy (hard) data samples better and faster from a lower (higher) capacity teacher network. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image based on a curriculum driven by the difficulty of classifying the image. In this work, we empirically verify our hypothesis and experiment rigorously with the CIFAR-10, CIFAR-100, CINIC-10, and ImageNet datasets, showing improved accuracy on VGG-like, ResNet, and WideResNet architectures.
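
    The per-image expert selection can be illustrated with a small sketch. The PyTorch code below is an assumption-laden illustration, not the paper's implementation: difficulty is approximated by the largest teacher's per-sample loss, difficulty ranks are mapped onto a list of teachers assumed to be ordered from lowest to highest capacity, and a standard KD objective combines a KL term with cross-entropy; all names and hyperparameters are hypothetical.

        # Illustrative per-image teacher selection for KD (not the CES-KD release).
        import torch
        import torch.nn.functional as F

        def select_soft_targets(teachers, x, y, temperature=4.0):
            """Pick one teacher per image: easier samples (low loss under the largest
            teacher) use lower-capacity teachers, harder samples use higher-capacity ones."""
            with torch.no_grad():
                difficulty = F.cross_entropy(teachers[-1](x), y, reduction="none")        # [B]
                ranks = difficulty.argsort().argsort().float() / max(x.size(0) - 1, 1)    # in [0, 1]
                idx = (ranks * (len(teachers) - 1)).round().long()                        # teacher index per sample
                all_probs = torch.stack([F.softmax(t(x) / temperature, dim=1)
                                         for t in teachers])                              # [T, B, C]
                batch = torch.arange(x.size(0), device=idx.device)
                return all_probs[idx, batch]                                              # [B, C]

        def kd_loss(student_logits, soft_targets, y, temperature=4.0, alpha=0.5):
            """Standard KD objective: KL to the selected teacher plus cross-entropy on labels."""
            kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                          soft_targets, reduction="batchmean") * temperature ** 2
            return alpha * kl + (1 - alpha) * F.cross_entropy(student_logits, y)

    In the paper the curriculum unfolds over training rather than per batch; this sketch only shows the per-sample routing of soft targets to a single teacher.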

    Unlimited Knowledge Distillation for Action Recognition in the Dark

    Dark videos often lose essential information, so the knowledge learned by networks is insufficient to accurately recognize actions. Existing knowledge-assembling methods require massive GPU memory to distill the knowledge from multiple teacher models into a student model. In action recognition, this drawback becomes serious because of the heavy computation required for video processing, and under limited computational resources these approaches are infeasible. To address this issue, we propose unlimited knowledge distillation (UKD) in this paper. Compared with existing knowledge-assembling methods, UKD can effectively assemble different knowledge without introducing high GPU memory consumption, so the number of teacher models used for distillation is unlimited. With UKD, the knowledge learned by the network can be remarkably enriched. Our experiments show that a single-stream network distilled with UKD even surpasses a two-stream network. Extensive experiments are conducted on the ARID dataset.
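
    One common way to keep GPU memory flat while assembling many teachers is to cache each teacher's logits offline, one teacher at a time, and then distill the student from the cached soft labels. The PyTorch sketch below illustrates that general idea under those assumptions; it is not necessarily the exact UKD procedure, and all function names and hyperparameters are hypothetical.

        # Memory-light multi-teacher distillation sketch: only one model on the GPU at a time.
        # Illustrative only; not necessarily the UKD procedure.
        import torch
        import torch.nn.functional as F

        def cache_teacher_logits(teacher, loader, path, device="cuda"):
            """Run one teacher over the whole dataset, save its logits to disk,
            then free the GPU before loading the next teacher."""
            teacher.to(device).eval()
            logits = []
            with torch.no_grad():
                for clips, _ in loader:                        # clips: [B, C, T, H, W] video tensors
                    logits.append(teacher(clips.to(device)).cpu())
            torch.save(torch.cat(logits), path)
            teacher.cpu()                                       # release GPU memory

        def distill_step(student_logits, cached_logits_per_teacher, labels,
                         temperature=2.0, alpha=0.7):
            """Distill from the averaged soft labels of any number of cached teachers.
            cached_logits_per_teacher: list of [B, C] tensors sliced from the saved files."""
            soft = torch.stack([F.softmax(t / temperature, dim=1)
                                for t in cached_logits_per_teacher]).mean(dim=0)   # [B, C]
            kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                          soft, reduction="batchmean") * temperature ** 2
            return alpha * kl + (1 - alpha) * F.cross_entropy(student_logits, labels)

    Because the teachers are never co-resident with the student, the number of teachers is limited by disk space rather than GPU memory, which is the constraint the abstract highlights.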