    Efficient Vision Transformers via Fine-Grained Manifold Distillation

    This paper studies the model compression problem for vision transformers. Benefiting from the self-attention module, transformer architectures have shown extraordinary performance on many computer vision tasks. Although network performance is boosted, transformers often require more computational resources, including memory and inference cost. In contrast to existing knowledge distillation approaches, we propose to excavate useful information from the teacher transformer through the relationships between images and their divided patches. We then explore an efficient fine-grained manifold distillation approach that simultaneously computes cross-image, cross-patch, and randomly selected manifolds in the teacher and student models. Experimental results on several benchmarks demonstrate the superiority of the proposed algorithm for distilling portable transformer models with higher performance. For example, our approach achieves 75.06% Top-1 accuracy on the ImageNet-1k dataset when training a DeiT-Tiny model, outperforming other ViT distillation methods.
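    The core idea of matching relational structure among patch embeddings, rather than only raw logits, can be sketched as below. This is a minimal PyTorch sketch: the function name, tensor shapes, and the plain MSE objective are illustrative assumptions, not the paper's exact formulation, which also covers cross-image and randomly selected manifolds.

        import torch
        import torch.nn.functional as F

        def manifold_relation_loss(teacher_tokens, student_tokens):
            # teacher_tokens, student_tokens: (batch, num_patches, dim) patch embeddings
            # from the teacher and student ViTs, assumed already projected to a common dim.
            t = F.normalize(teacher_tokens, dim=-1)
            s = F.normalize(student_tokens, dim=-1)
            # Patch-to-patch cosine-similarity ("manifold") matrices within each image.
            rel_t = torch.bmm(t, t.transpose(1, 2))  # (batch, P, P)
            rel_s = torch.bmm(s, s.transpose(1, 2))
            # Train the student to reproduce the teacher's relational structure.
            return F.mse_loss(rel_s, rel_t)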

    Self-Supervised GAN Compression

    Deep learning's success has led to larger and larger models handling more and more complex tasks; trained models can contain millions of parameters. These large models are compute- and memory-intensive, which makes it a challenge to deploy them while meeting latency, throughput, and storage requirements. Some model compression methods have been successfully applied to image classification, detection, and language models, but there has been very little work on compressing generative adversarial networks (GANs) performing complex tasks. In this paper, we show that a standard model compression technique, weight pruning, cannot be applied to GANs using existing methods. We then develop a self-supervised compression technique which uses the trained discriminator to supervise the training of a compressed generator. We show that this framework maintains compelling performance at high degrees of sparsity, can be easily applied to new tasks and models, and enables meaningful comparisons between different pruning granularities. Comment: The appendix for this paper is in the following repository: https://gitlab.com/dxxz/Self-Supervised-GAN-Compression-Appendi
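    A rough sketch of the discriminator-supervised setup follows: a frozen, pretrained discriminator (together with the original generator's output) provides the training signal for a pruned generator. The loss terms, weighting, and function names are illustrative assumptions, not the paper's exact objective.

        import torch
        import torch.nn.functional as F

        def compress_generator_step(pruned_G, original_G, frozen_D, z, optimizer):
            # One training step for the compressed (pruned) generator.
            with torch.no_grad():
                reference = original_G(z)          # output of the uncompressed generator
            fake = pruned_G(z)
            adv_loss = -frozen_D(fake).mean()      # frozen discriminator scores the pruned output
            rec_loss = F.l1_loss(fake, reference)  # stay close to the original generator's output
            loss = adv_loss + rec_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()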

    Improved Knowledge Distillation via Teacher Assistant

    Despite the fact that deep neural networks are powerful models that achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large pre-trained (teacher) network is used to train a smaller (student) network. However, in this paper we show that the student network's performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher; in other words, a teacher can effectively transfer its knowledge only to students up to a certain size, not smaller. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (teacher assistant) to bridge the gap between the student and the teacher. Moreover, we study the effect of teacher assistant size and extend the framework to multi-step distillation. Theoretical analysis and extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets and on CNN and ResNet architectures substantiate the effectiveness of our proposed approach. Comment: AAAI 202
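    The teacher assistant scheme reuses the standard softened-logit distillation loss and simply applies it in stages: first teacher to assistant, then assistant to student. A minimal sketch of that per-step loss is below; the temperature and weighting values are illustrative defaults, not the paper's tuned settings.

        import torch.nn.functional as F

        def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
            # Softened-logit knowledge distillation loss combined with the hard-label loss.
            soft = F.kl_div(
                F.log_softmax(student_logits / T, dim=-1),
                F.softmax(teacher_logits / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)
            hard = F.cross_entropy(student_logits, labels)
            return alpha * soft + (1.0 - alpha) * hard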

    Beyond Disentangled Representations: An Attentive Angular Distillation Approach to Large-scale Lightweight Age-Invariant Face Recognition

    Disentangled representations have been commonly adopted for Age-invariant Face Recognition (AiFR) tasks. However, these methods have reached some limitations: (1) they require large-scale face recognition (FR) training data with age labels, which is limited in practice; (2) they need heavy deep network architectures for high performance; and (3) their evaluations usually take place on age-related face databases while neglecting the standard large-scale FR databases that would guarantee robustness. This work presents a novel Attentive Angular Distillation (AAD) approach to large-scale lightweight AiFR that overcomes these limitations. Given two high-performance heavy networks as teachers with different specialized knowledge, AAD introduces a learning paradigm to efficiently distill the age-invariant attentive and angular knowledge from those teachers to a lightweight student network, making it more powerful, with higher FR accuracy and robustness against the age factor. Consequently, the AAD approach is able to take advantage of both FR datasets with and without age labels to train an AiFR model. Unlike prior distillation methods, which mainly focus on accuracy and compression ratios in closed-set problems, our AAD aims to solve the open-set problem, i.e. large-scale face recognition. Evaluations on LFW, IJB-B and IJB-C Janus, AgeDB, and MegaFace-FGNet with one million distractors have demonstrated the efficiency of the proposed approach. This work also presents a new longitudinal face aging (LogiFace) database for further studies of age-related facial problems. Comment: arXiv admin note: substantial text overlap with arXiv:1905.1062
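    Angular distillation, in its generic form, pushes the student's face embedding toward the teacher's direction on the unit hypersphere, since angular margin losses in face recognition operate on normalized embeddings. The sketch below illustrates that generic idea only; it omits the paper's attentive weighting and dual-teacher setup, and the function name and loss form are assumptions.

        import torch.nn.functional as F

        def angular_distillation_loss(student_feat, teacher_feat):
            # Align the student's embedding direction with the teacher's.
            s = F.normalize(student_feat, dim=-1)
            t = F.normalize(teacher_feat, dim=-1)
            # 1 - cosine similarity: zero when both embeddings point the same way.
            return (1.0 - (s * t).sum(dim=-1)).mean()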